Can you tweak the script to remove 'bad names' such as 'auto', and sort the years in ascending order?
It now deletes the default string "FIRST AUTHOR <EMAIL@...181...>, YEAR". Other strings could be blacklisted too, but the proper solution is to remove them from the .po files or to tighten the credit-line regex. For example, it currently matches this line: "Notoj: 1) mi dankas Sebastian Cyprych pro liaj sugestoj [Antonio, 2008." (Esperanto for "Notes: 1) I thank Sebastian Cyprych for his suggestions"). Similarly, there may be credit lines that are skipped because they are not in the expected "# Name, YEAR." format.
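A stricter credit-line regex could look roughly like the sketch below. It is an assumption about what a well-formed credit line looks like, not the regex the script uses: a name with no commas, colons, brackets or parentheses, an optional `<email>`, then one or more `, YYYY` years and a literal final period. `STRICT_CREDITLINE`, `parse_creditline` and the Jane Doe sample are hypothetical.

```python
import re

# Hypothetical stricter credit-line pattern (an assumption, not the
# regex used in the script):
#   "# " + a name without commas, colons, brackets or parentheses
#   + an optional " <email>" + one or more ", YYYY" + a literal ".".
STRICT_CREDITLINE = re.compile(
    r"^# (?P<who>[^,:()\[\]<>]+?(?: <[^>]*>)?)"
    r"(?P<years>(?:, [0-9]{4})+)\.$"
)


def parse_creditline(line):
    """Return (name, [years]) for a well-formed credit line, else None."""
    m = STRICT_CREDITLINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return m.group("who"), re.findall(r"[0-9]{4}", m.group("years"))


for line in [
    "# Jane Doe <jane@example.org>, 2007, 2008.\n",  # accepted
    "# Notoj: 1) mi dankas Sebastian Cyprych pro liaj sugestoj [Antonio, 2008.\n",  # rejected
    "# FIRST AUTHOR <EMAIL@...181...>, YEAR\n",  # rejected
]:
    print(parse_creditline(line))
```

The trade-off is that legitimate names containing parentheses or colons would now be skipped, so the character class may need loosening for a particular set of .po files.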
Regards, ~~helix84
#!/usr/bin/python
#
# extract TRANSLATORS from .po files
#
# Ivan Masar <helix84@...150...>, 2008.
#
# This script extracts translator credit lines from .po files.
# It merges duplicate translator names into one entry and collects all
# years found for that name in all files. Not optimized for speed.
#
# Input:  none; reads .po files recursively from the current directory
# Output: the merged list of TRANSLATORS to stdout;
#         names of processed .po files to stderr (that line may be removed)

import os
import re
import sys

credits = {}
year = re.compile("[0-9]{4}")


def add_creditline(s):
    s = s[2:]               # strip the leading "# "
    index = s.find(",")     # the name ends at the first comma
    name = s[:index]
    # set of all four-digit years after the name
    years = set(year.findall(s[index:]))
    if name in credits:
        # name already exists: add the new years to its set
        credits[name] = credits[name].union(years)
    else:
        # otherwise add a new entry
        credits[name] = years


def main():
    # "# <name>, <year>[, <year>...]." with a literal final period
    creditline = re.compile(r"^# .*, [0-9]{4}\.$")

    # traverse looking for .po files from the current directory
    for root, dirs, files in os.walk(".", topdown=False):
        for name in files:
            if name.endswith(".po"):
                path = os.path.join(root, name)
                # debug: write .po file name to stderr
                sys.stderr.write(path + "\n")

                # read credit lines from the header comments, stopping at
                # the first msgid; each line is checked before the next one
                # is read, so the first line of the file is not skipped
                with open(path, "r") as f:
                    s = f.readline()
                    while s != 'msgid ""\n' and s != '':
                        if creditline.match(s):
                            add_creditline(s)
                        s = f.readline()

    # remove unwanted entries; pop() does not fail if the entry is absent
    credits.pop("FIRST AUTHOR <EMAIL@...181...>", None)

    # sort by translator name (case-insensitive) and print each name
    # with its years in ascending order
    for key in sorted(credits, key=lambda x: (x.upper(), x)):
        print(key + ", " + ", ".join(sorted(credits[key])) + ".")


if __name__ == "__main__":
    main()
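The merge-and-sort step can also be tried without a tree of .po files. The sketch below is a hypothetical in-memory variant (`merge_credits`, `BLACKLIST` and the sample names are illustrative, not part of the script). It keeps the script's "name ends at the first comma" parsing, collects years into one set per name, drops blacklisted names with `dict.pop(key, None)` so a missing entry is not an error, and sorts names case-insensitively and years in ascending order. Treating 'auto' as a literal name in credit lines is an assumption.

```python
import re

# Hypothetical blacklist (assumption): the default string the script
# already deletes, plus "auto" from the request, assumed here to appear
# as a literal name in credit lines.
BLACKLIST = frozenset(["FIRST AUTHOR <EMAIL@...181...>", "auto"])

YEAR = re.compile(r"[0-9]{4}")


def merge_credits(lines, blacklist=BLACKLIST):
    """Merge '# Name, YYYY[, YYYY...].' lines into sorted output lines."""
    credits = {}
    for line in lines:
        body = line.rstrip("\r\n")[2:]  # drop the leading "# "
        index = body.find(",")
        if index == -1:
            continue  # no comma: not a credit line
        name = body[:index]
        # one set of years per name, so duplicates across files collapse
        credits.setdefault(name, set()).update(YEAR.findall(body[index:]))
    for bad in blacklist:
        credits.pop(bad, None)  # no KeyError if absent
    # names case-insensitively, years in ascending order
    return [
        name + ", " + ", ".join(sorted(credits[name])) + "."
        for name in sorted(credits, key=lambda n: (n.upper(), n))
    ]


sample = [
    "# Jane Doe <jane@example.org>, 2008.\n",
    "# bob <bob@example.org>, 2007.\n",
    "# Jane Doe <jane@example.org>, 2006, 2008.\n",
    "# auto, 2008.\n",
]
for out in merge_credits(sample):
    print(out)
# bob <bob@example.org>, 2007.
# Jane Doe <jane@example.org>, 2006, 2008.
```

Passing the blacklist as a parameter keeps the "remove unwanted entries" policy separate from the parsing, so extra bad names can be added without touching the merge logic.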