9 Feb 2009
Converting accented characters to plain ASCII
Today, I wanted to improve our blog-title-to-permalink function, so that (French) accented characters are not simply stripped but rather converted to their unaccented equivalents. For example, “é” would be converted to “e”.
After some googling and (slightly) tweaking what I found, here is the function I use:
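# Translation table (Python 2 byte string of length 256):
# bytes 0-191 map to themselves, bytes 192-255 map to an ASCII look-alike.
noaccents_table = ''.join(map(chr, range(192))) + \
    "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"

def latin1_to_ascii(u_str):
    # Characters outside latin1 are replaced by '?' before translation.
    return u_str.encode('latin1', 'replace').translate(noaccents_table)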
As you can see, it takes a unicode string as argument. Here is how you use it:
>>> latin1_to_ascii(u'évidemment')
'evidemment'
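The function above targets Python 2, where translate() accepts a 256-character str as its table. For reference, here is a minimal sketch of how the same table-based idea might look in Python 3, where the table has to be a bytes object. The name NOACCENTS_TABLE and the final decode step are assumptions of this sketch, not part of the original function.

```python
# Python 3 adaptation of the table-based approach (a sketch, not the original code).
# Bytes 0-191 map to themselves; bytes 192-255 map to an ASCII look-alike.
NOACCENTS_TABLE = bytes(range(192)) + (
    b"AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTs"
    b"aaaaaaaceeeeiiiidnooooo/ouuuuyty"
)


def latin1_to_ascii(text):
    """Replace accented Latin-1 letters in a str with unaccented ones."""
    # Characters that do not exist in Latin-1 become b'?' here.
    encoded = text.encode("latin1", "replace")
    # Decode as Latin-1 because bytes 128-191 pass through the table unchanged.
    return encoded.translate(NOACCENTS_TABLE).decode("latin1")


print(latin1_to_ascii("évidemment"))  # evidemment
```

As with the original table, characters in the range U+0080 to U+00BF (for example “£”) are left untouched, so the result is only pure ASCII when the input contains none of them.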
Note for later: if I ever need to do it in a more generalized way (not only for latin1), the iconv module (http://pypi.python.org/pypi/iconv) might (or might not) be useful.
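One possible way to go beyond latin1 without any third-party module is Unicode normalization from the standard library. The sketch below is not the iconv approach mentioned above but a different technique: it decomposes each character (NFKD) and drops the combining marks. The function name strip_accents is hypothetical, and the code assumes Python 3.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("évidemment"))  # evidemment
```

Letters that have no Unicode decomposition, such as “ø”, pass through unchanged, so this does not cover everything the hand-written table does.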