9 Feb 2009

Converting accented characters to plain ASCII

Posted by ged

Today, I wanted to improve our blog-title-to-permalink function so that (French) accented characters are converted to their unaccented equivalents rather than simply stripped. For example, “é” would be converted to “e”.

After some googling and (slightly) tweaking what I found, here is the function I use:

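# Translation table for Python 2 byte strings: bytes 0-191 map to themselves,
# bytes 192-255 (the accented Latin-1 letters) map to an ASCII look-alike.
noaccents_table = ''.join(map(chr, range(192))) + \
    "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"

def latin1_to_ascii(u_str):
    # Characters outside Latin-1 become '?' because of 'replace'.
    return u_str.encode('latin1', 'replace').translate(noaccents_table)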

As you can see, it takes a unicode string as its argument. Here is how you use it:

>>> latin1_to_ascii(u'évidemment')
'evidemment'
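
The function above relies on Python 2 byte strings, where chr() builds one-byte strings and str.translate() accepts a 256-character table. Under Python 3 the same idea might be sketched as follows. The name NOACCENTS_TABLE and the final decode step are my own additions, not part of the original function, so treat this as an assumption-laden port rather than a drop-in replacement:

```python
# Python 3 sketch of the same table-based approach.
# Bytes 0-191 map to themselves; bytes 192-255 map to an ASCII look-alike.
NOACCENTS_TABLE = bytes(range(192)) + (
    b"AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTs"
    b"aaaaaaaceeeeiiiidnooooo/ouuuuyty"
)


def latin1_to_ascii(text):
    # Characters that do not exist in Latin-1 become b'?' because of 'replace'.
    encoded = text.encode("latin-1", "replace")
    # Decode as Latin-1 because bytes 128-191 pass through the table unchanged.
    return encoded.translate(NOACCENTS_TABLE).decode("latin-1")


print(latin1_to_ascii("évidemment"))  # evidemment
```

Unlike the Python 2 version, this returns a text string rather than a byte string, which is usually what a permalink function wants.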

Note for later: if I ever need to do this in a more general way (not only for Latin-1), the iconv module (http://pypi.python.org/pypi/iconv) might (or might not) be useful.
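
Another possible route for the general case, which I have not compared against iconv, is the standard library's unicodedata module: decompose each character, then drop the combining marks. The function name strip_accents is hypothetical, and this is only a sketch written for Python 3:

```python
import unicodedata


def strip_accents(text):
    # NFKD splits "é" into "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left over from the decomposition.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Discard anything that is still outside ASCII.
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(strip_accents("évidemment"))  # evidemment
```

One difference from the table approach: letters with no Unicode decomposition, such as “ø” or “æ”, are dropped here instead of being mapped to “o” or “a”, so a permalink function may still want a small hand-written table for those.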
