Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
578 views
in Technique[技术] by (71.8m points)

Sorting list of string with specific locale in python

I work on an application that uses texts from different languages, so, for viewing or reporting purposes, some texts (strings) need to be sorted in a specific language.

Currently I have a workaround messing with the global locale settings, which is bad, and I don't want to put it in production:

default_locale = locale.getlocale(locale.LC_COLLATE)

def sort_strings(strings, locale_=None):
    if locale_ is None:
        return sorted(strings)

    locale.setlocale(locale.LC_COLLATE, locale_)
    sorted_strings = sorted(strings, cmp=locale.strcoll)
    locale.setlocale(locale.LC_COLLATE, default_locale)

    return sorted_strings

The official python locale documentation explicitly says that saving and restoring is a bad idea, but does not give any suggestions: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could use a PyICU's collator to avoid changing global settings:

import icu # PyICU

def sorted_strings(strings, locale=None):
    if locale is None:
       return sorted(strings)
    collator = icu.Collator.createInstance(icu.Locale(locale))
    return sorted(strings, key=collator.getSortKey)

Example:

>>> L = [u'sandwiches', u'angel delight', u'custard', u'éclairs', u'glühwein']
>>> sorted_strings(L)
['angel delight', 'custard', 'glühwein', 'sandwiches', 'éclairs']
>>> sorted_strings(L, 'en_US')
['angel delight', 'custard', 'éclairs', 'glühwein', 'sandwiches']

Disadvantage: dependency on PyICU library; the behavior is slightly different from locale.strcoll.


I don't know how to get locale.strxfrm function given a locale name without changing it globally. As a hack you could run your function in a different child process:

pool = multiprocessing.Pool()
# ...
pool.apply(locale_aware_sort, [strings, loc])

Disadvantage: might be slow, resource hungry


Using ordinary threading.Lock won't work unless you can control every place where locale aware functions (they are not limited to locale module e.g., re) could be called from multiple threads.


You could compile your function using Cython to synchronize access using GIL. GIL will make sure that no other Python code can be executed while your function is running.

Disadvantage: not pure Python


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...