Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
357 views
in Technique[技术] by (71.8m points)

php - Fastest implementation to do multiple string substitutions in Python

Is there any recommended way to do multiple string substitutions other than doing replace chaining on a string (i.e. text.replace(a, b).replace(c, d).replace(e, f)...)? How would you, for example, implement a fast function that behaves like PHP's htmlspecialchars in Python?

I compared (1) multiple replace method, (2) the regular expression method, and (3) Matt Anderson's method.

With n=10 runs, the results came up as follows:

On 100 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 5 ms [ regular_expression_method(str, dict) ]
TIME: 1 ms [ matts_multi_replace_method(list, str) ]

On 1000 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 3 ms [ regular_expression_method(str, dict) ]
TIME: 2 ms [ matts_multi_replace_method(list, str) ]

On 10000 characters:

TIME: 3 ms [ replace_method(str) ]
TIME: 7 ms [ regular_expression_method(str, dict) ]
TIME: 5 ms [ matts_multi_replace_method(list, str) ]

On 100000 characters:

TIME: 36 ms [ replace_method(str) ]
TIME: 46 ms [ regular_expression_method(str, dict) ]
TIME: 39 ms [ matts_multi_replace_method(list, str) ]

On 1000000 characters:

TIME: 318 ms [ replace_method(str) ]
TIME: 360 ms [ regular_expression_method(str, dict) ]
TIME: 320 ms [ matts_multi_replace_method(list, str) ]

On 3687809 characters:

TIME: 1.277524 sec [ replace_method(str) ]
TIME: 1.290590 sec [ regular_expression_method(str, dict) ]
TIME: 1.116601 sec [ matts_multi_replace_method(list, str) ]

So kudos to Matt for beating the multi replace method on a fairly large input string.

Anyone got ideas for beating it on a smaller string?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Something like the following maybe? Split the text into pieces with the first "from" item to be replaced, then recursively split each of those parts into sub-parts with the next "from" item to be replaced, and so on, until you've visited all your replacements. Then join with the "to" replacement item for each as recursive function completes.

A little hard to wrap your head around the following code perhaps (it was for me, and I wrote it), but it seems to function as intended. I didn't benchmark it, but I suspect it would be reasonably fast.

def multi_replace(pairs, text):
    stack = list(pairs)
    stack.reverse()
    def replace(stack, parts):
        if not stack:
            return parts
        # copy the stack so I don't disturb parallel recursions
        stack = list(stack) 
        from_, to = stack.pop()
        #print 'split (%r=>%r)' % (from_, to), parts
        split_parts = [replace(stack, part.split(from_)) for part in parts]
        parts = [to.join(split_subparts) for split_subparts in split_parts]
        #print 'join (%r=>%r)' % (from_, to), parts
        return parts
    return replace(stack, [text])[0]


print multi_replace(
    [('foo', 'bar'), ('baaz', 'foo'), ('quux', 'moop')], 
    'foobarbaazfooquuxquux')

for:

barbarfoobarmoopmoop

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...