Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Python - Replacing the empty strings in a string

I accidentally found that in Python, an operation of the form

string1.join(string2)

can be equivalently expressed as

string2.replace('', string1)[len(string1):-len(string1)]

Furthermore, after trying timeit with a few differently sized inputs, this weird way to join seems to be more than twice as fast.

  1. Why should the join method be slower?
  2. Is replacing the empty string like this a safe/well-defined thing to do?


1 Reply


So first of all, let's break down why this works.

>>> string1 = "foo"
>>> string2 = "bar"
>>> string1.join(string2)
'bfooafoor'

This puts string1 between each pair of adjacent items (characters) of string2.

Replacing the empty string does something interesting: it treats each gap between characters, as well as the positions before the first and after the last character, as an occurrence of the empty string. It therefore does essentially the same job, except with an extra separator at the start and end:

>>> string2.replace('', string1)
'foobfooafoorfoo'

Slicing those two extra separators off produces the same result as str.join():

>>> string2.replace('', string1)[len(string1):-len(string1)]
'bfooafoor'
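
As a rough check of how far the equivalence goes, here is a minimal sketch, assuming CPython 3. The helper name join_via_replace is made up for illustration. It agrees with str.join() for non-empty separators but not for an empty one, because the slice then becomes [0:-0], which is the same as [0:0]:

```python
def join_via_replace(sep, text):
    """Emulate sep.join(text) with the replace-and-slice trick from the question."""
    return text.replace("", sep)[len(sep):-len(sep)]

# Agrees with str.join for a non-empty separator, even when text is empty.
for sep, text in [("foo", "bar"), ("-", "a"), (", ", "abc"), ("foo", "")]:
    assert join_via_replace(sep, text) == sep.join(text)

# An empty separator breaks the trick: the slice collapses to [0:0].
print(repr("".join("bar")))               # 'bar'
print(repr(join_via_replace("", "bar")))  # ''
```

So even where the replace behaviour holds, the slice needs a special case for an empty separator.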

This solution is obviously much less readable than str.join(), so I'd always recommend against it. str.join() has also been written to be efficient on all platforms, whereas replacing the empty string might be far less efficient on some versions or implementations of Python. (I don't know whether that is the case, but it's a possibility, just as repeated concatenation is reasonably fast in CPython but not necessarily elsewhere.)

I also can't find anything in the documentation that says replacing the empty string should behave this way. The docs for str.replace() simply say:

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

I see no reason to presume that the gaps between letters count as occurrences of the empty string (arguably, you could fit infinitely many empty strings anywhere in the string), so relying on this behaviour might be a bad idea.
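
For what it's worth, other string methods in CPython 3 seem to follow the same convention of one empty match at each character boundary. This small sketch only shows what the interpreter does. It doesn't settle whether the behaviour is guaranteed:

```python
text = "bar"

# str.count reports len(text) + 1 empty-string matches: one at each boundary.
print(text.count(""))            # 4

# str.find locates the first empty match at index 0.
print(text.find(""))             # 0

# The optional count argument limits how many boundaries are filled, left to right.
print(text.replace("", "-", 2))  # -b-ar
print(text.replace("", "-"))     # -b-a-r-
```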

This operation is also pretty rare. It's far more common to have a sequence of strings to join together; joining the individual characters of a string isn't something I have personally had to do often.

Interestingly, x.replace("", y) appears to be special-cased in the CPython source:

/* Algorithms for different cases of string replacement */

/* len(self)>=1, from="", len(to)>=1, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_interleave(PyStringObject *self,
                   const char *to_s, Py_ssize_t to_len,
                   Py_ssize_t maxcount)
{
...

It may well be that this special casing is what makes it perform well. Again, since it's not mentioned in the documentation, this is an implementation detail, and assuming it will work as quickly (or at all) in other Python versions would be a mistake.
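
If you want to measure this on your own interpreter rather than take the timing claim on trust, a minimal timeit sketch might look like the following. The input size and repeat counts are arbitrary assumptions, and the result will depend on the Python version and platform:

```python
import timeit

sep = "foo"
text = "bar" * 1000  # arbitrary input size; vary it to see how the gap changes

def with_join():
    return sep.join(text)

def with_replace():
    return text.replace("", sep)[len(sep):-len(sep)]

# Both approaches must produce the same string before timing means anything.
assert with_join() == with_replace()

# Take the best of several repeats to reduce noise from other processes.
join_time = min(timeit.repeat(with_join, number=200, repeat=5))
replace_time = min(timeit.repeat(with_replace, number=200, repeat=5))

print(f"join:    {join_time:.4f}s")
print(f"replace: {replace_time:.4f}s")
```

Whichever way the numbers come out, they only describe the interpreter they were measured on.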

