Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
167 views
in Technique[技术] by (71.8m points)

python - What's exactly happening in infinite nested lists?

It's possible to create an infinite nested list in Python. That's clear and, although not popular and definitely not useful is a known fact.

>>> a = [0]
>>> a[0] = a
>>> a
[[...]]
>>> a[0] == a
True

My question is, what is happening here:

>>> a = [0]
>>> b = [0]
>>> a[0], b[0] = b, a
>>> a
[[[...]]]
>>> b
[[[...]]]
>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp
>>> a[0] == b
True
>>> a[0][0] == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp
>>> a[0][0][0] == b
True
>>> 

Which each way deeper, when I'm trying to understand it, I feel much more like my brain is going to explode. I see, that a contains b, that contains a and so on...

Now my questions about this one. Do we really have two lists here, or only one? How does a thing like this gets stored in memory? What could be a purpose to enable programers implement something so strange like this?

Please, don't treat this question super-serious. And don't forget, that programming can be fun sometimes.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Disclaimer: I don't use Python, so some things I say may be wrong. Python experts, feel free to correct me.

Great question. I think the central misconception (if I can't even call it that; it's perfectly reasonable how you arrived at the thought process you used) you're having that prompts you to ask the question is this:

When I write b[0] = a, it does not mean that a is in b. It means that b includes a reference that points to the thing that a points to.

The variables a and b themselves aren't even "things" themselves, and they themselves are also merely pointers to otherwise anonymous "things" in memory.

The concept of references is a major leap from the non-programming world, so let's step through your program with this in mind:

>>> a = [0]

You create a list that happens to have something in it (ignore that for now). What matters is it's a list. That list gets stored in memory. Let's say it's stored in memory location 1001. Then, the assignment = creates a variable a that the programming language allows you to use later. At this point, there's some list object in memory and a reference to it that you can access with the name a.

>>> b = [0]

This does the same thing for b. There is a new list that gets stored in memory location 1002. The programming language creates a reference b that you can use to refer to the memory location and in turn the list object.

>>> a[0], b[0] = b, a

This does two things that are identical, so let's focus on one: a[0] = b. What this does is pretty fancy. It first evaluates the right side of the equality, sees the variable b and fetches the corresponding object in memory (memory object #1002) since b is a reference to it. What happens on the left side is equally fancy. a is a variable that points to an list (memory object #1001), but memory object #1001 itself has a number of references of its own. Instead of those references having names like a and b, which you use, those references have numerical indices like 0. So, now, what this does is a pulls up memory object #1001, which is a pile of indexed references, and it goes to the reference with index 0 (previously, this reference pointed to the actual number 0, which is something you did in line 1) and then repoints that reference (i.e., the first and only reference in memory object #1001) to what the thing on the right side of the equation evaluates to. So now, object #1001's 0th reference points to object #1002.

>>> a
[[[...]]]
>>> b
[[[...]]]

This is just fanciness done by the programming language. When you just ask it to evaluate a, it pulls up the memory object (the list at location #1001), detects using its own magic that it's infinite and renders itself as such.

>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp

The failure of this statement has to do with how Python does comparisons. When you compare an object to itself, it immediately evaluates to true. When you compare and object to another object, it uses "magic" to determine whether the equality should be true or false. In the case of lists in Python, it looks at every item in each list and checks if they are equal (in turn using the items' own equality-checking methods). So, when you try a == b. What it does is first dig up b (object #1002) and a (object #1001) and then realizes that they are different things in memory so goes to its recursive list checker. It does this by iterating through the two lists. Object #1001 has one element with index 0 that points to object #1002. Object #1002 has one element with index 0 that points to object #1001. Therefore, the program concludes that object #1001 and #1002 are equal if all their references point to the same thing, ergo if #1002 (what #1001's only reference points to) and #1001 (what #1002's only reference points to) are the same thing. This equality check can never stop. The same thing would happen in any list that doesn't stop. You could do c = [0]; d = [0]; c[0] = d; d[0] = c and a == c would raise the same error.

>>> a[0] == b
True

As I hinted to in the previous paragraph, this immediately resolves to true because Python takes a shortcut. It doesn't need to compare list contents because a[0] points to object #1002 and b points to object #1002. Python detects that they are identical in the literal sense (they are the same "thing") and doesn't even bother checking contents.

>>> a[0][0] == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp

This goes back to being an error because a[0][0] ends up pointing to object #1001. The identity check fails and falls back on the recursive content check, which never ends.

>>> a[0][0][0] == b
True

Once again, a[0][0][0] points to object #1002, as does b. The recursive check is skipped and the comparison immediately returns true.


Higher level jibber jabber not directly related to your specific code snippet:

  • Since all there is is references referring to other objects, even though there is what appears to be "infinite" nesting, the object referred to by a (as I've called object #1001) and the object referred to be b (#1002) are both the same size in memory. And that size is actually incredibly small since all they are are lists that point to the respective other memory locations.
  • It's also worth note that in less "generous" languages, comparing two references with == returns true only if the memory objects they point to are the same in the sense that both references point to the same spot in memory. Java is an example of this. The stylistic convention that has emerged in such languages is to define a method/function on objects themselves (for Java, it is conventionally called equals()) to do custom equality testing. Python does this out of the box for lists. I don't know about Python in particular, but at least in Ruby, == is overloaded in the sense that when you do someobject == otherobject, it actually calls a method called == on someobject (that you can overwrite). In theory, there'd be nothing stopping you from making someobject == otherobject return something other than a boolean.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...