Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
266 views
in Technique[技术] by (71.8m points)

theory - Algorithm to print out a shuffled list, in-place and with O(1) memory

After reading this question I started to wonder: is it possible to have a shuffling algorithm which does not modify or copy the original list?

To make it clear:

Imagine you are given a list of objects. The list size can be arbitrary, but assume it's pretty large (say, 10,000,000 items). You need to print out the items of the list in random order, and you need to do it as fast as possible. However, you should not:

  • Copy the original list, because it's very large and copying would waste a LOT of memory (probably hitting the limits of available RAM);
  • Modify the original list, because it's sorted in some way and some other part later on depends on it being sorted.
  • Create an index list, because, again, the list is very large and copying takes all too much time and memory. (Clarification: this is meant any other list, which has the same number of elements as the original list).

Is this possible?

Added: More clarifications.

  1. I want the list to be shuffled in true random way with all permutations equally likely (of course, assuming we have a proper Rand() function to start with).
  2. Suggestions that I make a list of pointers, or a list of indices, or any other list that would have the same number of elements as the original list, is explicitly deemed as inefficient by the original question. You can create additional lists if you want, but they should be serious orders of magnitude smaller than the original list.
  3. The original list is like an array, and you can retrieve any item from it by its index in O(1). (So no doubly-linked list stuff, where you have to iterate through the list to get to your desired item.)

Added 2: OK, let's put it this way: You have a 1TB HDD filled with data items, each 512 bytes large (a single sector). You want to copy all this data to another 1TB HDD while shuffling all the items. You want to do this as fast as possible (single pass over data, etc). You have 512MB of RAM available, and don't count on swap. (This is a theoretical scenario, I don't have anything like this in practice. I just want to find the perfect algorithm.item.)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Well it depends a bit on what kind of randomness you except for the shuffling, i.e. should all shufflings be as probable, or can the distribution be skewed.

There are mathematical ways to produce "random-looking" permutations of N integers, so if P is such a permutation from 0..N-1 to 0..N-1, you can just iterate x from 0 to N-1 and output list item L(P(x)) instead of L(x) and you have obtained a shuffling. Such permutations can be obtained e.g. using modular arithmetics. For example, if N is prime, P(x)=(x * k) mod N is a permutation for any 0 < k < N (but maps 0 to 0). Similary for a prime N, for example P(x)=(x^3) mod N should be a permutation (but maps 0 to 0 and 1 to 1). This solution can be easily expanded to non-prime N by selecting the least prime above N (call it M), permute upto M, and discard the permuted indices above N (similary below).

It should be noted that modular exponentiation is the basis for many cryptographic algorithms (e.g. RSA, Diffie-Hellman) and is considered a strongly pseudorandom operation by the experts in the field.

Another easy way (not requiring prime numbers) is first to expand the domain so that instead of N you consider M where M is the least power of two above N. So e.g. if N=12 you set M=16. Then you use bijective bit operations, e.g.

P(x) = ((x ^ 0xf) ^ (x << 2) + 3) & 0xf

Then when you output your list, you iterate x from 0 to M-1 and output L(P(x)) only if P(x) is actually < N.

A "true, unbiased random" solution can be constructed by fixing a cryptographically strong block cipher (e.g. AES) and a random key (k) and then iterating the sequence

AES(k, 0), AES(k, 1), ...

and outputting the corresponding item from the sequence iff AES(k,i) < N. This can be done in constant space (the internal memory required by the cipher) and is indistinguishable from a random permutation (due to the cryptographic properties of the cipher) but is obviously very slow. In the case of AES, you would need to iterate until i = 2^128.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...