c - Finding out the duplicate element in an array

Question

Welcome To Ask or Share your Answers For Others

c - Finding out the duplicate element in an array

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

c - Finding out the duplicate element in an array

There is an array of size n and the elements contained in the array are between 1 and n-1 such that each element occurs once and just one element occurs more than once. We need to find this element.

Though this is a very FAQ, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large. It will overflow. There have also been suggestions regarding the use of XOR gate dup = dup ^ arr[i] ^ i, which are not clear to me.

I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!

for i=0 to n-1
  begin :
    diff = A[i] - i;
    sum  = sum + diff;
  end

diff contains the duplicate element, but using this method I am unable to find out the index of the duplicate element. For that I need to traverse the array once more which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method works in O(n)?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

1.4m articles

1.4m replys

5 comments

57.0k users

Most popular tags

javascript python c# java How android c++ php ios html sql r c node.js .net iphone asp.net css reactjs jquery ruby What Android objective mysql linux Is git Python windows Why regex angular swift amazon excel algorithm macos Java visual how bash Can multithreading PHP Using scala angularjs typescript apache spring performance postgresql database flutter json rust arrays C# dart vba django wpf xml vue.js In go Get google jQuery xcode jsf http Google mongodb string shell oop powershell SQL C++ security assembly docker Javascript Android: Does haskell Convert azure debugging delphi vb.net Spring datetime pandas oracle math Django

联盟问答网站-Union QA website

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

广告位招租

深蓝 · Answer 1 · 2021-10-17T00:13:18+0000

There are many ways that you can think about this problem, depending on the constraints of your problem description.

If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:

XOR is associative, so (x ^ y) ^ z = x ^ (y ^ z)
XOR is commutative: x ^ y = y ^ x
XOR is its own inverse: x ^ y = 0 iff x = y
XOR has zero as an identity: x ^ 0 = x

Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR the same value together multiple times, you get back zero, and property (4) means that if you XOR anything with 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and th combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.

To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which ends up being all the numbers from 1 to (n-1) except the duplicate. Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This then makes all numbers in the range 1 to (n-1) that were not previously canceled out cancel out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and only uses O(1) space, since the XOR of all the values fits into a single integer.

In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that this would cause an overflow, but (on most machines) this is not a problem because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 2³². More generally, though, let's say that integers are stored modulo k for some fixed k.

Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡_k y) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:

For example:

If x ≡_k y and w ≡_k z, then x + w ≡_k y + z
If x ≡_k y and w ≡_k z, then xw ≡_k yz.

This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)

If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.

If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle-finding by treating the array as a function from the numbers 1-n onto 1-(n-1) and then looking for two inputs to that function. However, the resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.

For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.

Hope this helps!

Categories

c - Finding out the duplicate element in an array

c - Finding out the duplicate element in an array

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags