Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

.net - Calling ToList() on ConcurrentDictionary<TKey, TValue> while adding items

I've run into an interesting issue. Knowing that ConcurrentDictionary<TKey, TValue> is safely enumerable while being modified, with the (in my case) unwanted side effect of iterating over elements that may disappear or appear multiple times, I decided to create a snapshot myself using ToList(). Since ConcurrentDictionary<TKey, TValue> also implements ICollection<KeyValuePair<TKey, TValue>>, ToList() ends up in the List<T>(IEnumerable<T> collection) constructor, which allocates an array sized to the dictionary's current Count and then copies the items over with ICollection<T>.CopyTo(T[] array, int arrayIndex). That call goes to the ConcurrentDictionary<TKey, TValue> implementation, which throws an ArgumentException if elements have been added to the dictionary in the meantime.

Locking everywhere would defeat the point of using this collection in the first place, so my options seem to be either to keep catching the exception and retrying (which is definitely not the right answer to the problem), or to implement my own version of ToList() specialized for this case (but then again, growing a list and possibly trimming it to the right size for a few elements seems like overkill, and using a LinkedList would hurt indexing performance).
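One way to hand-roll such a ToList() is to bypass the Count/CopyTo path and fill the list from the dictionary's enumerator. The sketch below is only an illustration: the class and the method name ToListByEnumeration are made up here and are not part of the framework. Because it relies on the enumerator, the result is not a moment-in-time snapshot; it merely avoids the ArgumentException.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class ConcurrentDictionarySnapshotExtensions
{
    // Hypothetical helper (not a framework method): copies the pairs by
    // enumerating, so no array is pre-sized from a possibly stale Count.
    public static List<KeyValuePair<TKey, TValue>> ToListByEnumeration<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> source)
    {
        var list = new List<KeyValuePair<TKey, TValue>>();

        // The dictionary's enumerator tolerates concurrent writers, but it may
        // or may not observe items added or removed while the loop runs.
        foreach (KeyValuePair<TKey, TValue> pair in source)
        {
            list.Add(pair);
        }

        return list;
    }
}
```

The list grows by doubling its capacity, so the cost is a few reallocations rather than a lock on the whole dictionary.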

In addition, chaining certain LINQ methods that buffer their input in the background (such as OrderBy) does seem to fix the problem at the cost of performance, but a bare ToList() obviously does not, and it's not worth "augmenting" it with another method when no additional functionality is needed.

Could this be an issue with any concurrent collection?

What would be a reasonable workaround to keep performance hits to the minimum while creating such a snapshot? (Preferably at the end of some LINQ magic.)

Edit:

After looking into it, I can confirm that ToArray() (to think that I just passed by it yesterday) really does solve the snapshot problem, as long as a simple snapshot is all that is needed. It does not help when additional processing (such as filtering or sorting) is required before taking the snapshot and a list/array is still needed at the end. (In that case an additional call is required, creating the new collection all over again.)
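For illustration, the "snapshot first, then query" variant might look like the sketch below. The names SnapshotThenQuery, TopScores, scores and minimum are invented for this example. It shows the extra allocation mentioned above: one array for the snapshot and a second collection for the final list.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

static class SnapshotThenQuery
{
    // Hypothetical example method: filters and sorts a snapshot of the dictionary.
    public static List<KeyValuePair<string, int>> TopScores(
        ConcurrentDictionary<string, int> scores, int minimum)
    {
        return scores
            .ToArray()                          // instance method on ConcurrentDictionary: the snapshot
            .Where(p => p.Value >= minimum)     // from here on, LINQ runs over the private array
            .OrderByDescending(p => p.Value)
            .ToList();                          // second allocation for the final list
    }
}
```

Since ToArray() is an instance method of the dictionary, it is chosen over the LINQ extension method of the same name, so the snapshot semantics apply.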

I failed to point out that the snapshot may or may not need to go through these modifications, so it should preferably be taken at the end; I'm adding this to the question.

(Also, if anyone has a better idea for a title, do tell.)


1 Reply


Let's answer the broad, overarching question here for all the concurrent types:

If you split an operation that deals with the internals into multiple steps, where all the steps must "be in sync", then yes, you will definitely get crashes and odd results due to thread synchronization.

So if .ToList() first asks for .Count, then sizes an array, and then uses foreach to grab the values and place them in the list, then yes, there is definitely a chance that the two parts will see a different number of elements.
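The mismatch can be reproduced without any threads by performing the two steps by hand and modifying the dictionary in between. This is a minimal sketch that assumes the Count-then-CopyTo sequence described in the question; the class and method names are hypothetical, and the single extra assignment stands in for a writer on another thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

static class StaleCountDemo
{
    // Hypothetical demo: returns true if CopyTo rejects a buffer that was
    // sized before the dictionary grew.
    public static bool ThrowsWhenDictionaryGrows()
    {
        var dict = new ConcurrentDictionary<int, string>();
        dict[1] = "one";
        dict[2] = "two";

        var asCollection = (ICollection<KeyValuePair<int, string>>)dict;

        // Step 1: size the buffer from the current Count (2 items).
        var buffer = new KeyValuePair<int, string>[asCollection.Count];

        // Stand-in for another thread adding an item between the two steps.
        dict[3] = "three";

        try
        {
            // Step 2: copy. The dictionary now holds 3 items, the buffer only 2.
            asCollection.CopyTo(buffer, 0);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```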

To be honest I wish some of those concurrent types did not try to pretend they were normal collections by implementing a lot of those interfaces but alas, that's how it is.

Can you fix your code, now that you know about the issue?

Yes, you can. Look at the type's documentation and see whether it provides any form of snapshotting mechanism that isn't prone to the above-mentioned problems.

It turns out that ConcurrentDictionary<TKey, TValue> implements .ToArray(), which is documented with:

A new array containing a snapshot of key and value pairs copied from the System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>.

(my emphasis)

How is .ToArray() currently implemented?

Using locks, see line 697.

So if you feel that locking the entire dictionary to get a snapshot is too costly, I would question the act of grabbing a snapshot of its contents to begin with.

Additionally, the .GetEnumerator() method follows some of the same rules, from the documentation:

The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.

(again, my emphasis)

So while .GetEnumerator() won't crash, it may not produce the results you want.
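A small single-threaded sketch of that behaviour follows; the class and method names are hypothetical. Writing to the dictionary inside the foreach does not throw, whereas adding to a plain Dictionary<TKey, TValue> during enumeration would invalidate its enumerator. How many of the newly added keys the loop visits is not specified.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

static class EnumerateWhileWriting
{
    // Hypothetical demo: enumerates a ConcurrentDictionary while adding to it.
    public static int CountSeenWhileAdding()
    {
        var dict = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 5; i++)
        {
            dict[i] = i;
        }

        int seen = 0;
        foreach (KeyValuePair<int, int> pair in dict)
        {
            // Only the five original keys trigger an add, so the loop is bounded.
            if (pair.Key < 5)
            {
                dict.TryAdd(pair.Key + 100, 0);
            }
            seen++;
        }

        // The count may or may not include keys added during the loop.
        return seen;
    }
}
```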

Depending on timing, neither may .ToArray(), so it all depends.

