Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
340 views
in Technique[技术] by (71.8m points)

.net - String immutability in C#

I was curious how the StringBuilder class is implemented internally, so I decided to check out Mono's source code and compare it with Reflector's disassembled code of the Microsoft's implementation. Essentially, Microsoft's implementation uses char[] to store a string representation internally, and a bunch of unsafe methods to manipulate it. This is straightforward and did not raise any questions. But I was confused, when I found that Mono uses a string inside StringBuilder:

private int _length;
private string _str;

The first thought was: "What a senseless StringBuilder". But then I figured out that it is possible to mutate a string using pointers:

public StringBuilder Append (string value) 
{
     // ...
     String.CharCopy (_str, _length, value, 0, value.Length);
}

internal static unsafe void CharCopy (char *dest, char *src, int count) 
{
    // ...
    ((short*)dest) [0] = ((short*)src) [0]; dest++; src++;
}    

I used to program in C/C++ a little, so I can't say that this code confused me much, but I thought that strings are completely immutable (i.e there is absolutely no way to mutate it). So the actual questions are:

  • Can I create a completely immutable type?
  • Is there any reason to use such code apart from performance concerns? (unsafe code to change immutable types)
  • Are strings then inherently thread-safe or not?
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Can i create a completely immutable type?

You can create a type where the CLR enforces immutability on it. You can then use "unsafe" to turn off the CLR enforcement mechanisms. That's why "unsafe" is called "unsafe" - because it turns off the safety system. In unsafe code every single byte of memory in the process can be writable if you try hard enough, including both the immutable bytes and the code in the CLR which enforces immutability.

You can also use Reflection to break immutability. Both Reflection and unsafe code require an extremely high level of trust to be granted.

Is there any reason to use such code apart from performance concerns?

Sure, there are lots of reasons to use immutable data structures. Immutable data structures rock. Some good reasons to use immutable data structures:

  • immutable data structures are easier to reason about than mutable data structures. When you ask "is this list empty?" and you get an answer then you know that answer is correct not just now, but forever. With mutable data structures you cannot actually ask "is this list empty?" All you can ask is "is this list empty right now?" and then the answer logically answers the question "was this list empty at some point in the past?"

The fact that the answer to a question about an immutable type stays true forever has security implications. Suppose you have code like this:

void Frob(Bar bar)
{
    if (!IsSafe(bar)) throw something;
    DoSomethingDangerous(bar);
}

If Bar is a mutable type then there is a race condition here; bar could be made unsafe on another thread after the check but before something dangerous happens. If Bar is an immutable type then the answer to the question stays the same throughout, which is much safer. (Imagine if you could mutate a string containing a path after the security check but before the file was opened, for example.)

  • methods which take immutable data structures as their arguments and return them as their results and perform no side effects are called "pure methods". Pure methods can be memoized, which trades increased memory use for increased speed, often enormously increased speed.

  • immutable data structures can often be used on multiple threads simultaneously without locking. Locking is there to prevent creation of inconsistent state of an object in the face of a mutation, but immutable objects don't have mutations. (Some so-called immutable data structures are logically immutable but actually do mutations inside themselves; imagine for example a lookup table which does not change its contents, but does reorganize its internal structure if it can deduce what the next query is likely to be. Such a data structure would not be automatically threadsafe.)

  • immutable data structures that efficiently re-use their internal parts when a new structure is built from an old one make it easy to "take a snapshot" of the state of a program without wasting lots of memory. That makes undo-redo operations trivial to implement. It makes it easier to write debugging tools that can show you how you got to a particular program state.

  • and so on.

Are strings then inherently thread-safe or not?

If everyone plays by the rules, they are. If someone uses unsafe code or private reflection then there is no rule enforcement anymore. You have to trust that if someone is using high-privilege code then they are doing so correctly and not mutating a string. Use your power to run unsafe code only for good; with great power comes great responsibility.

So do I need to use locks or not?

That is a strange question. Remember, locks are co-operative. Locks only work if everyone accessing a particular object agrees upon the locking strategy that must be used.

You have to use locks if the agreed-upon locking strategy for accessing particular object in a particular storage location is to use locks. If that isn't the agreed-upon locking strategy then using locks is pointless; you're carefully locking and unlocking the front door while someone else is walking in the open back door.

If you have a string which you know is being mutated by unsafe code, and you don't want to see inconsistent partial mutations, and the code which is doing the unsafe mutation documents that it takes out a particular lock during that mutation, then yes, you need to use locks when accessing that string. But this situation is very rare; ideally no one would use unsafe code to manipulate a string accessible by other code on another thread, because doing so is an incredibly bad idea. That's why we require that code that does so is fully trusted. And that's why we require that the C# source code for such a function wave a big red flag that says "this code is unsafe, review it carefully!"


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...