Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
156 views
in Technique[技术] by (71.8m points)

c++ - std::string in a multi-threaded program

Given that:

1) The C++03 standard does not address the existence of threads in any way

2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write semantics in its copy-constructor

3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program

I come to the following, seemingly controversial, conclusion:

You simply cannot safely and portably use std::string in a multi-threaded program

Obviously, no STL data structure is thread-safe. But at least, with std::vector for example, you can simply use mutexes to protect access to the vector. With an std::string implementation that uses COW, you can't even reliably do that without editing the reference counting semantics deep within the vendor implementation.

Real-world example:

In my company, we have a multi-threaded application which has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompile the application on another version of gcc, and all of a sudden I get random segfaults all the time. Valgrind is now reporting invalid memory accesses deep within libstdc++, in the std::string copy constructor.

So what is the solution? Well, of course, I could typedef std::vector<char> as a string class - but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or, (shudder), I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics; and std::string simply doesn't guarantee that.

Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You cannot safely and portably do anything in a multi-threaded program. There is no such thing as a portable multi-threaded C++ program, precisely because threads throw everything C++ says about order of operations, and the results of modifying any variable, out the window.

There's also nothing in the standard to guarantee that vector can be used in the way you say. It would be legal to provide a C++ implementation with a threading extension in which, say, any use of a vector outside the thread in which it was initialized results in undefined behavior. The instant you start a second thread, you aren't using standard C++ any more, and you must look to your compiler vendor for what is safe and what is not.

If your vendor provides a threading extension, and also provides a std::string with COW that (therefore) cannot be made thread-safe, then I think for the time being your argument is with your vendor, or with the threading extension, not with the C++ standard. For example, arguably POSIX should have barred COW strings in programs which use pthreads.

You could possibly make it safe by having a single mutex, which you take while doing any string mutation whatsoever, and any reads of a string that's the result of a copy. But you'd probably get crippling contention on that mutex.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...