Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
167 views
in Technique[技术] by (71.8m points)

c# - Bug in the string comparing of the .NET Framework

It's a requirement for any comparison sort to work that the underlying order operator is transitive and antisymmetric.

In .NET, that's not true for some strings:

static void CompareBug()
{
  string x = "u002Du30A2";  // or just "-ア" if charset allows
  string y = "u3042";        // or just "あ" if charset allows

  Console.WriteLine(x.CompareTo(y));  // positive one
  Console.WriteLine(y.CompareTo(x));  // positive one
  Console.WriteLine(StringComparer.InvariantCulture.Compare(x, y));  // positive one
  Console.WriteLine(StringComparer.InvariantCulture.Compare(y, x));  // positive one

  var ja = StringComparer.Create(new CultureInfo("ja-JP", false), false);
  Console.WriteLine(ja.Compare(x, y));  // positive one
  Console.WriteLine(ja.Compare(y, x));  // positive one
}

You see that x is strictly greater than y, and y is strictly greater than x.

Because x.CompareTo(x) and so on all give zero (0), it is clear that this is not an order. Not surprisingly, I get unpredictable results when I Sort arrays or lists containing strings like x and y. Though I haven't tested this, I'm sure SortedDictionary<string, WhatEver> will have problems keeping itself in sorted order and/or locating items if strings like x and y are used for keys.

Is this bug well-known? What versions of the framework are affected (I'm trying this with .NET 4.0)?

EDIT:

Here's an example where the sign is negative either way:

x = "u4E00u30A0";         // equiv: "一?"
y = "u4E00u002Du0041";   // equiv: "一-A"
question from:https://stackoverflow.com/questions/13254153/bug-in-the-string-comparing-of-the-net-framework

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If correct sorting is so important in your problem, just use ordinal string comparison instead of culture-sensitive. Only this one guarantees transitive and antisymmetric comparing you want.

What MSDN says:

Specifying the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value in a method call signifies a non-linguistic comparison in which the features of natural languages are ignored. Methods that are invoked with these StringComparison values base string operation decisions on simple byte comparisons instead of casing or equivalence tables that are parameterized by culture. In most cases, this approach best fits the intended interpretation of strings while making code faster and more reliable.

And it works as expected:

    Console.WriteLine(String.Compare(x, y, StringComparison.Ordinal));  // -12309
    Console.WriteLine(String.Compare(y, x, StringComparison.Ordinal));  // 12309

Yes, it doesn't explain why culture-sensitive comparison gives inconsistent results. Well, strange culture — strange result.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...