Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


The same regex gives different results on Linux and Windows (C++ only)

I have this pattern for my command-line program:
^s?([/|@#])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:(?:gi?|ig)?(?:\1\d\d?)?|i)?$
It uses the ECMAScript (ECMA-262) grammar of C++ std::regex.

This is a special pattern to check whether the user has entered a correct command. It is tested against a string like this:
optional-s/one-or-more/anything/optional-g-or-i/optional-2-digits

My previous question explains why I need this pattern.
It works fine on Linux but does not work on Windows. I know that line breaks differ between the two systems, and I have read about how line endings are handled differently on Linux and Windows.

The check does not depend on any file contents: the program only takes the first command-line argument, argv[ 1 ], and std::regex_match tests whether the synopsis entered by the user is correct.
For example, ./program 's/one/two/' *.txt simply renames one to two for all txt files.

the C++ code:

std::string argv_1 = argv[ 1 ]; // => s/one/two/
// Backslashes are doubled: one of each pair is the C++ string-literal escape.
bool rename_is_correct =
    std::regex_match( argv_1, std::basic_regex< char >
    ( "s?([/|@#])(?:(?!\\1).)+\\1(?:(?!\\1).)*\\1(?:(?:gi?|ig)?(?:\\1-?[1-9]\\d?)?|i)?" ) );

The Problem:
The pattern is meant to stop at each delimiter, but on Windows it behaves greedily and matches more than 4 delimiters. It should not match /one/two/three/four/five/, yet this string is matched!


NOTE:

  • I have deliberately dropped the ^ and $ assertions, since std::regex_match in C++ already requires the whole string to match, so there is no need for them
  • Also note the doubled backslashes (\\) in the C++ string: one of each pair is the escape character
  • The equivalent JavaScript code prints "no":

const regex = /^s?([/|@#])(?:(?!\1).)+\1(?:(?!\1).)*\1((?:gi?|gi)\1-?[1-9]\d|i)?$/gm;
var str = 's/one/two/gi/-33/';
if( str.match( regex ) ){
  console.log( "okay" );
} else {
  console.log( "no" );
}

1 Reply


There seems to have been a bug in GCC that got fixed in version 5.4. My guess is you are running an older version on your Windows set-up.

See the difference in output in:

It does not seem to make a difference whether boost is included or not.

The bug is related to (?!\1), as replacing it by (?![/]) (in both instances) solves the issue, but obviously that would limit the regular expression for use with the / delimiter only:

Also, the bug appears with this simple regular expression: (.)((?!\1).) which should reject an input like aa:

Conclusion: make sure to install GCC version 5.4 or higher.



...