Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
424 views
in Technique[技术] by (71.8m points)

php - non-breaking utf-8 0xc2a0 space and preg_replace strange behaviour

In my string I have utf-8 non-breaking space (0xc2a0) and I want to replace it with something else.

When I use

$str=preg_replace('~xc2xa0~', 'X', $str);

it works OK.

But when I use

$str=preg_replace('~x{C2A0}~siu', 'W', $str);

non-breaking space is not found (and replaced).

Why? What is wrong with second regexp?

The format x{C2A0} is correct, also I used u flag.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Actually the documentation about escape sequences in PHP is wrong. When you use xc2xa0 syntax, it searches for UTF-8 character. But with x{c2a0} syntax, it tries to convert the Unicode sequence to UTF-8 encoded character.

A non breaking space is U+00A0 (Unicode) but encoded as C2A0 in UTF-8. So if you try with the pattern ~x{00a0}~siu, it will work as expected.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...