Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
630 views
in Technique[技术] by (71.8m points)

php - Regex to only allow alphanumeric, comma, hyphen, underscore and semicolon

I've already got a bit of working code but I need someone to help explain why it works if they can!

I am using PHP to replace anything in a string if it is not either a-z, A-Z, 0-9, a comma, a semicolon, an underscore or a hyphen (which ultimately should represent either a single username, or a comma/semicolon separated list of usernames).

The following works:

$data = preg_replace('/[^,;a-zA-Z0-9_-]/s', '', $data);

But the following does not:

$data = preg_replace('/[^a-zA-Z0-9_-,;]/s', '', $data);

Why will this only work when the comma and semicolon are at the start? Putting them at the end seems to break things (this is what I tried initially when I came across /[^a-zA-Z0-9_-]/s.

As an aside, I am also using the following to trim any trailing semicolons (plural) or commas (plural) and someone may be able to suggest a more efficient and/or elegant way to do this?:

if(preg_match('/;$/', $data))
{
    $data = rtrim($data, ';' );
}
if(preg_match('/,$/', $data))
{
    $data = rtrim($data, ',' );
}

Thanks for any help :)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It's not the comma and semicolon causing your problem; it's the hyphen. Look at the parts of your character class and consider what they mean:

0-9 # Anything from '0' to '9', meaning 0, 1, 2, ... 9
A-Z # Anything from 'A' to 'Z', meaning A, B, C, ... Z
_-, # Anything from '_' to ',', meaning...uh...hmmm.

There's no clear progression from _ to ,, so the regex engine isn't sure what to make of this. In character classes, if you want a hyphen to be interpreted literally, it needs to be at the very beginning or end of the class (or escaped with a backslash). So any of these will work:

[^,;a-zA-Z0-9_-]
[^-,;a-zA-Z0-9_]
[^a-zA-Z0-9_-,;]

As for trimming off the end, you can do all of this in one regex replace:

$data = preg_replace('/[^,;a-zA-Z0-9_-]|[,;]$/s', '', $data);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...