Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - Regex to match certain characters and exclude certain characters but without negative lookahead

I want a regex that matches all emojis (or most of them) but excludes certain characters (such as “|”|‘|’|…|—).

This regex does the job via negative lookahead:

/(?!\u201C|\u201D|\u2018|\u2019|\u2026|\u2014)(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/

But apparently Google Scripts doesn't support this. Error:

Invalid regular expression pattern (?!“|”|‘|’|…|—)(?|?|[?-?]|?[?-?]|?[?-?]|?[?-?])

Is there another way to achieve my goal (a regex that works with Google Script's findText)?


1 Reply


Option 1

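Maybe a character class that lists the emoji ranges explicitly, such as

[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]

would work for the emojis you want. None of these ranges contains the punctuation you want to exclude, so no lookahead is needed. In JavaScript the \u{...} escapes require the u flag.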
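As a quick check of Option 1, here is a minimal sketch in plain JavaScript. It only verifies the class against a JavaScript RegExp with the u flag; whether Google Apps Script's findText accepts the same escapes is an assumption you would need to test separately. The names emojiClassSource, emojiRe and unwanted are made up for this example.

```javascript
// The Option 1 class, split into pieces only to keep the lines short.
// String.raw keeps the backslashes so RegExp sees the \u{...} escapes.
const emojiClassSource = [
  String.raw`\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}`,
  String.raw`\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}`,
  String.raw`\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}`,
  String.raw`\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}`,
  String.raw`\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}`,
  String.raw`\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}`,
  String.raw`\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}`,
  String.raw`\u{23f8}-\u{23fa}`,
].join('');

// The u flag is required for \u{...} code point escapes.
const emojiRe = new RegExp('[' + emojiClassSource + ']', 'u');

// The six characters from the question that must not match.
const unwanted = ['\u201C', '\u201D', '\u2018', '\u2019', '\u2026', '\u2014'];

console.log(emojiRe.test('\u{1F600}'));                // grinning face emoji
console.log(unwanted.some((ch) => emojiRe.test(ch)));  // any unwanted char matched?
```

If the second line prints true for your own list of unwanted characters, one of them sits inside a listed range and needs to be cut out as described in Option 3.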


Option 2

Otherwise, you could subtract the unwanted characters with character class intersection, in engines that support it (Java's regex engine does, for example; check whether findText does):

[these unicode ranges &&[^these unicodes]]

This can get pretty complicated, but it is possible.

Option 3

This option is most likely the simplest way to solve your problem. I suspect the unwanted punctuation characters already fall inside the ranges you are matching, so check whether that is the case. In the question's pattern, all six excluded characters lie inside [\u2000-\u3300]. For example, in

[\u0100-\u0200]

you might have \u0150 and \u0175 as unwanted characters that need to be removed from the range.

You can then simply split the range around them:

[\u0100-\u014F\u0151-\u0174\u0176-\u0200]

and the problem is solved without any lookahead. Note that the boundaries are hexadecimal, so the code point just before \u0150 is \u014F, not \u0149.
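Splitting ranges by hand is error-prone, so here is a rough sketch that does it for the question's own pattern. The helper names (subtractCodePoints, esc, toCharClass) are hypothetical, and the result is only checked against JavaScript's RegExp; whether findText accepts the \uXXXX escapes, or needs the literal characters instead, is an assumption to verify in your script.

```javascript
// Hypothetical helper: cut single code points out of inclusive [start, end] ranges.
function subtractCodePoints(ranges, excluded) {
  const cuts = [...new Set(excluded)].sort((a, b) => a - b);
  const result = [];
  for (const [start, end] of ranges) {
    let lo = start;
    for (const cp of cuts) {
      if (cp < lo || cp > end) continue;         // cut lies outside what is left
      if (cp > lo) result.push([lo, cp - 1]);    // keep the part before the cut
      lo = cp + 1;                               // resume just after the cut
    }
    if (lo <= end) result.push([lo, end]);
  }
  return result;
}

// Format a BMP code point as a \uXXXX escape (four hex digits).
function esc(cp) {
  return '\\u' + cp.toString(16).toUpperCase().padStart(4, '0');
}

// Turn ranges into character class source text.
function toCharClass(ranges) {
  const body = ranges
    .map(([a, b]) => (a === b ? esc(a) : esc(a) + '-' + esc(b)))
    .join('');
  return '[' + body + ']';
}

// The six characters the question wants to exclude.
const excluded = [0x201C, 0x201D, 0x2018, 0x2019, 0x2026, 0x2014];
const bmpClass = toCharClass(subtractCodePoints([[0x2000, 0x3300]], excluded));
// bmpClass is [\u2000-\u2013\u2015-\u2017\u201A-\u201B\u201E-\u2025\u2027-\u3300]

// The question's pattern with the lookahead dropped and the range split.
const pattern =
  '\\u00a9|\\u00ae|' + bmpClass +
  '|\\ud83c[\\ud000-\\udfff]|\\ud83d[\\ud000-\\udfff]|\\ud83e[\\ud000-\\udfff]';
const re = new RegExp(pattern);

console.log(bmpClass);
console.log(re.test('\u2026'), re.test('\u2600'));
```

Generating the class this way keeps the hexadecimal boundaries correct and makes it easy to add further excluded characters later.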


javascript unicode emoji regular expressions

