Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
904 views
in Technique[技术] by (71.8m points)

regex - Atomic groups clarity

Consider this regex.

a*b

This will fail in case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

This takes 67 steps in debugger to fail.

Now consider this regex.

(?>a*)b

This will fail in case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

This takes 133 steps in debugger to fail.

And lastly this regex:

a*+b  (a variant of atomic group)

This will fail in case of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

This takes 67 steps in debugger to fail.

When I check the benchmark atomic group (?>a*)b performs 179% faster.

Now atomic groups disable backtracking. So performance in match is good.

  1. But why are the number of steps more? Can somebody explain on this?

  2. Why is there a diff. in steps between two atomic groups (?>a*)b and a*+b.

Do they work differently?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Author note:

? ? This answer targets question 1 as delivered by the bounty text "I am looking forward to the exact reason why more steps are being needed by the debugger.I dont need answers explaining how atomic groups work.";
? ? Jerry's answer addresses the other concerns very well, while my other answer takes a ride through the mentioned constructs, how they work, and why they are important. For full knowledge, simply reading this post is not enough!

Every group in a regular expression takes a step to step into and out of the group.

? ? WHAT?!
Yeah, I'm serious, read on...

Firstly, I would like to present you with quantified non-capturing groups, over without the group:

Pattern 1: (?:c)at
Pattern 2: cat

So what exactly happens here? We'll match the patterns with the test string "concat" on a regex engine with optimizations disabled:

(?:c)atcat

While we're at it, I present you some more groups:

groups

? ? Oh no! I'm going to avoid using groups!

But wait! Please note that the number of steps taken to match has no correlation with the performance of the match. engines optimizes away most of the "unnecessary steps" as I've mentioned. Atomic groups are still the most efficient, despite more steps taken on an engine with optimizations disabled.

Perhaps relevant:


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...