Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


raku - Regex speed in Perl 6

Until now I've worked only with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes, I got the impression that they are slower than I would expect, but the reason is probably that I'm using them incorrectly. I made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

my @search = <abcde cdeff fabcd>;

my token search {
    @search
}

my @new_array = @array.grep({/ <search> /});
say @new_array;

Then I printed @array to a file named array (7776 lines), created a file named search with 3 lines (abcde, cdeff, fabcd), and ran a simple grep search:

$ grep -f search array

Both programs produced the same result, as expected, so I timed them:

$ time perl6 search.p6
real    0m6,683s
user    0m6,724s
sys     0m0,044s
$ time grep -f search array
real    0m0,009s
user    0m0,008s
sys     0m0,000s

So, what am I doing wrong in my Perl 6 code?

UPDATE: If I pass the search tokens to grep one at a time, looping through the @search array, the program runs much faster:

my @array = "aaaaa" .. "fffff";
say +@array;

my @search = <abcde cdeff fabcd>;

for @search -> $token {
  say ~@array.grep({/$token/});
}

$ time perl6 search.p6
real    0m1,378s
user    0m1,400s
sys     0m0,052s

And if I write out each search pattern by hand, it runs even faster:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});

$ time perl6 search.p6
real    0m0,587s
user    0m0,632s
sys     0m0,036s

1 Reply


The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to be optimized. Regexes are also one of the areas of Rakudo that haven't seen as much optimization work, partly because they are seen as difficult to work on.


For better performance, you could precompile the regex:

my $search = "/@search.join('|')/".EVAL;
#  $search =  /abcde|cdeff|fabcd/;
say ~@array.grep($search);

That change causes it to run in about half a second.

If there is any chance of malicious data in @search and you have to do this anyway, it may be safer to use:

"/@search».Str».perl.join('|')/".EVAL
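If the search terms are always plain literal strings (an assumption, though it holds for the question's data), the regex engine is not needed at all, and EVAL can be avoided entirely. A minimal sketch using a different technique from the one above: Raku's `.contains` with an `any` Junction. I have not timed it here, so treat any speed benefit as something to measure on your own Rakudo.

```raku
my @array  = "aaaaa" .. "fffff";    # same 7776 strings as in the question
my @search = <abcde cdeff fabcd>;   # assumed to be plain literal strings, not patterns

# .contains receives any(@search) and autothreads over it, returning a
# Junction of Bools; `so` collapses that Junction to a single Bool for grep.
my @found = @array.grep({ so .contains(any @search) });
say @found;    # [abcde cdeff fabcd]
```

Because nothing is compiled from strings, malicious content in @search cannot inject code; the trade-off is that regex features such as character classes are no longer available.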

The compiler can't quite generate that optimized code for /@search/, because @search could change after the regex is compiled. What could happen instead is that the first time the regex is used, it gets recompiled into the better form, and that form is then cached for as long as @search isn't modified.
(I think Perl 5 does something similar.)
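The recompile-and-cache idea described above can be approximated by hand in user code. A minimal sketch, assuming the search terms are literal strings: the hypothetical helper `alternation-regex` builds the quoted alternation once per distinct list of words and keeps it in a `state` hash, so modifying @search simply produces a new cache entry. It uses `.raku`, the newer name for `.perl`; `use MONKEY-SEE-NO-EVAL` is included in case your Rakudo requires it.

```raku
use MONKEY-SEE-NO-EVAL;   # harmless if your Rakudo does not require it for .EVAL

# Hypothetical helper: compiles an alternation of literal words once per
# distinct word list and returns the cached Regex on later calls.
sub alternation-regex(@words --> Regex) {
    state %cache;                          # persists across calls
    my $key = @words.raku;                 # unambiguous text form of the list
    # .raku quotes each word, so regex metacharacters are matched literally.
    %cache{$key} //= ('/ ' ~ @words».raku.join(' | ') ~ ' /').EVAL;
}

my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;

my @found = @array.grep(alternation-regex(@search));
say @found;                                # [abcde cdeff fabcd]

push @search, 'aaaaa';                     # contents changed, so a new regex is compiled
say @array.grep(alternation-regex(@search)).elems;   # 4
```

Keying the cache on the list's contents rather than on the array variable means a changed @search can never reuse a stale regex.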

One important fact to keep in mind is that a regex in Perl 6 is just a method written in a domain-specific sub-language.
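A small sketch illustrating that point: a regex literal evaluates to a `Regex` object, and `Regex` is a subclass of `Method`. Likewise, the named tokens of a grammar are methods, so one token can call another by name. The grammar `Words` below is a hypothetical example, not something from the question.

```raku
# A regex literal evaluates to a Regex object, and Regex is a subclass of Method.
my $rx = /abcde/;
say $rx.^name;            # Regex
say $rx.isa(Method);      # True

# Hypothetical grammar: its tokens are methods, so TOP can call <word>
# by name, and Grammar.parse can start from any token via :rule.
grammar Words {
    token TOP  { <word>+ % ',' }     # one or more words separated by commas
    token word { <[a..f]> ** 5 }     # exactly five letters from a to f
}

say so Words.parse('abcde,fabcd');          # True
say so Words.parse('abcde', :rule<word>);   # True
say so Words.parse('abcdx');                # False
```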



...