Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


php - Match and replace emoticons in string - what is the most efficient way?

Wikipedia lists a lot of possible emoticons people can use. I want to match this list against words in a string. Here is what I have now:

<?php
$string = "Lorem ipsum :-) dolor :-| samet";
// Each emoticon is padded with spaces so only space-delimited ones match
$emoticons = array(
  '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
  '[SAD]'   => array(' :-( ', ' :( ', ' :-| ')
);
foreach ($emoticons as $emotion => $icons) {
  $string = str_replace($icons, " $emotion ", $string);
}
echo $string;

Output:

Lorem ipsum [HAPPY] dolor [SAD] samet

So in principle this works. However, I have three questions:

  1. As you can see, I'm putting spaces around each emoticon in the array, such as ' :-) ' instead of ':-)'. In my opinion this makes the array less readable. Is there a way to store the emoticons without the spaces, but still match them against $string with spaces around them (and as efficiently as the code does now)?

  2. Alternatively, is there a way to put each set of emoticons in a single string and explode it on spaces to check against $string? Something like:

    $emoticons = array(
      '[HAPPY]' => ">:] :-) :) :o) :] :3 :c) :> =] 8) =) :} :^)",
      '[SAD]'   => ":'-( :'( :'-) :')", //etc...
    );

  3. Is str_replace the most efficient way of doing this?
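A minimal sketch of questions 1 and 2 combined, assuming PHP's PCRE functions are acceptable in place of str_replace: each label maps to a space-separated string of unpadded emoticons, preg_quote() escapes each one, and the lookarounds (?<!\S) and (?!\S) require whitespace or a string boundary on both sides, so the padding spaces disappear from the data. The lists are shortened for illustration.

```php
<?php
// Emoticons stored without padding spaces, one space-separated string per
// label (the format from question 2; lists shortened for illustration).
$emoticons = array(
    '[HAPPY]' => ':-) :) :o)',
    '[SAD]'   => ':-( :( :-|',
);

// One regex per label. preg_quote() escapes regex metacharacters such as
// ")" and "|"; (?<!\S) and (?!\S) require whitespace or a string boundary
// on each side, so the spaces no longer need to be part of the data.
$patterns = array();
foreach ($emoticons as $label => $list) {
    $faces  = preg_split('/\s+/', $list, -1, PREG_SPLIT_NO_EMPTY);
    $quoted = array_map(function ($face) {
        return preg_quote($face, '/');
    }, $faces);
    $patterns[$label] = '/(?<!\S)(?:' . implode('|', $quoted) . ')(?!\S)/';
}

$string = "Lorem ipsum :-) dolor :-| samet";
foreach ($patterns as $label => $pattern) {
    $string = preg_replace($pattern, $label, $string);
}
echo $string, "\n"; // Lorem ipsum [HAPPY] dolor [SAD] samet
```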

I'm asking because I need to check millions of strings, so I'm looking for the most efficient way to save processing time :)
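For question 3, one option (a sketch, not benchmarked against str_replace) is to build a single combined pattern once and reuse it for every string, mapping each match back to its label with preg_replace_callback(). The helper tag_emoticons() and the $labelOf lookup table are hypothetical names introduced here.

```php
<?php
// Hypothetical single-pass variant: one combined pattern for all
// emoticons plus a lookup table from each emoticon to its label.
$emoticons = array(
    '[HAPPY]' => ':-) :) :o)',
    '[SAD]'   => ':-( :( :-|',
);

$labelOf = array();
foreach ($emoticons as $label => $list) {
    foreach (preg_split('/\s+/', $list, -1, PREG_SPLIT_NO_EMPTY) as $face) {
        $labelOf[$face] = $label;
    }
}

$quoted = array_map(function ($face) {
    return preg_quote((string) $face, '/');
}, array_keys($labelOf));
// Build the pattern once; reuse it for every string you process.
$pattern = '/(?<!\S)(?:' . implode('|', $quoted) . ')(?!\S)/';

function tag_emoticons(string $s, string $pattern, array $labelOf): string
{
    // $m[0] is the emoticon itself: the lookarounds match no characters.
    return preg_replace_callback($pattern, function ($m) use ($labelOf) {
        return $labelOf[$m[0]];
    }, $s);
}

echo tag_emoticons("Lorem ipsum :-) dolor :-| samet", $pattern, $labelOf), "\n"; // Lorem ipsum [HAPPY] dolor [SAD] samet
echo tag_emoticons("a :-) :-) b", $pattern, $labelOf), "\n";                     // a [HAPPY] [HAPPY] b
```

Because the lookarounds do not consume the surrounding spaces, two adjacent emoticons such as ":-) :-)" are both tagged, whereas the padded str_replace version misses the second one: the first match uses up the space between them.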


1 Reply


Here’s one way to do it: use the third-party Perl module Regexp::Assemble from CPAN to build a single optimized pattern for each emotion. For example, given this program:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

use Regexp::Assemble;

my %faces = (
    HAPPY => [qw? :-) :) :o) :-} ;-} :-> ;-} ?],
    SAD   => [qw? :-( :( :-| ;-) ;-( ;-< |-{ ?],
);

for my $name (sort keys %faces) {
    my $ra = Regexp::Assemble->new();
    for my $face (@{ $faces{$name} }) {
        $ra->add(quotemeta($face));
    }
    printf "%-12s => %s\n", "[$name]", $ra->re;
}

It will output this:

[HAPPY]      => (?-xism:(?::(?:-(?:[)>]|\})|o?\))|;-\}))
[SAD]        => (?-xism:(?::(?:-(?:\||\()|\()|;-[()<]|\|-\{))

There’s a bit of extra wrapping there that you probably don’t need, so those reduce to just:

[HAPPY]      => :(?:-(?:[)>]|\})|o?\))|;-\}
[SAD]        => :(?:-(?:\||\()|\()|;-[()<]|\|-\{

or thereabouts. You could have the Perl program trim those extra bits itself, and then paste the right-hand sides straight into your preg_replace call.
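In PHP that last step might look like the sketch below. The two patterns are written out by hand from the HAPPY and SAD lists in the Perl program, with the escapes PCRE needs, and each is wrapped in (?:...) so that the whitespace guards (?<!\S) and (?!\S), an assumption added here and not part of the assembled output, apply to every alternative.

```php
<?php
// Assembled alternations, escaped for PCRE and written out by hand from
// the HAPPY and SAD lists in the Perl program.
$faces = array(
    '[HAPPY]' => ':(?:-(?:[)>]|\})|o?\))|;-\}',
    '[SAD]'   => ':(?:-(?:\||\()|\()|;-[()<]|\|-\{',
);

$string = "Lorem ipsum :-) dolor :-| samet ;-}";
foreach ($faces as $label => $re) {
    // Wrap the alternation in (?:...) so the whitespace guards apply to
    // every branch, not just the first and last.
    $string = preg_replace('/(?<!\S)(?:' . $re . ')(?!\S)/', $label, $string);
}
echo $string, "\n"; // Lorem ipsum [HAPPY] dolor [SAD] samet [HAPPY]
```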

I picked ? as my qw// delimiter because it doesn’t appear in any of the emoticons, so I didn’t have to mess with escaping things inside there. (The use utf8 pragma only matters if you choose a non-ASCII delimiter instead.)

You wouldn’t need to do this if the whole program were in Perl, because modern versions of Perl (5.10 and later) already compile an alternation of literal strings into a trie automatically. But it’s still useful to know how to use the module, so that you can generate patterns for use in other languages.
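If you would rather not depend on Perl, the prefix-sharing part of this idea (store the words in a character trie so shared prefixes are written once, then print the trie as a regex) can be approximated in PHP. This is a much simplified sketch, not Regexp::Assemble's algorithm: build_trie() and trie_to_regex() are hypothetical helpers, they work byte by byte (fine for ASCII emoticons), and they only share common prefixes, with no further optimizations.

```php
<?php
// Hypothetical helpers, not part of PHP or Regexp::Assemble: put the words
// into a character trie so shared prefixes are stored once, then print
// the trie as a regex. Byte-wise, which is fine for ASCII emoticons.
function build_trie(array $words): array
{
    $trie = array();
    foreach ($words as $word) {
        $node = &$trie;
        foreach (str_split($word) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = array();
            }
            $node = &$node[$ch];   // descend, keeping a reference
        }
        $node[''] = true;          // '' marks the end of a complete word
        unset($node);              // drop the reference before the next word
    }
    return $trie;
}

function trie_to_regex(array $node): string
{
    $branches = array();
    $canEnd = false;
    foreach ($node as $ch => $child) {
        if ($ch === '') {
            $canEnd = true;        // a word may stop at this node
        } else {
            // (string): PHP stores keys such as '3' as integers
            $branches[] = preg_quote((string) $ch, '/') . trie_to_regex($child);
        }
    }
    if (count($branches) === 0) {
        return '';
    }
    if (count($branches) === 1 && !$canEnd) {
        return $branches[0];
    }
    return '(?:' . implode('|', $branches) . ')' . ($canEnd ? '?' : '');
}

$happy = array(':-)', ':)', ':o)', ':-}', ';-}', ':->');
$pattern = trie_to_regex(build_trie($happy));
echo $pattern, "\n";
```

The printed pattern can then go into preg_replace just like the Perl-generated one.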

