Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Regex that matches a '>' character not contained in any tag

I want a regex that matches a '>' character in the text but does not match the '>' that closes a tag.

For example -

"<span>some >text< again some<some tag></some tag>vfs>vf</span>"

Should match -

<span>some >text< again some<some tag></some tag>vfs>vf</span>
           |                                        |

Where the | indicates the > to be matched.

For reference, I have already prepared a regex that does the same thing for '<'.

Here is my regex - "/(?!<[^<>]*>)</"

Thanks in advance!

1 Reply

If your requirements are simple (no quoted or escaped angle brackets, and no nested angle bracket pairs), then finding an unmatched opening bracket reduces to finding the first position of a substring that starts with an open bracket, contains no other brackets, and ends with either another open bracket or the end of the string.

In regex speak, that would be:

/(<)[^<>]*(?:$|<)/
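As a quick check of that pattern, here is a minimal sketch using preg_match with PREG_OFFSET_CAPTURE. The two sample strings and the variable names are made up for illustration; they are not from the question.

```php
<?php
// Sketch only: the sample strings below are hypothetical test inputs.
$pattern = '/(<)[^<>]*(?:$|<)/';

// The "<" at offset 2 is followed by " b " and then another "<",
// so it is never closed by a ">" and counts as unmatched.
$found = preg_match($pattern, 'a < b <i>c</i>', $m, PREG_OFFSET_CAPTURE);
$firstOffset = $found ? $m[1][1] : null;

// Every "<" here reaches a ">" before the next bracket, so nothing matches.
$clean = preg_match($pattern, '<i>c</i>', $m2);

echo "first unmatched '<' at offset: ", var_export($firstOffset, true), "\n";
echo "matches in well-formed input: {$clean}\n";
```

Capture group 1 holds the bracket itself, so its offset is the position you would report or replace.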

Because you want to capture all of them and will be using preg_match_all, you need to wrap the pattern in a lookahead to catch overlapping matches (the bracket that terminates one match may itself be the start of the next):

/(?=(<)[^<>]*(?:$|<))/
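To illustrate why the lookahead matters, the sketch below runs both versions of the pattern on a made-up string with two adjacent stray brackets. The string and variable names are assumptions chosen for the demonstration.

```php
<?php
$text = 'a << b';  // hypothetical sample: two adjacent unmatched "<"

// Consuming version: the match starting at offset 2 swallows the "<" at
// offset 3 as its terminator, so that second bracket is never reported.
$consuming = preg_match_all('/(<)[^<>]*(?:$|<)/', $text, $m1, PREG_OFFSET_CAPTURE);

// Lookahead version: the match is zero-width, so scanning resumes at
// offset 3 and the second bracket is found as well.
$lookahead = preg_match_all('/(?=(<)[^<>]*(?:$|<))/', $text, $m2, PREG_OFFSET_CAPTURE);

$consumingOffsets = array_column($m1[1], 1);
$lookaheadOffsets = array_column($m2[1], 1);

echo "without lookahead: {$consuming} match(es) at ", implode(', ', $consumingOffsets), "\n";
echo "with lookahead:    {$lookahead} match(es) at ", implode(', ', $lookaheadOffsets), "\n";
```

The consuming pattern reports only offset 2, while the lookahead pattern reports offsets 2 and 3.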

Similarly, finding an unmatched right bracket reduces to finding the last character of a substring that starts at either the beginning of the string or a close bracket, ends with a close bracket, and has no brackets in between. Adding the lookahead, you get:

/(?=(?:^|>)[^<>]*(>))/
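Applied to the example string from the question, this pattern should pick out exactly the two stray '>' characters. A small sketch, with illustrative variable names:

```php
<?php
// The example string from the question.
$text = '<span>some >text< again some<some tag></some tag>vfs>vf</span>';

// Group 1 captures each ">" that follows another ">" (or the start of the
// string) with no bracket in between, i.e. a ">" that closes no tag.
preg_match_all('/(?=(?:^|>)[^<>]*(>))/', $text, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[1], 1);

echo implode(', ', $offsets), "\n";
```

This prints 11, 52: the '>' before "text" and the '>' inside "vfs>vf". The '>' characters that close `<span>`, `<some tag>`, `</some tag>` and `</span>` are skipped.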

I added a couple of extra brackets to your test strings to make sure we catch the end and overlapping cases, and a replacement example:

<?php
// Left angle brackets
$x = "<span>some >text< again<< some<some tag><</some tag>vfs>vf</span><<";
$y = preg_match_all('/(?=(<)[^<>]*(?:$|<))/', $x, $match, PREG_OFFSET_CAPTURE);
echo "Test: '{$x}'\n";
echo "Repl: '" . locate_replace($x, $match[1], '&lt;') . "'\n";
echo "There are {$y} extra left angle brackets at character positions:\n";
echo "  " . implode(", ", array_column($match[1], 1)) . "\n\n";

// Right angle brackets
$x = "abc><span>some >text< again some<some tag></some tag>vfs>>vf</span>";
$y = preg_match_all('/(?=(?:^|>)[^<>]*(>))/', $x, $match, PREG_OFFSET_CAPTURE);
echo "Test: '{$x}'\n";
echo "Repl: '" . locate_replace($x, $match[1], '&gt;') . "'\n";
echo "There are {$y} extra right angle brackets at character positions:\n";
echo "  " . implode(", ", array_column($match[1], 1)) . "\n";

// Replace each located bracket with $repl, working from the last match
// backwards so that earlier offsets stay valid as the string grows.
function locate_replace($x, $match_oc, $repl) {
    while ($mt = array_pop($match_oc)) {
        $sloc = $mt[1];                  // offset of the captured bracket
        $eloc = $sloc + strlen($mt[0]);  // offset just past it
        $x = substr($x, 0, $sloc) . $repl . substr($x, $eloc);
    }
    return $x;
}
?>

And this produces:

Test: '<span>some >text< again<< some<some tag><</some tag>vfs>vf</span><<'
Repl: '<span>some >text&lt; again&lt;&lt; some<some tag>&lt;</some tag>vfs>vf</span>&lt;&lt;'
There are 6 extra left angle brackets at character positions:
  16, 23, 24, 40, 65, 66

Test: 'abc><span>some >text< again some<some tag></some tag>vfs>>vf</span>'
Repl: 'abc&gt;<span>some &gt;text< again some<some tag></some tag>vfs&gt;&gt;vf</span>'
There are 4 extra right angle brackets at character positions:
  3, 15, 56, 57
