Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - Regex to match words or phrases in a string, but NOT when they are part of a URL or inside <a> </a> tags (PHP)

I am aware that regex is not ideal for use with HTML strings, and I have looked at the PHP Simple HTML DOM Parser, but I still believe regex is the way to go here. All the HTML tags will be generated by my forum software, so they will be consistent and valid HTML.

What I am trying to do is make a plugin that will find a list of keywords (or phrases) in a string of HTML and replace them with a link I specify. For example if someone types:

I use Amazon for that.

it would replace it with:

I use <a href="http://www.amazon.com">Amazon</a> for that.

The problem, of course, is that if "amazon" appears in the URL, it also gets replaced. I solved that issue with a callback function found on this site, slightly modified.

But I still have one issue: it replaces words between the opening and closing tags of an existing link.

<a href="http://www.amazon.com">My Amazon Link</a>

Here it will match the "Amazon" in "My Amazon Link".

What I really need is a regex that matches, say, "amazon" anywhere except between <a href and </a>.

Any ideas?

1 Reply

Using the DOM would certainly be preferable.
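For comparison, here is a rough sketch of what a DOM-based version might look like. It uses PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM Parser mentioned in the question; the helper name linkKeywordDom() and its parameters are made up for illustration, and it handles a single keyword only.

```php
<?php
// Sketch only: linkKeywordDom() is a hypothetical helper, not part of any library.
function linkKeywordDom(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    // Wrap the fragment in a known root; the XML prolog is a common trick
    // to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="root"]')->item(0);

    // Only text nodes that are not inside an <a> element, at any depth.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    $pattern   = '/(\b' . preg_quote($keyword, '/') . '\b)/i';

    foreach ($textNodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) { // odd indexes hold the captured keyword
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkKeywordDom('I use Amazon for that.', 'Amazon', 'http://www.amazon.com'), PHP_EOL;
```

Because the XPath filter checks for an <a> ancestor at any depth, text inside tags nested within a link is skipped as well, and URLs in attributes are never touched since they are not text nodes.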

However, you might get away with this:

$result = preg_replace('%Amazon(?![^<]*</a>)%i', '<a href="http://www.amazon.com">Amazon</a>', $subject);

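It matches Amazon only if

  1. it's not followed by a closing </a> tag (with no other tag in between),
  2. it's not itself part of an <a> tag, for example the URL in its href attribute.

This only works as long as there are no intervening tags, i.e. it will be thrown off if tags can be nested inside <a> tags.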
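To see those conditions in action, here is a small sketch that runs the same pattern over three inputs. The wrapper function linkAmazon() and the third sample (a <b> nested inside the link) are my own additions for illustration.

```php
<?php
// Hypothetical wrapper around the preg_replace() call from the answer.
function linkAmazon(string $subject): string
{
    // "Amazon" is matched unless the text up to the next "<" ends in "</a>".
    return preg_replace(
        '%Amazon(?![^<]*</a>)%i',
        '<a href="http://www.amazon.com">Amazon</a>',
        $subject
    );
}

$samples = [
    'plain text'    => 'I use Amazon for that.',
    'inside a link' => '<a href="http://www.amazon.com">My Amazon Link</a>',
    'nested tag'    => '<a href="http://www.amazon.com">My Amazon <b>Link</b></a>',
];

foreach ($samples as $label => $subject) {
    echo $label, ': ', linkAmazon($subject), PHP_EOL;
}
```

The first sample gets linked and the second is left alone. In the third, the next "<" after each "Amazon" belongs to <b> rather than </a>, so the lookahead no longer protects it and both the href URL and the link text get rewritten. That is the nested-tag caveat.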

It will therefore change this:

I use Amazon for that.
I use <a href="http://www.amazon.com">Amazon</a> for that.
<a href="http://www.amazon.com">My Amazon Link</a>
It will match the "Amazon" in "My Amazon Link"

into this:

I use <a href="http://www.amazon.com">Amazon</a> for that.
I use <a href="http://www.amazon.com">Amazon</a> for that.
<a href="http://www.amazon.com">My Amazon Link</a>
It will match the "<a href="http://www.amazon.com">Amazon</a>" in "My <a href="http://www.amazon.com">Amazon</a> Link"
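
Since the question asks about a whole list of keywords or phrases, the same lookahead could be combined with preg_replace_callback() so that everything is handled in one pass. This is only a sketch: linkKeywords() and the keyword => URL map are hypothetical, the \b word boundaries are an addition that the one-liner above does not have, and the nested-tag limitation still applies.

```php
<?php
// Sketch only: linkKeywords() and the $links map are illustrative assumptions.
function linkKeywords(string $html, array $links): string
{
    // Lowercased lookup table so a case-insensitive match maps back to its URL.
    $lookup = array_change_key_case($links, CASE_LOWER);

    // Escape each keyword for use in the pattern; '%' is the delimiter.
    $alternatives = array_map(
        function ($keyword) {
            return preg_quote($keyword, '%');
        },
        array_keys($links)
    );

    // \b keeps "Amazon" from matching inside "Amazonian";
    // the negative lookahead is the one from the answer.
    $pattern = '%\b(?:' . implode('|', $alternatives) . ')\b(?![^<]*</a>)%i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($lookup) {
            $url = $lookup[strtolower($m[0])];
            return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
        },
        $html
    );
}

// List longer phrases first so they win over shorter keywords they contain.
$links = [
    'Amazon Prime' => 'http://www.amazon.com/prime',
    'Amazon'       => 'http://www.amazon.com',
];
echo linkKeywords('I use Amazon and amazon prime.', $links), PHP_EOL;
```

Doing all keywords in a single pass means the pattern is only ever applied to the original text, so links inserted for one keyword are not scanned again for another. The link text keeps whatever capitalization the user typed.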
