Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Regex replace text outside script tag

I have this HTML:

"This is simple html text <script language="javascript">simple simple text text</script> text"

I need to match only words that are outside the script tag. For example, if I want to match “simple” and “text”, I should get results only from “This is simple html text” and the final “text”, so the result would be: “simple” 1 match, “text” 2 matches. Could anyone help me with this? I’m using PHP.

I found a similar answer for matching text outside any tag:

(text|simple)(?![^<]*>|[^<>]*</)

Regex replace text outside html tags

But I couldn't get it to work for a specific tag (script):

(text|simple)(?!(^<script*>)|[^<>]*</)

P.S. This question is not a duplicate of the strip_tags / remove-JavaScript questions, because I'm not trying to strip tags or select the content inside the script tag. I'm trying to replace content outside the script tag.


1 Reply

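My pattern uses (*SKIP)(*FAIL) to disqualify matched script tags and their contents. The first alternative consumes a whole script element; (*SKIP)(*FAIL) then discards that match and stops the regex engine from retrying at any position inside it.

text and simple will be matched at every qualifying occurrence.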

Regex Pattern: ~<script.*?/script>(*SKIP)(*FAIL)|text|simple~
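
To connect the pattern to the counts asked for in the question, here is a minimal sketch that uses the same pattern with preg_match_all and tallies the hits. The variable names ($html, $pattern, $counts) are illustrative and not part of the original answer.

```php
<?php
// Sample string taken from the question.
$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';

// Same pattern as above: script elements are consumed and then rejected,
// so only occurrences outside them are reported as matches.
$pattern = '~<script.*?/script>(*SKIP)(*FAIL)|text|simple~';

// $matches[0] holds every full match, in order of appearance.
preg_match_all($pattern, $html, $matches);

// Tally how often each word was matched.
$counts = array_count_values($matches[0]);

print_r($counts); // simple => 1, text => 2
```

This gives the tally the question expects: “simple” once and “text” twice, with nothing counted from inside the script element.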


Code:

$strings=['This has no replacements',
    'This simple text has no script tag',
    'This simple text ends with a script tag <script language="javascript">simple simple text text</script>',
    'This is simple html text is split by a script tag <script language="javascript">simple simple text text</script> text',
    '<script language="javascript">simple simple text text</script> this text starts with a script tag'
];

// preg_replace accepts an array subject and returns an array with each element processed.
$strings = preg_replace('~<script.*?/script>(*SKIP)(*FAIL)|text|simple~', '***replaced***', $strings);

var_export($strings);

Output:

array (
  0 => 'This has no replacements',
  1 => 'This ***replaced*** ***replaced*** has no script tag',
  2 => 'This ***replaced*** ***replaced*** ends with a script tag <script language="javascript">simple simple text text</script>',
  3 => 'This is ***replaced*** html ***replaced*** is split by a script tag <script language="javascript">simple simple text text</script> ***replaced***',
  4 => '<script language="javascript">simple simple text text</script> this ***replaced*** starts with a script tag',
)
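
The pattern above works for the sample strings, but it may be worth hardening for real HTML. The following is only a sketch under some assumptions: script blocks may span several lines or be written in uppercase, only whole words should be replaced, and PHP 7.4+ is available for the arrow function. The helper name replaceOutsideScript is hypothetical and not from the original answer.

```php
<?php
// Hypothetical helper: replace whole words that occur outside <script> elements.
function replaceOutsideScript(string $html, array $words, string $replacement): string
{
    // Escape each word so it is treated literally inside the pattern.
    $alternation = implode('|', array_map(fn($w) => preg_quote($w, '~'), $words));

    // (?i:...) makes only the script part case-insensitive (<SCRIPT> is skipped too).
    // The s flag lets . cross newlines, so multi-line script bodies are skipped.
    // \b...\b restricts the words to whole-word matches ("context" is left alone).
    $pattern = '~(?i:<script\b.*?</script>)(*SKIP)(*FAIL)|\b(?:' . $alternation . ')\b~s';

    // preg_replace returns null on error; fall back to the unchanged input.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

$html = "simple context <SCRIPT>\nvar text = 'simple';\n</SCRIPT> text";
echo replaceOutsideScript($html, ['text', 'simple'], '***');
```

With this input, the leading “simple” and the trailing “text” are replaced, while “context” and everything inside the multi-line script element stay untouched. For arbitrary or malformed HTML, a DOM parser is generally more reliable than a regex.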
