Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - Remove specific tags from html while avoiding iframes

I need to remove some specific tags from an HTML document. To avoid using an iframe, I fetch an HTML page in my PHP file using cURL and request it from the client with getJSON. I receive the result in my .js file, but I cannot take the whole HTML and paste it into my own div. I guess the reason is that I cannot have more than one HTML, HEAD, and BODY tag in one HTML structure.

<!DOCTYPE html>
<html>
    <head>
        <style>some style</style>
        <title>Title of the document</title>
    </head>    
    <body>
        The content of the document......
    </body>
</html>

Now in the above structure I do not need the HTML, BODY, and HEAD tags, but I do need the STYLE tag for CSS, so I just want to remove the HTML, BODY, and HEAD tags. After removing them I need to append the rest to my div (all this trouble is because I do not want to use iframes). How do I remove them? I thought of strip_tags(), preg_replace, or some other regex function, but couldn't work out the best way to do it. Please help me find the best way. It could be in PHP, JavaScript, or jQuery, but I would appreciate answers in JavaScript and jQuery since I would like to do this manipulation in my JS; if needed, PHP will also work.


1 Reply


Use a DOM parser; regex should not be used for parsing HTML.
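
Since the question also asks for a JavaScript option, here is a minimal sketch of the same DOM-parser idea on the client, for the case where the raw HTML string reaches the browser. It relies on the browser's built-in DOMParser; the helper name extractStyleAndBody is hypothetical and not part of the original answer.

```javascript
// Assumption: this runs in a browser, where DOMParser is a built-in.
// extractStyleAndBody is a hypothetical helper name used for illustration.
function extractStyleAndBody(html) {
  if (typeof DOMParser === "undefined") {
    throw new Error("DOMParser is only available in browsers");
  }
  // Parse the string into a separate, inert document (scripts are not run).
  const doc = new DOMParser().parseFromString(html, "text/html");

  // Serialize the <style> elements found in <head>, tags included.
  // Styles placed inside <body> are already part of body.innerHTML.
  const style = Array.from(doc.head.querySelectorAll("style"))
    .map(function (node) { return node.outerHTML; })
    .join("");

  // innerHTML of <body> is its content without the <body> wrapper.
  return { style: style, body: doc.body.innerHTML };
}
```

The returned style and body strings can then be concatenated and placed into the target div, which avoids nesting a second HTML, HEAD, or BODY tag in the page.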

The following example uses the DOMDocument parser to extract the elements you want. $html is the HTML document retrieved with cURL.

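// Collect libxml parse warnings internally instead of emitting PHP warnings;
// remove this line if you want to see them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

// First <style> element, serialized with its tags. Fall back to an empty
// string, because saveHTML(null) would dump the whole document.
$styleNode = $dom->getElementsByTagName("style")->item(0);
$style = $styleNode ? $dom->saveHTML($styleNode) : "";

// Concatenate the children of <body>, dropping the <body> wrapper itself.
$body = "";
$bodyNode = $dom->getElementsByTagName("body")->item(0);
if ($bodyNode) {
    foreach ($bodyNode->childNodes as $child) {
        $body .= $dom->saveHTML($child);
    }
}

echo $style;
echo $body;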

Assuming this script is called via getJSON, build a JSON object containing $style and $body (for example with json_encode) and pass it back to the JavaScript so it can be inserted into the page.

As I understand your question, this should be your application flow:

Client loads page -> $.getJSON invokes a PHP script -> that PHP script loads content from somewhere else with cURL -> this code runs -> a JSON object is passed back to $.getJSON -> the success callback of $.getJSON adds the new HTML to the page
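
The last two steps of that flow could look roughly like the sketch below. The endpoint name fetch-page.php, the #target selector, the helper name buildFragment, and the { style, body } payload shape are all assumptions made for illustration; none of them come from the original answer.

```javascript
// Hypothetical names: "fetch-page.php", "#target", buildFragment, and the
// { style, body } payload shape are assumptions for this sketch.

// Combine the two fragments returned by the PHP script into one HTML string.
// Missing or non-string fields fall back to empty strings.
function buildFragment(payload) {
  const style = payload && typeof payload.style === "string" ? payload.style : "";
  const body = payload && typeof payload.body === "string" ? payload.body : "";
  return style + body;
}

// Browser-only wiring; skipped when jQuery is not loaded (e.g. under Node).
if (typeof jQuery !== "undefined") {
  jQuery.getJSON("fetch-page.php", function (payload) {
    // Success callback: put the extracted markup into the target div.
    jQuery("#target").html(buildFragment(payload));
  });
}
```

Note that a style block inserted this way applies to the whole page, not only to the target div, so selectors from the fetched document may affect your own markup.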

