Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Problem adding the root path to links using PHP DOMDocument

I would like to add the site's root path to every anchor tag that does not already have it, using PHP's DOMDocument. So far I have written a function that does this with str_replace, but for some links it adds the root path three or four times. What should I change in this function?

Problem: the root path is added three or four times to most anchor tags, though not to all of them. The $HTML variable contains many anchor tags, more than 200 links. The same happens for images.

I know this is a messy question, but I can't see what I have missed.

function addRootPathToAnchor($HTML)
{
    $tmpHtml = '';
    $xml = new DOMDocument();
    $xml->validateOnParse = true;
    $xml->loadHTML($HTML);

   foreach ($xml->getElementsByTagName('a') as $a )
   {
      $href = $a->getAttribute('href');
      if(strpos($href,'www' > 0))
        continue;
      else
        $HTML = str_replace($href,"http://www.mysite.com/".$href,$HTML);  

   }

   return $HTML;
}

1 Reply


I see some problems in your code:

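  1. The decision whether a URI has a full root path (is a fully qualified URI) is wrong. In strpos($href,'www' > 0) the comparison sits inside the call, so strpos() never actually searches for 'www'; and not every absolute URL contains 'www' anyway.
  2. You're not resolving relative URLs against the base URL. Just prepending the root path does not do the job.
  3. The function returns a DOMDocument object and not a string. I assume you don't want that, but I can't tell because you haven't said so in your question.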

How to detect if a URL is a relative one.

Relative URLs don't specify a protocol (scheme). So I would check for that to determine whether a href attribute is a fully qualified (absolute) URI:

$isRelative = (bool) !parse_url($url, PHP_URL_SCHEME);
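
To make the scheme check easier to try out, here is a minimal sketch that wraps it in a function and runs it on a few sample values. The function name isRelativeUrl and the sample URLs are assumptions made for illustration; they are not part of the question. Note that a value such as www.mysite.com/about.html has no scheme, so it counts as relative, which is one reason a test for 'www' is not a reliable indicator.

```php
<?php
// Hypothetical helper name; it only wraps the one-line scheme check.
function isRelativeUrl(string $url): bool
{
    // parse_url() yields null when the scheme is missing and false for a
    // seriously malformed URL; both are treated as "not absolute" here.
    return !parse_url($url, PHP_URL_SCHEME);
}

var_dump(isRelativeUrl('http://www.mysite.com/about.html')); // has a scheme
var_dump(isRelativeUrl('images/logo.png'));                  // no scheme
var_dump(isRelativeUrl('/contact.html'));                    // no scheme
var_dump(isRelativeUrl('www.mysite.com/about.html'));        // no scheme either
```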

Resolving a relative URL to a base URL

However, this check alone won't help you properly resolve a relative URL against the base URL. What you do is conceptually broken: how to resolve a relative URI against a base URL is specified in an RFC (RFC 1808 and its successor, RFC 3986). You can let an existing library do the work for you; one that works is Net_URL2:

require_once('Net/URL2.php'); # or configure your autoloader

$baseUrl = 'http://www.example.com/test/images.html';

$hrefRelativeOrAbsolute = '...';

$baseUrl = new Net_URL2($baseUrl);

$urlAbsolute = (string) $baseUrl->resolve($hrefRelativeOrAbsolute);
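
One way to tie this back to the original function is to change the attributes on the DOM nodes themselves and serialize the document afterwards, instead of running str_replace over the whole HTML string. str_replace replaces every occurrence of the href text, including occurrences that were already prefixed on an earlier loop iteration, which would explain the repeated root path. The following is only a sketch under assumptions: the function name absolutizeLinks and the $resolve callback are hypothetical, the dom extension is assumed to be available, and saveHTML() returns a full document, so a fragment comes back wrapped in html and body tags.

```php
<?php
// Hypothetical helper: rewrites href/src on the nodes, one attribute at a
// time, so no link can be prefixed more than once.
function absolutizeLinks(string $html, callable $resolve): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Anchors and images, as mentioned in the question.
    $targets = ['a' => 'href', 'img' => 'src'];
    foreach ($targets as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            if (!$node->hasAttribute($attribute)) {
                continue;
            }
            $url = $node->getAttribute($attribute);
            if (parse_url($url, PHP_URL_SCHEME)) {
                continue; // already absolute, leave it untouched
            }
            $node->setAttribute($attribute, $resolve($url));
        }
    }

    return $doc->saveHTML();
}

// Possible wiring with Net_URL2 (left as a comment because it needs the library):
// $base = new Net_URL2('http://www.example.com/test/images.html');
// echo absolutizeLinks($HTML, function ($url) use ($base) {
//     return (string) $base->resolve($url);
// });
```

Passing the resolver in as a callback keeps the DOM handling separate from the URL resolution, so the naive prefixing can be swapped for a proper RFC 3986 resolver without touching the loop.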
