Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - Screen scraping JS page

I'm trying to scrape this page http://www.buddytv.com/trivia/game-of-thrones-trivia.aspx and it's not working.

I tried:

$html = new simple_html_dom();
$html->load_file($url);

But the element I'm looking for (.trivia-question) can't be found. Can anybody tell me what I'm doing wrong?

Thanks a lot!

I also tried:

<?php
$Page = file_get_contents('http://www.buddytv.com/trivia/game-of-thrones-trivia.aspx');
$dom_document = new DOMDocument();
// Suppress warnings, because the page's mismatched HTML tags make loadHTML() complain
@$dom_document->loadHTML($Page);
$dom_xpath = new DOMXPath($dom_document);
$elements = $dom_xpath->query('//*[@id="id60questionText"]');
var_dump($elements);

1 Reply


OK, here is a PhantomJS example.

Download PhantomJS from http://phantomjs.org/ and put it somewhere your script can easily reach.

Test the installation by running {installationdir}/bin/phantomjs --version (phantomjs.exe on Windows).

Then create a JS file somewhere in your project, e.g. browser.js:

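var page = require('webpage').create();

page.open('http://www.buddytv.com/trivia/game-of-thrones-trivia.aspx', function() {
    // Inject jQuery into the loaded page so the selector below can use $
    page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
        // evaluate() runs inside the page context and returns the question text
        var search = page.evaluate(function() {
            return $('#id60questionText').text();
        });

        // Print the result to stdout so the calling process can read it
        console.log(search);

        phantom.exit();
    });
});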
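
As a variation on the script above, the URL and selector can be passed on the command line, and the load status can be checked before reading the page. This is only a sketch under a few assumptions: the helper name parseArgs and the exit codes are made up for illustration, and document.querySelector is swapped in for the injected jQuery so no external script has to load. It does not wait for content that the page builds after loading, so it may still return nothing on some pages.

```javascript
// Hypothetical helper: pick the URL and CSS selector out of an argument list.
// Under PhantomJS, args[0] is the script name, so real arguments start at index 1.
function parseArgs(args) {
    if (!args || args.length < 3) {
        return null;
    }
    return { url: args[1], selector: args[2] };
}

// Only run the scraping part when executed by PhantomJS
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var opts = parseArgs(system.args);

    if (!opts) {
        console.log('Usage: phantomjs browser.js <url> <css-selector>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();

        page.open(opts.url, function (status) {
            // status is 'success' or 'fail'
            if (status !== 'success') {
                console.log('Failed to load ' + opts.url);
                phantom.exit(1);
                return;
            }

            // The selector is passed into the page context as an argument
            var text = page.evaluate(function (selector) {
                var el = document.querySelector(selector);
                return el ? el.textContent : null;
            }, opts.selector);

            console.log(text === null ? '' : text);
            // Exit code 2 (an arbitrary choice) signals "element not found"
            phantom.exit(text === null ? 2 : 0);
        });
    }
}
```

With this variant the command would take two extra arguments, for example: phantomjs browser.js http://www.buddytv.com/trivia/game-of-thrones-trivia.aspx "#id60questionText"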

Then read the output in your PHP script like this:

$pathToPhatomJs = '/home/aurimas/Downloads/phantomjs/phantomjs-1.9.1-linux-x86_64/bin/phantomjs';

$pathToJsScript = '/home/aurimas/Downloads/phantomjs/phantomjs-1.9.1-linux-x86_64/browser.js';

// exec() returns the last line of the command's output; every line is also appended to $out
$stdOut = exec(sprintf('%s %s', $pathToPhatomJs, $pathToJsScript), $out);

echo $stdOut;

Change $pathToPhatomJs and $pathToJsScript to match your configuration.

On Windows this may not work. In that case, change the PHP script to write the output to a file and read it back:

$pathToPhatomJs = '/home/aurimas/Downloads/phantomjs/phantomjs-1.9.1-linux-x86_64/bin/phantomjs';

$pathToJsScript = '/home/aurimas/Downloads/phantomjs/phantomjs-1.9.1-linux-x86_64/browser.js';

// Redirect PhantomJS's stdout to a file instead of capturing it through exec()
exec(sprintf('%s %s > phatom.txt', $pathToPhatomJs, $pathToJsScript), $out);

$fileContents = file_get_contents('phatom.txt');

echo $fileContents;
