Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
281 views
in Technique[技术] by (71.8m points)

php - Make a JavaScript-aware Crawler

I want to make a script that's crawling a website and it should return the locations of all the banners showed on that page.

The locations of banners are most of the time from known domains. But banners are not in the HTML as an easy image or swf-file. Most of the times a Javascript is used to show the banner.

So if a .swf-file or image-file is loaded from a banner-domain, it should return that url.

Is that possible to do? And how could I do that roughly?

Best would be if it can also returns the landing page of that ad. How to solve that?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could use selenium to open the pages in a real browser and then access the DOM.

PhantomJS might also be worth a look - it's a headless version of WebKit (the engine behind Chrome, Safari, etc.).

However, none of those solutions are pure php - if that's a requirement, you'll probably have to write your own JavaScript engine in PHP (which is nothing I'd ask my worst enemy to do ;))


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...