I would like to scrape the content of this Google search result page using curl.
I've been trying setting different user agents, and setting other options but I just can't seem to get the content of that page, as I often get redirected or I get a "page moved" error.
I believe it has something to do with the fact that the query string gets encoded somewhere but I'm really not sure how to get around that.
//$url is the same as the link above
$ch = curl_init();
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0'
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
curl_setopt ($ch,CURLOPT_TIMEOUT,120);
curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
echo curl_exec ($ch);
What do I need to do to get my php code to show the exact content of the page as I would see it on my browser? What am I missing? Can anyone point me to the right direction?
I've seen similar questions on SO, but none with an answer that could help me.
EDIT:
I tried to just open the link using the Selenium WebDriver, that gives the same results as cURL. I am still thinking that this has to do with the fact that there are special characters in the query string which are getting messed up somewhere in the process.
question from:
https://stackoverflow.com/questions/14953867/how-to-get-page-content-using-curl 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…