Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - RCurl Cookies on Debian

I am downloading large batches of PDFs from parliament websites. I have scraped the PDF addresses and am now trying to download them.

To do this, I set up a Debian instance on a university cloud.

This worked fine for most of them, but for 4 parliaments I got an error page asking me to accept cookies instead. The result is an HTML page with a .pdf file extension whose content is mainly the prompt asking whether I accept cookies.
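Because the failure mode is an HTML consent page saved under a `.pdf` name, it can help to check each download before writing it to disk. A minimal sketch, assuming the data is held as a raw vector (which is what `getBinaryURL()` returns): genuine PDF files start with the bytes `%PDF`, so anything else can be flagged. `looks_like_pdf()` is a hypothetical helper, not part of RCurl.

```r
# Hypothetical helper: TRUE if a raw vector starts with the PDF signature "%PDF".
# An HTML cookie-consent page saved with a .pdf extension fails this check.
looks_like_pdf <- function(bytes) {
  magic <- charToRaw("%PDF")
  length(bytes) >= length(magic) &&
    identical(bytes[seq_along(magic)], magic)
}

# Example with in-memory data instead of a real download
looks_like_pdf(charToRaw("%PDF-1.4\n..."))    # TRUE
looks_like_pdf(charToRaw("<!DOCTYPE html>"))  # FALSE
```

For a file already on disk, `looks_like_pdf(readBin("test2.pdf", "raw", n = 4))` reads just the first four bytes.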

This error does not occur on either Ubuntu or Windows 10. I figured it works there because I had accepted the cookies in the browser on those machines. So I changed my code to use RCurl and exported the cookies as text files, based on two answers I found on Stack Overflow.

I used the following example. As I mentioned, it works on Windows and Ubuntu, but there it also works without the cookie file.

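library(RCurl)

# the PDF to download
appURL <- "http://www.dokumentation.landtag-mv.de/parldok/dokument/44970/eu_ratspraesidentschaft.pdf"

# one handle, so the cookie settings apply to every request made with it
curl <- getCurlHandle()
curlSetOpt(cookiefile = "cookiesmv.txt",  # cookie file, resolved relative to getwd()
           curl = curl, followLocation = TRUE)  # follow HTTP redirects
pdfData <- getBinaryURL(appURL, curl = curl)  # response body as a raw vector
writeBin(pdfData, "test2.pdf")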

To reproduce, here is the cookie file:

www.landtag-mv.de FALSE / FALSE 1641900313 cookieconsent_status dismiss
www.landtag-mv.de FALSE / FALSE 1641900313 dp_cookieconsent_status {"dp--cookie-statistics":true,"dp--cookie-marketing":true}
www.dokumentation.landtag-mv.de FALSE / FALSE 1641907216 cookieconsent_dismissed yes
www.dokumentation.landtag-mv.de FALSE / FALSE 0 ASP.NET_SessionId ejtlcpjr0saw40ahceu4akb1
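libcurl, which RCurl wraps, reads `cookiefile` in the Netscape cookie-file format: one cookie per line, with seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value). The copy of the file above shows spaces between the fields; if the real file also uses spaces, libcurl will most likely not recognise the entries. Note also that the two `www.landtag-mv.de` entries will not be sent to the PDF host `www.dokumentation.landtag-mv.de` at all, since that host is not within the `www.landtag-mv.de` domain. A minimal sketch that writes the same four cookies from R with explicit tab separators, reusing the file name `cookiesmv.txt` from the question:

```r
# Write a Netscape-format cookie file with TAB-separated fields,
# the layout libcurl expects for the cookiefile option.
cookies <- data.frame(
  domain     = c("www.landtag-mv.de", "www.landtag-mv.de",
                 "www.dokumentation.landtag-mv.de", "www.dokumentation.landtag-mv.de"),
  subdomains = "FALSE",
  path       = "/",
  secure     = "FALSE",
  expiry     = c("1641900313", "1641900313", "1641907216", "0"),  # 0 = session cookie
  name       = c("cookieconsent_status", "dp_cookieconsent_status",
                 "cookieconsent_dismissed", "ASP.NET_SessionId"),
  value      = c("dismiss",
                 '{"dp--cookie-statistics":true,"dp--cookie-marketing":true}',
                 "yes", "ejtlcpjr0saw40ahceu4akb1"),
  stringsAsFactors = FALSE
)

# quote = FALSE keeps the JSON value intact; no header row or row names
write.table(cookies, "cookiesmv.txt", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```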

Does anybody have insight into where RCurl gets its cookies from?
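RCurl passes `cookiefile` on to libcurl's `CURLOPT_COOKIEFILE`, so cookies come only from that file (plus any the server sets on the same handle during the session), never from a browser. A relative name such as `cookiesmv.txt` is resolved against R's working directory, and libcurl silently skips a file it cannot open, so on a fresh Debian instance the request may go out without any cookies. A minimal sketch of a hypothetical helper, `resolve_cookie_file()`, that fails loudly instead by turning the path into an absolute one before it is handed to `curlSetOpt()`:

```r
# Hypothetical helper: return the absolute path of the cookie file, or stop
# with an error if it is missing (libcurl itself would just ignore it).
resolve_cookie_file <- function(path) {
  if (!file.exists(path)) {
    stop("cookie file not found: ", path, " (working directory: ", getwd(), ")")
  }
  normalizePath(path, mustWork = TRUE)
}

# Usage with the handle from the question (requires the RCurl package):
# curlSetOpt(cookiefile = resolve_cookie_file("cookiesmv.txt"),
#            curl = curl, followLocation = TRUE, verbose = TRUE)
```

With `verbose = TRUE`, libcurl prints the outgoing request headers to the R process's standard error, so you can check whether a `Cookie:` line is actually sent to `www.dokumentation.landtag-mv.de`.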

Best regards, and thank you in advance. I hope I have provided all the necessary information!


Whoever fights dragons too long becomes a dragon; whoever gazes too long into the abyss, the abyss gazes back…

...