python - Inherent way to save web page source

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have read a lot of answers about web scraping that recommend BeautifulSoup, Scrapy, etc.

Is there a way to do the equivalent of saving a page's source from a web browser?

That is, is there a way in Python to point it at a website and get it to save the page's source to a text file with just the standard Python modules?

Here is where I got to:

import urllib

f = open('webpage.txt', 'w')
html = urllib.urlopen("http://www.somewebpage.com")

#somehow save the web page source

f.close()

Not much, I know, but I'm looking for code that actually pulls the source of the page so I can write it to the file. I gather that urlopen just makes a connection.

Perhaps there is a readlines() equivalent for reading lines of a web page?

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:50:09+0000

You may try urllib2 (a standard-library module in Python 2):

import urllib2

page = urllib2.urlopen('http://stackoverflow.com')

page_content = page.read()

with open('page_content.html', 'w') as fid:
    fid.write(page_content)
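
Note that urllib2 exists only in Python 2. On Python 3 the same idea can be sketched with urllib.request, which is also part of the standard library. The function name save_page_source and the 10-second timeout below are illustrative assumptions, not part of the original answer. The file is opened in binary mode because read() returns bytes in Python 3.

```python
import urllib.request


def save_page_source(url, path, timeout=10):
    """Fetch `url` and write the raw response body to `path`.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    # urlopen returns a file-like response object; read() returns the body as bytes.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        page_content = response.read()

    # Write the bytes unchanged so the saved file matches what the server sent.
    with open(path, 'wb') as fid:
        fid.write(page_content)

    return len(page_content)
```

Calling save_page_source('http://stackoverflow.com', 'page_content.html') would then save the page the same way the urllib2 snippet does. As for the readlines() question: the response object is file-like, so it can also be iterated line by line or read with readlines(), with each line returned as bytes.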
