python - Submit data via web form and extract the results

Question

Welcome To Ask or Share your Answers For Others

python - Submit data via web form and extract the results

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Submit data via web form and extract the results

My python level is Novice. I have never written a web scraper or crawler. I have written a python code to connect to an api and extract the data that I want. But for some the extracted data I want to get the gender of the author. I found this web site http://bookblog.net/gender/genie.php but downside is there isn't an api available. I was wondering how to write a python to submit data to the form in the page and extract the return data. It would be a great help if I could get some guidance on this.

This is the form dom:

<form action="analysis.php" method="POST">
<textarea cols="75" rows="13" name="text"></textarea>
<div class="copyright">(NOTE: The genie works best on texts of more than 500 words.)</div>
<p>
<b>Genre:</b>
<input type="radio" value="fiction" name="genre">
fiction&nbsp;&nbsp;
<input type="radio" value="nonfiction" name="genre">
nonfiction&nbsp;&nbsp;
<input type="radio" value="blog" name="genre">
blog entry
</p>
<p>
</form>

results page dom:

<p>
<b>The Gender Genie thinks the author of this passage is:</b>
male!
</p>

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:45:37+0000

No need to use mechanize, just send the correct form data in a POST request.

Also, using regular expression to parse HTML is a bad idea. You would be better off using a HTML parser like lxml.html.

import requests
import lxml.html as lh


def gender_genie(text, genre):
    url = 'http://bookblog.net/gender/analysis.php'
    caption = 'The Gender Genie thinks the author of this passage is:'

    form_data = {
        'text': text,
        'genre': genre,
        'submit': 'submit',
    }

    response = requests.post(url, data=form_data)

    tree = lh.document_fromstring(response.content)

    return tree.xpath("//b[text()=$caption]", caption=caption)[0].tail.strip()


if __name__ == '__main__':
    print gender_genie('I have a beard!', 'blog')

Categories

python - Submit data via web form and extract the results

python - Submit data via web form and extract the results

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags