Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
475 views
in Technique[技术] by (71.8m points)

multithreading - How to get a faster speed when using multi-threading in python

Now i am studying how to fetch data from website as fast as possible. To get faster speed, im considering using multi-thread. Here is the code i used to test the difference between multi-threaded and simple post.

import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is either "Simple"(Simple POST) or "Multiple"(Multi-thread POST)
        self.mode = mode

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()



        print "OK"

if __name__ == "__main__":

    current_post = Post("http://forum.xda-developers.com/login.php", "vb_login_username=test&vb_login_password&securitytoken=guest&do=login", 
                        "Simple")

    #save the time before post data
    origin_time = time.time()

    if(current_post.mode == "Multiple"):

        #multithreading POST

        for i in range(0, 10):
           thread = threading.Thread(target = current_post.post)
           thread.start()
           thread.join()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

    if(current_post.mode == "Simple"):

        #simple POST

        for i in range(0, 10):
            current_post.post()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

just as you can see, this is a very simple code. first i set the mode to "Simple", and i can get the time interval: 50s(maybe my speed is a little slow :(). then i set the mode to "Multiple", and i get the time interval: 35. from that i can see, multi-thread can actually increase the speed, but the result isnt as good as i imagine. i want to get a much faster speed.

from debugging, i found that the program mainly blocks at the line: open_url = urllib2.urlopen(req, self.data), this line of code takes a lot of time to post and receive data from the specified website. i guess maybe i can get a faster speed by adding time.sleep() and using multi-threading inside the urlopen function, but i cannot do that because its the python's own function.

if not considering the prossible limits that the server blocks the post speed, what else can i do to get the faster speed? or any other code i can modify? thx a lot!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The biggest thing you are doing wrong, that is hurting your throughput the most, is the way you are calling thread.start() and thread.join():

for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   thread.join()

Each time through the loop, you create a thread, start it, and then wait for it to finish Before moving on to the next thread. You aren't doing anything concurrently at all!

What you should probably be doing instead is:

threads = []

# start all of the threads
for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   threads.append(thread)

# now wait for them all to finish
for thread in threads:
   thread.join()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...