0 votes
657 views
in Technique by (71.8m points)

python - Multithreaded file copy is far slower than a single thread on a multicore CPU

I am trying to write a multithreaded program in Python to speed up the copying of just under 1000 .csv files. The multithreaded code runs even slower than the sequential approach. I timed the code with profile.py, and I am sure I must be doing something wrong, but I'm not sure what.
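For reference, profile.py instruments every function call and adds noticeable overhead to both variants; a plain wall-clock check, sketched below (the timed helper is only for illustration; sequentialCopy and threadWorkerCopy appear in the code further down), gives numbers closer to the real copy times:

    import time

    def timed(fn, *args):
        # wall-clock timing, without profile.run()'s instrumentation overhead
        t0 = time.time()
        fn(*args)
        return time.time() - t0

    # e.g. print 'sequential: ', timed(sequentialCopy, fileList)
    #      print 'threaded:   ', timed(threadWorkerCopy, fileList)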

The Environment:

  • Quad-core CPU.
  • 2 hard drives: one holds the source files, the other is the destination.
  • 1000 .csv files ranging in size from a few KB to 10 MB.

The Approach:

I put all the file paths in a Queue and create 4-8 worker threads that pull file paths from the queue and copy the designated files. In no case is the multithreaded code faster:

  • sequential copy takes 150-160 seconds
  • threaded copy takes over 230 seconds

I assumed this was an I/O-bound task, so I expected multithreading to speed up the operation.

The Code:

    import Queue
    import threading
    import os
    import shutil
    import glob    # build the file list from a wildcard, e.g. glob.glob('*.csv')
    import profile # profiler used to time main()

    fileQueue = Queue.Queue()  # work queue shared by all worker threads
    srcPath  = r'C:\temp'      # raw strings: a plain '\t' would be a tab character
    destPath = r'D:\temp'

    def CopyWorker():
        while True:
            fileName = fileQueue.get()
            shutil.copy(fileName, destPath)
            fileQueue.task_done()  # mark done only after the copy finishes
            print 'copied: ', fileName  # concurrent prints may interleave

    def threadWorkerCopy(fileNameList):
        print 'threadWorkerCopy: ', len(fileNameList)
        for i in range(4):
            t = threading.Thread(target=CopyWorker)
            t.daemon = True  # daemon threads let the process exit when done
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()  # block until every queued file has been copied

    def sequentialCopy(fileNameList):
        # around 150-160 seconds on this machine
        print 'sequentialCopy: ', len(fileNameList)
        cnt = 0
        ctotal = len(fileNameList)
        for fileName in fileNameList:
            shutil.copy(fileName, destPath)
            cnt += 1
            print 'copied: ', cnt, ' of ', ctotal

    def main():
        fileList = glob.glob(os.path.join(srcPath, '*.csv'))
        #sequentialCopy(fileList)
        threadWorkerCopy(fileList)

    if __name__ == '__main__':
        profile.run('main()')


1 Reply

0 votes
by (71.8m points)

Of course it's slower. The hard drives have to seek back and forth between files constantly. Your belief that multithreading would make this task faster is completely unjustified: the limiting factor is how fast you can read data from and write data to the disks, and every seek from one file to another is time that could have been spent transferring data. With several copies in flight at once, the drive heads must jump between files, turning what could be sequential reads and writes into random I/O.
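If throughput is the goal, a larger copy buffer on a single thread is a more promising lever than threads. Here is a minimal sketch, assuming a 1 MB chunk size is a reasonable starting point to tune (shutil.copyfileobj takes the chunk size as its third argument; copy_big_buffer is just an illustrative name):

    import os
    import shutil

    def copy_big_buffer(src, dstDir, bufSize=1024 * 1024):
        # one sequential read/write stream per file; a 1 MB chunk means far
        # fewer read/write calls than shutil.copy's small default buffer
        dst = os.path.join(dstDir, os.path.basename(src))
        with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst, bufSize)

    # copy everything on one thread so the disk heads stay mostly sequential:
    # for fileName in fileList:
    #     copy_big_buffer(fileName, destPath)

Since the source and destination are on different drives, one drive only reads and the other only writes; a single sequential stream already keeps both busy, and extra threads just multiply the seeks.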

