Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
512 views
in Technique[技术] by (71.8m points)

multiprocessing - Fastest way to extract tar files using Python

I have to extract hundreds of tar.bz files each with size of 5GB. So tried the following code:

import tarfile
from multiprocessing import Pool

files = glob.glob('D:\*.tar.bz') ##All my files are in D
for f in files:

   tar = tarfile.open (f, 'r:bz2')
   pool = Pool(processes=5)

   pool.map(tar.extractall('E:\') ###I want to extract them in E
   tar.close()

But the code has type error: TypeError: map() takes at least 3 arguments (2 given)

How can I solve it? Any further ideas to accelerate extracting?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Define a function that extract a single tar file. Pass that function and a tar file list to multiprocessing.Pool.map:

from functools import partial
import glob
from multiprocessing import Pool
import tarfile


def extract(path, dest):
    with tarfile.open(path, 'r:bz2') as tar:
        tar.extractall(dest)

if __name__ == '__main__':
    files = glob.glob('D:\*.tar.bz')
    pool = Pool(processes=5)
    pool.map(partial(extract, dest='E:\'), files)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...