Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Multiprocessing: Assign values to N-dimensional array/matrix in parallel

I am trying to convert a for loop into a multiprocessing.Pool().map call. I create an empty csr_matrix and assign values to it by index in parallel, but this does not work as expected: the code takes a couple of minutes to run, and byte_bigram_matrix is still empty afterwards.

byte_bigram_matrix = csr_matrix((10868,66049))

def calculate_bigram(file):
    with open('byteFiles/'+file,"r") as byte_file:
        byte_bigram_matrix[files.index(file)] = csr_matrix(#someprocessing to calculate bigrams)


from multiprocessing import Pool

#Using multiprocessing to calculate bi-grams 
files = os.listdir('filesPath/')
p = Pool() #Using max cores as processors
p.map(calculate_bigram, files)
p.close()
p.join()

Question:

Can't we assign values to an N-D array/matrix by index in parallel using the map function from multiprocessing? If not, how can this task be done with multiprocessing?



1 Reply


First, files is a one-dimensional Python list of the names of the files in "filesPath/".
From what I can tell, one problem lies in calculate_bigram: the file is opened in read mode ("r") rather than write mode, so any attempt to write to it would raise an error. I tried this:

import os
from multiprocessing import Pool

def calculate_bigram(file):
    # Skip directories such as "sub"; only regular files are written to.
    if os.path.isfile(file):
        with open(file, "w") as byte_file:
            byte_file.write("this is a test")

if __name__ == "__main__":
    # Using multiprocessing to process every entry in files/
    files = os.listdir('files/')
    # os.listdir returns bare names, so join each with the script's directory.
    path = os.path.dirname(__file__)
    for idx, file in enumerate(files):
        files[idx] = os.path.join(path, "files", file)

    # The context manager cleans up the pool once map has returned.
    with Pool(processes=4) as pool:
        pool.map(calculate_bigram, files)

The files directory looks like this:

files
|-> a.txt
|-> b.txt
|-> sub
     |-> c.txt

Additionally, you have to supply the full path, not a path relative to the file you are executing, hence the

path = os.path.dirname(__file__)
for idx, file in enumerate(files):
    files[idx] = os.path.join(path, "files", file)

because os.listdir returns bare file names, which are otherwise resolved against the current working directory (the pool itself does not change it), so the files can end up somewhere you don't want them.
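
One thing worth checking at this step: os.listdir returns bare entry names such as "a.txt", not paths, so each name has to be joined with its directory before it is handed to a worker. A minimal sketch of that step, where list_full_paths is a hypothetical helper name and not something from the question:

```python
import os


def list_full_paths(directory):
    """Return absolute paths of the regular files directly inside `directory`.

    Hypothetical helper for illustration only.
    """
    base = os.path.abspath(directory)
    # os.listdir gives names like "a.txt"; join each with its directory.
    candidates = (os.path.join(base, name) for name in sorted(os.listdir(base)))
    # Keep regular files only, which skips subdirectories such as "sub".
    return [path for path in candidates if os.path.isfile(path)]
```

With the files directory shown above, this would keep a.txt and b.txt and leave out sub, and the resulting list can be passed straight to pool.map.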

Edit, in reply to your comment: you still have to specify the full path and not a path relative to the current working directory. At least that's how it works for me.
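
A further point that may explain why byte_bigram_matrix stays empty: each worker process in a Pool works on its own copy of module-level objects, so an assignment made inside calculate_bigram is not visible in the parent process. One common way around this is to have the worker return its result and let the parent assemble the matrix. The following is only a sketch under assumptions: the token format (whitespace-separated two-digit hex values, with anything else treated as one "unknown" symbol), the constant N_SYMBOLS, and the helper names token_id, bigram_counts and build_bigram_matrix are all made up for illustration; only the 66049 = 257 * 257 column count comes from the question.

```python
from collections import Counter
from multiprocessing import Pool

# Assumption: 256 byte values plus one "unknown" symbol, giving 257**2 = 66049 columns.
N_SYMBOLS = 257
UNKNOWN = N_SYMBOLS - 1


def token_id(token):
    """Map a token to 0..255, or to UNKNOWN if it is not a two-digit-style hex byte."""
    try:
        value = int(token, 16)
    except ValueError:
        return UNKNOWN
    return value if 0 <= value < 256 else UNKNOWN


def bigram_counts(path):
    """Runs in a worker: read one file and RETURN its bigram counts."""
    with open(path, "r") as byte_file:
        ids = [token_id(tok) for tok in byte_file.read().split()]
    # The bigram (a, b) is stored in column a * N_SYMBOLS + b.
    return Counter(a * N_SYMBOLS + b for a, b in zip(ids, ids[1:]))


def build_bigram_matrix(paths, processes=None):
    """Runs in the parent: collect the per-file results and build the matrix."""
    from scipy.sparse import csr_matrix  # only needed when assembling the result

    with Pool(processes) as pool:
        # pool.map returns results in the same order as `paths`.
        per_file = pool.map(bigram_counts, paths)

    rows, cols, vals = [], [], []
    for row, counts in enumerate(per_file):
        for col, count in counts.items():
            rows.append(row)
            cols.append(col)
            vals.append(count)
    return csr_matrix((vals, (rows, cols)), shape=(len(paths), N_SYMBOLS ** 2))
```

build_bigram_matrix should be called from under an if __name__ == "__main__": guard, with a list of full file paths. Because the row index comes from the position in the result list, no files.index(file) lookup is needed in the workers.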

