python - Improve speed of reading and converting from binary file?

Question

Welcome To Ask or Share your Answers For Others

python - Improve speed of reading and converting from binary file?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Improve speed of reading and converting from binary file?

I know there have been some questions regarding file reading, binary data handling and integer conversion using struct before, so I come here to ask about a piece of code I have that I think is taking too much time to run. The file being read is a multichannel datasample recording (short integers), with intercalated intervals of data (hence the nested for statements). The code is as follows:

# channel_content is a dictionary, channel_content[channel]['nsamples'] is a string
for rec in xrange(number_of_intervals)):
    for channel in channel_names:
        channel_content[channel]['recording'].extend(
            [struct.unpack( "h", f.read(2))[0]
            for iteration in xrange(int(channel_content[channel]['nsamples']))])

With this code, I get 2.2 seconds per megabyte read with a dual-core with 2Mb RAM, and my files typically have 20+ Mb, which gives some very annoying delay (specially considering another benchmark shareware program I am trying to mirror loads the file WAY faster).

What I would like to know:

If there is some violation of "good practice": bad-arranged loops, repetitive operations that take longer than necessary, use of inefficient container types (dictionaries?), etc.
If this reading speed is normal, or normal to Python, and if reading speed
If creating a C++ compiled extension would be likely to improve performance, and if it would be a recommended approach.
(of course) If anyone suggests some modification to this code, preferrably based on previous experience with similar operations.

Thanks for reading

(I have already posted a few questions about this job of mine, I hope they are all conceptually unrelated, and I also hope not being too repetitive.)

Edit: channel_names is a list, so I made the correction suggested by @eumiro (remove typoed brackets)

Edit: I am currently going with Sebastian's suggestion of using array with fromfile() method, and will soon put the final code here. Besides, every contibution has been very useful to me, and I very gladly thank everyone who kindly answered.

Final Form after going with array.fromfile() once, and then alternately extending one array for each channel via slicing the big array:

fullsamples = array('h')
fullsamples.fromfile(f, os.path.getsize(f.filename)/fullsamples.itemsize - f.tell())
position = 0
for rec in xrange(int(self.header['nrecs'])):
    for channel in self.channel_labels:
        samples = int(self.channel_content[channel]['nsamples'])
        self.channel_content[channel]['recording'].extend(
                                                fullsamples[position:position+samples])
        position += samples

The speed improvement was very impressive over reading the file a bit at a time, or using struct in any form.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:50:07+0000

You could use array to read your data:

import array
import os

fn = 'data.bin'
a = array.array('h')
a.fromfile(open(fn, 'rb'), os.path.getsize(fn) // a.itemsize)

It is 40x times faster than struct.unpack from @samplebias's answer.

Categories

python - Improve speed of reading and converting from binary file?

python - Improve speed of reading and converting from binary file?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags