python - Optimising HDF5 dataset for Read/Write speed

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm currently running an experiment in which I scan a target spatially and grab an oscilloscope trace at each discrete pixel. My traces are generally 200 kpts (200,000 samples) long. After scanning the entire target, I assemble these time-domain signals spatially and essentially play back a movie of what was scanned. My scan area is 330x220 pixels, so the entire dataset is larger than the RAM on the computer I have to use.
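For a sense of scale, here is a rough size estimate. It assumes 4-byte (32-bit float) samples, which matches the `dtype = 'f'` used in the h5py snippets further down; the name `data_len` is just an illustrative stand-in for the trace length.

```python
# Rough size estimate for the full scan.
height, width, data_len = 330, 220, 200_000
bytes_per_sample = 4  # assumption: 32-bit floats

n_samples = height * width * data_len
total_bytes = n_samples * bytes_per_sample

print(f"{n_samples:,} samples, {total_bytes / 1e9:.2f} GB")
```

That works out to about 58 GB of raw data, so the whole block cannot be held in memory on a typical workstation.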

To start with, I was just saving each oscilloscope trace as a numpy array and then, after the scan completed, downsampling/filtering etc. and piecing the movie together in a way that didn't run into memory problems. However, I'm now at a point where I can't downsample because aliasing would occur, so I need to access the raw data.

I've started looking into storing my large 3D data block in an HDF5 dataset using h5py. My main issue is with chunk size allocation. My incoming data is orthogonal to the plane that I'd like to read it out in. My main options (to my knowledge) for writing my data are:

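    import h5py
    import numpy as np

    # Assumed from the 330x220 scan area and 200 kpts traces described above
    height, width, dataLen = 220, 330, 200000

    # Fast write, slow read: one chunk per trace
    with h5py.File("test_h5py.hdf5", "a") as f:
        dset = f.create_dataset("uncompchunk", (height, width, dataLen), chunks=(1, 1, dataLen), dtype='f')
        for i in range(height):
            for j in range(width):
                dset[i, j, :] = np.random.random(dataLen)  # stand-in for one trace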

or

    # Slow write, fast read: one chunk per time sample (a full frame)
    with h5py.File("test_h5py.hdf5", "a") as f:
        dset = f.create_dataset("uncompchunk", (height, width, dataLen), chunks=(height, width, 1), dtype='f')
        for i in range(height):
            for j in range(width):
                dset[i, j, :] = np.random.random(dataLen)  # stand-in for one trace

Is there some way I can optimize the two cases so that neither is horribly inefficient to run?

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:57:46+0000

If you want to optimise your I/O performance with chunking, you should read these two articles from Unidata:

chunking general

optimising for access pattern
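To make the access-pattern trade-off concrete, here is a small sketch that counts how many chunks a single trace write and a single frame read would touch under different chunk shapes. The helper `chunks_touched` and the "compromise" shape `(10, 10, 1000)` are hypothetical, chosen only for illustration, and the count assumes selections that start on a chunk boundary.

```python
import math

def chunks_touched(selection, chunk):
    """Number of chunks overlapped by a selection that starts on a chunk boundary."""
    return math.prod(math.ceil(s / c) for s, c in zip(selection, chunk))

height, width, data_len = 330, 220, 200_000
trace = (1, 1, data_len)    # what arrives at each pixel during the scan
frame = (height, width, 1)  # what is read back for one movie frame

layouts = {
    "per-trace chunks": (1, 1, data_len),
    "per-frame chunks": (height, width, 1),
    "compromise (hypothetical)": (10, 10, 1000),
}

for name, chunk in layouts.items():
    print(name, chunks_touched(trace, chunk), chunks_touched(frame, chunk))
```

With these numbers, per-trace chunks touch 1 chunk per write but 72,600 per frame read, per-frame chunks touch 200,000 per write but 1 per read, and the intermediate shape touches 200 and 726 respectively. Neither access is ideal with the intermediate shape, but neither is pathological.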

And if you are only going for raw I/O performance, consider @titusjan's advice.
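As a rough sketch of how chunking and the write pattern can be combined (an illustration under stated assumptions, not a tuned solution): buffer a slab of scan rows in memory and write it in one call, so that each write covers whole chunks. The dimensions are shrunk so the example runs quickly, the chunk shape `(2, 4, 100)` and the dataset name `traces` are hypothetical, and the file lives only in memory via h5py's `core` driver.

```python
import numpy as np
import h5py

# Small stand-in dimensions; the question's real values are 330 x 220 x 200000.
height, width, data_len = 6, 8, 1000
chunk_shape = (2, 4, 100)  # hypothetical compromise, not a tuned value

# In-memory file (core driver, nothing written to disk), for illustration only.
with h5py.File("sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "traces", (height, width, data_len), chunks=chunk_shape, dtype="f4"
    )
    rows_per_slab = chunk_shape[0]
    slab = np.empty((rows_per_slab, width, data_len), dtype="f4")

    for i0 in range(0, height, rows_per_slab):
        n = min(rows_per_slab, height - i0)
        for di in range(n):
            for j in range(width):
                # Stand-in for one oscilloscope trace.
                slab[di, j, :] = np.random.random(data_len)
        # One write per slab of rows instead of one write per pixel.
        dset[i0:i0 + n, :, :] = slab[:n]

    frame = dset[:, :, 0]  # one movie frame: a single time index across all pixels
    stored_chunks = dset.chunks

print(frame.shape, stored_chunks)
```

At full scale the slab has to fit in RAM, so the number of buffered rows (and therefore the first chunk dimension) is limited by available memory; the right values depend on the machine and are worth measuring.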
