Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dask - Writing a netCDF file is extremely slow

I am trying to perform a fairly simple operation: editing variable and global attributes in individual netCDF files of 3.5 GB each. The files load instantly with xr.open_dataset, but dataset.to_netcdf() is far too slow to export them after the modifications. I have tried:

  1. Writing without rechunking or any dask calls.
  2. Varying chunk sizes, followed by:
     a. Using load() before to_netcdf()
     b. Using persist() or compute() before to_netcdf()

I am working on an HPC system with 10 distributed workers. In all cases, writing takes more than 15 minutes per file. Is this expected? What else can I try to speed up this process, apart from further parallelizing the single-file operations with dask.delayed?
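For reference, the workflow described above might look roughly like the sketch below. The file names, the variable name `temp` and the attribute values are hypothetical placeholders, and a small synthetic file stands in for a 3.5 GB input. Note that even when only attributes change, `to_netcdf()` writes a brand new file, so all of the data is read and written again.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical small stand-in for one of the 3.5 GB input files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.nc")
dst = os.path.join(workdir, "output.nc")
xr.Dataset({"temp": (("time", "x"), np.random.rand(50, 200))}).to_netcdf(src)

with xr.open_dataset(src) as ds:
    # Attribute edits only change metadata held in memory...
    ds.attrs["title"] = "edited dataset"  # global attribute (placeholder value)
    ds["temp"].attrs["units"] = "K"       # variable attribute (placeholder value)
    # ...but writing a new file still reads and rewrites every data variable.
    ds.to_netcdf(dst)

with xr.open_dataset(dst) as check:
    print(check.attrs["title"], check["temp"].attrs["units"])
```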

Question from: https://stackoverflow.com/questions/66061903/writing-a-netcdf-file-is-extremely-slow


1 Reply


First, a quick note on this remark:

"The files load instantly using xr.open_dataset"

At that point you probably did not actually load the data, only the metadata: xr.open_dataset is lazy. Depending on your I/O and on the compression/encoding of the files, actually loading the data can take considerable CPU time and memory. You should first establish how long you would expect the job to take on a single CPU thread.
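One way to get that single-file baseline is to time opening, loading and writing separately. This is a sketch on a small synthetic file (the paths and the variable name `temp` are hypothetical); `.load()` is where the data is actually read and decoded, so the write timing afterwards measures pure output cost.

```python
import os
import tempfile
import time

import numpy as np
import xarray as xr

# Hypothetical small stand-in for a real input file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.nc")
out_path = os.path.join(workdir, "output.nc")
xr.Dataset({"temp": (("time", "x"), np.random.rand(200, 1000))}).to_netcdf(src)

t0 = time.perf_counter()
ds = xr.open_dataset(src)  # reads metadata; data variables stay lazy on disk
t_open = time.perf_counter() - t0

t0 = time.perf_counter()
ds.load()                  # actually reads (and decodes/decompresses) the data
t_load = time.perf_counter() - t0

t0 = time.perf_counter()
ds.to_netcdf(out_path)     # data is already in memory, so this is mostly write cost
t_write = time.perf_counter() - t0
ds.close()

print(f"open {t_open:.3f}s  load {t_load:.3f}s  write {t_write:.3f}s")
```

If load plus write on one thread is already close to 15 minutes for a 3.5 GB file, adding workers is unlikely to help much.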

To answer your question: netCDF4 files (which are HDF5 under the hood) do not play well with parallel writing. You will likely find that only one task writes at a time because of locking, or even that all of the output data is funneled to a single task before being written, regardless of your chunking. Check the task stream in your dask dashboard to see which of these is happening.

I recommend trying the zarr format instead, which works well for parallel writes because each chunk is stored in a separate file. You still need to choose a sensible chunking for your data (example).



...