Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Add 'constant' dimension to xarray Dataset

I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write them to netCDF. I've had great experience using xarray (xray) in the past, so I thought I'd use it for this task.

I can easily get them into a 2D DataArray with something like:

import numpy as np
import xarray as xr

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)

But when I try to add another dimension, which would convey information about time (all data is from the same year/month), things start to go sour.

I've tried two ways to crack this:

1) expand my input data to m x n x 1, something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]

Then I follow the same steps as above, with coords updated to contain a third dimension.

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')

This is fine for creating a DataArray, but when I try to convert it to a Dataset (so I can write to netCDF), I get the error 'ValueError: Coordinate objects must be 1-dimensional'.

2) The second approach I've tried is taking my DataArray, converting it to a DataFrame, setting the index to ['lat', 'lng', 'time'], and then going back to a Dataset with xr.Dataset.from_dataframe(). This runs for 20+ minutes before I kill the process.

Does anyone know how I can get a Dataset with a monthly 'time' dimension?


1 Reply

Your first example is pretty close:

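import datetime

# `data` is the (360, 720, 1) array from the question; year, month, day are the asker's values
lats = np.arange(-89.75, 90, 0.5) * -1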
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')

You'll notice a few changes in my version:

  1. I'm passing in a list for the 'time' coordinate instead of a scalar. You need to pass in a list or 1D array to get a 1D coordinate variable, which is what you need if you also use 'time' as a dimension. That's what the error ValueError: Coordinate objects must be 1-dimensional is trying to tell you (by the way -- if you have ideas for how to make that error message more helpful, I'm all ears!).
  2. I'm providing a dims argument to the DataArray constructor. Passing in a (non-ordered) dictionary is a little dangerous because the iteration order is not guaranteed, so the dimension order inferred from it may not match the axes of the data.
  3. I also switched to datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.
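
For reference, the three changes in the list above can be combined into one self-contained sketch. The date values and the all-ones placeholder data are assumptions made only so the snippet runs on its own; the original post does not specify them.

```python
import datetime

import numpy as np
import xarray as xr

# Assumed, illustrative date: the original post leaves year/month/day unspecified.
year, month, day = 2015, 1, 1

lats = np.arange(-89.75, 90, 0.5) * -1   # 360 values, north to south
lngs = np.arange(-179.75, 180, 0.5)      # 720 values

# Placeholder data with a trailing length-1 axis for 'time'.
data = np.ones((360, 720))[:, :, np.newaxis]

coords = {
    'lat': lats,
    'lng': lngs,
    # A one-element list gives a 1D coordinate, which a dimension requires.
    'time': [datetime.datetime(year, month, day)],
}

# Explicit dims tie each coordinate to the matching axis of `data`.
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
ds = da.to_dataset(name='variable_name')

print(da.dims, da.shape)
```

Because 'time' is now a 1D coordinate with a matching dimension, the to_dataset call no longer raises the ValueError from the question.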

Another sensible approach is to use concat with a list of one item once you've added 'time' as a scalar coordinate, e.g.,

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
# Concatenating a one-item list along 'time' turns the scalar coordinate into a dimension
expanded_da = xr.concat([da], 'time')

This version generalizes nicely to joining together data from a bunch of days -- you simply make the list of DataArrays longer. In my experience, most of the time the reason you want the extra dimension in the first place is to be able to concat along it. Length-1 dimensions are not very useful otherwise.
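
As a rough sketch of that generalization, the snippet below builds three monthly DataArrays and joins them along 'time'. The helper make_month, the dates, and the fill values are hypothetical stand-ins for reading one monthly CSV each; they are not part of the original answer.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def make_month(year, month, fill_value):
    """Build one 2D (lat, lng) DataArray with a scalar 'time' coordinate.

    Hypothetical helper: a real version would load one monthly CSV instead
    of filling the grid with a constant.
    """
    data = np.full((lats.size, lngs.size), float(fill_value))
    coords = {
        'lat': lats,
        'lng': lngs,
        'time': datetime.datetime(year, month, 1),  # scalar, not a dimension yet
    }
    return xr.DataArray(data, coords=coords, dims=['lat', 'lng'])


# One DataArray per month (assumed dates), each filled with its month number.
monthly = [make_month(2015, m, m) for m in (1, 2, 3)]

# concat promotes the scalar 'time' coordinate to a dimension of length 3.
combined = xr.concat(monthly, 'time')
ds = combined.to_dataset(name='variable_name')

print(combined.dims, combined.sizes['time'])
```

Each input keeps its own timestamp, so the resulting 'time' coordinate lists the months in the order the DataArrays appear in the list.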

