I'd like to interleave TFRecordDatasets of different sizes after processing each one independently.
In other words, the output should be equal to the concatenation of all parts. The parts themselves may be shuffled, but I need to preserve the order within each part. For instance, if the parts are abc, def, and ghi, the output should be abcdefghi (or any combination, such as abcghidef, that still preserves the internal order of each part).
I've played a bit with the cycle_length and block_length options of the interleave API, but I'm confused about how to use them with datasets of different sizes.
import tensorflow as tf

def _interleave_data(filename):
    # Parse and process each TFRecord file as its own dataset part.
    ds_part = tf.data.TFRecordDataset(filename).map(_parse_features)
    ds_part = ds_part.map(_do_some_processing)
    return ds_part

dataset = tf.data.Dataset.list_files(filenames)
ds = dataset.interleave(_interleave_data)
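For concreteness, here is a toy example of the ordering I'm after. Dataset.range and tf.range stand in for the real TFRecord parts (this is just an illustration, not my actual pipeline), and my reading of the docs is that cycle_length=1 drains each inner dataset completely before opening the next:

```python
import tensorflow as tf

# Three toy "parts": part i yields the values 3*i+1 .. 3*i+3, so the
# parts are [1,2,3], [4,5,6], [7,8,9].
part_ids = tf.data.Dataset.range(3)

def make_part(i):
    # Stand-in for TFRecordDataset(filename) + parsing/processing.
    return tf.data.Dataset.from_tensor_slices(tf.range(i * 3 + 1, i * 3 + 4))

# With cycle_length=1, interleave consumes one inner dataset to
# completion before starting the next, preserving within-part order.
ordered = part_ids.interleave(make_part, cycle_length=1)
result = list(ordered.as_numpy_iterator())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With cycle_length > 1 the parts get mixed element-by-element, which is exactly what I want to avoid.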
Is there any elegant way to do it? Thanks!