I'd like to interleave TFRecordDatasets of different sizes after processing each one independently.
In other words, the output should be equal to the concatenation of all parts. The parts themselves may be shuffled, but I need to preserve the order within each part. For instance, if the parts are abc, def, and ghi, the output should be abcdefghi (or any combination, such as abcghidef, that still preserves the internal order of each part).
I've played a bit with the cycle_length and block_length options of the interleave API, but I'm confused about how to use them with datasets of different sizes.
import tensorflow as tf

def _interleave_data(filename):
    # Parse and process each TFRecord file as its own dataset part.
    ds_part = tf.data.TFRecordDataset(filename).map(_parse_features)
    ds_part = ds_part.map(_do_some_processing)
    return ds_part

dataset = tf.data.Dataset.list_files(filenames)
ds = dataset.interleave(_interleave_data)
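For concreteness, here is a toy example of the ordering I'm after. Dataset.range and tf.range stand in for the real TFRecord parts (this is just an illustration, not my actual pipeline), and my reading of the docs is that cycle_length=1 drains each inner dataset completely before opening the next:

```python
import tensorflow as tf

# Three toy "parts": part i yields the values 3*i+1 .. 3*i+3, so the
# parts are [1,2,3], [4,5,6], [7,8,9].
part_ids = tf.data.Dataset.range(3)

def make_part(i):
    # Stand-in for TFRecordDataset(filename) + parsing/processing.
    return tf.data.Dataset.from_tensor_slices(tf.range(i * 3 + 1, i * 3 + 4))

# With cycle_length=1, interleave consumes one inner dataset to
# completion before starting the next, preserving within-part order.
ordered = part_ids.interleave(make_part, cycle_length=1)
result = list(ordered.as_numpy_iterator())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With cycle_length > 1 the parts get mixed element-by-element, which is exactly what I want to avoid.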
Is there any elegant way to do it? Thanks!