tf.data.experimental.make_batched_features_dataset

Returns a Dataset of feature dictionaries from Example protos.

If the label_key argument is provided, returns a Dataset of tuples, each comprising a feature dictionary and a label.

Example:

serialized_examples = [
  features {
    feature { key: "age" value { int64_list { value: [ 0 ] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
  },
  features {
    feature { key: "age" value { int64_list { value: [] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
  }
]

We can use the following arguments:

features: {
  "age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
  "gender": FixedLenFeature([], dtype=tf.string),
  "kws": VarLenFeature(dtype=tf.string),
}

And the expected output is:

{
  "age": [[0], [-1]],
  "gender": [["f"], ["f"]],
  "kws": SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0]],
    values=["code", "art", "sports"],
    dense_shape=[2, 2]),
}
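
For instance, here is a minimal sketch of producing that output with make_batched_features_dataset. The file name examples.tfrecord and the helper _make_example are illustrative assumptions, not part of the API:

import tensorflow as tf

def _make_example(age_values, gender, kws):
    # Build a tf.train.Example with the three features used above. Passing an
    # empty age_values list mimics the second record, whose age is missing.
    return tf.train.Example(features=tf.train.Features(feature={
        "age": tf.train.Feature(int64_list=tf.train.Int64List(value=age_values)),
        "gender": tf.train.Feature(bytes_list=tf.train.BytesList(value=[gender])),
        "kws": tf.train.Feature(bytes_list=tf.train.BytesList(value=kws)),
    }))

# Write the two serialized Examples to a TFRecord file.
with tf.io.TFRecordWriter("examples.tfrecord") as writer:
    writer.write(_make_example([0], b"f", [b"code", b"art"]).SerializeToString())
    writer.write(_make_example([], b"f", [b"sports"]).SerializeToString())

features = {
    "age": tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": tf.io.FixedLenFeature([], dtype=tf.string),
    "kws": tf.io.VarLenFeature(dtype=tf.string),
}

dataset = tf.data.experimental.make_batched_features_dataset(
    file_pattern="examples.tfrecord",
    batch_size=2,
    features=features,
    shuffle=False,   # keep the on-disk order so the batch matches the output above
    num_epochs=1,    # a single pass instead of the repeat-forever default
)

for batch in dataset:
    print(batch["age"])     # ages with the default filling the missing value: [0, -1]
    print(batch["gender"])  # [b'f', b'f']
    print(batch["kws"])     # SparseTensor with values [b'code', b'art', b'sports']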

Args:
file_pattern: List of files or patterns of file paths containing Example records. See tf.io.gfile.glob for pattern rules.
batch_size: An int representing the number of records to combine in a single batch.
features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values. See tf.io.parse_example.
reader: A function or class that can be called with a filenames tensor and (optional) reader_args and returns a Dataset of Example tensors. Defaults to tf.data.TFRecordDataset.
label_key: (Optional) A string corresponding to the key under which labels are stored in the tf.Examples. If provided, it must be one of the features keys; otherwise a ValueError is raised. (See the sketch after this list.)
reader_args: Additional arguments to pass to the reader class.
num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle: A boolean indicating whether the input should be shuffled. Defaults to True.
shuffle_buffer_size: Buffer size of the ShuffleDataset. A larger capacity ensures better shuffling but increases memory usage and startup time.
shuffle_seed: Randomization seed to use for shuffling.
prefetch_buffer_size: Number of feature batches to prefetch in order to improve performance. A recommended value is the number of batches consumed per training step. Defaults to auto-tune.
reader_num_threads: Number of threads used to read Example records. If >1, the results will be interleaved. Defaults to 1.
parser_num_threads: Number of threads to use for parsing Example tensors into a dictionary of Feature tensors. Defaults to 2.
sloppy_ordering: If True, reading performance is improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True; note that if the seed is set, the order of elements after shuffling is deterministic). Defaults to False.
drop_final_batch: If True and the batch size does not evenly divide the input dataset size, the final smaller batch is dropped. Defaults to False.
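
As referenced in the label_key description above, here is a hedged sketch combining label_key with a few of the other arguments. The file pattern train-*.tfrecord and the choice of "age" as the label are assumptions for illustration:

import tensorflow as tf

features = {
    "age": tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": tf.io.FixedLenFeature([], dtype=tf.string),
    "kws": tf.io.VarLenFeature(dtype=tf.string),
}

dataset = tf.data.experimental.make_batched_features_dataset(
    file_pattern="train-*.tfrecord",  # assumed to match existing TFRecord files
    batch_size=32,
    features=features,
    label_key="age",               # must be one of the features keys, else ValueError
    reader=tf.data.TFRecordDataset,
    num_epochs=1,                  # one pass; None would cycle forever
    shuffle=True,
    shuffle_buffer_size=10000,
    drop_final_batch=True,         # keep only full batches of 32
)

# Because label_key is set, each element is a (feature_dict, label) tuple,
# with the label tensor returned separately from the feature dictionary.
for feature_dict, label in dataset.take(1):
    print(feature_dict["gender"].shape)  # (32,)
    print(label.shape)                   # (32,)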

Returns:
A dataset of dict elements (or a tuple of dict elements and label). Each dict maps feature keys to Tensor or SparseTensor objects.

Raises:
TypeError: If reader is of the wrong type.
ValueError: If label_key is not one of the features keys.