Data Model API

The Synth data model is a composable extension of the JSON data model. It consists of the following familiar types:

  • Object: corresponds to a JSON object, i.e. a key/value store with string keys.

  • Array: corresponds to a JSON array. Arrays can be dynamically sized, and a single array can hold elements of many different types.

  • String: corresponds to a JSON string.

  • Bool: corresponds to a JSON boolean (true/false).

  • Number: corresponds to a JSON number. A number can be any one of three primitive types:

    • u64: a 64-bit unsigned integer,

    • i64: a 64-bit signed integer, and

    • f64: a 64-bit floating-point number.
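
As an illustration of the mapping above, the following sketch classifies parsed JSON values into the listed type names. It uses only the Python standard library and is not part of the Synth API: the function model_type is hypothetical, and the rule that non-negative integers map to u64 while negative integers map to i64 is an assumption made for this example.

```python
import json

U64_MAX = 2**64 - 1
I64_MIN = -(2**63)


def model_type(value):
    """Return the name of the model type a parsed JSON value maps to.

    Hypothetical helper for illustration only; not a Synth function.
    """
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        # Assumption: non-negative integers are u64, negative ones are i64.
        if 0 <= value <= U64_MAX:
            return "Number(u64)"
        if I64_MIN <= value < 0:
            return "Number(i64)"
        raise ValueError(f"integer {value} does not fit in 64 bits")
    if isinstance(value, float):
        return "Number(f64)"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Object"
    # JSON null is not among the types listed above.
    raise TypeError(f"no listed model type for {type(value).__name__}")


doc = json.loads('{"id": 7, "delta": -3, "score": 0.5, "tags": ["a", 1, true]}')

# One entry per key of the top-level object.
types = {key: model_type(val) for key, val in doc.items()}

# A single array can hold elements of different types.
element_types = [model_type(val) for val in doc["tags"]]
```

Here types maps each key to its type name, and element_types shows that the "tags" array mixes String, Number and Bool elements.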

In addition to the types above, Synth introduces logical types for specifying custom logic that ties the synthetic data model together.

  • OneOf: an enumeration or union type. A OneOf groups a fixed collection of types. When sampling, one of the variants is picked at random and used to produce the generated value of the OneOf. The probability of picking each variant is inferred from training on real data, but can also be overridden.

  • SameAs: a reference or pointer type. A SameAs points to another type, which allows specifying that certain fields or elements must have the same value at different places in any sample.
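
The behaviour of the two logical types can be sketched in plain Python. The classes below (Constant, ToyOneOf, ToySameAs, ToyObject) are hypothetical stand-ins written for this example and do not reproduce the Synth classes or their signatures. The sketch assumes that variant probabilities are given as explicit weights and that a reference points to a sibling field of the same object.

```python
import random


class Constant:
    """Toy model that always generates the same value."""

    def __init__(self, value):
        self.value = value

    def sample(self, rng):
        return self.value


class ToyOneOf:
    """Toy union: picks one variant at random, then samples from it."""

    def __init__(self, variants, weights=None):
        self.variants = list(variants)
        # Without explicit weights, every variant is equally likely.
        self.weights = list(weights) if weights is not None else [1.0] * len(self.variants)

    def sample(self, rng):
        variant = rng.choices(self.variants, weights=self.weights, k=1)[0]
        return variant.sample(rng)


class ToySameAs:
    """Toy reference: names the sibling field whose value is reused."""

    def __init__(self, field):
        self.field = field


class ToyObject:
    """Toy object: samples plain fields first, then resolves references."""

    def __init__(self, **fields):
        self.fields = fields

    def sample(self, rng):
        out = {}
        for name, model in self.fields.items():
            if not isinstance(model, ToySameAs):
                out[name] = model.sample(rng)
        # Only direct references to non-reference fields are handled here.
        for name, model in self.fields.items():
            if isinstance(model, ToySameAs):
                out[name] = out[model.field]
        return out


rng = random.Random(0)
status = ToyOneOf([Constant("active"), Constant("suspended")], weights=[0.9, 0.1])
user = ToyObject(status=status, billing_status=ToySameAs("status"))
samples = [user.sample(rng) for _ in range(1000)]
```

In every generated sample, billing_status equals status, and "active" appears far more often than "suspended" because of the overridden weights.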


Model type for a JSON object

Array([content, length])

Model type for a JSON array


Model type for a JSON string


Model type for a JSON boolean

Number(variant[, subtype])

Model type for a JSON number


Model type for an enumeration of models


Model type for a reference

Faker([generator, locales])

A subtype for string models that generate values from a faker provider

DateTime([format, subtype, begin, end])

A subtype for string models that generate values from a date/time/datetime range
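
To make the idea of generating strings from a date range concrete, here is a minimal standard-library sketch. The parameter names begin, end and format are taken from the signature listed above, but their semantics here (a strftime-style format, inclusive bounds, uniform sampling at one-second resolution) are assumptions, and sample_datetime_string is a hypothetical helper rather than a Synth function.

```python
import random
from datetime import datetime, timedelta


def sample_datetime_string(begin, end, fmt, rng):
    """Return a random date/time string between begin and end (inclusive).

    begin and end are strings written in the strftime-style format fmt.
    Hypothetical helper for illustration only.
    """
    start = datetime.strptime(begin, fmt)
    stop = datetime.strptime(end, fmt)
    if stop < start:
        raise ValueError("end must not be earlier than begin")
    # Pick a whole number of seconds inside the range, uniformly.
    span_seconds = int((stop - start).total_seconds())
    offset = rng.randint(0, span_seconds)
    return (start + timedelta(seconds=offset)).strftime(fmt)


rng = random.Random(42)
fmt = "%Y-%m-%d"
dates = [sample_datetime_string("2020-01-01", "2020-12-31", fmt, rng) for _ in range(100)]
```

Each element of dates is a string in the requested format that parses back to a day within the year 2020.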