string

Synth's string generator type groups together several useful features: generating strings from a regular expression, generating dates and times, and using Faker providers to generate valid addresses, telephone numbers, and similar values.

pattern#

The pattern key generates string values that match the specified regular expression.

Example#

{
  "type": "string",
  "pattern": "I am a (man|woman) with [2-9] friends"
}
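
To get a feel for which strings this pattern admits, the following minimal sketch checks a few candidates against it using Python's re module. This is only an illustration: the matches helper is hypothetical, and it assumes the pattern behaves the same in Python's regex dialect as in Synth's, which holds for simple patterns like this one but is not guaranteed in general.

```python
import re

# The pattern from the example above, used here as an ordinary Python regex.
PATTERN = r"I am a (man|woman) with [2-9] friends"


def matches(candidate: str) -> bool:
    """Return True if the whole candidate string matches PATTERN."""
    return re.fullmatch(PATTERN, candidate) is not None


print(matches("I am a woman with 7 friends"))  # True
print(matches("I am a man with 1 friends"))    # False: 1 is outside [2-9]
```

The pattern has two alternatives for the noun and eight possible digits, so it admits 16 distinct strings in total.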

faker#

Synth integrates with the Python Faker library. To generate a string using Faker, use the "faker": {...} key/value pair. The value of the faker key must be an object with at least a generator key.

Example#

{
  "type": "string",
  "faker": {
    "generator": "sentence"
  }
}

Faker offers a wide variety of fake data in different locales, and its generators can be used from Synth schemas through the generator key. Any standard Faker provider that generates a string can be used in a Synth schema. The example below generates fake credit card numbers using the credit_card_number provider.

Some Faker providers accept parameters that customize the generated data. These parameters can be specified alongside the generator key as arbitrary key/value pairs. The example below uses "card_type": "amex".

Locales can be specified using the optional locales key. Allowed values are the locales listed in Faker's locales documentation. The example below uses "locales": ["en_GB"]. If multiple locales are specified, Synth samples among them at random.

Example#

{
  "type": "string",
  "faker": {
    "generator": "credit_card_number",
    "card_type": "amex",
    "locales": ["en_GB"]
  }
}
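
One way to read the "faker" object above is as three parts: the generator name, the optional locales, and everything else, which is treated as provider parameters. The sketch below illustrates that reading in plain Python. The split_faker_spec helper is hypothetical and is not part of Synth or Faker; how Synth actually forwards these values internally is an assumption here.

```python
from typing import Any


def split_faker_spec(spec: dict[str, Any]) -> tuple[str, list[str], dict[str, Any]]:
    """Split a "faker" object into generator name, locales, and provider parameters."""
    generator = spec["generator"]            # required key
    locales = list(spec.get("locales", []))  # optional key, empty if absent
    # Every remaining key/value pair is treated as a provider parameter.
    params = {k: v for k, v in spec.items() if k not in ("generator", "locales")}
    return generator, locales, params


spec = {"generator": "credit_card_number", "card_type": "amex", "locales": ["en_GB"]}
generator, locales, params = split_faker_spec(spec)
print(generator, locales, params)

# A roughly comparable direct call with the Python Faker package (not run here):
#   Faker(["en_GB"]).credit_card_number(card_type="amex")
```

Under this reading, a "faker" object with only a generator key yields an empty locale list and no parameters.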

date_time#

A date_time is a variant of the string generator type that generates values from a specified range of date_time values. This lets you generate, for example, valid dates for an updated_at column or a valid RFC 2822 timestamp for an email header field.

You can specify a string as a date_time by using the "date_time": {...} key/value pair.

Example#

{
  "type": "string",
  "date_time": {
    "format": "%Y-%m-%d",
    "subtype": "naive_date",
    "begin": "2020-01-01",
    "end": "2025-01-01"
  }
}

Accepted values for the "date_time" key are objects with the following keys:

  • "format": a strftime-style format string specifying how the underlying date_time value is rendered as a string.
  • "subtype": one of the following:
    • "naive_date": when the underlying date_time value should be a simple date without timezone specification,
    • "naive_time": when the underlying date_time value should be a simple time without timezone specification,
    • "naive_date_time": when the underlying date_time value should be a combined date and time without timezone specification,
    • "date_time": when the underlying date_time value should be a combined date and time with timezone specification.
  • "begin" and "end": the lower and upper bounds of the date_time value to generate. Both values must be written in the strftime-style format specified in the "format" key.

Example#

{
  "type": "string",
  "date_time": {
    "format": "%Y-%m-%dT%H:%M:%S",
    "subtype": "naive_date_time",
    "begin": "2015-01-01T00:00:00",
    "end": "2020-01-01T12:00:00"
  }
}
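
Because "begin" and "end" must follow the "format" string, it can help to sanity-check a schema's bounds before using it. The sketch below does this with Python's datetime.strptime. The check_bounds helper is hypothetical, the rule that begin must not be later than end is an assumption, and Python's strftime directives may differ in detail from those Synth accepts.

```python
from datetime import datetime


def check_bounds(fmt: str, begin: str, end: str) -> tuple[datetime, datetime]:
    """Parse begin and end with fmt; raise ValueError if either does not fit."""
    lo = datetime.strptime(begin, fmt)
    hi = datetime.strptime(end, fmt)
    if lo > hi:
        # Assumption: the lower bound should not be later than the upper bound.
        raise ValueError("begin must not be later than end")
    return lo, hi


fmt = "%Y-%m-%dT%H:%M:%S"
lo, hi = check_bounds(fmt, "2015-01-01T00:00:00", "2020-01-01T12:00:00")
print(lo.strftime(fmt), hi.strftime(fmt))
```

A bound that does not follow the format, such as "2020-01-01T00:00:00" with the format "%Y-%m-%d", makes strptime raise ValueError.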

serialized#

serialized is a variant of the string generator type which serializes the output of a child generator into a string.

serialized has two fields:

  • serializer: the serializer to use (currently only json is supported),
  • content: the content to serialize, which can be any valid Synth generator.

Example#

{
  "type": "string",
  "serialized": {
    "serializer": "json",
    "content": {
      "type": "object",
      "username": {
        "type": "string",
        "faker": {
          "generator": "name"
        }
      }
    }
  }
}
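
The effect of the schema above is that the generated value is a single string containing JSON, not a nested object. The sketch below mimics this with Python's json module. The child_value shown is a made-up sample of what the child generator might produce, and the exact whitespace and key order of Synth's output are not specified here.

```python
import json

# A hypothetical value that the child "object" generator might produce.
child_value = {"username": "Jane Doe"}

# Serializing turns the structured child value into one string.
serialized = json.dumps(child_value)

print(type(serialized).__name__)              # str
print(json.loads(serialized) == child_value)  # True: parsing recovers the object
```

A consumer of the generated data would need to parse the string again to get back to the structured value.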

categorical#

A categorical is a variant of the string generator type that generates values from a finite set of user-defined values. You can specify a string as a categorical by using the "categorical": {...} key/value pair.

Example#

{
  "type": "string",
  "categorical": {
    "pawn": 8,
    "rook": 2,
    "knight": 2,
    "bishop": 2,
    "queen": 1,
    "king": 1
  }
}

The value of the "categorical" key must be an object whose:

  • keys are the allowed values of the categorical (e.g. "pawn", "rook", etc.),
  • values are non-negative integers defining the relative weight of the corresponding variant (e.g. 8, 2, etc.).
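
Since the weights are relative, the chance of each value is its weight divided by the sum of all weights. The sketch below works this out for the chess example and draws a few samples with Python's random.choices. It illustrates weighted sampling in general and is not Synth's own implementation; the seed and sample size are arbitrary.

```python
import random

# Weights copied from the categorical example above.
weights = {"pawn": 8, "rook": 2, "knight": 2, "bishop": 2, "queen": 1, "king": 1}

total = sum(weights.values())  # 16
probabilities = {value: weight / total for value, weight in weights.items()}
print(probabilities["pawn"])   # 0.5
print(probabilities["king"])   # 0.0625

# Draw five values, each chosen with probability proportional to its weight.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sample = rng.choices(list(weights), weights=list(weights.values()), k=5)
print(sample)
```

With these weights, about half of the generated values are expected to be "pawn", and a value with weight 0 would never be generated.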