Date: Jan 22, 2021 Version: 0.2.1
This is Synth! A fast and highly
configurable NoSQL synthetic data engine. It reconciles the two
worlds of synthetic data and test data by letting users generate
realistic synthetic data for testing their applications and ML models.
With Synth you can:
Generate synthetic data as simply as JSON-in/JSON-out. If you’re not happy
with the result, tweak the synthetic data model through its custom JSON
metadata format and Synth will adjust everything on the fly, with no
additional ETL required.
Extend a dataset when you already have some data but not enough of it
for what you need to do. Synth can extrapolate from patterns it finds in
your data, so you can create as much of it as you want. You can even add
your own constraints and logic to create completely unseen scenarios.
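To make the idea of extrapolating from a small sample concrete, here is a toy sketch in plain Python. It is not how Synth models data: it only records the values seen for each field and samples every field independently, and the helper names `fit_field_values` and `sample_documents` are hypothetical, not part of Synth or synthpy.

```python
import random
from collections import defaultdict


def fit_field_values(documents):
    """Collect the values observed for each top-level field in the sample."""
    observed = defaultdict(list)
    for doc in documents:
        for field, value in doc.items():
            observed[field].append(value)
    return dict(observed)


def sample_documents(observed, size, seed=None):
    """Draw `size` new documents, sampling each field independently."""
    rng = random.Random(seed)
    return [
        {field: rng.choice(values) for field, values in observed.items()}
        for _ in range(size)
    ]


# A tiny hand-written sample standing in for real data
sample = [
    {"country": "DE", "plan": "free"},
    {"country": "FR", "plan": "pro"},
    {"country": "DE", "plan": "pro"},
]
observed = fit_field_values(sample)
generated = sample_documents(observed, size=10, seed=0)
```

Because each field is sampled on its own, this toy version can emit combinations that never appeared together in the sample, and it ignores any correlation between fields. A real synthetic data model has to handle both.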
Synth has two components:
synthd: a persistent process that ingests raw (usually
sensitive) training data and trains and builds synthetic data models
from it. Think of it as a NoSQL datastore that never persists actual
data, only anonymized model parameters.
synthpy: a reference Python client for the synthd API. It lets you use
synthd from custom scripts and test harnesses.
Here is an end-to-end example using the Python client:
import json

from synthpy import Synth

# Assuming `synthd` is running on `localhost` with default settings
client = Synth("localhost:8182")

# Load the existing documents to train on
with open("my_users_data.json", "r") as data_f:
    documents = json.load(data_f)

# Submit your JSON documents to `synthd` for training
client.put_documents(namespace="app", collection="users", batch=documents)

# Generate 100 new synthetic users
synthetic_users = client.get_documents(namespace="app", collection="users", size=100)
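After generating data, a quick sanity check can confirm that the synthetic documents use the same fields as the source. The sketch below assumes the generated result is a list of JSON objects (Python dicts), which the example above does not state explicitly. The helpers `field_names` and `unexpected_fields` are hypothetical and not part of synthpy, and the data is a stand-in so the snippet runs without `synthd`.

```python
def field_names(documents):
    """Return the set of top-level field names across a list of JSON objects."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return names


def unexpected_fields(source_docs, synthetic_docs):
    """Fields present in the synthetic documents but never in the source."""
    return field_names(synthetic_docs) - field_names(source_docs)


# Stand-in data; in practice these would be `documents` and `synthetic_users`
source_docs = [{"id": 1, "email": "a@example.com"}]
synthetic_docs = [
    {"id": 7, "email": "x@example.org"},
    {"id": 8, "email": "y@example.org"},
]

extra = unexpected_fields(source_docs, synthetic_docs)
if extra:
    raise ValueError(f"Synthetic data has unexpected fields: {sorted(extra)}")
```

This check only compares top-level field names. It says nothing about value types, nested structure, or how realistic the generated values are.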