Quick Start

After Installation, we can run sqlsynthgen to see the available commands:

$ sqlsynthgen
Usage: sqlsynthgen [OPTIONS] COMMAND [ARGS]...

Options:
  --help                          Show this message and exit.

Commands:
  create-data      Populate schema with synthetic data.
  create-tables    Create schema from Python classes.
  create-vocab     Create tables using the SQLAlchemy file.
  make-generators  Make a SQLSynthGen file of generator classes.
  make-stats       Compute summary statistics from the source database.
  make-tables      Make a SQLAlchemy file of Table classes.
  remove-data      Truncate all non-vocabulary tables in the dst schema.
  remove-tables    Drop all tables in the dst schema.
  remove-vocab     Truncate all vocabulary tables in the dst schema.
  validate-config  Validate the format of a config file.

For the simplest case, we will need make-tables, make-generators, create-tables and create-data but, first, we need to set environment variables to tell sqlsynthgen how to access our source database (where the real data resides now) and destination database (where the synthetic data will go). We can do that in the terminal with the export keyword, as shown below, or in a file called .env. The source and destination may be on the same database server, as long as the database or schema names differ. If the source and destination schemas are the default schema for the user on that database, you should not set those variables. If you are using a DBMS that does not support schemas (e.g. MariaDB), you must not set those variables.

$ export SRC_DSN="postgresql://someuser:somepassword@myserver.mydomain.com"
$ export SRC_SCHEMA='myschema'

$ export DST_DSN="postgresql://someuser:somepassword@myserver.mydomain.com/dst_db"
$ export DST_SCHEMA='myschema'

Next, we make a SQLAlchemy file that defines the structure of your database using the make-tables command:

$ sqlsynthgen make-tables

This will have created a file called orm.py in the current directory, with a SQLAlchemy class for each of your tables.

The next step is to make a sqlsynthgen file that defines one data generator per table in the source database:

$ sqlsynthgen make-generators

This will have created a file called ssg.py in the current directory.

We can use the create-table command to read the orm.py file, create our destination schema (if it doesn’t already exist), and to create empty copies of all the tables that are in the source database.

$ sqlsynthgen create-tables

Now that we have created the schema that will hold synthetic data, we can use the create-data command to read orm.py & ssg.py and generate data:

$ sqlsynthgen create-data

By default, create-data will have inserted one row per table and will have used the column data types to decide how to randomly generate data. To create more data each time we call create-data, we can provide the num-passes argument:

$ sqlsynthgen create-data --num-passes=10

We will have inserted 11 rows per table, with the last two commands.