Configuration Reference

SqlSynthGen is configured using a YAML file, which is passed to several commands with the --config-file option. Throughout the docs, we refer to this file as config.yaml, but it can be called anything. The only exception is that a vocabulary table called config would cause a naming conflict.

The schema for the configuration file is given below. Note that our config file format includes a section of SmartNoise SQL metadata, which is explained more fully in the SmartNoise SQL documentation (https://docs.smartnoise.org/sql/metadata.html#yaml-format).

SQLSynthGen Config

Type: object

A SQLSynthGen configuration YAML file

No Additional Properties

Type: boolean

Run source-statistics queries using asyncpg.

Type: string

The name of a local Python module of row generators (excluding .py).

Type: string

The name of a local Python module of story generators (excluding .py).
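As a minimal sketch, a config pointing at two local modules might contain the lines below. The property names are not shown in this reference, so row_generators_module and story_generators_module are assumptions to check against the schema file, and the module names my_row_generators and my_story_generators (i.e. my_row_generators.py and my_story_generators.py) are hypothetical.

```yaml
# Assumed key names; hypothetical module names.
# The .py suffix is omitted, as the schema requires.
row_generators_module: my_row_generators
story_generators_module: my_story_generators
```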

Type: array

An array of source statistics queries.

Each item of this array must be:

Type: object
No Additional Properties

Type: string

A name for the query, which will be used in the stats file.

Type: string

A SQL query.

Type: string

A SmartNoise SQL query.

Type: number

The differential privacy epsilon value for the DP query.

Type: number

The differential privacy delta value for the DP query.

Type: object

See https://docs.smartnoise.org/sql/metadata.html#yaml-format.

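All properties whose names match the following regular expression (that is, every key that does not begin with one of the SmartNoise options max_ids, row_privacy, sample_max_ids, censor_dims, clamp_counts, clamp_columns or use_dpsu) must satisfy the following conditions: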

Property name regular expression: ^(?!(max_ids|row_privacy|sample_max_ids|censor_dims|clamp_counts|clamp_columns|use_dpsu)).*$
Type: object
No Additional Properties
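The sketch below shows what one source-statistics entry might look like. It assumes the key names src-stats, name, query, dp-query, epsilon, delta and snsql-metadata, and it assumes that the DP query reads the rows returned by the plain SQL query from a table called query_result. Check both assumptions against the schema and the SqlSynthGen examples. The person table and its id and age columns are hypothetical. In the metadata, max_ids is one of the SmartNoise options excluded by the regular expression above, while id and age are column entries, which must therefore be objects.

```yaml
# Assumed key names and an assumed query_result table for the DP query.
# The person table and its columns are hypothetical.
src-stats:
  - name: age_stats            # used to identify the result in the stats file
    query: >
      SELECT id, age
      FROM person
      WHERE age IS NOT NULL
    dp-query: >
      SELECT AVG(age) AS mean
      FROM query_result
    epsilon: 0.5
    delta: 0.000001
    snsql-metadata:
      max_ids: 1               # SmartNoise option, not a column
      id:                      # column entries must be objects
        type: int
        private_id: true
      age:
        type: int
        lower: 0
        upper: 120
```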

Type: array of object

An array of story generators.

Each item of this array must be:

Type: object
No Additional Properties

Type: string

The full name of a story generator (e.g. mystorygenerators.short_story).

Type: array

Positional arguments to pass to the story generator.

Type: object

Keyword arguments to pass to the story generator.

Type: integer

The number of times to call the story generator per pass.

Type: integer

The maximum number of tries to respect a uniqueness constraint.
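A story-generator section might look like the following sketch. The key names story_generators, num_stories_per_pass and max-unique-constraint-tries are assumptions, as is the placement of the uniqueness limit as a top-level key. The module my_story_generators, the function short_story and its max_visits keyword argument are hypothetical.

```yaml
# Assumed key names; hypothetical module, function and keyword argument.
story_generators_module: my_story_generators
story_generators:
  - name: my_story_generators.short_story
    args: []                   # positional arguments, if any
    kwargs:
      max_visits: 3            # hypothetical keyword argument
    num_stories_per_pass: 10   # call short_story 10 times per pass
max-unique-constraint-tries: 50
```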

Type: object

Table configurations.

All properties whose names match the following regular expression (here any name, so every table entry) must satisfy the following conditions:

Property name regular expression: .*
Type: object

A table configuration.

No Additional Properties

Type: boolean

Whether to completely ignore this table.

Type: boolean

Whether to export the table data.

Type: integer

The number of rows to generate per pass.
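As a sketch, a tables section combining these options with a row-generator entry of the kind described next might look like this. The key names tables, ignore, vocabulary_table, num_rows_per_pass, row_generators and columns_assigned are assumptions. The table and column names are hypothetical, and myrowgenerators.my_gen is the example function name used in this reference, assumed here to return two values.

```yaml
# Assumed key names; hypothetical table and column names.
tables:
  audit_log:
    ignore: true               # completely ignore this table
  concept:
    vocabulary_table: true     # export this table's data
  person:
    num_rows_per_pass: 100
    row_generators:
      - name: myrowgenerators.my_gen
        args: []
        kwargs: {}
        # Two columns, so my_gen is assumed to return two values.
        columns_assigned: [first_name, last_name]
```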

Type: array of object

An array of row generators to create column values.

Each item of this array must be:

Type: object

Type: string

The name of a (built-in or custom) function (e.g. max or myrowgenerators.my_gen).

Type: array

Positional arguments to pass to the function.

Type: object

Keyword arguments to pass to the function.

Type: array of string or string

One or more columns to assign the return value to.

Each item of this array must be: