Configuration Reference
SqlSynthGen is configured using a YAML file, which is passed to several commands with the --config-file option.
Throughout the docs, we refer to this file as config.yaml, but it can be given any name, with one exception: if you have a vocabulary table called config, the two names will conflict.
The schema for the configuration file is given below. Note that the config file format includes a section of SmartNoise SQL metadata, which is explained more fully in the SmartNoise SQL documentation.
SQLSynthGen Config
Type: object
A SQLSynthGen configuration YAML file.
No Additional Properties
Whether to run source-statistics queries using asyncpg.
The name of a local Python module of row generators (excluding .py).
The name of a local Python module of story generators (excluding .py).
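Several objects in this schema are marked No Additional Properties. In JSON Schema terms this is additionalProperties: false: a key that is neither listed for that object nor matched by one of its property-name patterns makes the file invalid, so a validator reports a misspelt key instead of silently ignoring it. A minimal Python sketch of the check follows. The key names in it are assumptions for illustration (this reference does not show the property names), and SqlSynthGen's own validation may be done differently, for example with a JSON Schema library.

```python
from typing import Any, Mapping


def check_no_additional_properties(obj: Mapping[str, Any], allowed: set) -> None:
    """Reject keys that the schema does not list for this object.

    Mirrors JSON Schema's "additionalProperties": false for a single object
    (property-name patterns are ignored in this simplified sketch).
    """
    extra = sorted(set(obj) - set(allowed))
    if extra:
        raise ValueError("unexpected properties: " + ", ".join(extra))


# Hypothetical top-level key names, for illustration only.
ALLOWED_TOP_LEVEL = {"row_generators_module", "story_generators_module", "tables"}

check_no_additional_properties({"tables": {}}, ALLOWED_TOP_LEVEL)  # accepted
try:
    # "table" is a typo for "tables", so it is rejected.
    check_no_additional_properties({"table": {}}, ALLOWED_TOP_LEVEL)
except ValueError as err:
    print(err)  # unexpected properties: table
```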
An array of source statistics queries.
Each item of this array must be:
No Additional Properties
A name for the query, which will be used in the stats file.
A SQL query.
A SmartNoise SQL query.
The differential privacy epsilon value for the DP query.
The differential privacy delta value for the DP query.
See https://docs.smartnoise.org/sql/metadata.html#yaml-format.
All properties whose name matches the following regular expression must respect the following conditions:
Property name regular expression: ^(?!(max_ids|row_privacy|sample_max_ids|censor_dims|clamp_counts|clamp_columns|use_dpsu)).*$
Type: object
No Additional Properties
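The property-name regular expression above uses a negative lookahead, so it matches every key except those beginning with one of the reserved option names (max_ids, row_privacy and so on), and only the matching keys are required to be objects. Because the lookahead is not followed by an end anchor, a key that merely starts with a reserved name, such as the made-up max_ids_backup, is excluded as well. The short Python check below uses the same pattern as the schema; the example keys are invented for illustration.

```python
import re

# Same pattern as in the schema (split over two raw literals for line length).
# Keys that match must be objects.
OBJECT_KEY = re.compile(
    r"^(?!(max_ids|row_privacy|sample_max_ids|censor_dims|"
    r"clamp_counts|clamp_columns|use_dpsu)).*$"
)


def must_be_object(key: str) -> bool:
    """True if the schema requires the value under this key to be an object."""
    return OBJECT_KEY.match(key) is not None


# Made-up example keys.
for key in ["person", "public.person", "max_ids", "use_dpsu", "max_ids_backup"]:
    print(f"{key}: {must_be_object(key)}")
# person: True, public.person: True, the other three: False
```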
An array of story generators.
Each item of this array must be:
No Additional Properties
The full name of a story generator (e.g. mystorygenerators.short_story).
Positional arguments to pass to the story generator.
Keyword arguments to pass to the story generator.
The number of times to call the story generator per pass.
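Each story generator entry above names a function and says how to call it and how often. The Python sketch below shows how those settings (name, positional arguments, keyword arguments and number per pass) could drive the calls in a single pass. It assumes a story generator is a Python generator function that yields (table name, column values) pairs and is sent back each row as stored; that protocol, the table and column names, and the run_stories driver are all hypothetical here and are not taken from SqlSynthGen's code. In a config file, short_story would be referred to by its full name, e.g. mystorygenerators.short_story, and the in-memory id counter stands in for a database insert.

```python
from typing import Any, Dict, Generator, List, Tuple

Row = Dict[str, Any]
Story = Generator[Tuple[str, Row], Row, None]


def short_story(num_children: int, surname: str = "Smith") -> Story:
    """Hypothetical story generator: one parent row, then some child rows.

    Each yield proposes (table name, column values); the value sent back is
    assumed to be the row as stored, including a generated "id".
    """
    parent = yield "person", {"name": f"Alex {surname}"}
    for i in range(num_children):
        yield "child", {"parent_id": parent["id"], "name": f"Child {i} {surname}"}


def run_stories(story_fn, args: list, kwargs: dict, num_per_pass: int) -> List[Tuple[str, Row]]:
    """Hypothetical driver for one pass: call the story generator num_per_pass times."""
    stored: List[Tuple[str, Row]] = []
    next_id = 1
    for _ in range(num_per_pass):
        story = story_fn(*args, **kwargs)
        try:
            table, values = next(story)
            while True:
                row = {"id": next_id, **values}  # stand-in for a database insert
                next_id += 1
                stored.append((table, row))
                table, values = story.send(row)  # hand the stored row back
        except StopIteration:
            pass  # this story has finished
    return stored


rows = run_stories(short_story, args=[2], kwargs={"surname": "Jones"}, num_per_pass=3)
print(len(rows))  # 9: three stories, each with one person and two children
```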
The maximum number of attempts to satisfy a uniqueness constraint.
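When a column has a uniqueness constraint, a randomly generated value may collide with one already produced, so generation has to be retried, and the setting above caps how many attempts are made. A minimal Python sketch of such a retry loop follows, assuming the caller keeps a set of the values already used. The function name and the choice to raise an error when the limit is reached are hypothetical; SqlSynthGen's own mechanism may differ.

```python
from typing import Callable, Hashable, Set


def generate_unique(gen: Callable[[], Hashable], seen: Set[Hashable], max_tries: int) -> Hashable:
    """Call gen() until it returns a value not already in seen.

    Gives up after max_tries calls (hypothetical behaviour: raise an error).
    """
    for _ in range(max_tries):
        value = gen()
        if value not in seen:
            seen.add(value)
            return value
    raise RuntimeError(f"no unique value found after {max_tries} tries")


# Deterministic stand-in for a random generator that often repeats itself.
candidates = iter([3, 3, 7, 3, 7, 9])
used: Set[Hashable] = set()

print(generate_unique(lambda: next(candidates), used, max_tries=3))  # 3
print(generate_unique(lambda: next(candidates), used, max_tries=3))  # 7, after one repeat
try:
    # The next two candidates (3 and 7) are both taken, so two tries are not enough.
    generate_unique(lambda: next(candidates), used, max_tries=2)
except RuntimeError as err:
    print(err)  # no unique value found after 2 tries
```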
Table configurations.
All properties whose name matches the following regular expression must respect the following conditions:
Property name regular expression: .*
Type: object
A table configuration.
No Additional Properties
Whether to completely ignore this table.
Whether to export the table data.
The number of rows to generate per pass.
An array of row generators to create column values.
Each item of this array must be:
The name of a (built-in or custom) function (e.g. max or myrowgenerators.my_gen).
Positional arguments to pass to the function.
Keyword arguments to pass to the function.
One or more columns to assign the return value to.
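Each row generator entry names a function, the arguments to call it with, and the column or columns that receive its result. The Python sketch below shows one plausible way to carry that out. A name without a dot is looked up among Python's built-ins (our reading of "built-in"), a dotted name is split into a module to import (given without .py, as with the row generators module above) and a function within it, and a multi-column assignment unpacks the return value in order. The dictionary keys and both helper functions are hypothetical; SqlSynthGen's real implementation may work differently.

```python
import builtins
import importlib
from typing import Any, Callable, Dict


def resolve_function(name: str) -> Callable[..., Any]:
    """Hypothetical resolver: "max" -> built-in max, "pkg.mod.func" -> func in pkg.mod."""
    if "." not in name:
        return getattr(builtins, name)
    module_name, _, func_name = name.rpartition(".")
    return getattr(importlib.import_module(module_name), func_name)


def apply_row_generator(spec: Dict[str, Any], row: Dict[str, Any]) -> None:
    """Call one row generator and store its return value in row."""
    func = resolve_function(spec["name"])
    result = func(*spec.get("args", []), **spec.get("kwargs", {}))
    columns = spec["columns_assigned"]
    if isinstance(columns, str):
        row[columns] = result  # a single column receives the whole return value
        return
    values = list(result)  # several columns: unpack the return value in order
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    row.update(zip(columns, values))


row: Dict[str, Any] = {}
apply_row_generator({"name": "max", "args": [3, 8, 5], "columns_assigned": "score"}, row)
apply_row_generator({"name": "divmod", "args": [17, 5], "columns_assigned": ["quotient", "remainder"]}, row)
apply_row_generator({"name": "math.hypot", "args": [3, 4], "columns_assigned": "distance"}, row)
apply_row_generator({"name": "int", "args": ["ff"], "kwargs": {"base": 16}, "columns_assigned": "code"}, row)
print(row)  # {'score': 8, 'quotient': 3, 'remainder': 2, 'distance': 5.0, 'code': 255}
```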