Skip to main content

Quick Start

Each data generation order takes up to four input files in YAML format:

  1. Schema File: [required] It defines the structure of the data to be generated.
  2. Plan File: [required] It defines target data details, such as record number.
  3. Output Control File: [optional] It defines any modifications for data presentation.
  4. Data CSV File: [optional] It provides additional data to be used in the generation process.

For example, we can use this schema file and plan file to generate data, structured as illustrated below post sample Based on the chosen output format type, each entity defined in the schema file will be generated as a separate file. For example, if the output format is set to JSON, as illustrated, three files will be generated: json output file

  • users_0.json
  • follows_1.json
  • posts_2.json

The number trailing the entity name identifies the order in which the data should be applied to the target system, such as MySQL.

Schema File

The schema file is a YAML file that defines the structure of the data to be generated. It contains the following fields, both of which are required:

  1. name: The name of the schema, which serves as an ID to pair with the plan file.
  2. entities: A set of entity definitions. At least one entity is required. Entities with the same name are considered duplicates, and this will cause the data generation to fail.

An entity is the fundamental unit of data generation. The data for each entity will be stored in a distinct output file. An entity consists of a list of properties, each of which includes the following mandatory fields:

  1. name: This represents the name of the property.
  2. type: This denotes the data type of the property.
Sample schema comprise 'users' entity with single 'id' property
name: "quickstart"
entities:
- name: "users"
properties:
- name: "id"
type: "sequence"

Property Data Type

When defining a property under entity, the following data types are supported.

TypeDescription
sequenceAn auto incremental integer number
textA text value
numberA decimal number value, such as 100, -54 or 99.99
dateA date value with day-level accuracy
datetimeA date time value with second level accuracy
timeA time value without date with second level accuracy
info

Please refer to Property Data Type page for detailed usage.

Index, Constraints and Reference

To support complex data structures, users can define indices, constraints, and references in the schema file. This provides the capability to generate data with more intricate relationships.

  • Index: This is used to establish the uniqueness of a property or a combination of properties.
  • Constraints: This is used to set the constraints for a property, such as the minimum and maximum values.
  • Reference: This is used to delineate the relationship between entities.
Property 'id' as unique index
    properties:
- name: "id"
type: "sequence"
index:
id: 0

Plan File

The plan file, defining the data generation strategy, instructs the generator on using the targeted schema, including record count, additional constraints, and potential value distributions. It contains four primary elements:

  1. Name: [required] This is the name of the data plan and can be any text.
  2. Dialect: [optional] This refers to the targeted dialect. Supported values are "general", "mysql" and "salesforce". If not specified, the default value is "general".
  3. Schema: [required] This refers to the targeted schema. Its value must match the name defined in the schema file.
  4. Data: [required] This element comprises a list of entity data definitions, each detailing the instructions for generating the data for that particular entity. For each entry, a ‘count’ is required, which specifies the number of records to generate.
Sample plan file with 'quickstart' schema and 10 records for 'users' entity
name: "quickstart"
dialect: "general"
schema: "quickstart"
data:
- entity: "users"
count: 10

Property Specification

For individual properties in entity data, users can optionally define extra specifications, such as constraints, distributions, and default values.

Sample property specification
    # For the ‘status’ property, assign the value ‘DELETED’ with a 10% probability,
# and use ‘ACTIVE’ as the default value.
properties:
- name: "status"
values:
- values:
- "DELETED"
weight: 0.1
defaultValue: "ACTIVE"

Output Control File

Though optional, the output control file enables auxiliary features to be applied to the generated data, particularly when Unicode or special characters are used in the user’s data model. The output control file provides the user with two capabilities:

  1. Name Replacement: This allows the user to modify the names of entities or properties in the output file.
  2. Property Hiding: This provides the user with the flexibility to hide certain properties in the output file.
Sample Name Replacement
    # Replace the entity name ‘businessAccount’ with ‘account’
# Replace the property name ‘extId’ with ‘external_id’
name: "name_replacement"
entities:
- name: "businessAccount"
alias: "account"
properties:
- name: "extId"
alias: "external_id"
tip

Entity and property names in the schema and plan files should contain only alphanumeric characters, if possible. This can help avoid potential issues when generating data. Utilize the output control file to modify the names as needed.

Data CSV File

A csv file used to inject customer's data into the data generation. Please refer to Apply Your Own Data page for detailed instruction.