Configuration#

Workloads are defined in a single YAML file with the following top-level keys:

# Variables available in all expressions.
globals:

# User-defined expression functions.
expressions:

# Static datasets available to ref_* functions without a database query.
reference:

# Schema creation queries.
up:

# Data population queries.
seed:

# Data cleanup queries.
deseed:

# Schema teardown queries.
down:

# Per-worker initialisation queries (run before workload).
init:

# Weighted transaction mix (optional).
run_weights:

# Workload queries.
run:

Globals#

The globals section defines top-level variables available in all expressions:

globals:
  warehouses: 1
  districts: 10
  customers: 30000
  items: 100000

These can be referenced directly in arg expressions, including in arithmetic:

args:
  - customers / districts   # evaluates to 3000
  - warehouses * 10         # evaluates to 10

Reference#

The reference section defines static datasets that are loaded into the environment at startup, making them available to ref_rand, ref_same, ref_perm, and ref_diff functions without needing an init query or database connection. Each key is a dataset name, and the value is a list of row objects:

reference:
  regions:
    - {name: us, cities: [a, b, c]}
    - {name: eu, cities: [d, e, f]}
    - {name: ap, cities: [g, h, i]}

Reference datasets work exactly like datasets populated by init queries. You can use them in any arg expression:

args:
  # Random region row, access the 'name' field.
  - ref_rand('regions').name

  # Same row reused across all ref_same calls in this query execution.
  - ref_same('regions').name
  - set_rand(ref_same('regions').cities, [])

This is useful when your lookup data is small and known ahead of time, avoiding the need for a database round-trip.

Sections#

Each section (up, seed, deseed, down, init, run) contains a list of named queries:

up:
  - name: create_users
    query: |-
      CREATE TABLE IF NOT EXISTS users (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        email STRING NOT NULL
      )

seed:
  - name: populate_users
    args:
      - gen_batch(1000, 100, 'email')
    query: |-
      INSERT INTO users (email)
      SELECT unnest(string_to_array('$1', ','))

up and down manage schema (CREATE/DROP).
seed and deseed manage data (INSERT/TRUNCATE).
init runs once per worker before the workload starts, typically to fetch reference data for use in run queries.
run contains the transactional workload queries executed in a loop.

Query Types#

Type	Description
`query` (default)	Executes the SQL and reads result rows. Results are stored in separate memory for each worker by query name, making them available to `ref_*` functions.
`exec`	Executes the SQL without reading results. Use for DDL, DML that returns no rows, or when results aren’t needed.
`query_batch`	Like `query`, but evaluates args repeatedly (controlled by `count` and `size`) and collects values into comma-separated strings per arg position. Each batch becomes a separate query execution whose results are stored.
`exec_batch`	Like `exec`, but evaluates args repeatedly (controlled by `count` and `size`) and collects values into comma-separated strings per arg position. Each batch becomes a separate exec.

Batch Fields#

The query_batch and exec_batch types use two additional fields to control how args are generated and grouped:

Field	Description
`count`	Total number of rows to generate. Evaluated as an expression, so it can reference globals.
`size`	Number of rows per batch. If omitted or zero, defaults to `count` (single batch). Also evaluated as an expression.

Each arg expression is evaluated once per row. The results are collected into comma-separated strings per arg position. For example, with count: 1000 and size: 100, you get 10 batches, each containing a comma-separated string of 100 generated values.

seed:
  - name: populate_users
    type: exec_batch
    count: customers          # expression: uses the 'customers' global
    size: batch_size          # expression: uses the 'batch_size' global
    args:
      - gen('email')
    query: |-
      INSERT INTO users (email)
      SELECT unnest(string_to_array('$1', ','))

Wait#

Queries can specify a wait duration (e.g. wait: 18s) to introduce a keying/think-time delay after execution. This only applies to queries in the run section and is ignored in other sections.

Placeholders#

Arg placeholders ($1, $2, etc.) are passed to the database in one of two ways: inlined or as bind params.

Inlining#

Inlining means edg performs a text replacement on the SQL string before sending it to the database. Every $N in the query is replaced with the literal arg value. For example, given:

args:
  - gen_batch(1000, 100, 'email')
query: |-
  INSERT INTO users (email)
  SELECT unnest(string_to_array('$1', ','))

If $1 evaluates to alice@x.com,bob@y.com,..., the SQL sent to the database becomes:

INSERT INTO users (email)
SELECT unnest(string_to_array('alice@x.com,bob@y.com,...', ','))

The database never sees $1, it receives a fully formed query with the values baked in. This is used for:

query_batch / exec_batch types (always inlined).
Batch-expanded queries using gen_batch, batch, or ref_each (in any section).

Inlining lets you use $N as a universal placeholder syntax across all drivers (pgx, MySQL, Oracle) without worrying about driver-specific bind param formats. It also avoids a pgx-stdlib issue where numeric values are sent as DECIMAL, which CockroachDB can’t mix with INT in arithmetic.

Because the value is embedded in the SQL text, quoted placeholders like '$1' are common in batch patterns, the quotes become part of the final SQL string (e.g. string_to_array('alice@x.com,...', ',')).

Bind params#

All other queries use native driver bind parameters. The placeholder stays in the SQL and the values are sent separately, allowing the database to cache query plans and avoid re-parsing.

Each driver has its own placeholder format:

Driver	Placeholder format
`pgx` (PostgreSQL / CockroachDB)	`$1`, `$2`, `$3`
`mysql`	`?` (positional)
`oracle`	`:1`, `:2`, `:3`

Since run queries always use bind params, their SQL must use the correct format for the target driver.

Column Name Normalisation#

When a query or query_batch result is stored, all column names are lowercased before being added to the environment. This means a SQL column W_ID becomes accessible as ref_rand('fetch_warehouses').w_id, not .W_ID.

Run Weights#

The optional run_weights map controls the transaction mix during workload execution. Each key is a query name from the run section, and the value is a relative weight. On each iteration, a single transaction is chosen by weighted random selection:

run_weights:
  new_order: 45
  payment: 43
  order_status: 4
  delivery: 4
  stock_level: 4

If run_weights is omitted, all run queries execute sequentially on each iteration.