Configuration#
Workloads are defined in a single YAML file with the following top-level keys:
# Variables available in all expressions.
globals:
# User-defined expression functions.
expressions:
# Static datasets available to ref_* functions without a database query.
reference:
# Schema creation queries.
up:
# Data population queries.
seed:
# Data cleanup queries.
deseed:
# Schema teardown queries.
down:
# Per-worker initialisation queries (run before workload).
init:
# Weighted transaction mix (optional).
run_weights:
# Workload queries.
run:Globals#
The globals section defines top-level variables available in all expressions:
globals:
warehouses: 1
districts: 10
customers: 30000
items: 100000These can be referenced directly in arg expressions, including in arithmetic:
args:
- customers / districts # evaluates to 3000
- warehouses * 10 # evaluates to 10Reference#
The reference section defines static datasets that are loaded into the environment at startup, making them available to ref_rand, ref_same, ref_perm, and ref_diff functions without needing an init query or database connection. Each key is a dataset name, and the value is a list of row objects:
reference:
regions:
- {name: us, cities: [a, b, c]}
- {name: eu, cities: [d, e, f]}
- {name: ap, cities: [g, h, i]}Reference datasets work exactly like datasets populated by init queries. You can use them in any arg expression:
args:
# Random region row, access the 'name' field.
- ref_rand('regions').name
# Same row reused across all ref_same calls in this query execution.
- ref_same('regions').name
- set_rand(ref_same('regions').cities, [])This is useful when your lookup data is small and known ahead of time, avoiding the need for a database round-trip.
Sections#
Each section (up, seed, deseed, down, init, run) contains a list of named queries:
up:
- name: create_users
query: |-
CREATE TABLE IF NOT EXISTS users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email STRING NOT NULL
)
seed:
- name: populate_users
args:
- gen_batch(1000, 100, 'email')
query: |-
INSERT INTO users (email)
SELECT unnest(string_to_array('$1', ','))upanddownmanage schema (CREATE/DROP).seedanddeseedmanage data (INSERT/TRUNCATE).initruns once per worker before the workload starts, typically to fetch reference data for use inrunqueries.runcontains the transactional workload queries executed in a loop.
Query Types#
| Type | Description |
|---|---|
query (default) | Executes the SQL and reads result rows. Results are stored in separate memory for each worker by query name, making them available to ref_* functions. |
exec | Executes the SQL without reading results. Use for DDL, DML that returns no rows, or when results aren’t needed. |
query_batch | Like query, but evaluates args repeatedly (controlled by count and size) and collects values into comma-separated strings per arg position. Each batch becomes a separate query execution whose results are stored. |
exec_batch | Like exec, but evaluates args repeatedly (controlled by count and size) and collects values into comma-separated strings per arg position. Each batch becomes a separate exec. |
Batch Fields#
The query_batch and exec_batch types use two additional fields to control how args are generated and grouped:
| Field | Description |
|---|---|
count | Total number of rows to generate. Evaluated as an expression, so it can reference globals. |
size | Number of rows per batch. If omitted or zero, defaults to count (single batch). Also evaluated as an expression. |
Each arg expression is evaluated once per row. The results are collected into comma-separated strings per arg position. For example, with count: 1000 and size: 100, you get 10 batches, each containing a comma-separated string of 100 generated values.
seed:
- name: populate_users
type: exec_batch
count: customers # expression: uses the 'customers' global
size: batch_size # expression: uses the 'batch_size' global
args:
- gen('email')
query: |-
INSERT INTO users (email)
SELECT unnest(string_to_array('$1', ','))Wait#
Queries can specify a wait duration (e.g. wait: 18s) to introduce a keying/think-time delay after execution. This only applies to queries in the run section and is ignored in other sections.
Placeholders#
Arg placeholders ($1, $2, etc.) are passed to the database in one of two ways: inlined or as bind params.
Inlining#
Inlining means edg performs a text replacement on the SQL string before sending it to the database. Every $N in the query is replaced with the literal arg value. For example, given:
args:
- gen_batch(1000, 100, 'email')
query: |-
INSERT INTO users (email)
SELECT unnest(string_to_array('$1', ','))If $1 evaluates to alice@x.com,bob@y.com,..., the SQL sent to the database becomes:
INSERT INTO users (email)
SELECT unnest(string_to_array('alice@x.com,bob@y.com,...', ','))The database never sees $1, it receives a fully formed query with the values baked in. This is used for:
query_batch/exec_batchtypes (always inlined).- Batch-expanded queries using
gen_batch,batch, orref_each(in any section).
Inlining lets you use $N as a universal placeholder syntax across all drivers (pgx, MySQL, Oracle) without worrying about driver-specific bind param formats. It also avoids a pgx-stdlib issue where numeric values are sent as DECIMAL, which CockroachDB can’t mix with INT in arithmetic.
Because the value is embedded in the SQL text, quoted placeholders like '$1' are common in batch patterns, the quotes become part of the final SQL string (e.g. string_to_array('alice@x.com,...', ',')).
Bind params#
All other queries use native driver bind parameters. The placeholder stays in the SQL and the values are sent separately, allowing the database to cache query plans and avoid re-parsing.
Each driver has its own placeholder format:
| Driver | Placeholder format |
|---|---|
pgx (PostgreSQL / CockroachDB) | $1, $2, $3 |
mysql | ? (positional) |
oracle | :1, :2, :3 |
Since run queries always use bind params, their SQL must use the correct format for the target driver.
Column Name Normalisation#
When a query or query_batch result is stored, all column names are lowercased before being added to the environment. This means a SQL column W_ID becomes accessible as ref_rand('fetch_warehouses').w_id, not .W_ID.
Run Weights#
The optional run_weights map controls the transaction mix during workload execution. Each key is a query name from the run section, and the value is a relative weight. On each iteration, a single transaction is chosen by weighted random selection:
run_weights:
new_order: 45
payment: 43
order_status: 4
delivery: 4
stock_level: 4If run_weights is omitted, all run queries execute sequentially on each iteration.