Everything you need to write and submit a pax. Cite well, structure rigorously, version honestly.
Hand these to Claude, ChatGPT, or Gemini and you can author or use a pax without any local tooling.
Most authors don't write a pax by hand. Any frontier language model — Claude, ChatGPT, Gemini, Llama — can draft a well-formed pax when given the creation guide as context. You don't need a runtime, a CLI, or a local install.
The agent needs two things in its context window: the creation guide (schema rules, field definitions, type constraints) and your source material. The rest is generation. Review the anatomy section below if you want to understand what the agent produced.
A pax — Portable Analytic eXpertise — is a structured, machine-readable knowledge package that captures the accumulated empirical knowledge of a research domain. It is not a paper, a database, or a model weight file. It is a folder of flat YAML and JSON that any agent, any script, or any human can read without tooling.
The central insight is portability. Frontier language models know surface-level facts about a lot of domains. They do not know which claims are well-identified, which findings have replicated, or what the field's canonical vocabulary even is. A pax answers all three questions: here is the vocabulary, here are the findings, here is who said what and how they measured it.
Paxes are evidence-backed. Every finding must point to a source entry with a resolvable DOI (or a declared caveat when the literature does not provide one). The linter enforces referential integrity at publish time — a finding cannot cite a source that does not exist in sources.json.
Pax is composable. An agent running a playbook can import multiple pax and resolve constructs across them using the shared construct registry. The registry detects near-duplicate ids and warns authors before a collision enters the shared namespace.
Every pax is a directory. The directory name is the pax's kebab-case id. Inside it you will always find a manifest, and usually the knowledge files plus a playbooks subdirectory. Files are flat YAML and JSON: no proprietary database, no encoding tricks, no binary blobs.
my-pax-name/
  pax.yaml                        # manifest (required)
  knowledge/
    domain.json                   # domain definition (required)
    constructs.json               # construct vocabulary (required)
    findings.json                 # empirical claims (required if sourced)
    sources.json                  # paper/report metadata (required if findings)
    propositions.json             # theoretical claims (optional)
    construct_relationships.json  # causal/correlational links (optional)
  playbooks/
    quick_start.yaml              # analysis workflow (optional)
The simplest valid pax has a manifest, a domain file, and a constructs file. Add findings, sources, and a playbook when you have the evidence. The linter will tell you what is missing for each pax type at validation time.
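The manifest is the one file every pax must have. A minimal sketch is below; treat it as illustrative rather than authoritative, since the creation guide defines the exact field set. The fields shown correspond to things this guide discusses elsewhere (the kebab-case id, the pax type, the pax version, the schema_version, and the optional deprecated block); description is a placeholder.

# Hypothetical pax.yaml sketch -- field names beyond id, type, version,
# schema_version, and deprecated are placeholders; check the creation guide.
id: my-pax-name          # kebab-case, matches the directory name
type: topic              # paper | topic | field | enterprise | engine
version: "1.0.0"         # pax version (semantic versioning)
schema_version: "1.0"    # version of the pax format specification
description: >
  One-paragraph summary of what this pax covers.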
| type | scope | typical scale |
|---|---|---|
| paper | single research paper | 5–15 C · 5–25 F · 1 S |
| topic | research topic, multiple papers | 15–30 C · 30–80 F · 10–30 S |
| field | entire research field | 30–60 C · 80–300 F · 30–80 S |
| enterprise | business or industry domain | 15–40 C · 30–80 F |
| engine | analytical method bundle | 0 C · 0 F · 4–15 B |
Constructs are the vocabulary of a pax. Each construct entry defines one concept or variable: what it is, how it is measured, what other names it goes by. Construct ids must be snake_case ASCII, no leading underscore, at most 48 characters. The linter checks the shared registry for near-duplicate ids and warns on collision before publish.
[
{
"id": "gdp_per_capita",
"display_name": "GDP Per Capita",
"construct_type": "quantifiable",
"definition": "Gross domestic product divided by total population, measured in constant 2015 USD PPP.",
"measurement_level": "ratio",
"aliases": [
{ "alias": "GDP/capita", "alias_type": "abbreviation" },
{ "alias": "per capita income", "alias_type": "synonym" }
]
},
{
"id": "life_expectancy",
"display_name": "Life Expectancy at Birth",
"construct_type": "outcome",
"definition": "Average number of years a newborn is expected to live under current mortality conditions.",
"measurement_level": "ratio",
"aliases": [
{ "alias": "LE", "alias_type": "abbreviation" }
]
}
]
Required fields: id, display_name, construct_type, definition.
| type | use when |
|---|---|
| quantifiable | measured on a numeric scale (ratio or interval) |
| concept | abstract, not directly measurable |
| process | a mechanism or causal pathway |
| composite | an index or aggregated score |
| outcome | a dependent variable in the field's literature |
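If you want to pre-check construct ids locally before submitting, the format rules above reduce to a regular expression, and a crude near-duplicate scan can be done with difflib. A minimal Python sketch, checking only within your own constructs.json (the registry additionally checks its shared namespace; the 0.9 similarity threshold here is an assumption, not the registry's):

import json
import re
from difflib import SequenceMatcher

# snake_case ASCII, no leading underscore, at most 48 characters
ID_RE = re.compile(r"^[a-z][a-z0-9_]{0,47}$")

def check_construct_ids(path="knowledge/constructs.json", similarity=0.9):
    with open(path) as f:
        constructs = json.load(f)
    ids = [c["id"] for c in constructs]
    for cid in ids:
        if not ID_RE.match(cid):
            print(f"ERROR: invalid construct id: {cid!r}")
    # Warn on near-duplicate ids within this pax.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= similarity:
                print(f"WARNING: near-duplicate ids: {a!r} / {b!r}")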
Findings encode what the literature actually found. Each entry is a single, falsifiable empirical claim tied to specific constructs, a specific source, and a specific confidence level. The finding_text field is a full sentence in plain English. Do not start with "The study found that" — state the relationship directly.
[
{
"source_id": "author_year_2024",
"domain_id": "my_domain",
"finding_text": "GDP per capita is positively associated with life expectancy (beta=0.42, SE=0.08, p<0.001, N=1200).",
"construct_ids": ["gdp_per_capita", "life_expectancy"],
"direction": "positive",
"confidence": "strong",
"method_used": "OLS regression with country fixed effects",
"effect_size_value": 0.42,
"effect_size_se": 0.08,
"effect_size_type": "beta",
"p_value": 0.001,
"sample_size": 1200,
"model_specification": "OLS with country and year FE, clustered SE",
"covariates_controlled": ["population", "trade_openness"]
}
]
| field | type | required | note |
|---|---|---|---|
| finding_text | string | yes | full sentence, plain English |
| direction | enum | yes | positive / negative / null / mixed |
| confidence | enum | yes | weak / moderate / strong |
| effect_size_value | number | no | beta, OR, Cohen's d, etc. |
| effect_size_se | number | no | standard error of the estimate |
| p_value | number | no | set null if not reported — never guess |
| sample_size | integer | no | N for the reported estimate |
| design | confidence ceiling |
|---|---|
| rct | strong |
| observational_longitudinal | moderate |
| observational_cross_sectional | weak–moderate |
| quasi_experimental | moderate–strong |
| meta_analysis | strong (with caveats) |
| simulation | weak (unless calibrated) |
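The confidence ceiling table doubles as one of the linter's warning checks (confidence vs study_design consistency). A rough local version is sketched below; where the table gives a range, the map takes the upper bound, and the join from finding to source assumes source_id resolves as described in the next section.

import json

RANK = {"weak": 0, "moderate": 1, "strong": 2}
# Upper bound of each design's confidence ceiling, per the table above.
CEILING = {
    "rct": "strong",
    "observational_longitudinal": "moderate",
    "observational_cross_sectional": "moderate",
    "quasi_experimental": "strong",
    "meta_analysis": "strong",
    "simulation": "weak",
}

def check_confidence(findings_path="knowledge/findings.json",
                     sources_path="knowledge/sources.json"):
    with open(findings_path) as f:
        findings = json.load(f)
    with open(sources_path) as f:
        sources = {s["id"]: s for s in json.load(f)}
    for fd in findings:
        src = sources.get(fd["source_id"], {})
        ceiling = CEILING.get(src.get("study_design"))
        if ceiling and RANK[fd["confidence"]] > RANK[ceiling]:
            print(f"WARNING: {fd['source_id']}: confidence '{fd['confidence']}' "
                  f"exceeds ceiling '{ceiling}' for design '{src['study_design']}'")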
Every finding must have a source. Sources live in sources.json and are referenced by id from findings. The id convention is author_year; for multi-author papers use the first author only (e.g., fearon_2003). For reports or grey literature, use org_year (e.g., worldbank_2022).
[
{
"id": "author_year_2024",
"source_type": "academic_paper",
"title": "The Full Paper Title Goes Here",
"authors": "Last, First; Last2, First2",
"year": 2024,
"doi": "10.1234/example.doi",
"methodology_summary": "Panel regression with country/year FE, N=1200",
"sample_size": 1200,
"study_design": "observational_longitudinal",
"open_access": true
}
]
Required: id, title, authors, year. Strongly encouraged: doi, study_design. The linter resolves DOIs against Crossref and flags unresolvable ones as warnings. Open-access papers must set open_access: true and must be linkable at publish time. Paywalled papers must declare open_access: false; do not leave it null.
Grey literature (reports, working papers, blog posts) is permitted when it is the best available evidence, but each such entry must include a methodology_summary explaining why it was included and what its limitations are.
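DOI resolution is only a warning, but it is cheap to check yourself. The sketch below queries the public Crossref REST API (api.crossref.org/works/{doi}) with the standard library; anything other than a 200 response is treated as unresolved.

import json
import urllib.request
from urllib.error import HTTPError, URLError

def check_dois(path="knowledge/sources.json"):
    with open(path) as f:
        sources = json.load(f)
    for src in sources:
        doi = src.get("doi")
        if not doi:
            print(f"WARNING: {src['id']}: no DOI declared")
            continue
        try:
            with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}", timeout=10) as resp:
                resolved = resp.status == 200
        except (HTTPError, URLError):
            resolved = False
        if not resolved:
            print(f"WARNING: {src['id']}: DOI did not resolve: {doi}")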
Propositions are higher-order theoretical statements about how constructs relate. Unlike findings, they are not tied to a single study; they represent the accumulated theoretical consensus (or debate) in the field. Propositions link to the findings that support them via an evidence array.
[
{
"id": "P001",
"text": "In multi-tier networks, betweenness centrality of a node predicts its contribution to cascading failure size, conditional on tier depth.",
"kind": "theoretical",
"supports": ["cascading_failure_size"],
"evidence": ["F003", "F011", "F042"]
},
{
"id": "P002",
"text": "Geographic concentration of refining (not extraction) is the dominant driver of strategic-mineral supply risk.",
"kind": "empirical",
"evidence": ["F007", "F022"]
}
]
| kind | use when |
|---|---|
| theoretical | the claim is a formal or informal theory not yet fully tested |
| empirical | the claim is directly supported by one or more findings |
| methodological | the claim is about how to measure or analyze the constructs |
A playbook is a reproducible analysis workflow defined in YAML. It specifies inputs, which engines to use, and the order of steps. Each step references an engine by id and passes typed arguments. Engines are separate pax of type "engine" — the playbook does not bundle the code, it declares the dependency.
name: cascading-failure
version: 1.0
description: |
  Compute centrality, identify critical nodes,
  test percolation under random and targeted failure.
inputs:
  - id: network
    schema: edgelist
  - id: id_col
    type: string
    default: "node_id"
engines:
  - network_centrality
  - percolation
steps:
  - id: 01_centrality
    engine: network_centrality
    args:
      measures: [betweenness, eigenvector, harmonic]
    out: centrality_table
  - id: 02_percolation
    engine: percolation
    args:
      mode: random
      thresholds: [0.05, 0.10, 0.20]
    out: failure_curve
  - id: 03_targeted
    engine: percolation
    args:
      mode: targeted
      order_by: betweenness
    out: critical_nodes
  - id: 04_report
    engine: yaml_report
    in: [centrality_table, failure_curve, critical_nodes]
    out: results.yaml
on_error: log_and_continue
output:
  format: yaml
  attach_findings: [F003, F011]
Playbook ids follow snake_case. The quick_start.yaml name is a convention: the registry surfaces it as the primary playbook on the pax's detail page. A pax of type "engine" carries no constructs or findings, only playbooks and the engine definition itself.
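To make the execution model concrete, here is one way an agent-side runner could walk the steps: load the YAML, look up each step's engine in whatever registry of callables the runtime maintains, pass the typed args plus a shared context, and store each step's result under its out name. This is a sketch under assumptions; the real runtime is not specified here, and the engines dict is a stand-in for resolving engine pax.

import yaml  # PyYAML

def run_playbook(path, engines, inputs):
    """engines: dict of engine id -> callable(context, **args); inputs: named input data."""
    with open(path) as f:
        pb = yaml.safe_load(f)
    context = dict(inputs)  # named values visible to every step
    for step in pb["steps"]:
        fn = engines[step["engine"]]
        try:
            # Each step's return value is stored under its declared `out` name.
            context[step["out"]] = fn(context, **step.get("args", {}))
        except Exception:
            if pb.get("on_error") == "log_and_continue":
                print(f"step {step['id']} failed; continuing")
            else:
                raise
    return context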
Validation runs server-side when you submit. There is no local CLI yet — the registry's CI pipeline checks every structural constraint on upload. Errors block publication; warnings are surfaced for author review but do not block.
| check | blocking |
|---|---|
| JSON schema validity for all files | yes |
| Required fields present on all entities | yes |
| Construct id uniqueness within pax and registry | yes |
| Finding source_id resolves to sources.json entry | yes |
| Finding construct_ids resolve to constructs.json | yes |
| Proposition evidence ids resolve to findings | yes |
| DOI resolution against Crossref | warning |
| Confidence vs study_design consistency | warning |
| SHA-256 stability of pax contents | warning on change |
If you author through an LLM with the creation guide loaded, the agent can self-check most of these constraints before you submit. The registry will re-verify on upload.
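The blocking referential checks amount to a few set lookups, so an agent (or a short script) can run them before upload. The sketch below assumes the file layout from the anatomy section and assumes findings carry an id field such as F003, which the propositions and playbook examples reference; adapt paths and field names to your files.

import json
from pathlib import Path

def load(name):
    p = Path("knowledge") / name
    return json.loads(p.read_text()) if p.exists() else []

def precheck():
    constructs = {c["id"] for c in load("constructs.json")}
    sources = {s["id"] for s in load("sources.json")}
    findings = load("findings.json")
    finding_ids = {f.get("id") for f in findings}
    errors = []
    for f in findings:
        if f["source_id"] not in sources:
            errors.append(f"finding cites unknown source {f['source_id']}")
        for cid in f.get("construct_ids", []):
            if cid not in constructs:
                errors.append(f"finding references unknown construct {cid}")
    for prop in load("propositions.json"):
        for ev in prop.get("evidence", []):
            if ev not in finding_ids:
                errors.append(f"proposition {prop['id']} cites unknown finding {ev}")
    for e in errors:
        print("ERROR:", e)
    return not errors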
The registry is open — anyone can author and submit. Review is structured but not gatekeeping. The goal is to ensure the linter passes and to catch obvious errors (miscitations, sign flips, impossible statistics) that automation misses. Reviewers do not re-run your analyses.
Estimated time from PR open to publish: 48–72h for well-formed pax. Schema errors will extend this; fix them before opening the PR.
Pax uses semantic versioning. The version in pax.yaml is the pax version — it is distinct from schema_version, which tracks the pax format specification. Bump the pax version according to the rules below whenever you publish a change.
| bump | when |
|---|---|
| major | removing or renaming a construct id; removing a published finding |
| minor | adding new constructs or findings; splitting a construct (with alias for old id) |
| patch | typo fixes, source-id corrections, additional aliases, metadata updates |
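If you want to mechanize the table, the bump is a pure function of the change kind. A minimal sketch, assuming the version string in pax.yaml is a plain major.minor.patch triple:

def bump(version: str, kind: str) -> str:
    """kind is 'major', 'minor', or 'patch', per the table above."""
    major, minor, patch = (int(x) for x in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# e.g. bump("1.3.0", "minor") -> "1.4.0"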
Deprecations must live in pax.yaml for at least two minor versions before an id is removed. The deprecated field accepts an array of objects with id, deprecated_in, reason, and optionally replaced_by. Any downstream pax that depends on a deprecated id will surface a warning at its next registry submission.
deprecated:
  - id: old_construct_name
    deprecated_in: "1.3.0"
    reason: "Renamed to new_construct_name for registry alignment."
    replaced_by: new_construct_name
Publish a pax through the web uploader. Validation runs against the same registry contract whether you submit by hand today or through the agentic path that's coming later.
zip -r my-pax.zip my-pax-name/
Once accepted, your pax appears at pax-market.com/pax/my-pax-name immediately after deploy.
An MCP-server runtime will let agents author and publish from inside any MCP-aware client. The submission contract is the same as the web path, so anything you author today will work unchanged when the agentic path ships.