// guide/

Authoring guide

Everything you need to write and submit a pax. Cite well, structure rigorously, version honestly.

Hand this guide to Claude, ChatGPT, or Gemini and you can author or use a pax without any local tooling.

// 01.what-is-a-pax
What is a pax

A pax — Portable Analytic eXpertise — is a structured, machine-readable knowledge package that captures the accumulated empirical knowledge of a research domain. It is not a paper, a database, or a model weight file. It is a folder of flat YAML and JSON that any agent, any script, or any human can read without tooling.

The central insight is portability. Frontier language models know surface-level facts about a lot of domains. They do not know which claims are well-identified, which findings have replicated, or what the field's canonical vocabulary even is. A pax answers all three questions: here is the vocabulary, here are the findings, here is who said what and how they measured it.

Paxes are evidence-backed. Every finding must point to a source entry with a resolvable DOI (or a declared caveat when the literature does not provide one). The linter enforces referential integrity at publish time — a finding cannot cite a source that does not exist in sources.json.

Paxes are composable. An agent running a playbook can import multiple paxes and resolve constructs across them using the shared construct registry. The registry detects near-duplicate ids and warns authors before a collision enters the shared namespace.
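The registry's near-duplicate check is not specified in this guide; one plausible sketch is a normalization pass in which ids that differ only in case or separators collide. The `near_duplicates` helper below is hypothetical, not registry tooling:

```python
def normalize(cid: str) -> str:
    """Collapse case and separators so near-duplicate ids map to one key."""
    return cid.lower().replace("_", "").replace("-", "")

def near_duplicates(new_id: str, registry_ids: list[str]) -> list[str]:
    """Registry ids that would collide with new_id after normalization."""
    key = normalize(new_id)
    return [rid for rid in registry_ids if rid != new_id and normalize(rid) == key]
```

Under this sketch, submitting gdp_percapita when gdp_per_capita is already registered would be flagged before publish.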

// 02.anatomy
Anatomy of a pax

Every pax is a directory. The directory name is the pax's kebab-case id. Inside it you will always find a manifest, and usually find the six knowledge files plus a playbooks subdirectory. Files are flat YAML and JSON — no proprietary database, no encoding tricks, no binary blobs.

text
my-pax-name/
  pax.yaml                       # manifest (required)
  knowledge/
    domain.json                  # domain definition (required)
    constructs.json              # construct vocabulary (required)
    findings.json                # empirical claims (required if sourced)
    sources.json                 # paper/report metadata (required if findings)
    propositions.json            # theoretical claims (optional)
    construct_relationships.json # causal/correlational links (optional)
  playbooks/
    quick_start.yaml             # analysis workflow (optional)

The simplest valid pax has a manifest, a domain file, and a constructs file. Add findings, sources, and a playbook when you have the evidence. The linter will tell you what is missing for each pax type at validation time.
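The minimal-pax rule above is easy to self-check before submitting. The `missing_required` helper below is an illustrative sketch, not registry tooling; the file list comes from the tree above:

```python
import os

# Minimal valid pax: manifest, domain file, constructs file (per the tree above).
REQUIRED = ["pax.yaml", "knowledge/domain.json", "knowledge/constructs.json"]

def missing_required(pax_dir: str) -> list[str]:
    """Required files absent from a pax directory."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(pax_dir, p))]
```

An empty return value means the directory at least clears the structural floor; the server-side linter still checks schemas and per-type requirements.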

pax types
  type        scope                            typical scale
  paper       single research paper            5–15 C · 5–25 F · 1 S
  topic       research topic, multiple papers  15–30 C · 30–80 F · 10–30 S
  field       entire research field            30–60 C · 80–300 F · 30–80 S
  enterprise  business or industry domain      15–40 C · 30–80 F
  engine      analytical method bundle         0 C · 0 F · 4–15 B
// 03.constructs
Authoring constructs

Constructs are the vocabulary of a pax. Each construct entry defines one concept or variable: what it is, how it is measured, what other names it goes by. Construct ids must be snake_case ASCII, no leading underscore, at most 48 characters. The linter checks the shared registry for near-duplicate ids and warns on collision before publish.

json · knowledge/constructs.json
[
  {
    "id": "gdp_per_capita",
    "display_name": "GDP Per Capita",
    "construct_type": "quantifiable",
    "definition": "Gross domestic product divided by total population, measured in constant 2015 USD PPP.",
    "measurement_level": "ratio",
    "aliases": [
      { "alias": "GDP/capita",       "alias_type": "abbreviation" },
      { "alias": "per capita income", "alias_type": "synonym" }
    ]
  },
  {
    "id": "life_expectancy",
    "display_name": "Life Expectancy at Birth",
    "construct_type": "outcome",
    "definition": "Average number of years a newborn is expected to live under current mortality conditions.",
    "measurement_level": "ratio",
    "aliases": [
      { "alias": "LE", "alias_type": "abbreviation" }
    ]
  }
]

Required fields: id, display_name, construct_type, definition.
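The id rules from this section (snake_case ASCII, no leading underscore, at most 48 characters) can be pre-checked locally. The regex below is one reading of those rules, not the linter's published grammar:

```python
import re

# One reading of the stated rules: lowercase snake_case ASCII segments,
# joined by single underscores, so no leading underscore is possible.
ID_RE = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*")

def valid_construct_id(cid: str) -> bool:
    """True when cid satisfies the stated id rules, including the 48-char cap."""
    return len(cid) <= 48 and ID_RE.fullmatch(cid) is not None
```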

construct_type values
  type          use when
  quantifiable  measured on a numeric scale (ratio or interval)
  concept       abstract, not directly measurable
  process       a mechanism or causal pathway
  composite     an index or aggregated score
  outcome       a dependent variable in the field's literature
// 04.findings
Authoring findings
One entry per empirical claim. Never aggregate two claims into one finding.

Findings encode what the literature actually found. Each entry is a single, falsifiable empirical claim tied to specific constructs, a specific source, and a specific confidence level. The finding_text field is a full sentence in plain English. Do not start with "The study found that" — state the relationship directly.

json · knowledge/findings.json
[
  {
    "source_id":           "author_year_2024",
    "domain_id":           "my_domain",
    "finding_text":        "GDP per capita is positively associated with life expectancy (beta=0.42, SE=0.08, p<0.001, N=1200).",
    "construct_ids":       ["gdp_per_capita", "life_expectancy"],
    "direction":           "positive",
    "confidence":          "strong",
    "method_used":         "OLS regression with country fixed effects",
    "effect_size_value":   0.42,
    "effect_size_se":      0.08,
    "effect_size_type":    "beta",
    "p_value":             0.001,
    "sample_size":         1200,
    "model_specification": "OLS with country and year FE, clustered SE",
    "covariates_controlled": ["population", "trade_openness"]
  }
]
stat fields
  field              type     required  note
  finding_text       string   yes       full sentence, plain English
  direction          enum     yes       positive / negative / null / mixed
  confidence         enum     yes       weak / moderate / strong
  effect_size_value  number   no        beta, OR, Cohen's d, etc.
  effect_size_se     number   no        standard error of the estimate
  p_value            number   no        set null if not reported — never guess
  sample_size        integer  no        N for the reported estimate
study design values
  design                         confidence ceiling
  rct                            strong
  observational_longitudinal     moderate
  observational_cross_sectional  weak–moderate
  quasi_experimental             moderate–strong
  meta_analysis                  strong (with caveats)
  simulation                     weak (unless calibrated)
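Authors can sanity-check confidence labels against these ceilings before submitting. The sketch below takes each ceiling's upper bound and does not model the calibrated-simulation exception; the helper name and the ORDER encoding are illustrative:

```python
ORDER = {"weak": 0, "moderate": 1, "strong": 2}

# Upper bound of each design's ceiling from the table above; the
# "unless calibrated" exception for simulations is not modeled.
CEILING = {
    "rct": "strong",
    "observational_longitudinal": "moderate",
    "observational_cross_sectional": "moderate",
    "quasi_experimental": "strong",
    "meta_analysis": "strong",
    "simulation": "weak",
}

def confidence_consistent(confidence: str, study_design: str) -> bool:
    """True when a finding's confidence stays at or below its design's ceiling."""
    return ORDER[confidence] <= ORDER[CEILING[study_design]]
```

A strong confidence on an observational_longitudinal source, for example, would trip the registry's consistency warning.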
// 05.sources
Sources & citations

Every finding must have a source. Sources live in sources.json and are referenced by id from findings. The id convention is author_year — for multi-author papers use first-author only (e.g., fearon_2003). For reports or grey literature, use org_year (e.g., worldbank_2022).

json · knowledge/sources.json
[
  {
    "id":                   "author_year_2024",
    "source_type":          "academic_paper",
    "title":               "The Full Paper Title Goes Here",
    "authors":             "Last, First; Last2, First2",
    "year":                2024,
    "doi":                 "10.1234/example.doi",
    "methodology_summary": "Panel regression with country/year FE, N=1200",
    "sample_size":         1200,
    "study_design":        "observational_longitudinal",
    "open_access":         true
  }
]

Required: id, title, authors, year. Strongly encouraged: doi, study_design. The linter resolves DOIs against Crossref and flags unresolvable ones as warnings. Open-access papers must set open_access: true and must be linkable at publish time. Paywalled papers must declare open_access: false — do not leave it null.

Grey literature (reports, working papers, blog posts) is permitted when it is the best available evidence, but each such entry must include a methodology_summary explaining why it was included and what its limitations are.
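For academic papers, the author_year convention can be derived mechanically from the first author's last name. The `source_id` helper below is a hypothetical sketch of that convention only; org_year ids for grey literature are typically written by hand:

```python
def source_id(first_author_last_name: str, year: int) -> str:
    """Derive an author_year id from the first author's last name."""
    return f"{first_author_last_name.strip().lower().replace(' ', '_')}_{year}"

# source_id("Fearon", 2003) -> "fearon_2003"
```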

// 06.propositions
Propositions
Theoretical claims linking constructs — the field's reusable rules of thumb.

Propositions are higher-order theoretical statements about how constructs relate. Unlike findings, they are not tied to a single study; they represent the accumulated theoretical consensus (or debate) in the field. Propositions link to the findings that support them via an evidence array.

json · knowledge/propositions.json
[
  {
    "id":       "P001",
    "text":     "In multi-tier networks, betweenness centrality of a node predicts its contribution to cascading failure size, conditional on tier depth.",
    "kind":     "theoretical",
    "supports": ["cascading_failure_size"],
    "evidence": ["F003", "F011", "F042"]
  },
  {
    "id":       "P002",
    "text":     "Geographic concentration of refining (not extraction) is the dominant driver of strategic-mineral supply risk.",
    "kind":     "empirical",
    "evidence": ["F007", "F022"]
  }
]
proposition kinds
  kind            use when
  theoretical     the claim is a formal or informal theory not yet fully tested
  empirical       the claim is directly supported by one or more findings
  methodological  the claim is about how to measure or analyze the constructs
// 07.playbooks
Playbooks · YAML DSL
Step-by-step recipes the agent runs end-to-end.

A playbook is a reproducible analysis workflow defined in YAML. It specifies inputs, which engines to use, and the order of steps. Each step references an engine by id and passes typed arguments. Engines are separate pax of type "engine" — the playbook does not bundle the code, it declares the dependency.

yaml · playbooks/quick_start.yaml
name: cascading_failure
version: 1.0
description: |
  Compute centrality, identify critical nodes,
  test percolation under random and targeted failure.
inputs:
  - id: network
    schema: edgelist
  - id: id_col
    type: string
    default: "node_id"
engines:
  - network_centrality
  - percolation
steps:
  - id: 01_centrality
    engine: network_centrality
    args:
      measures: [betweenness, eigenvector, harmonic]
    out: centrality_table
  - id: 02_percolation
    engine: percolation
    args:
      mode: random
      thresholds: [0.05, 0.10, 0.20]
    out: failure_curve
  - id: 03_targeted
    engine: percolation
    args:
      mode: targeted
      order_by: betweenness
    out: critical_nodes
  - id: 04_report
    engine: yaml_report
    in: [centrality_table, failure_curve, critical_nodes]
    out: results.yaml
on_error: log_and_continue
output:
  format: yaml
  attach_findings: [F003, F011]

Playbook ids follow snake_case. The quick_start.yaml name is a convention — the registry surfaces it as the primary playbook on the pax's detail page. A pax of type "engine" carries no constructs or findings, only playbooks and the engine definition itself.
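The step structure above can be executed by a very small runner: walk the steps in order, feed named outputs forward, and honor log_and_continue. This is an illustrative sketch; the registry's actual engine-dispatch interface is not documented here, so the engine call signature is an assumption:

```python
def run_playbook(steps, engines, on_error="log_and_continue"):
    """Run steps in order, threading named outputs between them.

    steps:   list of dicts with id, engine, optional in/args, and out.
    engines: mapping of engine id to a callable; the signature is assumed
             (positional inputs from `in`, keyword arguments from `args`).
    """
    outputs = {}
    for step in steps:
        fn = engines[step["engine"]]
        inputs = [outputs[name] for name in step.get("in", [])]
        try:
            outputs[step["out"]] = fn(*inputs, **step.get("args", {}))
        except Exception as exc:
            if on_error != "log_and_continue":
                raise
            print(f"step {step['id']} failed: {exc}")
    return outputs
```

Each step's `out` name becomes available as an input to later steps, which is how 04_report above collects the three intermediate tables.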

// 08.validation
Validation

Validation runs server-side when you submit. There is no local CLI yet — the registry's CI pipeline checks every structural constraint on upload. Errors block publication; warnings are surfaced for author review but do not block.

what the registry checks
  check                                             blocking
  JSON schema validity for all files                yes
  Required fields present on all entities           yes
  Construct id uniqueness within pax and registry   yes
  Finding source_id resolves to sources.json entry  yes
  Finding construct_ids resolve to constructs.json  yes
  Proposition evidence ids resolve to findings      yes
  DOI resolution against Crossref                   warning
  Confidence vs study_design consistency            warning
  SHA-256 stability of pax contents                 warning on change

If you author through an LLM with the creation guide loaded, the agent can self-check most of these constraints before you submit. The registry will re-verify on upload.
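The blocking reference checks are straightforward to replicate in an agent's self-check pass. The sketch below assumes findings carry an id field, as the evidence arrays in propositions.json imply; the function and message wording are illustrative, not the registry's output format:

```python
def integrity_errors(constructs, findings, sources, propositions=()):
    """Replicate the blocking reference checks from the validation table."""
    errors = []
    construct_ids = {c["id"] for c in constructs}
    source_ids = {s["id"] for s in sources}
    finding_ids = {f["id"] for f in findings if "id" in f}
    for f in findings:
        if f["source_id"] not in source_ids:
            errors.append(f"finding cites unknown source {f['source_id']}")
        for cid in f["construct_ids"]:
            if cid not in construct_ids:
                errors.append(f"finding cites unknown construct {cid}")
    for p in propositions:
        for fid in p.get("evidence", []):
            if fid not in finding_ids:
                errors.append(f"proposition {p['id']} cites unknown finding {fid}")
    return errors
```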

// 09.peer-review
Peer review
Light, machine-checked, 48–72h turnaround.

The registry is open — anyone can author and submit. Review is structured but not gatekeeping. The goal is to ensure the linter passes and to catch obvious errors (miscitations, sign flips, impossible statistics) that automation misses. Reviewers do not re-run your analyses.

  1. Schema lint. Automated, runs on PR open. Required keys, type checks, source resolution, construct id collisions. Must pass before human review begins.
  2. DOI verification. DOIs resolved against Crossref. Open-access papers must be linkable; paywalled papers must declare so. Non-resolvable DOIs are flagged for author response.
  3. Editorial pass. One human reviewer skims for obvious miscitations, sign flips, or unsupported confidence labels. They do not redo your literature review — they check that you are representing the literature faithfully.
  4. Trust score. Each pax receives a public trust score (0–5) based on author credentials, source quality, replication evidence, and schema completeness. Visible on the marketplace detail page.

Estimated time from PR open to publish: 48–72h for well-formed pax. Schema errors will extend this; fix them before opening the PR.

// 10.versioning
Versioning & deprecations

Pax uses semantic versioning. The version in pax.yaml is the pax version — it is distinct from schema_version, which tracks the pax format specification. Bump the pax version according to the rules below whenever you publish a change.

  bump   when
  major  removing or renaming a construct id; removing a published finding
  minor  adding new constructs or findings; splitting a construct (with alias for old id)
  patch  typo fixes, source-id corrections, additional aliases, metadata updates

Deprecations must live in pax.yaml for at least two minor versions before an id is removed. The deprecated field accepts an array of objects with id, deprecated_in, and reason. Any downstream pax depending on a deprecated id will surface a warning at their next registry submission.

yaml · pax.yaml — deprecation block
deprecated:
  - id: old_construct_name
    deprecated_in: "1.3.0"
    reason: "Renamed to new_construct_name for registry alignment."
    replaced_by: new_construct_name
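Given the construct and finding id sets before and after an edit, the bump table reduces to a small classifier. This sketch treats a rename as a removal plus an addition, which the major rule already covers; the helper name is illustrative:

```python
def required_bump(old_constructs, new_constructs, old_findings, new_findings):
    """Classify the version bump the bump table requires, given id sets."""
    old_c, new_c = set(old_constructs), set(new_constructs)
    old_f, new_f = set(old_findings), set(new_findings)
    if (old_c - new_c) or (old_f - new_f):
        return "major"   # a construct id was removed/renamed, or a finding dropped
    if (new_c - old_c) or (new_f - old_f):
        return "minor"   # constructs or findings were added
    return "patch"       # metadata-only change
```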
// 11.publishing
Publishing

Publish a pax through the web uploader. Validation runs against the same registry contract whether you submit by hand today or through the agentic path that's coming later.

web upload
  1. Zip your pax directory: zip -r my-pax.zip my-pax-name/
  2. Upload at submit.pax-market.com.
  3. Validation runs automatically. Errors appear inline; fix and re-upload.
  4. After editorial review, your pax is merged and live within minutes.

Once accepted, your pax appears at pax-market.com/pax/my-pax-name immediately after deploy.

agentic publishing — coming soon

An MCP-server runtime will let agents author and publish from inside any MCP-aware client. The submission contract is the same as the web path, so anything you author today will work unchanged when the agentic path ships.