Everything you need to write and submit a pax. Cite well, structure rigorously, version honestly.
Hand these to Claude, ChatGPT, or Gemini and you can author or use a pax without any local tooling.
Most authors don't write a pax by hand. Any frontier language model — Claude, ChatGPT, Gemini, Llama — can draft a well-formed pax when given the creation guide as context. You don't need a runtime, a CLI, or a local install.
The agent needs two things in its context window: the creation guide (schema rules, field definitions, type constraints) and your source material. The rest is generation. Review the anatomy section below if you want to understand what the agent produced.
A pax — Portable Analytic eXpertise — is a structured, machine-readable knowledge package that captures the accumulated empirical knowledge of a research domain. It is not a paper, a database, or a model weight file. It is a folder of flat YAML and JSON that any agent, any script, or any human can read without tooling.
The central insight is portability. Frontier language models know surface-level facts about a lot of domains. They do not know which claims are well-identified, which findings have replicated, or what the field's canonical vocabulary even is. A pax answers all three questions: here is the vocabulary, here are the findings, here is who said what and how they measured it.
Paxes are evidence-backed. Every finding must point to a source entry with a resolvable DOI (or a declared caveat when the literature does not provide one). The linter enforces referential integrity at publish time — a finding cannot cite a source that does not exist in sources.json.
Pax is composable. An agent running a playbook can import multiple pax and resolve constructs across them using the shared construct registry. The registry detects near-duplicate ids and warns authors before a collision enters the shared namespace.
Every pax is a directory. The directory name is the pax's kebab-case id. Inside it you will always find a manifest, and usually the knowledge files plus a playbooks subdirectory. Files are flat YAML and JSON: no proprietary database, no encoding tricks, no binary blobs.
my-pax-name/
  pax.yaml                        # manifest (required)
  knowledge/
    domain.json                   # domain definition (required)
    constructs.json               # construct vocabulary (required)
    findings.json                 # empirical claims (required if sourced)
    sources.json                  # paper/report metadata (required if findings)
    propositions.json             # theoretical claims (optional)
    construct_relationships.json  # causal/correlational links (optional)
  playbooks/
    quick_start.yaml              # analysis workflow (optional)
The simplest valid pax has a manifest, a domain file, and a constructs file. Add findings, sources, and a playbook when you have the evidence. The linter will tell you what is missing for each pax type at validation time.
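The manifest is the one file every pax must have. A minimal sketch is below; treat it as illustrative rather than authoritative, since the creation guide defines the exact field set. The fields shown correspond to things this guide discusses elsewhere (the kebab-case id, the pax type, the pax version, the schema_version, and the optional deprecated block); description is a placeholder.

# Hypothetical pax.yaml sketch -- field names beyond id, type, version,
# schema_version, and deprecated are placeholders; check the creation guide.
id: my-pax-name          # kebab-case, matches the directory name
type: topic              # paper | topic | field | enterprise | engine
version: "1.0.0"         # pax version (semantic versioning)
schema_version: "1.0"    # version of the pax format specification
description: >
  One-paragraph summary of what this pax covers.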
| type | scope | typical scale |
|---|---|---|
| paper | single research paper | 5–15 C · 5–25 F · 1 S |
| topic | research topic, multiple papers | 15–30 C · 30–80 F · 10–30 S |
| field | entire research field | 30–60 C · 80–300 F · 30–80 S |
| enterprise | business or industry domain | 15–40 C · 30–80 F |
| engine | analytical method bundle | 0 C · 0 F · 4–15 B |
Constructs are the vocabulary of a pax. Each construct entry defines one concept or variable: what it is, how it is measured, what other names it goes by. Construct ids must be snake_case ASCII, no leading underscore, at most 48 characters. The linter checks the shared registry for near-duplicate ids and warns on collision before publish.
[
{
"id": "gdp_per_capita",
"display_name": "GDP Per Capita",
"construct_type": "quantifiable",
"definition": "Gross domestic product divided by total population, measured in constant 2015 USD PPP.",
"measurement_level": "ratio",
"aliases": [
{ "alias": "GDP/capita", "alias_type": "abbreviation" },
{ "alias": "per capita income", "alias_type": "synonym" }
]
},
{
"id": "life_expectancy",
"display_name": "Life Expectancy at Birth",
"construct_type": "outcome",
"definition": "Average number of years a newborn is expected to live under current mortality conditions.",
"measurement_level": "ratio",
"aliases": [
{ "alias": "LE", "alias_type": "abbreviation" }
]
}
]
Required fields: id, display_name, construct_type, definition.
| type | use when |
|---|---|
| quantifiable | measured on a numeric scale (ratio or interval) |
| concept | abstract, not directly measurable |
| process | a mechanism or causal pathway |
| composite | an index or aggregated score |
| outcome | a dependent variable in the field's literature |
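If you want to pre-check construct ids locally before submitting, the format rules above reduce to a regular expression, and a crude near-duplicate scan can be done with difflib. A minimal Python sketch, checking only within your own constructs.json (the registry additionally checks its shared namespace; the 0.9 similarity threshold here is an assumption, not the registry's):

import json
import re
from difflib import SequenceMatcher

# snake_case ASCII, no leading underscore, at most 48 characters
ID_RE = re.compile(r"^[a-z][a-z0-9_]{0,47}$")

def check_construct_ids(path="knowledge/constructs.json", similarity=0.9):
    with open(path) as f:
        constructs = json.load(f)
    ids = [c["id"] for c in constructs]
    for cid in ids:
        if not ID_RE.match(cid):
            print(f"ERROR: invalid construct id: {cid!r}")
    # Warn on near-duplicate ids within this pax.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= similarity:
                print(f"WARNING: near-duplicate ids: {a!r} / {b!r}")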
Findings encode what the literature actually found. Each entry is a single, falsifiable empirical claim tied to specific constructs, a specific source, and a specific confidence level. The finding_text field is a full sentence in plain English. Do not start with "The study found that" — state the relationship directly.
[
{
"source_id": "author_year_2024",
"domain_id": "my_domain",
"finding_text": "GDP per capita is positively associated with life expectancy (beta=0.42, SE=0.08, p<0.001, N=1200).",
"construct_ids": ["gdp_per_capita", "life_expectancy"],
"direction": "positive",
"confidence": "strong",
"method_used": "OLS regression with country fixed effects",
"effect_size_value": 0.42,
"effect_size_se": 0.08,
"effect_size_type": "beta",
"p_value": 0.001,
"sample_size": 1200,
"model_specification": "OLS with country and year FE, clustered SE",
"covariates_controlled": ["population", "trade_openness"]
}
]
| field | type | required | note |
|---|---|---|---|
| finding_text | string | yes | full sentence, plain English |
| direction | enum | yes | positive / negative / null / mixed |
| confidence | enum | yes | weak / moderate / strong |
| effect_size_value | number | no | beta, OR, Cohen's d, etc. |
| effect_size_se | number | no | standard error of the estimate |
| p_value | number | no | set null if not reported — never guess |
| sample_size | integer | no | N for the reported estimate |
| design | confidence ceiling |
|---|---|
| rct | strong |
| observational_longitudinal | moderate |
| observational_cross_sectional | weak–moderate |
| quasi_experimental | moderate–strong |
| meta_analysis | strong (with caveats) |
| simulation | weak (unless calibrated) |
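The confidence ceiling table doubles as one of the linter's warning checks (confidence vs study_design consistency). A rough local version is sketched below; where the table gives a range, the map takes the upper bound, and the join from finding to source assumes source_id resolves as described in the next section.

import json

RANK = {"weak": 0, "moderate": 1, "strong": 2}
# Upper bound of each design's confidence ceiling, per the table above.
CEILING = {
    "rct": "strong",
    "observational_longitudinal": "moderate",
    "observational_cross_sectional": "moderate",
    "quasi_experimental": "strong",
    "meta_analysis": "strong",
    "simulation": "weak",
}

def check_confidence(findings_path="knowledge/findings.json",
                     sources_path="knowledge/sources.json"):
    with open(findings_path) as f:
        findings = json.load(f)
    with open(sources_path) as f:
        sources = {s["id"]: s for s in json.load(f)}
    for fd in findings:
        src = sources.get(fd["source_id"], {})
        ceiling = CEILING.get(src.get("study_design"))
        if ceiling and RANK[fd["confidence"]] > RANK[ceiling]:
            print(f"WARNING: {fd['source_id']}: confidence '{fd['confidence']}' "
                  f"exceeds ceiling '{ceiling}' for design '{src['study_design']}'")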
Every finding must have a source. Sources live in sources.json and are referenced by id from findings. The id convention is author_year; for multi-author papers use the first author only (e.g., fearon_2003). For reports or grey literature, use org_year (e.g., worldbank_2022).
[
{
"id": "author_year_2024",
"source_type": "academic_paper",
"title": "The Full Paper Title Goes Here",
"authors": "Last, First; Last2, First2",
"year": 2024,
"doi": "10.1234/example.doi",
"methodology_summary": "Panel regression with country/year FE, N=1200",
"sample_size": 1200,
"study_design": "observational_longitudinal",
"open_access": true
}
]
Required: id, title, authors, year. Strongly encouraged: doi, study_design. The linter resolves DOIs against Crossref and flags unresolvable ones as warnings. Open-access papers must set open_access: true and must be linkable at publish time. Paywalled papers must declare open_access: false; do not leave it null.
Grey literature (reports, working papers, blog posts) is permitted when it is the best available evidence, but each such entry must include a methodology_summary explaining why it was included and what its limitations are.
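DOI resolution is only a warning, but it is cheap to check yourself. The sketch below queries the public Crossref REST API (api.crossref.org/works/{doi}) with the standard library; anything other than a 200 response is treated as unresolved.

import json
import urllib.request
from urllib.error import HTTPError, URLError

def check_dois(path="knowledge/sources.json"):
    with open(path) as f:
        sources = json.load(f)
    for src in sources:
        doi = src.get("doi")
        if not doi:
            print(f"WARNING: {src['id']}: no DOI declared")
            continue
        try:
            with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}", timeout=10) as resp:
                resolved = resp.status == 200
        except (HTTPError, URLError):
            resolved = False
        if not resolved:
            print(f"WARNING: {src['id']}: DOI did not resolve: {doi}")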
Propositions are higher-order theoretical statements about how constructs relate. Unlike findings, they are not tied to a single study; they represent the accumulated theoretical consensus (or debate) in the field. Propositions link to the findings that support them via an evidence array.
[
{
"id": "P001",
"text": "In multi-tier networks, betweenness centrality of a node predicts its contribution to cascading failure size, conditional on tier depth.",
"kind": "theoretical",
"supports": ["cascading_failure_size"],
"evidence": ["F003", "F011", "F042"]
},
{
"id": "P002",
"text": "Geographic concentration of refining (not extraction) is the dominant driver of strategic-mineral supply risk.",
"kind": "empirical",
"evidence": ["F007", "F022"]
}
]
| kind | use when |
|---|---|
| theoretical | the claim is a formal or informal theory not yet fully tested |
| empirical | the claim is directly supported by one or more findings |
| methodological | the claim is about how to measure or analyze the constructs |
A playbook is a reproducible analysis workflow defined in YAML. It specifies inputs, which engines to use, and the order of steps. Each step references an engine by id and passes typed arguments. Engines are separate pax of type "engine" — the playbook does not bundle the code, it declares the dependency.
name: cascading-failure
version: 1.0
description: |
  Compute centrality, identify critical nodes,
  test percolation under random and targeted failure.
inputs:
  - id: network
    schema: edgelist
  - id: id_col
    type: string
    default: "node_id"
engines:
  - network_centrality
  - percolation
steps:
  - id: 01_centrality
    engine: network_centrality
    args:
      measures: [betweenness, eigenvector, harmonic]
    out: centrality_table
  - id: 02_percolation
    engine: percolation
    args:
      mode: random
      thresholds: [0.05, 0.10, 0.20]
    out: failure_curve
  - id: 03_targeted
    engine: percolation
    args:
      mode: targeted
      order_by: betweenness
    out: critical_nodes
  - id: 04_report
    engine: yaml_report
    in: [centrality_table, failure_curve, critical_nodes]
    out: results.yaml
on_error: log_and_continue
output:
  format: yaml
  attach_findings: [F003, F011]
Playbook ids follow snake_case. The quick_start.yaml name is a convention: the registry surfaces it as the primary playbook on the pax's detail page. A pax of type "engine" carries no constructs or findings, only playbooks and the engine definition itself.
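To make the execution model concrete, here is one way an agent-side runner could walk the steps: load the YAML, look up each step's engine in whatever registry of callables the runtime maintains, pass the typed args plus a shared context, and store each step's result under its out name. This is a sketch under assumptions; the real runtime is not specified here, and the engines dict is a stand-in for resolving engine pax.

import yaml  # PyYAML

def run_playbook(path, engines, inputs):
    """engines: dict of engine id -> callable(context, **args); inputs: named input data."""
    with open(path) as f:
        pb = yaml.safe_load(f)
    context = dict(inputs)  # named values visible to every step
    for step in pb["steps"]:
        fn = engines[step["engine"]]
        try:
            # Each step's return value is stored under its declared `out` name.
            context[step["out"]] = fn(context, **step.get("args", {}))
        except Exception:
            if pb.get("on_error") == "log_and_continue":
                print(f"step {step['id']} failed; continuing")
            else:
                raise
    return context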
Validation runs server-side when you submit. There is no local CLI yet — the registry's CI pipeline checks every structural constraint on upload. Errors block publication; warnings are surfaced for author review but do not block.
| check | blocking |
|---|---|
| JSON schema validity for all files | yes |
| Required fields present on all entities | yes |
| Construct id uniqueness within pax and registry | yes |
| Finding source_id resolves to sources.json entry | yes |
| Finding construct_ids resolve to constructs.json | yes |
| Proposition evidence ids resolve to findings | yes |
| DOI resolution against Crossref | warning |
| Confidence vs study_design consistency | warning |
| SHA-256 stability of pax contents | warning on change |
If you author through an LLM with the creation guide loaded, the agent can self-check most of these constraints before you submit. The registry will re-verify on upload.
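The blocking referential checks amount to a few set lookups, so an agent (or a short script) can run them before upload. The sketch below assumes the file layout from the anatomy section and assumes findings carry an id field such as F003, which the propositions and playbook examples reference; adapt paths and field names to your files.

import json
from pathlib import Path

def load(name):
    p = Path("knowledge") / name
    return json.loads(p.read_text()) if p.exists() else []

def precheck():
    constructs = {c["id"] for c in load("constructs.json")}
    sources = {s["id"] for s in load("sources.json")}
    findings = load("findings.json")
    finding_ids = {f.get("id") for f in findings}
    errors = []
    for f in findings:
        if f["source_id"] not in sources:
            errors.append(f"finding cites unknown source {f['source_id']}")
        for cid in f.get("construct_ids", []):
            if cid not in constructs:
                errors.append(f"finding references unknown construct {cid}")
    for prop in load("propositions.json"):
        for ev in prop.get("evidence", []):
            if ev not in finding_ids:
                errors.append(f"proposition {prop['id']} cites unknown finding {ev}")
    for e in errors:
        print("ERROR:", e)
    return not errors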
The registry is open — anyone can author and submit. Review is structured but not gatekeeping. The goal is to ensure the linter passes and to catch obvious errors (miscitations, sign flips, impossible statistics) that automation misses. Reviewers do not re-run your analyses.
Estimated time from PR open to publish: 48–72h for well-formed pax. Schema errors will extend this; fix them before opening the PR.
Pax uses semantic versioning. The version in pax.yaml is the pax version — it is distinct from schema_version, which tracks the pax format specification. Bump the pax version according to the rules below whenever you publish a change.
| bump | when |
|---|---|
| major | removing or renaming a construct id; removing a published finding |
| minor | adding new constructs or findings; splitting a construct (with alias for old id) |
| patch | typo fixes, source-id corrections, additional aliases, metadata updates |
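If you want to mechanize the table, the bump is a pure function of the change kind. A minimal sketch, assuming the version string in pax.yaml is a plain major.minor.patch triple:

def bump(version: str, kind: str) -> str:
    """kind is 'major', 'minor', or 'patch', per the table above."""
    major, minor, patch = (int(x) for x in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# e.g. bump("1.3.0", "minor") -> "1.4.0"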
Deprecations must live in pax.yaml for at least two minor versions before an id is removed. The deprecated field accepts an array of objects with id, deprecated_in, reason, and optionally replaced_by. Any downstream pax that depends on a deprecated id will surface a warning at its next registry submission.
deprecated:
  - id: old_construct_name
    deprecated_in: "1.3.0"
    reason: "Renamed to new_construct_name for registry alignment."
    replaced_by: new_construct_name
Publish a pax through the web uploader. Validation runs against the same registry contract whether you submit by hand today or through the agentic path that's coming later.
zip -r my-pax.zip my-pax-name/
Once accepted, your pax appears at pax-market.com/pax/my-pax-name immediately after deploy.
An MCP-server runtime will let agents author and publish from inside any MCP-aware client. The submission contract is the same as the web path, so anything you author today will work unchanged when the agentic path ships.