
Reproduce BeigeSage from Scratch (Train + Evaluate)

step_count: 8 · runtime: ~6 hours on Colab T4 (39 min MLM pre-training + 5 h 14 min fine-tuning + 1.6 min inference)

End-to-end reproduction of the BeigeSage model: pull the base RoBERTa weights from HuggingFace, continue MLM pre-training on the 1970-2023 Beige Book corpus, run supervised fine-tuning on the 800-chunk labelled training set, and evaluate on the 200-chunk held-out test set. Recommended environment: Google Colab with a T4 or better GPU (matches the authors' setup).

// pipeline
8 steps · DAG
01

Pull replication data from OSF

action download
config (2 keys)
{
  "targets": [
    "./data/beige_book_corpus_1970_2023.txt",
    "./data/labelled_chunks_1000.csv"
  ],
  "url": "https://osf.io/xq35t/"
}
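// sketch
A minimal Python sketch of this step, not taken from the pax archive. OSF serves files at https://osf.io/<file-id>/download, but the per-file IDs are not listed here, so the two IDs below are placeholders -- copy them from the file listing at https://osf.io/xq35t/.

import os
import requests

# placeholder OSF file IDs -> local targets from the step config
FILES = {
    "<corpus-file-id>": "./data/beige_book_corpus_1970_2023.txt",
    "<labels-file-id>": "./data/labelled_chunks_1000.csv",
}

os.makedirs("./data", exist_ok=True)
for file_id, target in FILES.items():
    resp = requests.get(f"https://osf.io/{file_id}/download", timeout=120)
    resp.raise_for_status()
    with open(target, "wb") as f:
        f.write(resp.content)
    print(f"saved {target} ({len(resp.content):,} bytes)")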
02

Tokenize Beige Book corpus into 256-token chunks

action prepare_corpus · needs download_replication_archive
config (4 keys)
{
  "max_length": 256,
  "output": "./data/bb_chunks_256.arrow",
  "stride": 0,
  "tokenizer": "FacebookAI/roberta-large"
}
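// sketch
A sketch of the chunking step with the HuggingFace `datasets` and `transformers` libraries; it assumes the corpus file loads line-by-line (the archive's yaml may segment documents differently). `return_overflowing_tokens` makes the tokenizer emit every 256-token window instead of only the first.

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-large")
raw = load_dataset("text", data_files="./data/beige_book_corpus_1970_2023.txt")["train"]

def chunk(batch):
    enc = tokenizer(
        batch["text"],
        max_length=256,
        truncation=True,
        stride=0,  # no overlap between consecutive windows
        return_overflowing_tokens=True,
    )
    return {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}

# the batched map can return more rows than it receives, which is why the
# original text column has to be dropped
chunks = raw.map(chunk, batched=True, remove_columns=["text"])
chunks.save_to_disk("./data/bb_chunks_256.arrow")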
03

MLM continued pre-training on Beige Book corpus

action train · needs prepare_corpus
config (6 keys)
{
  "base_model": "FacebookAI/roberta-large",
  "expected_runtime_minutes": 39,
  "mlm_probability": 0.15,
  "objective": "masked_language_modeling",
  "training_args": {
    "learning_rate": "5e-5",
    "num_train_epochs": 1,
    "output_dir": "./models/beigesage-mlm",
    "per_device_train_batch_size": 8,
    "save_strategy": "epoch"
  },
  "training_corpus": "./data/bb_chunks_256.arrow"
}
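// sketch
The step 03 config maps almost one-to-one onto the standard `Trainer` MLM recipe; a sketch under that assumption. `DataCollatorForLanguageModeling` applies the 15% dynamic masking at batch time.

from datasets import load_from_disk
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-large")
model = AutoModelForMaskedLM.from_pretrained("FacebookAI/roberta-large")
chunks = load_from_disk("./data/bb_chunks_256.arrow")

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="./models/beigesage-mlm",
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    save_strategy="epoch",
)
Trainer(model=model, args=args, train_dataset=chunks, data_collator=collator).train()
tokenizer.save_pretrained("./models/beigesage-mlm")  # keep the tokenizer beside the weights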
04

Bin scores and produce 80/20 train/test split

action prepare_labelled_split · needs download_replication_archive
config (4 keys)
{
  "input": "./data/labelled_chunks_1000.csv",
  "outputs": {
    "test": "./data/test_200.csv",
    "train": "./data/train_800.csv"
  },
  "score_to_label_rule": {
    "mixed": "-0.2 \u003c= score \u003c= 0.2",
    "negative": "score \u003c -0.2",
    "positive": "score \u003e 0.2"
  },
  "train_test_split": {
    "random_seed": 42,
    "stratify_by": "label",
    "test_n": 200,
    "train_n": 800
  }
}
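// sketch
Binning and splitting in pandas + scikit-learn, matching the thresholds, split sizes, seed, and stratification in the config. The `score` column name is an assumption about the CSV layout.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("./data/labelled_chunks_1000.csv")

def to_label(score: float) -> str:
    if score < -0.2:
        return "negative"
    if score > 0.2:
        return "positive"
    return "mixed"  # -0.2 <= score <= 0.2

df["label"] = df["score"].apply(to_label)
train, test = train_test_split(df, test_size=200, stratify=df["label"], random_state=42)
train.to_csv("./data/train_800.csv", index=False)
test.to_csv("./data/test_200.csv", index=False)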
05

Supervised fine-tuning on 800 labelled chunks

action supervised_fine_tuning · needs train, prepare_labelled_split
config (7 keys)
{
  "base_model": "./models/beigesage-mlm",
  "expected_runtime_minutes": 314,
  "label_map": {
    "mixed": 1,
    "negative": 0,
    "positive": 2
  },
  "num_labels": 3,
  "task": "sequence_classification",
  "training_args": {
    "evaluation_strategy": "epoch",
    "learning_rate": "2e-5",
    "load_best_model_at_end": true,
    "metric_for_best_model": "f1_macro",
    "num_train_epochs": 3,
    "output_dir": "./models/beigesage",
    "per_device_train_batch_size": 4,
    "save_strategy": "epoch",
    "weight_decay": 0.01
  },
  "training_data": "./data/train_800.csv"
}
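// sketch
Fine-tuning sketch for step 05. The config does not say which split drives the epoch-level evaluation and best-model selection; the sketch uses the held-out test CSV for simplicity, but a validation split carved from the 800 training chunks would avoid leaking the test set into model selection. The `text` column name is an assumption.

import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABEL2ID = {"negative": 0, "mixed": 1, "positive": 2}
tokenizer = AutoTokenizer.from_pretrained("./models/beigesage-mlm")
model = AutoModelForSequenceClassification.from_pretrained(
    "./models/beigesage-mlm", num_labels=3,
    label2id=LABEL2ID, id2label={v: k for k, v in LABEL2ID.items()},
)

data = load_dataset("csv", data_files={"train": "./data/train_800.csv",
                                       "eval": "./data/test_200.csv"})

def preprocess(batch):
    enc = tokenizer(batch["text"], max_length=256, truncation=True)
    enc["labels"] = [LABEL2ID[l] for l in batch["label"]]
    return enc

data = data.map(preprocess, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"f1_macro": f1_score(labels, np.argmax(logits, axis=-1), average="macro")}

args = TrainingArguments(
    output_dir="./models/beigesage",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # eval_strategy on newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
)
Trainer(model=model, args=args,
        train_dataset=data["train"], eval_dataset=data["eval"],
        tokenizer=tokenizer, compute_metrics=compute_metrics).train()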
06

Classify 200-chunk test set

action classify · needs supervised_fine_tuning
config (5 keys)
{
  "expected_runtime_seconds": 95,
  "input": "./data/test_200.csv",
  "max_length": 256,
  "model": "./models/beigesage",
  "output": "./outputs/beigesage_predictions.csv"
}
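// sketch
Batched inference over the 200 test chunks, writing the two columns step 07 expects (`human_label`, `beigesage_prediction`). The `text` and `label` column names are assumptions about the CSV layout.

import os
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("./models/beigesage")
model = AutoModelForSequenceClassification.from_pretrained("./models/beigesage")
model.to(device).eval()

df = pd.read_csv("./data/test_200.csv")
preds = []
with torch.no_grad():
    for i in range(0, len(df), 16):  # batches of 16
        enc = tokenizer(df["text"].iloc[i:i + 16].tolist(),
                        max_length=256, truncation=True, padding=True,
                        return_tensors="pt").to(device)
        ids = model(**enc).logits.argmax(dim=-1).tolist()
        preds += [model.config.id2label[j] for j in ids]

os.makedirs("./outputs", exist_ok=True)
pd.DataFrame({"human_label": df["label"], "beigesage_prediction": preds}) \
  .to_csv("./outputs/beigesage_predictions.csv", index=False)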
07

Compute accuracy, macro F1, MCC and confusion matrix

engine classification_metrics · needs evaluate_test_set
config (3 keys)
{
  "gold_field": "human_label",
  "metrics": [
    "accuracy",
    "macro_f1",
    "mcc",
    "confusion_matrix",
    "per_class_precision_recall_f1"
  ],
  "pred_field": "beigesage_prediction"
}
expected results (4 keys)
{
  "accuracy": 0.71,
  "macro_f1": 0.71,
  "mcc": 0.55,
  "per_class_recall": {
    "mixed": 0.65,
    "negative": 0.64,
    "positive": 0.82
  }
}
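// sketch
The classification_metrics engine ships with the pax runtime; scikit-learn reproduces the same numbers directly from the predictions CSV.

import pandas as pd
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, matthews_corrcoef)

df = pd.read_csv("./outputs/beigesage_predictions.csv")
gold, pred = df["human_label"], df["beigesage_prediction"]
labels = ["negative", "mixed", "positive"]

print("accuracy:", accuracy_score(gold, pred))
print("macro_f1:", f1_score(gold, pred, average="macro"))
print("mcc     :", matthews_corrcoef(gold, pred))
print(confusion_matrix(gold, pred, labels=labels))
print(classification_report(gold, pred, labels=labels))  # per-class P/R/F1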
08

(Optional) Push trained model to HuggingFace Hub

action push_to_hub · needs report_metrics
config (3 keys)
{
  "include": [
    "model weights",
    "tokenizer",
    "label_map",
    "training README"
  ],
  "private": false,
  "repo_id": "\u003cyour-username\u003e/beigesage"
}
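// sketch
The built-in push_to_hub helpers cover the weights, config (including the label map), and tokenizer; the training README from the include list would be uploaded separately as a model card. Requires a prior `huggingface-cli login` or an HF_TOKEN env var.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

REPO_ID = "<your-username>/beigesage"  # placeholder from the step config

model = AutoModelForSequenceClassification.from_pretrained("./models/beigesage")
tokenizer = AutoTokenizer.from_pretrained("./models/beigesage")
model.push_to_hub(REPO_ID, private=False)  # weights + config.json (id2label/label2id)
tokenizer.push_to_hub(REPO_ID, private=False)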
// from pax
// engines
engine.classification_metrics
// note
step bodies extracted from the .pax archive at build time. download the parent pax for the full yaml.
[ download smith-lambert-2026-beigesage.pax.tar.gz ]