Macro-averaged F1 score across the three sentiment classes: the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its prevalence.
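
To make the definition concrete, here is a minimal sketch of how such a score could be computed. The label names (`negative`, `neutral`, `positive`), the helper `macro_f1`, and the toy data are assumptions for illustration and do not come from the source. Scoring a class with no true or predicted instances as 0 is also an assumed convention.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             labels: Sequence[Hashable]) -> float:
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); assumed convention: 0.0 when undefined.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)

    # Every class contributes 1/len(labels), however rare it is in y_true.
    return sum(per_class) / len(per_class)


# Hypothetical label set and toy, imbalanced data.
LABELS = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "positive", "positive", "neutral", "negative"]
y_pred = ["positive", "positive", "positive", "positive", "positive", "negative"]

score = macro_f1(y_true, y_pred, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro-F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

In this toy data the single missed `neutral` example pulls the macro average well below accuracy, because the rare class carries the same weight as the frequent one.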