Post-Hoc Testing Reference

Overview 

PolyzyMD performs post-hoc pairwise comparisons automatically during `polyzymd compare run`. Two methods are available: BH-corrected t-tests (default) and Tukey’s HSD. Both methods compute Cohen’s d effect sizes and the percent change for every pair.

Available Methods 

Method	`posthoc_method` value	When to use	Assumptions
BH-corrected t-tests	`"ttest_bh"`	Specific pairs of interest or heterogeneous sample sizes. Default.	Independence between pairs; equal variances (Student) or unequal variances allowed (Welch).
Tukey’s HSD	`"tukey_hsd"`	All conditions are equally important; balanced design preferred.	Equal variance; equal (or similar) sample sizes across conditions.

Configuration 

Set post-hoc options in the `defaults:` block of `comparison.yaml`:

defaults:
  posthoc_method: "ttest_bh"   # or "tukey_hsd"
  ttest_method: "student"      # or "welch" (only used when posthoc_method is ttest_bh)
  fdr_alpha: 0.05              # significance threshold (used by both ttest_bh and tukey_hsd)

Field	Type	Default	Description
`posthoc_method`	`"ttest_bh"` or `"tukey_hsd"`	`"ttest_bh"`	Selects which post-hoc procedure to use for pairwise comparisons.
`ttest_method`	`"student"` or `"welch"`	`"student"`	Controls the variance assumption for the two-sample t-test. Only used when `posthoc_method` is `"ttest_bh"`.
`fdr_alpha`	float (0, 1]	`0.05`	Significance threshold for pairwise comparisons and ANOVA. Used as the BH false-discovery-rate threshold when `posthoc_method` is `"ttest_bh"`, and as the family-wise alpha threshold when `posthoc_method` is `"tukey_hsd"`.
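
As an illustration of how these three fields and their defaults fit together, the sketch below merges user-supplied values over the documented defaults and checks the documented value ranges. The names `DOCUMENTED_DEFAULTS` and `resolve_posthoc_defaults` are hypothetical and are not part of PolyzyMD's API; the actual validation logic may differ.

```python
# Hypothetical helper: mirrors the documented fields, not PolyzyMD's real loader.
DOCUMENTED_DEFAULTS = {
    "posthoc_method": "ttest_bh",
    "ttest_method": "student",
    "fdr_alpha": 0.05,
}


def resolve_posthoc_defaults(user_defaults):
    """Overlay user values on the documented defaults and validate them."""
    settings = {**DOCUMENTED_DEFAULTS, **user_defaults}
    if settings["posthoc_method"] not in ("ttest_bh", "tukey_hsd"):
        raise ValueError("posthoc_method must be 'ttest_bh' or 'tukey_hsd'")
    if settings["ttest_method"] not in ("student", "welch"):
        raise ValueError("ttest_method must be 'student' or 'welch'")
    alpha = settings["fdr_alpha"]
    # Documented range is the half-open interval (0, 1].
    if not (0 < alpha <= 1):
        raise ValueError("fdr_alpha must be in (0, 1]")
    return settings


settings = resolve_posthoc_defaults({"posthoc_method": "tukey_hsd"})
```

In this sketch, `ttest_method` is still carried in the resolved settings when `posthoc_method` is `"tukey_hsd"`, but per the table above it would have no effect.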

Method Details 

BH-Corrected t-Tests (`ttest_bh`)

Runs independent two-sample t-tests for each pair of conditions.
`ttest_method: "student"` assumes equal variances (`scipy.stats.ttest_ind` with `equal_var=True`).
`ttest_method: "welch"` relaxes that assumption (`equal_var=False`).
Raw p-values from all pairs across all metrics are collected into a single family, then adjusted via the Benjamini-Hochberg step-up procedure.
A pair is significant when its BH-adjusted p-value satisfies `p_adj <= fdr_alpha`.
Cohen’s d is computed for effect size, with interpretation labels: `"negligible"` (|d| < 0.2), `"small"` (0.2 <= |d| < 0.5), `"medium"` (0.5 <= |d| < 0.8), `"large"` (|d| >= 0.8).
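
The effect-size labels can be sketched in a few lines of Python. This is a minimal illustration, not PolyzyMD's implementation: it assumes the common pooled-standard-deviation form of Cohen’s d (sample variances with `n - 1` in the denominator), and the function names `cohens_d` and `interpret_effect_size` are hypothetical. The sign convention and thresholds follow this page.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d = (mean(a) - mean(b)) / pooled SD (assumed variant)."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        return math.nan  # fewer than 2 replicates: effect size undefined
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    diff = mean(a) - mean(b)
    if pooled_var == 0.0:
        # Identical constant groups give 0.0; treating a nonzero difference
        # with zero spread as NaN is an assumption of this sketch.
        return 0.0 if diff == 0.0 else math.nan
    return diff / math.sqrt(pooled_var)


def interpret_effect_size(d):
    """Map |d| to the labels documented on this page."""
    if math.isnan(d):
        raise ValueError("effect size is NaN")
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


# Illustrative values only (not real simulation data).
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
label = interpret_effect_size(d)
```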

Note

When a control label is set in `comparison.yaml`, only control-vs-treatment pairs are tested. Otherwise, all unique pairs are tested.
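
The procedure described in this section (pair selection, per-pair t-tests, then Benjamini-Hochberg adjustment) can be sketched as follows. This is a simplified illustration under stated assumptions: it handles a single metric, whereas PolyzyMD pools p-values from all metrics into one family; it omits NaN handling; and `bh_adjust` and `pairwise_ttests` are hypothetical names rather than PolyzyMD functions. Only `scipy.stats.ttest_ind` is taken from this page.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def bh_adjust(p_values):
    """Benjamini-Hochberg step-up adjustment for one family of p-values."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return p
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def pairwise_ttests(groups, control=None, ttest_method="student", fdr_alpha=0.05):
    """Test control-vs-treatment pairs if a control is given, else all pairs."""
    labels = list(groups)
    if control is not None:
        pairs = [(control, label) for label in labels if label != control]
    else:
        pairs = list(combinations(labels, 2))
    equal_var = ttest_method == "student"  # "welch" sets equal_var=False
    raw = [
        stats.ttest_ind(groups[a], groups[b], equal_var=equal_var).pvalue
        for a, b in pairs
    ]
    adjusted = bh_adjust(raw)
    return [
        {
            "condition_a": a,
            "condition_b": b,
            "p_value": float(p),
            "p_value_adjusted": float(q),
            "significant": bool(q <= fdr_alpha),
        }
        for (a, b), p, q in zip(pairs, raw, adjusted)
    ]


# Illustrative replicate values only (not real simulation data).
groups = {
    "control": [1.0, 1.2, 0.9],
    "polymer_a": [1.1, 1.3, 1.0],
    "polymer_b": [2.0, 2.2, 1.9],
}
control_rows = pairwise_ttests(groups, control="control")
all_rows = pairwise_ttests(groups, ttest_method="welch")
```

With a control label, the three conditions yield two pairs; without one, they yield all three unique pairs.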

Tukey’s HSD (`tukey_hsd`)

Tests all pairs simultaneously using `scipy.stats.tukey_hsd`.
Controls the family-wise error rate (FWER) rather than FDR.
A pair is significant when the Tukey-adjusted `p_value <= fdr_alpha`.
`p_value_adjusted` mirrors `p_value` for Tukey results (Tukey p-values are already family-wise corrected).
Best with balanced designs (equal replicates per condition).
Cohen’s d is still computed for each pair.
The `t_statistic` field is set to `NaN` for Tukey results since the test does not produce a t-statistic.
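
As a rough sketch of how the Tukey results map onto the pairwise fields, the example below calls `scipy.stats.tukey_hsd` and reads the pairwise p-value matrix from its result. The wrapper name `tukey_pairs` and the row layout are assumptions for illustration, not PolyzyMD's code; the guard for too few groups or observations follows the Edge Cases table on this page.

```python
import math
from itertools import combinations

from scipy import stats


def tukey_pairs(groups, fdr_alpha=0.05):
    """Build one row per unique pair from a single Tukey HSD run."""
    labels = list(groups)
    # Documented edge case: too few groups or observations gives no pairs.
    if len(labels) < 2 or any(len(groups[k]) < 2 for k in labels):
        return []
    result = stats.tukey_hsd(*(groups[k] for k in labels))
    rows = []
    for i, j in combinations(range(len(labels)), 2):
        p = float(result.pvalue[i, j])  # already family-wise corrected
        rows.append(
            {
                "condition_a": labels[i],
                "condition_b": labels[j],
                "t_statistic": math.nan,  # Tukey HSD yields no t-statistic
                "p_value": p,
                "p_value_adjusted": p,  # mirrors p_value
                "posthoc_method": "tukey_hsd",
                "significant": p <= fdr_alpha,
            }
        )
    return rows


# Illustrative replicate values only (not real simulation data).
rows = tukey_pairs(
    {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [10.0, 11.0, 12.0]}
)
```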

ANOVA 

A one-way ANOVA (`scipy.stats.f_oneway`) is run alongside the post-hoc tests whenever there are 3 or more conditions. It tests whether at least one condition’s mean differs from the others.

Field	Type	Description
`f_statistic`	float	F-statistic from the one-way ANOVA.
`p_value`	float	P-value for the omnibus test.
`significant`	bool	Whether `p_value <= fdr_alpha` (uses the configured `fdr_alpha` threshold).

ANOVA does not determine which pairs differ; that is the role of the post-hoc tests. ANOVA is skipped when fewer than 3 conditions are present, and returns `NaN` statistics if any group has fewer than 2 observations.
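
The omnibus test and its skip rules can be sketched as below. The wrapper `run_anova` is a hypothetical name, and returning `None` for a skipped test and `significant = False` for `NaN` statistics are assumptions of this sketch; the call to `scipy.stats.f_oneway` and the field names come from this page.

```python
import math

from scipy import stats


def run_anova(groups, fdr_alpha=0.05):
    """One-way ANOVA across all conditions, following the documented rules."""
    samples = list(groups.values())
    if len(samples) < 3:
        return None  # skipped: requires 3 or more conditions
    if any(len(s) < 2 for s in samples):
        # Documented behavior: NaN statistics when a group is too small.
        return {"f_statistic": math.nan, "p_value": math.nan, "significant": False}
    f_stat, p_value = stats.f_oneway(*samples)
    return {
        "f_statistic": float(f_stat),
        "p_value": float(p_value),
        "significant": bool(p_value <= fdr_alpha),
    }


# Illustrative replicate values only (not real simulation data).
anova = run_anova(
    {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [10.0, 11.0, 12.0]}
)
```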

Output Fields 

These fields appear in comparison JSON files and are used by the CLI formatter.

Pairwise result fields 

Field	Type	Description
`condition_a`	str	Label of first condition (typically control).
`condition_b`	str	Label of second condition (typically treatment).
`metric`	str	Name of the metric being compared.
`t_statistic`	float	T-test statistic. `NaN` for Tukey HSD results.
`p_value`	float	P-value from the pairwise test: the raw, unadjusted value for `ttest_bh`; the Tukey-adjusted value for `tukey_hsd`.
`p_value_adjusted`	float or null	Adjusted p-value (BH-adjusted for `ttest_bh`; mirrors `p_value` for `tukey_hsd` since Tukey p-values are already family-wise corrected). `null` when not available.
`posthoc_method`	str	`"ttest_bh"` or `"tukey_hsd"`.
`cohens_d`	float	Effect size (positive = `condition_a` mean > `condition_b` mean).
`effect_size_interpretation`	str	`"negligible"`, `"small"`, `"medium"`, or `"large"`.
`direction`	str	Interpreted direction of change (e.g. `"higher"`, `"lower"`, or `"undetermined"`).
`significant`	bool	Whether `p_adj <= fdr_alpha`, where `p_adj` is `p_value_adjusted` when available and `p_value` otherwise.
`percent_change`	float	Percent change from `condition_a` to `condition_b`.

Comparison-level fields 

Field	Type	Description
`fdr_alpha`	float	The alpha threshold used for significance (BH FDR for `ttest_bh`; FWER for `tukey_hsd`). Also used as the ANOVA significance threshold.
`ttest_method`	str	`"student"` or `"welch"`.
`posthoc_method`	str	`"ttest_bh"` or `"tukey_hsd"`.

CLI Significance Markers 

The CLI formatter annotates pairwise rows with significance markers:

Marker	Meaning
`*`	`p_adj <= fdr_alpha` (default 0.05)
`**`	`p_adj <= 0.01`
`***`	`p_adj <= 0.001`

Note

The contacts formatter uses the multi-level markers (`**`, `***`). The default scalar formatter uses a single `*` for any significant result.

Some plugins also use:

Marker	Meaning
`†` (dagger)	Cohen’s d meets the `min_effect_size` threshold (practical significance). Currently used by the contacts plugin.
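
The marker rules above can be sketched as two small functions. Both names (`significance_marker`, `effect_marker`) and the `multi_level` flag are hypothetical and only illustrate the documented thresholds; comparing the absolute value of Cohen’s d against `min_effect_size` is an assumption, as is returning an empty string for missing or `NaN` p-values.

```python
import math


def significance_marker(p_adj, fdr_alpha=0.05, multi_level=False):
    """Return '', '*', '**', or '***' for an adjusted p-value."""
    if p_adj is None or math.isnan(p_adj) or p_adj > fdr_alpha:
        return ""
    if multi_level:  # contacts-style formatter
        if p_adj <= 0.001:
            return "***"
        if p_adj <= 0.01:
            return "**"
    return "*"  # scalar formatter: one marker for any significant result


def effect_marker(cohens_d, min_effect_size):
    """Return the dagger when |d| meets the practical-significance threshold."""
    if math.isnan(cohens_d):
        return ""
    return "\u2020" if abs(cohens_d) >= min_effect_size else ""


# Illustrative values only.
scalar = significance_marker(0.0005)
contacts = significance_marker(0.0005, multi_level=True)
```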

Edge Cases 

Scenario	Behavior
Fewer than 2 replicates in a group	t-test returns `NaN` for t-statistic and p-value; Cohen’s d returns `NaN`.
Equal values across all replicates in both groups	p-value = 1.0, Cohen’s d = 0.0 (`"negligible"`).
Single condition	No pairwise tests are generated; ANOVA is skipped.
Two conditions	Pairwise tests run normally; ANOVA is skipped (requires >= 3 conditions).
Tukey HSD with fewer than 2 groups or fewer than 2 observations per group	Returns empty results (no pairs generated).
Zero control mean	Percent change returns `inf` or `-inf`; `NaN` if both means are non-finite.