Post-Hoc Testing Reference

Overview

PolyzyMD performs post-hoc pairwise comparisons automatically during polyzymd compare run. Two methods are available: BH-corrected t-tests (default) and Tukey’s HSD. Both methods compute Cohen’s d effect sizes and percent-change for every pair.


Available Methods

| Method | posthoc_method value | When to use | Assumptions |
| --- | --- | --- | --- |
| BH-corrected t-tests | "ttest_bh" | Specific pairs of interest or heterogeneous sample sizes. Default. | Independence between pairs; equal variance (Student) or relaxed (Welch). |
| Tukey’s HSD | "tukey_hsd" | All conditions are equally important; balanced design preferred. | Equal variance; equal (or similar) sample sizes across conditions. |


Configuration

Set post-hoc options in the defaults: block of comparison.yaml:

defaults:
  posthoc_method: "ttest_bh"   # or "tukey_hsd"
  ttest_method: "student"      # or "welch" (only used when posthoc_method is ttest_bh)
  fdr_alpha: 0.05              # significance threshold (used by both ttest_bh and tukey_hsd)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| posthoc_method | "ttest_bh" or "tukey_hsd" | "ttest_bh" | Selects which post-hoc procedure to use for pairwise comparisons. |
| ttest_method | "student" or "welch" | "student" | Controls the variance assumption for the two-sample t-test. Only used when posthoc_method is "ttest_bh". |
| fdr_alpha | float in (0, 1] | 0.05 | Significance threshold for pairwise comparisons and ANOVA. Used as the BH false-discovery-rate threshold when posthoc_method is "ttest_bh", and as the family-wise alpha threshold when posthoc_method is "tukey_hsd". |
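
The defaults and allowed values above can be checked with a few lines of Python. This is a minimal sketch, not PolyzyMD's own validation code: the function name resolve_posthoc_defaults is hypothetical, and it assumes the defaults: block has already been parsed into a plain dict.

```python
# Hypothetical helper: not part of PolyzyMD. Mirrors the documented
# field names, defaults, and allowed values.
ALLOWED_POSTHOC = {"ttest_bh", "tukey_hsd"}
ALLOWED_TTEST = {"student", "welch"}


def resolve_posthoc_defaults(defaults: dict) -> dict:
    """Fill in the documented defaults and reject out-of-range values."""
    resolved = {
        "posthoc_method": defaults.get("posthoc_method", "ttest_bh"),
        "ttest_method": defaults.get("ttest_method", "student"),
        "fdr_alpha": float(defaults.get("fdr_alpha", 0.05)),
    }
    if resolved["posthoc_method"] not in ALLOWED_POSTHOC:
        raise ValueError(f"unknown posthoc_method: {resolved['posthoc_method']!r}")
    if resolved["ttest_method"] not in ALLOWED_TTEST:
        raise ValueError(f"unknown ttest_method: {resolved['ttest_method']!r}")
    # fdr_alpha must lie in the half-open interval (0, 1].
    if not (0.0 < resolved["fdr_alpha"] <= 1.0):
        raise ValueError("fdr_alpha must be in (0, 1]")
    return resolved
```

An empty dict resolves to the documented defaults, and ttest_method is carried along even when posthoc_method is "tukey_hsd", where it has no effect.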


Method Details

BH-Corrected t-Tests (ttest_bh)

  • Runs independent two-sample t-tests for each pair of conditions.

  • ttest_method: "student" assumes equal variances (scipy.stats.ttest_ind with equal_var=True).

  • ttest_method: "welch" relaxes that assumption (equal_var=False).

  • Raw p-values from all pairs across all metrics are collected into a single family, then adjusted via the Benjamini-Hochberg step-up procedure.

  • A pair is significant when p_adj <= fdr_alpha.

  • Cohen’s d is computed for effect size, with interpretation labels: "negligible" (|d| < 0.2), "small" (0.2 <= |d| < 0.5), "medium" (0.5 <= |d| < 0.8), "large" (|d| >= 0.8).
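
The t-test and Benjamini-Hochberg steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than PolyzyMD's implementation: bh_adjust and pairwise_ttests_bh are hypothetical names, the family here covers the pairs of a single metric (the real procedure pools p-values across all metrics), and NaN p-values are not handled.

```python
import numpy as np
from scipy import stats


def bh_adjust(p_values):
    """Benjamini-Hochberg step-up adjusted p-values, in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def pairwise_ttests_bh(groups, pairs, ttest_method="student", fdr_alpha=0.05):
    """Run one t-test per pair, then BH-adjust the collected p-values.

    groups maps condition label -> replicate values (hypothetical layout).
    """
    equal_var = ttest_method == "student"  # "welch" relaxes equal variance
    raw = [
        stats.ttest_ind(groups[a], groups[b], equal_var=equal_var)
        for a, b in pairs
    ]
    p_adj = bh_adjust([r.pvalue for r in raw])
    return [
        {
            "condition_a": a,
            "condition_b": b,
            "t_statistic": float(r.statistic),
            "p_value": float(r.pvalue),
            "p_value_adjusted": float(q),
            "significant": bool(q <= fdr_alpha),
        }
        for (a, b), r, q in zip(pairs, raw, p_adj)
    ]
```

Because the adjustment depends on the size of the family, adding or removing pairs (or metrics) changes every adjusted p-value, not just the new ones.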

Note

When a control label is set in comparison.yaml, only control-vs-treatment pairs are tested. Otherwise, all unique pairs are tested.
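
The pair-selection rule in this note could look roughly like the sketch below. The function name select_pairs is hypothetical, and raising on an unknown control label is an assumption rather than documented behavior.

```python
from itertools import combinations


def select_pairs(labels, control=None):
    """Return the (condition_a, condition_b) pairs to test.

    With a control label, only control-vs-treatment pairs are returned;
    otherwise every unique pair is returned.
    """
    if control is not None:
        if control not in labels:
            # Assumption: an unknown control label is treated as an error.
            raise ValueError(f"control {control!r} is not a known condition")
        return [(control, label) for label in labels if label != control]
    return list(combinations(labels, 2))
```

For k conditions this gives k - 1 tests with a control and k(k - 1)/2 tests without one, which affects how strongly the BH correction shrinks significance.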

Tukey’s HSD (tukey_hsd)

  • Tests all pairs simultaneously using scipy.stats.tukey_hsd.

  • Controls the family-wise error rate (FWER) rather than FDR.

  • A pair is significant when the Tukey-adjusted p_value <= fdr_alpha.

  • p_value_adjusted mirrors p_value for Tukey results (Tukey p-values are already family-wise corrected).

  • Best with balanced designs (equal replicates per condition).

  • Cohen’s d is still computed for each pair.

  • The t_statistic field is set to NaN for Tukey results since the test does not produce a t-statistic.
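
As a rough illustration of the Tukey path, the sketch below calls scipy.stats.tukey_hsd and fills the pairwise fields the way the list above describes. The function name tukey_pairs and the dict layout are hypothetical; only the field semantics come from this page.

```python
import math
from itertools import combinations

from scipy import stats


def tukey_pairs(groups, fdr_alpha=0.05):
    """Tukey HSD over all pairs; groups maps label -> replicate values."""
    labels = list(groups)
    # Fewer than 2 groups, or any group with fewer than 2 observations,
    # yields no pairs.
    if len(labels) < 2 or any(len(groups[k]) < 2 for k in labels):
        return []
    result = stats.tukey_hsd(*(groups[k] for k in labels))
    rows = []
    for i, j in combinations(range(len(labels)), 2):
        p = float(result.pvalue[i, j])  # already family-wise corrected
        rows.append(
            {
                "condition_a": labels[i],
                "condition_b": labels[j],
                "t_statistic": math.nan,  # Tukey produces no t-statistic
                "p_value": p,
                "p_value_adjusted": p,  # mirrors p_value
                "significant": p <= fdr_alpha,
                "posthoc_method": "tukey_hsd",
            }
        )
    return rows
```

Unlike the BH path, no second correction step is applied: the threshold fdr_alpha is compared directly against the Tukey p-value.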


ANOVA

A one-way ANOVA (scipy.stats.f_oneway) is run alongside the post-hoc tests whenever there are 3 or more conditions. It tests whether at least one condition’s mean differs from the others.

| Field | Type | Description |
| --- | --- | --- |
| f_statistic | float | F-statistic from the one-way ANOVA. |
| p_value | float | P-value for the omnibus test. |
| significant | bool | Whether p_value <= fdr_alpha (uses the configured fdr_alpha threshold). |

ANOVA does not determine which pairs differ; that is the role of the post-hoc tests. ANOVA is skipped when fewer than 3 conditions are present, and returns NaN statistics if any group has fewer than 2 observations.
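
The omnibus test and its skip rules can be sketched as below. The function name omnibus_anova is hypothetical, and two details are assumptions: a skipped ANOVA is represented as None, and the NaN case reports significant as False.

```python
import math

from scipy import stats


def omnibus_anova(groups, fdr_alpha=0.05):
    """One-way ANOVA over a list of replicate lists, one per condition."""
    if len(groups) < 3:
        return None  # assumption: "skipped" is represented as None
    if any(len(g) < 2 for g in groups):
        # Any group with fewer than 2 observations gives NaN statistics.
        return {"f_statistic": math.nan, "p_value": math.nan, "significant": False}
    f_stat, p_value = stats.f_oneway(*groups)
    return {
        "f_statistic": float(f_stat),
        "p_value": float(p_value),
        "significant": bool(p_value <= fdr_alpha),
    }
```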


Output Fields

These fields appear in comparison JSON files and are used by the CLI formatter.

Pairwise result fields

| Field | Type | Description |
| --- | --- | --- |
| condition_a | str | Label of first condition (typically control). |
| condition_b | str | Label of second condition (typically treatment). |
| metric | str | Name of the metric being compared. |
| t_statistic | float | T-test statistic. NaN for Tukey HSD results. |
| p_value | float | P-value from the pairwise test (raw for ttest_bh; Tukey-adjusted for tukey_hsd). |
| p_value_adjusted | float or null | Adjusted p-value (BH-adjusted for ttest_bh; mirrors p_value for tukey_hsd since Tukey p-values are already family-wise corrected). null when not available. |
| posthoc_method | str | "ttest_bh" or "tukey_hsd". |
| cohens_d | float | Effect size (positive = condition_a mean > condition_b mean). |
| effect_size_interpretation | str | "negligible", "small", "medium", or "large". |
| direction | str | Interpreted direction of change (e.g. "higher", "lower", or "undetermined"). |
| significant | bool | Whether p_adj <= fdr_alpha (uses the adjusted p-value when available, the raw p-value otherwise). |
| percent_change | float | Percent change from condition_a to condition_b. |
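
To make the cohens_d sign convention and the effect_size_interpretation labels concrete, here is a minimal sketch. It assumes the pooled-standard-deviation form of Cohen's d, which this page does not spell out, and both function names are hypothetical. The handling of zero pooled variance with differing means (signed infinity) is also an assumption.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with pooled SD; positive when mean(a) > mean(b)."""
    if len(a) < 2 or len(b) < 2:
        return math.nan  # fewer than 2 replicates in a group
    diff = mean(a) - mean(b)
    pooled_var = (
        (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    ) / (len(a) + len(b) - 2)
    if pooled_var == 0:
        # Identical values everywhere give 0.0; otherwise (assumption)
        # return a signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var)


def interpret_effect_size(d):
    """Map |d| to the documented labels."""
    if math.isnan(d):
        # The source does not specify a label for NaN, so none is guessed.
        raise ValueError("no label defined for NaN effect size")
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"
```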

Comparison-level fields

| Field | Type | Description |
| --- | --- | --- |
| fdr_alpha | float | The alpha threshold used for significance (BH FDR for ttest_bh; FWER for tukey_hsd). Also used as the ANOVA significance threshold. |
| ttest_method | str | "student" or "welch". |
| posthoc_method | str | "ttest_bh" or "tukey_hsd". |


CLI Significance Markers

The CLI formatter annotates pairwise rows with significance markers:

| Marker | Meaning |
| --- | --- |
| * | p_adj <= fdr_alpha (default 0.05) |
| ** | p_adj <= 0.01 |
| *** | p_adj <= 0.001 |

Note

The contacts formatter uses multi-level markers (**, ***). The default scalar formatter uses a single * for any significant result.
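
The marker rules can be expressed as a small function. This is a sketch, not the CLI formatter's code: significance_marker and its multi_level flag are hypothetical names, with multi_level=True standing in for the contacts formatter and multi_level=False for the default scalar formatter. It also assumes that a result must first pass fdr_alpha before any marker is shown.

```python
def significance_marker(p_adj, fdr_alpha=0.05, multi_level=False):
    """Return "", "*", "**", or "***" for an adjusted p-value."""
    # None or NaN (NaN != NaN) never earns a marker.
    if p_adj is None or p_adj != p_adj or p_adj > fdr_alpha:
        return ""
    if multi_level:
        if p_adj <= 0.001:
            return "***"
        if p_adj <= 0.01:
            return "**"
    return "*"
```

With multi_level=False, a p_adj of 0.0005 and a p_adj of 0.04 both produce a single *, so the marker alone does not convey how far below the threshold a result falls.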

Some plugins also use:

| Marker | Meaning |
| --- | --- |
| (dagger) | Cohen’s d meets the min_effect_size threshold (practical significance). Currently used by the contacts plugin. |


Edge Cases

| Scenario | Behavior |
| --- | --- |
| Fewer than 2 replicates in a group | t-test returns NaN for t-statistic and p-value; Cohen’s d returns NaN. |
| Equal values across all replicates in both groups | p-value = 1.0, Cohen’s d = 0.0 ("negligible"). |
| Single condition | No pairwise tests are generated; ANOVA is skipped. |
| Two conditions | Pairwise tests run normally; ANOVA is skipped (requires >= 3 conditions). |
| Tukey HSD with fewer than 2 groups or fewer than 2 observations per group | Returns empty results (no pairs generated). |
| Zero control mean | Percent change returns inf or -inf; NaN if both means are non-finite. |
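
The zero-control-mean case is easiest to see in code. The sketch below is an assumption-heavy illustration: the formula (b - a) / a * 100 is inferred from "percent change from condition_a to condition_b", the function name percent_change is hypothetical, and returning NaN for any non-finite input or for a zero-to-zero comparison goes beyond what this page documents.

```python
import math


def percent_change(mean_a, mean_b):
    """Percent change from condition_a (control) to condition_b."""
    # Assumption: any non-finite input yields NaN (the documented case is
    # both means being non-finite).
    if not (math.isfinite(mean_a) and math.isfinite(mean_b)):
        return math.nan
    if mean_a == 0:
        if mean_b == 0:
            return math.nan  # assumption: 0 -> 0 is left undefined
        # Zero control mean: signed infinity instead of a division error.
        return math.copysign(math.inf, mean_b)
    return (mean_b - mean_a) / mean_a * 100.0
```

Downstream code that aggregates or plots percent_change should filter with math.isfinite first, since inf values will otherwise dominate any mean or axis range.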


See Also