# Post-Hoc Testing Reference

```{contents}
:local:
:depth: 2
```

## Overview

PolyzyMD performs post-hoc pairwise comparisons automatically during `polyzymd compare run`. Two methods are available: BH-corrected t-tests (default) and Tukey's HSD. Both methods compute Cohen's d effect sizes and percent-change for every pair.

---

## Available Methods

| Method | `posthoc_method` value | When to use | Assumptions |
|--------|------------------------|-------------|-------------|
| BH-corrected t-tests | `"ttest_bh"` | Specific pairs of interest or heterogeneous sample sizes. **Default.** | Independence between pairs; equal variance (Student) or relaxed (Welch). |
| Tukey's HSD | `"tukey_hsd"` | All conditions are equally important; balanced design preferred. | Equal variance; equal (or similar) sample sizes across conditions. |

---

## Configuration

Set post-hoc options in the `defaults:` block of `comparison.yaml`:

```yaml
defaults:
  posthoc_method: "ttest_bh"   # or "tukey_hsd"
  ttest_method: "student"      # or "welch" (only used when posthoc_method is ttest_bh)
  fdr_alpha: 0.05              # significance threshold (used by both ttest_bh and tukey_hsd)
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `posthoc_method` | `"ttest_bh"` or `"tukey_hsd"` | `"ttest_bh"` | Selects which post-hoc procedure to use for pairwise comparisons. |
| `ttest_method` | `"student"` or `"welch"` | `"student"` | Controls the variance assumption for the two-sample t-test. Only used when `posthoc_method` is `"ttest_bh"`. |
| `fdr_alpha` | float (0, 1] | `0.05` | Significance threshold for pairwise comparisons and ANOVA. Used as the BH false-discovery-rate threshold when `posthoc_method` is `"ttest_bh"`, and as the family-wise alpha threshold when `posthoc_method` is `"tukey_hsd"`. |

---

## Method Details

### BH-Corrected t-Tests (`ttest_bh`)

- Runs independent two-sample t-tests for each pair of conditions.
- `ttest_method: "student"` assumes equal variances (`scipy.stats.ttest_ind` with `equal_var=True`).
- `ttest_method: "welch"` relaxes that assumption (`equal_var=False`).
- Raw p-values from all pairs across all metrics are collected into a single family, then adjusted via the Benjamini-Hochberg step-up procedure.
- A pair is significant when `p_adj <= fdr_alpha`.
- Cohen's d is computed for effect size, with interpretation labels: `"negligible"` (|d| < 0.2), `"small"` (0.2 <= |d| < 0.5), `"medium"` (0.5 <= |d| < 0.8), `"large"` (|d| >= 0.8).

```{note}
When a `control` label is set in `comparison.yaml`, only control-vs-treatment pairs are tested. Otherwise, all unique pairs are tested.
```

### Tukey's HSD (`tukey_hsd`)

- Tests all pairs simultaneously using `scipy.stats.tukey_hsd`.
- Controls the family-wise error rate (FWER) rather than FDR.
- A pair is significant when the Tukey-adjusted `p_value <= fdr_alpha`.
- `p_value_adjusted` mirrors `p_value` for Tukey results (Tukey p-values are already family-wise corrected).
- Best with balanced designs (equal replicates per condition).
- Cohen's d is still computed for each pair.
- The `t_statistic` field is set to `NaN` for Tukey results since the test does not produce a t-statistic.

---

## ANOVA

A one-way ANOVA (`scipy.stats.f_oneway`) is always run alongside post-hoc tests when there are 3 or more conditions. It tests whether at least one condition's mean differs from the others.

| Field | Type | Description |
|-------|------|-------------|
| `f_statistic` | float | F-statistic from the one-way ANOVA. |
| `p_value` | float | P-value for the omnibus test. |
| `significant` | bool | Whether `p_value <= fdr_alpha` (uses the configured `fdr_alpha` threshold). |

ANOVA does **not** determine which pairs differ -- that is the role of post-hoc tests. ANOVA is skipped when fewer than 3 conditions are present, and returns `NaN` statistics if any group has fewer than 2 observations.
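
The `ttest_bh` procedure and the omnibus ANOVA described above can be approximated with the following minimal sketch. It is not PolyzyMD's implementation: the function names (`cohens_d`, `bh_adjust`, `effect_label`, `compare_conditions`) are hypothetical, Cohen's d is assumed to use a pooled standard deviation (the variant PolyzyMD uses is not stated here), and the sketch handles a single metric with all unique pairs rather than pooling p-values across metrics or restricting to control-vs-treatment pairs. The edge cases listed later in this page are not handled.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def effect_label(d):
    """Map |d| to the interpretation labels listed above."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values for one family."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def compare_conditions(groups, fdr_alpha=0.05, welch=False):
    """Pairwise t-tests with BH correction, plus ANOVA for 3+ conditions."""
    pairs = list(combinations(groups, 2))
    raw = [
        stats.ttest_ind(groups[a], groups[b], equal_var=not welch).pvalue
        for a, b in pairs
    ]
    adjusted = bh_adjust(raw)
    rows = []
    for (a, b), p, p_adj in zip(pairs, raw, adjusted):
        d = cohens_d(groups[a], groups[b])
        rows.append({
            "condition_a": a,
            "condition_b": b,
            "p_value": float(p),
            "p_value_adjusted": float(p_adj),
            "cohens_d": float(d),
            "effect_size_interpretation": effect_label(d),
            "significant": bool(p_adj <= fdr_alpha),
        })
    # The omnibus test is only meaningful with three or more conditions.
    anova = stats.f_oneway(*groups.values()) if len(groups) >= 3 else None
    return rows, anova


if __name__ == "__main__":
    # Illustrative replicate values only; not real PolyzyMD output.
    data = {
        "control": [1.0, 1.2, 0.9],
        "polymer_a": [1.8, 2.1, 1.9],
        "polymer_b": [1.1, 1.0, 1.3],
    }
    rows, anova = compare_conditions(data)
    for row in rows:
        print(row)
    print("ANOVA p-value:", anova.pvalue)
```

Keeping `bh_adjust` separate from the t-tests reflects the single-family design described above: raw p-values from several metrics could be concatenated and adjusted in one call before being split back out per metric.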

---

## Output Fields

These fields appear in comparison JSON files and are used by the CLI formatter.

### Pairwise result fields

| Field | Type | Description |
|-------|------|-------------|
| `condition_a` | str | Label of first condition (typically control). |
| `condition_b` | str | Label of second condition (typically treatment). |
| `metric` | str | Name of the metric being compared. |
| `t_statistic` | float | T-test statistic. `NaN` for Tukey HSD results. |
| `p_value` | float | P-value from the pairwise test (unadjusted for `ttest_bh`; Tukey-adjusted for `tukey_hsd`). |
| `p_value_adjusted` | float or null | Adjusted p-value (BH-adjusted for `ttest_bh`; mirrors `p_value` for `tukey_hsd` since Tukey p-values are already family-wise corrected). `null` when not available. |
| `posthoc_method` | str | `"ttest_bh"` or `"tukey_hsd"`. |
| `cohens_d` | float | Effect size (positive = `condition_a` mean > `condition_b` mean). |
| `effect_size_interpretation` | str | `"negligible"`, `"small"`, `"medium"`, or `"large"`. |
| `direction` | str | Interpreted direction of change (e.g. `"higher"`, `"lower"`, or `"undetermined"`). |
| `significant` | bool | Whether `p_adj <= alpha` (uses adjusted p-value when available, raw otherwise). |
| `percent_change` | float | Percent change from `condition_a` to `condition_b`. |

### Comparison-level fields

| Field | Type | Description |
|-------|------|-------------|
| `fdr_alpha` | float | The alpha threshold used for significance (BH FDR for `ttest_bh`; FWER for `tukey_hsd`). Also used as the ANOVA significance threshold. |
| `ttest_method` | str | `"student"` or `"welch"`. |
| `posthoc_method` | str | `"ttest_bh"` or `"tukey_hsd"`. |

---

## CLI Significance Markers

The CLI formatter annotates pairwise rows with significance markers:

| Marker | Meaning |
|--------|---------|
| `*` | `p_adj <= fdr_alpha` (default 0.05) |
| `**` | `p_adj <= 0.01` |
| `***` | `p_adj <= 0.001` |

```{note}
The contacts formatter uses multi-level markers (`**`, `***`).
The default scalar formatter uses a single `*` for any significant result.
```

Some plugins also use:

| Marker | Meaning |
|--------|---------|
| `†` (dagger) | Cohen's d meets the `min_effect_size` threshold (practical significance). Currently used by the contacts plugin. |

---

## Edge Cases

| Scenario | Behavior |
|----------|----------|
| Fewer than 2 replicates in a group | t-test returns `NaN` for t-statistic and p-value; Cohen's d returns `NaN`. |
| Equal values across all replicates in both groups | p-value = 1.0, Cohen's d = 0.0 (`"negligible"`). |
| Single condition | No pairwise tests are generated; ANOVA is skipped. |
| Two conditions | Pairwise tests run normally; ANOVA is skipped (requires >= 3 conditions). |
| Tukey HSD with fewer than 2 groups or fewer than 2 observations per group | Returns empty results (no pairs generated). |
| Zero control mean | Percent change returns `inf` or `-inf`; `NaN` if both means are non-finite. |

---

## See Also

- {doc}`comparison_yaml` -- full `comparison.yaml` schema reference
- {doc}`analysis_comparison_reference` -- comparison CLI commands and plugin summary
- {doc}`../explanation/analysis_statistics_best_practices` -- autocorrelation, FDR concepts, and interpretation guidance