Understanding Post-Hoc Power and Why Not to Calculate It


Don’t Calculate Post-Hoc Power with Observed Effect Size

Prospective Power Analysis Calculator

This calculator assists in prospective power analysis by helping you determine sample size or detectable effect size *before* your study. It emphasizes why calculating power *after* observing results (post-hoc) using those same results is methodologically unsound.



  • Significance Level (α): Typically 0.05 (5%). This is the probability of a Type I error (false positive).

  • Desired Power (1−β): Typically 0.80 (80%). This is the probability of correctly detecting a true effect (avoiding a Type II error/false negative).

  • Estimated Effect Size (ES): A pre-study estimate of the magnitude of the effect you expect to find (e.g., 0.2 = small, 0.5 = medium, 0.8 = large for Cohen’s d). Do NOT use your observed effect size here.

  • Sample Size (N): The total number of participants or observations in your study.


What is Post-Hoc Power Analysis (and Why You Shouldn’t Use It)?

Post-hoc power analysis, sometimes called observed power analysis, refers to the practice of calculating statistical power *after* a study has been completed, using the effect size observed in the data itself. While it might seem intuitive to want to know the power of a study based on its actual findings, this approach is statistically flawed and widely discouraged by researchers and methodologists. The core issue is circularity: observed power is simply a re-expression of the p-value. If you observed a statistically significant result, the post-hoc power will always be high, regardless of the true underlying effect size; if you observed a non-significant result, the post-hoc power will always be low (roughly 50% or less for a p-value at or above 0.05), no matter how sensitive the study actually was. This makes post-hoc power calculations uninformative for interpreting non-significant findings or for judging the adequacy of the study’s design.

Who should use Prospective Power Analysis? Any researcher planning a study, particularly those in fields like psychology, medicine, education, and social sciences, should conduct a prospective power analysis. This is done before data collection to determine the necessary sample size to detect a specific effect size with a desired level of confidence (power) and significance. It’s a critical step in robust research design, ensuring that studies are adequately powered to yield meaningful results and avoid wasting resources on underpowered investigations.

Common Misconceptions:

  • Misconception 1: Post-hoc power tells me if my non-significant result is truly null. (Reality: It cannot distinguish a truly null effect from a study that was simply underpowered; low observed power is a mathematical consequence of a non-significant result, not evidence about the null hypothesis.)
  • Misconception 2: High post-hoc power confirms my significant finding is reliable. (Reality: If a result is significant at α=0.05, the post-hoc power calculated using that same result will naturally be very high, offering no additional insight.)
  • Misconception 3: It’s better than nothing when interpreting results. (Reality: It can be actively misleading and lead to incorrect conclusions.)

Prospective Power Analysis: Formula and Mathematical Explanation

Prospective power analysis aims to determine the required sample size (N) or the minimum detectable effect size for a given N, α, and desired power (1−β). The calculation often involves the inverse cumulative distribution function (quantile function) of the standard normal distribution, denoted Φ⁻¹(p).

For a two-sided test, the critical value for a given alpha (α) is Zα/2 = Φ⁻¹(1 − α/2). The value corresponding to the desired power (1−β) is Zβ = Φ⁻¹(1 − β). Some texts write this as −Zβ when working with the non-central distribution, but for the sample size formula below we use the positive value, which represents the required separation between the null and alternative distributions.

The relationship between effect size (ES), sample size (N), and power is often conceptualized through the non-centrality parameter (λ). For common tests like the t-test, λ ≈ ES × √(N / k), where k depends on the specific test and on whether N counts one group or both (e.g., k = 2 for an independent-samples t-test with equal variances and N per group; k = 1 for a one-sample or paired t-test).

However, a more direct approach for sample size determination involves the critical values:

Formula for Sample Size (N) determination for detecting a specific Effect Size (ES):

N = ( (Zα/2 + Zβ) / ES )² × k

Where:

  • N: Required sample size (per group for some tests, or total). The ‘k’ factor adjusts based on the test type (e.g., k=2 for independent groups).
  • Zα/2: The Z-score corresponding to the significance level (e.g., for α=0.05, two-tailed, Z0.025 ≈ 1.96).
  • Zβ: The Z-score corresponding to the desired power (e.g., for power=0.80, β=0.20, Z0.20 ≈ 0.84).
  • ES: The estimated effect size (e.g., Cohen’s d).
  • k: A multiplier that depends on the statistical test. For comparing two independent means (equal variances assumed), k=2. For a one-sample test or paired samples, k=1. We’ll use k=2 for a general example context.
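
For readers who prefer code, a minimal Python sketch of this normal-approximation formula is shown below. The helper name `required_n` and the use of SciPy are illustrative assumptions, not features of any particular calculator, and real tools refine the result with the t-distribution for small samples.

```python
# Minimal sketch of N = k * ((Z_alpha/2 + Z_beta) / ES)^2 (normal approximation).
from math import ceil
from scipy.stats import norm

def required_n(effect_size, alpha=0.05, power=0.80, k=2):
    z_alpha = norm.ppf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05, two-tailed
    z_beta = norm.ppf(power)           # e.g. 0.84 for power = 0.80
    return ceil(k * ((z_alpha + z_beta) / effect_size) ** 2)

print(required_n(effect_size=0.5))  # ~63 per group; t-based software gives 64
```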

Simplified Calculation Logic (as implemented in the calculator for Power): Given N, α, and ES, we calculate the observed Z-score difference corresponding to power:

Observed Z = ES * sqrt(N / k)

Power ≈ Φ( Observed Z − Zα/2 ) (a one-sided approximation; for a two-sided test, the contribution from the opposite tail is usually negligible)

The calculator uses normal-approximation formulas of this kind; dedicated statistical software and libraries additionally handle the nuances of the t-distribution versus the normal distribution, which matter most for smaller sample sizes.
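
As a rough illustration of that logic, the sketch below implements the same approximation in Python. The name `approx_power` is a placeholder, and, as noted above, the normal approximation slightly overstates power for small samples compared with a t-based calculation.

```python
# Power from the formulas above: Observed Z = ES * sqrt(N / k),
# Power ≈ Φ(Observed Z − Z_alpha/2). Normal approximation only.
from scipy.stats import norm

def approx_power(n, effect_size, alpha=0.05, k=2):
    z_alpha = norm.ppf(1 - alpha / 2)
    observed_z = effect_size * (n / k) ** 0.5
    return norm.cdf(observed_z - z_alpha)

print(round(approx_power(n=64, effect_size=0.5), 2))  # ≈ 0.81 for Example 1 below
```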

Variables Table

Key Variables in Power Analysis
Variable | Meaning | Unit | Typical Range
α (Alpha) | Significance level (probability of a Type I error) | Probability (0 to 1) | 0.01–0.10 (commonly 0.05)
β (Beta) | Type II error rate | Probability (0 to 1) | 0.05–0.40 (often derived from desired power)
Power (1−β) | Probability of detecting a true effect | Probability (0 to 1) | 0.70–0.99 (commonly 0.80 or higher)
ES (Effect Size) | Magnitude of the expected or minimum meaningful effect | Standardized (e.g., Cohen’s d) or unstandardized units | Field-dependent (e.g., 0.2 = small, 0.5 = medium, 0.8 = large)
N (Sample Size) | Number of observations or participants | Count | Varies greatly; determined by the other parameters
Zα/2 | Critical Z-value for the significance level | Standard score | ≈ 1.96 for α = 0.05 (two-tailed)
Zβ | Z-value for the desired power level | Standard score | ≈ 0.84 for power = 0.80

Practical Examples of Prospective Power Analysis

Understanding how to apply prospective power analysis is key to designing effective studies. Here are two examples focusing on determining sample size.

Example 1: Clinical Trial – Measuring Drug Efficacy

Scenario: A pharmaceutical company is developing a new drug to lower blood pressure. They want to design a Phase II clinical trial to see if the drug has a medium effect compared to a placebo. They need to determine the sample size required.

Inputs:

  • Significance Level (α): 0.05 (standard two-tailed test)
  • Desired Power (1-β): 0.80 (80% chance of detecting the effect if it exists)
  • Estimated Effect Size (Cohen’s d): 0.5 (a medium effect size, meaning the drug group’s mean blood pressure reduction is 0.5 standard deviations higher than the placebo group’s)
  • Test Type Assumption: Independent samples t-test (k=2)

Calculation (using a power analysis calculator or formula):
Plugging these values into a prospective power analysis reveals the required sample size.
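Using the normal-approximation formula above: N per group ≈ 2 × ((1.96 + 0.84) / 0.5)² ≈ 63, which rounds up to about 64 per group once software applies the t-distribution.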

Result: A sample size of approximately 64 participants per group (total N = 128) would be needed.

Interpretation: With 64 participants in the drug group and 64 in the placebo group, the study would have an 80% chance of detecting a medium effect size (Cohen’s d = 0.5) at the 0.05 significance level, if such an effect truly exists. If they could only recruit 50 participants per group, their power to detect a medium effect would drop to roughly 70%.

Example 2: Educational Study – Evaluating a New Teaching Method

Scenario: An educational researcher wants to test if a new teaching method improves student scores on a standardized test compared to the traditional method. They estimate a small to medium effect.

Inputs:

  • Significance Level (α): 0.05
  • Desired Power (1-β): 0.90 (they want higher confidence, 90%)
  • Estimated Effect Size (Cohen’s d): 0.4 (a slightly smaller than medium effect size)
  • Test Type Assumption: Independent samples t-test (k=2)

Calculation:
Using the inputs…
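Applying the same normal-approximation formula: N per group ≈ 2 × ((1.96 + 1.28) / 0.4)² ≈ 131, which t-based software rounds up to roughly 133 per group.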

Result: The required sample size is approximately 133 participants per group (total N ≈ 266).

Interpretation: To have a 90% chance of detecting a small-to-medium effect (d = 0.4) at the 0.05 significance level, the study needs about 133 students in each group. If resources only allow for 60 students per group, they would need to accept lower power or aim to detect only larger effect sizes. This calculation informs resource allocation and feasibility.

[Chart: Power vs. Sample Size and Detectable Effect Size (d)]

How to Use This Prospective Power Analysis Calculator

This calculator helps you plan your study effectively by determining sample size or the effect size you can realistically detect. Follow these steps:

  1. Set Significance Level (α): Input your desired threshold for statistical significance. The default is 0.05, meaning you’re willing to accept a 5% chance of a Type I error (false positive). Adjust if your field requires a stricter (e.g., 0.01) or more lenient threshold.
  2. Set Desired Power (1-β): Enter the probability you want of detecting a true effect. The default is 0.80 (80%). Higher power (e.g., 0.90) reduces the risk of a Type II error (false negative) but requires a larger sample size.
  3. Estimate Effect Size (ES): This is crucial. Input the *smallest effect size you consider meaningful* or the effect size you *realistically expect* based on prior research or theory. Do NOT use the effect size calculated from your current data. Common values for Cohen’s d are 0.2 (small), 0.5 (medium), and 0.8 (large).
  4. Input Sample Size (N): If you already know your total sample size (or planned size per group), enter it here. The calculator will then estimate the power of your study. To determine a required sample size directly, use a dedicated sample size calculator; this tool’s primary function is estimating power for a given N, but experimenting with different values of N illustrates the same relationships.
  5. Click “Calculate Power”: The calculator will update with your study’s statistical power and intermediate values.

Reading the Results:

  • Primary Result (Power): This is the key output. If it’s 0.80 or higher, your study is generally considered adequately powered to detect the specified effect size. If it’s significantly lower, your study might be at high risk of a Type II error.
  • Intermediate Values: These show the parameters used in the calculation (α, β, ES, N).
  • Formula Explanation: Provides context on how the power is estimated.

Decision-Making Guidance:

  • Low Power (< 0.70): Consider increasing your sample size, redesigning the study to target a larger minimum detectable effect (e.g., by strengthening the intervention or reducing measurement noise), or accepting a higher risk of missing a real effect.
  • Adequate Power (0.70 – 0.90): Your study design is reasonable for detecting the targeted effect size.
  • High Power (> 0.90): Provides strong assurance, but may require an unnecessarily large sample size.

Remember, power analysis is a planning tool. The chosen effect size should reflect practical significance, not just statistical possibility. This process is vital for ethical research, ensuring participant contributions aren’t wasted on studies unlikely to yield conclusive results. It’s a core aspect of rigorous statistical planning.

Key Factors That Affect Power Analysis Results

Several factors influence the outcome of a prospective power analysis, impacting the required sample size or the achievable power. Understanding these is critical for accurate planning.

  • 1. Significance Level (α): A stricter alpha level (e.g., 0.01 instead of 0.05) requires a larger sample size to achieve the same power. This is because a stricter alpha means the threshold for statistical significance is higher, making it harder to reject the null hypothesis. More evidence (i.e., a larger sample) is needed to meet this higher bar.
  • 2. Desired Power (1-β): Increasing the desired power (e.g., from 0.80 to 0.90) necessitates a larger sample size. Higher power means a greater chance of detecting a true effect, which inherently requires more data to be confident in the finding.
  • 3. Effect Size (ES): This is arguably the most influential factor. Smaller effect sizes require substantially larger sample sizes to detect. Conversely, very large effects can be detected with smaller samples. Researchers must decide whether to target a conventionally small, medium, or large effect size, balancing feasibility with the research question’s importance. The sketch after this list demonstrates this sensitivity numerically.
  • 4. Variability in the Data (e.g., Standard Deviation): Although not an explicit input in this simplified calculator (it’s incorporated into standardized effect sizes like Cohen’s d), higher variability in the outcome measure inflates the required sample size. If measurements are very noisy, you need more data points to discern a true effect from random variation. Controlling variability through careful study design is crucial.
  • 5. One-Tailed vs. Two-Tailed Test: A one-tailed test requires a smaller sample size than a two-tailed test to achieve the same power for detecting an effect in a specific direction. This is because the entire alpha level is concentrated in one tail of the distribution, making the critical value less extreme. However, one-tailed tests should only be used when there is strong *a priori* justification.
  • 6. Research Design Complexity: More complex designs (e.g., multiple groups, covariates, repeated measures) have different power characteristics. The formulas used in power analysis are specific to the statistical test employed (e.g., t-test, ANOVA, regression). The ‘k’ factor in the sample size formula is an example of how design impacts N. Power calculations for complex models often require specialized software.
  • 7. Attrition / Dropout Rates: Researchers must account for potential participant dropout. If a study requires N=100 participants but anticipates a 20% dropout rate, the initial recruitment target should be N = 100 / (1 – 0.20) = 125 participants to ensure the final analysis sample size is met. This is a practical consideration directly affecting the ‘effective’ sample size.
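
To see factors 3 and 7 numerically, here is a short illustrative Python sketch under the same normal-approximation assumptions used earlier; the helper name `n_per_group` is hypothetical.

```python
# How required sample size per group responds to the target effect size,
# plus the dropout adjustment from factor 7 (normal approximation).
from math import ceil
from scipy.stats import norm

def n_per_group(effect_size, power=0.80, alpha=0.05):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
# Required n scales with 1/ES^2: halving the effect size quadruples the sample.

analysis_n = 100                                    # participants needed for analysis
recruitment_target = ceil(analysis_n / (1 - 0.20))  # inflate for 20% dropout
print(recruitment_target)  # 125
```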

Frequently Asked Questions (FAQ)

What is the difference between prospective and post-hoc power analysis?

Prospective power analysis is conducted before a study to determine the necessary sample size or achievable power. Post-hoc power analysis is performed after the study using the observed effect size, and it’s statistically flawed because it provides redundant information – if a result is significant, power will be high; if non-significant, it will be low, regardless of the true effect.

Why is using the observed effect size in power calculations problematic?

Using the observed effect size creates a circular argument. The statistical significance of your finding is already determined by the observed effect size and sample size relative to alpha. Calculating power using that same observed effect size will artificially inflate power if the result was significant and deflate it if non-significant, without providing reliable information about the study’s sensitivity to detect a true effect.

Can I use the calculator to determine my sample size?

While this calculator primarily estimates power given a sample size, the underlying principles are used in sample size calculations. To directly calculate sample size, you would typically rearrange the power formula or use a dedicated sample size calculator that takes effect size, alpha, and desired power as inputs. This tool helps illustrate the relationships. You can experiment by trying different sample sizes to see the corresponding power.

What is a ‘medium’ effect size (Cohen’s d = 0.5)?

Cohen’s d = 0.5 represents a medium effect size, meaning the means of the two groups differ by half a standard deviation. In practical terms, it’s often considered a meaningful difference in many fields. However, the interpretation of ‘small’, ‘medium’, and ‘large’ can vary significantly depending on the research area. Always consider the context.
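For two independent groups, it is calculated as the difference between the group means divided by the pooled standard deviation: d = (M1 − M2) / SDpooled.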

How do I choose the right effect size estimate for my study?

The best approach is to base your estimate on previous, similar research in your field. Meta-analyses can provide reliable estimates. If no prior data exists, you might consider the smallest effect size that would be practically or clinically meaningful for your research question. Alternatively, planning for small, medium, and large effects can provide a range of required sample sizes.

What if my study doesn’t use Cohen’s d? Can power analysis still be done?

Yes. Cohen’s d is just one measure of effect size. Other measures exist for different statistical tests (e.g., R-squared for regression, Odds Ratio for logistic regression, eta-squared for ANOVA). The principle remains the same: you need an estimate of the magnitude of the effect you aim to detect, appropriate for your chosen statistical analysis. Many statistical software packages can perform power analyses for various test types.

Is it possible to have 100% power?

Achieving 100% power is practically impossible and usually unnecessary. It would require an infinitely large sample size or an infinitely large effect size. Standard practice aims for 80% or 90% power, which represents a reasonable balance between detecting true effects and resource efficiency.

What should I do if my prospective power analysis requires an impossibly large sample size?

This often indicates that detecting the desired effect size with the chosen alpha and power levels is not feasible with typical resources. In such cases, you might need to:

  • Re-evaluate the minimum effect size that is practically meaningful.
  • Accept a lower level of power (e.g., 70% instead of 80%).
  • Consider a different research design that might be more sensitive or efficient.
  • Conduct a pilot study to get a better estimate of the effect size and variability.

It’s better to recognize limitations upfront than to conduct an underpowered study.
