Calculate Linkage Disequilibrium (LD) Statistic

Linkage Disequilibrium (LD) Calculator

Calculate and understand the Linkage Disequilibrium (LD) statistic, a key measure in population genetics and association studies.

LD Statistic Calculator

Frequency of Allele A (pA)

Enter the frequency of allele A (e.g., 0.6 for 60%).

Frequency of Allele a (pa)

Enter the frequency of allele a (e.g., 0.4 for 40%).

Frequency of Allele B (pB)

Enter the frequency of allele B (e.g., 0.7 for 70%).

Frequency of Allele b (pb)

Enter the frequency of allele b (e.g., 0.3 for 30%).

Frequency of Haplotype AB (P_AB)

Enter the observed frequency of the AB haplotype (e.g., 0.45 for 45%).

Frequency of Haplotype Ab (P_Ab)

Enter the observed frequency of the Ab haplotype (e.g., 0.15 for 15%).

Frequency of Haplotype aB (P_aB)

Enter the observed frequency of the aB haplotype (e.g., 0.25 for 25%).

Frequency of Haplotype ab (P_ab)

Enter the observed frequency of the ab haplotype (e.g., 0.15 for 15%).

Results

Expected Frequency of AB (pA * pB)
—

Expected Frequency of Ab (pA * pb)
—

Expected Frequency of aB (pa * pB)
—

Expected Frequency of ab (pa * pb)
—

D (Disequilibrium Measure)
—

D’ (Normalized Disequilibrium)
—

Formula Used:
The primary measure is D, calculated as: D = P_AB – (pA * pB), where P_AB is the observed frequency of the AB haplotype and pA and pB are the frequencies of alleles A and B, respectively.
D’ is a normalized version of D, accounting for maximum possible disequilibrium given allele frequencies.

Observed and Expected Haplotype Frequencies
Haplotype	Observed Frequency (P)	Expected Frequency (p1 * p2)	Difference (P – Expected)
AB	—	—	—
Ab	—	—	—
aB	—	—	—
ab	—	—	—

Comparison of Observed vs. Expected Haplotype Frequencies

What is Linkage Disequilibrium (LD)?

Linkage Disequilibrium (LD) is a fundamental concept in population genetics and molecular biology. It describes the non-random association of alleles at two or more loci, most often loci on the same chromosome. In simpler terms, LD measures whether certain alleles are found together on the same chromosome more or less often than would be expected by chance if they were inherited independently. High LD between two loci indicates that their alleles are often inherited as a block, which usually means the loci are physically close and recombination has not yet broken the association apart. Conversely, low LD suggests the loci are far apart or that recombination has had time to shuffle their alleles.

Who Should Use LD Analysis?

Population Geneticists: To study evolutionary history, recombination rates, and population structure.
Genetic Epidemiologists: To identify genetic variants associated with diseases through Genome-Wide Association Studies (GWAS). LD is crucial for fine-mapping genetic associations to pinpoint causal variants.
Biotechnologists: In marker-assisted selection for breeding programs.
Researchers studying gene regulation: To understand how linked regulatory elements might influence gene expression.

Common Misconceptions about LD:

LD is only about genes: LD applies to any polymorphic locus, including non-coding DNA, microsatellites, and SNPs (Single Nucleotide Polymorphisms).
LD is static: LD patterns change over time due to recombination, mutation, genetic drift, and selection.
LD implies causation: Strong LD between a marker and a trait-associated variant does not mean the marker itself causes the trait; it often means a nearby causal variant is in LD with the marker.
LD is always measured by D or D’: While D and D’ are common metrics, other measures like r² (squared correlation coefficient) are also widely used, particularly in GWAS.

Linkage Disequilibrium (LD) Formula and Mathematical Explanation

The core idea behind quantifying linkage disequilibrium is to compare the observed frequency of allele combinations (haplotypes) with the frequency expected under the assumption of independence between loci (i.e., linkage equilibrium). Let’s consider two loci, Locus 1 with alleles A and a, and Locus 2 with alleles B and b.

The frequencies of the individual alleles are denoted as:

pA: frequency of allele A
pa: frequency of allele a (pa = 1 – pA)
pB: frequency of allele B
pb: frequency of allele b (pb = 1 – pB)

The observed frequencies of the four possible haplotypes are denoted as:

P_AB: observed frequency of haplotype AB
P_Ab: observed frequency of haplotype Ab
P_aB: observed frequency of haplotype aB
P_ab: observed frequency of haplotype ab

If the alleles at the two loci were in perfect equilibrium, the expected frequencies would be the product of the individual allele frequencies:

Expected P_AB = pA * pB
Expected P_Ab = pA * pb
Expected P_aB = pa * pB
Expected P_ab = pa * pb

The measure of disequilibrium, D, is defined as the difference between the observed and expected frequency of one specific haplotype, typically P_AB:

D = P_AB – (pA * pB)

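This single value D captures the direction and extent of non-random association. However, it is hard to interpret on its own because the largest value D can reach depends on the allele frequencies, so the same D can mean strong association at one pair of loci and weak association at another. To address this, D is often normalized.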
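The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function name `disequilibrium` and the sample frequencies are hypothetical, and the allele frequencies are derived from the haplotype frequencies rather than entered separately.

```python
def disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return allele frequencies (pA, pB), expected haplotype frequencies, and D."""
    # Allele frequencies are the marginal sums of the haplotype frequencies.
    pA = p_AB + p_Ab
    pB = p_AB + p_aB
    pa, pb = 1.0 - pA, 1.0 - pB
    # Expected frequencies under independence between the two loci.
    expected = {"AB": pA * pB, "Ab": pA * pb, "aB": pa * pB, "ab": pa * pb}
    D = p_AB - expected["AB"]
    return (pA, pB), expected, D


# Hypothetical haplotype frequencies, chosen only for illustration.
observed = {"AB": 0.5, "Ab": 0.1, "aB": 0.1, "ab": 0.3}
(pA, pB), expected, D = disequilibrium(
    observed["AB"], observed["Ab"], observed["aB"], observed["ab"]
)
# Observed minus expected for each haplotype.
differences = {h: observed[h] - expected[h] for h in observed}
print(f"D = {D:.4f}")
print({h: round(diff, 4) for h, diff in differences.items()})
```

When the haplotype and allele frequencies are consistent, the four differences are not independent: AB and ab differ from expectation by +D, while Ab and aB differ by -D. That is why a single number D is enough to describe a pair of biallelic loci.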

Normalized Measures: D’ (D-prime)

D’ is a commonly used measure that scales D by the largest magnitude it could take given the allele frequencies, so that D’ lies between -1 and 1 (or between 0 and 1 when only its absolute value is reported). Because no haplotype frequency can be negative, the allele frequencies put a bound, D_max, on how far D can move in each direction, and that bound depends on the sign of D.

If D is positive (meaning P_AB is higher than expected):

D’ = D / D_max, where D_max = min(pA*pb, pa*pB)

If D is negative (meaning P_AB is lower than expected):

D’ = D / D_max, where D_max = min(pA*pB, pa*pb)

With this convention D’ keeps the sign of D; the 0-to-1 range quoted elsewhere on this page corresponds to its absolute value. A D’ value close to 1 indicates strong LD, meaning the observed haplotype frequencies are close to the most extreme values the allele frequencies allow. At exactly 1, at least one of the four haplotypes is absent. A D’ close to 0 indicates weak LD, suggesting the alleles are largely inherited independently.
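As a sketch of the normalization, the snippet below uses the standard bounds for two biallelic loci: for D > 0 the largest attainable D is min(pA*pb, pa*pB), and for D < 0 the largest attainable magnitude is min(pA*pB, pa*pb). The function name `d_prime` and the sample inputs are hypothetical, and returning 0.0 for a monomorphic locus is a convention chosen for this example only.

```python
def d_prime(p_AB, pA, pB):
    """Return (D, D') for two biallelic loci.

    D' is D divided by the largest magnitude D could reach given pA and pB,
    so it lies in [-1, 1]; take abs() for the 0-to-1 convention.
    """
    pa, pb = 1.0 - pA, 1.0 - pB
    D = p_AB - pA * pB
    if D >= 0:
        # P_Ab and P_aB cannot drop below zero.
        d_max = min(pA * pb, pa * pB)
    else:
        # P_AB and P_ab cannot drop below zero.
        d_max = min(pA * pB, pa * pb)
    # If a locus is monomorphic, d_max is 0 and D' is undefined;
    # this sketch returns 0.0 in that case (an assumption, not a standard).
    return D, (D / d_max if d_max > 0 else 0.0)


# Hypothetical inputs: an excess and a deficit of the AB haplotype.
D_pos, dp_pos = d_prime(0.5, 0.6, 0.6)
D_neg, dp_neg = d_prime(0.3, 0.6, 0.6)
print(f"D = {D_pos:+.3f}, D' = {dp_pos:+.3f}")
print(f"D = {D_neg:+.3f}, D' = {dp_neg:+.3f}")
```

Both cases use the same allele frequencies, but the denominator changes with the sign of D (0.24 versus 0.16). This is why D’ cannot be obtained by dividing D by one fixed constant.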

Variables Table:

Variable	Meaning	Unit	Typical Range
pA, pa, pB, pb	Allele frequencies at two loci	Frequency (proportion)	0 to 1
P_AB, P_Ab, P_aB, P_ab	Observed haplotype frequencies	Frequency (proportion)	0 to 1
D	Measure of disequilibrium	Frequency difference	-0.25 to +0.25 (narrower when allele frequencies are unequal)
D’	Normalized disequilibrium measure	Dimensionless	-1 to 1 (commonly reported as an absolute value, 0 to 1)
r²	Squared correlation coefficient (alternative LD measure)	Dimensionless	0 to 1

Practical Examples (Real-World Use Cases)

Example 1: Disease Association Study

Researchers are investigating a genetic marker near a suspected disease susceptibility gene. They analyze haplotype data from 1000 individuals, half with the disease and half healthy controls. The frequencies of alleles at two nearby SNPs (SNP1: A/a, SNP2: B/b) are measured.

Inputs:

SNP1 Allele Frequencies: pA = 0.7, pa = 0.3
SNP2 Allele Frequencies: pB = 0.5, pb = 0.5
Observed Haplotype Frequencies (across all 1000 individuals):

P_AB = 0.40
P_Ab = 0.30
P_aB = 0.10
P_ab = 0.20

Calculations:

Expected P_AB = 0.7 * 0.5 = 0.35
D = P_AB – Expected P_AB = 0.40 – 0.35 = 0.05
Because D is positive, D_max = min(pA * pb, pa * pB) = min(0.35, 0.15) = 0.15, so D’ = 0.05 / 0.15 ≈ 0.33.

Results Interpretation:

The D value is slightly positive (0.05), so the A and B alleles occur together somewhat more often than expected under independence. The D’ value (about 0.33) indicates fairly weak LD between these two SNPs: only about a third of the maximum possible association given these allele frequencies is present, and recombination has largely randomized their combinations. In a disease association study, this means neither SNP is a good proxy for the other: if one of them tags a disease-causing variant, the other is unlikely to capture the same signal.

Example 2: Population Structure Analysis

A population geneticist is studying two highly polymorphic microsatellite loci (Marker1: Alleles M1, m1 and Marker2: Alleles M2, m2) in a specific island population to understand its evolutionary history.

Inputs:

Marker1 Allele Frequencies: pM1 = 0.8, pm1 = 0.2
Marker2 Allele Frequencies: pM2 = 0.6, pm2 = 0.4
Observed Haplotype Frequencies:

P_M1M2 = 0.55
P_M1m2 = 0.25
P_m1M2 = 0.05
P_m1m2 = 0.15

Calculations:

Expected P_M1M2 = 0.8 * 0.6 = 0.48
D = P_M1M2 – Expected P_M1M2 = 0.55 – 0.48 = 0.07
Because D is positive, D_max = min(pM1 * pm2, pm1 * pM2) = min(0.32, 0.12) = 0.12, so D’ = 0.07 / 0.12 ≈ 0.58.

Results Interpretation:

Here, D is positive (0.07), and D’ is about 0.58. This indicates a moderate level of linkage disequilibrium. The observed frequency of the M1M2 haplotype (0.55) is higher than expected (0.48), and a bit more than half of the maximum possible association given these allele frequencies is present. This moderate LD could reflect a recent population bottleneck, limited gene flow, or selection favoring the M1M2 haplotype. LD alone cannot distinguish among these explanations, so the geneticist would investigate them further using population genetics models.

How to Use This LD Calculator

This calculator simplifies the process of quantifying linkage disequilibrium between two genetic loci. Follow these steps:

Input Allele Frequencies: Enter the frequencies for each allele at the two loci. For instance, if you have a locus with alleles A and a, enter the frequency of A (pA) and the frequency of a (pa). The two frequencies at a locus should sum to 1 (pa = 1 – pA), though the calculator can derive one if the other is provided. Do the same for the second locus (B and b).
Input Haplotype Frequencies: Enter the observed frequencies for each of the four possible combinations (haplotypes) of alleles from the two loci (AB, Ab, aB, ab). These frequencies represent how often these specific allele combinations appear together on the same chromosome in your population sample. Ensure that the sum of these four observed haplotype frequencies equals 1 (or is very close to 1, accounting for rounding), and that they agree with the allele frequencies you entered (for example, P_AB + P_Ab should equal pA, and P_AB + P_aB should equal pB).
Calculate: Click the “Calculate LD” button. The calculator will compute the expected frequencies for each haplotype under the assumption of independence and then calculate the disequilibrium measures (D and D’).
Review Results: The main result section will display the calculated D and D’ values prominently. You’ll also see the intermediate values for expected haplotype frequencies and the difference (D).
Interpret the Table and Chart: The table provides a clear breakdown of observed vs. expected frequencies and their differences for each haplotype. The chart visually compares these frequencies, making it easier to spot which haplotype associations deviate most from random expectation.
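Before calculating, it can help to confirm that the entered values are mutually consistent. The sketch below shows one way to do that; it is not necessarily what the calculator does internally. The function name `check_inputs` and the tolerance are assumptions, and the sample values are the example values suggested in the input fields above.

```python
def check_inputs(hap, alleles, tol=1e-6):
    """Return a list of problems found in the entered frequencies (empty if none)."""
    problems = []
    values = list(hap.values()) + list(alleles.values())
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("every frequency must lie between 0 and 1")
    if abs(sum(hap.values()) - 1.0) > tol:
        problems.append("haplotype frequencies must sum to 1")
    # Each allele frequency must equal the sum of the two haplotypes carrying it.
    implied = {
        "A": hap["AB"] + hap["Ab"],
        "a": hap["aB"] + hap["ab"],
        "B": hap["AB"] + hap["aB"],
        "b": hap["Ab"] + hap["ab"],
    }
    for allele, value in implied.items():
        if abs(alleles[allele] - value) > tol:
            problems.append(
                f"p{allele} = {alleles[allele]} but the haplotypes imply {value:.4f}"
            )
    return problems


# Example values taken from the input-field hints on this page.
hap = {"AB": 0.45, "Ab": 0.15, "aB": 0.25, "ab": 0.15}
alleles = {"A": 0.6, "a": 0.4, "B": 0.7, "b": 0.3}
print(check_inputs(hap, alleles))  # no problems reported for these values

# The same haplotypes with a mismatched second locus are flagged.
print(check_inputs(hap, {"A": 0.6, "a": 0.4, "B": 0.5, "b": 0.5}))
```

Inconsistent inputs matter because D is computed from P_AB, pA, and pB together. If the allele frequencies do not match the haplotype marginals, the four "observed minus expected" differences no longer agree on a single value of D.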

How to Read Results:

D Value: A positive D means the observed frequency of the AB haplotype is greater than expected; a negative D means it’s less than expected.
D’ Value: Reported as a magnitude from 0 to 1.
- D’ close to 1: Strong linkage disequilibrium. The alleles are far from randomly associated; at D’ = 1, at least one of the four haplotypes is absent from the sample.
- D’ close to 0: Weak linkage disequilibrium. The alleles are associated almost as if they were inherited independently.
- Values in between: Indicate intermediate degrees of non-random association.
Table & Chart: Look for the largest differences between observed and expected frequencies. These highlight the specific haplotype combinations that are either over-represented or under-represented in your sample, driving the overall LD.

Decision-Making Guidance:

High LD (D’ near 1): Suggests the loci are physically close on the chromosome, or that evolutionary forces (like selection or low recombination rates) are maintaining the association. This is what makes indirect association work in GWAS: a marker in high LD with a disease-causing variant carries much of the same signal. The same property makes fine-mapping harder, because tightly linked variants are difficult to tell apart statistically.
Low LD (D’ near 0): Suggests the loci are far apart or that recombination has frequently separated them. Markers in low LD are poor proxies for each other, so they are less useful for imputation or indirect association studies.

Key Factors That Affect LD Results

Several biological and statistical factors influence the level of Linkage Disequilibrium observed between genetic loci:

Physical Distance: This is the most significant factor. Loci that are physically closer together on the same chromosome recombine less frequently. Consequently, alleles at closely linked loci tend to be inherited together, resulting in higher LD. Loci far apart or on different chromosomes recombine frequently, leading to lower LD.
Recombination Rate: Higher recombination rates between loci break down LD more quickly over generations. Conversely, regions with low recombination rates (like near the centromere) tend to maintain higher LD.
Time Since Mutation/Introduction: If a new mutation or a novel haplotype combination arises, it will initially be in complete LD with nearby alleles. Over many generations, recombination will gradually reduce this LD. Therefore, observing high LD can sometimes indicate a relatively recent event or a region where recombination is suppressed.
Genetic Drift: In small populations, random fluctuations in allele frequencies (genetic drift) can cause alleles to become associated by chance, increasing LD, even if the loci are not physically close. Drift can lead to random fixation or loss of haplotypes.
Selection: If a particular combination of alleles (haplotype) confers a selective advantage or disadvantage, it can significantly impact LD. Positive selection for a haplotype will increase its frequency and thus increase LD around the selected site. Conversely, negative selection against a haplotype will decrease its frequency and LD.
Population History (Bottlenecks, Founder Effects, Migration): Past events can dramatically shape LD patterns. A population bottleneck or founder effect reduces genetic diversity and can lead to an increase in LD for all loci within the affected population. Admixture (mixing of previously isolated populations) can also alter LD patterns in complex ways.
Mutation Rate: While recombination is the primary force breaking down LD, mutation can introduce new alleles and haplotypes, indirectly affecting LD patterns over very long timescales.
Meiotic Drive: Non-Mendelian segregation of alleles during meiosis can distort haplotype frequencies and influence LD.

Frequently Asked Questions (FAQ)

What is the difference between D and D’?

D measures the absolute difference between observed and expected haplotype frequencies. D’ is a normalized version that accounts for allele frequencies, making it more comparable across different loci and populations. D’ is generally preferred for interpreting the strength of LD.

What does an r² value represent in LD?

r² (squared correlation coefficient) measures the extent to which the alleles at two loci are statistically correlated. It is calculated as r² = (D²)/((pA*pa)*(pB*pb)). Unlike D’, r² is sensitive to allele frequencies and is often used in Genome-Wide Association Studies (GWAS) because it reflects the predictability of one SNP’s genotype based on another’s.
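The formula translates directly into code. The following is a minimal sketch with a hypothetical function name (`r_squared`) and hypothetical inputs. The second call illustrates the allele-frequency sensitivity mentioned above: every B allele sits on an A chromosome, so D’ equals 1, yet r² stays low because the two loci have very different allele frequencies.

```python
def r_squared(p_AB, pA, pB):
    """Return r^2 = D^2 / (pA * pa * pB * pb) for two biallelic loci."""
    pa, pb = 1.0 - pA, 1.0 - pB
    D = p_AB - pA * pB
    denominator = pA * pa * pB * pb
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return D * D / denominator


# Hypothetical inputs with matching allele frequencies at both loci.
print(f"{r_squared(0.5, 0.6, 0.6):.3f}")

# Hypothetical inputs where B is rare and always found with A:
# D = 0.05 is the maximum possible here (D' = 1), but r^2 is small.
print(f"{r_squared(0.1, 0.5, 0.1):.3f}")
```

An r² of 1 requires the two loci to have identical allele frequencies with only two haplotypes present, which is a stricter condition than D’ = 1.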

How is LD calculated in practice for large datasets like GWAS?

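For large-scale studies, software packages such as PLINK or Haploview are used to calculate LD statistics (often r²) for millions of pairwise SNP comparisons, and LDhat is used to estimate recombination rates from LD patterns. D and D’ are defined on haplotype frequencies, so with unphased genotype data these tools either estimate haplotype frequencies statistically (for example with an EM algorithm) or compute r² directly from the correlation between genotype counts. Phased haplotypes are therefore helpful but not strictly required.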
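When only unphased genotypes are available, one common shortcut is to code each genotype as the number of copies of a chosen allele (0, 1, or 2) and take the squared Pearson correlation between two SNPs across individuals. The sketch below illustrates that idea in plain Python. The function name `genotype_r2`, the use of `None` for missing calls, and the sample data are all assumptions made for this example; dedicated tools handle missing data and large inputs in their own ways.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype-count vectors (0/1/2).

    Individuals with a missing call (None) at either SNP are skipped.
    """
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals typed at both SNPs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic SNP has no defined correlation")
    return sxy * sxy / (sxx * syy)


# Hypothetical genotype counts for eight individuals at two SNPs.
snp1 = [0, 1, 2, 1, 0, 2, 1, None]
snp2 = [0, 1, 2, 1, 1, 2, 0, 2]
print(f"{genotype_r2(snp1, snp2):.3f}")
```

Under random mating, this genotype-based value approximates the haplotype-based r². It needs no phasing, which is part of why r² is convenient for genome-wide scans.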

Can LD be used to predict genotypes?

Yes, if two loci are in high LD, knowing the genotype at one locus can help predict the genotype at the other. This principle is fundamental to SNP imputation, where genotypes at ungenotyped SNPs are inferred using LD information from reference panels.

Is high LD always a sign of a disease-associated locus?

Not necessarily. High LD means a marker is inherited with another variant nearby. If that nearby variant is the true causal variant for a disease, then the marker in high LD is informative. However, the marker itself might not be causal and could be linked to many different variants.

How does recombination frequency relate to LD?

Recombination acts to break down LD. The higher the recombination rate between two loci, the faster LD decays over generations. Conversely, low recombination rates maintain LD.
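The decay can be made concrete with the standard textbook model: in a large, randomly mating population with no selection, drift, or mutation, D shrinks by a factor of (1 – c) each generation, where c is the recombination fraction between the two loci. The sketch below is an illustration under those assumptions; the symbol c and the function names are introduced here and do not appear elsewhere on this page.

```python
import math


def ld_after(d0, c, generations):
    """D after a number of generations, starting from d0, with recombination fraction c."""
    return d0 * (1.0 - c) ** generations


def generations_to_halve(c):
    """Number of generations for D to fall to half its starting value (0 < c < 1)."""
    return math.log(0.5) / math.log(1.0 - c)


# Unlinked loci (c = 0.5): D halves every generation.
print(ld_after(0.2, 0.5, 1))

# Tightly linked loci (c = 0.01): halving takes roughly 69 generations.
print(round(generations_to_halve(0.01), 1))
```

Real populations depart from this idealized model, since drift, selection, and admixture can all create or maintain LD. The formula is best read as the baseline rate at which recombination alone erodes an association.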

Can LD be negative?

Yes, the D statistic can be negative. A negative D indicates that a specific haplotype (e.g., AB) is less frequent than expected by chance. The sign depends on which alleles are labeled A and B, so the normalized D’ is often reported as an absolute value ranging from 0 to 1, indicating only the magnitude of deviation from independence.

What is the role of sex in LD?

Recombination rates often differ between sexes (e.g., typically higher in females than males for humans). This difference can lead to variations in LD patterns between maternally and paternally inherited haplotypes or across different chromosomal regions.

Related Tools and Resources

Understanding Linkage Disequilibrium: Learn the basics of LD and its significance.
LD Formula Explained: Dive deeper into the mathematical calculations behind LD.
Step-by-Step Calculator Guide: Master the use of our LD calculator tool.
Population Genetics Concepts: Explore related topics in population genetics.
Introduction to Genome-Wide Association Studies (GWAS): Discover how LD is applied in disease research.
Haplotype Analysis Explained: Understand the structure and importance of haplotypes.
Recombination Rates and Genetic Mapping: Learn about recombination and its role in LD decay.