Scanning the genome
To link diseases to specific genetic differences, researchers must sort through a mass of data
1 | Study sample with microarray SNP chip
Microarray chips analyze the genomes of volunteers with a particular
disease (the cases) and similar people without the disease (the
controls). Statistical comparisons indicate which single genetic code
differences, or SNPs, are more common in people with the disease.
SNPs tested: up to 1 million
2 | Select the best SNPs
If people with a disease share a SNP not common in healthy controls
(grayed at left), researchers flag it as a candidate SNP. With such large
numbers of SNPs, many will appear to be linked to the disease just by
chance. So a stringent statistical test is applied to reduce the number
of candidate SNPs for further study.
Statistical hurdle: high
3 | Test candidate SNPs in another sample
The candidate SNPs are checked in a larger sample of cases and
controls. The statistical test applied at this stage is less stringent
to avoid eliminating SNPs truly linked to the disease. Only the SNPs
most likely to be important should pass this test.
SNPs tested: 10–500
Statistical hurdle: medium
But in a million-SNP blitz, too many false
results manage to scramble over the hurdle just by random luck.
“There have been problems in the past
when people have declared victory prematurely,” says geneticist Joel Hirschhorn
of the Broad Institute, by declaring SNPs
to be significant based only on the traditional statistical hurdles. “It was hard to
convince people that [the old level] was
not an appropriate threshold. People are
starting to accept that now.”
The simplest solution is just raising
the hurdle. Traditionally, researchers
have permitted a bogus result to sneak
through about 1 time in 20. With the new
genome-wide scans, it’s now usually no
more than 5 in 100 million. This raises the
bar considerably, Hirschhorn says.
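The arithmetic behind the two hurdles can be sketched in a few lines. This is only an illustration: it assumes the stricter threshold comes from dividing the traditional 1-in-20 rate by the number of SNPs tested (a Bonferroni-style correction, which the article does not name), and it assumes for simplicity that none of the million SNPs is truly linked to the disease. The function and variable names are made up for the example.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of chance 'hits' if no tested SNP is truly linked."""
    return num_tests * alpha


def bonferroni_threshold(family_alpha, num_tests):
    """Per-SNP threshold that keeps the overall false-alarm rate near family_alpha.

    Assumption: a simple Bonferroni-style division, not a method the article names.
    """
    return family_alpha / num_tests


NUM_SNPS = 1_000_000        # "up to 1 million" SNPs on the chip
TRADITIONAL_ALPHA = 0.05    # a bogus result sneaks through 1 time in 20

# With the old hurdle, tens of thousands of SNPs clear the bar by luck alone.
false_hits_old = expected_false_positives(NUM_SNPS, TRADITIONAL_ALPHA)

# Raising the hurdle: 0.05 / 1,000,000 = 5 in 100 million per SNP.
genome_wide_alpha = bonferroni_threshold(TRADITIONAL_ALPHA, NUM_SNPS)

# With the new hurdle, a chance hit anywhere in the scan becomes rare.
false_hits_new = expected_false_positives(NUM_SNPS, genome_wide_alpha)

print(f"Old hurdle: about {false_hits_old:,.0f} chance hits expected")
print(f"New per-SNP threshold: {genome_wide_alpha:.0e}")
print(f"New hurdle: about {false_hits_new:.2f} chance hits expected")
```

Under these assumptions the old hurdle lets roughly 50,000 SNPs through by chance, while the new one expects far fewer than one.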
But higher hurdles require bigger
studies. That’s because much of the muscle power behind these studies depends
on how many participants are included.
New studies need an extra boost of muscle
to hoist the important SNPs over the now-higher bar — otherwise no SNP might get
flagged. So researchers have to scramble
to find money and volunteers. Typical
sample sizes for genetic association studies can now run in the tens of thousands.
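To see why a higher hurdle demands more volunteers, here is a rough sketch using a textbook normal-approximation formula for comparing two proportions. It is not the method of any study mentioned here. The SNP frequencies (30 percent in controls, 33 percent in cases) and the 80 percent power target are hypothetical values chosen for illustration, and each observation is treated as independent.

```python
import math
from statistics import NormalDist


def per_group_sample_size(p_controls, p_cases, alpha, power=0.8):
    """Approximate observations needed per group for a two-sided
    two-proportion z-test (standard normal-approximation formula)."""
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)   # height of the statistical hurdle
    z_power = std_normal.inv_cdf(power)        # margin needed to clear it reliably
    variance = p_controls * (1 - p_controls) + p_cases * (1 - p_cases)
    effect = p_cases - p_controls              # must be nonzero
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical SNP frequencies, chosen only for illustration.
P_CONTROLS = 0.30
P_CASES = 0.33

n_traditional = per_group_sample_size(P_CONTROLS, P_CASES, alpha=0.05)
n_genome_wide = per_group_sample_size(P_CONTROLS, P_CASES, alpha=5e-8)

print(f"Per group at the 1-in-20 hurdle: {n_traditional:,}")
print(f"Per group at the 5-in-100-million hurdle: {n_genome_wide:,}")
```

With these made-up numbers, the stricter hurdle calls for several times as many observations per group, which is the squeeze on money and volunteers described above.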
Even then, added muscle power might
not be enough. So researchers are turning
to multistage studies, too. In such studies
scientists first scan the full genome, then
try to replicate the strongest findings with
new subjects in subsequent studies. “That
really leads to a new type of epidemiology,