Glowing strands of DNA from five people highlight differences among humans in the number of copies of amylase genes, which encode enzymes that break down starch. Red and green probes bind to regions hosting the genes, and each DNA strand has a different number of the genes on the short arm of chromosome 1.
Scientists previously knew that having extra copies of an entire chromosome could lead to disorders, such as the third copy of chromosome 21 that causes Down syndrome. Research had also identified very large deletions that remove so much of a chromosome that the void can be seen under a microscope, and had revealed nips and tucks that remove single genes or parts of genes.

But until the Human Genome Project, an effort to map all the genes and surrounding DNA found in people, was completed, researchers had no way to detect structural changes too small to be seen under the microscope and too big to be detected by looking at individual genes.

Because these variations cover large expanses of the genome rather than single DNA bases, current estimates suggest that structural variation may encompass four times as many bases as SNPs do, meaning that people’s genomes differ by an additional 0.5 percent. So any two people are really only about 99.4 percent alike.

New findings indicate that a substantial portion of otherwise healthy people are missing large chunks of their genomes, gaps that can predispose them to certain diseases. Already, structural variants, especially the type known as copy number variants, have been linked to neurological disorders such as schizophrenia and autism, to susceptibility to HIV infection, to Crohn’s disease and even to differences in body weight.

“Our work in structural variation is showing that no one is really normal,” says Charles Lee, a cytogeneticist at Brigham and Women’s Hospital and Harvard Medical School in Boston. Lee was among the first to discover the broad range of structural variation in the human genome.
Not necessarily two copies
Lee and his colleagues knew from work begun decades ago that some parts of the human genome contain multiple copies of certain genes. For example, light-sensitive opsin proteins, made in the eye and necessary for color vision, are encoded by a cluster of two to nine genes on the X chromosome. Some of the copies encode proteins that are better at sensing green light, while others are specialized for red. The number of copies of the genes affects how well people see colors, and missing certain ones can lead to color blindness.
In another example, each person has many copies of the immune system HLA genes. And blood disorders known as thalassemias arise when copies of genes that encode subunits of hemoglobin, blood’s oxygen-carrying molecule, are missing.
Those examples were thought to be
exceptions to the rule that each gene is
inherited as two copies, one from the
mother and the other from the father.
No one suspected that parents routinely
pass along three, four or more copies of
entire parts of the genome, or sometimes
fail to pass along a whole section.
Because structural variation alters entire sections of a chromosome, the genes within those sections can be copied multiple times, inverted or deleted. As a result, the number of copies of the genes in an altered stretch can differ from person to person.
Even the Human Genome Project was affected by the assumption that most of the genome contains only a single copy of each parent’s DNA, Lee contends. What the project compiled is an averaged human genome, a sequence homogenized from multiple people that represents no real person. About 66 percent of it comes from an anonymous man of European descent. The rest is a mishmash of DNA sequences from several other people.
In favor of a single global template, the project left out the diversity, in both single DNA letters and overall structure, originally present in each of the DNA donors. Multiple copies of a gene look alike or have minor differences that could easily be attributed to glitches in the decoding process. So piecing together the whole genome sequence from snippets of data ended up collapsing what should have been several gene copies into a single gene, Lee says.
He discovered structural deviation
from this generic sequence while testing
a method for detecting abnormalities in
the genomes of tumor cells. Cancer cells
are notorious for deleting, rearranging
and duplicating parts of the genome. Lee
wanted to map those changes by comparing the cancer genomes with the Human
Genome Project consensus sequence.
But first he needed to be sure that DNA
samples from healthy people matched
the consensus sequence.