50 YEARS AGO
… is valuable tool
WASHINGTON — We live in a golden age of scientific data, with larger stockpiles of genetic information, medical images and astronomical observations than ever before. Artificial intelligence can pore over these troves to uncover potential scientific discoveries much more quickly than people can. But we should not blindly trust AI’s scientific insights until these computer programs can better gauge how certain they are in their own results, argues data scientist Genevera Allen of Rice University in Houston.
AI systems that use machine learning — learning by studying data, rather than following instructions — can be trusted with some tasks, Allen says. For example, AI is reliable with work that humans can later verify, like counting moon craters or predicting earthquake aftershocks (SN: 12/22/18 & 1/5/19, p. 25).
More exploratory algorithms that poke around datasets to find previously unknown patterns or relationships “are very hard to verify,” Allen said February 15 at a news conference during the annual meeting of the American Association for the Advancement of Science. Deferring judgment to such autonomous systems may lead to faulty conclusions, she warned.
Take precision medicine. Researchers often aim to find groups of genetically similar patients to help tailor treatments. AI programs that sift through genetic data have identified patient groups for some diseases, such as breast cancer. But such efforts haven’t worked as well for many other conditions, like colorectal cancer. Algorithms examining different datasets have produced conflicting patient classifications. That leaves scientists to wonder which, if any, AI to trust.
Excerpt from the April 5, 1969 issue of Science News

Viruses, which cannot reproduce on their own, infect cells and usurp their genetic machinery for use in making new viruses.... But just how viruses use the cell machinery is unknown.… Some answers may come from work with an unusual virus, called M13, that has a particularly compatible relationship with ... [E. coli]

UPDATE: M13 did help unlock secrets of viral replication. Many viruses that infect bacteria, called bacteriophages or simply phages, kill the host cell after hijacking the cell’s machinery to make copies of themselves. Other phages, including M13, leave the cell intact. Scientists are using phage replication to develop drugs and technologies, such as virus-powered batteries (SN: 4/25/09, p. 12). Adding genetic instructions to phage DNA for making certain molecules lets some phages produce antibodies against diseases such as lupus and cancer. The technique, called phage display, garnered an American-British duo the 2018 Nobel Prize in chemistry (SN: 10/27/18, p. 16).

Why scientific findings by AI can’t always be trusted
These contradictions arise because data-mining algorithms are designed to draw conclusions with no uncertainty, Allen said. “If you tell a clustering algorithm, ‘Find groups in my dataset,’ it comes back and it says, ‘I found some groups.’ ” Tell it to find three groups; it finds three. Request four, and it gives you four.
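That behavior is easy to reproduce. The sketch below is a toy illustration, not any specific program Allen discussed: a minimal k-means (written here from scratch in NumPy) is asked for three groups, then four, on uniform random noise that contains no real groups at all, and it dutifully returns exactly as many as requested.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means for illustration: always partitions X into k groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))  # featureless uniform noise: no true groups exist

for k in (3, 4):
    labels = kmeans(X, k)
    print(k, len(np.unique(labels)))  # asks for k groups, gets back exactly k groups
```

The algorithm reports no doubt about the partition it produces, which is precisely the problem Allen describes.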
What AI should really do, Allen said, is report something like, “I really think that these groups of patients are really, really grouped similarly … but these others over here, I’m less certain.”
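One standard way to attach that kind of per-group confidence is a silhouette score, which compares how tight each group is against how far it sits from its neighbors. This is a generic illustration, not Allen’s own protocol; the data and labels below are invented. Two genuinely compact groups score near 1, while a diffuse, arbitrary “group” scores much lower.

```python
import numpy as np

def silhouette_per_cluster(X, labels):
    """Mean silhouette score per cluster: near 1 means a tight, well-separated
    group; near 0 or negative means the grouping is doubtful."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    ks = np.unique(labels)
    n = len(X)
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        a = D[i, own & (np.arange(n) != i)].mean()                       # cohesion
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])  # separation
        s[i] = (b - a) / max(a, b)
    return {int(k): float(s[labels == k].mean()) for k in ks}

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 2)),    # compact group
               rng.normal(5, 0.1, (30, 2)),    # compact group
               rng.uniform(-2, 7, (30, 2))])   # diffuse noise labeled as a "group"
labels = np.repeat([0, 1, 2], 30)
print(silhouette_per_cluster(X, labels))  # compact groups score high, noise low
```

A program that reported these numbers alongside its groups would be telling the scientist which findings to take seriously.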
Scientists are used to dealing with uncertainty. But traditional uncertainty-measuring techniques are designed for cases where a scientist has analyzed data that were collected to evaluate a hypothesis. That’s not how data-mining AI programs usually work. These systems have no guiding hypotheses and muddle through huge datasets that are generally collected for no single purpose.
Researchers including Allen are designing protocols, however, to help next-generation AI estimate the accuracy and reproducibility of its discoveries. One of these techniques relies on the idea that if an AI program has made a real discovery — like identifying clinically meaningful patient groups — then that finding should hold up in other datasets. It’s generally too expensive to collect new datasets to test what an AI has found. But “we can perturb the [existing] data and randomize the data in a way that mimics [collecting] future datasets,” Allen said. If the AI finds the same types of patient classifications, for example, “you probably have a pretty good discovery on your hands,” she said. — Maria Temming
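The perturb-and-recheck idea can be sketched in a few lines. Everything below is an invented toy, not Allen’s published method: a minimal k-means, Gaussian noise as the perturbation, and a plain Rand index as the agreement measure. The logic is the same, though: recluster many perturbed copies of the data and check that the original grouping keeps reappearing.

```python
import numpy as np

def kmeans(X, k, iters=30, seed=0):
    """Minimal k-means for illustration."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), 1)
        for j in range(k):
            if np.any(lab == j):
                c[j] = X[lab == j].mean(0)
    return lab

def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree
    (insensitive to how the cluster labels are numbered)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), 1)
    return (same_a[iu] == same_b[iu]).mean()

rng = np.random.default_rng(42)
# Two clearly separated "patient groups" in a 2-D feature space.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(4, 0.3, (50, 2))])

base = kmeans(X, 2)
# Mimic "future datasets" by adding small random perturbations, then recluster.
scores = [rand_index(base, kmeans(X + rng.normal(0, 0.2, X.shape), 2, seed=s))
          for s in range(10)]
print(np.mean(scores))  # near 1.0: the grouping is stable under perturbation
```

A discovery that survived this kind of perturbation would be, in Allen’s phrase, “probably a pretty good discovery”; one whose groups dissolved under noise would warrant skepticism.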
Genevera Allen is devising ways to measure uncertainty to help AI programs gauge whether their discoveries are accurate and replicable.