Start with a question.
For centuries, that’s what the scientific method has told us to do. Then, research that question and turn it into a hypothesis to either prove or debunk.
But the power of today’s technology has jumbled up the order of that classical approach. Thanks to computers and the Internet gathering endless amounts of information, massive sets of “big data” are often compiled before any questions are posed.
“In contemporary settings, people often collect data in order to generate hypotheses,” explained Daniela Witten, an associate professor of statistics and biostatistics.
Researchers faced with big data sets often don’t know what they’re looking for — they just know that questions lurk inside. “They want to sort through the data and see if there are interesting patterns they can identify,” Witten said.
That’s where statistics comes in. Witten develops algorithms that dig through big data to locate meaningful information, finding a signal in the noise.
One project she’s working on is a collaboration with researchers in the Department of Genome Sciences, whose labs can sequence the three billion base pairs of the human genome. The problem with that, Witten says, is that it creates three billion potential questions. “We can think, ‘Maybe this base pair is associated with your risk of disease, or maybe that one is,’ and so on,” she said.
The solution? Narrow down the data. With an algorithm, Witten and her collaborators can rank the billions of base pairs by their association with diseases. Used in a clinical setting, that could allow doctors to look directly at the high-risk regions of a patient’s DNA (the signal), and ignore the rest (the noise).
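To make the ranking idea concrete, here is a minimal sketch in Python. It is not Witten's algorithm, which the article does not describe; it assumes a toy setup in which each genome position is coded 0 or 1 per patient and disease status is also 0 or 1, and it scores each position by a plain absolute correlation. The names `association_score`, `rank_positions`, and the `pos_*` labels are hypothetical and chosen only for illustration.

```python
from math import sqrt


def association_score(genotypes, disease):
    """Absolute Pearson correlation between one position and disease status.

    Illustrative stand-in for an association measure; a real analysis would
    use a proper statistical test and correct for multiple comparisons.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_d = sum(disease) / n
    cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(genotypes, disease))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_d = sum((d - mean_d) ** 2 for d in disease)
    if var_g == 0 or var_d == 0:
        # A position that never varies carries no information here.
        return 0.0
    return abs(cov) / sqrt(var_g * var_d)


def rank_positions(variants, disease):
    """Return position labels sorted from strongest to weakest association."""
    scores = {pos: association_score(g, disease) for pos, g in variants.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: six patients, three hypothetical positions.
disease = [1, 1, 1, 0, 0, 0]
variants = {
    "pos_A": [1, 1, 1, 0, 0, 0],  # tracks disease status exactly
    "pos_B": [1, 0, 1, 0, 1, 0],  # only loosely related
    "pos_C": [0, 0, 0, 0, 0, 0],  # identical in every patient
}

ranking = rank_positions(variants, disease)
print(ranking)
```

In this toy example the position that tracks disease status lands at the top of the list (the signal) and the one that never varies lands at the bottom (the noise). At the scale of three billion base pairs, the scoring step would need far more careful statistics than a single correlation.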
A lot of this is uncharted territory, and the ground shifts from one project to the next.
“The field of statistics is still trying to figure out how to analyze these big data sets,” Witten said. “Since the time I started my Ph.D., the type of data we’re seeing has changed so much. As a result, the questions have changed, and the statistical methods have changed. It’s really an exciting time to be doing this.”
Witten has a B.S. and a Ph.D. from Stanford University. She is a 2011 recipient of the National Institutes of Health’s Early Independence Award, a 2013 Sloan Research Fellow, a 2013 National Science Foundation CAREER Awardee, and a three-time member of Forbes Magazine’s 30 Under 30.