Author Sifts Research Data to Help Parents — It Shouldn’t Be This Hard

For most of the 200,000 years humans have lived and loved, parenting information has amounted to folklore. In the 20th century, experts such as Dr. Benjamin Spock offered theories and fads that came and went. Yet even today, no one can agree on whether you should pick up a crying baby or let the child cry it out. Some say sleeping with your baby strengthens your bond, while others warn it puts the baby’s life in danger.

Data analysis promises to provide some solid guidance about how to raise a child. One of the pioneers in the field is Emily Oster, a Brown University economics professor. When Prof. Oster became pregnant with her first child, she was bewildered by all the contradictory parenting advice she found.

“As a parent, you want nothing more than to do right for your children, to make the best choices for them,” Oster wrote in her first book, Expecting Better: Why the Conventional Pregnancy Wisdom Is Wrong–and What You Really Need to Know. “At the same time, it can be impossible to know what those best choices are. We can do better, and data and economics, surprisingly, can help.”

Now the mother of two, Oster followed up her first book with Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, From Birth to Preschool.

In both books, Oster combed through research studies to examine common pregnancy and parenting advice. “The approach of the book is to read the literature and synthesize,” Oster told Seagate. “For the most part, I did not do any of my own data analysis on these questions. There are a few places in the book where I actually did some of the research, and one example where I did collect some of my own data as an exercise. … But mostly it is reading and sorting various studies.”

While Oster wanted to arm parents with information and let them make their own decisions, she seldom found definitive answers showing that a particular practice was clearly harmful or essential.

Fragmented data

Oster’s work identifies a huge barrier to using data to inform real-world practices: fragmented data. Many enterprises still struggle to reduce or eliminate siloed information, but in scientific research the situation is much worse. Each study is, in essence, its own silo. Some services, such as Scopus, Dimensions, and the Web of Science, aggregate information about which studies have been conducted on a particular subject, but the results of each individual effort are typically self-contained.

Oster read individual studies, assessed their methodology, and decided how definitive they were. For example, pregnant women are routinely advised to abstain from all caffeine. After she evaluated the research, she concluded there was no scientific evidence that a woman who drinks one or two cups of coffee a day endangers her pregnancy.

Coming up with an answer to the coffee question was a labor-intensive process for Oster. It is not feasible to use this kind of manual analysis to answer every question about pregnancy, childbirth, and parenting.

Fixing academic research silos

Oster’s data dilemma is hardly unique. Many scientists still store data in spreadsheets, and once a trial is completed and the results have been published, they seldom share data with anyone beyond their collaborators, according to research conducted at the University of North Carolina.

The National Institutes of Health requested public comment last year on revisions to its data-sharing policy. The goal is to improve the ability of researchers to share and access research data. While the NIH has required applicants for large grants to include a data-sharing strategy since 2003, research results are still not widely shared.

A study by Johns Hopkins University, for example, found that only a third of academic institutions have a policy on reporting research results. While reporting and sharing data remain a work in progress, natural language processing and the machine learning branch of artificial intelligence (AI) are now helping researchers automate the process of extracting information from unstructured research publications.

Artificial Intelligence will be key

AI can help address the one-woman data-crunching challenge that Oster faced by automating literature searches, parsing unstructured information, and even generating new hypotheses to investigate, notes a report in Nature.

What’s more, machine learning algorithms can be trained to extract content and filter, rank, and group search results. Euretos is one early-stage tool that combines natural language processing and structured keyword lists to reveal connections among disparate studies that, for example, could identify a disease that a particular drug might treat.
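To make the idea of combining keyword lists with text processing concrete, here is a minimal Python sketch. It is not how Euretos or any other product works. It only illustrates simple keyword co-occurrence, and the keyword lists, study IDs, and abstract texts are all invented for illustration.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical keyword lists and abstracts, invented for illustration only.
EXPOSURES = {"caffeine", "alcohol"}
OUTCOMES = {"miscarriage", "birthweight"}

ABSTRACTS = {
    "study-a": "We examined caffeine intake and miscarriage risk in a prospective cohort.",
    "study-b": "Maternal caffeine and alcohol consumption were recorded alongside birthweight.",
    "study-c": "Sleep duration in infants was measured at six months.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def link_terms(abstracts, group_a, group_b):
    """Map each (term_a, term_b) pair to the studies that mention both terms."""
    links = defaultdict(set)
    for study_id, text in abstracts.items():
        tokens = tokenize(text)
        # Pair every matched term from one list with every matched term from the other.
        for a, b in product(group_a & tokens, group_b & tokens):
            links[(a, b)].add(study_id)
    return dict(links)

links = link_terms(ABSTRACTS, EXPOSURES, OUTCOMES)
for pair, studies in sorted(links.items()):
    print(pair, sorted(studies))
```

Co-occurrence only shows that two terms appear in the same text. It says nothing about what a study actually found, which is why a human reader or a far more capable model still has to assess each paper.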

Another new tool, SourceData, is making the content within the charts and figures in research papers discoverable. When built out, it will extract items such as molecules, genes or organisms from figures and captions to let researchers identify papers that address a question—like, “Does caffeine have an effect on pregnancy?”
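As a rough illustration of making figure captions searchable, the Python sketch below builds a small inverted index from entity terms to the figures that mention them. It is not SourceData's implementation. The entity vocabulary, paper IDs, and captions are hypothetical, and matching is plain word lookup rather than real entity recognition.

```python
import re
from collections import defaultdict

# Hypothetical entity vocabulary and figure captions, invented for illustration only.
ENTITIES = {"caffeine", "pregnancy", "adenosine", "mouse"}

CAPTIONS = [
    ("paper-1", "Figure 2", "Caffeine levels measured during pregnancy in mouse plasma."),
    ("paper-2", "Figure 1", "Adenosine receptor expression after caffeine exposure."),
    ("paper-3", "Figure 4", "Weight gain across pregnancy by trimester."),
]

def find_entities(text):
    """Return the known entity terms that appear as words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & ENTITIES

def build_index(captions):
    """Build an inverted index: entity term -> set of (paper, figure) pairs."""
    index = defaultdict(set)
    for paper_id, figure, caption in captions:
        for entity in find_entities(caption):
            index[entity].add((paper_id, figure))
    return index

def search(index, question):
    """Return the figures whose captions mention every entity found in the question."""
    terms = find_entities(question)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

index = build_index(CAPTIONS)
hits = search(index, "Does caffeine have an effect on pregnancy?")
print(sorted(hits))
```

Requiring every entity in the question to appear in the same caption keeps the results narrow. A real system would also need to handle synonyms and entities that span more than one word.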

New tools can save research in the Data Age

New tools that are helping automate academic research are also being put to work by the private sector. Healthcare companies, for instance, are developing new strategies to capture data digitally.

Clinical data management systems (CDMS), for example, are replacing spreadsheets as a way to manage the data from clinical trials, according to ScienceDirect. But these systems are still mostly used by large pharmaceutical companies running clinical trials, because they’re expensive and require significant IT resources.

New technology is also finding its way into the research world. A recent study published in MobiHealthNews found 64 percent of researchers have used digital health tools, such as wearable sensors, remote monitoring devices, and mobile apps, in their clinical trials, and 97 percent plan to use these tools in the next five years.

Efforts are also underway to aggregate data from different research efforts. For example, AllTrials, an international initiative, calls for a registry of all clinical trials that includes full methods and results. Such a registry would provide a single database in which AI could be put to work to help with analysis.

“Across the information value chain – from reading the data, to preparing it, to critically analyzing it with less bias, to presenting contextual results – AI can and will remove many of the bottlenecks that make users give up when tackling intensive data exploration,” says Dan Sommers, senior director and market intelligence lead at Qlik, a data analytics platform.

When it comes to parenting advice, AI might just enable the next author to derive new pearls of wisdom from automated data analysis, and rely less on the fruits of manual number crunching.