Closing the Data Gap in Cancer Research

February 4, 2022

World Cancer Day 2022 — “Close the Care Gap”

Since its inception in 2000, World Cancer Day has brought together the international cancer community to raise awareness and educate the public on cancer prevention, detection, and treatment. Across the globe, communities gather to advocate for significantly reducing illness and death from cancer. This year’s theme, “Close the Care Gap,” aims to bridge the gap in cancer care through education and dedicated investments in cancer research.

Investments in research continue to increase as new technologies and scientific advancements produce tremendous amounts of data in an effort to better understand cancers of all types. Cancer patients undergo a litany of scientific evaluations and procedures, each of which produces data in different forms and formats. Altogether, a patient produces over 100 million data points per day, including genomic, imaging, analytical, and real-world data/evidence (RWD/RWE) like patient records.

Healthcare data has grown by 878% since 2015 and is set to keep growing at an unprecedented rate, compounding a foundational challenge many organizations face: closing the gap in a chaotic data ecosystem. This blog discusses the complicated nature of cancers, the increasing data gap, and how technology aims to address it.

Research for a new wave of therapeutics 

In the United States, cancer is the second leading cause of death, with an estimated 2 million new cases diagnosed this year. It comprises over 200 unique diseases that stem from a variety of factors and is invariably connected to the individual’s unique biology. The human body is composed of trillions of cells that routinely grow, divide, and multiply to replace old or damaged cells. When this process breaks down, cancer may begin to develop. It’s an inherently complex process, and we’ve only scratched the surface of truly understanding it.

There has been tremendous momentum and funding to support cancer research and discoveries. The National Institutes of Health (NIH) and the National Cancer Institute (NCI) together have invested nearly $110 million in cancer research through their own grant programs to date. In 2015, the Obama Administration launched the Precision Medicine Initiative to “enable a new era of medicine through research, technology, and policies that empower patients, researchers, and providers to work together toward development of individualized treatments.” Cancer research has also become the #1 investment in biotechnology and pharmaceutical R&D. In 2020, Biotechnology Innovation Organization (BIO) released an analysis that found “companies with development pipelines focused on oncology and ‘platforms’ - therapeutic modalities applied to a wide range of disease spaces - topped the list of highest-valued private companies (79%) and IPOs (59%) since 2019.”

The adoption of next-generation sequencing (NGS) in cancer research has provided more information about the genetic variants that cause cancer. With sequencing costs rapidly decreasing over the years, the volume of genomic data has exploded, exposing insights into the likelihood of cancer development and identifying effective treatments for individual patients. In drug discovery, this information helps identify the proverbial “needle in a haystack,” leading to more effective drugs.

Today, researchers edit genetic code with the utmost precision thanks to CRISPR, immunotherapies show great potential in the treatment and prevention of certain types of cancer, and advances in mRNA technology have led to the creation of a global cancer vaccine market that is projected to grow to $24.32 billion by 2030.

These investments in cancer research and the discoveries that followed have turned what was once a “one size fits all” approach into a multi-faceted, comprehensive, and personalized one that has the potential to drastically improve patient outcomes. With it, however, comes a daunting challenge, one that is already creating and perpetuating gaps in accessible and actionable data.

Cancer data — some missing, some big 

As the scope of research grows to drive new breakthroughs and better understand the mechanisms that underpin various cancers, new data sources arise to provide the information necessary to support new conclusions. Properly managing “big data” is a tall order given the high standards for data collection, and data needs are forecast to grow as technology evolves.

Today, data comes in a smorgasbord of heterogeneous formats that include imaging, genomic, molecular, and, increasingly, real-world data/evidence (RWD/RWE). RWD/RWE gathered from patient medical records play a big role in the design of clinical trials and observational studies to determine new approaches, choose the right treatment for patients, and monitor the safety of drugs. Smartphone and wearable device data show potential in providing more comprehensive insights into patient care and have started being included in clinical studies. Clinical trials are an integral aspect of cancer research, yet it’s estimated that fewer than 5% of adult patients participate.

A recent study from the Journal of the American Medical Association (JAMA) of a cohort of more than 4 million patients concluded that patients with incomplete data in their medical records had a significantly lower 2-year survival rate than those with complete data, citing that the “high prevalence of missing data suggests that continued investment in data exchange standards remains an important step toward addressing the missing RWD problem for patients with cancer.” The missing data also excludes patients from large observational studies, limiting the scope of the research, possibly producing erroneous information, and omitting what could be critical insights into how well treatments work.
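One way to make the missing-data problem concrete is to measure how complete each record is before it enters a study. The sketch below is illustrative only: the required field names and the sample records are hypothetical and are not taken from the JAMA study.

```python
# Hypothetical required fields; a real study would define its own list.
REQUIRED_FIELDS = ("diagnosis_date", "stage", "treatment", "follow_up_status")


def completeness(record: dict) -> float:
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for field in REQUIRED_FIELDS if record.get(field) not in (None, "")
    )
    return filled / len(REQUIRED_FIELDS)


def split_by_completeness(records: list) -> tuple:
    """Separate fully complete records from those missing at least one field."""
    complete = [r for r in records if completeness(r) == 1.0]
    incomplete = [r for r in records if completeness(r) < 1.0]
    return complete, incomplete


# Made-up example records for illustration.
records = [
    {
        "diagnosis_date": "2020-01-15",
        "stage": "II",
        "treatment": "surgery",
        "follow_up_status": "alive",
    },
    {
        "diagnosis_date": "2019-06-02",
        "stage": "",
        "treatment": "chemotherapy",
        "follow_up_status": None,
    },
]
complete, incomplete = split_by_completeness(records)
print(len(complete), len(incomplete))
```

A report like this shows how many patients a strict inclusion rule would drop, which is the exclusion effect described above.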

While missing data creates barriers in cancer research, data proliferation brings its own challenges. Data generated from a single tumor could be upwards of 1TB (terabyte), imaging data generate similar amounts, and a single human genome generates up to 200GB (gigabytes). To increase the usability of such massive volumes, the focus shifts to building and implementing more data-centric approaches to how data is collected, structured into standardized formats, and integrated. 
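To put these volumes in perspective, here is a rough back-of-the-envelope estimate using the upper-bound figures quoted above. The cohort size is a hypothetical example, imaging data is left out for simplicity, and 1 TB is taken as 1000 GB.

```python
# Upper-bound per-item sizes quoted in the text, in gigabytes.
TUMOR_DATA_GB = 1000   # "upwards of 1TB" per tumor, assuming 1 TB = 1000 GB
GENOME_DATA_GB = 200   # "up to 200GB" per human genome


def cohort_storage_tb(patients: int, tumors_per_patient: int = 1) -> float:
    """Estimate raw storage in terabytes, assuming one genome per patient."""
    per_patient_gb = tumors_per_patient * TUMOR_DATA_GB + GENOME_DATA_GB
    return patients * per_patient_gb / 1000


# Hypothetical cohort of 500 patients with one tumor sample each.
print(cohort_storage_tb(500))
```

Even a modest study of a few hundred patients lands in the hundreds of terabytes under these assumptions, before any derived or processed data is stored.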

Access to actionable data is paramount to accelerating cancer research. Collaborative, multi-organizational initiatives like GENIE, The Cancer Genome Atlas, and DepMap Portal grant researchers access to databases with vast amounts of high-quality data, while other initiatives directly engage with patients to share samples and clinical data. Together, these efforts increase the inclusion of more comprehensive data and can significantly impact the future of research, treatments, and patient outcomes.

How technology solves the data gap

Cloud technology continues to gain momentum because it provides the elasticity, flexibility, and scalability required to manage the growing scope of data and enable data-driven research. Through the cloud, Moderna was able to deliver its first COVID-19 vaccine candidate to the US National Institutes of Health only 42 days after the initial sequencing of the virus. The cloud can speed scientific innovation and shorten development cycles while providing access to huge volumes of data and the computational resources necessary to analyze them.

New, data-centric approaches and frameworks are emerging in an effort to improve how data is collected, integrated, and harmonized, with the cloud at the forefront. For data to be truly actionable, it must be harmonized in a format that makes it findable, accessible, interoperable, and reusable, the guiding principles of FAIR data. Replatforming to the cloud gives organizations the opportunity to address these challenges and improve data management so they can extract valuable knowledge by applying advanced analytics, like AI/ML.
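As a loose illustration of what checking data against the FAIR principles might look like, the sketch below flags which of the four properties a dataset's metadata fails to support. The record fields and the list of accepted formats are assumptions made for this example; this is a rough checklist, not a formal FAIR assessment.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str    # persistent ID, supports "findable"
    access_url: str    # retrieval location, supports "accessible"
    file_format: str   # format name, supports "interoperable"
    license: str       # usage terms, supports "reusable"


# Assumed set of shared formats for this sketch, not an official list.
STANDARD_FORMATS = {"VCF", "BAM", "DICOM", "FHIR"}


def fair_gaps(record: DatasetRecord) -> list:
    """Return the FAIR properties this record's metadata does not support."""
    gaps = []
    if not record.identifier:
        gaps.append("findable")
    if not record.access_url:
        gaps.append("accessible")
    if record.file_format.upper() not in STANDARD_FORMATS:
        gaps.append("interoperable")
    if not record.license:
        gaps.append("reusable")
    return gaps


# Made-up record: has an ID and URL, but a spreadsheet format and no license.
record = DatasetRecord(
    identifier="dataset-001",
    access_url="https://example.org/dataset-001",
    file_format="xls",
    license="",
)
print(fair_gaps(record))
```

Running a check like this across a data catalog gives a quick view of where harmonization work is still needed.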

As the volume and complexity of data continue to increase significantly over the next few years, researchers will lean more and more on tools to extract and interpret valuable insights, like predicting how a certain type of cancer will progress. Even though the use of advanced analytics and AI/ML in cancer research is in its early stages, it has immense potential to unlock critical insights. These tools can identify patterns in massive datasets, predict the success of specific drug targets, and help develop treatments more efficiently, reducing treatment costs and increasing treatment accessibility.


The global oncology market is estimated to reach $250 billion by 2024, and the NCI projects that cancer-related costs will increase to $246 billion by 2030. With more investment made into cancer research comes more knowledge and insights to drive the emergence of novel and targeted treatments. As research continues to converge with technology at an accelerated rate, more and more approaches to data management evolve in an effort to bridge the data gap and provide access to comprehensive, actionable data. Closing the data gap will help close the care gap, ultimately improving patient care and outcomes.

For more information on World Cancer Day, visit
