Closing the Data Gap in Cancer Research

Victoria Janik | February 4, 2022

World Cancer Day 2022 — “Close the Care Gap”

Since its inception in 2000, World Cancer Day has brought together the international cancer community to promote awareness and educate the public on cancer prevention, detection, and treatment. Across the globe, communities gather to advocate for significantly reducing illness and death from cancer. This year’s theme, “Close the Care Gap,” aims to bridge the gap in cancer care through education and dedicated investments in cancer research.

Investments in research continue to increase as new technologies and scientific advancements produce tremendous amounts of data in an effort to better understand cancers of all types. Cancer patients undergo a litany of scientific evaluations and procedures, each of which produces data in different forms and formats. Altogether, a patient produces over 100 million data points per day, including genomic, imaging, analytical, and real-world data/evidence (RWD/RWE) like patient records.

Healthcare data has grown by 878% since 2015 and is set to keep increasing rapidly, compounding a foundational challenge many organizations face: closing the gap in a chaotic data ecosystem. This blog discusses the complicated nature of cancers, the increasing data gap, and how technology aims to address it.

Research for a new wave of therapeutics

In the United States, cancer is the second leading cause of death, with an estimated 2 million new cases diagnosed this year. Cancer comprises over 200 unique diseases that stem from a variety of factors and are closely tied to each individual’s unique biology. The human body is composed of trillions of cells that routinely divide, grow, and multiply to replace old or damaged cells. When this process breaks down, cancer may begin to develop. It’s an inherently complex process, and we’ve only scratched the surface of truly understanding it.

There has been tremendous momentum and funding to support cancer research and discoveries. The National Institutes of Health (NIH) and the National Cancer Institute (NCI) together have invested nearly $110 million in cancer research through their own grant programs to date. In 2015, the Obama Administration launched the Precision Medicine Initiative to “enable a new era of medicine through research, technology, and policies that empower patients, researchers, and providers to work together toward development of individualized treatments.” Cancer research has also become the top investment area in biotechnology and pharmaceutical R&D. In 2020, the Biotechnology Innovation Organization (BIO) released an analysis that found “companies with development pipelines focused on oncology and ‘platforms’ - therapeutic modalities applied to a wide range of disease spaces - topped the list of highest-valued private companies (79%) and IPOs (59%) since 2019.”

The adoption of next-generation sequencing (NGS) in cancer research has provided more information about the genetic variants that cause cancer. As sequencing costs have dropped rapidly over the years, the volume of genomic data has exploded, yielding insights into the likelihood of cancer development and helping identify effective treatments for individual patients. In drug discovery, this information helps find the proverbial “needle in a haystack,” leading to more effective drugs.

Today, researchers splice genetic code with the utmost precision thanks to CRISPR, immunotherapies show great potential in the treatment and prevention of certain types of cancer, and advances in mRNA technology have led to the creation of a global cancer vaccine market that is projected to grow to $24.32 billion by 2030.

These investments in cancer research and the discoveries that followed have turned what was once a “one size fits all” approach into a multi-faceted, comprehensive, and personalized one that has the potential to drastically improve patient outcomes. With it, however, comes a daunting challenge, one that is already creating and perpetuating gaps in accessible and actionable data.

Cancer data — some missing, some big 

As the scope of research grows to drive new breakthroughs and better understand the mechanisms that underpin various cancers, new data sources arise to provide the information necessary to support new conclusions. Properly managing “big data” is a tall order that demands high data collection standards, and data needs are forecast to grow as technology evolves.

Today, data comes in a smorgasbord of heterogeneous file formats spanning imaging, genomic, molecular, and, increasingly, real-world data/evidence (RWD/RWE). RWD/RWE gathered from patient medical records plays a big role in the design of clinical trials and observational studies to determine new approaches, choose the right treatment for patients, and monitor the safety of drugs. Smartphone and wearable device data show potential in providing more comprehensive insights into patient care and have started being included in clinical studies. Clinical trials are an integral aspect of cancer research, yet it’s estimated that fewer than 5% of adult patients participate.

A recent study in the Journal of the American Medical Association (JAMA) of a cohort of more than 4 million patients concluded that patients with incomplete data in their medical records had a significantly lower 2-year survival rate than those with complete data, citing that the “high prevalence of missing data suggests that continued investment in data exchange standards remains an important step toward addressing the missing RWD problem for patients with cancer.” Missing data also excludes patients from large observational studies, which limits the scope of the research, can produce erroneous findings, and omits what could be critical insights into how well treatments work.

While missing data creates barriers in cancer research, data proliferation brings its own challenges. Data generated from a single tumor can exceed 1 TB (terabyte), imaging data reaches similar volumes, and a single human genome generates up to 200 GB (gigabytes). To make such massive volumes usable, the focus shifts to building and implementing more data-centric approaches to how data is collected, structured into standardized formats, and integrated.

Access to actionable data is paramount to accelerating cancer research. Collaborative, multi-organizational initiatives like GENIE, The Cancer Genome Atlas, and the DepMap Portal grant researchers access to databases with vast amounts of high-quality data, while other initiatives engage directly with patients to share samples and clinical data. Together, these efforts make more comprehensive data available and can significantly impact the future of research, treatments, and patient outcomes.

How technology solves the data gap

Cloud technology continues to gain momentum because it provides the elasticity, flexibility, and scalability required to manage the increasing scope of data and enable data-driven research. Through the cloud, Moderna was able to deliver its first COVID-19 vaccine candidate to the US National Institutes of Health only 42 days after the initial sequencing of the virus. The cloud can speed scientific innovation and shorten development cycles by providing access to huge volumes of data and the computational resources necessary to analyze it.

New, data-centric approaches and frameworks are emerging in an effort to improve how data is collected, integrated, and harmonized, with the cloud at the forefront. For data to be truly actionable, it must be harmonized in a format that makes it findable, accessible, interoperable, and reusable: the guiding principles of FAIR data. Replatforming to the cloud gives organizations the opportunity to address these challenges and improve data management so they can extract valuable knowledge by applying advanced analytics, like AI/ML.
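
As a rough illustration of what harmonization can look like in practice, the sketch below maps records from two differently structured sources into one common shape and reports which fields are missing. Everything in it is an assumption made for illustration: the source names, field names, and the `HarmonizedRecord` schema are hypothetical and do not represent any real standard or product.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical common schema; the field names are illustrative only.
@dataclass
class HarmonizedRecord:
    patient_id: Optional[str]
    source: str
    data_type: Optional[str]
    size_gb: Optional[float]


# Hypothetical mappings from each source's field names to the common schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "genomics_lab": {"pid": "patient_id", "assay": "data_type", "gigabytes": "size_gb"},
    "imaging_dept": {"PatientID": "patient_id", "Modality": "data_type", "SizeGB": "size_gb"},
}


def harmonize(source: str, raw: Dict[str, Any]) -> HarmonizedRecord:
    """Rename source-specific fields to the common schema; absent fields become None."""
    mapping = FIELD_MAPS[source]
    fields = {common: raw.get(src) for src, common in mapping.items()}
    return HarmonizedRecord(
        patient_id=fields["patient_id"],
        source=source,
        data_type=fields["data_type"],
        size_gb=fields["size_gb"],
    )


def missing_fields(record: HarmonizedRecord) -> List[str]:
    """List the common-schema fields that have no value for this record."""
    return [name for name, value in vars(record).items() if value is None]


if __name__ == "__main__":
    genome = harmonize("genomics_lab", {"pid": "P001", "assay": "WGS", "gigabytes": 200.0})
    scan = harmonize("imaging_dept", {"PatientID": "P001", "Modality": "MRI"})
    print(genome, missing_fields(genome))
    print(scan, missing_fields(scan))
```

Once both sources share the same field names, records for the same patient can be found and combined with a single query, and gaps such as the missing file size on the imaging record are visible rather than silently lost.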

As the volume and complexity of data continue to increase significantly over the next few years, researchers will lean more and more on tools to extract and interpret valuable insights, like predicting how a certain type of cancer will progress. Even though the use of advanced analytics and AI/ML in cancer research is in its early stages, it has immense potential to unlock critical insights. AI/ML can identify patterns in massive datasets, predict the success of specific drug targets, and help develop treatments more efficiently, reducing treatment costs and increasing treatment accessibility.

Conclusion

The global oncology market is estimated to reach $250 billion by 2024, and the NCI projects that cancer-related costs will increase to $246 billion by 2030. More investment in cancer research brings more knowledge and insights to drive the emergence of novel and targeted treatments. As research continues to converge with technology at an accelerated rate, new approaches to data management are evolving to bridge the data gap and provide access to comprehensive, actionable data. Closing the data gap will help close the care gap, ultimately improving patient care and outcomes.

For more information on World Cancer Day, visit https://www.worldcancerday.org/
