August 27, 2020
Data Integration

99 Problems, but an SDMS Ain't One

Break down data silos and make managing scientific data easier


Authors: Mike Tarselli, CSO; Spin Wang, CEO

Data volumes in biology, chemistry, and materials science continue to explode. This leads to rising complexity for processing workflows: legacy systems just can’t keep up with innovation demand. Holistic data solutions, coupled with data-centric capabilities to enable the full lifecycle of R&D data, bring these glaring limitations to light. Can we stem the data tide?

Open, configurable data solutions with vendor-agnostic connectivity will empower life science organizations to advance discovery and innovation.[1]

Let’s consider how a scientific data management system (SDMS) might impair the critical innovation path instead of facilitating an open, interactive data future.

What is an SDMS, anyway?

So many letters, so little time - an SDMS is software acting like a filing cabinet. It captures, catalogues, and archives all versions of data from instruments -- HPLCs, mass specs, flow cytometers, sequencers -- and scientific applications like LIMS or ELNs. The extracted data are stored and handled in context- and process-specific data formats, which usually maintain consistent metadata and a defined structure.

Essentially, it’s a data warehouse with known limitations: potential incompatibility with heterogeneous data landscapes, and difficulty ingesting new experimental data or preparing data for downstream analysis tools. This perpetuates a common biotech Achilles’ heel - the dreaded data silo.

Break out of the SDMS and eliminate all your woes

Woe #1: Missed connections
An SDMS doesn’t play well with others: it can’t link disparate data sources and targets, like devices, external collaborators, or software systems (ELNs, databases, etc.).

Open access to more information
The Tetra R&D Data Cloud easily ingests heterogeneous data sources across the R&D ecosystem. By unifying data in the cloud via native connections, scientists are freed from the data yoke.

Woe #2: Wrangling data
Data harmonization involves converting vendor-specific or proprietary formats, which often takes manual steps today. An SDMS can’t automatically harmonize data into query-ready formats, or make data easy to apply to visualization or analysis.

Standardize, harmonize
Automatically harmonize experimental data across vendors and formats into a structured, open format -- an Intermediate Data Schema (IDS) or Parquet -- enabling data science and Big Data applications. This means disparate sources can finally be united in standard formats, allowing you to glean insights from all data captured. This wrings value from your precious data, no matter where it came from.
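
To make the idea concrete, here is a minimal sketch of harmonization in Python. Everything in it is an assumption for illustration: the two vendor record layouts, the field names, and the `HarmonizedPeak` schema are hypothetical, and it is not the IDS or any TetraScience API. It only shows the general pattern of mapping vendor-specific records onto one shared structure.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HarmonizedPeak:
    """Hypothetical common schema for one chromatography peak."""
    instrument: str
    sample_id: str
    retention_time_min: float
    area: float


def from_vendor_a(rec: dict) -> HarmonizedPeak:
    # Assumed layout: vendor A uses flat keys and reports retention time in seconds.
    return HarmonizedPeak(
        instrument="vendor_a_hplc",
        sample_id=rec["SampleName"],
        retention_time_min=rec["RT_sec"] / 60.0,
        area=float(rec["PeakArea"]),
    )


def from_vendor_b(rec: dict) -> HarmonizedPeak:
    # Assumed layout: vendor B nests the sample ID and already reports minutes.
    return HarmonizedPeak(
        instrument="vendor_b_hplc",
        sample_id=rec["sample"]["id"],
        retention_time_min=float(rec["rt_min"]),
        area=float(rec["area"]),
    )


# One parser per source; adding a vendor means adding one entry here.
PARSERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def harmonize(source: str, rec: dict) -> dict:
    """Convert a vendor-specific record into the common schema."""
    return asdict(PARSERS[source](rec))


raw = [
    ("vendor_a", {"SampleName": "S-001", "RT_sec": 150, "PeakArea": "1200.5"}),
    ("vendor_b", {"sample": {"id": "S-002"}, "rt_min": "3.1", "area": 980}),
]
harmonized = [harmonize(source, rec) for source, rec in raw]
print(json.dumps(harmonized, indent=2))
```

Once every record shares the same keys and units, the list can be queried directly or written out to a columnar format such as Parquet with a library of your choice.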

Woe #3: Too much static
An SDMS is a stale, limited repository with no customization opportunities to open your data flows. It simply cannot handle the evolving complexity of research projects.[2]

Let it flow
Comprehensive data engineering capabilities will streamline data prep for AI/ML and advanced analytics. Data flows seamlessly throughout the development process, which allows for easy iteration. The TetraScience Data Platform allows pharma companies to create their own processing pipelines, configure triggers, view status and files, and automate notifications via a modern centralized dashboard...a literal control nexus for all your operations.
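
As a rough illustration of what "pipelines, triggers, status, and notifications" can mean in practice, here is a small Python sketch. The `Pipeline` and `Dispatcher` classes, the event fields, and the step functions are all hypothetical names invented for this example; they are not the TetraScience Data Platform's API, just one simple way to wire the pieces together.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical pipeline: runs its steps when the trigger matches a file event."""
    name: str
    trigger: Callable[[dict], bool]
    steps: list[Callable[[dict], dict]]


@dataclass
class Dispatcher:
    """Hypothetical dispatcher that tracks run status and collects notifications."""
    pipelines: list[Pipeline] = field(default_factory=list)
    status: dict[str, str] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)

    def on_file(self, event: dict) -> None:
        for pipeline in self.pipelines:
            if not pipeline.trigger(event):
                continue  # this pipeline is not configured for this file
            key = f"{pipeline.name}:{event['path']}"
            try:
                data = dict(event)
                for step in pipeline.steps:
                    data = step(data)
                self.status[key] = "completed"
            except Exception as exc:
                # A failed step marks the run as failed and queues a notification.
                self.status[key] = "failed"
                self.notifications.append(f"{key} failed: {exc}")


def tag_instrument(data: dict) -> dict:
    return {**data, "instrument": "hplc"}


def require_sample_id(data: dict) -> dict:
    if "sample_id" not in data:
        raise ValueError("missing sample_id")
    return data


dispatcher = Dispatcher([
    Pipeline(
        name="hplc-ingest",
        trigger=lambda event: event["path"].endswith(".csv"),
        steps=[tag_instrument, require_sample_id],
    )
])
dispatcher.on_file({"path": "run1.csv", "sample_id": "S-001"})
dispatcher.on_file({"path": "run2.csv"})   # triggers, then fails validation
dispatcher.on_file({"path": "notes.txt"})  # no trigger matches, so nothing runs
print(dispatcher.status)
print(dispatcher.notifications)
```

A dashboard in this picture is simply a view over `status` and `notifications`; a real platform would add persistence, retries, and access control on top.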

Woe #4: Stuffing the cabinet
Limited integration with the myriad file formats found in R&D means an SDMS can easily become yet another data silo, ruling out straightforward data management or collaboration.

Declutter and streamline
Automatic centralization of results in the Tetra R&D Data Cloud enables seamless access, ensures data integrity, and allows for secure collaboration anywhere.

Woe #5: You're gonna need a bigger boat!
R&D data is Big Data.[3] An on-premises SDMS can’t digest that scale and complexity when organizations are unable to leverage the benefits of the cloud. Without flexibility or the ability to scale, innovation plummets. To swim this data lake, you’re gonna need a bigger boat.

Floating in the cloud...on a yacht
TetraScience’s cloud-native architecture provides organizations the right tools to tackle the complexities in R&D. It makes it easy to manage high volumes of data and complicated workflows with access to the data in a secure environment.

Summary

Certainly, an SDMS will work for your organization if you only use one data format or just need a place to stash your files. It’s simple, easy, and stable...but also limited in capability. R&D data are complex by nature, and complex challenges require novel approaches. Holistic data solutions that are adaptable, flexible, and scalable optimize data management and eliminate data silos. Get rid of outdated systems that perpetuate roadblocks and cause data woes - and say “Whoa!” to the breakneck speed of digital transformation!

  1. N. Limaye, "Data Integration: Changing the Pharma and Healthcare Landscape," www.Technologynetworks.com/biopharma, 27 February 2020
  2. T. Broekel, "Measuring technological complexity - Current approaches and a new measure of structural complexity," Utrecht University, 12 March 2018
  3. J. Cumbers, "How The Cloud Can Solve Life Science's Big Data Problem," www.Forbes.com, 19 December 2019

Mike Tarselli, Ph.D.
Chief Scientific Officer, TetraScience. Mike remains curious about chemistry, drug development, data flows, and scientific collaboration.