What is a True Data Integration, Anyway?

Spin Wang
March 26, 2021

Data Integration is one of the biggest challenges the life sciences industry faces in its journey to leverage AI/ML. Data gets stuck in proprietary, vendor-specific formats and interfaces. Data silos are connected via rigid, unsustainable, one-off point-to-point connections. As the number of disparate systems grows - thanks to equipment obsolescence, new instrument models, evolving data standards, acquisitions, and a host of other factors - internal maintenance of this spider web of connections quickly exceeds any life sciences organization’s internal IT capabilities.

Here, we’ll introduce our definition of a true integration with an R&D data system - in particular, with a lab instrument or its control software. We’ll also explain why our audacious approach - building and maintaining an expanding library of agents, connectors, and apps - can unify ALL pharma and biotech data silos, a foundation for accelerating drug discovery and R&D.

How can we knit together disparate instruments and software systems into a logical platform?

Fragmented Ecosystem 

Let’s start from an assumption we believe is well-understood: life sciences R&D data systems are notoriously fragmented and heterogeneous. We’ve touched on this before (post and video).

Systems diverge on:

  • File formats - 20+ in common use for mass spectrometry alone
  • Instrument data schemas
  • Physical and data interfaces (OPC/UA, API, Software Toolkit, Serial Port, etc.)
  • FDA submission standards
  • Terminology differences between ELNs, LIMS, LES (workflow, reaction, scheme, process)
  • And numerous other variables

AI/ML and digital transformation depend on data liquidity

Do you want clean, curated data sets with consistent headers, aligned with FAIR principles, auditable, traceable, and portable? High-quality data is the only way to enable AI/ML. To achieve this data liquidity, your divergent instrument outputs must “flow” across the organization, connecting the disjointed landscape via a vendor-agnostic open network.

What is a Tetra Integration?

So what is TetraScience’s definition of a true R&D Data Integration - one that enables automation, unites diverse data systems, and powers AI/ML?

A true R&D Data Integration with a particular data system must clear a high bar. It must be:

  • Configurable and productized: flexible knobs adjust how data is pulled from sources or pushed to targets. Configuration should be documented, tailored to the particular data system, and generally achievable with a few clicks from a central configuration portal - remotely and securely, without logging into on-prem IT environments
  • Bidirectional: the integration should support both pulling data from the data system and pushing data into it, treating the system as data source and data target simultaneously and enabling the full Design-Make-Test-Analyze (DMTA) cycle of scientific discovery
  • Automated: little or no manual intervention is needed to pull data from sources or push data to targets; new data is detected automatically so that every change is captured
  • Compliant: all changes to the integration, including configuration, are traced in the user action audit trail; operations on the data set are logged and can be associated to provide a fully traceable, transparent history of the data set, enabling fully GxP-validatable solutions
  • Complete: if the integration is designed to extract data from a source, it must extract ALL scientifically meaningful information possible, including raw data, processed results, and experimental context such as sample, system, and users
  • Enabling true data liquidity: the integration must not stop at moving data around; it must also harmonize the data into vendor-agnostic, data-science-compatible formats so that data can be consumed by or pushed to any target and flow freely across systems
  • Chainable: the output of one integration can trigger the next. For example, data pulled from an ELN can trigger a push integration to instrument control software, avoiding manual transcription of the batch or sample list; conversely, data pulled from instruments can trigger a push integration that submits the data to the ELN or LIMS

A true R&D Data Integration must necessarily be “full-stack and purpose-built” — configurable data collection, harmonization to vendor-neutral formats, preparation for analytics consumption, automated push to data targets, and tailored to the system and related scientific use cases — so that scientists and data scientists can access and take actions on previously siloed data in order to accelerate discovery.
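To make the "chainable" criterion concrete, here is a minimal, hypothetical sketch of one integration's output triggering the next - a pull from an ELN feeding a push to instrument control software. All names (Integration, pull_from_eln, etc.) are illustrative, not TetraScience APIs.

```python
# Hypothetical sketch of chainable integrations: the output of one
# integration triggers downstream integrations automatically.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Integration:
    name: str
    run: Callable[[dict], dict]
    on_complete: List["Integration"] = field(default_factory=list)

    def execute(self, payload: dict) -> dict:
        result = self.run(payload)
        for downstream in self.on_complete:   # chaining: output triggers next
            downstream.execute(result)
        return result

# Pull a sample list from an ELN, then push it to instrument control
# software, avoiding manual transcription of the batch/sample list.
push_to_instrument = Integration(
    "push-to-instrument",
    run=lambda p: {"pushed": p["samples"]},
)
pull_from_eln = Integration(
    "pull-from-eln",
    run=lambda p: {"samples": ["S-001", "S-002"]},
    on_complete=[push_to_instrument],
)

result = pull_from_eln.execute({})
print(result)  # {'samples': ['S-001', 'S-002']}
```

The same pattern runs in reverse: an instrument pull can chain into a push that submits results back to the ELN or LIMS.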

A true R&D Data Integration can be achieved via a combination of Tetra agents, connectors, and pipelines depending on the specific data systems. For example: 

  • To integrate with Waters Empower, we leverage a data agent to draw information from a Chromatography Data System (CDS) using a vendor toolkit
  • To integrate with NanoTemper Prometheus, we leverage the Tetra file-log agent and Tetra Data Pipelines
  • To integrate with Solace and AGU SDC, we build connectors using RESTful API services
  • To integrate with osmometers, blood gas analyzers, or shaking incubators, we use an IoT agent to stream continuous data from a physical, mounted instrument to the cloud through secure MQTT 
  • To integrate with Benchling, IDBS eWorkbook, Dotmatics Studies, PerkinElmer Signals and to push data into the ELNs, we use Tetra Data Pipelines
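For the IoT-agent path above, a reading from a serial-connected instrument is typically encoded as a small JSON message and published to a broker topic over secure MQTT. The sketch below shows that shape; the topic scheme and payload fields are illustrative assumptions, not Tetra's actual wire format.

```python
# Hedged sketch: encoding one instrument reading (e.g., from an
# osmometer) and publishing it over MQTT. Payload schema and topic
# naming are illustrative only.
import json
import time

def make_payload(instrument_id: str, metric: str, value: float) -> str:
    """Encode a single instrument reading as a JSON message."""
    return json.dumps({
        "instrument_id": instrument_id,
        "metric": metric,
        "value": value,
        "timestamp": time.time(),
    })

def publish_reading(client, instrument_id: str, metric: str, value: float):
    topic = f"lab/{instrument_id}/{metric}"   # illustrative topic scheme
    client.publish(topic, make_payload(instrument_id, metric, value), qos=1)

# Real usage would connect a paho-mqtt client over TLS, roughly:
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.tls_set()                          # TLS for the secure channel
#   client.connect("broker.example.com", 8883)
#   publish_reading(client, "osmometer-01", "osmolality", 290.0)
```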

Tetra’s “Special Sauce”: Productized and data-centric integrations

Important Note: When we use the term true R&D Data Integration, we reject simple “drag-and-drop” of instrument RAW files into a data lake. To meet our criteria and quality standards, we must contextually transform source data into a harmonized format, like JSON. These integrations are differentiators for the Tetra Data Platform; to us, if you’re moving files without true parsing and interpretation, no value is added.

Differentiated from LIMS/SDMS: IoT agent and software agent/connector

Most life sciences data integrations are performed by LIMS and SDMS software, which need to bring data from different sources into the ELN/LIMS for tracking and reporting, and into the SDMS for storage. LIMS and SDMS rely on two major methods:

  • Serial-to-Ethernet adapters for instruments such as osmometers and analyzers
  • File-based export and import

While these may be viable options, they are far from optimal for an organization trying to upgrade to an Industry 4.0 model. ELN, LIMS, and SDMS have traditionally relied on file export from the instrument for the majority of their integrations with instruments and with instrument control and processing software.

Data harmonization for data science and data liquidity

Extraction from source systems is insufficient to claim true integration. Imagine a data scientist with access to thousands of PDFs from a Malvern Particle Sizer, thousands of mass spec binary files from Waters MassLynx, or TA Instruments differential scanning calorimetry (DSC) binary files; in these formats, the data’s value stays locked away and cannot impact R&D.

Other than the file name and path, these binary files are essentially meaningless to other data analytics applications. The data needs to be further harmonized into our Intermediate Data Schema (IDS), based on JSON and Parquet, so that any application can consume it and R&D teams can apply their data science tools.
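As a minimal sketch of what harmonization means in practice, the snippet below turns a key-value vendor export into a vendor-agnostic JSON record. The field names and schema tag are illustrative assumptions, not the actual Tetra IDS.

```python
# Minimal harmonization sketch: vendor text export -> IDS-like JSON
# record. Field names ("ids_version", "sample", etc.) are illustrative.
import json

def harmonize(vendor_lines):
    """Parse a 'Key: value' vendor export into a vendor-agnostic record."""
    raw = dict(line.split(": ", 1) for line in vendor_lines)
    return {
        "ids_version": "example/1.0",            # illustrative schema tag
        "sample": {"id": raw["Sample ID"]},
        "system": {"instrument": raw["Instrument"]},
        "result": {"value": float(raw["Z-Average (nm)"]), "unit": "nm"},
    }

export = [
    "Sample ID: LOT-42",
    "Instrument: Particle Sizer",
    "Z-Average (nm): 112.4",
]
record = harmonize(export)
print(json.dumps(record, indent=2))
```

Once every record shares one schema, writing the collection out to Parquet (e.g., via pandas/pyarrow) makes it directly queryable by analytics tools.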

Fostering a community based on data liquidity 

TetraScience has taken on the challenging, audacious stance of building, maintaining, and even upgrading sophisticated integrations; we believe this is a first in the life sciences data industry, which has long suffered from the vendor data-silo problem:

  • An instrument OEM’s primary business driver is selling instruments and consumables
  • An informatics application provider’s primary goal is to get more data flowing into its own software

The R&D Data Cloud and the companion Tetra Integrations are designed entirely to serve the data itself, liberating it without introducing any proprietary layer of interpretation. If your software can read JSON or Parquet and talk to SQL, you can immediately benefit from the Tetra Integration philosophy.
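To illustrate that "read JSON or Parquet, talk to SQL" consumption path, here is a small sketch that loads harmonized records into a SQL table and aggregates them. The table, columns, and values are illustrative; with Parquet files, an engine such as DuckDB or Athena plays the same role as SQLite here.

```python
# Sketch of SQL consumption of harmonized records. Schema and data
# are illustrative examples, not real instrument output.
import sqlite3

records = [
    ("LOT-42", "Particle Sizer", 112.4),
    ("LOT-43", "Particle Sizer", 98.7),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE results (sample_id TEXT, instrument TEXT, z_avg_nm REAL)"
)
con.executemany("INSERT INTO results VALUES (?, ?, ?)", records)

# Any SQL-speaking tool can now aggregate across formerly siloed data.
avg = con.execute("SELECT AVG(z_avg_nm) FROM results").fetchone()[0]
print(round(avg, 2))  # 105.55
```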

Our cloud-native and Docker-based platform allows us to leverage the entire industry’s momentum to rapidly develop, test, and enhance these integrations with real customer feedback. Rapid iteration and distribution of consistent, reproducible integrations across our customer base introduces more use cases, more test cases, and more battle-tested improvements for the entire scientific community. 

Check out some of our Tetra Integrations, and request an integration for your team right on that page. We're always interested in hearing from you!
