Data Science Use Cases for the Digital Lab: Novel Analyses with Waters Empower CDS Data

Spin Wang | June 1, 2020
  • Biopharma R&D has not realized the full power of the data it possesses and generates every day. There are advanced, easy-to-use data science tools, and there is a vast amount of data. But the two are disconnected: the data is locked in on-premises silos and heterogeneous formats, and it often lacks the interface needed to derive insights at scale.
  • This “Data Science Use Cases for the Digital Lab” series of blog posts shares some obvious and non-obvious ways top pharmaceutical organizations are applying data science to extract novel insights from their R&D data.
  • We crowdsource use cases through our partnerships with top pharmaceutical organizations. These partnerships are enabled by our ever-growing network of connections to common data sources and data science tools, and by our cloud-native platform, which automatically collects, centralizes, harmonizes, and prepares R&D data for analysis.
  • Use cases in this blog post include: analytical method performance | column degradation | instrument usage and operational insights | stability trending | chromatogram overlay | quality procedure assessment | metadata quality reporting.

Overview

The use of data science across all industries has risen in recent years. At the same time, modern laboratories have seen massive growth in data volumes. A single pharmaceutical company can have hundreds of High Performance Liquid Chromatography (HPLC) instruments used in process development, quality control, downstream bioprocessing, manufacturing, and many other stages of drug R&D. It is common to have millions or tens of millions of injections that need to be available for analysis, with thousands of new injections produced daily.

Waters Empower is one of the most widely used chromatography data systems (CDS). A CDS controls chromatographic instruments such as HPLCs, runs the injections, collects the raw data, and processes the chromatograms to detect and integrate peaks.

As with most R&D instruments and their control software, accessing the data contained within the Waters Empower CDS has traditionally been challenging. Our Data Science Link for Waters Empower makes the CDS data instantly accessible and actionable. Data scientists can now perform analyses and identify insights using their preferred data science tools, without hours of data wrangling.

We have included more information about how the Data Science Link for Waters Empower works at the bottom of this post.

Waters Empower CDS Data: Data Science Use Cases

Once your Empower data is in the cloud, harmonized, structured, and connected to your favorite data science tools – a non-trivial effort – what meaningful analyses can your data scientists and analysts perform? Through partnerships with many of the world’s leading pharmaceutical companies, we have collected several obvious and non-obvious data science use cases to help you get started. The use cases are grouped logically, and some of them overlap. We will continue to add to this list over time. If you have a use case to add, we want to hear it! Contact information is listed at the bottom of this post.

1 - Analytical method performance

Data science tools can easily automate trending and cluster analysis of method performance characteristics (for example, peak tailing factor, resolution, and relative retention time). This helps ensure continued method performance and gives R&D organizations a fast, flexible feedback cycle.

Such analysis can inform experiment design, driving continuous improvement and optimization of the methods.
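As a minimal sketch of the trending step, the Python below averages one performance characteristic per method per month. It assumes results have already been exported from Empower into simple records. The field names (method, month, tailing, resolution) and the values are hypothetical, and the sketch covers trending only, not cluster analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened peak results; field names and values are illustrative only.
results = [
    {"method": "ASSAY-01", "month": "2020-01", "tailing": 1.10, "resolution": 3.2},
    {"method": "ASSAY-01", "month": "2020-01", "tailing": 1.14, "resolution": 3.1},
    {"method": "ASSAY-01", "month": "2020-02", "tailing": 1.25, "resolution": 2.9},
    {"method": "IMP-02", "month": "2020-01", "tailing": 1.30, "resolution": 2.2},
    {"method": "IMP-02", "month": "2020-02", "tailing": 1.32, "resolution": 2.1},
]


def monthly_trend(rows, metric):
    """Return {method: [(month, mean of metric), ...]} ordered by month."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["method"], row["month"])].append(row[metric])
    trend = defaultdict(list)
    # Sorting the (method, month) keys keeps each method's points in time order.
    for (method, month), values in sorted(buckets.items()):
        trend[method].append((month, round(mean(values), 3)))
    return dict(trend)
```

With the sample data, `monthly_trend(results, "tailing")` shows the mean tailing factor for ASSAY-01 rising from 1.12 in January to 1.25 in February. That is the kind of drift a trend chart makes visible.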

2 - Column degradation

This use case is similar to the previous one but focuses on column performance. Plotting key performance or system suitability parameters, such as peak tailing factor, resolution, or symmetry, for the same set of samples run at different times can clearly reveal the trend of column degradation.

Such information is crucial when teams generate control charts, predict the lifetime of a column, or transfer a method to other teams or to CROs/CDMOs.

Use parameters such as peak area, retention time, and tailing factor from System Suitability Test (SST) runs to "predict" failures before they occur. Use this information to define safe operating limits, automate reports and alerts when instruments or columns approach these limits, and build more intelligent control charts.
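A Shewhart-style control chart is one way to turn SST history into operating limits and alerts. The sketch below is an assumption, not a prescribed procedure. It derives limits from a baseline set of SST results for one column, uses mean ± 2σ as a warning band and mean ± 3σ as an action limit, and labels later results against them. The baseline values, band widths, and labels are illustrative. The sketch flags drift as it develops; it does not forecast a failure date.

```python
from statistics import mean, stdev


def control_limits(baseline, k):
    """Return (lower, center, upper) = mean -/+ k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def classify(values, baseline, warn_k=2.0, action_k=3.0):
    """Label each new SST value 'ok', 'warning' (outside warn_k sigma) or 'alert' (outside action_k sigma)."""
    warn_lo, _, warn_hi = control_limits(baseline, warn_k)
    act_lo, _, act_hi = control_limits(baseline, action_k)
    labels = []
    for value in values:
        if value < act_lo or value > act_hi:
            labels.append("alert")
        elif value < warn_lo or value > warn_hi:
            labels.append("warning")
        else:
            labels.append("ok")
    return labels


# Hypothetical tailing factors: early SST injections on a new column, then later ones.
baseline_tailing = [1.10, 1.12, 1.08, 1.11, 1.09]
recent_tailing = [1.11, 1.14, 1.20]
```

With this data, `classify(recent_tailing, baseline_tailing)` labels the three recent results "ok", "warning", and "alert". For a parameter that mostly degrades in one direction, such as tailing factor, a one-sided check against the upper limits may be sufficient.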

Image: Peak area mean value vs. time for each column


Image: Peak tailing factor vs. time for each column


3 - Instrument usage and operational insights

HPLC is a major workhorse in R&D, so it is crucial to optimize the usage of these instruments and to understand the operational efficiency of R&D teams through HPLC data. Data science can give you operational insights into the following aspects of HPLC instruments:

  • Instrument utilization. Easily analyze the distribution of your instrument usage, identify instruments that sit idle, understand why, and optimize your capital spend.
  • Instrument uptime for better preventative maintenance. For example, sum the run time of all injections performed on each HPLC system and identify which instrument has the longest cumulative run time. This can serve as a useful indicator for preventative maintenance.
  • Distribution of your sample and method parameters. For example, suppose the team runs a large percentage of injections using a particular method with a long run time, and you are looking for a method to optimize. That method may give you the biggest return on investment.
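The three analyses above reduce to simple aggregations once injection records are available. This sketch assumes each record is a hypothetical (instrument_id, method, run_time_minutes) tuple, plus a separate instrument inventory so that idle instruments can be found. It sums run time per instrument, lists instruments with no injections, and computes each method's share of total run time.

```python
from collections import Counter

# Hypothetical injection records: (instrument_id, method, run_time_minutes).
injections = [
    ("HPLC-01", "ASSAY-01", 30.0),
    ("HPLC-01", "IMP-LONG", 45.0),
    ("HPLC-02", "ASSAY-01", 12.5),
    ("HPLC-01", "ASSAY-01", 30.0),
    ("HPLC-03", "IMP-LONG", 60.0),
]
# Hypothetical inventory; HPLC-04 has no injections at all.
inventory = ["HPLC-01", "HPLC-02", "HPLC-03", "HPLC-04"]


def run_time_by(records, position):
    """Sum run time grouped by the field at `position` (0 = instrument, 1 = method)."""
    totals = Counter()
    for record in records:
        totals[record[position]] += record[2]
    return totals


def usage_report(records, instruments):
    """Rank instruments by cumulative run time and list the idle ones."""
    totals = run_time_by(records, 0)
    ranked = totals.most_common()  # longest cumulative run time first
    idle = [i for i in instruments if i not in totals]
    return ranked, idle


def method_share(records):
    """Fraction of total run time consumed by each method, largest first."""
    totals = run_time_by(records, 1)
    grand_total = sum(totals.values())
    return [(m, round(t / grand_total, 3)) for m, t in totals.most_common()]
```

In the sample data, HPLC-01 has the longest cumulative run time (105 minutes), HPLC-04 is idle, and IMP-LONG accounts for about 59% of all run time. That would make IMP-LONG the first candidate for method optimization.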

Image: Instrument usage analysis


4 - Stability trending

A stability study is a common type of analysis. Organizing the data, creating the right report or analysis, and predicting shelf life usually take a lot of time and manual effort. Once your data is accessible and prepared, however, data science can automate stability analysis and enhance the insights.

  • Plot the key metrics of the Active Pharmaceutical Ingredient (API) and its impurities at different time points
  • Use visualizations and data science tools to perform statistical analysis on the key stability indicator trend
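A common starting point for statistical analysis of a stability trend is a least-squares line through each indicator, extrapolated to its specification limit. This sketch uses hypothetical data and hypothetical limits (95.0% assay, 0.50% impurity). Regulatory shelf-life estimation under ICH Q1E uses the 95% one-sided confidence limit of the regression line rather than the line itself. Treat this point estimate as an optimistic trend indicator, not a shelf-life claim.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def time_to_limit(x, y, limit):
    """Time at which the fitted line reaches `limit`, or None if it never does going forward."""
    intercept, slope = fit_line(x, y)
    if slope == 0:
        return None
    t = (limit - intercept) / slope
    return t if t >= 0 else None


# Hypothetical long-term stability data for one batch.
months = [0, 3, 6, 9, 12]
assay_pct = [100.0, 99.4, 98.8, 98.2, 97.6]     # API assay, % of label claim
impurity_pct = [0.10, 0.13, 0.16, 0.19, 0.22]   # one degradation impurity, %
```

On this data, the assay line reaches 95.0% at about 25 months and the impurity line reaches 0.50% at about 40 months. The earlier of the two is the limiting indicator.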

5 - Chromatogram overlay

Tables and charts are useful; however, scientists often need to visually inspect a group of chromatograms to understand anomalies and subtle changes in the results. Such a visual representation helps the brain spot patterns easily, and it often triggers more in-depth quantitative analysis and comparison.
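Overlaying is mostly a plotting task, but chromatograms from different runs are often sampled at different time points. This sketch assumes each trace is available as parallel lists of retention times and detector responses, which is a hypothetical structure. It resamples the traces onto one shared grid with linear interpolation. The aligned lists can then go to any plotting tool, and a point-wise difference gives a quick quantitative follow-up.

```python
from bisect import bisect_left


def interpolate(times, signal, t):
    """Linearly interpolate one trace at time t; `times` must be strictly ascending."""
    if t <= times[0]:
        return signal[0]
    if t >= times[-1]:
        return signal[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    y0, y1 = signal[i - 1], signal[i]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def resample(traces, start, stop, step):
    """Put several traces on one shared time grid so they can be overlaid or subtracted."""
    n = int(round((stop - start) / step)) + 1
    grid = [start + k * step for k in range(n)]
    aligned = {name: [interpolate(t, y, g) for g in grid] for name, (t, y) in traces.items()}
    return grid, aligned


def max_deviation(reference, other):
    """Largest point-wise absolute difference between two aligned traces."""
    return max(abs(a - b) for a, b in zip(reference, other))


# Hypothetical traces (retention time in minutes, detector response) sampled on different grids.
traces = {
    "reference": ([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]),
    "run_2": ([0.0, 0.8, 1.6, 2.4], [0.0, 4.0, 8.0, 0.0]),
}
```

Real chromatograms usually also need baseline correction and retention-time alignment before a subtraction is meaningful. Linear interpolation only handles the sampling mismatch.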

Image: Chromatogram overlay


6 - Quality procedure assessment

It is crucial to follow the appropriate quality procedures while working in an R&D environment. Data science can help quality teams assess adherence to established procedures.

For example:

  • Flag when manual integration is used instead of the processing method defined in the CDS
  • Flag injections that are processed multiple times
  • Flag injections that are not signed off properly or that do not have enough sign offs
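The three checks listed above can be expressed as simple rules over processed results. This is a minimal sketch that assumes a hypothetical, simplified record in which each entry is one processing pass of an injection. It is not Empower's actual schema, and the required number of sign-offs is an assumption to be set from your own SOP.

```python
from collections import Counter
from dataclasses import dataclass, field

REQUIRED_SIGNOFFS = 2  # assumption: set this from your own SOP


@dataclass
class Result:
    """Hypothetical, simplified view of one processed result; not Empower's schema."""
    injection_id: str
    manual_integration: bool
    signoffs: list = field(default_factory=list)


def quality_flags(results, required=REQUIRED_SIGNOFFS):
    """Return (injection_id, reason) pairs for results that deviate from the procedure."""
    flags = []
    for r in results:
        if r.manual_integration:
            flags.append((r.injection_id, "manual integration"))
        if len(r.signoffs) < required:
            flags.append((r.injection_id, f"{len(r.signoffs)} of {required} sign-offs"))
    # Each Result is one processing pass, so repeated IDs mean re-processing.
    for injection_id, count in Counter(r.injection_id for r in results).items():
        if count > 1:
            flags.append((injection_id, f"processed {count} times"))
    return flags


results = [
    Result("INJ-1", False, ["analyst", "reviewer"]),
    Result("INJ-2", True, ["analyst", "reviewer"]),
    Result("INJ-3", False, ["analyst"]),
    Result("INJ-3", False, ["analyst", "reviewer"]),
]
```

On the sample records, INJ-2 is flagged for manual integration, and INJ-3 is flagged both for a missing sign-off and for being processed twice.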

Quality teams can closely monitor any deviation or anomaly from the organization's established process and take immediate action to remediate it and prevent errors from propagating downstream. For example, if the ratio of results without sign-off suddenly increases, that may indicate a change in process, which may be intentional or a mistake.
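To watch for a sudden increase in results without sign-off, one option is to compute the unsigned ratio per week and compare consecutive weeks. This sketch assumes each result is reduced to a hypothetical (iso_week, signed_off) pair, and the 10-percentage-point jump threshold is an assumption. Comparing against only the previous week is the simplest possible rule; a control chart on the ratio would be more robust.

```python
from collections import Counter


def unsigned_ratio_by_week(rows):
    """rows: (iso_week, signed_off) pairs -> {week: fraction of results without sign-off}."""
    totals, unsigned = Counter(), Counter()
    for week, signed_off in rows:
        totals[week] += 1
        if not signed_off:
            unsigned[week] += 1
    return {week: unsigned[week] / totals[week] for week in sorted(totals)}


def sudden_increases(ratios, jump=0.10):
    """Weeks whose ratio exceeds the previous week's by more than `jump` (an assumed threshold)."""
    weeks = list(ratios)
    return [w for prev, w in zip(weeks, weeks[1:]) if ratios[w] - ratios[prev] > jump]


# Hypothetical data: 10 results per week; the third week has 4 without sign-off.
rows = (
    [("2020-W01", True)] * 9 + [("2020-W01", False)]
    + [("2020-W02", True)] * 9 + [("2020-W02", False)]
    + [("2020-W03", True)] * 6 + [("2020-W03", False)] * 4
)
```

Here the unsigned ratio jumps from 10% to 40% in 2020-W03, so `sudden_increases` returns that week for review.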

Continuously assessing adherence to quality procedures helps reduce mistakes quickly, increasing trust and synergy across departments.

7 - Metadata quality reporting

Metadata integrity is often difficult to assess. Missing and mislabeled information has a significant impact on the reporting and usage of R&D data. Data science tools can assess the completeness and consistency of key business-related metadata entered in the Waters Empower CDS. Armed with such an assessment, teams can take action to improve metadata quality. After all, you get what you measure.

So, what does "metadata quality" mean? Here are some examples:

  • Each injection may have a custom field defined in Empower called ELN_Experiment_ID. Entering this accurately is crucial to aggregate or "join" the Empower data with experiment set-up information in the ELN, or to further automate data transfer into the ELN or LIMS.
  • For HPLCs that do not use a Waters column, Empower is not able to automatically gather the column information because the eCord will be missing. If you want to run a column degradation analysis like the one described above, then it is crucial to understand which data sets have the Waters eCord populated. For those using non-Waters columns, it is crucial to know if ColumnId is entered properly and consistently as a custom field.
  • Scientists likely use several slightly different naming conventions for their sample sets and methods that mean the same thing. It is helpful to look at the variation and uncover obvious or hidden conventions.
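One lightweight way to uncover naming variants is to normalize names and group the ones that collapse to the same key. This sketch treats case, whitespace, underscores, hyphens, and dots as insignificant, which is an assumption about what counts as "the same" name. The method names are hypothetical. It only catches spelling variants; names that use different words for the same thing still need a human or a lookup table.

```python
import re
from collections import defaultdict


def normalize(name):
    """Drop case, whitespace, underscores, hyphens and dots so near-identical names share a key."""
    return re.sub(r"[\s_\-.]+", "", name).lower()


def naming_variants(names):
    """Group names that normalize to the same key; keep only keys with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical method names pulled from Empower projects.
method_names = ["Assay_Method_v2", "assay-method-v2", "Assay Method V2", "Impurity_A"]
```

For the sample list, the three spellings of the assay method are grouped under one key, and Impurity_A is left out because it has no variants.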

With access to the metadata in a consumable format, data science tools can automatically flag experiments that do not follow the sample or method naming convention, or that are missing specific custom fields (for example, ColumnId or ELN_Experiment_ID).
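A minimal sketch of such flagging follows. It checks one injection record against a sample-naming regular expression and the custom fields named above. The naming convention, the record keys (SampleName, custom_fields, ecord_serial), and the sample records are hypothetical. Following the eCord discussion above, the sketch accepts column identity from either an eCord or a ColumnId custom field.

```python
import re

# Hypothetical convention, e.g. "PRJ1-0042-STB-R1": project code, study number, sample type, replicate.
SAMPLE_NAME = re.compile(r"[A-Z]{3}\d-\d{4}-[A-Z]{3}-R\d+")


def check_injection(record):
    """Return a list of metadata issues for one injection record (hypothetical field names)."""
    issues = []
    name = record.get("SampleName", "")
    if not SAMPLE_NAME.fullmatch(name):
        issues.append(f"Sample name {name!r} does not match the naming convention")
    custom = record.get("custom_fields", {})
    # Column identity can come from the Waters eCord or, for other columns, a ColumnId custom field.
    if not record.get("ecord_serial") and not custom.get("ColumnId", "").strip():
        issues.append("No eCord data and no ColumnId custom field")
    if not custom.get("ELN_Experiment_ID", "").strip():
        issues.append("Missing custom field ELN_Experiment_ID")
    return issues


good = {"SampleName": "PRJ1-0042-STB-R1", "ecord_serial": "E123",
        "custom_fields": {"ELN_Experiment_ID": "EXP-7"}}
bad = {"SampleName": "stab sample 3", "custom_fields": {"ColumnId": " "}}
```

`check_injection(good)` returns no issues. `check_injection(bad)` returns three: the sample name, the missing column identity (a blank ColumnId counts as missing), and the missing ELN_Experiment_ID.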

Identifying these metadata quality issues and providing feedback in real time will vastly improve metadata integrity, making your R&D data much more actionable.

To improve metadata quality, you can follow these three steps.

Note: these steps apply to any other type of instrument or workflow.

Surface metadata quality as a metric

Using tools like Spotfire or Power BI, you can flag injections with incorrect naming. For example, a dashboard can be refreshed automatically every day, providing an up-to-date view of the quality of crucial metadata.

Reporting

For the injections run last month, what percentage have incorrect or missing metadata? Is this percentage decreasing or increasing over time?
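Both reporting questions can be answered from per-injection pass/fail results such as those produced by a metadata check. This sketch assumes hypothetical (month, has_issue) pairs. It computes the monthly percentage of injections with issues and compares the latest month with the previous one. That comparison is a deliberately simple definition of "trend"; a longer window would smooth out noise.

```python
from collections import defaultdict


def monthly_issue_rate(rows):
    """rows: (month 'YYYY-MM', has_issue) pairs -> [(month, % of injections with issues)], oldest first."""
    total, flagged = defaultdict(int), defaultdict(int)
    for month, has_issue in rows:
        total[month] += 1
        flagged[month] += int(has_issue)
    return [(m, round(100.0 * flagged[m] / total[m], 1)) for m in sorted(total)]


def direction(rates):
    """Compare the latest month with the one before it."""
    if len(rates) < 2:
        return "not enough data"
    previous, latest = rates[-2][1], rates[-1][1]
    if latest < previous:
        return "decreasing"
    if latest > previous:
        return "increasing"
    return "flat"


# Hypothetical: 20 injections per month with 5, 3 and 2 metadata issues respectively.
rows = []
for month, n_bad in [("2020-01", 5), ("2020-02", 3), ("2020-03", 2)]:
    rows += [(month, True)] * n_bad + [(month, False)] * (20 - n_bad)
```

For this data the rates are 25.0%, 15.0%, and 10.0%, so the direction is "decreasing".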

Notification and action

If you have well-defined business rules for the metadata structure, TetraScience can automatically trigger data pipelines that notify scientists by sending an email to a distribution list. The email might look like the following:

Thursday 2020 Feb 20, 10:10:10AM CET
Pipeline “Waters Empower metadata verification”
Pipeline failed due to

  • Error: Sample name “xxxxxxx” does not match the pre-defined naming convention
  • Error: Missing critical custom fields “ColumnId” and “LIMS_Request_ID”
  • Warning: “ExperimentID 112313 is not available in the ELN, recommend...”
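TetraScience pipelines send these notifications for you. Purely to illustrate the structure of such a message, here is a generic sketch using Python's standard `email` package that composes an equivalent alert. It is not TetraScience's API, and the addresses and wording are hypothetical.

```python
from email.message import EmailMessage


def build_notification(pipeline, errors, warnings, to_addr, from_addr):
    """Compose (but do not send) a plain-text alert listing rule violations."""
    msg = EmailMessage()
    msg["Subject"] = f'Pipeline "{pipeline}" failed'
    msg["From"] = from_addr
    msg["To"] = to_addr
    lines = [f'Pipeline "{pipeline}" failed due to', ""]
    lines += [f"  • Error: {e}" for e in errors]
    lines += [f"  • Warning: {w}" for w in warnings]
    msg.set_content("\n".join(lines))
    return msg


msg = build_notification(
    "Waters Empower metadata verification",
    errors=["Missing critical custom field ColumnId"],
    warnings=["ExperimentID not found in the ELN"],
    to_addr="lab-team@example.com",          # hypothetical distribution list
    from_addr="data-pipelines@example.com",  # hypothetical sender
)
```

To actually deliver the message, `smtplib.SMTP(host).send_message(msg)` works against any reachable SMTP server. Keeping composition separate from sending makes the message easy to test.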

Summary

The R&D data locked inside the Waters Empower CDS silo has tremendous potential, if only you could access it via the cloud and use your data science tools to identify insights and take action! The use cases listed above will drive value and improve efficiency for your teams.

Read on for details about how our Data Science Link for Waters Empower enables these analyses. We can also collect and aggregate important information from other instruments, ELN, or LIMS, together with your Empower CDS data, using the TetraScience Platform.

And let us know if you have a use case to add to the list!

TetraScience’s Data Science Link

The Data Science Link is an end-to-end application of the Tetra Data Platform that automates data acquisition from the most complex (and frequently used) R&D lab instruments, harmonizes it, and moves it to popular data science tools where it can be analyzed.

Image: Data Science Link for Waters Empower schematic

TetraScience's Empower Data Agent automatically collects CDS data

Waters Empower CDS is one of the most complex and frequently used data systems in biopharma R&D. TetraScience’s Empower Data Agent is the world's fastest and most sophisticated product for data extraction from the Waters Empower CDS. It supports advanced features such as:

  • Configuring data extraction based on projects of interest and sign-off status
  • Detecting changes or re-processed injections

Deployment

Installing the Empower Data Agent takes on the order of minutes. Point it to the TetraScience Platform, and Empower data will immediately flow to the cloud, where it is harmonized and made available to your data science tools within a few minutes.

Data Science and Analytics Tools

Once the Data Science Link has extracted, harmonized, and prepared your Empower data, easily connect the following data science and analytics tools to access the data:

These are the tools most frequently used by our current customers, and we regularly add new connections. We also help customers develop and configure their own applications, written in any programming language they choose.

Share another data science use case or inquire about the Data Science Link application by contacting us at solution@tetrascience.com.
