June 1, 2020
Data Science

Data Science Use Cases for the Digital Lab: Novel Analyses with Waters Empower CDS Data

  • Biopharma R&D has not realized the full power of the data it possesses and generates every day. Advanced, easy-to-use data science tools exist, and so does a vast amount of data, but the two are disconnected: the data is locked in on-premises silos and heterogeneous formats, and often lacks the interface needed to derive insights at scale.
  • This “Data Science Use Cases for the Digital Lab” series of blog posts shares some obvious and non-obvious ways top pharmaceutical organizations are applying data science to extract novel insights from their R&D data.
  • We crowdsource use cases through our partnerships with top pharmaceutical organizations. These partnerships are enabled by our ever-growing network of connections to common data sources and data science tools, and by our cloud-native platform, which automatically collects, centralizes, harmonizes, and prepares R&D data for analysis.
  • Use cases in this blog post include: analytical method performance | column degradation | instrument usage and operational insights | stability studies trending | chromatogram overlay | quality procedure assessment | metadata quality reporting.

Overview

The use of data science has been rising across industries in recent years. At the same time, modern laboratories have experienced massive growth in data volumes. A single pharmaceutical company can have hundreds of High-Performance Liquid Chromatography (HPLC) instruments used in process development, quality control, downstream bioprocessing, manufacturing, and many other stages of drug R&D. It is common to have millions or tens of millions of injections that need to be available for analysis, with thousands of new injections produced daily.

Waters Empower is one of the most widely used chromatography data systems (CDS). A CDS controls chromatographic instruments such as HPLCs, runs the injections, collects the raw data, and analyzes the chromatogram to detect peaks.

As with most R&D instruments and their control software, accessing the data contained within the Waters Empower CDS has traditionally been challenging. Our Data Science Link for Waters Empower makes the CDS data instantly accessible and actionable. Data scientists can now perform analyses and identify insights using their preferred data science tools, without hours of data wrangling.

We have included more information about how the Data Science Link for Waters Empower works at the bottom of this post.

Waters Empower CDS Data: Data Science Use Cases

Once your Empower data is in the cloud, harmonized, structured, and connected to your favorite data science tools (a non-trivial effort), what meaningful analyses can your data scientists and analysts perform? After partnering with many of the world’s leading pharmaceutical companies, we have collected several obvious and non-obvious data science use cases that will help you get started. The use cases are organized logically, although some of them overlap. We will continue to add to this list over time. If you have a use case to add, we want to hear it! Contact information is listed at the bottom of this post.

1 - Analytical method performance

Data science tools can easily automate trending and cluster analysis of method performance characteristics (for example, peak tailing factor, resolution, and relative retention time), ensuring continued method performance and providing R&D organizations with a fast and flexible feedback cycle.

Such analysis can be used in experiment design to drive continuous improvements and optimization of the methods.
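
As a minimal sketch of this kind of trending, assume the injections have been exported as simple records. The field names `method`, `tailing_factor`, and `resolution` are hypothetical and are not Empower's actual schema. The snippet computes the mean and relative standard deviation (%RSD) of one characteristic per method:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical injection records; field names and values are illustrative only.
injections = [
    {"method": "Assay_A", "tailing_factor": 1.10, "resolution": 2.5},
    {"method": "Assay_A", "tailing_factor": 1.20, "resolution": 2.4},
    {"method": "Assay_A", "tailing_factor": 1.30, "resolution": 2.3},
    {"method": "Purity_B", "tailing_factor": 1.00, "resolution": 3.0},
    {"method": "Purity_B", "tailing_factor": 1.00, "resolution": 3.2},
]

def summarize_by_method(records, metric):
    """Return {method: {"n", "mean", "rsd_pct"}} for one performance metric."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["method"]].append(rec[metric])
    summary = {}
    for method, values in grouped.items():
        avg = mean(values)
        # %RSD needs at least two values; report None otherwise.
        rsd = 100 * stdev(values) / avg if len(values) > 1 else None
        summary[method] = {"n": len(values), "mean": avg, "rsd_pct": rsd}
    return summary

tailing_summary = summarize_by_method(injections, "tailing_factor")
for method, stats in tailing_summary.items():
    print(method, stats)
```

A method whose %RSD grows over successive reporting periods is a natural candidate for closer review.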

2 - Column degradation

This is similar to the previous use case but focused primarily on column performance. Plotting key performance or suitability parameters, such as peak tailing factor, resolution, or symmetry, for one set of sample runs at different times shows a clear trend of column degradation.

Such information is crucial when teams generate control charts, predict column lifetime, or transfer the method to other teams or to CROs/CDMOs.

Use factors such as peak area, retention time, and tailing factor from the System Suitability Test (SST) runs to "predict" failures before they occur. Use this information to define safe operating limits, automate reporting and alerts when instruments/columns are approaching these limits, and build more intelligent control charts.
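
One way to turn SST results into operating limits and alerts is sketched below. It assumes a common control-chart convention that is not stated in this post: the first few runs on a column form a baseline, warnings fire beyond 2 standard deviations, and a run beyond 3 standard deviations is treated as out of control. The function names, baseline size, and data are illustrative.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Shewhart-style limits (mean +/- n_sigma * stdev) from baseline runs."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - n_sigma * spread, center + n_sigma * spread

def flag_runs(values, baseline_size=5, warn_sigma=2.0, fail_sigma=3.0):
    """Label each post-baseline run as 'ok', 'warning', or 'out_of_control'."""
    baseline = values[:baseline_size]
    warn_lo, warn_hi = control_limits(baseline, warn_sigma)
    fail_lo, fail_hi = control_limits(baseline, fail_sigma)
    labels = []
    for v in values[baseline_size:]:
        if v < fail_lo or v > fail_hi:
            labels.append("out_of_control")
        elif v < warn_lo or v > warn_hi:
            labels.append("warning")
        else:
            labels.append("ok")
    return labels

# Hypothetical SST tailing factors for one column, in chronological order.
tailing = [1.10, 1.12, 1.08, 1.11, 1.09, 1.10, 1.14, 1.20]
labels = flag_runs(tailing)
print(labels)
```

The "warning" label is the early signal: it marks runs that are drifting toward the limit before an actual SST failure occurs.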

Image: Peak area mean value vs. time for each column

Image: Peak tailing factor vs. time for each column

3 - Instrument usage and operational insights

HPLC is a major workhorse in R&D. It is therefore crucial to optimize the usage of these instruments and to understand the operational efficiency of R&D teams through HPLC data. Data science can help you gain operational insights into the following aspects of HPLC instruments:

  • Instrument utilization. Easily analyze the distribution of your instrument usage. Identify the instruments that are not used. Better understand why and optimize your capital spend.
  • Instrument up-time for better preventative maintenance. For example, sum up the injection run time from all the injections performed on each HPLC system and identify which instrument has the longest cumulative run time. This can serve as a great indicator for preventative maintenance.
  • Understand the distribution of your sample and method parameters. For example, if the team runs a large percentage of injections using a particular method with long injection run time, and you are looking for a method to optimize, then that particular method may give you the biggest return on investment.
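
The three insights above reduce to simple aggregations once injection records are available. The sketch below assumes a hypothetical instrument inventory and records with `instrument`, `method`, and `run_time_min` fields; none of these names come from Empower itself.

```python
from collections import defaultdict

# Hypothetical inventory and injection records; names are illustrative only.
instruments = ["HPLC-01", "HPLC-02", "HPLC-03"]
injections = [
    {"instrument": "HPLC-01", "method": "Assay_A", "run_time_min": 30.0},
    {"instrument": "HPLC-01", "method": "Assay_A", "run_time_min": 30.0},
    {"instrument": "HPLC-02", "method": "Purity_B", "run_time_min": 12.5},
]

def total_run_time(records, key):
    """Sum injection run time (minutes) grouped by the given record field."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["run_time_min"]
    return dict(totals)

by_instrument = total_run_time(injections, "instrument")
by_method = total_run_time(injections, "method")

# Instruments in the inventory with no injections at all.
unused = [name for name in instruments if name not in by_instrument]
# Instrument with the longest cumulative run time: a maintenance candidate.
busiest = max(by_instrument, key=by_instrument.get)

print(by_instrument, by_method, unused, busiest)
```

Grouping by `method` instead of `instrument` shows which methods consume the most instrument time and are therefore the most rewarding to optimize.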

Image: Instrument usage analysis

4 - Stability trending

A stability study is a common type of analysis. It usually takes a lot of time and manual effort to organize the data, create the right report or analysis, and predict shelf life. However, once your data is accessible and prepared, data science can automate stability analysis and enhance the insights.

  • Plot the key metrics of an Active Pharmaceutical Ingredient (API) and other impurities at different time points
  • Use visualizations and data science tools to perform statistical analysis on the key stability indicator trend
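
As an illustration of the statistical step, the sketch below fits a straight line to hypothetical assay results and estimates when the fitted trend would reach a lower specification limit. The data, the 95% limit, and the function names are assumptions made for this example.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def months_until_limit(x, y, lower_limit):
    """Month at which the fitted line crosses lower_limit, or None if the
    trend is flat or increasing."""
    slope, intercept = fit_line(x, y)
    if slope >= 0:
        return None
    return (lower_limit - intercept) / slope

# Hypothetical stability data: API assay (% label claim) at each pull point.
months = [0, 3, 6, 9, 12]
assay_pct = [100.0, 99.5, 99.0, 98.5, 98.0]

slope, intercept = fit_line(months, assay_pct)
crossing = months_until_limit(months, assay_pct, lower_limit=95.0)
print(slope, intercept, crossing)
```

A formal shelf-life estimate would normally rely on a confidence bound around the regression line rather than on the fitted line alone; this sketch only shows the trend calculation.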

5 - Chromatogram overlay

A set of tables and charts is useful, but scientists often need to visually inspect a group of chromatograms to understand anomalies and subtle changes in the results. Such a visual representation makes patterns easier to spot and often triggers more in-depth, quantitative analysis and comparison.

Image: Chromatogram overlay

6 - Quality procedure assessment

It is crucial to follow the appropriate quality procedures while working in an R&D environment. Data science can help quality teams assess adherence to established procedures.

For example:

  • Flag when manual integration is used instead of the CDS' processing method
  • Flag injections that are processed multiple times
  • Flag injections that are not signed off properly or that do not have enough sign offs

Quality teams can closely monitor any deviation or anomaly from the organization's established process and immediately take action to remediate it and prevent errors from propagating downstream. For example, if the ratio of results without sign-off suddenly increases, that may indicate a change in process, which may be intentional or a mistake.

Constantly assessing the quality procedure will rapidly reduce mistakes, increasing trust and synergy across departments.
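
The flags described in this section can be expressed as simple rules over result records. The sketch below assumes hypothetical fields (`injection_id`, `manual_integration`, `sign_offs`) with one record per processed result, and an assumed policy of two required sign-offs. None of these are Empower defaults.

```python
from collections import Counter

REQUIRED_SIGN_OFFS = 2  # assumed policy, not an Empower default

# Hypothetical result records; field names are illustrative only.
results = [
    {"injection_id": 1, "manual_integration": False, "sign_offs": 2},
    {"injection_id": 2, "manual_integration": True, "sign_offs": 2},
    {"injection_id": 2, "manual_integration": False, "sign_offs": 1},
    {"injection_id": 3, "manual_integration": False, "sign_offs": 0},
]

def quality_flags(records, required_sign_offs=REQUIRED_SIGN_OFFS):
    """Return {injection_id: sorted list of flags} for injections with issues."""
    times_processed = Counter(rec["injection_id"] for rec in records)
    flags = {}
    for rec in records:
        found = flags.setdefault(rec["injection_id"], set())
        if rec["manual_integration"]:
            found.add("manual_integration")
        if times_processed[rec["injection_id"]] > 1:
            found.add("processed_multiple_times")
        if rec["sign_offs"] < required_sign_offs:
            found.add("insufficient_sign_offs")
    return {inj: sorted(f) for inj, f in flags.items() if f}

flagged = quality_flags(results)
# Share of results with no sign-off at all, useful for spotting sudden changes.
unsigned_ratio = sum(r["sign_offs"] == 0 for r in results) / len(results)
print(flagged, unsigned_ratio)
```

Tracking `unsigned_ratio` per week or month gives the kind of signal described above: a sudden jump suggests a process change worth investigating.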

7 - Metadata quality reporting

Metadata integrity is often difficult to assess. Missing and mislabeled information has a significant impact on the reporting and usage of R&D data. Data science tools can assess the completeness and consistency of key business-related metadata entered in the Waters Empower CDS. Armed with such an assessment, teams can take action to improve metadata quality. After all, you get what you measure.

So, what does "metadata quality" mean? Here are some examples:

  • Each injection may have a custom field defined in Empower called ELN_Experiment_ID. Entering this accurately is crucial to aggregate or "join" the Empower data with experiment set-up information in the ELN, or to further automate data transfer into the ELN or LIMS.
  • For HPLCs that do not use a Waters column, Empower is not able to automatically gather the column information because the eCord will be missing. If you want to run a column degradation analysis like the one described above, then it is crucial to understand which data sets have the Waters eCord populated. For those using non-Waters columns, it is crucial to know if ColumnId is entered properly and consistently as a custom field.
  • Scientists likely use several slightly different naming conventions for their sample sets and methods that mean the same thing. It is helpful to look at the variation and uncover obvious or hidden conventions.

With access to the metadata, in a consumable format, data science tools can automatically flag experiments without the proper sample naming convention, method naming convention, or instances missing specific custom fields (for example, ColumnId or ELN_Experiment_ID).

Identifying these metadata quality issues and providing feedback in real time will vastly improve metadata integrity, making your R&D data much more actionable.
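
A minimal sketch of such automated flagging follows. The naming convention (`PROJECT-YYYYMMDD-NNN`) and the record layout are invented for illustration; only the custom field names `ColumnId` and `ELN_Experiment_ID` come from the examples above.

```python
import re

# Assumed naming convention for illustration: PROJECT-YYYYMMDD-NNN
SAMPLE_NAME_PATTERN = re.compile(r"[A-Z]{2,6}-\d{8}-\d{3}")
REQUIRED_FIELDS = ("ColumnId", "ELN_Experiment_ID")

def check_metadata(injection):
    """Return a list of human-readable metadata issues for one injection."""
    issues = []
    name = injection.get("sample_name", "")
    if not SAMPLE_NAME_PATTERN.fullmatch(name):
        issues.append(f"Sample name {name!r} does not match the naming convention")
    custom = injection.get("custom_fields", {})
    for field in REQUIRED_FIELDS:
        # Treat absent, None, and whitespace-only values as missing.
        if not str(custom.get(field) or "").strip():
            issues.append(f"Missing custom field {field}")
    return issues

# Hypothetical injections; the record layout is illustrative only.
good = {"sample_name": "ABC-20200220-001",
        "custom_fields": {"ColumnId": "C18-0042", "ELN_Experiment_ID": "112313"}}
bad = {"sample_name": "test run 3", "custom_fields": {"ColumnId": " "}}

print(check_metadata(good))
print(check_metadata(bad))
```

Each organization would substitute its own pattern and list of required fields; the value lies in running the same checks on every injection.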

To improve metadata quality, you can follow these three steps.

Note: these steps apply equally to other types of instruments and workflows.

Surface metadata quality as a metric

Using tools like Spotfire or Power BI, you can flag injections with incorrect naming. For example, a dashboard can be refreshed automatically every day, providing an up-to-date view of the “quality” of crucial metadata.

Reporting

For the injections run last month, what percentage had incorrect or missing metadata? Is this percentage decreasing or increasing over time?
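
Both questions can be answered with a small aggregation. The sketch below assumes each injection record carries a run date and a boolean `has_metadata_issue` produced by whatever validation rules you use; both field names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_issue_rate(records):
    """Return {(year, month): percent of injections with a metadata issue}."""
    totals = defaultdict(int)
    with_issue = defaultdict(int)
    for rec in records:
        key = (rec["run_date"].year, rec["run_date"].month)
        totals[key] += 1
        if rec["has_metadata_issue"]:
            with_issue[key] += 1
    return {key: 100 * with_issue[key] / totals[key] for key in sorted(totals)}

# Hypothetical records; `has_metadata_issue` would come from checks such as
# naming-convention or required-field validation.
records = [
    {"run_date": date(2020, 1, 6), "has_metadata_issue": True},
    {"run_date": date(2020, 1, 20), "has_metadata_issue": True},
    {"run_date": date(2020, 1, 27), "has_metadata_issue": False},
    {"run_date": date(2020, 1, 28), "has_metadata_issue": False},
    {"run_date": date(2020, 2, 3), "has_metadata_issue": True},
    {"run_date": date(2020, 2, 10), "has_metadata_issue": False},
    {"run_date": date(2020, 2, 17), "has_metadata_issue": False},
    {"run_date": date(2020, 2, 24), "has_metadata_issue": False},
]

rates = monthly_issue_rate(records)
months = sorted(rates)
# True when the most recent month has a lower issue rate than the first.
improving = rates[months[-1]] < rates[months[0]]
print(rates, improving)
```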

Notification and action

If you have well-defined business rules for the metadata structure, TetraScience can automatically trigger data pipelines that notify scientists by emailing a distribution list. The email might look like the following:

Thursday 2020 Feb 20, 10:10:10AM CET
Pipeline “Waters Empower metadata verification”
Pipeline failed due to

  • Error: Sample name “xxxxxxx” does not match the pre-defined naming convention
  • Error: Missing critical custom fields “ColumnId” and “LIMS_Request_ID”
  • Warning: “ExperimentID 112313 is not available in the ELN, recommend...”

Summary

The R&D data locked inside the Waters Empower CDS silo has tremendous potential, if only you could access it via the cloud and use your data science tools to identify insights and take action! The use cases listed above will drive value and improve efficiency for your teams.

Read on for details about how our Data Science Link for Waters Empower enables these analyses. We can also collect and aggregate important information from other instruments, ELN, or LIMS, together with your Empower CDS data, using the TetraScience Platform.

And let us know if you have a use case to add to the list!

TetraScience’s Data Science Link

The Data Science Link is an end-to-end application of the TetraScience Platform that automates data acquisition from the most complex (and frequently used) R&D lab instruments, harmonizes it, and moves it to popular data science tools where it can be analyzed.

Image: Data Science Link for Waters Empower schematic

TetraScience's Empower Data Agent automatically collects CDS data

Waters Empower CDS is one of the most complex and frequently used systems in biopharma R&D. TetraScience’s Empower Data Agent is the world's fastest and most sophisticated product for data extraction from the Waters Empower CDS. It supports advanced features such as:

  • Configuring data extraction based on projects of interest and sign-off status
  • Detecting changes or re-processed injections

Deployment

Install the Empower Data Agent in a matter of minutes and point it to the TetraScience Platform. Empower data will immediately flow to the cloud, where it will be harmonized and made available to your data science tools within a few minutes.

Data Science and Analytics Tools

Once the Data Science Link has extracted, harmonized, and prepared your Empower data, you can easily connect data science and analytics tools, such as the Spotfire and Power BI dashboards mentioned above, to access the data.

We support the tools used most frequently by our current customers, and we regularly add new connections. We also support customers in developing and configuring their own applications, written in any programming language they choose.

Share another data science use case or inquire about the Data Science Link application by contacting us at solution@tetrascience.com or any of the channels below.

Learn more about how we automatically harmonize and centralize experimental data, connecting disparate silos to activate the flow of data across your R&D ecosystem. Contact us at www.TetraScience.com/contact-us.

Spin Wang
Cornell Applied Physics and MIT EECS. Co-founder and CEO of TetraScience. Forbes 30 Under 30 in Science.
