The use of data science across all industries has been rising in recent years. At the same time, modern laboratories have experienced massive growth in data volumes. In a single pharmaceutical company, there can be hundreds of High Performance Liquid Chromatography (HPLC) instruments used by process development, quality control, downstream bioprocess, manufacturing, and many other stages of drug R&D. It is common to have millions or tens of millions of injections that need to be available for analysis, and thousands of new injections produced daily.
Waters Empower is one of the most widely used chromatography data systems (CDS). A CDS controls chromatographic instruments such as HPLCs, runs the injections, collects the raw data, and analyzes the resulting chromatograms to detect peaks.
As with most R&D instruments and their control software, accessing the data contained within the Waters Empower CDS has traditionally been challenging. Our Data Science Link for Waters Empower makes the CDS data instantly accessible and actionable. Data scientists can now perform analyses and identify insights using their preferred data science tools, without hours of data wrangling.
We have included more information about how the Data Science Link for Waters Empower works at the bottom of this post.
Once your Empower data is in the cloud, harmonized, structured, and connected to your favorite data science tools (a non-trivial effort), what meaningful analyses can your data scientists and analysts perform? After partnering with many of the world’s leading pharmaceutical companies, we have collected several obvious and non-obvious data science use cases to help you get started. The use cases are grouped logically, and some of them overlap. We will continue to add to this list over time. If you have a use case to add, we want to hear it! Contact information is listed at the bottom of this post.
Data science tools can automate trending and cluster analysis of method performance characteristics (for example, peak tailing factor, resolution, and relative retention time). This helps ensure continued method performance and gives R&D organizations a fast and flexible feedback cycle.
Such analysis can be used in experiment design to drive continuous improvements and optimization of the methods.
This use case is similar to the previous one but focuses on column performance. Plotting key performance or system suitability parameters, such as peak tailing factor, resolution, or symmetry, for the same set of samples run at different times clearly shows column degradation trends.
Such information is crucial when teams generate control charts, predict column lifetime, or transfer the method to other teams or to CROs/CDMOs.
Use factors such as peak area, retention time, and tailing factor from the System Suitability Test (SST) runs to "predict" failures before they occur. Use this information to define safe operating limits, automate reporting and alerts when instruments/columns are approaching these limits, and build more intelligent control charts.
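One way to define safe operating limits from SST data is a Shewhart-style control chart. Estimate the mean and standard deviation from a baseline period of in-control runs, then classify new results by how many standard deviations they fall from the mean. The 2σ warning and 3σ alarm thresholds below are common conventions, used here as assumptions. They are not values taken from Empower or TetraScience.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Mean and sample standard deviation from in-control baseline SST results."""
    return mean(baseline), stdev(baseline)


def classify(value, center, sigma, warn_k=2.0, alarm_k=3.0):
    """Return 'alarm', 'warning', or 'ok' based on distance from the center line."""
    z = abs(value - center) / sigma
    if z >= alarm_k:
        return "alarm"
    if z >= warn_k:
        return "warning"  # approaching the limit: a candidate for an automated alert
    return "ok"


# Hypothetical baseline peak areas from earlier SST injections
center, sigma = control_limits([100, 102, 98, 101, 99])
for area in (100, 104, 106):
    print(area, classify(area, center, sigma))
```

The "warning" band is what lets a team act before a hard failure, which is the "predict failures before they occur" idea in its simplest form.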
Image: Peak area mean value vs. time for each column
Image: Peak tailing factor vs. time for each column
HPLC is a major workhorse in R&D, so it is crucial to optimize the usage of these instruments and to understand the operational efficiency of R&D teams through HPLC data. Data science can help you gain operational insights into the following aspects of HPLC instruments:
Image: Instrument usage analysis
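As an illustration of instrument usage analysis, the sketch below estimates daily utilization per instrument. It sums run time per instrument per calendar day and divides by 24 hours. The input shape, a hypothetical `(instrument, start_iso, run_minutes)` tuple, is an assumption. For simplicity, runs that cross midnight are counted entirely on their start day.

```python
from collections import defaultdict
from datetime import datetime


def daily_utilization(injections):
    """injections: iterable of (instrument, ISO start timestamp, run minutes).

    Returns {(instrument, date): fraction of the day the instrument was running}.
    Runs crossing midnight are attributed entirely to their start day.
    """
    busy_minutes = defaultdict(float)
    for instrument, start, run_minutes in injections:
        day = datetime.fromisoformat(start).date()
        busy_minutes[(instrument, day)] += run_minutes
    return {key: minutes / (24 * 60) for key, minutes in busy_minutes.items()}


usage = daily_utilization([
    ("HPLC-01", "2020-02-20T08:00:00", 360),
    ("HPLC-01", "2020-02-20T14:00:00", 360),
    ("HPLC-02", "2020-02-20T09:00:00", 144),
])
print(usage)
```

Aggregating these fractions by week or by team is a natural next step for spotting idle or overloaded instruments.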
A stability study is a common type of analysis. It usually takes a lot of time and manual effort to organize the data, create the right report or analysis, and predict shelf life. However, once your data is accessible and prepared, data science can automate stability analysis and enhance the insights.
Getting a set of tables and charts is useful; however, scientists often need to visually inspect a group of chromatograms to understand anomalies and subtle changes in the results. Such visual representations make patterns easy to spot and often trigger more in-depth quantitative analysis and comparison.
Image: Chromatogram overlay
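A quantitative follow-up to a visual overlay can be as simple as the approach below. Normalize each chromatogram to its tallest peak, compare it point by point with a reference trace, and flag traces whose maximum deviation exceeds a threshold. The sketch assumes all traces are sampled on the same time grid. The 0.05 threshold is an arbitrary, hypothetical choice.

```python
def normalize(trace):
    """Scale a detector trace so its tallest point equals 1.0."""
    peak = max(trace)
    return [v / peak for v in trace]


def max_deviation(reference, trace):
    """Largest point-wise difference between two normalized traces on the same time grid."""
    if len(reference) != len(trace):
        raise ValueError("traces must share the same time grid")
    return max(abs(a - b) for a, b in zip(normalize(reference), normalize(trace)))


def flag_outliers(reference, traces, threshold=0.05):
    """Names of traces whose normalized shape differs from the reference by > threshold."""
    return [name for name, trace in traces.items() if max_deviation(reference, trace) > threshold]


reference = [0, 1, 5, 1, 0]
traces = {"run_1": [0, 2, 10, 2, 0], "run_2": [0, 1, 5, 3, 0]}
print(flag_outliers(reference, traces))
```

Normalizing first means a run with a higher injection volume but the same peak profile (`run_1`) is not flagged, while a new shoulder or impurity peak (`run_2`) is.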
It is crucial to follow the appropriate quality procedures while working in an R&D environment. Data science can help quality teams assess adherence to established procedures.
For example:
Quality teams can closely monitor any deviation or anomaly from the organization's established process and take immediate action to remediate it and prevent errors from propagating downstream. For example, if the ratio of results without sign-off suddenly increases, that may indicate a change in process, which may be intentional or a mistake.
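The sign-off example can be monitored with a few lines of code. The sketch below computes the fraction of results without sign-off per period and flags any period whose fraction jumps by more than a chosen amount relative to the previous period. The `(period, signed_off)` input shape and the 10-percentage-point threshold are assumptions for illustration.

```python
from collections import Counter


def unsigned_ratio_by_period(results):
    """results: iterable of (period label, signed_off: bool), e.g. ("2020-W07", True)."""
    total, unsigned = Counter(), Counter()
    for period, signed_off in results:
        total[period] += 1
        if not signed_off:
            unsigned[period] += 1
    # Sorted labels are chronological for zero-padded formats such as "YYYY-Www".
    return {period: unsigned[period] / total[period] for period in sorted(total)}


def flag_jumps(ratios, max_increase=0.10):
    """Periods where the unsigned ratio rose by more than max_increase versus the prior period."""
    periods = list(ratios)
    return [cur for prev, cur in zip(periods, periods[1:]) if ratios[cur] - ratios[prev] > max_increase]


results = [("2020-W01", True)] * 9 + [("2020-W01", False)] \
        + [("2020-W02", True)] * 7 + [("2020-W02", False)] * 3
ratios = unsigned_ratio_by_period(results)
print(ratios, flag_jumps(ratios))
```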
Continuously assessing adherence to quality procedures can quickly reduce mistakes, increasing trust and synergy across departments.
Metadata integrity is often difficult to assess. Missing and mislabeled information has a significant impact on the reporting and usage of R&D data. Data science tools can assess the completeness and consistency of key business-related metadata entered in the Waters Empower CDS. Armed with such an assessment, teams can take action to improve metadata quality. After all, you get what you measure.
So, what does "metadata quality" mean? Here are some examples:
With access to the metadata in a consumable format, data science tools can automatically flag experiments that do not follow the proper sample or method naming conventions, as well as instances missing specific custom fields (for example, ColumnId or ELN_Experiment_ID).
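A minimal sketch of such checks uses regular expressions for naming conventions and a list of required custom fields. The patterns and the `SampleName`/`MethodName` dictionary keys below are hypothetical placeholders for your organization's own rules and export format. Only `ColumnId` and `ELN_Experiment_ID` come from the text above.

```python
import re

# Hypothetical conventions: replace with your organization's actual rules.
SAMPLE_NAME = re.compile(r"[A-Z]{3}-\d{4}-\d{3}")  # e.g. "ABC-2020-001"
METHOD_NAME = re.compile(r"[A-Z0-9]+_v\d+")        # e.g. "ASSAY01_v2"
REQUIRED_FIELDS = ("ColumnId", "ELN_Experiment_ID")  # custom fields that must be filled in


def metadata_issues(injection):
    """Return a list of metadata problems for one injection (a dict of field -> value)."""
    issues = []
    if not SAMPLE_NAME.fullmatch(injection.get("SampleName") or ""):
        issues.append("sample name does not follow convention")
    if not METHOD_NAME.fullmatch(injection.get("MethodName") or ""):
        issues.append("method name does not follow convention")
    for field in REQUIRED_FIELDS:
        if not injection.get(field):  # missing key or empty value
            issues.append(f"missing {field}")
    return issues


print(metadata_issues({"SampleName": "sample1", "MethodName": "ASSAY01_v2", "ColumnId": ""}))
```

Returning a list of reasons, rather than a single pass/fail flag, makes the output directly usable in a dashboard or in a notification to the scientist.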
Identifying these metadata quality issues and providing feedback in real time will vastly improve metadata integrity, making your R&D data much more actionable.
To improve metadata quality, you can follow three steps.
Note: this approach also applies to other instrument types and workflows.
Using tools such as Spotfire or Power BI, you can flag injections with incorrect naming. For example, a dashboard can be refreshed automatically every day, providing an up-to-date view of the quality of crucial metadata.
For the injections run last month, what percentage had incorrect or missing metadata? Is this percentage decreasing or increasing over time?
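The question above reduces to a grouped percentage. The sketch below assumes each injection has already been reduced to a hypothetical `(month, has_issue)` pair, for example by a naming-convention check. It computes the percentage of problematic injections per month and the month-over-month change in percentage points. A negative change means metadata quality is improving.

```python
from collections import defaultdict


def monthly_issue_rate(injections):
    """injections: iterable of ("YYYY-MM", has_issue) pairs.

    Returns {month: percent of injections with incorrect or missing metadata}.
    """
    bad, total = defaultdict(int), defaultdict(int)
    for month, has_issue in injections:
        total[month] += 1
        if has_issue:
            bad[month] += 1
    return {month: 100.0 * bad[month] / total[month] for month in sorted(total)}


def month_over_month_change(rates):
    """Change in percentage points versus the previous month (negative = improving)."""
    months = list(rates)
    return {cur: rates[cur] - rates[prev] for prev, cur in zip(months, months[1:])}


rates = monthly_issue_rate([
    ("2020-01", True), ("2020-01", True), ("2020-01", False), ("2020-01", False),
    ("2020-02", True), ("2020-02", False), ("2020-02", False), ("2020-02", False),
])
print(rates, month_over_month_change(rates))
```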
If you have well-defined business rules for the metadata structure, TetraScience can automatically trigger data pipelines that notify scientists by sending an email to a distribution list. The content of the email might look like the following:
Thursday 2020 Feb 20, 10:10:10AM CET
Pipeline “Waters Empower metadata verification”
Pipeline failed due to
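To make the notification format concrete, the sketch below assembles a similar message with Python's standard `email.message.EmailMessage`. This is not how TetraScience pipelines generate their emails. The sender, recipient list, and failure reason are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage


def build_failure_email(pipeline, reasons, sent_at, recipients, sender):
    """Assemble a plain-text failure notification similar to the example above."""
    msg = EmailMessage()
    msg["Subject"] = f"Pipeline failed: {pipeline}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    body = [
        sent_at.strftime("%A %Y %b %d, %I:%M:%S%p %Z"),  # weekday/month names follow the locale
        f"Pipeline \u201c{pipeline}\u201d",
        "Pipeline failed due to",
        *(f"- {reason}" for reason in reasons),
    ]
    msg.set_content("\n".join(body))
    return msg


CET = timezone(timedelta(hours=1), "CET")
msg = build_failure_email(
    "Waters Empower metadata verification",
    ["missing ColumnId"],                        # hypothetical failure reason
    datetime(2020, 2, 20, 10, 10, 10, tzinfo=CET),
    ["hplc-team@example.com"],                   # hypothetical distribution list
    "pipelines@example.com",                     # hypothetical sender
)
print(msg.get_content())
```

Sending the message is a separate step. With an SMTP relay available, `smtplib.SMTP(host).send_message(msg)` would deliver it.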
The R&D data locked inside the Waters Empower CDS silo has tremendous potential, if only you could access it via the cloud and use your data science tools to identify insights and take action! The use cases listed above can drive value and improve efficiency for your teams.
Read on for details about how our Data Science Link for Waters Empower enables these analyses. Using the TetraScience Platform, we can also collect and aggregate important information from other instruments, ELNs, or LIMS together with your Empower CDS data.
And let us know if you have a use case to add to the list!
The Data Science Link is an end-to-end application of the TetraScience Platform that automates data acquisition from the most complex (and frequently used) R&D lab instruments, harmonizes it, and moves it to popular data science tools where it can be analyzed.
Image: Data Science Link for Waters Empower schematic
Waters Empower CDS is one of the most complex and frequently used data systems in biopharma R&D. TetraScience’s Empower Data Agent is the world's fastest and most sophisticated product for data extraction from the Waters Empower CDS. It supports advanced features such as:
Install the Empower Data Agent in minutes and point it to the TetraScience Platform. Empower data will immediately flow to the cloud, where it will be harmonized and made available to your data science tools within a few minutes.
Once the Data Science Link has extracted, harmonized, and prepared your Empower data, easily connect the following data science and analytics tools to access the data:
These are the tools our current customers use most frequently, and we regularly add new connections. We also support customers in developing and configuring their own applications, written in any programming language they choose.
Share another data science use case or inquire about the Data Science Link application by contacting us at solution@tetrascience.com or any of the channels below.