- Biopharma R&D has not realized the full power of the data it possesses and generates every day. There are easy-to-use data science tools and there is a vast amount of data. But disconnects exist: data are locked in on-premises silos and heterogeneous formats, and often lack the interface needed to derive insights at scale.
- Our “Data Science & Application Use Cases for the Digital Lab” blog series shares some non-obvious ways top pharmaceutical organizations apply data science to extract novel insights from their R&D data.
- We crowdsource use cases through our partnerships with top biotech and pharmaceutical organizations, enabled by our ever-growing Tetra Partner Network of connections to common data sources and data science tools. Our cloud-native Tetra Data Platform automatically collects, centralizes, harmonizes and prepares R&D data for analysis.
Authors: Evan Anderson, Cheng Han, Mike Tarselli, Spin Wang
Protein therapeutics are integral to the pharmaceutical landscape. Consider insulin for diabetes, Factor VIII to treat clotting disorders, or monoclonal antibodies against emerging pathogens or cancerous cells: all are protein-based and require characterization and purification techniques orthogonal to those used in more common small-molecule workflows. Fast protein liquid chromatography (FPLC), a common separation technique, uses multiple methods to purify proteins based on their size, charge, or affinity for the column packing. For this post, we’ll consider the nearly ubiquitous Cytiva ÄKTA series of instruments, an adaptive protein purification platform. These instruments are controlled by Cytiva UNICORN software, which holds the chromatographic data and associated metadata for each result.
As with many R&D instruments and their control software, accessing the data contained within Cytiva UNICORN has traditionally been challenging:
- Tedious comparison of results across projects
- Searching for specific results by diverse metadata is not natively supported
- Time-consuming manual analysis
Our Data Science Link makes the data inside UNICORN instantly accessible and actionable. Scientists can now perform analyses and identify insights using their preferred data science tools, without manual data wrangling.
Image: Data Science Link Overview
Search, Select, Overlay, Evaluate, and Report
Our partners sought to reduce the scientist hours spent on manual data transfer, comparison and analysis. On the Tetra Data Platform, all data from UNICORN systems is automatically harmonized. Using data applications and visualization tools such as Streamlit, Jupyter Notebook, Spotfire and Tableau, R&D organizations can build interactive applications that help scientists search, select, overlay, evaluate and report more efficiently.
Scientists compare FPLC run results by comparing the output chromatograms. They can run flexible queries to retrieve the results of interest. For example, to select the column that performs best for a therapeutic protein of interest, a scientist can search by molecule, resin, column diameter and/or start or end date range.
Image: Search by multiple terms
Pre-aggregated values from UNICORN results - molecules, HPLC systems and column parameters - permit rapid selection. Scientists can leverage partial name matches to find specific resins and fetch relevant chromatograms.
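The multi-term search with partial resin matching can be sketched in a few lines. This is a minimal, hypothetical example: it assumes result metadata has already been flattened into simple records, and the `RunSummary` fields, the `search_runs` helper, and the sample molecule names are illustrative, not a Tetra Data Platform or UNICORN API.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical flattened metadata for one UNICORN result."""
    result_id: str
    molecule: str
    resin: str
    column_diameter_cm: float
    run_date: date


def search_runs(
    runs: Iterable[RunSummary],
    molecule: Optional[str] = None,
    resin_contains: Optional[str] = None,
    column_diameter_cm: Optional[float] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[RunSummary]:
    """Return runs matching every filter that is supplied (None = no filter)."""
    hits = []
    for run in runs:
        if molecule is not None and run.molecule != molecule:
            continue
        # Partial, case-insensitive match, e.g. "mabselect" finds "MabSelect PrismA".
        if resin_contains is not None and resin_contains.lower() not in run.resin.lower():
            continue
        if column_diameter_cm is not None and not math.isclose(
            run.column_diameter_cm, column_diameter_cm
        ):
            continue
        # Inclusive date range on either end.
        if start is not None and run.run_date < start:
            continue
        if end is not None and run.run_date > end:
            continue
        hits.append(run)
    return hits


runs = [
    RunSummary("R1", "mAb-A", "MabSelect PrismA", 1.6, date(2021, 3, 1)),
    RunSummary("R2", "mAb-A", "Capto Q", 1.6, date(2021, 4, 15)),
    RunSummary("R3", "mAb-B", "MabSelect SuRe", 2.6, date(2021, 5, 2)),
]
print([r.result_id for r in search_runs(runs, molecule="mAb-A", resin_contains="mabselect")])  # ['R1']
```

Leaving a filter as `None` means "don't filter on this field", so the same function serves both narrow and broad queries.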
Among the search results, scientists can select all or a subset of the chromatograms of interest.
Image: Surface chromatograms from selected results
After locating the specific data they are looking for, scientists can select the chromatograms (see: elution peak overlay) to overlay and analyze. In the subsequent sections, we will discuss some common use cases in method development, column performance, and fast access to structured and complete experiment data.
Chromatographic overlays from multiple runs can help visualize trends in column behavior or spot anomalies in flow rate under certain conditions. Clear communication is key; interdepartmental reports help analytical staff visually report their optimization studies.
However, there’s a fly in the ointment: run start time, injection time, and injection volume can vary by sample. As a result, the chromatograms arrive misaligned on both time and intensity axes. Simple plotting and overlay of the chromatograms together will lead to erroneous conclusions. Normalization, the process of aligning peaks and baseline with awareness of injection volume, time, and column volume, consumes hours of manual effort for process development scientists daily.
If scientists have all the data from Cytiva UNICORN harmonized, centralized and available for query, they can use the injection volume to normalize the intensity (height) of each chromatogram, and use custom set marks defined in the run log to realign the chromatograms. Aligning chromatograms at a set mark such as elution or wash lets scientists compare protein mobility, and thus method performance, between results.
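A minimal sketch of these two corrections, assuming the detector response scales linearly with the injected amount and that each chromatogram carries its set marks as a name-to-volume mapping. The `Chromatogram` type, the `normalize_and_align` function, and the reference injection volume are hypothetical illustrations, not how the platform stores UNICORN data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chromatogram:
    """Hypothetical, simplified view of one UNICORN result."""
    volume_ml: List[float]       # x axis: elution volume
    uv_mau: List[float]          # y axis: UV absorbance
    injection_volume_ml: float   # from the run metadata
    set_marks: Dict[str, float] = field(default_factory=dict)  # name -> volume (mL)


def normalize_and_align(
    chrom: Chromatogram, mark: str = "Elution", reference_injection_ml: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Shift x so `mark` sits at 0 mL and scale y to a common injection volume."""
    offset = chrom.set_marks[mark]  # raises KeyError if the run log lacks this set mark
    scale = reference_injection_ml / chrom.injection_volume_ml
    x = [v - offset for v in chrom.volume_ml]
    y = [a * scale for a in chrom.uv_mau]
    return x, y


run = Chromatogram([10.0, 11.0, 12.0], [0.0, 50.0, 100.0], 2.0, {"Elution": 10.0})
x, y = normalize_and_align(run)
print(x, y)  # [0.0, 1.0, 2.0] [0.0, 25.0, 50.0]
```

After this step, every overlaid trace shares an x origin at the chosen set mark and an intensity scale tied to the same injection volume; passing a wash set mark name as `mark` compares a different phase of the run.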
Image: Perform overlay with auto alignment for peaks and baseline
Image: Align peaks by Elution Start to compare method performance between different results
Image: Perform custom baseline adjustments for downstream peak integration (note that the signal at x = 1170 mL is now at y = 0 mAU)
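The single-point baseline correction shown in the figure amounts to subtracting a constant offset. A sketch, assuming volumes are sorted ascending and that a flat offset (rather than a sloped baseline) is adequate; `interpolate` and `zero_baseline_at` are hypothetical helper names.

```python
from bisect import bisect_left
from typing import List


def interpolate(x: List[float], y: List[float], x0: float) -> float:
    """Linearly interpolate y at x0; x must be sorted ascending."""
    if not x[0] <= x0 <= x[-1]:
        raise ValueError("x0 is outside the recorded volume range")
    i = bisect_left(x, x0)  # first index with x[i] >= x0
    if x[i] == x0:
        return y[i]
    x1, x2, y1, y2 = x[i - 1], x[i], y[i - 1], y[i]
    return y1 + (y2 - y1) * (x0 - x1) / (x2 - x1)


def zero_baseline_at(x: List[float], y: List[float], x_ref: float) -> List[float]:
    """Shift the whole trace so the signal at x_ref becomes 0 mAU."""
    offset = interpolate(x, y, x_ref)
    return [v - offset for v in y]


volume = [1160.0, 1170.0, 1180.0]
uv = [5.0, 4.0, 3.0]
print(zero_baseline_at(volume, uv, 1170.0))  # [1.0, 0.0, -1.0]
```

A sloped baseline would instead subtract a straight line drawn between two reference points; which choice is appropriate depends on the method.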
Comparing peak integrations drives rapid conclusions on recovery yields, impurities, or reaction progression without extra clicks or reinterpretation. This allows scientists to go the “extra step” toward automating screening runs, calibrating QC runs, or monitoring method development. Once finished, the final visualization helps scientists determine optimal conditions.
Image: Perform custom peak integration to evaluate elution efficiency
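Peak integration over a user-chosen window can be sketched with the trapezoidal rule. This is an illustrative assumption, not UNICORN’s own integration algorithm: it uses only the sampled points inside the window, so it expects a baseline-corrected trace and reasonably dense sampling. `integrate_peak` is a hypothetical name, and expressing elution efficiency as the fraction of total area inside the window is one possible convention.

```python
from typing import List


def integrate_peak(x: List[float], y: List[float], start: float, end: float) -> float:
    """Trapezoidal area (mAU*mL) over sampled points with start <= x <= end."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if start <= xi <= end]
    area = 0.0
    # Sum the trapezoids between consecutive in-window points.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area


volume = [0.0, 1.0, 2.0, 3.0, 4.0]
uv = [0.0, 10.0, 20.0, 10.0, 0.0]
total = integrate_peak(volume, uv, 0.0, 4.0)
core = integrate_peak(volume, uv, 1.0, 3.0)
print(total, core, core / total)  # 40.0 30.0 0.75
```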
In any large scientific organization, communications are key to ensuring alignment between large teams operating in parallel functions. As interpretation of data - in addition to manual processes, file conversions, and kludging this into PowerPoint - may take longer than the experiment itself, it’s unsurprising to see scientists dedicating as much or more of their time to reporting and publishing data sets and notebook pages as they do to their science.
In the instance below, scientists can share chromatogram overlay images, and peak table results in .csv format, so they can be included in emails, presentations or Excel. The .csv results can also be imported into statistical software such as JMP (pronounced “jump”).
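Exporting a peak table as .csv needs nothing beyond Python’s standard `csv` module. A minimal sketch; the column names and the `peak_table_csv` helper are hypothetical, chosen only to show the shape of a table that Excel or JMP can open.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["result_id", "peak", "retention_ml", "area_mau_ml"]


def peak_table_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize peak-table rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"result_id": "R1", "peak": 1, "retention_ml": 12.5, "area_mau_ml": 40.0},
    {"result_id": "R2", "peak": 1, "retention_ml": 12.7, "area_mau_ml": 38.5},
]
print(peak_table_csv(rows), end="")
```

To write a file instead, open it with `open(path, "w", newline="")` and pass that handle to `csv.DictWriter`.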
Image: Dynamically Zoom in on overlay and save resulting graphs
Applications: Method Development
Identifying the protocol that achieves optimal yield and quality requires iterative method development: put simply, one changes method parameters and compares purification results. Capturing key method parameters systematically leads to highly efficient and transparent optimization. Important parameters include flow rate, pH gradient, buffer, pressure and various user-defined scouting variables.
Capturing and tracking run results and the associated operating conditions is important not only in method development but also, in the context of Quality by Design, for decision-making during scale-up and method transfer.
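One way to capture method parameters systematically is a typed record per run, plus a helper that reports which parameters changed between two runs. This is a hypothetical sketch: the `MethodParameters` fields mirror the examples above but are not UNICORN’s method schema, and the buffer name is a placeholder.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class MethodParameters:
    """Hypothetical record of key method parameters for one run."""
    flow_rate_ml_min: float
    ph_start: float
    ph_end: float
    buffer: str
    max_pressure_mpa: float
    scouting_variables: Dict[str, Any] = field(default_factory=dict)


def changed_parameters(a: MethodParameters, b: MethodParameters) -> Dict[str, Tuple[Any, Any]]:
    """Return {name: (value_in_a, value_in_b)} for every parameter that differs."""
    da, db = asdict(a), asdict(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


baseline = MethodParameters(1.0, 7.0, 3.5, "Buffer A", 0.5)
scouted = MethodParameters(2.0, 7.0, 3.5, "Buffer A", 0.5)
print(changed_parameters(baseline, scouted))  # {'flow_rate_ml_min': (1.0, 2.0)}
```

Pairing each run’s results with a diff like this makes it explicit which change produced which difference in the chromatogram.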
Achieve the highest resolution size separation in final polishing steps
Every time chromatography is performed, a portion of the scientists' material is lost. This may not matter for a commodity material, but in a therapeutic context scientists may lose much of an antibody, protein therapeutic, or oligonucleotide they have spent weeks manufacturing and characterizing.
Polishing runs are critical to producing quality therapeutics. After bulk separations and a concentrating chromatography run, the remaining material may still hide among closely related contaminants like protein isoforms, polymers, n-1 adducts, or other post-translational modifications. Higher resolution obtained during polishing correlates with lower expense, minimized use of customized resin, a lower chance of rework (and therefore further material loss), and a better chance that the resulting material will meet specifications.
In this instance, chromatographic overlay will quickly allow scientists to:
- Determine the yield from the integrated peak area and elution volume
- Observe overall peak shape and symmetry, as more symmetric peaks generally indicate purer final material
- Screen various resins or techniques of the purification scheme to determine which combination decreases manual handling, balances recovery against purity, and delivers material to spec prior to polishing
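Two of the checks in the list above can be sketched directly: converting an integrated UV peak area into eluted mass with the Beer-Lambert law, and computing a peak asymmetry factor. Assumptions, stated loudly: the trace is baseline-corrected and contains a single resolved peak, absorbance stays in the linear range, and the extinction coefficient (absorbance of a 1 mg/mL solution over 1 cm) and UV cell path length are supplied by the user. The asymmetry factor here is b/a measured at 10% of peak height, one common convention; all function names are hypothetical.

```python
from typing import List


def eluted_mass_mg(area_mau_ml: float, ext_coeff: float, path_cm: float) -> float:
    """Beer-Lambert: c = A / (eps * l), so mass = (integral of A dV) / (eps * l)."""
    return (area_mau_ml / 1000.0) / (ext_coeff * path_cm)  # mAU -> AU


def percent_yield(eluted_mg: float, loaded_mg: float) -> float:
    return 100.0 * eluted_mg / loaded_mg


def _crossing(x1: float, y1: float, x2: float, y2: float, level: float) -> float:
    """x where the straight segment (x1, y1)-(x2, y2) crosses `level`."""
    return x1 + (level - y1) * (x2 - x1) / (y2 - y1)


def asymmetry_factor(x: List[float], y: List[float], fraction: float = 0.10) -> float:
    """As = b / a at `fraction` of peak height; 1.0 = symmetric, > 1 = tailing."""
    apex = max(range(len(y)), key=y.__getitem__)
    if y[apex] <= 0:
        raise ValueError("no positive peak found")
    level = fraction * y[apex]
    i = apex  # walk left until the next point is at or below the level
    while i > 0 and y[i - 1] > level:
        i -= 1
    j = apex  # walk right the same way
    while j < len(y) - 1 and y[j + 1] > level:
        j += 1
    if i == 0 or j == len(y) - 1:
        raise ValueError("peak does not return below the threshold on both sides")
    left = _crossing(x[i - 1], y[i - 1], x[i], y[i], level)
    right = _crossing(x[j], y[j], x[j + 1], y[j + 1], level)
    return (right - x[apex]) / (x[apex] - left)


print(eluted_mass_mg(1400.0, 1.4, 1.0))  # 1.0
print(round(asymmetry_factor([0, 1, 2, 3, 4], [0, 50, 100, 50, 0]), 3))  # 1.0
print(round(asymmetry_factor([0, 1, 2, 3, 4, 5], [0, 50, 100, 80, 40, 0]), 3))  # 1.528
```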
Flow Rate Scouting
Flow rate, though a simple parameter (volume passed through the column per unit time), affects the number of theoretical “plates” the eluent travels through, which defines the maximal achievable resolution. Granted, this also depends on the type of chromatography; size exclusion and gel permeation techniques function differently than silica or hydrophobic stationary phases. As a general rule, scientists want the fastest flow rate their processes can tolerate (which reduces solvent cost and material degradation) while maintaining resolution.
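Column efficiency is commonly quantified with the half-height plate-count formula N = 5.54 × (V_R / W_h)², where V_R is the retention volume and W_h the peak width at half height; dividing N by the bed height gives plates per meter. A sketch that computes both so runs at different flow rates can be compared; the function names, the 30 cm bed height, and the scouting numbers are hypothetical.

```python
def plate_count(retention_ml: float, half_height_width_ml: float) -> float:
    """Theoretical plates via the half-height method: N = 5.54 * (V_R / W_h) ** 2."""
    return 5.54 * (retention_ml / half_height_width_ml) ** 2


def plates_per_meter(n_plates: float, bed_height_cm: float) -> float:
    """N / L, with the bed height converted from cm to m."""
    return n_plates / (bed_height_cm / 100.0)


# Hypothetical scouting results: flow rate (mL/min) -> (V_R, W_h) in mL.
scouting = {0.5: (12.0, 0.30), 1.0: (12.0, 0.36), 2.0: (12.0, 0.48)}
for flow, (v_r, w_h) in scouting.items():
    n = plate_count(v_r, w_h)
    print(f"{flow} mL/min: N = {n:.0f}, N/m = {plates_per_meter(n, 30.0):.0f}")
```

In this made-up data the peak broadens as flow increases, so N drops; the question in scouting is how far it can drop before resolution becomes unacceptable.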
In a typical flow rate scouting study, a scientist conducts several runs at different speeds with a standard amount of material and standard run conditions, and monitors peak shape, retention time (where the peak appears in the chromatogram) and overall recovery (area under the curve or physical measurement at the end).
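The per-run summary described above (retention at the peak apex, recovery as area under the curve) and a simple selection rule can be sketched as follows. The rule of picking the fastest flow rate whose AUC stays within 95% of the best run is a hypothetical acceptance criterion for illustration; in practice, resolution and peak shape would be checked as well.

```python
from typing import Dict, List, Tuple

Trace = Tuple[List[float], List[float]]  # (volume_ml, uv_mau)


def summarize_run(x: List[float], y: List[float]) -> Dict[str, float]:
    """Retention volume at the apex and trapezoidal AUC (mAU*mL)."""
    apex = max(range(len(y)), key=y.__getitem__)
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for x1, x2, y1, y2 in zip(x, x[1:], y, y[1:]))
    return {"retention_ml": x[apex], "auc": auc}


def fastest_acceptable(runs: Dict[float, Trace], min_relative_auc: float = 0.95):
    """Return the fastest flow rate whose AUC is within the threshold of the best."""
    summaries = {flow: summarize_run(x, y) for flow, (x, y) in runs.items()}
    best = max(s["auc"] for s in summaries.values())
    acceptable = [f for f, s in summaries.items() if s["auc"] >= min_relative_auc * best]
    return max(acceptable), summaries


runs = {
    1.0: ([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]),
    2.0: ([0.0, 1.0, 2.0], [0.0, 9.8, 0.0]),
    4.0: ([0.0, 1.0, 2.0], [0.0, 8.0, 0.0]),
}
flow, summaries = fastest_acceptable(runs)
print(flow)  # 2.0
```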
With all UNICORN results available on the Tetra Data Platform, scientists do not need to manually open each run and compare them on screen. Instead, they can overlay and time-sync the chromatograms of interest, with the areas under the curve (AUCs) pre-calculated. With a liquid handler and autoinjector, scientists could then physically automate this process and automate the data analysis using Tetra Data Platform pipelines.
Applications: Column Performance
When transferring methods to an external organization - a CDMO, a collaborator, or another corporate site - it's important to communicate all critical variables. This might include (but is not limited to): column type, size, packing material, flow rate, diffusion rate, material physical properties, pH sensitivity, and many more.
Scientists can plot the performance of the purification across a large number of runs and better understand how column performance changes over time.
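Tracking column performance over time can start with something as simple as a least-squares slope of a per-run metric plus a threshold check against early runs. A sketch, assuming a metric such as plate count from periodic efficiency tests; the helper names, the three-run baseline, and the 80% threshold are hypothetical choices.

```python
from typing import List, Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values versus run index (needs >= 2 runs)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    return sxy / sxx


def runs_below(values: Sequence[float], baseline_runs: int = 3,
               fraction: float = 0.8) -> List[int]:
    """Indices of runs whose metric fell below `fraction` of the early-run mean."""
    baseline = sum(values[:baseline_runs]) / baseline_runs
    return [i for i, v in enumerate(values) if v < fraction * baseline]


plate_counts = [9000.0, 9100.0, 8900.0, 8200.0, 7500.0, 7000.0]
print(trend_slope(plate_counts))
print(runs_below(plate_counts))  # [5]
```

A negative slope signals gradual degradation, while the threshold check flags the point at which a column might be repacked or replaced.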
Fast Access to Structured and Complete Experiment Data
Check out this video to see how easy it is to set up the Cytiva UNICORN integration with the Tetra Data Platform.
With the data readily available for scientists to manipulate in data science and data analytics tools, they can create more sophisticated analyses, such as:
- Comparing control runs to detect operating and machine anomalies over time, across batches and systems
- Extracting chromatographic features such as peak shapes
- Optimizing processes to achieve higher yields (elution peak volumes) and better purity
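As an example of the first analysis, a control-run check can compare a new run’s metric against the spread of historical control runs. This sketch uses a simple z-score rule with a 3-sigma limit, a common but here hypothetical choice; the metric (retention volume of a control standard) and the sample values are illustrative.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_anomalies(history: Sequence[float], new_values: Sequence[float],
                   z_limit: float = 3.0) -> List[int]:
    """Indices of new values more than z_limit sample std devs from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical runs with some spread
    return [i for i, v in enumerate(new_values) if abs(v - mu) / sigma > z_limit]


# Hypothetical retention volumes (mL) of a control standard across past runs.
history = [10.00, 10.10, 9.90, 10.00, 10.05, 9.95]
print(flag_anomalies(history, [10.02, 10.60]))  # [1]
```

Computing the history separately per system or per batch would extend this to the cross-system and cross-batch comparisons listed above.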
When structured and complete data is made available and data integrity is reinforced, scientific research or process improvements are no longer limited by the functionalities of the instrument control software.
Cytiva UNICORN protein purification data provide tremendous insights. When scientists can compare current-day runs with historical results thanks to harmonized, analysis-ready data, they save time, avoid manual processing headaches, and reach better technical conclusions.
Follow our blog, where we will continue to share use cases from our Tetra Partner Network. These demonstrate how to harness the power of harmonized, vendor-agnostic scientific data. Whether your goal is to build reports, conduct correlation or causality analysis, or run AI/ML models to discover untold truths, we hope you will find relevant answers here.
Share another data science use case or inquire about the Data Science Link application by contacting us at email@example.com.