Protein Purification with Cytiva UNICORN: Enhanced Analytics through Harmonization and Integration

Cheng Han
October 15, 2020
  • Biopharma R&D has not realized the full power of the data it possesses and generates every day. There are easy-to-use data science tools and there is a vast amount of data. But disconnects exist: data are locked in on-premises silos and heterogeneous formats, and often lack the interfaces needed to derive insights at scale.
  • Our “Data Science & Application Use Cases for the Digital Lab” blog series shares some non-obvious ways top pharmaceutical organizations apply data science to extract novel insights from their R&D data.
  • We crowdsource use cases through our partnerships with top biotech and pharmaceutical organizations, enabled by our ever-growing Tetra Partner Network of connections to common data sources and data science tools. Our cloud-native Tetra Data Platform automatically collects, centralizes, harmonizes and prepares R&D data for analysis.

Authors: Evan Anderson, Cheng Han, Mike Tarselli, Spin Wang


Protein therapeutics are integral to the pharmaceutical landscape. Consider insulin for diabetes, Factor VIII to treat clotting conditions, or monoclonal antibodies against emerging pathogens or cancerous cells: all are protein-based and require characterization and purification techniques orthogonal to those used in more common small-molecule workflows. Fast protein liquid chromatography (FPLC)[1], a common separation technique, utilizes multiple methods to purify proteins based on their size, charge, or affinity to the column packing. For this post, we'll consider the nearly-ubiquitous Cytiva ÄKTA series of instruments, an adaptive protein purification platform. These instruments are controlled by Cytiva UNICORN software, which holds the chromatographic data and associated metadata for each result.

As with many R&D instruments and their control software, accessing the data contained within the Cytiva UNICORN has traditionally been challenging:

  • Tedious comparison of results across projects
  • Searching for specific results by diverse metadata is not natively supported
  • Time-consuming manual analysis

Our Data Science Link makes the data inside UNICORN instantly accessible and actionable. Scientists can now perform analyses and identify insights using their preferred data science tools, without manual data wrangling[2].

Image: Data Science Link Overview


Search, Select, Overlay, Evaluate, and Report

Our partners sought to reduce the scientist hours spent on manual data transfer, comparison, and analysis. On the Tetra Data Platform, all data from UNICORN systems is automatically harmonized. Using data applications and visualization tools like Streamlit, Jupyter Notebook, Spotfire, and Tableau, R&D organizations can build interactive applications that let scientists search, select, overlay, evaluate, and report in a streamlined workflow.


Scientists compare FPLC runs by comparing their output chromatograms. They can run flexible queries to retrieve results of interest. For example, to select a better-performing column for a therapeutic protein of interest, a scientist can search by molecule, resin, column diameter, and/or start or end date ranges.

Image: Search by multiple terms


Pre-aggregated values from UNICORN results - molecules, HPLC systems, and column parameters - permit rapid selection. Scientists can leverage partial name matches to find specific resins and fetch the relevant chromatograms.
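Queries like these can be sketched against harmonized run metadata with pandas. This is a minimal, hypothetical example: the field names (`molecule`, `resin`, `column_diameter_mm`, `start`) and values are illustrative assumptions, not the actual Tetra Data Platform schema.

```python
import pandas as pd

# Hypothetical harmonized run metadata; fields are illustrative only.
runs = pd.DataFrame([
    {"result_id": "R1", "molecule": "mAb-01", "resin": "MabSelect SuRe",
     "column_diameter_mm": 16, "start": "2020-06-01"},
    {"result_id": "R2", "molecule": "mAb-01", "resin": "Capto Q",
     "column_diameter_mm": 26, "start": "2020-07-15"},
    {"result_id": "R3", "molecule": "mAb-02", "resin": "MabSelect SuRe",
     "column_diameter_mm": 16, "start": "2020-08-03"},
])
runs["start"] = pd.to_datetime(runs["start"])

# Flexible query: molecule + partial resin name match + date range
hits = runs[
    (runs["molecule"] == "mAb-01")
    & runs["resin"].str.contains("MabSelect", case=False)
    & runs["start"].between(pd.Timestamp("2020-01-01"), pd.Timestamp("2020-12-31"))
]
print(hits["result_id"].tolist())  # ['R1']
```

The same filter expressions can back an interactive search form in a tool like Streamlit.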


Among the search results, scientists can select all or a subset of the chromatograms of interest.

Image: Surface chromatograms from selected results


After locating the specific data they are looking for, scientists can select the chromatograms (see: elution peak overlay) to overlay and analyze. In the subsequent sections, we will discuss some common use cases in method development, column performance, and fast access to structured and complete experiment data.


Chromatographic overlays from multiple runs can help visualize trends in column behavior, or spot anomalies in flow rate under certain conditions. Clear communication is key; interdepartmental reports help analytical staff visually report their optimization studies.

However, there's a fly in the ointment: run start time, injection time, and injection volume can vary by sample. As a result, the chromatograms arrive misaligned on both the time and intensity axes. Simply plotting and overlaying the chromatograms will lead to erroneous conclusions. Normalization - aligning peaks and baselines with awareness of injection volume, time, and column volume - consumes hours of manual effort for process development scientists every day.

With all Cytiva UNICORN data harmonized, centralized, and available for query, scientists can use the injection volume value to normalize the intensity (height) of each chromatogram, and use custom set marks defined in the run log to realign the chromatograms. Aligning chromatograms at a set mark such as Elution or Wash lets scientists compare protein mobility - and, thus, method performance - between results.
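This alignment-and-normalization step can be sketched in a few lines of NumPy. The function and field names below are illustrative assumptions, not part of any UNICORN export format: each run is a pair of arrays (elution volume, UV signal) plus a set-mark volume and an injection volume pulled from harmonized metadata.

```python
import numpy as np

def align_and_normalize(volume, signal, setmark_volume, injection_volume_ml):
    """Shift the x-axis so a run-log set mark (e.g. 'Elution') sits at 0,
    and scale intensity by injection volume so runs are comparable.
    Names and units are illustrative, not a UNICORN schema."""
    aligned_volume = volume - setmark_volume          # align at the set mark
    normalized_signal = signal / injection_volume_ml  # per-mL-injected intensity
    return aligned_volume, normalized_signal

# Two synthetic runs with different injection volumes and elution start points
v1 = np.linspace(0, 100, 5);  s1 = np.array([0., 2., 10., 4., 0.])
v2 = np.linspace(10, 110, 5); s2 = np.array([0., 4., 20., 8., 0.])

a1, n1 = align_and_normalize(v1, s1, setmark_volume=50, injection_volume_ml=1.0)
a2, n2 = align_and_normalize(v2, s2, setmark_volume=60, injection_volume_ml=2.0)
# After alignment and normalization the two runs overlay exactly
```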

Image: Perform overlay with auto alignment for peaks and baseline


Image: Align peaks by Elution Start to compare method performance between different results


Image: Perform custom baseline adjustments (see that the y position at x=1170mLs is now at y=0 mAU) for downstream peak integration



Comparing peak integrations drives rapid conclusions on recovery yields, impurities, or reaction progression without extra clicks or reinterpretation. This allows scientists to go the "extra step" of automating screening runs, calibrating QC runs, or monitoring method development. Once finished, the final visualization helps scientists determine optimal conditions.
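Peak integration itself reduces to numerical integration of the signal between two elution volumes after a baseline correction. Here is a minimal sketch using the trapezoid rule on a synthetic Gaussian peak; the function name and units are illustrative assumptions.

```python
import numpy as np

def peak_area(volume, signal, start, end, baseline=0.0):
    """Integrate a chromatogram peak between two elution volumes with the
    trapezoid rule, after subtracting a flat baseline. Units here are
    mL x mAU; purely illustrative."""
    mask = (volume >= start) & (volume <= end)
    v, s = volume[mask], signal[mask] - baseline
    return float(np.sum((s[1:] + s[:-1]) / 2.0 * np.diff(v)))

volume = np.linspace(0, 10, 101)             # elution volume, mL
signal = 50 * np.exp(-((volume - 5) ** 2))   # synthetic Gaussian peak, mAU

area = peak_area(volume, signal, start=3, end=7)
# the analytic area of this Gaussian is 50 * sqrt(pi), about 88.6 mL*mAU
```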

Image: Perform custom peak integration to evaluate elution efficiency



In any large scientific organization, communication is key to ensuring alignment between large teams operating in parallel functions. Because interpreting data - on top of manual processes, file conversions, and kludging results into PowerPoint - can take longer than the experiment itself, it's unsurprising to see scientists dedicating as much or more of their time to reporting and publishing data sets and notebook pages as they do to their science.

In the instance below, scientists can share chromatogram overlay images and peak table results in .csv format so they can be included in emails, presentations, or Excel. The .csv results can also be imported into statistical software such as JMP (pronounced "jump").
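Exporting a peak table for colleagues is a one-liner once the data is in a DataFrame. The column names below are illustrative, not a fixed UNICORN export format.

```python
import pandas as pd

# Hypothetical peak table assembled from overlaid results; column names
# are illustrative assumptions.
peaks = pd.DataFrame([
    {"result_id": "R1", "peak": "Elution", "retention_ml": 48.2, "area_mAU_ml": 1250.4},
    {"result_id": "R2", "peak": "Elution", "retention_ml": 47.9, "area_mAU_ml": 1187.1},
])

# A plain .csv opens directly in Excel, JMP, or any statistics package
peaks.to_csv("peak_table.csv", index=False)
```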

Image: Dynamically Zoom in on overlay and save resulting graphs


Applications: Method Development

Identifying the protocol that achieves optimal yield and quality requires iterative method development: simply put, one changes method parameters and compares purification results. Capturing key method parameters systematically - for example, flow rate, pH gradient, buffer, pressure, and various user-defined scouting variables - makes optimization highly efficient and transparent.
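One lightweight way to capture these parameters systematically is a typed record per run. The field list below is an assumption chosen to mirror the parameters named above, not a UNICORN schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class MethodParameters:
    """Illustrative record of the run conditions worth capturing for each
    purification; the field list is an assumption, not a UNICORN schema."""
    flow_rate_ml_min: float
    ph_gradient: tuple        # (start pH, end pH)
    buffer: str
    max_pressure_mpa: float
    scouting_variables: dict  # user-defined scouting factors

run = MethodParameters(
    flow_rate_ml_min=1.0,
    ph_gradient=(7.4, 4.5),
    buffer="20 mM phosphate, 150 mM NaCl",
    max_pressure_mpa=0.5,
    scouting_variables={"load_mg_per_ml_resin": 20},
)
print(asdict(run)["buffer"])  # serializable for storage alongside the result
```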


Capturing and tracking run results and associated operating conditions is important not only in method development but is also essential, in the context of Quality by Design[3], for decision making during scale-up and method transfer.

Achieve the highest resolution size separation in final polishing steps

Every time chromatography is performed, a portion of the scientists' material is lost. This may not be an issue for a commodity material, but in a therapeutic context it may mean losing much of an antibody, protein therapeutic, or oligonucleotide that scientists have spent weeks manufacturing and characterizing.

Polishing runs are critical to producing quality therapeutics[4]. After bulk separations and a concentrating chromatography run, the remaining material may still hide among closely related contaminants like protein isoforms, polymers, n-1 adducts, or other post-translational modifications. Higher resolution during polishing correlates with lower expense, minimized use of customized resin, a lower chance of rework (and therefore further material loss), and a better chance that the derived material will meet specifications.

In this instance, chromatographic overlay will quickly allow scientists to:

  • Determine the yield by integrating peak area over elution volume
  • Observe overall peak shape and symmetry, as more symmetric peaks indicate pure final material
  • Screen various resins or techniques of the purification scheme to determine which combination decreases manual handling, balances recovery against purity, and delivers material to spec prior to polishing
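Peak symmetry, mentioned in the second bullet, has a standard quantitative form: the asymmetry factor As = b/a, where a and b are the front and back half-widths of the peak measured at 10% of its height. Below is a minimal sketch on a synthetic peak; the function name is my own, not a UNICORN feature.

```python
import numpy as np

def asymmetry_factor(volume, signal):
    """Asymmetry factor As = b / a, measured at 10% of peak height:
    a = front half-width, b = back half-width. As near 1 indicates a
    symmetric peak; As > 1 indicates tailing."""
    apex = int(np.argmax(signal))
    threshold = 0.10 * signal[apex]
    above = np.where(signal >= threshold)[0]
    a = volume[apex] - volume[above[0]]   # front width at 10% height
    b = volume[above[-1]] - volume[apex]  # back width at 10% height
    return b / a

v = np.linspace(0, 20, 2001)
symmetric = np.exp(-((v - 10) ** 2))  # Gaussian peak, As ~ 1
print(round(asymmetry_factor(v, symmetric), 2))
```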

Flow Rate Scouting

Flow rate, though a simple parameter (volume through the column per unit time), controls the number of theoretical "plates"[5] that the eluent travels through, which defines the maximal attainable resolution. Granted, this also depends on the type of chromatography; size exclusion and gel permeation techniques function differently than silica or hydrophobic stationary phases. As a general rule, scientists want the fastest flow rate their processes can tolerate (this reduces solvent cost and material degradation) while maintaining resolution.

In a typical flow rate scouting study, a scientist conducts several runs at different speeds, with a standard amount of material and standard run conditions, and monitors peak shape, retention time (where the peak appears in the chromatogram), and overall recovery (area under the curve, or a physical measurement at the end).

With all UNICORN results available on the Tetra Data Platform, scientists do not need to manually open each run and compare on screen. Instead, they can overlay and time-synch chromatograms of interest, with the areas under the curve (AUCs) pre-calculated. With a liquid handler and auto-injector, scientists could then physically automate this process and automate the data analysis using Tetra Data Platform pipelines.

Applications: Column Performance

When transferring methods to an external organization - a CDMO, a collaborator, or another corporate site - it's important to communicate all critical variables. These might include (but are not limited to) column type, size, packing material, flow rate, diffusion rate, material physical properties, and pH sensitivity.

Scientists can plot the performance of the purification across a large number of runs and better understand how column performance changes over time.
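With plate counts available per run, a simple trend line over a column's run history is one way to see that change. The per-run plate counts below are fabricated illustrative values, and the linear fit is just a sketch of the idea, not a Tetra Data Platform feature.

```python
import numpy as np

# Hypothetical per-run plate counts over a column's lifetime (illustrative data)
run_index = np.arange(10)
plates = np.array([9800, 9750, 9700, 9500, 9400, 9300, 9100, 8900, 8700, 8500])

# A linear trend flags gradual column degradation across runs
slope, intercept = np.polyfit(run_index, plates, 1)
print(f"plate count change per run: {slope:.0f}")  # negative slope = degrading
```

A steadily negative slope is a cue to repack or replace the column before resolution drops below spec.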

Fast Access to Structured and Complete Experiment Data

Check out this video to see how easy it is to set up the Cytiva UNICORN integration with the Tetra Data Platform.

With the data readily available to manipulate in data science and data analytics tools, scientists can create more sophisticated analyses, such as:

  • Comparing control runs to detect operating and machine anomalies over time, across batches and systems
  • Extracting chromatographic features such as peak shapes
  • Process optimization to achieve higher yields (elution peak volumes) and better purity
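The first bullet above, detecting anomalies in control runs, can be sketched with a simple z-score test over historical control-run AUCs. This is a minimal illustration with fabricated numbers, not a prescribed anomaly-detection method.

```python
import numpy as np

def flag_anomalies(control_aucs, z_threshold=3.0):
    """Return indices of control-run AUCs more than z_threshold standard
    deviations from the historical mean - a minimal sketch of control-run
    anomaly detection."""
    aucs = np.asarray(control_aucs, dtype=float)
    z = (aucs - aucs.mean()) / aucs.std()
    return np.where(np.abs(z) > z_threshold)[0]

# Nine consistent control runs and one outlier (illustrative values)
history = [100, 101, 99, 100, 102, 98, 100, 101, 99, 60]
print(flag_anomalies(history, z_threshold=2.0))  # index of the outlier run
```

In practice the same comparison could run automatically in a pipeline each time a new control-run result lands on the platform.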

When structured and complete data is made available and data integrity is reinforced, scientific research or process improvements are no longer limited by the functionalities of the instrument control software.


Cytiva UNICORN protein purification data provide tremendous insights. When harmonized, analysis-ready data let scientists compare current-day runs with historical results, they save time, avoid manual processing headaches, and reach better technical conclusions.

Follow our blog, where we will continuously share use cases from our Tetra Partner Network. These demonstrate how to harness the power of harmonized, vendor-agnostic scientific data. Whether your goal is to build reports, conduct correlation or causality analysis, or run AI/ML models to discover untold truths, we hope that you will find relevant answers here.


  1. "Fast protein liquid chromatography - Wikipedia."
  2. "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Tasks, Survey Says"
  3. "Quality by design - Wikipedia."
  4. "How to combine chromatography techniques | Cytiva"
  5. "Theoretical Plate - an overview | ScienceDirect Topics."

Share another data science use case or inquire about the Data Science Link application by contacting us through any of our channels.
