Applying Data Automation and Standards to Cell Counter Files

Evan Anderson | May 15, 2020

Advanced Data Engineering Improves Data Integrity Using an Allotrope-Compatible, Data Science Ready File Format




Authors:
Evan Anderson - Delivery Engineer, TetraScience
George Van Den Driessche – Scientist I, Biogen
Spin Wang - CEO & Co-Founder, TetraScience

The pharmaceutical industry generates experimental data every day that is stored on local PCs or on instrument vendor servers, with vendor-dependent data schemas. These practices create data silos across pharma that do not adhere to the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles and prevent companies from using all the information in their raw data sets.

“The lab of the future is built on data. Right now, our cell counter data is largely inaccessible, and the heterogeneous nature of the information makes it difficult to analyze without significant manual manipulation. My team needs to make this data accessible and actionable for our scientists and data scientists,” says Len Blackwell, Associate Director of Strategic Analytics at Biogen.

This blog post highlights how our recent collaboration with Biogen knocks down the data silos associated with their Beckman Coulter Vi-CELL cell counter results, making the data readily available for further analysis using standard data science tools.

Cell counters generate valuable, but heterogeneous, data

Cell counters let process development scientists count viable and non-viable cells separately using the trypan blue exclusion cell counting method. These counts are then used to calculate overall cell density and cell viability percentage. Cell density measurements help monitor cell culture feed requirements, and the cell viability percentage indicates the overall health of a cell culture.
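As a quick illustration of the viability figure, the percentage is the viable count divided by the total count. The sketch below is a minimal example under assumptions: the function name and the counts are hypothetical and are not taken from a Vi-CELL report.

```python
def viability_percent(viable_cells: int, nonviable_cells: int) -> float:
    """Percentage of counted cells that excluded trypan blue (viable cells)."""
    total = viable_cells + nonviable_cells
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * viable_cells / total


# Illustrative counts only, not real instrument data.
example = viability_percent(viable_cells=950, nonviable_cells=50)
print(f"{example:.1f}% viable")  # prints "95.0% viable"
```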

“Cell counting is a critical step in the biomanufacturing process that provides information about the density and viability of the mammalian cells that produce our protein products. In the process development laboratories, daily cell counts inform process development engineers of the results of experimental conditions with the goal of optimizing cell health and productivity,” says Brandon Moore, Cell Culture Engineer at Biogen.

A cell culture study typically lasts 14 days, with one sample analyzed from each condition per day. At Biogen, each cell counter sample analysis produces 50 images that are analyzed with the trypan blue exclusion cell counting method. The resulting measurements are stored in a single text file as aggregated values and raw data arrays. The image files used for the measurements are exported and stored separately from the numerical data.

This presents a few challenges:

  1. Multiple file types (image and text files)
  2. 51 files generated for each sample (50 images plus one text file), multiplied by the number of days in a study
  3. Major data integrity risk due to the separate file storage
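To make the file volume concrete, the arithmetic can be sketched as follows. The per-sample and per-study numbers come from the description above; the number of conditions is a hypothetical input, since the post does not state how many conditions a study includes.

```python
IMAGES_PER_SAMPLE = 50   # images per sample analysis, as described above
REPORTS_PER_SAMPLE = 1   # the single text file holding the measurements
STUDY_DAYS = 14          # typical study length, one sample per condition per day


def files_per_study(conditions: int, days: int = STUDY_DAYS) -> int:
    """Total files exported for a study with the given number of conditions."""
    files_per_sample = IMAGES_PER_SAMPLE + REPORTS_PER_SAMPLE
    return files_per_sample * days * conditions


print(files_per_study(conditions=1))  # 714 files for a single condition
print(files_per_study(conditions=8))  # 5712 files; 8 conditions is a made-up example
```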

Image: The Beckman Coulter Vi-CELL exports images and a .txt report for each experiment.


Automating cell counter data movement and conversion

TetraScience addresses these data integrity challenges with the Tetra Data Platform. The platform automatically moves cell counter files from the instrument PC to an AWS data lake. Once the files are in the data lake, they pass through two conversion pipelines. First, the instrument data schema is mapped into an Intermediate Data Schema JSON file (IDS-JSON), and then the IDS-JSON is mapped into a pharmaceutical-standard Allotrope Data Format (ADF) file. The ADF file provides a single output that captures both the image files and the numerical data report generated for each cell counter sample.
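The first conversion step can be pictured with a small sketch that turns a text report into a JSON record and attaches the image file names. Everything here is an assumption for illustration: the "key: value" report layout, the field names, and the output structure are hypothetical and do not reproduce the actual Vi-CELL export or the actual IDS schema.

```python
import json


def report_to_record(report_text, image_names):
    """Map a hypothetical 'key: value' text report into one JSON-ready dict."""
    fields = {}
    for line in report_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "key: value" pairs
        # Normalize keys, e.g. "Sample ID" -> "sample_id"
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return {
        "sample_id": fields.get("sample_id"),
        "results": {
            "viable_cells_per_ml": float(fields["viable_cells_per_ml"]),
            "viability_percent": float(fields["viability_percent"]),
        },
        # Keep the image references next to the numbers they belong to.
        "images": sorted(image_names),
    }


# Made-up report text, not a real instrument export.
sample_report = """Sample ID: demo-001
Viable cells per mL: 1.25e6
Viability percent: 95.0
"""
record = report_to_record(sample_report, ["img_02.png", "img_01.png"])
print(json.dumps(record, indent=2))
```

Keeping the numerical results and the image references in one record mirrors the goal described above: a single output per sample instead of 51 loosely related files.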

Image: an excerpt of the cell counter report as a JSON in the IDS format.


Harnessing Cell Counter Data

Once the cell counter data is moved to an AWS data lake and packaged as either an IDS-JSON or an ADF file, data scientists can begin analyzing these files with Python notebooks such as Google Colab, or with other common data science tools. Scientists can retrieve files related to cell growth studies by querying the IDS-JSON files with Elasticsearch. The cell density data can then be plotted against date using the pandas and seaborn Python libraries. ADF files enhance cell counter data integrity by combining numerical and image data into one file output; scientists can access this information with the h5py Python library. The combined data outputs enable new types of data analysis, such as cell contamination monitoring with cell image analysis. For more information on the ADF file format, check out our blog post about the ADF graph model and leaf node model.
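The query-then-plot workflow described above can be sketched in two small helpers: one builds an Elasticsearch query body for a study, and one flattens the search hits into rows ready for plotting. The field names (`study_id`, `condition`, `measured_on`, `viable_cells_per_ml`) and the sample response are hypothetical; only the general shape of the query DSL and of the `hits.hits[]._source` response follows Elasticsearch conventions. No cluster is contacted here.

```python
from datetime import date


def build_study_query(study_id):
    """Query body selecting all records of one study, oldest first."""
    return {
        "query": {"bool": {"filter": [{"term": {"study_id": study_id}}]}},
        "sort": [{"measured_on": "asc"}],
    }


def hits_to_rows(response):
    """Flatten a search response into one row per measurement."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        rows.append({
            "measured_on": date.fromisoformat(source["measured_on"]),
            "condition": source["condition"],
            "viable_cells_per_ml": float(source["viable_cells_per_ml"]),
        })
    # Group by condition, then order by date, so each growth curve is contiguous.
    return sorted(rows, key=lambda r: (r["condition"], r["measured_on"]))


# Hand-written stand-in for a search response; the values are made up.
fake_response = {"hits": {"hits": [
    {"_source": {"measured_on": "2020-05-02", "condition": "A", "viable_cells_per_ml": 2.1e6}},
    {"_source": {"measured_on": "2020-05-01", "condition": "A", "viable_cells_per_ml": 1.0e6}},
]}}
query = build_study_query("demo-study")
rows = hits_to_rows(fake_response)
print(rows[0]["measured_on"], rows[0]["viable_cells_per_ml"])
```

Rows in this shape can be passed to `pandas.DataFrame(rows)` and plotted with `seaborn.lineplot(data=df, x="measured_on", y="viable_cells_per_ml", hue="condition")`.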

Image: Converting cell counter data into an ADF wraps up the standardized data along with associated images and ontology specified by the Allotrope Foundation


The TetraScience REST API can also be used to import data into interactive Python notebooks for more detailed image analysis.

Image: Use the TetraScience REST API to access your data with popular data science tools, such as Jupyter (IPython) notebooks


Conclusion

“There are two really important improvements that come out of this process for Biogen. First, analysis is fully automated; once someone reads a sample in the cell counter, the data for the growth curve is visualized in the BI tool. This was previously a manual process with a lot of data movement. Second, data integrity is improved by aggregating multiple files from each sample into a single file and automating storage of the data. These are key points and should not be overlooked,” says Blackwell.

We are delighted to have the opportunity to work with innovators at Biogen like Len Blackwell, Associate Director, and George Van Den Driessche, Scientist I. This collaboration will save their scientists time, make their cell counter data truly accessible and actionable, and negate the risk presented by separately storing files.
