Design and Implementation of Informatic Systems Enabling High-throughput Cell Line Development Modeling and Optimization


Our objective was to outline a reproducible and scalable process of design and technology utilization to support the automation of cell line development at a major U.S. pharmaceutical company.

Our hypothesis was that automation enables more accurate process modeling at scale and, through analytics on the modeled process, better optimization.


Bioprocess workflows such as cell line development require, on average, 15 or more instruments to collect process parameters and result data. We present repeatable steps for implementing a high-throughput system. The key design considerations were connection methods, cloud technology selection, data model design, ontology selection, and data consumption mechanisms.

We review our:

  • Design process
  • Cloud technology selection within AWS
  • Data consumption mechanisms and use cases
  • Scale up conclusions


We used a scientific workshop to map the existing cell line development process. After identifying the data available through integration, we engineered a technology solution focused on acquiring, storing, and processing data in the cloud. Throughput of the overall method was then compared with data from the manual process. The general flow is presented in Fig-1.

1. Process Mapping Workshop

2. Technology Architecture

3. Modeling & Throughput Analysis

Fig-1. Stepwise method categories flow chart.


In-person interviews with the scientists responsible for each step in the process were used to build an inventory of systems, identify the data fields important for analysis, and determine where data is currently stored. The output is shown in Fig-2.

Fig-2. Resulting process flow mapping from workshop, identifying stepwise data sources, data formats and storage locations.

Fig-3. Connection mechanisms chosen for each instrument data source.


Moving data to the cloud for parsing, standardization, and consumption requires a connection to each system. The integration method chosen for each system in the process is shown in Fig-3.

Parsing & Standardization

  • Parsers were developed as standalone container scripts implemented in AWS Lambda.
  • Where possible, data models and ontologies were taken from the Allotrope Foundation. The most helpful of these were the chromatography data models, used to standardize data from the AKTA system through UNICORN.
  • Standardization produced JSON files containing both metadata and results.
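
As a rough illustration of the parsing and standardization step, the sketch below converts a delimited instrument export into a JSON document with separate metadata and results blocks. The function name `standardize_export`, the field names (`instrument_type`, `run_id`, `fields`), and the CSV layout are hypothetical; they do not reflect the Allotrope data models or the production parsers. In a Lambda deployment, a function like this would presumably be called from the handler once the raw file has been read.

```python
import csv
import io
import json


def standardize_export(raw_text, instrument_type, run_id):
    """Convert a CSV instrument export into a metadata + results document.

    Hypothetical layout: the first row is a header, later rows are readings.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    results = []
    for row in reader:
        record = {}
        for key, value in row.items():
            if key is None:
                # Extra cells beyond the header are ignored in this sketch.
                continue
            # Store numeric readings as floats, leave everything else as-is.
            try:
                record[key.strip()] = float(value)
            except (TypeError, ValueError):
                record[key.strip()] = value
        results.append(record)

    return {
        "metadata": {
            "instrument_type": instrument_type,  # assumed field name
            "run_id": run_id,                    # assumed field name
            "fields": [name.strip() for name in (reader.fieldnames or [])],
        },
        "results": results,
    }


if __name__ == "__main__":
    sample = "time_min,uv_mau\n0.0,1.2\n0.5,3.4\n"
    doc = standardize_export(sample, "chromatography", "run-001")
    print(json.dumps(doc, indent=2))
```

Keeping metadata and results in one document means each file is self-describing, which is what allows the same document to be indexed for search and queried as a table.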


  • JSON files are indexed in the data lake using Elasticsearch; the search interface is exposed through the API.
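
To make the search step more concrete, the sketch below builds an Elasticsearch Query DSL body that filters indexed documents by instrument type and optionally matches a run identifier. The field paths `metadata.instrument_type` and `metadata.run_id` and the helper name `build_search_body` are assumptions for illustration, as is the premise that the instrument type is mapped as a keyword field; the actual index mapping and API are not described here.

```python
import json


def build_search_body(instrument_type, run_text=None, size=25):
    """Build a Query DSL body for standardized JSON documents.

    Assumes (hypothetically) that metadata.instrument_type is a keyword
    field, so an exact-match `term` filter is appropriate.
    """
    filters = [{"term": {"metadata.instrument_type": instrument_type}}]
    if run_text:
        # Full-text match on the (assumed) run identifier field.
        must = [{"match": {"metadata.run_id": run_text}}]
    else:
        must = [{"match_all": {}}]
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    print(json.dumps(build_search_body("chromatography", "run-001"), indent=2))
```

The resulting dictionary is what would be sent as the request body to the index's `_search` endpoint.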


  • Support for ML applications is provided through S3 linking and JDBC.
  • BI tools require tabular data; they connect via AWS Athena, which enables SQL queries on the structured JSON files stored in S3 and avoids post-processing on the tooling side.
  • Automation into reporting and other systems is configured using specific API calls with JSON returns.
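
The Athena path for BI tools can be sketched as follows. The helper below only assembles the arguments for the boto3 Athena `start_query_execution` call; the helper name `build_athena_request`, the database, table, and bucket names, and the assumption that the table exposes `metadata` as a struct column are all hypothetical and are not taken from the implemented system.

```python
def build_athena_request(database, table, instrument_type, output_location):
    """Return keyword arguments for boto3's Athena start_query_execution."""
    # Values are interpolated into SQL, so restrict them to simple names.
    for name in (database, table, instrument_type):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected characters in {name!r}")

    # Dot notation reads nested fields from the (assumed) metadata struct,
    # which is how the JSON is flattened into tabular columns for BI tools.
    query = (
        "SELECT metadata.run_id, metadata.instrument_type "
        f"FROM {table} "
        f"WHERE metadata.instrument_type = '{instrument_type}' "
        "LIMIT 100"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_athena_request(
        "cld_lake", "standardized_runs", "chromatography",
        "s3://example-bucket/athena-results/",
    )
    print(request["QueryString"])
    # With AWS credentials configured, the query could be submitted with:
    #   import boto3
    #   boto3.client("athena").start_query_execution(**request)
```

Separating request construction from submission keeps the SQL easy to inspect and test without an AWS account.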

Fig-4. Implemented architecture diagram including connections, cloud platform and services, and data consumption elements. ML & BI tools were key focus areas for providing the high-throughput analytic capabilities.


During the first several months of operation, only BI tools have been used to process data. In a given week of process development work, the automated system saved 4-6 hours that would otherwise have been spent on manual data manipulation.

The system exposed several new fields via direct integration and saved time through improved searchability. Standardized ontologies and a unified data model enabled one set of filters executed in BI tools to display data across instrument types, which was previously not possible without extensive manual data manipulation.

AWS services including S3, Lambda, and Athena provide critical performance and cost advantages at scale, offering low-cost storage, on-demand processing, and flattened data via SQL, respectively.
