Automation is more than robots.
High-throughput screening (HTS) is used extensively in the pharmaceutical industry. It relies on robotics and automation to quickly test the biological or biochemical activity of large numbers of chemical and/or biological compounds. The primary goal of HTS is to identify high-quality 'hits' that are active at a fairly low concentration and that have a novel structure. Hits generated during HTS can then serve as the starting point for the subsequent 'hit-to-lead' drug discovery effort.
Automation has played a huge role in the development of HTS to date. Tools like automated liquid handlers and robotic, high-throughput plate readers significantly improve the efficiency and consistency of compound screening. Robotic automation has transformed the physical aspect of the process. However, experimental data sets remain isolated from one another, so acquiring, storing, and analyzing the data still requires manual handling. Experiments are automated, but data flow is not.
In order to truly reap the benefits of HTS, biopharma companies need to automate the accompanying data flow. They also need to connect the data with the rest of the Digital Lab to make it accessible and actionable. Otherwise, the vast volumes of generated data will be stuck in yet another silo.
This blog post identifies opportunities for improvement in an example HTS data workflow, based on our experience with biopharma customers, and offers our approach to evolving the HTS data flow to be as efficient and consistent as the physical process.
Analyzing Today's Manual Data Flow
The following diagram illustrates an example biopharma customer's HTS assay workflow for small molecule compound libraries (edited for confidentiality), leveraging best-in-class instruments and tools in the market.
Diagram 1: Example High-Throughput Screening Workflow
Let's break down this example HTS data flow in R&D labs today, based on our experience working with top pharma and biotech companies:
Step 1: Scientists register the compounds in a compound registry, like Dotmatics Register, generating the experiment ID and compound ID. Scientists also enter compound information, such as molecular weight and initial amount, into an ELN like Dotmatics Studies.
Step 2: Scientists create child samples by sample dissolution or aliquoting. Each child sample is identified by a unique barcode in the sample inventory software like Titian Mosaic. The sample inventory system contains extensive information such as parent sample ID, batch ID, amount, solvent type, sample volume, location in the freezer, etc.
Step 3A: After compound registration and child sample preparation, scientists enter the compounds’ information into the liquid handler software to set up the assay plate on the liquid handler workstation, such as a Tecan. The sample plate output file is saved locally on the lab Windows computer.
Step 3B: The assay plate is incubated according to the assay protocol and read on a plate reader, such as the PerkinElmer EnVision. The assay result is also saved locally on the lab Windows computer.
Step 4: After the experiment finishes, the scientist manually transfers the files to an office computer and analyzes the assay data in GraphPad to obtain either IC50 values or target binding information.
Step 5: The analysis result is manually updated in the ELN, like Dotmatics Studies, to complete the experiment.
The key takeaway here is that while the robots automate the physical component of the experiment, the process involves multiple instruments and software systems, and data needs to flow seamlessly into and out of all of them. Today, this is not the case. The Dotmatics components may integrate with one another since they are part of the same product portfolio. The others may offer point-to-point integrations that you can set up, or APIs that you can use to write your own integrations, but building these is real work, they need to be maintained, and ultimately the approach does not scale. We need an easy way to connect all of these instruments and software systems to a common network to knock down the data silos and get the data flowing, both within this work process and across the broader R&D data ecosystem.
Optimizing HTS Data Flow
The TetraScience Platform is that common network. Connecting the instruments and software systems needed to conduct HTS turns the manual steps of data entry, processing, and transfer into an automated flow, saving time, reducing errors, and increasing throughput. It also harmonizes and transforms the data, preparing it for data science, AI, and other advanced analytics (we'll get to this at the end).
Let's take a look at how it works.
Diagram 2: Automating the High-Throughput Screening Data Workflow
Steps 1 and 2: Scientists register the compounds in the compound registry, as before. Except now, the TetraScience Dotmatics connector automatically detects new or modified compounds in the Dotmatics Register and triggers a pipeline to automatically push the information to the inventory management software. As part of this process, the data is also now available to query via RESTful API in the TetraScience Data Lake.
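To make the "detect new or modified compounds, then trigger a pipeline" idea concrete, here is a minimal sketch of one common way to do change detection: fingerprint each record and compare it against the fingerprint from the previous poll. It is an illustration only, not the TetraScience Dotmatics connector. The function names, the in-memory `registry` and `inventory` dictionaries, and the compound fields are all hypothetical stand-ins for the real systems.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable hash of a record's contents, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def detect_changes(registry: dict, seen: dict) -> list:
    """Return IDs of compounds that are new or modified since the last poll.

    registry: compound_id -> record (hypothetical snapshot of the registry)
    seen:     compound_id -> fingerprint from the previous poll (updated in place)
    """
    changed = []
    for compound_id, record in sorted(registry.items()):
        digest = fingerprint(record)
        if seen.get(compound_id) != digest:
            changed.append(compound_id)
            seen[compound_id] = digest
    return changed


def push_to_inventory(compound_ids, registry, inventory):
    """Hypothetical pipeline step: copy changed records downstream."""
    for compound_id in compound_ids:
        inventory[compound_id] = dict(registry[compound_id])


# Hypothetical registry contents; field names are assumptions.
registry = {"CMPD-001": {"mol_weight": 180.16, "amount_mg": 50.0}}
seen, inventory = {}, {}

first_poll = detect_changes(registry, seen)      # one new compound
push_to_inventory(first_poll, registry, inventory)

registry["CMPD-001"]["amount_mg"] = 45.0         # modified record
registry["CMPD-002"] = {"mol_weight": 151.16, "amount_mg": 20.0}  # new record
second_poll = detect_changes(registry, seen)
push_to_inventory(second_poll, registry, inventory)

third_poll = detect_changes(registry, seen)      # nothing changed
```

Hashing the whole record keeps the detector independent of which fields the registry happens to expose, at the cost of re-sending a record when any field changes.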
Steps 3A, 3B, 4, and 5: After setting up the assay plates with the liquid handler, scientists run the assay and capture the readout on a plate reader. The liquid handler output file contains sample plate information, including the sample concentration in each well. The plate reader file contains the assay readout. The TetraScience File connector automatically detects the files produced by the Tecan liquid handler and the EnVision plate reader, moves the raw instrument files into the Data Lake, and then triggers pipelines that parse the files, merge them, and push the result to Dotmatics Studies. IC50 is then automatically calculated.
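The parse-merge-calculate sequence can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the actual pipeline. The two CSV layouts are invented (real Tecan and EnVision exports look different), the signal is assumed to be already normalized to percent activity, and IC50 is estimated by log-linear interpolation between the two concentrations that bracket 50% rather than by the four-parameter logistic fit that analysis tools typically use.

```python
import csv
import io
import math

# Hypothetical liquid handler output: which compound and concentration per well.
LAYOUT_CSV = """well,compound_id,conc_uM
A1,CMPD-001,0.01
A2,CMPD-001,0.1
A3,CMPD-001,1
A4,CMPD-001,10
"""

# Hypothetical plate reader output: signal per well, as percent activity.
READOUT_CSV = """well,signal
A1,95
A2,80
A3,30
A4,5
"""


def read_rows(text, key):
    """Parse CSV text into a dict keyed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def merge_by_well(layout_text, readout_text):
    """Join the plate layout and the readout on the well position."""
    layout = read_rows(layout_text, "well")
    readout = read_rows(readout_text, "well")
    merged = []
    for well, info in layout.items():
        if well in readout:  # skip wells the reader did not report
            merged.append({
                "well": well,
                "compound_id": info["compound_id"],
                "conc_uM": float(info["conc_uM"]),
                "signal": float(readout[well]["signal"]),
            })
    return merged


def estimate_ic50(points, top=100.0, bottom=0.0):
    """Interpolate, on a log10 concentration axis, where the signal crosses
    the midpoint between top and bottom. Returns None if it never does."""
    half = (top + bottom) / 2.0
    ordered = sorted(points)  # ascending concentration
    for (c_lo, s_lo), (c_hi, s_hi) in zip(ordered, ordered[1:]):
        if s_lo >= half >= s_hi and s_lo != s_hi:
            frac = (s_lo - half) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


merged = merge_by_well(LAYOUT_CSV, READOUT_CSV)
points = [(row["conc_uM"], row["signal"]) for row in merged]
ic50_uM = estimate_ic50(points)
```

Joining on the well position is what lets the concentration from one instrument meet the signal from another. In the manual workflow, that join is the step a scientist performs by hand.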
Image: Automatically generated IC50 calculation results
In this optimized workflow, only one manual data step remains: initiating the experiment by registering the compound. Scientists still have to physically set up the experiment, but once it is complete, all the data automatically appears in the ELN, with the calculation results shown above already completed, as well as in the Data Lake, ready for further querying and analysis.
The TetraScience Data Integration Platform automates the HTS data workflow. It improves efficiency by removing painful manual data handling and processing from scientists' daily work, and it improves data integrity by reducing manual-entry errors.
Let's compare the two processes side-by-side:
Beyond Data Automation: Data Science
Now that the data workflow accompanying the high-throughput screening process is automated, what's next? Compound information, sample information, type of assays performed, and screening results are now centralized and harmonized in the TetraScience Platform. This seems like a prime opportunity to apply some data science! Check out a related blog post about our Intermediate Data Schema (IDS) to learn more about how we harmonize disparate data, knocking down the data silos. IDS is the open standards method we use to seamlessly move data between and across all the different HTS instruments and software systems, unifying the unique data structure and format from each.
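As a rough illustration of what "unifying the unique data structure and format from each" instrument means in practice, the sketch below maps two differently shaped plate reader records onto one common record type. This is not the actual IDS. The `WellResult` type, both vendor record layouts, and all field names are hypothetical and were chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellResult:
    """Hypothetical common shape for one well's measurement."""
    plate_id: str
    well: str      # normalized as row letter + column number, e.g. "A1"
    value: float
    source: str


def from_reader_a(raw: dict) -> WellResult:
    # Hypothetical vendor A record: separate row and column, counts as text.
    return WellResult(
        plate_id=raw["Plate"],
        well=f'{raw["Row"]}{int(raw["Col"])}',
        value=float(raw["Counts"]),
        source="reader_a",
    )


def from_reader_b(raw: dict) -> WellResult:
    # Hypothetical vendor B record: lowercase, zero-padded well position.
    position = raw["well_position"].upper()
    return WellResult(
        plate_id=raw["plate_barcode"],
        well=f"{position[0]}{int(position[1:])}",
        value=float(raw["rfu"]),
        source="reader_b",
    )


record_a = from_reader_a({"Plate": "P1", "Row": "A", "Col": 1, "Counts": "1234"})
record_b = from_reader_b({"plate_barcode": "P1", "well_position": "a01", "rfu": 987.5})

# Both records now share the same fields, so downstream code can treat them alike.
same_well = (record_a.plate_id, record_a.well) == (record_b.plate_id, record_b.well)
```

Once every source is mapped to one shape, analysis code only has to be written once, against that shape.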
A benefit of the centralized, harmonized data is that it is also prepared for use with various data science and data analytics tools such as Spotfire, Tableau, or Dotmatics Vortex. Our open standards approach means that scientists and data scientists can use the software, platforms, and languages they already know and use - no need to install or learn something new.
Diagram 3: Applying Data Science to High-Throughput Screening Data
You can now fully utilize your HTS data, including querying and visualizing data sets, using your existing data science and analytics tools. For example, scientists can easily query and visualize all compounds that are active at a certain threshold level in a particular screen, or the behavior of all compounds of similar structure across different screens. Scientists can derive insights that help them develop more efficient HTS assays, design more active compound libraries, and significantly speed up the drug discovery process.
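The two example queries above might look like the following once the results sit in one harmonized collection. This is a minimal sketch over an in-memory list, not a call to the TetraScience API. The records, field names, and the `scaffold` label used as a stand-in for structural similarity are all assumptions, and a real similarity search would use a cheminformatics toolkit.

```python
# Hypothetical harmonized screening results; values are illustrative only.
results = [
    {"screen": "SCR-1", "compound_id": "CMPD-001", "scaffold": "S1", "ic50_uM": 0.4},
    {"screen": "SCR-1", "compound_id": "CMPD-002", "scaffold": "S1", "ic50_uM": 12.0},
    {"screen": "SCR-1", "compound_id": "CMPD-003", "scaffold": "S2", "ic50_uM": 0.9},
    {"screen": "SCR-2", "compound_id": "CMPD-001", "scaffold": "S1", "ic50_uM": 3.0},
    {"screen": "SCR-2", "compound_id": "CMPD-003", "scaffold": "S2", "ic50_uM": 0.2},
]


def active_compounds(rows, screen, max_ic50_uM):
    """Compounds in one screen at or below the IC50 threshold, most potent first."""
    hits = [r for r in rows if r["screen"] == screen and r["ic50_uM"] <= max_ic50_uM]
    return [r["compound_id"] for r in sorted(hits, key=lambda r: r["ic50_uM"])]


def profile_by_scaffold(rows, scaffold):
    """For each compound sharing a scaffold label, its IC50 in every screen."""
    profile = {}
    for r in rows:
        if r["scaffold"] == scaffold:
            profile.setdefault(r["compound_id"], {})[r["screen"]] = r["ic50_uM"]
    return profile


hits_scr1 = active_compounds(results, "SCR-1", max_ic50_uM=1.0)
s1_profile = profile_by_scaffold(results, "S1")
```

The same filters translate directly to a query in a visualization tool or a dataframe library. The point is that they become possible only when results from different screens share one structure.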
Watch this video to see the optimized data flow in action, enabled by the TetraScience Platform.