John Conway of 20/15 Visioneers draws an analogy between an R&D organization and a builDing. The capital D is not a typo - he argues that the word is active (builDing) rather than passive (builT) because a building is never really finished. There will always be some kind of upgrade, improvement, enhancement, or repair. He talks about power - the electricity and networks that allow for communication. He talks about virtual capabilities like software and advanced analytics. He talks about infrastructure as the critical foundation that protects everything and allows it to function properly. And he talks about plumbing as the processes and instruments that produce and move data. Here at TetraScience, we think this analogy is spot on, and we are inspired to dive deeper into the concept of "data plumbing."
When you think about plumbing for your home, what pops into your head is running water, dishwashers, washing machines, and so on. At face value, it doesn't seem too complicated. However, plumbing can be incredibly complex when you think about how it needs to work in a NYC skyscraper, a sports stadium, or the New England Aquarium. For biotech and pharmaceutical companies today, data flow in the lab is at least as complex as - if not more complex than - the water plumbing systems in those buildings. In the lab, there are heterogeneous instruments - different brands and models producing disparate file formats. There are distributed partners, such as CROs and CDMOs, doing outsourced research. There are also different applications in the lab, such as registration, inventory, ELN, LIMS, and home-grown systems. And there are the many workflows that scientists are running, the different kinds of experiments and assays, and, increasingly, the different visualization and analysis tools that data scientists are introducing into the ecosystem.
We define lab data plumbing as the collection, cleansing, harmonization, and movement of all lab data.
Just like the plumbing in a building, it can be messy and dirty. It involves the unrestricted flow of a valuable substance (information or water) through a complex system that starts out simple but quickly becomes complicated. Doing it properly requires special skill sets, the right tools, and careful design.
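For the technically inclined, here is a minimal sketch of what those four steps - collection, cleansing, harmonization, and movement - could look like in code. It is an illustration only, with assumed file formats, folder names, and function names, not a description of any real connector:

```python
import json
from pathlib import Path

def collect(raw_dir: Path) -> list[Path]:
    # Collection: gather the raw result files dropped off by instruments or CROs.
    return sorted(raw_dir.glob("*.csv"))

def cleanse(path: Path) -> list[dict]:
    # Cleansing: keep only well-formed "sample_id,value" rows; skip headers and junk.
    rows = []
    for line in path.read_text().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0]:
            continue
        try:
            rows.append({"sample_id": parts[0], "value": float(parts[1])})
        except ValueError:
            continue
    return rows

def harmonize(rows: list[dict], instrument: str) -> list[dict]:
    # Harmonization: map vendor-specific fields onto one common schema.
    return [{"sample": r["sample_id"], "result": r["value"], "source": instrument} for r in rows]

def move(records: list[dict], outbox: Path) -> None:
    # Movement: hand the harmonized records to a downstream system (here, a JSON file).
    outbox.write_text(json.dumps(records, indent=2))

# Hypothetical usage: one plate reader's drop folder flowing into a shared outbox.
Path("outbox").mkdir(exist_ok=True)
for f in collect(Path("raw/plate_reader")):
    move(harmonize(cleanse(f), instrument="plate-reader-01"), Path("outbox") / (f.stem + ".json"))
```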
Let’s consider some examples of data plumbing in today’s lab:

1) CROs send over large volumes of result files that scientists then have to quality check manually.
2) Scientists copy and paste instrument output into notebooks and then manually transcribe the results into the ELN.
3) Data scientists manually aggregate results from many experiments before they can analyze them.
This type of primitive data plumbing is happening every day in every biopharma lab around the world. It isn’t automated, but it is already present - like manually pumping water out of a well instead of simply turning on the faucet.
Using building plumbing as inspiration, let’s think about some design requirements for a modern Digital Lab Data Plumbing system:
These are just some of the parallels we can draw from plumbing in a building, and they can inspire the proper design of a lab data plumbing system.
Of course, labs have their own unique challenges to consider:
Hopefully by now you agree that data flow in the Digital Lab is quite complex and quite important, and deserves a renovation. Just to be safe, let’s put some numbers to the concept.
Let’s go back to the 3 data plumbing examples we talked about: 1) CROs sending a large volume of files that need to be quality checked manually, 2) scientists copy-pasting into notebooks and then manually transcribing into ELNs, and 3) data scientists manually aggregating results. Customers tell us it is common for scientists to spend as much as 10-20 hours per week on these types of manual data wrangling activities.
If we do some quick, back-of-the-envelope math to scale this up to a large organization with thousands of scientists, it could look something like this:
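Here is one way that math can work out. The 10 hours per week comes from the customer estimates above; the headcount (2,000 scientists), 50 working weeks per year, and fully loaded cost of roughly $125 per scientist-hour are illustrative assumptions:

```python
# Back-of-the-envelope estimate of time and money lost to manual data wrangling.
scientists = 2_000        # assumed headcount for a large organization
hours_per_week = 10       # low end of the 10-20 hours/week customers report
weeks_per_year = 50       # assumed working weeks per year
cost_per_hour = 125       # assumed fully loaded cost per scientist-hour (USD)

hours_per_year = scientists * hours_per_week * weeks_per_year   # 1,000,000 hours
cost_per_year = hours_per_year * cost_per_hour                  # $125,000,000

print(f"{hours_per_year:,} scientist-hours per year")
print(f"${cost_per_year:,} per year")
```

Those two outputs are the 1 million hours and $125 million figures referenced below; swap in your own headcount and rates to size the problem for your organization.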
This is a meaningful, sizable, fundamental problem in the lab. In terms of opportunity cost, what could your scientists do with that 1 million hours of productive or uninterrupted time? Could they run 10% or even 20% more experiments?
What could your data scientists do with clean, accessible, prepared data at their fingertips? And what is the ripple effect of transcription errors or missing information propagating downstream?
First, plan the data plumbing system for your lab as a first-order architecture consideration, not as an afterthought. This does not mean you have to be building a new lab. In existing labs, the typical approach is to buy the instrument, buy the ELN, and then figure out how to connect them. In your building, you design the plumbing system knowing there will be a dishwasher and a sink, even if the exact fixtures haven’t been purchased yet. The same goes for the lab: you know you will need to move data around, so plan a configurable “message bus” (for our IT readers) that is capable of plugging into a variety of instruments and applications - and then plug them in.
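To make the “message bus” idea concrete, here is a toy publish/subscribe sketch with hypothetical instrument and application names. It is not TetraScience’s platform or any specific product, just an illustration of the pattern:

```python
from collections import defaultdict
from typing import Callable, Dict, List

class LabDataBus:
    """Toy message bus: instrument connectors publish, applications subscribe."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Applications (ELN, LIMS, analytics tools) register interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> None:
        # Instrument connectors push harmonized records; every subscriber gets a copy.
        for handler in self._subscribers[topic]:
            handler(record)

# Hypothetical wiring: one plate reader feeding both the ELN and the LIMS.
bus = LabDataBus()
bus.subscribe("plate-reader/results", lambda r: print("ELN entry:", r))
bus.subscribe("plate-reader/results", lambda r: print("LIMS sample update:", r))
bus.publish("plate-reader/results", {"plate": "P-001", "well": "A1", "od600": 0.42})
```

The point is the shape of the design: new instruments and new applications plug into the bus without rewiring everything else.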
Second, START NOW! Don’t wait. Don’t let scientists and data scientists waste their time on data wrangling. Every little bit helps make a dent in that 1 million hours and $125 million wasted every year. If the job seems daunting, just pick off one small component at a time. You are not in this alone! We are here to help. Follow our blog and our social media for tips, or check out our website to learn more about our data engineering platform that provides the data plumbing for the Digital Lab.
Follow TetraScience for ongoing updates about R&D data in life sciences and other related topics: