In Part one of this blog, we discussed some of the reasons DIY data platform projects often fail to deliver promised benefits, or do so at unexpectedly high cost.
To review: building a do-it-yourself (DIY) data solution from non-R&D-focused components means assuming responsibility to select, integrate, and lifecycle-manage all the pieces and parts, and researching, architecting, building, and maintaining all the integrations. That's a tall order, requiring headcount and specialized skills — mostly focused on building and operating and getting data into the platform, and much less focused on extracting maximum value from the data itself.
Making matters worse, there's an "impedance mismatch" between the capabilities offered by generic data components and services (ingestion, transformation, cloud storage and search, etc.) and biopharma infrastructure, workflow, regulatory, and scientific characteristics and requirements. So a DIY project that aims to create a quality solution needs to fill the gap: providing the needed scientific and process knowledge to extract, parse, and enrich data close to its sources (data without context has limited utility), and map data into an ontology/schema that makes it readily findable, accessible, interoperable, and reusable (FAIR).
The result: DIY efforts can easily consume vast resources and compel non-strategic organizational spread, while delivering sub-par benefits: e.g., brittle, inflexible, hard-to-maintain integrations between data sources and targets (logically equivalent to one-off, point-to-point integrations, despite traversing a transformation tier and data lake/warehouse), and aggregated data that's devoid of context and non-schematized, so hard to find and use for automation, analysis, visualization, and discovery.
Meeting these challenges requires a different approach. The Tetra R&D Data Cloud represents a fundamental shift in how life sciences data sources and data targets are bridged to accelerate R&D.
Data-centric: treating data as the core asset, providing stewardship of data through its whole life cycle. The Tetra R&D Data Cloud is architected to keep data as the continual focus: from acquisition, harmonization, data management, and data processing through to preparing the data for AI/ML and pushing it to informatics/automation applications. The Tetra Data Platform (TDP) includes:
This data-centric architecture (plus the data-centric focus applied in building true R&D data integrations) ensures that:
Cloud-native: The Tetra R&D Data Cloud incorporates best-of-breed open source technologies and formats (e.g., JSON, Parquet) and popular standards and tools favored by scientists and by life sciences and data science professionals (e.g., SQL, Python, Jupyter). Its aggressively cloud-native architecture is designed for easy, flexible deployment, resilience, security, scalability, high performance, and minimal operational overhead, while optimizing for the lowest total cost of ownership.
R&D-focused: with connectivity, integrations, and data models purpose-built for experimental data at the core. TetraScience has created a large (and growing), broadly skilled (software development, cloud computing, and life sciences), and disciplined organization. It has also evolved a mature process for identifying, building, and maintaining a library of fully productized integrations to biopharma R&D data sources and targets, and for creating data models for common data sets. These integrations (including the associated data models, comprising the open IDS) are purpose-built and tailored to fulfill informatics and data analytics use cases in R&D.
Open and vendor-agnostic, leveraging a broad partner network: TetraScience has partnered with, and actively collaborates with, the industry's leading instrument and informatics software providers. This partner network collaborates closely, benefiting all the network members (and TetraScience customers) as our collective ecosystem grows. Partnering this way significantly accelerates integration development and productization, helps ensure integration quality, keeps integrations in sync with product updates, and helps guarantee that integrations fully support high-priority, real-world customer use cases.
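To make the data-centric and open-standards points above a little more concrete, here is a minimal Python sketch of the general idea: attach context to raw instrument output close to its source, then expose the harmonized records through plain JSON and SQL. Everything in it is an assumption made for illustration. The raw export, the field names, the `harmonize` and `load` helpers, and the `readings` table are hypothetical, and none of it represents the actual IDS schema or any TetraScience API.

```python
import json
import sqlite3

# Hypothetical raw export from a plate reader: values only, no context.
RAW_EXPORT = "A1,0.112\nA2,0.534\nA3,0.498\n"

# Hypothetical context captured close to the source (not a real IDS schema).
CONTEXT = {
    "instrument": "plate-reader-01",
    "assay": "absorbance_450nm",
    "experiment_id": "EXP-0001",
    "unit": "AU",
}


def harmonize(raw: str, context: dict) -> list[dict]:
    """Parse raw rows and attach context so each record is self-describing."""
    records = []
    for line in raw.strip().splitlines():
        well, value = line.split(",")
        records.append({**context, "well": well, "value": float(value)})
    return records


def load(records: list[dict]) -> sqlite3.Connection:
    """Load harmonized records into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (experiment_id TEXT, instrument TEXT, "
        "assay TEXT, well TEXT, value REAL, unit TEXT)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (:experiment_id, :instrument, :assay, "
        ":well, :value, :unit)",
        records,
    )
    return conn


records = harmonize(RAW_EXPORT, CONTEXT)
print(json.dumps(records[0], indent=2))  # one self-describing JSON record

conn = load(records)
rows = conn.execute(
    "SELECT well, value FROM readings "
    "WHERE assay = 'absorbance_450nm' AND value > 0.4 ORDER BY well"
).fetchall()
print(rows)
```

Once each record carries its own context, a scientist can find the readings for a given assay with an ordinary SQL query rather than by tracing files back to the instrument that produced them. A production platform would of course handle far more (vendor file formats, validation, lineage, access control); the sketch only shows why contextualized, schematized data is easier to find and reuse.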
Biopharma can best exploit its most important asset, R&D data, by implementing a purpose-built, end-to-end solution that's data-centric, cloud-native, R&D-focused, and open. Hewing closely to these principles, the Tetra Data Platform can help reduce non-strategic organizational spread, letting R&D IT organizations focus on adding value and pioneering new applications that accelerate discovery. At the same time, TDP can deliver to scientists, data scientists, and other end users a self-service, unified data experience, one in which data experts are in control of processing and data modeling and can configure, manage, and track dataflows end to end.