Allotrope Leaf Node Model — a Balance between Practical Solution and Semantics Compatibility

Spin Wang | March 29, 2020

Using the Leaf Node model, scientists, software engineers, and data scientists can rapidly iterate on the data model and create and query standardized data sets, while semantic experts focus on the ontology and semantics. It is a division of labor that keeps common activities easy and fast while maintaining compatibility with the semantic world.

Earlier this year, we discussed the basic concepts of the Allotrope Data Format (ADF) in this blog post: Allotrope 101. One of the key components of Allotrope is its Data Description, a triple store that leverages the semantic web and Resource Description Framework (RDF) graphs.

Traditionally, the Allotrope Data Description is represented by what we call a “Full Graph” stored in RDF, which looks like the following:

[Figure: Full Graph]

The goal of the Full Graph is very exciting: it captures the relationships between entities. For example:

  • material A has role of experiment sample
  • the experiment sample is realized in an HPLC injection
  • the HPLC injection has participant an autosampler

Such relationships can potentially allow the machine (computer software) to understand what a sample, a chromatography injection, and an autosampler are, and how they relate to each other in an abstract way, much as a lab scientist would. The downside is that this introduces a LOT of ontological, taxonomic, and even philosophical complexity and overhead. To prepare for a future in which the machine understands the data, scientists incur more overhead today and get stuck.
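The three relationships listed above can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python, assuming human-readable strings in place of the IRIs a real RDF graph would use; the `Triple` type and the `reachable` helper are hypothetical names introduced only for illustration.

```python
from collections import namedtuple

# A triple is the basic unit of an RDF graph: subject, predicate, object.
Triple = namedtuple("Triple", "subject predicate object")

# The example relationships from the list above, as plain strings.
# A real Full Graph would use ontology IRIs instead of these labels.
full_graph = [
    Triple("material A", "has role", "experiment sample"),
    Triple("experiment sample", "is realized in", "HPLC injection"),
    Triple("HPLC injection", "has participant", "autosampler"),
]


def reachable(graph, start):
    """Return every entity reachable from `start` by following triples."""
    seen, stack = [], [start]
    while stack:
        current = stack.pop()
        for triple in graph:
            if triple.subject == current and triple.object not in seen:
                seen.append(triple.object)
                stack.append(triple.object)
    return seen


chain = reachable(full_graph, "material A")
print(chain)
```

Even in this toy form, answering "what is material A connected to?" means walking the graph, which hints at why the Full Graph is harder to consume than a flat list of fields.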

The community quickly realized that building and using a Full Graph data model is non-trivial, for the following reasons:

  • The semantic meaning and relationships of different entities are not something a typical scientist would understand. Significant semantic and ontology knowledge is needed to build and understand the Full Graph.
  • There is no effective validation mechanism. Namely, it is not yet possible (or at least extremely challenging) to break the Full Graph into smaller modules, validate each module separately, combine the modules, and then ensure that validation still passes. In fact, SHACL, the Shapes Constraint Language used to validate the Full Graph, was only recently published, in summer 2017.
  • Even when the Full Graph is available, its complexity quickly becomes a barrier for scientists, data scientists, software developers, and third-party software trying to consume the data. (Most vendors in the lab informatics market do not support graphs at all, let alone the Full Graph.)

In light of these observations, the community proposed the concept of the Leaf Node model, which is the theme of this blog post.

Essentially, the Leaf Node model asks: “What if we only focus on the leaves of the Full Graph, namely those nodes that are directly associated with the data?”

This allows scientists, software developers, and data scientists to quickly zoom in on the important part of the graph: the actual data fields and their values. Example Leaf Nodes look like this:

  • Experiment name is “my test”
  • Sample id is “123”
  • Sample batch barcode is “xyz”
  • The injection volume is 1 microliter
  • Cell viability is 60.5 percent

When represented in RDF and saved in ADF, a Leaf Node is composed of the following triples.

[Figure: Leaf Node]

The Leaf Node model is essentially a collection of Leaf Nodes. It also supports arrays of related Leaf Nodes, such as an array of chromatogram peaks. We will leave this to future articles to explain in more detail.

You will probably ask: what about semantic meaning and relationships? How is information such as “sample is input to an experiment” captured, and how can we tell the machine what a “sample” is?

The answer is actually quite delightful. An IRI is attached to each node (for example, result#AFR_0001111 is attached to cell viability), and because that IRI is already part of an ontology, the relationships between the nodes have already been rigorously and elegantly defined there. The IRI serves as a semantic hook or label that explains what “viability” is and bridges the Leaf Node graph with the ontology. If you want a “Full Graph”, simply combine the Leaf Node graph with the ontology.
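To make the "semantic hook" idea concrete, here is a rough sketch in plain Python. Only the IRI result#AFR_0001111 comes from this post; the node id, the predicate names (`type`, `value`, `unit`, `label`, `subclass_of`), and the ontology entries are hypothetical placeholders and not the actual ADF or Allotrope vocabulary.

```python
# Leaf Node data triples. Only "result#AFR_0001111" is taken from the article;
# the node id and predicate names are hypothetical placeholders.
leaf_graph = {
    ("node1", "type", "result#AFR_0001111"),
    ("node1", "value", 60.5),
    ("node1", "unit", "percent"),
}

# Toy ontology triples, invented for illustration only.
ontology = {
    ("result#AFR_0001111", "label", "cell viability"),
    ("result#AFR_0001111", "subclass_of", "example#measurement_result"),
}


def combine(leaf_graph, ontology):
    """Union the data triples with the ontology triples into one graph."""
    return leaf_graph | ontology


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def explain(graph, node_id):
    """Follow the node's IRI into the ontology and collect what is known."""
    iri = objects(graph, node_id, "type")[0]
    return {
        "iri": iri,
        "label": objects(graph, iri, "label")[0],
        "value": objects(graph, node_id, "value")[0],
        "unit": objects(graph, node_id, "unit")[0],
        "broader": objects(graph, iri, "subclass_of"),
    }


combined = combine(leaf_graph, ontology)
summary = explain(combined, "node1")
print(summary)
```

The data triples stay small and flat, while everything about what the value means lives on the ontology side and is reached through the IRI.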

With this approach, lab scientists, software engineers, and data scientists can rapidly iterate on the data model and easily create and query standardized data sets, while knowledge engineers and semantic experts focus on developing the ontology, which is the right place to describe what a sample is and how it can be part of an experiment.

The Leaf Node approach is essentially a division of labor: it allows what should be easy and fast to actually be easy and fast, while enabling compatibility with the semantic world via the IRI. The separation of data (captured in Leaf Nodes) from semantics (captured in the ontology) enables scientists, software engineers, and data scientists to easily create, validate, and read the data, while semantic experts focus on the ontology.

An added benefit of the Leaf Node model is that it can be easily transformed to and from popular formats such as JSON, tabular formats like CSV, and columnar formats like Parquet. These formats make the data readily compatible with common software engineering, data science, and visualization tools. This interchangeability with popular software and data-science-ready formats is another major advantage of the Leaf Node model. For example, the same Leaf Nodes can be expressed as JSON or as a flat table:

{
 "experiment": {
   "name": "my test"
 },
 "sample": {
   "id": "123"
 },
 "injection": {
   "volume": {
     "value": 1,
     "unit": "Microliter"
   }
 }
}

EXPERIMENT_NAME  SAMPLE_ID  INJECTION_VOLUME_VALUE  INJECTION_VOLUME_UNIT
my test          123        1                       Microliter
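
The step from the nested JSON to the flat row above can be sketched in a few lines of Python. This is one possible approach, not necessarily how any particular converter does it: it assumes column names are built by joining the keys along each path with underscores and upper-casing them, which matches the column names shown here. The `flatten` helper is a hypothetical name.

```python
import csv
import io
import json


def flatten(node, prefix=()):
    """Yield (column_name, value) for every leaf of a nested dict."""
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Not a leaf yet: keep descending and extend the path.
            yield from flatten(value, path)
        else:
            # A leaf: the column name is the path joined with underscores.
            yield "_".join(path).upper(), value


doc = json.loads("""
{
  "experiment": {"name": "my test"},
  "sample": {"id": "123"},
  "injection": {"volume": {"value": 1, "unit": "Microliter"}}
}
""")

row = dict(flatten(doc))

# Write the single row as CSV, with the leaf paths as the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Because every Leaf Node is just a path plus a value, the same row could be handed to other tabular or columnar tooling without touching the ontology.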

To find out more about how you can use Allotrope Leaf Node Model and automatically standardize your lab data for analytics, automation and archiving, please reach out to us at www.tetrascience.com/contact-us!
