Self-Service Pipelines democratize the creation and usage of data pipelines

December 15, 2023
Sean Johnston

TetraScience provides many pre-built pipelines that can help you create queryable, harmonized Tetra Data, and then enrich and push that data to downstream systems. To extend these capabilities of the Tetra Data Platform (TDP), Self-Service Pipelines (SSPs) in TDP v3.6 enable all customers to create custom task-scripts and protocols. SSPs add simplicity and flexibility to the user experience, enabling customers to independently adapt processes to an evolving landscape of analytical tools and data systems.

Your data, your way

Whether transforming raw instrument data, adding data labels, enriching data with external sources, or transferring processed data to another application, the power to create custom solutions is now in your hands. Task-scripts are the building blocks of every data journey, and with TDP v3.6 all customers can create new building blocks for custom pipelines. Assembled by a protocol, each task-script contains Python code that performs a specific function (e.g., parsing an instrument file, transforming data, or pushing files to third-party software). Custom task-scripts let scientists easily create their own building blocks to dynamically integrate elements of the modern laboratory.
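As a minimal sketch, a task-script function is ordinary Python. The file format and function below are hypothetical (a simple key=value instrument export), chosen only to illustrate the "parse an instrument file" building block:

```python
import json

def parse_instrument_file(raw_text: str) -> dict:
    """Parse a hypothetical key=value instrument export into a dictionary."""
    record = {}
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        record[key.strip()] = value.strip()
    return record

def main(input_text: str) -> str:
    # A task-script entry point returns harmonized output that a
    # subsequent pipeline step (or downstream system) can consume.
    return json.dumps(parse_instrument_file(input_text))
```

The function is deliberately small: each task-script should do one well-defined job so protocols can recombine them freely.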

Simple design for complex processes

Data journeys take many paths, from simple step logging to elaborate transformations and bidirectional connections. There is no one-size-fits-all solution. The business logic of your data pipeline is specified within its protocol. Protocols assemble building blocks into custom pipelines that automate business processes and adapt to even the most complex workflows. In TDP v3.6, protocol design has been condensed to a single YAML file (protocol.yml), which specifies configuration elements and outlines the execution order of task-script functions. The YAML protocol format replaces the previous JavaScript-plus-JSON format with one simple, human-readable file.
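To give a feel for the single-file format, here is an illustrative sketch of a two-step protocol.yml. The field names and structure are simplified assumptions, not the exact schema; consult the TDP documentation for the authoritative format:

```yaml
# Illustrative sketch only -- field names are simplified assumptions.
name: parse-and-label
description: Parse raw instrument files, then apply labels
steps:
  - id: parse
    task:
      slug: my-parser-task-script   # hypothetical task-script name
      function: parse_instrument_file
  - id: label
    task:
      slug: my-labeler-task-script  # hypothetical task-script name
      function: apply_labels
    input:
      fileId: parse.output          # later steps consume earlier outputs
```

The key idea is that configuration and execution order live together in one human-readable file, rather than being split across JavaScript and JSON.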

Building at your convenience

With busy schedules and pressing deadlines, self-sufficiency rules the day. TetraScience provides a Context Application Programming Interface (API) and Software Development Kit (SDK) that let customers tailor the software to their needs or create connectors to their services. Using the TetraScience SDK, you can deploy new functionality to the TDP on your own schedule. The SDK, which is used to push custom task-scripts and protocols to the TDP, has been updated to v2.0 to support the new YAML protocol format.

TetraScience SDK is used to push custom task-scripts and protocols to TDP.

Whether you wish to create new parsers for scientific instrument data, add your own labels to make data searchable, enrich Tetra Data with metadata from third-party sources, automate data reprocessing, send processed data to other applications, or even create a multi-step pipeline that contextualizes, harmonizes, enriches, and pushes data to third-party applications within a single protocol: the power to innovate is in your hands!

A closer look

A protocol in the new YAML format consists of a single file with a header, configuration details, and human-readable steps that point to a task-script for each step in the process. This unified format eliminates the need for a separate JavaScript file. The task-script is made up of two files: config.json, which provides context for the task-script, including its programming language and a list of available functions, and a script file that holds the code. The example below shows a Python script file. For multi-step protocols, task-script outputs can be used by subsequent steps, affording continuous processing through a single protocol.
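For orientation, a minimal config.json might look something like the sketch below. The exact field names are defined by the TetraScience SDK, so treat this purely as an assumption-laden illustration of the two pieces of context it carries, the language and the exposed functions:

```json
{
  "language": "python",
  "functions": [
    {
      "slug": "parse-instrument-file",
      "function": "main.parse_instrument_file"
    }
  ]
}
```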

Single step protocol and task-script to contextualize data with labels.
Flow chart of a multi-step protocol to contextualize, harmonize, enrich, and push data to an external application.

See how creating custom pipelines allows engineers to transform raw scientific data into AI-ready datasets

For full details on Self-Service Pipelines, read the release notes.