Data benchmark

Overview

After defining your assays, uploading data, and setting objectives, you can run a benchmark to evaluate how well Cradle's models can learn from your data. This process provides insight into your data's quantity and quality, and into how well Cradle's models fit it.

What is a data benchmark?

The data benchmark results in a report with insightful metrics and graphs that indicate:

  • Whether the data quantity and quality are sufficient for machine learning.

  • How well Cradle's machine learning algorithms learn from your data.

This process helps both you and Cradle determine how well Cradle's models learn from your data and whether the project is worth pursuing.

How does the benchmark assess data?

The analyses that the benchmark runs depend on your dataset and the type of assays you are measuring.

For data quantity, the benchmark can look at the number of rounds and data points. For data quality, it can look at how you use replicates, as well as correlation metrics across your data points.
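To illustrate the replicate-correlation idea, here is a minimal sketch of how agreement between replicates might be quantified. The data values and the use of Pearson correlation here are illustrative assumptions, not Cradle's actual implementation:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of measurements."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical assay values: the same five variants measured in two replicates.
replicate_1 = [0.90, 1.10, 0.45, 1.30, 0.70]
replicate_2 = [0.85, 1.15, 0.50, 1.25, 0.65]

r = pearson(replicate_1, replicate_2)
print(f"replicate correlation: r = {r:.2f}")
```

A correlation close to 1 suggests the assay is reproducible; a low correlation points to measurement noise that makes the data harder to learn from.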

The report will provide a detailed assessment of your data. If you want to learn more about preparing your data for machine learning, see Data guidelines.

How does the benchmark assess Cradle's performance?

Cradle uses your dataset to test how well the platform can predict the performance scores of your sequences.

To do this, the benchmark creates a train/test split:

  • Train. The benchmark uses a portion of your dataset to train the predictor model, allowing it to learn about your protein of interest and its associated performance scores.

  • Test. The benchmark reserves a portion of your dataset that remains unseen by the predictor model during training. Once the model is trained, the benchmark evaluates its performance using this test data.

By doing so, the benchmark can evaluate the performance of Cradle's algorithms in learning from your data.
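The train/test procedure above can be sketched in a few lines. Everything here is an illustrative assumption rather than Cradle's actual pipeline: the dataset, the 80/20 split ratio, the simple one-feature linear predictor, and the use of mean absolute error as the evaluation metric:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle and split (x, y) pairs into train and held-out test sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_linear(train):
    """Least-squares fit of y = a*x + b on the training pairs."""
    n = len(train)
    sx = sum(x for x, _ in train)
    sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical dataset: one sequence feature x and its noisy measured score y.
data = [(x, 2.0 * x + 0.5 + random.Random(x).uniform(-0.1, 0.1))
        for x in range(20)]

train, test = train_test_split(data)
a, b = fit_linear(train)  # the model only ever sees the training split
mae = sum(abs((a * x + b) - y) for x, y in test) / len(test)
print(f"held-out mean absolute error: {mae:.3f}")
```

Because the test pairs are never shown to the predictor during fitting, the held-out error estimates how well the model generalizes to data it has not seen, which is exactly what the benchmark measures at a larger scale.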

How to run a benchmark

After your data is uploaded, click Start benchmark. The platform will show you how long the benchmark will take to run.

Report & results

Once the benchmark report is ready you will receive a notification from Cradle. Log in to the platform to view your report.

Note: the benchmark report is different for each customer and is tailored to your experimental data.

The benchmark report indicates one of the following results and suggested next steps:

  • Good fit: The quantity and quality of the data are good enough to enable strong performance of machine learning models. We suggest moving forward with generating & testing sequences for this project.

  • Discussion needed: The quantity or quality of the data does not enable us to train a machine learning model that has good performance. We either need to find more data, use the data in a different way, or start the project with a zero-shot round instead (where sequences are generated from a large model of the generic protein landscape). The Customer Success and Machine Learning teams will have an in-depth discussion with you to make a decision.

Next steps

After you have completed the benchmark and want to use Cradle for your project, configure Rounds and proceed to Sequence generation.
