Preparing your data

Overview

Prepare a file in .csv, .xlsx or .tsv format with the required protein sequence, assay values, and sample ID columns. To provide additional context for the model, consider including columns for Round ID and Batch ID.

File requirements

Before uploading, make sure your .csv, .xlsx, or .tsv file contains the following columns: protein sequence, at least one assay values column, and Sample ID, plus Round ID and Batch ID where applicable.

Round ID and Batch ID serve as additional data for our models to better understand the context of your measurements.

For guidelines on the quality and quantity of your data, see Data guidelines.
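As a rough pre-upload check, the column requirements above can be sketched in Python. The header names used here (sequence, sample_id, round_id, batch_id) are hypothetical placeholders, not names the platform requires, and any other column is assumed to hold assay values.

```python
import csv
import io

# Hypothetical header names; replace them with the ones used in your file.
REQUIRED_COLUMNS = {"sequence", "sample_id"}
OPTIONAL_COLUMNS = {"round_id", "batch_id"}


def check_columns(header):
    """Return a list of problems found in a header row (empty if none)."""
    names = {name.strip() for name in header}
    problems = [
        f"missing required column: {column}"
        for column in sorted(REQUIRED_COLUMNS - names)
    ]
    # Assumption: every column that is not metadata is an assay values column.
    assay_columns = names - REQUIRED_COLUMNS - OPTIONAL_COLUMNS
    if not assay_columns:
        problems.append("no assay values column found")
    return problems


example = "sequence,sample_id,Tm,round_id\nMKV,S-001,54.2,1\n"
header = next(csv.reader(io.StringIO(example)))
print(check_columns(header))
```

For a .tsv file, pass delimiter="\t" to csv.reader. An .xlsx file would need a spreadsheet library, or can be exported to .csv first.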

Protein sequence

This column contains each variant's raw amino acid sequence. If you take multiple measurements of the same sequence, report each one in its own row rather than combining them (e.g., as an average).

Sequence strings should include only the capital letters of the 20 canonical amino acids. Follow these guidelines:

Supported characters

A C D E F G H I K L M N P Q R S T V W Y

Unsupported characters

- _ * . : ;

His-tag and other purification segments should not be included.
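A minimal sketch for checking sequences against the supported characters before upload. The function name is illustrative. It only flags unsupported characters, including lowercase letters and non-canonical codes such as X, and does not attempt to detect His-tags or other purification segments.

```python
# The 20 supported one-letter codes listed above.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_characters(sequence):
    """Return the sorted, distinct characters in a sequence that are not supported."""
    return sorted(set(sequence) - CANONICAL)


print(invalid_characters("MKVLAAGIVG"))  # []
print(invalid_characters("MKV-LA*gX"))   # ['*', '-', 'X', 'g']
```

An empty list means the sequence uses only supported characters.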

Assay values

Your data file should contain a separate column for each assay you want to import.

The sheet must include at least one assay values column. Each data cell must either contain a numeric value or be empty.

Non-numeric entries, such as %, NA, or -, are not supported.
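The cell rule above can be approximated with a small check like the following sketch. It relies on Python's own number parsing, so it accepts forms such as scientific notation (1e-3). Whether the platform accepts every such form is an assumption you should verify against your own upload.

```python
import math


def is_valid_assay_cell(cell):
    """Return True if the cell is empty or parses as a finite number."""
    text = cell.strip()
    if text == "":
        return True  # empty cells are allowed
    try:
        value = float(text)
    except ValueError:
        return False  # e.g. "NA", "-", "45%"
    # float() also parses "nan" and "inf"; treat those as non-numeric here.
    return math.isfinite(value)


for cell in ["54.2", "", "NA", "-", "45%"]:
    print(repr(cell), is_valid_assay_cell(cell))
```

Cells that fail this check should be emptied or converted to a plain number, for example by storing 45% as 45 in a column whose unit is percent.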

Replicates

You can import replicate assay values by organizing them as either rows or columns.

To import replicates as rows, create multiple rows with the same protein sequence, each containing a different assay value. During upload, you’ll need to match the assay column with the corresponding assay.

To import replicates as columns, create separate columns for each replicate using unique column headers (e.g., Tm, Tm - Replicate). During import, you can assign multiple columns to the same assay.
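Both layouts carry the same information. If you prefer one row per measurement, replicate columns can be converted to rows with a sketch like the one below. The column names (sequence, sample_id, Tm, Tm - Replicate) and the function name are illustrative, and rows are assumed to be dictionaries such as those produced by csv.DictReader.

```python
def replicates_to_rows(rows, replicate_columns, assay_name):
    """Turn replicate columns into one row per measurement."""
    long_rows = []
    for row in rows:
        # Columns shared by every replicate of this row (sequence, IDs, ...).
        shared = {k: v for k, v in row.items() if k not in replicate_columns}
        for column in replicate_columns:
            value = row.get(column, "")
            if value == "":
                continue  # skip empty replicate cells
            long_rows.append({**shared, assay_name: value})
    return long_rows


wide = [
    {"sequence": "MKV", "sample_id": "S-001", "Tm": "54.2", "Tm - Replicate": "53.9"},
]
for row in replicates_to_rows(wide, ["Tm", "Tm - Replicate"], "Tm"):
    print(row)
```

Note that a row whose replicate cells are all empty produces no output rows in this sketch, so handle such rows separately if you need to keep them.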

Sample ID

A Sample ID is a unique identifier assigned to a physical protein sample, serving as the primary reference for all derived samples. It ensures that our machine learning models can accurately link assay values to the correct biological sample, which is essential for model training and improving data quality.

You must include this identifier in the file you upload, because it cannot be generated or assigned within the platform.

Learn more in Sample ID.

Round ID and Batch ID

If your file contains multi-round data or measurements from multiple batches, you must include a Round ID column, a Batch ID column, or both, as applicable.

Learn more in Round ID and Batch.

Next steps

Once you have prepared your file, move on to Uploading your file.

