Cradle's generation philosophy

Cradle generates plates, not sequences

By design, Cradle doesn't make it easy to work with individual protein sequences. You commit to a plate size upfront, and Cradle designs exactly that number of variants. When you evaluate generated sequences in engineer or diversify, the reports steer you away from individual variants and display plate-level forecasts instead.

In the same spirit, Cradle doesn't expose a prediction service for querying our models on a list of sequences. You can, however, use select to generate a plate of variants from a pre-defined set that you provide.

These constraints are intentional, and a consequence of one of our core principles: we use machine learning to guide experiment design end-to-end rather than providing granular information that doesn't align well with the end goal of protein optimization. Cradle is designed to maximize the chances that you reach your target product profile (TPP; the full set of properties your final protein needs to hit) as soon as possible, using high-throughput protein assays.

If you want to learn more about how Cradle is built with this end goal in mind, look out for our blog post due for publication shortly. In short, Cradle designs complete plates of 96, 192, 384, or more variants using probabilistic models that quantify uncertainty. This lets us assign a holistic value to an entire experiment rather than scoring individual designs. As a result, we can jointly:

  • balance multi-property optimization and trade-offs across assays
  • hedge risk by diversifying in terms of modeling assumptions
  • maximize exploration in early rounds
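To make the difference between plate-level value and individual scores concrete, here is a minimal sketch under a deliberately simple toy model. It is not Cradle's algorithm: the two "modeling assumptions" A and B, their probabilities, the candidate pool, and all function names are hypothetical and chosen only for illustration. Each candidate relies on one assumption; if that assumption is wrong, every variant relying on it misses the target.

```python
from math import prod

# Toy model (an assumption for illustration, not Cradle's method):
# each candidate relies on one modeling assumption ("group"). The
# assumption holds with probability q; if it holds, the variant hits
# the target with probability p, independently of the other variants.
# If it does not hold, every variant relying on it misses.
GROUP_HOLDS = {"A": 0.5, "B": 0.5}
CANDIDATES = [("A", 0.6)] * 4 + [("B", 0.5)] * 4  # (group, p)


def individual_score(candidate):
    """Marginal probability that this single variant hits the target."""
    group, p = candidate
    return GROUP_HOLDS[group] * p


def plate_value(plate):
    """Probability that at least one variant on the plate hits the target."""
    p_all_miss = 1.0
    for group, q in GROUP_HOLDS.items():
        misses = prod(1.0 - p for g, p in plate if g == group)
        p_all_miss *= (1.0 - q) + q * misses
    return 1.0 - p_all_miss


def top_k_plate(candidates, k):
    """Fill the plate with the k best individual scores."""
    return sorted(candidates, key=individual_score, reverse=True)[:k]


def greedy_plate(candidates, k):
    """Fill the plate one well at a time, maximizing plate-level value."""
    remaining = list(candidates)
    plate = []
    for _ in range(k):
        best = max(remaining, key=lambda c: plate_value(plate + [c]))
        remaining.remove(best)
        plate.append(best)
    return plate


top = top_k_plate(CANDIDATES, 4)
greedy = greedy_plate(CANDIDATES, 4)
print(f"top-4 by individual score: {plate_value(top):.4f}")     # 0.4872
print(f"plate-level greedy design: {plate_value(greedy):.4f}")  # 0.6375
```

In this toy setting, ranking by individual score fills the plate with group A only, so the whole plate fails whenever assumption A is wrong. The plate-level objective splits the wells between A and B even though every B variant has a lower individual score.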

How this philosophy impacts how you should use Cradle

1. Provide your exact experimental capacity

When designing a plate, specify exactly how many variants you can realistically build and test. If you can screen 96 variants, indicate 96. If your capacity is 48 variants, fill in 48.

This isn't just bookkeeping: the optimization process makes different choices depending on whether it's filling 24 wells or 384 wells when balancing defensive, risk-aware choices against risky bets.

If you don't have capacity for the full plate: Whether your throughput has changed since you generated the design, or you encountered unexpected issues with variant expression or degradation, the solution is the same: generate a new plate at your actual capacity. Don't subsample an existing design.

If you have N non-negotiable controls you want to test: document them in engineer so that it generates 96-N variants for your test (assuming a 96-well plate). Our generators will then work with the true experimental budget for ML-generated designs and account for these controls when estimating diversity.

If you want to mix Cradle designs with designs from other sources: include those as well! Cradle then knows what's already being tested and compensates by exploring other regions of protein space.
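The budget arithmetic for controls is simple, but it is worth making explicit before setting up a design. The helper below is a hypothetical sketch, not part of any Cradle API: it only computes how many wells remain for ML-generated designs once N controls are reserved.

```python
def ml_design_budget(plate_size: int, n_controls: int) -> int:
    """Wells left for ML-generated designs once controls are reserved.

    Hypothetical helper for illustration; not a Cradle function.
    """
    if plate_size <= 0:
        raise ValueError("plate_size must be positive")
    if not 0 <= n_controls < plate_size:
        raise ValueError("n_controls must be between 0 and plate_size - 1")
    return plate_size - n_controls


print(ml_design_budget(96, 4))  # 92
```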

2. Make your design constraints and TPP as complete as possible upfront

The theme here is clear: post-hoc fine-tuning is strongly discouraged. The more completely you can specify your constraints, requirements, and capabilities when setting up a design, the better Cradle can optimize the plate-level composition.

Don't hold back on defining constraints, thinking "I'll fix those later." Don't simplify your requirements, thinking "close enough." The design algorithm is sophisticated enough to handle complex, multi-faceted specifications, but only if it knows about them (note that computational objectives can be defined as custom predictors). The more assays you can run from the start to define a complete TPP, the more efficiently Cradle will move straight toward that goal.

Your target product profile, your experimental constraints, your prior knowledge about liabilities or requirements: all of this should go into the system upfront. That's how you will get the most out of Cradle.

3. Don't cherry-pick variants from designed plates

We strongly discourage selecting a subset of variants from a designed plate, for instance picking the "top 10" from a 96-variant design. There are two common reasons people want to do this, and Cradle has a better solution for both:

If you need to reject variants because you spot potential liabilities: This is valuable domain knowledge! But post-hoc filtering isn't the right place to apply it. Instead, input these requirements directly into the system when you set up your design.

Cradle can accommodate sequence constraints, physicochemical requirements, developability filters, and other considerations, but only if we know about them upfront. The plate-engineering algorithm can then navigate around these constraints while still balancing risk management, Pareto exploration, and learning opportunities across the full plate. Remember that if you have custom prediction tools for liabilities, we can onboard them as custom predictors.

If some of your rejection criteria are genuinely not suitable for an automated approach, prefer this human-in-the-loop filtering workflow:

human-in-the-loop filtering with `engineer`

Once again, a 48-variant plate is fundamentally different from 48 variants selected from a 96-variant plate. The algorithm will rebalance how it manages risks, explores trade-offs, and allocates learning opportunities given the new budget constraint.
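The gap between redesigning at a smaller budget and cherry-picking from a larger plate can be shown with a scaled-down toy example (plates of 4 and 2 instead of 96 and 48). This is a sketch under stated assumptions, not Cradle's algorithm: the pool, the two modeling assumptions A and B, their probabilities, and the function names are all hypothetical. Variants that share a modeling assumption fail together when that assumption is wrong, and a plate is valued by the probability that at least one variant hits the target.

```python
from itertools import combinations
from math import prod

# Toy model (illustrative assumption, not Cradle's algorithm):
# assumption `group` holds with probability q; if it holds, each
# variant relying on it hits the target independently with probability p.
GROUP_HOLDS = {"A": 0.5, "B": 0.5}
POOL = [("A", 0.6)] * 4 + [("B", 0.5)] * 4  # (group, p)


def plate_value(plate):
    """Probability that at least one variant on the plate hits the target."""
    p_all_miss = 1.0
    for group, q in GROUP_HOLDS.items():
        misses = prod(1.0 - p for g, p in plate if g == group)
        p_all_miss *= (1.0 - q) + q * misses
    return 1.0 - p_all_miss


def design_plate(pool, budget):
    """Exhaustively pick the plate of `budget` variants with the highest value."""
    return max(combinations(pool, budget), key=plate_value)


def cherry_pick(plate, k):
    """Keep the k variants with the best individual score q * p."""
    return sorted(plate, key=lambda c: GROUP_HOLDS[c[0]] * c[1], reverse=True)[:k]


designed_4 = design_plate(POOL, 4)    # two A and two B variants
picked_2 = cherry_pick(designed_4, 2)  # keeps the two A variants
designed_2 = design_plate(POOL, 2)    # one A and one B variant

print(f"cherry-picked 2 of 4: {plate_value(picked_2):.3f}")    # 0.420
print(f"redesigned for 2:     {plate_value(designed_2):.3f}")  # 0.475
```

In this toy setting, cherry-picking the two best individual scores discards the hedge against assumption A being wrong, while a plate designed for two wells keeps one variant from each group.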

4. Embrace more elaborate plate definitions

Cradle's internal lab combines our ML tasks into a rather intricate plate-design workflow. In a typical internal design round, we use engineer to over-generate variants, ensuring that we get at least 96 transformed colonies to test. If everything runs well, we have an abundance of choice and need to pick a maximally useful set of 96 wells, including replicated measurements for a handful of variants that we consider the most promising and therefore deserving of extra attention. This can be entirely automated through Cradle's select task!

An even cooler workflow