Hacker News

An idiot’s guide to lead optimisation for proteins

160 points by magni121 | last Monday at 11:21 AM | 16 comments

Comments

patrickkidger yesterday at 4:24 PM

Oh heck, this is awesome to see on the front page! I wrote the underlying Cradle-1 paper that is being discussed!

I used to work for Cradle and writing this paper was the last thing I did before leaving – on good terms – to found my own startup. :D And we'll 100% be using Cradle for our lead optimization.

(On the off-chance: I'm at PEGS Boston this week chatting all things AI+antibodies, in particular for rare diseases. If this topic is of interest to any other protein+tech geeks here then send me an email, let's grab coffee.)

thadk yesterday at 11:10 PM

Anyone else read this as "An idiot's guide to Pb optimization for proteins," as in avoiding contaminated dietary protein isolates?

theophrastus yesterday at 3:36 PM

After spending an entire career working 'by hand' (and with a helluva lot of molecular orbital calculations) on the problem this post is about, I've got to tersely weigh in with: there's (still) not enough available data, given the size of protein 'phase space', to hope for a proper covering with one's trained-up linear algebra model. Or, typed another way: you've got to include at some stage some physical modeling parameters, like molecular orbitals [1]; otherwise the 'response curve' will only optimize if one gets quite lucky (which is actually unlucky, as then you'll delude yourself into thinking it's generally applicable, which it isn't). For instance, swap in a carboxylic acid moiety where there was previously an aldehyde, a protein side-chain flips over, and you're in a completely different corner of the energetic 'galaxy'.

[1] e.g. https://proteindf.github.io/
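
To put a rough number on the 'phase space' point above, here is a minimal sketch that only counts sequences. It assumes the 20 canonical amino acids and a hypothetical 100-residue protein (both chosen for illustration, not taken from the post or the paper), and it says nothing about the physics side of the argument.

```python
from math import comb

AMINO_ACIDS = 20  # assumption: only the 20 canonical, genetically encoded amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of a given length over 20 amino acids."""
    return AMINO_ACIDS ** length


def variants_at_distance(length: int, k: int) -> int:
    """Number of sequences differing from a fixed parent at exactly k positions."""
    # Choose which k positions change, then one of 19 alternatives at each.
    return comb(length, k) * (AMINO_ACIDS - 1) ** k


if __name__ == "__main__":
    length = 100  # hypothetical protein length, for illustration only
    digits = len(str(sequence_space_size(length)))
    print(f"all sequences of length {length}: a {digits}-digit number")
    for k in (1, 2, 3):
        print(f"exactly {k} substitution(s): {variants_at_distance(length, k):,}")
```

Even restricted to the neighbourhood of one parent sequence, the count grows quickly: 1,900 single mutants and about 1.8 million double mutants at this length, against a full space with 131 digits.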

the__alchemist yesterday at 5:38 PM

It sounds like this is mostly (or exclusively?) operating directly on AA seqs. I wonder what the upper limit of capability is for the intended use case, i.e. without incorporating the 3D chemistry or spatial reasoning (e.g. classical MD, or DFT as ORCA performs). Of particular interest: does this upper bound (assuming it exists; I suspect it does) preclude its utility in practical protein design/gen?

I speculate Cradle is taking the approach they are, rather than a structural/spatial one, because structural/spatial models don't work very well on big molecules like proteins! (And/or are too slow; errors accumulate over space, etc.)

BigTTYGothGF today at 1:12 AM

> amino acids of which there are 20 different types

20 different types coded for, but once you get into PTMs that number goes way up.

dnautics yesterday at 5:28 PM

How many therapeutic proteins are there that aren't mAbs or ~naturally occurring proteins (insulin, modified insulins, hirudin, Cerezyme, etc.)?

I can think of:

etanercept

evalu yesterday at 6:36 PM

future of protein engineering?