Friday, September 27, 2013

Biological screening hit rate

Please file this in the "for what it's worth" department.  I was going to use this in a paper I'm writing, but I've decided to slant my argument in another direction.

Anyway, given a huge surge of activity in chemical biology screening in the last decade or so, what is the likelihood that a given tested compound will be a biological hit?  If you take PubChem at face value, the answer is about 1.1%: for ballpark purposes one might observe that as of Sept. 25, 2013, the PubChem database reported that 2,306,975 hits had been documented from among 207,698,183 bioassay data points.
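For anyone who wants to check the ballpark figure, here is a minimal sketch of the arithmetic. The two counts are the ones quoted above; the variable names are just mine for illustration.

```python
# Counts as quoted from PubChem on Sept. 25, 2013 (taken at face value).
hits = 2_306_975            # documented hits
data_points = 207_698_183   # total bioassay data points

# Fraction of tested data points reported as hits.
hit_rate = hits / data_points

print(f"Baseline hit rate: {hit_rate:.2%}")
```

Note that this is a rate per data point, not per unique compound, so it only stands in loosely for "the chance a given compound is a hit."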

I was then going to compare this with the old Yvonne Martin rule of thumb that a Tanimoto coefficient of molecular similarity greater than 0.85 suggests a 30% chance that a pair of molecules will have analogous bioactivity.  Can we conclude that the Martin rule gives a roughly 27-fold enhancement (30% versus 1.1%) in hit detection over purely random pair selection?
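As a rough sketch of that comparison, the snippet below computes a Tanimoto coefficient and the naive enrichment ratio. It assumes fingerprints are represented as plain Python sets of "on" bit positions, which is my simplification and not Martin's actual descriptor set; the `tanimoto` helper and the toy fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is just a convention I picked.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits divided by total distinct bits."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)

# Toy fingerprints (hypothetical bit positions, not real molecules).
fp_a = {1, 2, 3, 4}
fp_b = {2, 3, 4, 5}
similarity = tanimoto(fp_a, fp_b)  # 3 shared bits out of 5 distinct

# Naive enrichment: Martin's 30% rule of thumb over the PubChem baseline.
baseline = 2_306_975 / 207_698_183
martin_rate = 0.30
fold = martin_rate / baseline

print(f"Tanimoto: {similarity:.2f}, above 0.85 cutoff: {similarity > 0.85}")
print(f"Naive fold enhancement: {fold:.1f}")
```

The ratio comes out near 27, but it divides a per-pair probability by a per-data-point rate, so treat it as a back-of-the-envelope number at best.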

Well, nah.  Maybe.  I dunno.  The fact of the matter is that much has changed in the 12 years since Martin made her assessment.  For starters, nearly the entire mammoth PubChem data collection came after then.  Couple that with revolutionary enhancement of screening collections, an ever-shifting definition of what constitutes a hit, emerging understanding of aggregation and reactivity as drivers of promiscuity, etc., and it's clear that we need a whole new study.

We also could use a new consensus way of defining molecular similarity, but that's another story.