Graeme Robb, Associate Principal Scientist, AstraZeneca
The number of potential organic molecules that could exist is estimated at more than 10^60, yet high-throughput screening (HTS) methods are restricted to 10^6 to 10^9 molecules, of which frequently fewer than 10^3 show any desired biological activity. For this approach to succeed, we must ensure that our screened subset of 10^6 molecules is representative of the far larger set of 10^60. However, the molecules in a pharma company's historical collection are typically unrepresentative of that wider chemical space. Can this set be supplemented to make it more representative? A second drug discovery challenge follows from the first: which molecules should we design next in order to maximise information while minimising costly synthesis of new molecules? A data scientist would normally turn to design of experiments (DOE) for this, but in the high-dimensional world of chemical structures it is a challenging task. The unique combination of interactive visualisations, DOE capabilities and data manipulation tools within JMP enables us to incorporate chemically aware methods to systematically explore and assess large, complex data sets. In this way we analyse the existing data to decide what to make next, maximising the information gained for the next iteration and accelerating progress in drug discovery.
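One way to make the idea of supplementing a collection concrete is greedy MaxMin diversity selection, a common space-filling heuristic in cheminformatics. The abstract does not say this is the method used in JMP, so treat it as an illustrative assumption. The sketch assumes molecules are already encoded as binary structural fingerprints (stored here as Python integers used as bit sets) and measures dissimilarity as Tanimoto distance (1 minus Tanimoto similarity). The function names `tanimoto` and `maxmin_pick` are hypothetical.

```python
from typing import List, Sequence


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def maxmin_pick(pool: Sequence[int], selected: Sequence[int], n_pick: int) -> List[int]:
    """Greedy MaxMin selection (an illustrative assumption, not the author's method).

    Repeatedly picks the pool compound whose nearest neighbour among the
    already-chosen compounds (existing collection plus earlier picks) is
    farthest away in Tanimoto distance. Returns indices into `pool`.
    """
    picks: List[int] = []
    picked = set()
    # Distance from each pool compound to its nearest compound in the collection;
    # with an empty collection every candidate starts at the maximum distance 1.0.
    nearest = [min((1.0 - tanimoto(fp, c) for c in selected), default=1.0) for fp in pool]
    for _ in range(min(n_pick, len(pool))):
        # Choose the candidate farthest from everything chosen so far (first wins ties).
        best = max((i for i in range(len(pool)) if i not in picked), key=lambda i: nearest[i])
        picks.append(best)
        picked.add(best)
        # Shrink nearest-neighbour distances now that `best` is part of the set.
        for i, fp in enumerate(pool):
            d = 1.0 - tanimoto(fp, pool[best])
            if d < nearest[i]:
                nearest[i] = d
    return picks


# Toy example: one existing compound and four candidate compounds.
collection = [0b11110000]
pool = [0b11110000, 0b11100000, 0b00001111, 0b00000111]
print(maxmin_pick(pool, collection, 2))
```

Here the first pick is the candidate at index 2, which shares no bits with the existing compound, while the exact duplicate at index 0 has distance 0 and is ranked last. Greedy MaxMin is cheap and easy to explain, but it only rewards coverage of fingerprint space; in practice it would be balanced against activity models and synthetic cost.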