Tuesday, 3 November 2009

Screening libraries: Sampling Chemical Space

I am currently in Rapa Nui (aka Easter Island) and it seemed fitting to continue the series on compound library design from here since the first two posts have been from less commonly visited places like Asuncion and Tierra del Fuego. In the previous post I discussed 2D molecular similarity and showed how this can be used to define diversity and coverage, two important compound library characteristics. In general, compounds in a library need to be mutually diverse in order to provide good coverage although high diversity does not guarantee optimal coverage.

In this post, I’ll take you through an approach to library design called ‘Core and Layer’ (CaL). Although we used this to select compounds for generic fragment libraries and more specialised NMR screening libraries, the method is quite general and I have used it to design a compound library for black box cell screening and to select compounds to complement high throughput screens. The software tools (Flush and BigPicker) used to apply CaL were created at Zeneca by Dave Cosgrove and are described in our article in some detail. Although you might think that the tools were developed in order to apply the CaL method, things actually happened the other way round and it was the availability of the software that led to CaL being adopted as an approach to library design.

Figure 1 shows a schematic view of CaL. The core consists of the compounds currently in the library at any point of the design process and a layer is a set of compounds that have been selected to be diverse with respect to the core. Once a layer has been selected, it is added to the core and the combined set of compounds becomes the new core. The process of selecting layers goes on until you’re either happy with the library or you run out of patience.
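The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how Flush or BigPicker work, the function names `pick_layer` and `core_and_layer` are made up for this example, and the selection rule (accept a candidate only if it is at least `min_distance` from everything already chosen) is a simple stand-in for a real diversity picker. Each tier of candidates is assumed to be already ordered by preference.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pick_layer(core: Sequence[T], candidates: Sequence[T],
               distance: Callable[[T, T], float],
               min_distance: float, max_picks: int) -> List[T]:
    """Pick candidates that are diverse with respect to the core and each other."""
    chosen = list(core)  # everything a new pick must stay away from
    layer: List[T] = []
    for cand in candidates:
        if len(layer) >= max_picks:
            break
        # Accept only if far enough from the core and from earlier picks.
        if all(distance(cand, other) >= min_distance for other in chosen):
            layer.append(cand)
            chosen.append(cand)
    return layer


def core_and_layer(initial_core: Sequence[T], tiers: Sequence[Sequence[T]],
                   distance: Callable[[T, T], float],
                   min_distance: float, max_picks: int) -> List[T]:
    """Grow the core one layer at a time; each tier is less attractive than the last."""
    core = list(initial_core)
    for tier in tiers:
        layer = pick_layer(core, tier, distance, min_distance, max_picks)
        core.extend(layer)  # the combined set becomes the new core
    return core


# Toy example: "compounds" are points on a line, distance is their separation.
library = core_and_layer(
    initial_core=[0.0, 10.0],
    tiers=[[1.0, 5.0, 6.0], [2.5, 7.5, 20.0]],
    distance=lambda a, b: abs(a - b),
    min_distance=2.0,
    max_picks=2,
)
```

Because each layer is chosen relative to whatever is in the core at that point, a compound that turns out to be unavailable can simply be dropped and the layer selection repeated.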

You’re probably thinking that this is a very tedious and time-consuming way to build a compound library and might ask whether it would be better to select a maximally diverse set of compounds in a single step. However, there are advantages in building up a library in this manner. In library design, not all compounds are equal and CaL allows you to bias compound selection in a highly controlled manner. I’ll discuss fragment selection criteria in some detail in future posts in this series so please just assume for now that there are some fragments that you would rather have in your library than others. The initial core consists of a sampling of your favourite fragments and as you add layers the compounds in them become progressively less attractive. Another feature of CaL is that it provides a solution to the problem of selected compounds proving to be unavailable, as can be the case when trying to source relatively large samples from commercial suppliers.

I think this is a good place to stop as it’s dinner time in Rapa Nui. CaL is an approach to biased sampling of chemical space but it doesn’t tell us about which regions of chemical space should be sampled preferentially. In the next posts of this series I’ll take a look at what makes one fragment better than another. On the travel front, I fly into Auckland in a couple of weeks’ time for a month and a half in New Zealand and expect to be around Melbourne for the first four months of the New Year. Feel free to get in touch if you’ve got fragment stuff that you’d like to discuss.

Literature cited

Blomberg et al, Design of compound libraries for fragment screening. JCAMD, 2009, 23, 513-525 DOI

Saturday, 3 October 2009

Screening Libraries: Diversity & Coverage

I’m guessing that this may be the first blog post on screening library design to be written in Tierra del Fuego. The weather is currently rather unpleasant although less so than an hour ago when the snow was horizontal.

I introduced screening library design in the previous post with a generalised view of the work flow for fragment based lead generation. When selecting compounds for screening it can be helpful to think in terms of a chemical space in which all possible compounds (real or virtual) can be found. Now you’ve just got to sample the regions of chemical space that you like and you’ve got your library.

Life of course is not so easy. The main problem is that, despite the occasional claim to the contrary, nobody has found a convincing set of coordinates with which to describe chemical space usefully. You can sort of describe organic molecules by size and polarity without having to worry about minor irritations like conformational flexibility, ionisation and tautomers. However, molecular recognition also depends on the shapes of molecules which, even for rigid species, are not so easy to turn into coordinates. Especially when you want these coordinates to be predictive of biological activity.

All is not lost since structurally similar molecules often have similar biological properties. One way that the similarity of a pair of compounds can be quantified is by comparing their molecular connection tables (the structures that you would write down on a piece of paper) and for this reason we sometimes talk about 2D similarity. There is no need for 3D molecular structures when you calculate molecular similarity in this way which means that there is no need to deal with conformations. Molecular fingerprints are used frequently to calculate similarity and the idea behind this is that the fingerprints encode the presence or absence of structural features in molecules. Many shared features suggest that two molecules are likely to be very similar. I’ll not go into the details of fingerprints in this post although you’ll be able to find some detailed discussion in our screening library design article.
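To make the idea of comparing shared features concrete, here is a minimal sketch that treats a fingerprint as a set of feature labels and scores similarity with the Tanimoto coefficient. The Tanimoto coefficient is one common choice and is an assumption here, since the post does not say which measure was used. The feature labels are invented for illustration, and returning 1.0 for two empty fingerprints is just a convention chosen for this sketch.

```python
from typing import Set


def tanimoto(features_a: Set[str], features_b: Set[str]) -> float:
    """Shared features divided by the total number of distinct features."""
    if not features_a and not features_b:
        return 1.0  # convention for this sketch: two empty fingerprints are identical
    shared = len(features_a & features_b)
    total = len(features_a | features_b)
    return shared / total


def tanimoto_distance(features_a: Set[str], features_b: Set[str]) -> float:
    """Distance is taken as one minus similarity, so similar molecules are close."""
    return 1.0 - tanimoto(features_a, features_b)


# Hypothetical feature sets for two molecules.
mol_1 = {"aromatic ring", "amide", "hydroxyl"}
mol_2 = {"aromatic ring", "amide", "chloro"}
similarity = tanimoto(mol_1, mol_2)  # 2 shared features out of 4 distinct ones
```

Real fingerprints are usually bit strings generated from the connection table rather than hand-labelled sets, but the arithmetic is the same.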

The downside of 2D molecular similarity measures is that they are unlikely to reveal any but the most trivial shape match or pharmacophoric (e.g. oxadiazole replaces ester) similarity between molecules. This is not too much of a problem in library design because you’ll often want to select both molecules if they are based on different scaffolds, even if they can both orient their hydrogen bonding groups in a similar way. Once you’ve found some active compounds it becomes a very different game because now you’ll be looking for less obvious similarities between these actives, either to extract structure activity relationships or to define search queries.

So even though we don’t have a set of coordinates that defines chemical space in a way that is predictive of biological properties of molecules, we can still use molecular similarity to sample from a collection of compounds. Figure 1 illustrates how this sampling works and will give you an idea of what we mean by the terms diversity and coverage. The key thing to remember when looking at Figure 1 is that similar compounds are close to each other so there is an inverse relationship between distance and similarity. The stars are selected to cover the chemical space occupied by all the molecules and a star can’t cover its neighbourhood effectively if the compounds in it are too far away.

Although I needed to put the molecules in particular positions (i.e. give them coordinates) to generate the graphic, you only need the distances between molecules to select representative subsets. In our paper we described in-house software which can be used to do this and the two programs (Flush and BigPicker) are actually quite complementary to each other. Left to its own devices, BigPicker tends to select compounds with no near neighbours and we typically use Flush to ensure that the compounds that BigPicker is selecting from all have a sufficient number of neighbours.
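Since only distances are needed, both ideas can be illustrated without any coordinates for real molecules. The sketch below is an assumption-laden toy rather than a description of Flush or BigPicker: `neighbour_counts` counts how many other compounds lie within a chosen radius of each compound (the kind of check that lets you discard singletons before picking), and `coverage` reports the fraction of a collection lying within that radius of at least one selected compound. Both function names and the radius-based definitions are made up for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def neighbour_counts(items: Sequence[T], distance: Callable[[T, T], float],
                     radius: float) -> List[int]:
    """For each item, count the other items within the given radius."""
    counts = []
    for i, item in enumerate(items):
        n = sum(1 for j, other in enumerate(items)
                if j != i and distance(item, other) <= radius)
        counts.append(n)
    return counts


def coverage(selected: Sequence[T], collection: Sequence[T],
             distance: Callable[[T, T], float], radius: float) -> float:
    """Fraction of the collection within the radius of at least one selected item."""
    covered = sum(1 for c in collection
                  if any(distance(c, s) <= radius for s in selected))
    return covered / len(collection)


# Toy collection: points on a line standing in for compounds.
collection = [0.0, 0.5, 1.0, 5.0, 5.5, 20.0]
separation = lambda a, b: abs(a - b)

counts = neighbour_counts(collection, separation, radius=1.0)

# Keep only compounds with at least one neighbour before picking from them.
pool = [c for c, n in zip(collection, counts) if n >= 1]

covered_fraction = coverage([0.5, 5.0], collection, separation, radius=1.0)
```

In this toy case the isolated point at 20.0 is the sort of compound a diversity picker would favour if left to its own devices, and it is the one that the neighbour filter removes from the pool.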

This is probably a good point at which to leave things. In the next post, I’ll describe the Core and Layer approach to selecting compounds for screening. This method is not specific to fragment libraries and in fact I’ve used it in work up of high throughput screening output and selection of compounds for cell-based assays.

Literature cited

Blomberg et al, Design of compound libraries for fragment screening. JCAMD, 2009, 23, 513-525 DOI

Grant & Pickup, A Gaussian Description of Molecular Shape. J. Phys. Chem. 1995, 99, 3503–3510 DOI

Wednesday, 22 July 2009

Screening Libraries: Introduction

Tonight seemed like a good time to start the series of posts on screening library design. For those of you who don’t know, I’m taking a year out to travel and am currently in Asuncion, Paraguay and it is currently raining heavily with excellent electrical activity. It’s not all holiday and I’ll be dropping in on friends in Brasil and Australia to see if I can make myself useful in their research groups. It’s also a good time to send best wishes to my friends who are currently attending the Gordon Research Conference on Computer Aided Drug Design.

The special issue of the Journal of Computer-Aided Molecular Design devoted to FBDD has just come out and this includes an article on screening library design that three of my friends and I put together. I’ll certainly be drawing on the article in the series of posts on screening library design but these should complement the article rather than reproduce it verbatim.

A good place to start is a graphic of the generalised work-flow for fragment based lead generation which illustrates the process from the perspective of the library designer. Both protein structures and known ligands can be used to select compounds for screening but selections can also be made generically. Note the colour-coding of the arrows which illustrate flows of compounds (red) and information (blue). You keep cycling around until you find something cool or your management runs out of patience.

The graphic represents a generalised work-flow and in some cases selections based on target and/or known ligands may not be made. There are a number of reasons why this may be the case. For example, there may not be any known potent ligands for the target. Also specialised screening technologies may require specialised library formatting which favours the use of generic screening libraries. Even with protein structures available, there is still a role for generic libraries because the current state of the art for fast prediction of binding affinity still falls short of what is required for selection of compounds for screening against an arbitrary target. That is not to say that docking, scoring and affinity prediction methods are completely useless but just that at this stage they represent a basket into which you would not want to put all your eggs. Deciding how many eggs to put in the ‘generic’ and ‘targeted’ baskets depends on how much you know (or think you know) about your target. My own view is that we’re still a long way off being able to usefully predict affinity and much of this is due to the difficulties in modelling the displacement of water molecules from contact with protein molecular surfaces. Time will tell and I’ll be delighted if somebody proves me wrong!

Literature cited

Blomberg et al, Design of compound libraries for fragment screening. JCAMD 2009, 23, 513-525 DOI

Sunday, 7 June 2009

Scaling potency by lipophilicity and molecular size

I’m looking forward to posting on design of compound libraries for fragment screening. However, before launching that series I wanted to comment on an article entitled ‘The influence of lead discovery strategies on the properties of drug candidates’ that has already been reviewed by Dan.

My first comment is that the analysis uses a database of 335 hit-lead pairs from HTS. I think it’s quite difficult to define meaningful hit-lead pairs for HTS. I first got into analysis of HTS output a decade and a half ago and from the start we looked for groups of structurally similar compounds in the actives. The larger the group of similar compounds, the greater the interest since observation of these active clusters increases confidence that activity is real. I’m really not sure that hit-lead pairs can be defined meaningfully for HTS-derived leads, especially when journal articles are the primary information source.

That said, the main reason for this post is to take a closer look at ligand efficiency defined in terms of lipophilicity. The most obvious way to define a ligand efficiency metric in terms of lipophilicity is simply to subtract logP from pIC50. There is the issue of whether one should use logP or logD as the measure of lipophilicity but, particularly when using calculated partition coefficients, I think it’s best to use logP. Some of this was discussed in the AstraZeneca fragment based lead generation review from a couple of years ago although I’m sure that the idea of subtracting logP from pIC50 was not exactly new then. The difference (pIC50 – ClogP) has since become known as ligand lipophilicity efficiency (LLE).

Unlike molecular size measures of ligand efficiency, (pKd – logP) has a firm, although somewhat obscure, thermodynamic basis (at least for neutral molecules). The product (Kd x P) is an equilibrium constant in its own right. Just as Kd is a measure of the relative stabilities of bound and aqueous ligand, (Kd x P) is a measure of the relative stabilities of bound ligand (in an aqueous medium) and ligand at its standard state in octanol. The product (Kd x P) and its negative logarithm (pKd – logP) both quantify the extent to which the ligand would ‘prefer’ to be bound to protein or solvated in octanol. It’s worth noting at this point that octanol is quite polar and alkane/water partition coefficients would probably represent a better measure of lipophilicity if they were more accessible. I’ll probably discuss this in more detail at some point in the future but for now you might want to take a look at a recent article on prediction of alkane/water partition coefficients because it reviews a lot of the earlier work.
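The link between the product and the difference is just the logarithm of a product, written out here for completeness. The only assumption is that Kd is expressed relative to a 1 M standard concentration so that the logarithm is taken of a dimensionless number:

```latex
-\log_{10}\!\left(\frac{K_d}{\mathrm{M}} \times P\right)
  = -\log_{10}\!\left(\frac{K_d}{\mathrm{M}}\right) - \log_{10} P
  = \mathrm{p}K_d - \log P
```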

The authors of the featured article assert that LLE does not include ligand efficiency. I don’t completely agree with that statement because one could say that LLE is a measure of how efficiently a ligand exploits its lipophilicity to bind to the target protein. However it is clear that no explicit measure of molecular size is used in the definition of LLE. The authors propose dividing logP by ligand efficiency (LE) and call this function LELP and suggest that this should be between -10 and 10 (no units specified) for acceptable leads.

I must confess that I just don’t get it. Using this metric, a compound with logP = 1 and LE = 0.1 (in whatever units they’re using) is equivalent to one of logP = 3 and LE = 0.3. Also a compound with logP = 0 becomes an acceptable lead even with a millimolar Kd. I’m not convinced that LELP gets the balance right between penalising size and lipophilicity.
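The arithmetic behind these two objections is easy to check. The sketch below simply applies the definition given above (logP divided by LE); the function name `lelp` is chosen for this example, and LE is taken in whatever units the authors intend, since none are specified.

```python
import math


def lelp(logp: float, ligand_efficiency: float) -> float:
    """LELP as described above: logP divided by ligand efficiency."""
    return logp / ligand_efficiency


# A small, polar but inefficient compound and a more lipophilic, efficient one
# end up with the same score.
weak_polar = lelp(logp=1.0, ligand_efficiency=0.1)
potent_lipophilic = lelp(logp=3.0, ligand_efficiency=0.3)
same_score = math.isclose(weak_polar, potent_lipophilic)

# With logP = 0 the score is zero whatever the ligand efficiency,
# so potency plays no part at all.
zero_logp_weak = lelp(logp=0.0, ligand_efficiency=0.05)
zero_logp_potent = lelp(logp=0.0, ligand_efficiency=0.5)
```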

I’m not a great fan of ligand efficiency metrics although they have a place for comparing hits from the same assay and I’ve certainly used them for that purpose. My favored approach to combining size and lipophilicity into a single efficiency measure would be to use one of the following two functions depending on which measure of affinity/potency is being used:

where HA is the number of non-hydrogen (heavy) atoms and measured or calculated values of logP can be used depending on which are available. It’s worth pointing out that the lipophilicity that gets measured is actually logD (typically at pH = 7.4) rather than logP. For neutral compounds the two are the same (unless you get self-association in one of the phases as might happen with a lactam in hydrocarbon) but for compounds that are ionised you’ll either need to know the pKa or determine logD as a function of pH in order to get logP.
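For a compound with a single ionisable group, the conversion between logD and logP mentioned above can be sketched as follows. This rests on a common simplifying assumption that is not stated in the post: only the neutral form partitions into octanol, so logD falls below logP by the logarithm of one plus the ratio of ionised to neutral species in water. The function names are made up for this example.

```python
import math


def logd_from_logp(logp: float, pka: float, ph: float = 7.4,
                   acid: bool = True) -> float:
    """logD for a monoprotic acid or base, assuming only the neutral form partitions."""
    # Ratio of ionised to neutral species in water is 10**(pH - pKa) for an acid
    # and 10**(pKa - pH) for a base (pKa of the conjugate acid).
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


def logp_from_logd(logd: float, pka: float, ph: float = 7.4,
                   acid: bool = True) -> float:
    """Invert the relationship above to recover logP from a measured logD."""
    exponent = (ph - pka) if acid else (pka - ph)
    return logd + math.log10(1.0 + 10.0 ** exponent)


# Hypothetical carboxylic acid: logP = 3.0, pKa = 4.4, measured at pH 7.4.
logd_acid = logd_from_logp(3.0, pka=4.4)   # about three log units below logP
recovered_logp = logp_from_logd(logd_acid, pka=4.4)
```

For a neutral compound there is no correction to apply, which is the case where logD and logP coincide.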

Something this article flagged up for me is the difficulty in finding concise but meaningful names for ligand efficiency metrics. I think of efficiency as something associated with a process or action rather than an object and therefore prefer to talk about ‘binding efficiency’. The other benefit of doing this is that it reminds us that the efficiency is defined for the combination of ligand and assay system (protein, co-factors, substrates, buffer components etc) and not the ligand in isolation.

To get us thinking about how we might improve matters, I’ll make some suggestions. I think we should be using pIC50 or pKd to quantify binding since units of energy never seem to get quoted when binding free energies are used to define what I’ll now refer to as binding efficiency. Also we tend to think more in terms of pIC50 than energy when we talk about potency and binding. Here are three equations that we can use to define binding efficiency with the subscripts indicating the property or properties used to scale potency. I suggest calling each quantity binding efficiency by the appropriate property. For example equation 3 defines ‘binding efficiency by size and lipophilicity’.

Common measures of size include number of heavy atoms, molecular weight (molar mass), surface area and volume and equations 1 and 3 can be used with any of these provided that the appropriate units (Heavy atoms, g/mol, Da, square Angstrom, cubic Angstrom) are quoted. If the unit of size is included, it is possible to tell which measure of size has been used to scale potency. As an aside, note how potency, which has units of concentration, is converted into a dimensionless number by dividing by M (mol/litre). I’m a bit less happy with using subscript-L to denote lipophilicity because it doesn’t allow us to distinguish binding efficiencies derived using logP and logD. Nevertheless, this represents a start although it’s likely that other folk will have ideas that we can use to refine the definitions.
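One way of writing three definitions that match this description is sketched below. This is an assumption rather than the original notation: the symbol BE, the subscript S for size and the use of IC50 (Kd could be substituted) are choices made for this sketch, with only the subscript L and the division by M taken from the text.

```latex
\mathrm{BE}_{S}  = \frac{-\log_{10}\left(IC_{50}/\mathrm{M}\right)}{S}              \qquad (1)

\mathrm{BE}_{L}  = -\log_{10}\left(IC_{50}/\mathrm{M}\right) - \log P               \qquad (2)

\mathrm{BE}_{SL} = \frac{-\log_{10}\left(IC_{50}/\mathrm{M}\right) - \log P}{S}     \qquad (3)
```

Here S stands for whichever measure of size is used, quoted with its units, so that equation 1 scales potency by size, equation 2 by lipophilicity and equation 3 by both.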

Literature cited

Keserű & Makara, The influence of lead discovery strategies on the properties of drug candidates. Nature Rev. Drug Discov. 2009, 8, 203-212 DOI

Albert et al, An Integrated Approach to Fragment Based Lead Generation: Philosophy, Strategy and Case Studies from AstraZeneca's Drug Discovery Programs. Curr. Top. Med. Chem. 2007, 7, 1600-1629 Link

Leeson & Springthorpe, The influence of drug-like concepts on decision-making in medicinal chemistry. Nature Rev. Drug Discov. 2007, 6, 881-890 DOI

Toulmin et al, Toward prediction of alkane/water partition coefficients. J. Med. Chem. 2008, 51, 3720-3730 DOI

Tuesday, 14 April 2009

The upper limits of binding: Part 2

Mel reviewed an important article on maximal affinity of ligands to kick off our sequence of posts on ligand efficiency. There are a number of reasons that this upper limit for potency might be observed and it's worth having a bit of a think about them.

One interpretation of the upper limit is that it represents a validation of the molecular complexity concept. If a ligand makes many interactions with the protein they are less likely to be of ideal geometry. Hydrogen bonds between the binding partners and water are more likely to be of near-ideal geometry. Another factor that can impose limits on affinity is the finite size of a binding site. Once the site has been filled, increasing the size of the ligand does not lead to further increases in affinity because all the binding potential of the protein has already been exploited.

However, there is another reason that an upper limit for affinity might be observed and it has nothing to do with molecular complexity or fully exploited binding sites. Measuring very strong binding is not as easy as you might think it would be. In a conventional enzyme assay, you normally assume that the concentration of the ligand is much greater than that of the enzyme. This works well if you’ve got 10nM enzyme in the assay and a micromolar ligand. However, things will get trickier if you’re trying to characterise a 10pM inhibitor since you’ll observe 50% inhibition of the enzyme for a 5nM concentration of the inhibitor. And you’ll see something very similar for a 1pM inhibitor…

This behaviour is well known and is called tight-binding inhibition. If you want to characterise very potent inhibitors you need to reduce the concentration of the enzyme and be a bit more careful with the math. However, not everybody does this and I suspect that this may be one reason there appears to be an upper limit for affinity.
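The numbers in the example above can be checked with the standard quadratic treatment of tight binding, in which the amount of enzyme-inhibitor complex is solved exactly rather than assuming that free and total inhibitor are the same. The sketch below is a minimal version under simplifying assumptions: a single inhibitor binding site, an apparent inhibition constant `ki_app` that already absorbs any competition with substrate, and all concentrations in mol/litre. The function names are made up for this example.

```python
import math


def fractional_activity(e_total: float, i_total: float, ki_app: float) -> float:
    """Fraction of enzyme left uninhibited, solving the binding quadratic exactly."""
    s = e_total + i_total + ki_app
    # Concentration of enzyme-inhibitor complex (smaller root of the quadratic).
    complex_conc = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - complex_conc / e_total


def tight_binding_ic50(e_total: float, ki_app: float) -> float:
    """Total inhibitor needed for 50% inhibition: Ki(app) plus half the enzyme."""
    return ki_app + e_total / 2.0


enzyme = 10e-9  # 10 nM enzyme in the assay

ic50_10pm = tight_binding_ic50(enzyme, ki_app=10e-12)  # about 5.01 nM
ic50_1pm = tight_binding_ic50(enzyme, ki_app=1e-12)    # about 5.001 nM

# Confirm that the exact expression gives half activity at that concentration.
activity_at_ic50 = fractional_activity(enzyme, ic50_10pm, ki_app=10e-12)
```

A tenfold difference in affinity shifts the measured IC50 by less than 0.2% here, which is why the two inhibitors look the same until the enzyme concentration is brought down.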

Literature cited

Kuntz et al, The maximal affinity of ligands. PNAS 1999, 96, 9997-10002. Link to free article.

Williams & Morrison, The kinetics of reversible tight-binding inhibition. Methods Enzymol. 1979, 63, 437-467 DOI

Monday, 6 April 2009

The upper limits of binding: Part 1

I originally intended to discuss some of the factors that impose the upper limits on binding that are observed. Unfortunately the introduction got a bit out of hand so this is going to have to be a two-parter.

It’s been a while since I said anything about ligand efficiency although its hard core enthusiasts appear to have worked the concept into something that approaches discipline status. My view is that molecules interact with their environments by presenting their molecular surfaces to those environments. Dividing the standard free energy change for an interaction by the area of the molecular surface is in effect a statement of how well the molecule makes use of its surface in making the interaction. I believe that this is the most fundamental measure of ligand efficiency.

Assays are not normally set up to measure standard free energy changes for binding. Ligand efficiency is frequently calculated from an IC50 rather than dissociation constant for the ligand protein complex. The IC50 that you’ll measure for competitive inhibitors depends on the concentration of whatever you’re trying to compete with. This can make comparing IC50 values for different assays risky. For example, you might run inhibition assays for two different kinases at their respective Km values with respect to ATP despite both kinases being exposed to the same intracellular concentration of ATP. If you’re using ligand efficiency to compare hits from the same assay, the distinction between IC50 and dissociation constant is not too much of an issue as long as you remember the two quantities are not the same.
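The dependence of IC50 on the competing substrate can be illustrated with the Cheng-Prusoff relationship for a simple competitive inhibitor, IC50 = Ki(1 + [S]/Km). This relationship is not named in the post and is brought in here as an assumption about the inhibition mechanism. The Ki, Km and ATP values below are invented purely for illustration.

```python
import math


def ic50_competitive(ki: float, substrate_conc: float, km: float) -> float:
    """Cheng-Prusoff relationship for a competitive inhibitor."""
    return ki * (1.0 + substrate_conc / km)


ki = 10e-9            # same 10 nM Ki against both hypothetical kinases
km_kinase_a = 10e-6   # Km for ATP of kinase A
km_kinase_b = 100e-6  # Km for ATP of kinase B

# Each assay run at its own Km for ATP: both IC50 values are simply 2 x Ki.
assay_a = ic50_competitive(ki, substrate_conc=km_kinase_a, km=km_kinase_a)
assay_b = ic50_competitive(ki, substrate_conc=km_kinase_b, km=km_kinase_b)

# Same (illustrative) 1 mM ATP for both: the IC50 values now differ about ninefold.
atp = 1e-3
cell_a = ic50_competitive(ki, substrate_conc=atp, km=km_kinase_a)
cell_b = ic50_competitive(ki, substrate_conc=atp, km=km_kinase_b)
```

The two assays report identical IC50 values even though the inhibitor would have to compete much harder against ATP for kinase A at a common ATP concentration.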

Molecular surface area is not the easiest quantity to deal with if you’re looking for a quick metric with which to compare hits from a screen. You’ll need a 3D model of the molecule in order to calculate this quantity properly and that means that you’ll need to deal with multiple conformations. If you’re going to deal with multiple conformations, you need to be thinking about energy cutoffs and how many conformations you want to use to sample the conformational space of your molecule. You also need to be thinking about how to deal with surface area that is inaccessible even though it is on the molecular surface. All very messy!

A while ago, some folk at Novartis showed that it is possible to calculate molecular surface area directly from the molecular connection table which is more commonly called the 2D structure because that’s what you get when you write it on a piece of paper. It turns out that surface area is roughly proportional to the number of non-hydrogen (often termed heavy) atoms in the molecule. Counting heavy atoms involves nice, predictable integer math and is much better suited for defining ligand efficiency than all the horrid floating point math demanded by 3D structures.

My preferred measure of ligand efficiency is to divide minus the log of whatever potency measure the assay generates by the number of non-hydrogen atoms in the molecule. Because you can’t take a log of a concentration the potency measure should be divided by the appropriate units of concentration. This means that if you use different units of concentration, you’ll get different ligand efficiencies. This isn’t a problem if you’re aware of it and using ligand efficiency to compare hits from a single assay. However, it’s probably pushing it a bit to use a different concentration unit and claim that you’ve found a new ligand efficiency metric. Put another way, you can make the standard free energy of binding for a 10nM compound positive simply by using 1nM as your standard state. If you think this is a crazy idea, imagine what a molar solution of your favourite protein might look like!
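The effect of the concentration unit is easy to see in a small sketch of this definition. The function name and the `unit` argument are made up for this example; potency and unit are both given in mol/litre so that their ratio is dimensionless.

```python
import math


def ligand_efficiency(potency: float, heavy_atoms: int, unit: float = 1.0) -> float:
    """Minus log10 of (potency / concentration unit), per non-hydrogen atom."""
    return -math.log10(potency / unit) / heavy_atoms


# A 10 nM compound with 20 heavy atoms.
le_molar = ligand_efficiency(10e-9, heavy_atoms=20)                  # unit = 1 M
le_nanomolar = ligand_efficiency(10e-9, heavy_atoms=20, unit=1e-9)   # unit = 1 nM
```

With 1 M as the unit the value is positive, while with 1 nM as the unit the same compound gets a negative value, which is the sign change described above for the standard free energy of binding.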

Another reason that I prefer to define ligand efficiency in terms of pIC50 or pKd is that these measures of potency/affinity are unitless so that the ligand efficiency has units of reciprocal number of heavy atoms. Once you convert your potency into a free energy you need to state your energy units when you use ligand efficiency. People often don’t bother although it is unlikely that the authors of anything that I have reviewed for a journal will be presenting ligand efficiencies without having defined the appropriate units. The other reason I don’t like converting IC50 or Kd values to energies is that I believe this conveys an impression of thermodynamic rigour which is normally unjustified.

This is a natural break point. Some of what is discussed above is also presented in the AstraZeneca fragment based lead generation paper from a couple of years ago. In the next post I’ll be taking a look at some of the factors which may place upper limits on ligand efficiency.

Literature cited

Albert et al, An Integrated Approach to Fragment Based Lead Generation: Philosophy, Strategy and Case Studies from AstraZeneca's Drug Discovery Programs. Curr. Top. Med. Chem. 2007, 7, 1600-1629 Link

Ertl et al, Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714-3717 DOI

Physicochemical properties

Fragments typically have to be screened at high concentration because they normally only bind weakly to their targets and the physicochemical property most relevant to FBDD is aqueous solubility. Both charge state and lipophilicity influence solubility in aqueous media.


Avdeef, Physicochemical profiling (solubility, permeability, and charge state). Curr. Top. Med. Chem. 2001, 1, 277-351 Link

Hydrogen bonding

Kenny, Hydrogen bonding, electrostatic potential, and molecular design. J. Chem. Inf. Model. 2009, 49, 1234-1244 DOI

Laurence & Berthelot, Observations on the strength of hydrogen bonding. Perspect. Drug Discov. Des. 2000, 18, 39-60 DOI

Kenny, Prediction of hydrogen bond basicity from computed molecular electrostatic properties: implications for comparative molecular field analysis. J. Chem. Soc., Perkin Trans. 2, 1994, 199-202 DOI

Abraham et al, Hydrogen bonding. Part 9. Solute proton donor and proton acceptor scales for use in drug design. J. Chem. Soc., Perkin Trans. 2, 1989, 1355-1375 DOI

Partition coefficients

Toulmin et al, Toward Prediction of Alkane/Water Partition Coefficients. J. Med. Chem. 2008, 51, 3720-3730 DOI

Leahy et al, Model solvent systems for QSAR. Part 2. Fragment values (f-values) for the critical quartet. J. Chem. Soc., Perkin Trans. 2, 1992, 723-731 DOI


Colclough et al, High throughput solubility determination with application to selection of compounds for fragment screening. Bioorg. Med. Chem. 2008, 16, 6611-6616 DOI

Friday, 20 March 2009

Fragment Library Design

The first law of computing (garbage in, garbage out) applies equally well to screening fragments. Selection of compounds for fragment screening is a theme that I will explore in some depth in future posts. For now, I'll just let folk know that we have already compiled some literature on the topic and that our article on this topic is now available 'online first' at the Journal of Computer Aided Molecular Design. Three of my friends and I put this together as a contribution for the special issue of this journal on FBDD so you can expect to see more articles on the subject appearing soon.

There's going to be some fragment action at the ACS. Mel will be there so look out for her.

Literature cited
Blomberg et al, Design of compound libraries for fragment screening, J. Comput.-Aid. Mol. Des., in press (online first) DOI

Sunday, 8 March 2009

RSC BMCS Fragments 2009

It is during the opening remarks for the RSC BMCS Fragments 2009 Conference that I think to myself that Stalin anticipated the emergence of high throughput screening with his comment, ‘Quantity has a quality all of its own’. Fragment based methods can be seen as representing an attempt to re-introduce manoeuvre to a battlefield that is increasingly dominated by grim attrition and the intellectual property tar pit.

More than one speaker notes the relative maturity of FBDD although I am struck by the high mission statement to results ratio for more than one presentation. I am also struck by the relatively narrow range of targets that appear to be getting tackled. I can’t help thinking that a realistic survey of the target class scope of FBDD would not have been out of place in this program.

There are three talks that I particularly like. Rod Hubbard (University of York | Vernalis) starts his presentation with a look at some of the ‘pre-history’ of FBDD (e.g. GRID | MCSS | MSCS) and notes the increased use of surface plasmon resonance (SPR) to detect fragment binding. I am really pleased to see this ‘pre-history’ mentioned because we’ve done something similar in the introduction to our article on fragment library design that is currently in press in JCAMD. In some ways this would have been a good talk with which to start the conference rather than finish.

Mark Whittaker (Evotec) shows how biochemical assays (see lit) can be used to screen fragments and stresses the need for orthogonal detection methods (e.g. NMR) to ensure that activity is real. The Evotec group use a range of fluorescence experiments to quantify and characterise binding and appear to also have significant expertise in design and maintenance of fragment libraries.

One personal highlight of the conference is that I finally get to meet Dan Erlanson (Carmot Pharmaceuticals). Dan runs the Practical Fragments blog with Teddy Zartler and he does an excellent talk on fragment-based chemotype evolution. The idea is to build a reactive group into a fragment that binds (the ‘bait’) and allow this to react with library compounds while it is bound to the protein. The target protein effectively selects the combinations of bait and library compounds that bind most strongly to the target and sulphur chemistry (disulphide formation; displacement of halogen) is particularly useful. This probably reflects ease of reaction in aqueous media and similarity of reactant and product hydrogen bonding and charge characteristics when sulphur chemistry is used.

I am surprised that there are no talks on selection of compounds for fragment screening. The rule of 3 gets frequent mention but at least my ‘efficiency metric fatigue’ is not aggravated by a presentation devoted entirely to the topic. I am refreshed by the fact that David Banner’s (Roche) talk makes no reference to ligand efficiency. David makes reference to ‘needle screening’ which represents an alternative way to think about molecular complexity, although the Roche folk never used the term when they introduced the idea in 2000. As a physical-organic chemist, I am also refreshed by Nino Campobasso’s (GSK) comment that FBDD gets folk thinking more about the molecules and, in some ways, represents a return to traditional medicinal chemistry.

The conference has its amusing moments and we particularly enjoy Molecular Simpleton’s attempts to extract information from Vicki Nienaber. No doubt they run an ‘Interrogation Styles’ course where Molecular Simpleton works and it will be no surprise if he finds himself booked on it.

The conference concludes shortly after Rod’s presentation and I leave, wondering about how broadly applicable FBDD really is. Targets that bind ATP are relatively easy to hit with fragments especially if you’re not worried about physiological ATP concentrations. A number of serine proteases can be nailed with something as small as benzamidine at concentrations that are well within the reach of standard biochemical assays. However, there are plenty of interesting targets that do not recognise adenine or arginine. I guess that time will tell and hopefully a clearer picture will emerge at Fragments 2011.

Wednesday, 18 February 2009

Substituents and complexity

<< previous || next >>

In the previous post, I noted that two Astex kinase inhibitors were derived from fragments that lacked acyclic substituents. Dan points out that this is actually uncommon and wonders if this reflects a reluctance of medicinal chemists to work on fragments that were seen to be too simple.

The presence of certain molecular recognition elements, for example hydroxyl or carboxylate, implies that at least one acyclic substituent be present. I think this is probably the main reason that fragments are normally encountered with acyclic substituents. However, I do agree with Dan that some fragments can be seen as too simple and re-iterate my point that in the Brave New World of FBDD we really need to start seeing phenyl rings as synthetic handles.

A lack of acyclic substituents typically implies the presence of one or more polar atoms in a ring or spacer. When assembling screening libraries, I do try to select compounds that present heterocyclic molecular recognition elements without acyclic substituents (e.g. 4-phenylpyrazole, 2-anilinopyrimidine). Interestingly, compounds like these are not as easy to source as you might think.

Saturday, 14 February 2009

Molecular complexity and extent of substitution

<< previous || next >>

Having introduced extent of substitution as a measure of molecular complexity in an earlier post, I was particularly interested by Dan's posts on AT7519 and AT9283. In each case, the screening hit used as a starting point for further elaboration lacked acyclic substituents.

You might wonder how you could impose this substructural requirement when selecting compounds for screening. This is actually very easy using SMARTS notation (Daylight SMARTS tutorial | OpenEye SMARTS pattern matching | SMARTS in wikipedia). The requirement that terminal non-hydrogen atoms be absent can be specified as:

[A;D1] 0

Here A matches an aliphatic atom (hydrogens are implicit, and a terminal atom cannot be part of an aromatic ring, so nothing is missed), D1 restricts the match to atoms connected to only one other non-hydrogen atom and 0 requires that no such atoms be present in acceptable molecules. A requirement like this can be combined with a requirement for 10 to 20 non-hydrogen atoms:

* 10-20

I will discuss the use of SMARTS for compound selection in more detail in connection with design of screening libraries so think of this as a taster. I've also tried to keep things simple by assuming that hydrogen atoms are implicit which means that they are treated as a property of the atoms to which they are bonded rather than as atoms in their own right.
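The two requirements above can be sketched without any cheminformatics toolkit. This is only an illustration of the logic, not how the SMARTS patterns are actually evaluated: it assumes each molecule has already been reduced to a count of non-hydrogen atoms and a list of bonds between them, and the function names and the hand-built molecules are hypothetical.

```python
from collections import Counter


def ring(start, size):
    """Bonds of a simple ring whose atoms are numbered start..start+size-1."""
    atoms = list(range(start, start + size))
    return [(atoms[i], atoms[(i + 1) % size]) for i in range(size)]


def count_terminal_atoms(n_atoms, bonds):
    """Count non-hydrogen atoms bonded to exactly one other non-hydrogen atom
    (the atoms that the pattern [A;D1] is meant to pick out)."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(1 for atom in range(n_atoms) if degree[atom] == 1)


def passes_filter(n_atoms, bonds, max_terminal=0, min_atoms=10, max_atoms=20):
    """Apply both requirements: no terminal atoms and 10 to 20 heavy atoms."""
    return (count_terminal_atoms(n_atoms, bonds) <= max_terminal
            and min_atoms <= n_atoms <= max_atoms)


# 4-phenylpyrazole: a 5-membered ring (atoms 0-4) joined to a
# 6-membered ring (atoms 5-10) through atoms 3 and 5.
phenylpyrazole = ring(0, 5) + ring(5, 6) + [(3, 5)]

# The same skeleton with one extra terminal atom (a methyl group) on atom 8.
tolylpyrazole = phenylpyrazole + [(8, 11)]

# Benzene: no terminal atoms but too few heavy atoms.
benzene = ring(0, 6)

print(passes_filter(11, phenylpyrazole))
print(passes_filter(12, tolylpyrazole))
print(passes_filter(6, benzene))
```

In practice the bond list would come from a toolkit that reads the structures and matches the SMARTS directly. The point of the sketch is that both requirements reduce to counting.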

Tuesday, 3 February 2009

Ligand efficiency and molecular size

<< previous || next >>

Molecular complexity models of ligand binding typically predict that ligand efficiency (LE) will decrease during the process of elaborating a fragment hit. This is the basis of the fit quality (FQ) metric that Mel reviewed in her last post. It’s interesting to take a look at a 2006 article which actually pre-dates the FQ articles. This study attempted to track LE through the lead optimisation process for 18 drug leads from 15 different projects. The main conclusion of the study was that ‘a nearly linear relationship exists between molecular weight and binding affinity over the entire range of sizes and potencies represented in the dataset’. In other words, LE changed little during the optimisation process for these leads.

Comments anyone?

Literature cited:
Hajduk, Fragment-Based Drug Design: How Big Is Too Big? J. Med. Chem. 2006, 49, 6972-6976 DOI

Wednesday, 28 January 2009

Ligand Efficiency (or why size doesn't always matter)

next >>
Ligand efficiency (LE), a term that has received a lot of attention recently in the drug discovery world, is defined generally as the binding energy of a ligand normalized by its size. Being the avid bargain shopper that I am, the concept of LE excites me, similar to the thrill I get shopping at the sales after Christmas. And as a staunch advocate of FBDD, the idea of getting the most affinity bang for your chemical buck in general appeals to me. However, the definition of LE raises several questions, the obvious being what appropriate measures of binding energy and size are (more on this later). But perhaps a larger nagging question is why this metric is useful at all or, put more bluntly, aren't bigger molecules always better?

The first large scale study to address this question was published in 1999 by Kuntz et al, where they analyzed binding data for a set of metal ions, inhibitors, and ligands containing up to 68 heavy (i.e. non-hydrogen) atoms (HAs) [1]. The bulk of the results of this study are contained in Figure 1 from the paper, with free energy of binding, derived from both Ki and IC50 data, plotted against number of HAs. From this plot a linear gain in binding free energy of -1.5 kcal/mol/HA is observed for molecules of up to 15 HAs, after which, strikingly, the gain in binding free energy with increased size becomes negligible.

Using a larger data set, this topic was revisited in a 2007 study published by Reynolds et al. In their study [2] binding data for 8000 ligands and 28 protein targets were utilized to probe the relationship between molecular complexity and ligand efficiency. Using both pKi and pIC50 data, a linear relationship between affinity and size could not be established (Figures 1 and 2). However, a trend was observed between the maximal affinity ligands and their size (Figure 3). Starting with ligands containing roughly 10 HAs, an exponential increase in affinity was observed for ligands up to 25 HA in size but, similarly to the Kuntz study, affinity values plateaued after 25 HA.

The authors then plotted LE values, calculated as either pKi/HA or pIC50/HA, against HA to show that LE values decline drastically between 10 and 25 HA (Figures 4 and 5). Since LE values are demonstrably higher on average for smaller molecules, the authors warn against using LE values to compare compounds of disparate sizes. For such purposes they propose a 'fit quality' (FQ) metric, where LE values are normalized by a scaled value that takes size into account.
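The two ways of expressing LE mentioned above (binding free energy per heavy atom and pKi per heavy atom) are easy to compute. The sketch below is only an illustration: the function names are made up, the temperature of 298.15 K is an assumption, and the two compounds are hypothetical rather than taken from any of the studies discussed. The FQ scaling is not included because it depends on the fitted function reported in the article.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def binding_free_energy(k_molar, temperature=298.15):
    """Free energy of binding in kcal/mol from a Ki or Kd in mol/L,
    using delta G = RT ln K (more negative means tighter binding)."""
    return GAS_CONSTANT * temperature * math.log(k_molar)


def ligand_efficiency(k_molar, heavy_atoms, temperature=298.15):
    """LE as binding free energy per heavy atom (sign flipped to be positive)."""
    return -binding_free_energy(k_molar, temperature) / heavy_atoms


def p_affinity_per_heavy_atom(k_molar, heavy_atoms):
    """LE in the pKi/HA (or pIC50/HA) form."""
    return -math.log10(k_molar) / heavy_atoms


# Hypothetical compounds: a 1 mM fragment with 10 heavy atoms and
# a 1 nM lead with 30 heavy atoms.
fragment_le = ligand_efficiency(1e-3, 10)
lead_le = ligand_efficiency(1e-9, 30)

print(f"fragment LE: {fragment_le:.2f} kcal/mol/HA")
print(f"lead LE:     {lead_le:.2f} kcal/mol/HA")
print(f"fragment pKi/HA: {p_affinity_per_heavy_atom(1e-3, 10):.2f}")
```

Both hypothetical compounds come out at about 0.41 kcal/mol per heavy atom: tripling the size while gaining six log units of affinity leaves LE unchanged. This is why a 1 mM fragment can be as 'efficient' as a 1 nM lead.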

The logical question that arises from these studies is why we see a precipitous decline in affinity gains after a certain molecular size. Since ligand binding affinity is attributed largely to van der Waals interactions, larger molecules should exhibit higher affinities. In the Kuntz study the authors conjecture that their findings may be attributable to non-thermodynamic effects. In particular, tight-binding high molecular weight compounds may be selected against in the pharmaceutical community for pharmacokinetic and/or pharmacodynamic reasons, resulting in a lack of these molecules in the sample set. Entropic penalties and molecular complexity arguments also come into play here. The authors of the 2007 study note in their discussion that the surface area of a ligand available for interaction and its heavy atom count are not correlated, suggesting that the definition of size itself may be overly simplistic.

So, what are the implications of LE in fragment-based drug design? Building on the 2007 study discussed above, where fragment-sized molecules exhibited significantly increased LE values as compared to larger molecules, a new study published by the same authors [3] looked more closely at the purported advantages of using fragments as starting points for lead generation. In this study LE and fit quality values of starting fragments and optimized leads for a variety of targets were analyzed (Table 1). Interestingly, while LE values fall off as expected with an increase in size, fit quality values remain steady or improve, suggesting that optimization from fragment leads may be more efficient. That said, the data presented in this study are limited and should be compared with leads generated via HTS campaigns or other strategies for more validity. Let's hope the new year brings us such studies.

Literature cited:

1. Kuntz ID, Chen K, Sharp KA, Kollman PA. The maximal affinity of ligands. PNAS 1999 96:9997-10002. Link to free article.
2. Reynolds CH, Bembenek SD, Tounge BA. The role of molecular size in ligand efficiency. Bioorg Med Chem Lett. 2007 17(15):4258-61. DOI
3. Bembenek SD, Tounge BA, Reynolds CH. Ligand efficiency and fragment-based drug discovery. Drug Discov Today. In press. DOI


Tuesday, 20 January 2009

Molecular Complexity (follow up)

<< previous || next >>

Dan Erlanson, who needs no introduction in this forum, commented on the previous post. I have to agree with him that the Hann complexity model is not easy to apply in practice. It predicts that there will be an optimum level of complexity for a given assay system (detection technology + target) but doesn’t really tell us where a specific combination of molecule and assay system sits relative to the optimum.

Screening library design, as Dan correctly points out, involves striking a balance. One needs to think a bit about screening technology and the likely number of compounds that you’ll be screening. Another consideration is whether the screening library is generic or directed at specific targets or target families. I’m very interested in screening library design and expect to post on this topic in the future.

Dan notes that low complexity molecules often don’t find favour with medicinal chemists and I’ve experienced this as well. Having structural information available gives us confidence to do something other than what a former MedChem colleague called ‘pretty vanilla chemistry’. Put another way, to make the most of the output from fragment screening, the medicinal chemist needs to be seeing a phenyl group as a synthetic handle.

Tuesday, 13 January 2009

Molecular Complexity

next >>

Molecular complexity is perhaps the single most important theoretical concept in fragment-based drug discovery (FBDD). The concept was first articulated in a 2001 article by Mike Hann at GSK and has implications that extend beyond FBDD. You might ask why I think that molecular complexity is so much more important than ligand efficiency and variations on that theme. My response is that the concept of molecular complexity helps us understand why fragment screening might be a good idea. Ligand efficiency is just a means with which to compare ligands with different potencies.

A complex molecule can potentially bind tightly to a target because it can form lots of interactions. Put another way, the complex molecule can present a number of diverse molecular recognition elements to a target. Sulfate anions and water molecules don’t have the same options although you’ll have ‘seen’ both binding to proteins if you’ve looked at enough crystal structures. There is a catch, however. The catch is that the complex molecule has to position all of those molecular recognition elements exactly where they are needed if that binding potential is to be realised.

Let’s take a look at Figure 3 from the article (the success landscape) in which three probabilities are plotted as a function of ligand complexity. The red line represents the probability of measuring binding assuming that the ligand uses all of its molecular recognition elements. This probability increases with complexity but can’t exceed 1. This tells us that if we just want to observe binding, nanomolar is likely to work just as well as picomolar. The green line is the really interesting one and it represents the probability of the ligand matching one way. It is this requirement for a one way match that gives this curve its maximum. Multiply the probability of a one way match by the probability of measuring binding and you get the probability of a useful event (yellow line) which also has a maximum. This tells us that there is an optimum complexity when you’re selecting compounds for your screening library. This optimum is a function of your assay system (i.e. target + detection technology) and improving your assay will shift the red line to the left.
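The shape of the success landscape can be reproduced with a toy calculation. What follows is a sketch in the spirit of the model, not the authors' code: it assumes that the binding site and the ligand are strings of binary features, that a match needs every ligand feature to face its complement, and that the site has 8 features. The detection curve (`p_detect`, with its `midpoint` and `steepness`) is a purely hypothetical logistic function standing in for the red line.

```python
import math
from itertools import product

SITE_LENGTH = 8  # hypothetical number of features in the binding site


def count_matches(ligand, site):
    """Number of alignments in which every ligand feature is complementary
    to the site feature it faces."""
    size = len(ligand)
    return sum(
        all(l != s for l, s in zip(ligand, site[start:start + size]))
        for start in range(len(site) - size + 1)
    )


def p_unique_match(ligand_length, site_length=SITE_LENGTH):
    """Probability of exactly one match, by enumerating every ligand and
    every site of the given lengths (the green line)."""
    unique = total = 0
    for site in product((0, 1), repeat=site_length):
        for ligand in product((0, 1), repeat=ligand_length):
            total += 1
            unique += count_matches(ligand, site) == 1
    return unique / total


def p_detect(ligand_length, midpoint=4.0, steepness=1.5):
    """Hypothetical probability of measuring binding (the red line):
    rises with complexity and cannot exceed 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (ligand_length - midpoint)))


lengths = range(1, SITE_LENGTH + 1)
unique = {n: p_unique_match(n) for n in lengths}
useful = {n: unique[n] * p_detect(n) for n in lengths}  # the yellow line

for n in lengths:
    print(f"{n}  unique={unique[n]:.4f}  detect={p_detect(n):.4f}  useful={useful[n]:.4f}")

best_length = max(useful, key=useful.get)
print("optimum complexity in this toy model:", best_length)
```

Lowering `midpoint` mimics a more sensitive assay: the detection curve moves to the left and the optimum complexity moves to the left with it.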

This molecular complexity model is somewhat abstract and it’s not easy to place an arbitrary molecule in Figure 3 for an arbitrary assay system. I’m not convinced of the importance of a unique binding mode for fragments because one fragment binding at two locations counts as two fragment hits. This is not a big deal because relaxing the requirement for unique binding gives a curve that decreases with complexity and we still end up with a maximum in the probability of a useful event.

I’ve used a different view of molecular complexity when designing compound libraries for fragment screening. This view is conceptually closer to ‘needle screening’ which was described by a group at Roche (11 authors, all with surnames in the first half of the alphabet) in 2000. The needles are low molecular weight compounds which can ‘penetrate into deep and narrow channels and subpockets of active sites like a fine sharp needle probing the surface of an active site’. The needles are selected to be ‘devoid of unnecessary structural elements’. My view of molecular complexity is that it increases with the extent to which a molecule is substituted. Substituents in molecules can be specified (and counted) using SMARTS notation so low complexity molecules can be identified by restricting the extent of substitution in addition to size. I’ve prepared a cartoon graphic which shows why you might want to do this.

This is probably a good point to stop although it’s likely that I’ll return to this theme in future posts. Before that I’ll need to take a look at Ligand Efficiency…

Literature reviewed
Hann et al, J. Chem. Inf. Comput. Sci., 2001, 41, 856–864. | DOI
Boehm et al, J. Med. Chem. 2000, 43, 2664-2674. | DOI