Transforming Computed Energy Landscapes into Experimental Realities: The Role of Structural Rugosity

Abstract We exploit the possible link between structural surface roughness and difficulty of crystallisation. Polymorphs with smooth surfaces may nucleate and crystallise more readily than polymorphs with rough surfaces. The concept is applied to crystal structure prediction landscapes and reveals a promising complementary way of ranking putative crystal structures.


S1. Particle Rugosity Method
The algorithm used to compute the normalised crystal rugosities is broken down into four steps in Figure S1. The algorithm was coded in Python, making use of the functionality available in the CSD Python API.
Figure S1. Steps coded into the crystal rugosity algorithm.
Step 1. The crystallographic information of the structure of interest (with a given Refcode) is retrieved from the CSD (or can be loaded via a cif file). If the molecule has missing hydrogens, these are added using the standard procedure implemented in the CSD Mercury.
Step 2. The BFDH morphology is generated from the crystallographic information of the structure of interest using the Python API with its standard settings. From this, the (hkl) indices of the important BFDH faces are extracted, together with the BFDH area of each (hkl) face. The morphological importance of each (hkl) face is then calculated as w(hkl) = Area(hkl)/Area(BFDH), where Area(hkl) is the area of an individual (hkl) face and Area(BFDH) is the total area of the BFDH morphology. The d-spacings, d(hkl), for those faces are then computed using the method implemented in the CSD Python API. Where possible, faces were grouped into families using the crystal symmetry to reduce the number of rugosity calculations, since faces of the same family {hkl} have identical topologies and rugosities.
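The weighting in Step 2 can be illustrated with a minimal Python sketch. It does not call the CSD Python API: the face areas are hypothetical numbers, `friedel_family` is a hypothetical stand-in for a proper symmetry-based grouping, and summing the weights within a family is an assumption made here for illustration only.

```python
from collections import defaultdict


def morphological_importance(face_areas):
    """Return w(hkl) = Area(hkl) / Area(BFDH) for every face."""
    total_area = sum(face_areas.values())
    if total_area <= 0:
        raise ValueError("total BFDH area must be positive")
    return {hkl: area / total_area for hkl, area in face_areas.items()}


def group_by_family(weights, family_of):
    """Sum the weights of faces that belong to the same {hkl} family.

    Assumption: the family weight is the sum of its members' weights.
    """
    grouped = defaultdict(float)
    for hkl, weight in weights.items():
        grouped[family_of(hkl)] += weight
    return dict(grouped)


def friedel_family(hkl):
    """Hypothetical grouping: (hkl) and (-h,-k,-l) share one family."""
    inverse = tuple(-index for index in hkl)
    return max(hkl, inverse)


# Hypothetical BFDH face areas (arbitrary units).
areas = {(0, 1, 1): 40.0, (0, -1, -1): 40.0, (1, 0, 0): 20.0}
weights = morphological_importance(areas)
family_weights = group_by_family(weights, friedel_family)
```

With this grouping only one rugosity calculation per family would be needed, which is the saving described in Step 2.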
Step 3. For each of the relevant BFDH (hkl) faces, the minimum value of rugosity was calculated in the following way.
- A number of slices (Nprec) were generated for the same (hkl) face. The original slice was generated at the (hkl) plane. Subsequent slices were generated by shifting the origin of the (hkl) slice by d(hkl)/N, with N = 2, 3, …, Nprec. For example, for a (001) plane with d…
- The method of Bryant, Maloney and Sykes 1 was used to generate two consecutive slices. A slicing thickness of 14 Å was employed (the standard is 10 Å, but this was found to be too small for some large molecules) together with a width of 30 Å and a number of shifts (see above), and the R(hkl) depth was calculated for each slice (and shift).
- For each (hkl), the rugosities computed at the different shifts were compared, and the minimum value (in absolute terms) was taken as the true value of the rugosity parameter for that specific (hkl) face. We refer to this parameter as the minimum rugosity of the (hkl) face.
- An example of the impact of the shift is given in Figure S2 for paracetamol form I (refcode HXACAN07). The smoothest surface (in this case the face cut at the origin) is the one expected to be displayed in the crystal morphology, since it is most likely to have the lowest surface energy.
Figure S2. Impact of the slice origin shift for the (011) face in paracetamol form I (HXACAN07). In this case, -1.43 Å is taken as the value for the (011) face since it corresponds to the smoothest surface. The slice of interest is delimited by the black line. The layer of molecules in the slice above is coloured in blue to illustrate the degree of molecular interpenetration.
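The shift-and-select logic of Step 3 can be sketched in a few lines of Python. The reading of the shift scheme (one unshifted slice plus shifts of d(hkl)/N for N = 2, …, Nprec) is an interpretation of the description above, the d-spacing of 8.0 Å is hypothetical, and of the rugosity values only -1.43 Å comes from Figure S2; the others are invented for illustration.

```python
def shift_offsets(d_hkl, n_prec):
    """Origin offsets (in Å) explored for one (hkl) face.

    Assumed scheme: the unshifted slice, then shifts of d_hkl / N
    for N = 2, 3, ..., n_prec.
    """
    offsets = [0.0]
    offsets += [d_hkl / n for n in range(2, n_prec + 1)]
    return offsets


def smoothest_rugosity(rugosities):
    """Pick the rugosity with the smallest absolute value."""
    return min(rugosities, key=abs)


# Hypothetical d-spacing; Nprec = 4 as chosen in the text.
offsets = shift_offsets(d_hkl=8.0, n_prec=4)

# One rugosity (Å) per offset; only -1.43 is taken from Figure S2.
rugosity_per_shift = [-1.43, -2.10, -3.05, -2.60]
best = smoothest_rugosity(rugosity_per_shift)
```

Because rugosities are reported as negative depths, selecting by absolute value picks the value closest to zero, that is, the smoothest cut.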
Step 4. The normalised crystal rugosity is then calculated from the per-face quantities obtained in Steps 2 and 3. To illustrate the impact of the number of shifts explored in the computation of the rugosities, the overall normalised crystal rugosity was calculated as a function of the number of shifts (Nprec) for polymorphs I and II of paracetamol and polymorphs II and IV of theophylline (see Figure S3). A value of Nprec = 4 was found to be a good compromise between the accuracy of the computed rugosity and the required computational time.
Figure S4. Impact of Nprec on the calculated particle rugosity for paracetamol forms I & II (HXACAN07, HXACAN08) and theophylline forms II & IV (BAPLOT06, BAPLOT03).
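The expression used in Step 4 is not reproduced in the text above, so the following Python sketch should be read as one plausible form only. It assumes that each face contributes its minimum rugosity divided by its d-spacing, weighted by its morphological importance; the `Face` class, the function name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Face:
    hkl: tuple
    weight: float     # morphological importance w(hkl)
    d_spacing: float  # d(hkl) in Å
    rugosity: float   # minimum rugosity of the face in Å


def normalised_crystal_rugosity(faces):
    """Importance-weighted mean of rugosity / d-spacing (assumed form)."""
    total_weight = sum(face.weight for face in faces)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        face.weight * face.rugosity / face.d_spacing for face in faces
    )
    return weighted / total_weight


# Hypothetical two-face morphology.
faces = [
    Face(hkl=(0, 1, 1), weight=0.5, d_spacing=5.0, rugosity=-1.0),
    Face(hkl=(1, 0, 0), weight=0.5, d_spacing=10.0, rugosity=-2.0),
]
value = normalised_crystal_rugosity(faces)
```

Dividing by the total weight keeps the result well defined if only a subset of the BFDH faces is included; whether the original implementation does this is not stated.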

S2. Polymorphic Dataset and rugosity histograms
ConQuest was used to retrieve crystal structures of organic compounds from the CSD 2 (version 4.39 Nov 2017) containing only one chemical component and the most common elements in molecular crystals (H, D, C, O, N and halogens). This resulted in 187 895 crystal structures. A Python script making use of the CSD Python API was then written to analyse these structures. With the script, each refcode family was analysed and all crystal structures in each refcode family were compared using the COMPACK algorithm 3 (with the standard settings and a cluster of 20 molecules). This allowed redeterminations of the same crystal structure to be removed and the dataset to be split into a monomorphic and a polymorphic dataset. The polymorphic dataset taken forward for rugosity calculations contained 5611 crystal structures belonging to 2559 refcode families. Of those 5611 crystal structures, only 198 were solved from powder diffraction data, whilst the overwhelming majority (5413 structures) were solved by single-crystal X-ray diffraction. Figure S5 shows the distribution of normalised rugosities for all crystal structures (left) and the maximum rugosity difference between polymorphic families (right).
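The removal of redeterminations and the monomorphic/polymorphic split can be sketched as follows. This is not the script used in this work: `same_packing` is a hypothetical predicate standing in for a COMPACK comparison, and the refcodes HXACAN99, ABCDEF and ABCDEF01 together with the form labels in the toy lookup are invented for the example.

```python
from collections import defaultdict


def refcode_family(refcode):
    """A refcode family shares the first six characters of the refcode."""
    return refcode[:6]


def unique_structures(refcodes, same_packing):
    """Keep one representative per distinct packing (drops redeterminations)."""
    unique = []
    for code in refcodes:
        if not any(same_packing(code, kept) for kept in unique):
            unique.append(code)
    return unique


def split_dataset(refcodes, same_packing):
    """Return (monomorphic, polymorphic) dicts keyed by refcode family."""
    families = defaultdict(list)
    for code in refcodes:
        families[refcode_family(code)].append(code)
    monomorphic, polymorphic = {}, {}
    for family, members in families.items():
        unique = unique_structures(members, same_packing)
        target = polymorphic if len(unique) > 1 else monomorphic
        target[family] = unique
    return monomorphic, polymorphic


# Toy lookup replacing the packing comparison (hypothetical labels).
form_of = {
    "HXACAN07": "I", "HXACAN08": "II", "HXACAN99": "I",
    "ABCDEF": "only", "ABCDEF01": "only",
}
mono, poly = split_dataset(
    list(form_of), lambda a, b: form_of[a] == form_of[b]
)
```

In practice the predicate would wrap a real packing-similarity comparison; the greedy clustering shown here assumes that similarity behaves transitively within a family.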

Figure S5. Distribution of normalised rugosities for all crystal structures (left) and the maximum rugosity difference between polymorphic families (right).
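The quantity plotted in the right-hand panel can be illustrated with a short Python sketch, assuming that "maximum rugosity difference" means the spread between the roughest and smoothest polymorph within one refcode family. The theophylline values are those quoted for forms II (BAPLOT06) and IV (BAPLOT03) in this document; the helper name is hypothetical.

```python
def max_rugosity_difference(rugosities):
    """Largest difference between any two polymorphs of one family."""
    values = list(rugosities)
    if len(values) < 2:
        raise ValueError("a polymorphic family needs at least two structures")
    return max(values) - min(values)


# Normalised crystal rugosities keyed by refcode, grouped by family.
families = {
    "BAPLOT": {"BAPLOT06": -0.025, "BAPLOT03": -0.161},
}
differences = {
    family: max_rugosity_difference(members.values())
    for family, members in families.items()
}
```

Collecting one such number per family would give the input for a histogram of the kind shown in the figure.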

S3. Polymorphic systems which are easy vs. hard to crystallise
We defined a list of pairs of polymorphs that are easy or difficult to crystallise. The choice of category, "easy" or "difficult", was made on the basis of the experimental data available in the literature. We define as "difficult" any form for which direct crystallisation from solution is not straightforward. This definition encompasses polymorphic forms that can only be obtained by solid-solid transformations, solvent-mediated transformations or simply by solvent removal from a solvate.
Other examples might be systems that can be crystallised only using a specific solvent or polymorphic forms promoted by impurities or particular conditions (late appearing polymorphs). In the following paragraphs we summarise previous experimental results for each compound reported in table S1.

S3.1. Aspirin
Aspirin (acetylsalicylic acid) is perhaps one of the most popular nonsteroidal anti-inflammatory drugs. It is used in the treatment of inflammation, pain and fever and, at low doses, to prevent different heart conditions. It was synthesised more than 150 years ago to improve the taste, gastric tolerability and pharmacological properties of salicylic acid. Aspirin was long assumed to exist as only one crystalline form (form I), although during the 1970s some studies based on observed differences in melting points, dissolution and crystal habits suggested that aspirin could show polymorphism. With the development of CSP methods, the polymorphism of aspirin became the target of several studies. In 1995 Gavezzotti 5 proposed a tentative polymorph using a structure-construction procedure. A few years later, Payne et al. 6 showed that a planar conformation of aspirin could be plausible and suggested that experimental conditions promoting the stability of this conformer could lead to new polymorphs. Only form I can be crystallised with conventional methods; form II can be obtained in its pure form only by seeding solutions of aspirin with 15 wt% of its anhydride. The remaining two polymorphs can only be obtained at high pressure and from the melt, respectively. The fact that form I is easier to crystallise than form II seems to be consistent with its smoother nature (normalised crystal rugosity [II] = -0.121). However, we note that the crystal structures and surfaces of the two forms are so similar that so are their rugosities.

S3.2. Dapsone
Dapsone (4,4'-diaminodiphenyl sulfone) is an API used in the treatment of leprosy in combination with rifampicin and clofazimine. Dapsone is known to crystallise as five polymorphic forms (forms I-V) and as solvates, including an unusual non-stoichiometric hydrate that on desolvation produces an isomorphic dehydrate (Hydehy). [13][14][15][16] The stability order, determined on the basis of lattice energy calculations and confirmed by calorimetric measurements, is the following: V (most stable) > III > Hydehy > II > I > IV (least stable). Form V is a late appearing polymorph recently discovered 4 and corresponds to the thermodynamically stable form in the range from absolute zero to 90 °C. Above 90 °C form II is more stable and, at higher temperatures, form I becomes the most stable. Polymorphs II/III, III/I and II/I are enantiotropically related. 4 Form I is the high-temperature form; it can be obtained by annealing the melt at temperatures above 170 °C. Desolvation of solvate forms below and above 75 °C results in form III and form II, respectively. In both cases, the product also contains some form I as an impurity. Crystallisation by slow evaporation or slow cooling from solvents, excluding those forming solvates, produces form III. Form IV cannot be crystallised from solvents but can be produced, concomitantly with form III, by annealing quench-cooled melts of dapsone at 50-70 °C. Form V can be obtained by slurrying any of the polymorphic forms in non-solvating solvents in the temperature range 10-90 °C.
Before the discovery of the late appearing form V, form III was regarded as the thermodynamically stable form at room temperature. 17 Form III indeed shows remarkable kinetic stability and proved to be stable at ambient conditions for approximately 40 years. This, together with the slow nucleation and growth rate of form V, delayed the discovery of the late appearing polymorph V. 4 Although form V proved to be difficult to crystallise when compared to form III, the rugosity values of both forms (table S1) were found to be very similar.

S3.3. Axitinib
Axitinib is an API developed by Pfizer for the treatment of carcinomas. The polymorphism of axitinib has been extensively investigated both experimentally 18,19 and computationally 20,21. To date, 71 different forms, including anhydrous forms, solvates and hydrates, have been reported in the literature 19. Five polymorphic forms of axitinib have been discovered (I, IV, VI, XXV and XLI) 18,19. Form XLI is the most stable form and is monotropically related to the other polymorphs. Forms IV, VI and XXV are enantiotropically related. Form I is the least stable form and transforms into the monohydrate IX at high relative humidity. On the basis of differential scanning calorimetry data, solubility measurements and Burger's rule, the order of stability is the following: XLI > XXV ≈ VI > IV > I. This order does not correlate with the calculated densities, a discrepancy that has been ascribed to the conformational flexibility of axitinib 19. Due to the high propensity of axitinib to form solvates, the isolation of anhydrous phases is difficult. Standard crystallisations mostly produce solvates. In the majority of these solvates, desolvation temperatures are higher than the boiling point of the solvent. Desolvation of solvates produces form IV and form XXV. The most stable form, XLI, can be obtained only by applying a targeted procedure consisting of slurrying solvates at high temperatures 19. In the case of axitinib, none of the polymorphs can be nucleated directly from solution. In this respect, the polymorphs of axitinib should all be classified as "difficult" to crystallise. The measured rugosities of the axitinib polymorphs show that the thermodynamically stable form XLI is less smooth than form XXV.

S3.4. pABA
Para-aminobenzoic acid (pABA) is a popular model compound for solid-state studies. Previous work on pABA covers several aspects of organic crystalline materials, such as solubility measurements, nucleation and growth kinetics, crystal morphology, 22,23 polymorphism 24,25 and cocrystals. 26,27 In particular, the polymorphism of pABA has been extensively investigated in recent years. pABA is known to crystallise as four polymorphic forms. Two of them, α and β, are very well known and have been the object of most of the studies cited above. Recently, two new forms, γ 28 and δ, 29 have been isolated by crystallisation in the presence of impurities and at high pressures, respectively. α and β are enantiotropically related, with a transition temperature of 13.8 °C. β is the most stable form below this temperature; at higher temperatures, α becomes the most stable. Previous studies show that, on heating, β transforms endothermically to α in the temperature range 70-100 °C. At high pressures, α transforms to δ. More details on the four polymorphs, including descriptions of morphology, kinetics of nucleation and growth, crystal structures and crystal structure prediction, are also summarised in a recent paper. Unlike α, which can be easily crystallised from several solvents and at various supersaturations, 25 β can only be obtained under specific experimental conditions. Previous studies suggest that β can be consistently obtained from water at supersaturations in the range 1.1-148. Alternative routes involve slurrying α below 14 °C or controlled sonication. 30 A recent work by Nagy 31 shows that the application of a semibatch supersaturation control (SSC) strategy for cooling crystallisation consistently produced β-pABA.
Like the previous cases, α- and β-pABA represent a suitable model system for a pair of polymorphs that, on the basis of the criteria described above, can be considered as easy and difficult to crystallise, respectively. This is consistent with the observation that α is smoother than β.

S3.5. 5-BrAspirin
Unlike the parent molecule aspirin, there is not much information on 5-bromoaspirin (5-BrAsp) in the literature. The crystal structure of 5-BrAsp (form I) was reported for the first time by Hursthouse et al. 32 as part of a larger study on the systematic structural comparison of several substituted derivatives of aspirin. Screening from different solvents always produced the known form I. A new polymorph (form II) was reported by Bond et al. 33 in a study on the influence of impurities in promoting polymorphism. Crystallisations from common solvents always produce form I. Form II can only be obtained in the presence of 5-bromoaspirin anhydride as an impurity. Slurrying form II in organic solvents always produces form I as the result of a solvent-mediated transformation, indicating that, in the range of temperatures used for these experiments, form I is the thermodynamically stable form. 33 The fact that form II requires non-conventional procedures to be isolated seems to be consistent with its rougher nature when compared to form I.

S3.6. Carbamazepine
Carbamazepine (CBZ) is another popular model compound for polymorphism studies. It is used as an antiepileptic drug and as an analgesic for the treatment of trigeminal neuralgia. Over 50 different solid forms of carbamazepine have been reported so far. 34,35 These include several solvates, hydrates, co-crystals 36,37 and five polymorphic forms (forms I-V). 35,[38][39][40][41] Much of the work reported in the literature is focused on forms I-IV. Form V was only recently discovered 35 and differs from forms I-IV in the absence of the centrosymmetric dimer, which is a robust structural feature in CBZ crystal structures. Form V is indeed the first example of a carbamazepine polymorph showing catemeric supramolecular arrangements and, accordingly, it was isolated experimentally by applying a computationally assisted strategy, using the catemeric structure of dihydrocarbamazepine as a template.
Forms I-IV are often named according to their crystal system as follows: form I (triclinic), form II (trigonal), form III (P-monoclinic) and form IV (C-monoclinic). Previous analysis based on density and thermochemical data suggests that form III and form I are enantiotropically related, with form III the thermodynamically stable form from room temperature to 70 °C. Form III is also enantiotropically related to form IV. There is no agreement on the relationship between form IV and form I: Grzesiak et al. 38 described this pair as enantiotropically related, whereas Mielck 42 suggests they are monotropically related. As pointed out by Harris et al. 43 … [III] = -0.125), 45 while form V 35 is the only polymorph that requires a specific protocol involving seeds of the templating dihydrocarbamazepine.

S3.7. Paracetamol
Paracetamol (also known as acetaminophen or Tylenol) is one of the most widely used antipyretic and analgesic drugs. It exists as different polymorphic forms and related forms. 46 Among these, the crystal structures of only four polymorphic forms have been solved and refined so far: form I (monoclinic), 47 form II (orthorhombic), 48 form III (orthorhombic) 49 and the recently discovered form III-m (monoclinic), 50 obtained from powder diffraction data and later renamed form VI. 51 Recent studies have also suggested the existence of two high-pressure polymorphs (IV and V) 52 and three more ambient polymorphs (VII, VIII and IX), obtained from the melt. 51 Form I and form II have been extensively investigated and characterised. Form I is the most stable polymorph under ambient conditions and can be easily crystallised from different solvents. It shows poor compressibility due to the absence of slip planes in its crystal structure and consequently requires the use of additives for tablet preparation. Form II shows better compressibility and higher solubility in water; however, it is metastable and tends to transform to form I. 53 Furthermore, crystallisation of form II from solution is difficult. It was first crystallised by slow evaporation from ethanol, but recent attempts to obtain form II with this procedure were unsuccessful. A high-throughput polymorphism study by Peterson et al. 54 investigated the polymorphism of paracetamol by thermal cycling in different solvents and at different supersaturations, obtaining form II only under specific experimental conditions. Recent attempts to produce form II have involved a variety of procedures, which are summarised in a recent review by Cruz et al. 55 Comparison of the measured rugosities of forms I and II shows that form I is smoother than form II.

S3.8. Rotigotine
Rotigotine is a dopamine agonist known since the 1980s and now used in the treatment of Parkinson's disease and restless legs syndrome. 56 It crystallises as two conformational polymorphs, form I and form II. Form I was the only known form until a more stable polymorph, form II, was discovered. As in the case of ritonavir (see below), the discovery of this late appearing polymorph caused the withdrawal of the original formulation from the market in 2008. 57 The discovery of form II was not expected: no evidence of this more stable form was observed in the pre-formulation screening. Low temperatures limit the growth of form II. However, as shown by Rietveld et al., 57 form I is monotropic and metastable with respect to form II in the whole temperature-pressure domain. A recent CSP study predicted a third polymorph of rotigotine lying between form I and form II. 58 Form I can be considered "easy" to crystallise (normalised crystal rugosity [I] = -0.402). Conversely, the late appearance of the most stable form II can be seen as an indication of a phase that is "difficult" to crystallise ([II] = -0.372).
In this case, the values of the rugosities are reversed, with the form that is harder to crystallise being the smoother one.

S3.9. Curcumin
Curcumin is a major constituent of the turmeric spice. It is a poorly water-soluble compound with antimicrobial, anti-inflammatory and anti-oxidant properties and potential clinical efficacy against several diseases. 59,60 Several aspects of the solid-state properties of curcumin have been investigated; a summary of these results can be found in a recent paper by Suresh and Nangia. 59 To date, three polymorphic forms, 61,62 four co-crystals 63 and one chloroform solvate 64 of curcumin have been isolated. The first structure (form I) was solved in 1982 by Tonnesen et al. 61 and corresponds to a monoclinic (P2/n) structure. More recently, two new orthorhombic polymorphic forms (form II: Pca21; form III: Pbca) were isolated by Sanphui et al. 62 In all the structures, the molecule exists in the β-keto-enol tautomeric form. The molecule adopts a planar conformation in forms II and III, while in form I it is slightly twisted. Accordingly, the three polymorphs of curcumin have been described as a case of tautomeric and conformational polymorphism. 59 Form I is the most stable polymorph; forms II and III transform to form I before melting. The stability order is the following: I > II > III. 62 Form I can be easily crystallised at room temperature from several common solvents. This is consistent with the fact that its measured rugosity is lower than, for example, that of form II (normalised crystal rugosity [II] = -0.288). Crystallisation of forms II and III is indeed more difficult. They were initially obtained as the result of cocrystallisation experiments in the presence of 4-hydroxypyridine in EtOH. Further studies [65][66][67] suggest that crystallisation of forms II and III can be induced by using ultrasound and specific additives.

S3.10. Theophylline
Theophylline is a methylxanthine used for the treatment of asthma. It exists as four polymorphic forms (I-IV), a monohydrate, a DMSO solvate and several cocrystals. [68][69][70][71][72][73] Theophylline is a widely studied model system in solid-state chemistry. Most of the work previously reported focuses on hydration-dehydration behaviour and cocrystal formation. There are also some works focusing on the polymorphism of theophylline and on the characterisation, structure determination, crystallisation and stability of its polymorphic forms. 74,75 To date, crystal structures have been determined only for form I, 76 form II 77 and form IV. 76 Until the discovery of form IV, form II was long considered to be the thermodynamically stable form. Form II is metastable and is enantiotropically related to form I, to which it transforms at about 232 °C. All the polymorphic forms of theophylline convert to the monohydrate M in contact with water or at high relative humidity. 74,78 Dehydration of the monohydrate produces either form II or form III; form III then transforms to form II. The stability order of these forms is the following: form IV > form II > form I; form III is highly metastable. 76 Form II is the kinetic product of crystallisation. All the transformations relating the known polymorphs of theophylline are mediated by form II. Form II can be easily crystallised from non-aqueous solutions at room temperature. Form IV is the thermodynamically stable form; it can be obtained only by solvent-mediated transformations. Its late discovery was recently explained on the basis of thermodynamic and structural features. 76 In the context of our study, form IV represents a system that is "difficult" to crystallise, whereas crystallisation of form II is "easy" (normalised crystal rugosity [II] = -0.025, [IV] = -0.161).

S3.11. Ritonavir
Ritonavir is one of the most famous examples of a disappearing polymorph. 79,80 It was developed at Abbott Laboratories in 1992 as a protease inhibitor for HIV and marketed as a semi-solid capsule formulation. A few years after its introduction into the market, due to the appearance of the more stable polymorph II, which had not been observed during the development and manufacturing steps, some lots failed solubility tests and the product was removed from the market. 81,82 The new form II showed a significantly lower solubility and a different molecular conformation. 83 After the discovery of form II, the crystallisation of form I became difficult, making it a case of a disappearing polymorph. Form I crystallises as lath-shaped crystals and the molecule adopts a trans conformation around the carbamate torsion angle. Form II adopts the less favourable cis configuration and crystallises as needle-shaped crystals. Form II is the most stable form and has a higher melting point and heat of fusion than form I. 79 Although form II is the thermodynamically stable form, its crystallisation is more difficult and requires the presence of seeds, impurities or high supersaturations, because its nucleation requires the formation of the less stable cis-isomer. In particular, the first appearance of form II was ascribed to the degradation of ritonavir into a cyclic carbamate product able to template the crystallisation of the cis-isomer. 82 Accordingly, form II represents a case of a polymorph that is difficult to crystallise, as it was not observed during the development and manufacturing steps and emerged as a late appearing polymorph. Except when seeds of form II are present, crystallisation of form I is initially favoured, classifying this form as easier to crystallise than form II (normalised crystal rugosity [II] = -0.265). Beyond the difference in surface rugosity, a key issue here is the conformational change required: form II contains a higher-energy, metastable conformer, and this is also linked to its difficulty of nucleation and growth.
A more recent high-throughput polymorphism screening identified a new metastable polymorph (form III), one hydrate form and a solvate. 84
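For the pairs in this section where both rugosity values are quoted in full (rotigotine and theophylline), the comparison between crystallisability and smoothness can be written as a short Python check. The helper names are hypothetical, and "smoother" is taken to mean the normalised crystal rugosity closer to zero, which is the reading implied by the discussion of each pair.

```python
def smoother_form(rugosity_by_form):
    """Return the form whose rugosity is closest to zero."""
    return min(rugosity_by_form, key=lambda form: abs(rugosity_by_form[form]))


# (compound, easy form, difficult form, normalised crystal rugosities)
pairs = [
    ("rotigotine", "I", "II", {"I": -0.402, "II": -0.372}),
    ("theophylline", "II", "IV", {"II": -0.025, "IV": -0.161}),
]

# True when the form that is easier to crystallise is also the smoother one.
consistent = {
    compound: smoother_form(rugosities) == easy
    for compound, easy, _difficult, rugosities in pairs
}
```

The result reproduces the statements made in the text: theophylline follows the expected trend, whereas rotigotine is the reversed case.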

S4.1. Glycine
Glycine is known to crystallise as three ambient polymorphs: α-, β- and γ-glycine. [85][86][87] Recently, three other forms (named δ, ε and ζ, respectively) have been obtained at high pressure. 88 For the purposes of this work, we limit our discussion to α-, β- and γ-glycine. The three ambient polymorphs of glycine all crystallise in the zwitterionic form; α-glycine packs as centrosymmetric dimers, while β- and γ-glycine adopt non-centric arrangements. The stability order at ambient conditions is as follows: γ (most stable) > α > β (least stable). 89 β-Glycine is unstable and rapidly transforms to the α polymorph in the presence of humidity or in aqueous solutions. 86 In contrast, the solvent-mediated transformation of α-glycine into the γ polymorph is slow. α-Glycine can be easily crystallised from aqueous solutions, while β-glycine requires water-alcohol solutions. 90,91 Crystallisation of γ-glycine from aqueous solutions is difficult; it can be obtained by using tailor-made additives or by working at a pH different from the isoelectric point. 92,93 Crystallisations at the natural isoelectric pH (5.97) always produce α, despite the higher stability of γ. 94 A recent work by Vesga et al. 95 shows that γ-glycine can be obtained in the absence of stirring from highly supersaturated aqueous solutions and suggests that stirring might result in the preferential crystallisation of α-glycine. However, in most of the previous studies α-glycine proved to be easy to crystallise, while the γ polymorph generally requires specific conditions for its crystallisation, so it can be defined as difficult to crystallise. 96

Table S2. Calculated rugosity (Å) for α-, β- and γ-glycine.
REFCODE    Form    Stability order*    Rugosity (Å)
GLYCIN82   γ       …                   -0.0873
*1 being the most stable and 3 the least stable.

Calculated rugosity values show that β and α are smoother than γ (see Table S2), consistent with their ease of crystallisation. The β form, despite being the least stable, has the smoothest surfaces, which might favour the kinetics of its nucleation and hence its observation. The γ polymorph, on the contrary, is the most stable but also the hardest to crystallise (with the roughest surfaces).

S4.2. Acridine
Acridine is known to form eight polymorphic forms and one hydrate (form I). Experimental work published over the last few years is summarised in a recent review by Stephens et al. 97 In particular, results from previous work show that the eight polymorphs of acridine can often be obtained under similar or identical experimental conditions. In some cases, the same crystallisation conditions lead to mixtures of concomitant polymorphs. Forms II, III and IV have all been prepared by sublimation under vacuum 98 and by evaporation from several solvents. [99][100][101] Form IV was also prepared concomitantly with form VII by evaporation in the presence of terephthalic acid. 97 The calculated rugosity values show that forms II and III are smoother than form IV.
Forms VI and VII were discovered during a study of the influence of dicarboxylic acids in solution on the crystallisation of acridine, 99 suggesting that the isolation of these two forms might require specific experimental conditions (normalised crystal rugosity [VII] = -0.160). However, other studies showed that forms VI and VII can also be obtained by quenching molten acridine; form VII was also obtained by crash-cooling acetone solutions. 101 Form IX was only discovered recently and was crystallised by slow evaporation from toluene. 97,102 In some cases it was concomitant with forms III and VII. The fact that it was never observed during the experiments mentioned above might suggest that crystallisation of form IX is difficult ([IX] = -0.204), and we note that it is also the form with the largest rugosity.

… The fact that form II was only recently obtained might be explained by considering that, even if it nucleates faster, its observation could have been prevented by its quick transformation into form I. In 1934 Kofler and Geyr observed two polymorphs of coumarin and, on the basis of optical measurements, identified one of them as monoclinic. 106 Forms III, IV and V all have larger rugosity values than forms I and II, consistent with the fact that they are also harder to crystallise (table S3).

… [107][108][109] and computational studies. [107][108][109][110][111] Due to its conformational flexibility and to the different degree of conjugation between the aromatic rings in the molecular skeleton, the polymorphs of ROY adopt different colours (red, orange and yellow). Eleven polymorphic forms (Y, ON, R, OP, YN, ORP, Y04, YT04, R05, RPL and PO13) have so far been isolated, eight of which (Y, ON, R, OP, YN, ORP, YT04 and R05) have been fully structurally characterised by single-crystal X-ray diffraction. 108,112,113 Recently, the crystal structure of PO13 was obtained from PXRD data. 109 Most ROY polymorphs can be obtained concomitantly from the same liquid and are kinetically stable at ambient conditions. 107 Among the nine structurally characterised polymorphs, only R, Y, ON, OP, YN, YT04 and ORP have been obtained directly by crystallisation from solution. R was obtained from ethanol concomitantly with Y. 113 ON and YN were produced by fast crystallisations; OP was obtained concomitantly with ON in seeding experiments. 112 YT04 was initially obtained as the product of a transformation from Y04; single crystals were then isolated by seeding a saturated solution of ROY with YT04. 114 The most stable polymorph, Y, was obtained by solution-mediated transformation of the other polymorphs at room temperature. 112 The recently discovered R05 was obtained from the melt by cross-nucleation on Y04. Like R05, PO13 was obtained from the melt during a variable-temperature PXRD experiment, starting from YN. 109 The calculated rugosities (see Table S5) suggest that YN, R and ON are smoother than the other polymorphs, which is consistent with the fact that they are also easier to obtain.

S5. CSP Methods
The computer programme GRACE was used to generate an accurate lattice energy landscape of diflunisal and compound X. GRACE uses a dispersion-corrected density functional theory method (DFT-D) that combines calculations with the PBE functional in VASP. Every crystal structure prediction starts by fitting a tailor-made force field (TMFF) to DFT-D reference data. The actual crystal structure prediction is a convergence-controlled three-step procedure, executed separately for one and two molecules per asymmetric unit. In the first step, a large number of crystal structures are generated with a Monte Carlo parallel tempering algorithm using the tailor-made force field. In the second step, some of these structures are subjected to a coarse lattice energy optimisation at the DFT-D level. In the final step, a small number of structures are subjected to a more stringent DFT-D lattice energy optimisation. DFT calculations use a plane wave cut-off energy of 520 eV and a k-point spacing of roughly 0.07 Å−1. All lattice energy minimisations of the final step are converged to within at least 0.003 Å for atomic displacements, 0.00025 kcal mol−1 per atom for energy changes, 0.7 kcal mol−1 Å−1 for atomic forces and 1.0 kbar for cell stress. In the second step the lattice energies are converged to within at least 0.02 Å for atomic displacements, 0.001 kcal mol−1 per atom for energy changes, 7.0 kcal mol−1 Å−1 for atomic forces and 15.0 kbar for cell stress. The convergence criteria of the final step were applied when performing the lattice energy minimisations in the disorder models.
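The two sets of convergence thresholds quoted above can be collected in a small Python structure, which makes the tenfold to fifteenfold tightening between the second and final steps explicit. This is an illustrative sketch only: the class, the constant names and `is_converged` are hypothetical and do not correspond to any GRACE or VASP interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceCriteria:
    max_displacement: float   # Å
    max_energy_change: float  # kcal/mol per atom
    max_force: float          # kcal/mol/Å
    max_stress: float         # kbar


# Thresholds as quoted in the text for the second and final steps.
COARSE_STEP = ConvergenceCriteria(0.02, 0.001, 7.0, 15.0)
FINAL_STEP = ConvergenceCriteria(0.003, 0.00025, 0.7, 1.0)


def is_converged(displacement, energy_change, force, stress, criteria):
    """True when every monitored quantity is within its threshold."""
    return (
        abs(displacement) <= criteria.max_displacement
        and abs(energy_change) <= criteria.max_energy_change
        and abs(force) <= criteria.max_force
        and abs(stress) <= criteria.max_stress
    )


# Hypothetical optimisation state: acceptable for the coarse step only.
state = dict(displacement=0.01, energy_change=0.0005, force=3.0, stress=5.0)
coarse_ok = is_converged(criteria=COARSE_STEP, **state)
final_ok = is_converged(criteria=FINAL_STEP, **state)
```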