MonteCarbo: A software to generate and dock multifunctionalized ring molecules

Abstract
MonteCarbo is an open-source software package that constructs simple 5-, 6-, and 7-membered ring multifunctionalized monosaccharides and nucleobases and docks them into the active site of carbohydrate-active enzymes. The core bash script executes simple commands to generate the Z-matrix of the neutral molecule of interest. A Fortran90 code based on a pseudo-random number generator (Monte Carlo method) then assigns dihedral angles to the different rotamers present in the structure (ring and rotating functional groups). The program also includes a generalized internal coordinates (GIC) implementation of the Cremer and Pople ring puckering coordinates. Once the structures are generated and optimized, a second code docks multiple conformers serially into the active site of a wide family of enzymes.

shows state-of-the-art carbohydrate structure databases and computational techniques to obtain reliable models. 22 French and Johnson also reviewed the most insightful works on carbohydrate modeling. 23 In the present work, we focus on applying Monte Carlo (MC) techniques 24 to the conformational study of carbohydrates. The basis of this method is to generate random changes in a saccharide structure in search of a conformation with lower energy, and to repeat this procedure hundreds of thousands or millions of times until the molecule's conformational space is reliably sampled.
To give some examples: in 1993, Peters and coworkers applied this technique to the conformational analysis of four disaccharides, focusing on the random sampling of the exocyclic dihedral angles (CH2OH, φ, and ψ; Figure 1); 25 in 2011, Dowd et al. added the OH groups to the Peters approach to study the open- and closed-ring forms of carbohydrates; 26 and in 2017, Zhang and collaborators combined MC and torsion-angle molecular dynamics simulations for oligo- and polysaccharides. 27 Many computational approaches have been used to construct conformational free and potential energy surfaces for ring molecules. 28-31
In this article, we present an MC-based code called MonteCarbo. Its principal function is to generate conformers of multi-functionalized 5-, 6-, or 7-membered ring molecules. Afterward, the program can dock them into the active site of several glycosidases to test their ability to mimic substrates or inhibitors. While the previous MC-based studies addressed only the exocyclic dihedral angles, our approach extends the versatility of such methods by including the ring's puckering as a random variable. With this cheap-and-fast approximation, we firmly believe that MonteCarbo will become a powerful tool in the field of drug design.
These definitions apply for m = 2, 3, …, (N − 1)/2. For systems with an even N, the last puckering coordinate is defined as:

$$q_{N/2} = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} (-1)^{j-1} z_j$$

where z_j are the normalized coordinates of the N atoms of the ring using their geometrical center as the origin of coordinates.

FIGURE 1 Simplified representation of a disaccharide and the rotamers present on it
For a consistent description of the Cremer and Pople coordinates, the ring members are tagged from 1 to N. In this work, the anomeric carbon is assigned the number 1 and the heteroatom the number N (X in Figures 2-4). The remaining members must be numbered consecutively along the ring. If the ring has no heteroatom, the choice of the first and last atoms is left to the user.
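To make the atom tagging and the puckering definitions concrete, the following Python sketch computes Cremer and Pople coordinates from the Cartesian coordinates of an N-membered ring listed in the order 1 to N. It follows the standard Cremer and Pople formulation (center of geometry, mean-plane normal, out-of-plane displacements z_j, then amplitudes and phases). The function name `cremer_pople` and its return layout are assumptions made for this example and are not part of MonteCarbo; the phase is obtained with `atan2`, which Python provides.

```python
import math

def cremer_pople(coords):
    """Return (z, q, phi) for ring atoms given in order 1..N as (x, y, z) tuples.

    z   : out-of-plane displacement of each atom
    q   : dict m -> puckering amplitude (q[N/2] is signed for even N)
    phi : dict m -> puckering phase in [0, 2*pi)
    """
    n = len(coords)
    # Shift the origin to the center of geometry of the ring.
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    r = [tuple(p[k] - center[k] for k in range(3)) for p in coords]

    # Mean-plane normal from the two auxiliary vectors R' and R''.
    r1 = [sum(rj[k] * math.sin(2 * math.pi * j / n) for j, rj in enumerate(r))
          for k in range(3)]
    r2 = [sum(rj[k] * math.cos(2 * math.pi * j / n) for j, rj in enumerate(r))
          for k in range(3)]
    normal = (r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0])
    length = math.sqrt(sum(c * c for c in normal))
    normal = tuple(c / length for c in normal)

    # Displacement of every atom from the mean plane.
    z = [sum(rj[k] * normal[k] for k in range(3)) for rj in r]

    q, phi = {}, {}
    for m in range(2, (n - 1) // 2 + 1):
        c = math.sqrt(2.0 / n) * sum(
            zj * math.cos(2 * math.pi * m * j / n) for j, zj in enumerate(z))
        s = -math.sqrt(2.0 / n) * sum(
            zj * math.sin(2 * math.pi * m * j / n) for j, zj in enumerate(z))
        q[m] = math.hypot(c, s)
        phi[m] = math.atan2(s, c) % (2 * math.pi)
    if n % 2 == 0:
        # Extra amplitude for even N; the index j here is 0-based, so
        # (-1)**j matches the textbook factor (-1)^(j-1) for 1-based j.
        q[n // 2] = sum((-1) ** j * zj for j, zj in enumerate(z)) / math.sqrt(n)
    return z, q, phi
```

For a 6-membered ring the function returns q[2], phi[2], and q[3]; for 5- and 7-membered rings it returns one and two amplitude/phase pairs, respectively. The sign of q[N/2] depends on the direction in which the ring atoms are numbered.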
The conformational space of a 5-membered ring, formed by 20 canonical conformers, is described by q_2 and ϕ_2 (Figure 2). Envelopes (E) have four atoms in the plane and a single atom above or below it. Twists (T) have three coplanar atoms and two consecutive atoms on opposite sides of the plane. 32
The three-dimensional (3-D) conformational space of 6-membered rings is defined by q_2, q_3, and ϕ_2. However, the scientific community uses the polar coordinates Q, ϕ, and θ and projects them onto a Mercator representation (Figure 3). The terminology and symbolism of the 38 conformers are described by IUPAC in reference 33.
The complex four-dimensional (q_2, q_3, ϕ_2, ϕ_3) conformational space of 7-membered rings can be simplified to a 3-D representation and divided into three (ϕ_2, ϕ_3) planes at q_3 = 0.6 (Twist-Chair/Chair plane), q_3 = 0.0 (Twist-Boat/Boat plane), and q_3 = 0.4 (Sofa/Twist-Sofa/Sofa-Boat). 34 For clarity and practical reasons, the q_3 = 0.4 plane is not depicted in Figure 4. It is worth mentioning that the q_3 = 0 plane presents a harp-like distribution (Figure 4, bottom) in which the twist-boat (TB, green strings) and boat (B, gray strings) conformations are described by a given value of ϕ_2, while a change in ϕ_3 does not affect the structure (more details in Supporting Information).
If the bond distances and angles are known, N − 3 endocyclic dihedral angles are needed to construct and define a specific conformation of an N-membered ring. The next section uses this geometrical property to set up a random selection of conformers.

| GICs-based puckering code for Gaussian 16
The main idea of this work is to develop an algorithm that picks a random conformation from a group of structures. It follows a three-step pathway: generating the conformers by changing their puckering coordinates, creating a database with the endocyclic structural information, and randomly selecting the array of endocyclic dihedral angles corresponding to a unique conformation. For the first step, we present a strategy based on performing scan calculations with the Gaussian 16 software. 35 The main initial problem was that Cremer and Pople's mathematical expressions are not implemented as generalized internal coordinates (GICs) in the quantum mechanics code. However, the latest version of Gaussian allows users to add and define custom GICs using the most common mathematical operators.

FIGURE 2 Conformational space for five-membered rings
FIGURE 3 Mercator representation of the conformational space for six-membered rings
Following the recipe described in reference 31 and starting from the Cartesian (x, y, z) coordinates of the N ring atoms, the code calculates the center of geometry (XCntr, YCntr, and ZCntr functions) and recomputes the coordinates of the N atoms with that center as the origin. Then, using simple mathematical operators and the SQRT function, the code obtains the values of A_m, B_m, and q_{N/2} (for N = 6) defined by Equations (1)-(3).
When obtaining q_m and ϕ_m for N = 5 and N = 7, and Q, ϕ, and θ for N = 6, one technical problem appeared: Gaussian does not define the arctan function, which is necessary to obtain the puckering phases:
By a trigonometric relationship between the arctan and arccos functions, Equation (5) can be transformed into:
However, this conversion only defines ϕ_m in the interval [0, π]. To solve this problem, we observed that the function A_m is antisymmetric about ϕ_m = 0. So, to define ϕ_m in the interval [0, 2π], we use the following expression:
where ε = 10^−6 avoids a division by zero when A_m = 0.
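The idea of recovering a full-circle phase without arctan can be illustrated with a short Python sketch. It is not the expression used in the puckN.gic files, which is not reproduced here. It is one possible construction that uses only arithmetic, a square root, and arccos, under the assumption that A_m is proportional to q_m sin ϕ_m and B_m to q_m cos ϕ_m. The function name `phase_from_arccos` and the smoothed sign A_m / (|A_m| + ε) are choices made for this example.

```python
import math

def phase_from_arccos(a_m, b_m, eps=1e-6):
    """Phase in [0, 2*pi] from a_m ~ q*sin(phi) and b_m ~ q*cos(phi).

    Uses only sqrt, arccos and arithmetic (no arctan). This is an
    illustrative construction, not the expression from the article.
    """
    q_m = math.sqrt(a_m * a_m + b_m * b_m)
    if q_m == 0.0:
        raise ValueError("phase is undefined when the amplitude is zero")
    # arccos alone only covers [0, pi]; clamp to guard against rounding.
    base = math.acos(max(-1.0, min(1.0, b_m / q_m)))
    # Smoothed sign of a_m: close to +1 or -1, and eps prevents 0/0 at a_m = 0.
    sign = a_m / (math.sqrt(a_m * a_m) + eps)
    # sign ~ +1 keeps base; sign ~ -1 mirrors it to 2*pi - base.
    return math.pi + sign * (base - math.pi)
```

Away from the branch boundary the result agrees with the usual two-argument arctangent to within roughly ε/|A_m|. Very close to ϕ_m = 0 or 2π, where A_m approaches zero, the smoothed sign is no longer close to ±1 and the value becomes unreliable, so such a construction is only usable away from that boundary.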
In the case of N = 6, the polar coordinates are calculated as follows:
One of the limitations of Gaussian's mathematical interface for defining GICs is the absence of periodicity. For this reason, the phase puckering coordinates present problems when their values are close to 0 or 2π. Nevertheless, we were able to explore around 99% of the conformational potential energy surfaces of 5-, 6-, and 7-membered rings.

FIGURE 4 Conformational subspaces for seven-membered rings at q_3 = 0.6 (top) and q_3 = 0.0 (bottom)
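For reference, the conversion from (q_2, q_3, ϕ_2) to the polar set (Q, θ, ϕ) of a 6-membered ring can be sketched in Python as follows. The sketch uses the standard Cremer and Pople spherical polar relations, written with arccos so that no arctan is needed. Whether the puckN.gic files use exactly this form is an assumption, and the function name `polar_from_puckering` is hypothetical.

```python
import math

def polar_from_puckering(q2, q3, phi2):
    """Return (Q, theta, phi) for a six-membered ring.

    Q     : total puckering amplitude, sqrt(q2^2 + q3^2)
    theta : polar angle in [0, pi], with q3 = Q*cos(theta), q2 = Q*sin(theta)
    phi   : azimuthal angle, equal to phi2
    """
    total = math.sqrt(q2 * q2 + q3 * q3)
    if total == 0.0:
        raise ValueError("theta is undefined for a planar ring (Q = 0)")
    theta = math.acos(max(-1.0, min(1.0, q3 / total)))
    return total, theta, phi2
```

With this convention, a pure q_3 distortion (q_2 = 0) sits at one pole of the sphere (θ = 0 or π, the chairs), and a pure q_2 distortion sits on the equator (θ = π/2, the boats and twist-boats).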
The reader can find these codes in the files puckN.gic (for N = 5, 6, and 7) of the MonteCarbo distribution. The distances, angles, and exocyclic dihedral angles are available in the files with extension *.var of the MonteCarbo distribution.

| Random number generator
MonteCarbo generates a Gaussian Z-matrix input with the structural information of the molecule under study. However, at least N − 3 endocyclic dihedral angles change from one structure to another. If the molecule contains rotamers such as OH or CH2OH, the number of random degrees of freedom increases.
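The random degrees of freedom described above can be pictured with a minimal Python sketch: one array of N − 3 endocyclic dihedral angles is drawn from a database of ring conformers, and each exocyclic rotamer receives a uniformly distributed dihedral angle between −180° and 180°. MonteCarbo does this with Fortran90 code (the rangen*.f90 files); the function name `random_conformer`, the data layout, and the numerical values in the usage note are placeholders invented for this example.

```python
import random

def random_conformer(ring_database, n_rotamers, rng=None):
    """Pick one ring conformation and random exocyclic dihedral angles.

    ring_database : list of tuples, each holding the N - 3 endocyclic
                    dihedral angles (degrees) of one ring conformer
    n_rotamers    : number of rotating groups (OH, CH2OH, ...)
    rng           : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    # One database row defines a unique ring conformation.
    ring_dihedrals = list(rng.choice(ring_database))
    # Each rotamer gets an independent uniform angle in degrees.
    rotamer_dihedrals = [rng.uniform(-180.0, 180.0) for _ in range(n_rotamers)]
    return ring_dihedrals, rotamer_dihedrals
```

For a 6-membered ring, each database row would hold three angles, for example `random_conformer([(60.0, -60.0, 60.0), (-60.0, 60.0, -60.0)], n_rotamers=5, rng=random.Random(1))`. These two rows are illustrative placeholders, not values taken from the MonteCarbo distribution.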
Inspired by Vilaseca and coworkers' work, 40

| MonteCarbo script
MonteCarbo is a bash script that generates a Gaussian Z-matrix input model of a neutral, multi-functionalized 5-, 6-, or 7-membered ring molecule. The script requires some input information to construct the model and the random replicas (Figure 6).
The script combines the information in the N-x-H-y-X-D#.txt, rangen*.f90, and puckN.gic files to build a calc.gjf input file.
Depending on the user's selection, the final output of the required structures can be a Gaussian input, a PDB or an XYZ file (Open Babel 41 is required for the conversions).
The main limitation of MonteCarbo is that, in extreme cases of multifunctionalization with bulky groups, the code generates structures with steric hindrance or overlapping atoms. For instance, if the molecule has two neighboring CH2OH groups, generating random conformers will produce some structures in which the OH groups overlap or occupy the same point in space.
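Overlapping structures of this kind can be screened after generation. The Python sketch below is a hypothetical post-processing step, not a MonteCarbo feature: it discards any conformer in which two atoms lie closer than a distance threshold. The 0.7 Å default is an arbitrary illustrative value chosen to sit below typical covalent bond lengths involving heavy atoms, and the function names are assumptions.

```python
import math
from itertools import combinations

def min_interatomic_distance(coords):
    """Smallest distance between any two atoms of one conformer."""
    return min(math.dist(a, b) for a, b in combinations(coords, 2))

def has_overlap(coords, threshold=0.7):
    """True if any pair of atoms is closer than `threshold` (same unit as coords)."""
    return min_interatomic_distance(coords) < threshold

def filter_conformers(conformers, threshold=0.7):
    """Keep only the conformers without overlapping atoms."""
    return [c for c in conformers if not has_overlap(c, threshold)]
```

Each conformer is passed as a list of (x, y, z) tuples in Å, for instance as read from the XYZ files that MonteCarbo can write. A more careful filter would exclude bonded pairs and use element-specific radii.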
The code is free to download at https://github.com/drsalonsogil/montecarbo, and a README file is available with further information.

| MCdock: testing the substrate/inhibitor role of the monosaccharide in glycosidases
MCdock is another bash script that prepares the generated and/or and geometries are on pages S88-S100 of Figures S22-S26).

| Simple case: 2-hydroxy-tetrahydrofurane
After generating 500 conformers of 2-OH-C4H7O using MonteCarbo, we overlapped the resulting structures with PyMOL 2; the result is shown in Figure 8. This simple case makes it easy to see how the code chooses between different conformers of a 5-membered ring and between the different orientations of the hydrogen atom of the hydroxyl group.
Furthermore, due to the conversion from Z-matrix to Cartesian coordinates, the first carbon of the structure is always at the origin of coordinates. The second atom also stays at the same position, because its distance to the first atom does not change. The other ring members form a continuous rainbow due to the proximity between the structures over the conformational energy surface.
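Why the first two atoms coincide in every conformer follows from how a Z-matrix is turned into Cartesian coordinates. The Python sketch below assumes the common convention of placing the first atom at the origin, the second on the z axis, and the third in the xz plane; it is an illustration, not the routine used by Gaussian or Open Babel. The row layout and all function names are hypothetical.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u):
    length = math.sqrt(_dot(u, u))
    return (u[0] / length, u[1] / length, u[2] / length)

def _place(a, b, c, r, theta, phi):
    """New atom at distance r from c, angle (new, c, b) = theta,
    dihedral (new, c, b, a) = phi; angles in radians."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    local = (-r * math.cos(theta),
             r * math.sin(theta) * math.cos(phi),
             r * math.sin(theta) * math.sin(phi))
    return tuple(c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))

def zmatrix_to_cartesian(rows):
    """rows[i] = (bond_ref, r, angle_ref, angle_deg, dih_ref, dih_deg),
    truncated for the first three atoms; references are 0-based indices."""
    xyz = []
    for i, row in enumerate(rows):
        if i == 0:
            xyz.append((0.0, 0.0, 0.0))        # first atom: always the origin
        elif i == 1:
            xyz.append((0.0, 0.0, row[1]))     # second atom: on the z axis
        else:
            c, r, b, theta = row[0], row[1], row[2], math.radians(row[3])
            if i == 2:
                # Dummy reference off the z axis puts atom 3 in the xz plane.
                a_xyz, phi = (xyz[b][0] + 1.0, xyz[b][1], xyz[b][2]), 0.0
            else:
                a_xyz, phi = xyz[row[4]], math.radians(row[5])
            xyz.append(_place(a_xyz, xyz[b], xyz[c], r, theta, phi))
    return xyz

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.degrees(math.atan2(_dot(_cross(n1, n2), _unit(b2)), _dot(n1, n2)))
```

Because the first two placements depend only on the first bond length, every conformer generated from the same Z-matrix template shares those two positions, whatever dihedral angles are assigned afterward.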

| Test case: α-D-glucose
As in the previous simple case, 500 conformers of α-D-glucose were generated with MonteCarbo, and the resulting overlap is represented in Figure 9. Compared with the previous structure, we observe a hydroxyl group whose oxygen remains at the same position in the center of the image. Its hydrogen takes a continuum of positions due to the random assignment of the H-O-C-C dihedral angle between −180° and 180° by the pseudo-random number generator.
FIGURE 6 Workflow for the execution of the MonteCarbo script. The authors strongly recommend using an input.dat file to perform parallel jobs and increase the projects' efficiency (more details in the README file). Depending on the user's choice, MonteCarbo will generate $calc Gaussian Z-matrix input, $calc PDB, or $calc XYZ coordinate files ($calc is the number of conformers requested by the user)
FIGURE 7 Workflow for the execution of the MCdock script (more details in the README file). The outputs of this process are those produced by the multiple AutoDock Vina calculations 43

3.3 | Docking: a 7-membered ring mimics mannose
In the experiment with the septanoside, a ^{3,4}TC_{5,6} conformation is observed (PDB 5CGB), while the experiment with mannose showed a ^4C_1 conformation (PDB 4BUQ). 45 Furthermore, a computational analysis of the 1-hydroxymethyl-α-D-glycero-D-idoseptanoside shows the ^{3,4}TC_{5,6} conformation as the most stable one. 46 Also, after analyzing the 50 structures of the hydrolyzed α-D-glycero-D-idoseptanoside, the most stable conformation is the ^{3,4}TC_{5,6} (more details in Supporting Information). Then, we can conclude that the (GH38, GH76, GH92, and GH125, Figure S27).

| CONCLUSIONS
MonteCarbo is an easy-to-use, computation-friendly software able to model and dock multi-functionalized monosaccharides. Being an We have demonstrated the power of the provided codes in terms of quick-and-cheap structure generation and the relevance of the obtained results for testing new substrates and inhibitors for carbohydrate-active enzymes.
As a limitation, the program has no internal mechanism to determine whether a structure is physically reliable or to prevent chemical changes during the optimization process. A post-analysis is therefore required to identify and delete incorrect configurations. The scientific community strongly recommends performing docking calculations with optimized ligands. For cases in which the conformation of interest is not a minimum on the conformational potential energy surface, MonteCarbo offers the RIGID option, which keeps the conformation fixed during the optimization.