Raman spectroscopy and one‐dimensional convolutional neural network modeling as a real‐time monitoring tool for in vitro transaminase‐catalyzed synthesis of a pharmaceutically relevant amine precursor

Raman spectroscopy has been used to measure the concentration of a pharmaceutically relevant model amine intermediate for positive allosteric modulators of nicotinic acetylcholine receptor in an ω‐transaminase‐catalyzed conversion. A model based on a one‐dimensional convolutional neural network was developed to translate raw, data‐augmented Raman spectra directly into substrate concentrations, with which the conversion from ketone to amine by ω‐transaminase could be determined over time. The model showed very good predictive capabilities, with R² values higher than 0.99 for the spectra included in the modeling and 0.964 for an independent dataset. However, the model could not extrapolate outside the concentration range covered by the training data. The presented work shows the potential of Raman spectroscopy as a real‐time monitoring tool for biocatalytic reactions.


| INTRODUCTION
The need for consistent product quality and increased process efficiency in the pharmaceutical industry has led to the development of real-time monitoring tools to complement at-line or off-line process monitoring (depending on the placement of the equipment), for which high-performance liquid chromatography (HPLC) is commonly used. 1,2 With the implementation of real-time monitoring in a process, it is possible to set up a process control regime that informs the operators immediately if something goes wrong in production, making it possible either to correct errors or to discard out-of-specification batches more quickly. This could save time, effort, and materials compared with production in which off-line monitoring alone is used.
One of the main real-time monitoring tools used in industry is infrared (IR) spectroscopy, in which IR radiation is passed through a substance, and the light remaining after absorbance is measured. As a rule of thumb, molecules with polar functional groups are infrared-active and thus give strong bands in IR spectra. Because IR signals are strong, they have been used extensively for real-time monitoring in the pharmaceutical industry. 3 However, issues with IR spectroscopy arise when reactions in aqueous media are considered, as is the case for most biocatalytic systems. The absorbance by the water molecules, due to their polar nature, will often be the dominant contribution to the measured signal, thus limiting relevant information from other species in the reaction media and complicating the use of IR for such systems. 4 An alternative real-time monitoring tool is Raman spectroscopy, in which intra- and intermolecular vibrations are measured by photon excitation/light scattering using monochromatic (laser) light. 3 Unlike in IR, nonpolar functional groups result in strong Raman bands, which makes Raman spectroscopy particularly suitable for monitoring reactions in aqueous media, given that the hydroxyl bonds in water molecules are only weakly Raman-active.
The use of biocatalysis for the synthesis of active pharmaceutical ingredients (APIs) and related intermediates has increased dramatically in recent decades, [4][5][6][7][8] and the development of real-time process monitoring in aqueous media is critical for seamless scale-up and implementation of biocatalysis for industrial-scale pharmaceutical production. For this purpose, Raman spectroscopy could be an interesting option. Moreover, solid particles are not commonly found in enzyme-based catalysis, and thus light scattering due to solid particles or bubbles in the media should not be an issue, unlike in synthesis with whole cells. 9 If a batch configuration is used for production, the standard is a Raman immersion probe, which is inserted directly into the media. Alternatives include noncontact probes, although these are less common.
When reactions are run in industry for the synthesis of small molecules such as APIs and their intermediates, it is critical to control the conversion at dilute concentrations to avoid substrate and product inhibition, which is a common issue with most biocatalysts. However, due to the very weak nature of Raman signals, interpretation of the resulting spectra can be challenging. Moreover, considering the presence of multiple components in many enzyme-catalyzed reactions, such as substrate, product, cosubstrate, coproduct, cofactors, buffer, cosolvent, and enzymes, it can be difficult to translate reaction spectra to concentration profiles. Here, chemometrics and model development play a crucial role in qualifying and quantifying spectra to enable subsequent process control. 1 In the following, a brief overview of the use of chemometrics and convolutional neural networks (CNNs) for Raman spectroscopy, as well as the model reaction used in this article, will be presented, followed by the experimental results and the subsequent modeling.

| Convolutional neural network model development for real-time monitoring
The most common pipeline for interpretation of Raman spectra by chemometrics includes spectra preprocessing, such as smoothing, baseline correction, normalization, dimension reduction, and so forth, before a statistical model is used to extract relevant patterns from the data, the prevailing one being partial least squares (PLS). 10 However, with very weak Raman signals, such as those obtained with dilute substrate and product concentrations, such preprocessing methods can easily erase important information from the raw spectra, which is then not included in the final model, limiting accuracy. Additionally, the preprocessing pipeline differs from system to system, further complicating model development.
In recent years, there has been an explosive development in deep-learning methods, which can potentially be implemented for spectroscopic data analysis and complement (if not replace) PLS models. 11 One of these methods is the use of CNNs, which were originally proposed for image analysis. CNNs are a class of artificial neural networks (ANNs). The distinguishing feature of a CNN is the convolutional operation, in which a kernel, also termed a filter, is convolved across the input data and provides a response that is equivariant to translation. These sliding filters effectively reduce the time and size required for a CNN to reach the accuracy obtained with an ANN, and for images, CNNs have been shown to outperform ANNs. 12 More recently, CNNs have been adopted for one-dimensional data, such as in signal analysis (1D CNN). In particular, 1D CNNs have recently been applied to raw Raman spectral data without any preprocessing. 13,14 Mostly, the 1D CNN models developed in recent years have been used for classification problems. Here, Raman spectra of different pure compounds have been modeled to determine which compounds are present in an unknown sample. However, for real-time monitoring, the 1D CNN model would not be used to characterize pure compounds, but instead to quantify concentrations of specific compounds in a mixture, and is thus used as a regression model.
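The translation equivariance of a sliding filter mentioned above can be illustrated with a minimal NumPy sketch. The toy "spectrum" (a single Gaussian peak), the three-point kernel, and the helper name `conv1d_valid` are hypothetical choices made only for this illustration and are not taken from the model used in this work.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` across `signal` (cross-correlation, as done in CNN layers)."""
    n, k = len(signal), len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n - k + 1)])

# Hypothetical toy spectrum: one Gaussian peak on a flat baseline.
x = np.arange(200)
spectrum = np.exp(-0.5 * ((x - 80) / 3.0) ** 2)
shift = 15
shifted = np.roll(spectrum, shift)       # the same peak, moved by 15 points

kernel = np.array([-1.0, 2.0, -1.0])     # hypothetical peak-sensitive filter

response = conv1d_valid(spectrum, kernel)
response_shifted = conv1d_valid(shifted, kernel)

# Equivariance: shifting the input shifts the filter response by the same amount.
peak_pos = int(np.argmax(response))
peak_pos_shifted = int(np.argmax(response_shifted))
print(peak_pos_shifted - peak_pos)
```

Because the same kernel weights are reused at every position, the filter responds to a peak in the same way wherever the peak occurs, which is one reason far fewer parameters are needed than in a fully connected layer.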
The goal of this publication is therefore to investigate the use of Raman spectroscopy as a real-time monitoring tool for biocatalysis as applied to pharmaceutical production. For this purpose, a 1D CNN model was developed to predict the change in concentrations over time in a model enzyme-catalyzed reaction, based purely on raw Raman spectra from a small-scale batch synthesis.
As a model, the enzyme class ω-transaminase (also termed aminotransferase [EC 2.6.1.X], ATA) was chosen, which catalyzes the transfer of an amine group from an amine donor molecule to a ketone to synthesize an optically pure chiral amine. This reaction is very relevant in the pharmaceutical industry, given at least one chiral amine subunit
SCHEME 1 Model system used for real-time monitoring with Raman spectroscopy, consisting of conversion of 5-acetyl-2-methoxypyridine [1] to (S)-1-(6-methoxypyridin-3-yl)ethanamine [4] by means of a commercial ω-transaminase (ATA-251) with isopropylamine [2] (IPA) as the amine donor, acetone as coproduct [3], and pyridoxal-5-phosphate (PLP) as cofactor.
[17][18] The model system is shown in Scheme 1 and consists of the ω-transaminase-catalyzed synthesis of a model pyridine methylamine ((S)-1-(6-methoxypyridin-3-yl)ethanamine) of pharmaceutical relevance as an intermediate for positive allosteric modulators of nicotinic acetylcholine receptor, as used for various mental disorders. 19

| MATERIALS AND METHODS

| Real-time monitoring with Raman spectroscopy
A HyperFlux™ PRO Plus 785 Raman Spectrometer with a Hudson™ 785 Bioreactor Raman Immersion Probe (Tornado Spectral Systems, Toronto, ON, Canada) was used to collect Raman spectra for real-time monitoring. Raman spectra from 200 to 3300 cm⁻¹ were collected with 495 mW laser power and an exposure time of 500 ms every 0.5 or 1 min, depending on the experiment.

| ω-Transaminase-catalyzed batch conversion of 5-acetyl-2-methoxypyridine setup
All batch reactions were done in a sealed 100-mL jacketed batch reactor with overhead stirring (250 rpm) and a working volume of 80 mL (EasyMax 102, Mettler-Toledo GmbH, Switzerland), with the setup shown in Figure 1. A Raman probe was inserted through the top of the reactor for continuous spectra collection (C), along with a 1/16-inch tube attached to a syringe for off-line analysis of samples by HPLC (D). The top of the reactor and the window in the front of the reactor were both covered in aluminum foil (E) to limit the amount of light entering the reactor.
Triethanolamine (TEA) was dissolved in ultrapure water, and the pH was adjusted to the desired value with concentrated NaOH to give the buffer for the reaction. IPA hydrochloride and PLP monohydrate were mixed with the buffer, sodium hydroxide was added until the PLP dissolved, and the pH was adjusted to the desired value with concentrated NaOH. 5-Acetyl-2-methoxypyridine was dissolved in DMSO before being mixed with the IPA/PLP mixture (total of 75 mL) and heated to the desired temperature (30 or 45°C) in the batch reactor.
ATA-251 was mixed with pH-adjusted buffer (5 mL) and added through a hole in the top of the reactor, through which the Raman probe was inserted immediately afterward. The final concentrations were 0.1 M TEA buffer, 10 mM 5-acetyl-2-methoxypyridine, 5 vol% DMSO, 150 mM IPA, and 1 mM PLP.
The Raman spectra collection was started immediately after ATA addition, and the first HPLC sample was taken for off-line measurement.HPLC samples were taken every half-hour to every hour, and a Raman spectrum was automatically taken every 0.5-1 min.The reaction was stopped after 24 h.

| HPLC method for 5-acetyl-2-methoxypyridine detection
Immediately after sampling, the samples were diluted 1:1 with acetonitrile, which quenched the reaction, as suggested by the Codexis ATA screening protocol. Each sample was thoroughly mixed on a vortex mixer and subsequently centrifuged at 1200 rpm for 2 min. Finally, 0.1 mL of supernatant was added to 0.9 mL of HPLC grade water in HPLC vials.
FIGURE 1 Cross section of the reactor setup in a sealed 100-mL jacketed batch reactor with a working volume of 80 mL, with a temperature probe (A) and overhead stirring (B). A Raman probe was inserted through the top (C), along with a tube and syringe for sampling (D). The top of the reactor was covered in aluminum foil (E) to limit the amount of light entering the reactor.
An Agilent Technologies 1200 Series Gradient HPLC system with a DAD and a Waters SymmetryShield™ RP 3.5 μm 150 × 3.0 mm column was used to collect all HPLC data. HPLC grade water with 0.1% trifluoroacetic acid and HPLC grade acetonitrile were used as mobile phases A and B, respectively. An injection volume of 0.5 μL, a column temperature of 30°C, and a flow rate of 0.6 mL/min were used with the following gradient: 1 min with 5% B, 5%-30% B over 7 min, 30%-80% B over 2 min, 2 min with 80% B, 80%-5% B over 0.01 min, and 4 min with 5% B. 5-Acetyl-2-methoxypyridine was detected at 254 nm at 9.7 min.

| One-dimensional CNN and PLS modeling
The open-source library Keras (version 2.12.0), built on the TensorFlow library (version 2.12.0), was used to develop the one-dimensional convolutional neural network (1D CNN) in Python. The input for the model was the raw Raman spectra, consisting of data points in 1 cm⁻¹ intervals from 200 to 3300 cm⁻¹. The model consisted of six 1D CNN layers, followed by two fully connected layers.
The rectified linear unit (ReLU) activation function was used in all layers, and the Adam optimizer 20
To compare with a conventional modeling approach, a pipeline was set up consisting of Whittaker smoothing, 21 asymmetrically reweighted penalized least squares (arPLS) baseline correction, 22 standardization, and PLS regression. 23 In terms of implementation, the libraries chemotools (version 0.1.4) and scikit-learn (version 1.3.2) were used.
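As an illustration of the first step in the conventional pipeline, the following is a minimal NumPy sketch of a Whittaker smoother. It is not the chemotools implementation used in this work; the penalty weight `lam`, the difference order, and the synthetic test signal are assumptions made for the example, and the subsequent baseline correction, standardization, and PLS steps are not shown.

```python
import numpy as np

def whittaker_smooth(y, lam=100.0, order=2):
    """Whittaker smoother: minimise |y - z|^2 + lam * |D z|^2 over z."""
    n = len(y)
    # (n - order) x n finite-difference matrix penalising roughness
    D = np.diff(np.eye(n), n=order, axis=0)
    # Normal equations of the penalised least-squares problem
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Hypothetical noisy signal: a Gaussian band plus white noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 300)
clean = np.exp(-0.5 * ((x - 0.5) / 0.03) ** 2)
noisy = clean + rng.normal(0.0, 0.05, size=x.size)

smoothed = whittaker_smooth(noisy, lam=10.0)

def roughness(v):
    """Sum of squared second differences."""
    return float(np.sum(np.diff(v, n=2) ** 2))

print(roughness(noisy), roughness(smoothed))
```

The dense solve is adequate for a few hundred points; for full spectra with several thousand points, a sparse formulation would be the more usual choice. A larger `lam` gives a smoother result but also flattens weak bands, which is the risk of information loss noted for dilute systems.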

| RESULTS AND DISCUSSION
The ω-transaminase-catalyzed asymmetric synthesis used as a model system is a relatively difficult biocatalytic conversion for the development of real-time monitoring, given that one ketone and one amine are converted to another ketone and another amine, as shown in Scheme 1. As such, the same functional groups are preserved in the conversion, and the reaction can therefore be difficult to detect with any spectroscopic method. However, with the aqueous media used in the reaction, Raman spectroscopy was deemed favorable compared with IR.
In this case, the amine donor IPA was used in a 15-fold excess to push the unfavorable reaction equilibrium toward the products. As such, the Raman signal for the amine group of the product (S)-1-(6-methoxypyridin-3-yl)ethanamine was hidden by the amine donor signal and therefore could not be detected. Moreover, triethanolamine buffer was used as the aqueous medium, which is itself an amine and thus also contributed significantly to the overall saturation of the signal in the region where the amine signal was expected (3130-3480 cm⁻¹). Another buffer could have been used instead to limit the contribution from the buffer to the signal. However, given that it would in any case not be possible to detect the product with the high amine donor excess, triethanolamine was kept as the buffer. Moreover, given that DMSO was used at 5 vol% to dissolve the 5-acetyl-2-methoxypyridine completely, the much larger quantity of this cosolvent was far more prominent in the spectra than the dilute substrate and product. After in-depth studies of the Raman spectra compared with theoretical Raman band correlations, it was concluded that a very small decrease in the theoretical ketone band correlation (1600-
Instead, it was decided to develop the 1D CNN and PLS models based on the Raman spectra from three of the four runs, and from these to predict the substrate concentration profile of the fourth run. However, given that deep learning models in general are assumed to require a great amount of data, 13 the number of HPLC measurements taken was certainly insufficient for 1D CNN model development as well as for PLS.
Figure 3 shows the dataflow used to generate more substrate concentration data for the models, in which the HPLC data (I in Figure 3) was used to develop a kinetic model for each run (II in Figure 3), to determine the specific model coefficients (III in Figure 3), and subsequently to calculate the theoretical substrate concentration (substrate [mM]) at each Raman spectrum time point (IV in Figure 3).
process parameter setups, the reaction does not run to completion, despite the use of a 15-fold IPA excess. This is due to the unfavorable thermodynamics of this reaction, which is explained in detail in our previous article. 26 For the purpose of testing Raman spectroscopy as a real-time monitoring tool for ω-transaminase-catalyzed reactions, reaction completion is not required.
From Figure 4, it is noted that the kinetic model presented in

| Outlier detection
Before 1D CNN and PLS modeling of the Raman spectra, a residual analysis of the collected Raman spectra was done to ensure that the models were not trained on undesired variations. For the sake of simplicity, the mean value over all Raman shifts of each spectrum was calculated and compared with the overall mean of all spectrum mean values. Spectra that had either significantly higher or lower means than the overall mean value were investigated in more detail, after which extreme outliers with an absolute Z-score above 5 were removed from the dataset. Three examples of outlier spectra are depicted in the middle column (red) of
After outlier detection and removal, the spectral datasets were deemed fit for model development.
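The screening step described in this section can be sketched in a few lines of NumPy. The function name, the use of the population standard deviation, and the synthetic spectra are assumptions for illustration; only the criterion itself (the absolute Z-score of the per-spectrum mean above 5) is taken from the text.

```python
import numpy as np

def flag_outlier_spectra(spectra, z_threshold=5.0):
    """Return a boolean mask marking spectra whose mean intensity is an extreme outlier.

    spectra: array of shape (n_spectra, n_raman_shifts).
    """
    spectrum_means = spectra.mean(axis=1)     # one mean value per spectrum
    centre = spectrum_means.mean()            # overall mean of the spectrum means
    spread = spectrum_means.std()             # population std (assumption)
    z_scores = (spectrum_means - centre) / spread
    return np.abs(z_scores) > z_threshold

# Synthetic stand-in data (hypothetical): 200 similar spectra, one with a large offset.
rng = np.random.default_rng(0)
spectra = 1000.0 + rng.normal(0.0, 1.0, size=(200, 300))
spectra[50] += 500.0

outlier_mask = flag_outlier_spectra(spectra)
clean_spectra = spectra[~outlier_mask]
print(np.flatnonzero(outlier_mask))
```

Note that a single extreme spectrum inflates the standard deviation it is compared against, so a threshold this high can only be exceeded when the dataset contains many spectra; with one outlier among N spectra, the largest attainable Z-score is roughly the square root of N.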

| Model development
As described previously, three models based on raw, data-augmented raw, and preprocessed Raman spectra were developed, both with a 1D CNN model structure and with PLS, resulting in six models in total for comparison.
For the raw and the augmented spectra, the training data was standardized prior to modeling to reduce the computational
Data augmentation as used here is not novel, given that the need for large amounts of data for satisfactory deep learning models such as 1D CNNs is common knowledge. In this case, the data augmentation is used to shift the baseline synthetically, as shown in Figures 6 and S1, as adapted from the publication by Lebrun and coworkers. 27 This was done since it was noted that the baseline shifted slightly from run to run in the four runs, which could be expected due to dif-
The overall workflow for the development of the six models can be seen in Figure 6, in which runs A-C were used for training purposes with an 80%-20% calibration-validation split. Run D was subsequently used for independent testing. The results are shown in Figure 7 for the 1D CNN models and in Figure 8 for PLS, with an overview of the error metrics root mean square error (RMSE) and R² in Table 2.
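The calibration-validation split and the two error metrics can be written out as a short NumPy sketch. The function names, the random shuffling with a fixed seed, and the interpretation of R² as the coefficient of determination are assumptions for this example; the source does not state how the split was drawn or which R² definition was used.

```python
import numpy as np

def calibration_validation_split(n_samples, validation_fraction=0.2, seed=0):
    """Return shuffled index arrays for an 80%-20% style split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_val = int(round(validation_fraction * n_samples))
    return idx[n_val:], idx[:n_val]           # calibration indices, validation indices

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R²)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Small hypothetical example.
train_idx, val_idx = calibration_validation_split(100)
y_ref = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 4.0])
example_rmse = rmse(y_ref, y_hat)
example_r2 = r2(y_ref, y_hat)
print(len(train_idx), len(val_idx), example_rmse, example_r2)
```

With this definition, R² on an independent run can be much lower than on the training runs even when the RMSE changes only moderately, because it is scaled by the variance of the reference concentrations in that run.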
Considering the results from 1D CNN modeling in Figure 7, it is clear that the choice between raw (blue), data-augmented raw (red), and preprocessed spectra (purple) does not change the accuracy of predictions for the spectra included in the training (A-C), all with an R² higher than 0.99. However, when the developed models were used on the independent dataset (D), the accuracy of prediction differed. Here, the use of raw spectra (blue) was the least accurate of the three, with an RMSE and R² of 0.078 and 0.624, respectively. The preprocessing pipeline described improved the RMSE and R² to 0.068 and 0.829, respectively, as expected.
However, the use of data augmentation for the raw spectra showed the best accuracy overall, with 0.023 and 0.968 for RMSE and R², respectively.
the layers that are trained in the 1D CNN effectively accomplish what is required from a properly calibrated preprocessing pipeline while also performing the final regression task, that is, a holistic approach that gives the user the desired results. Note that this is achieved without the need to search through preprocessing techniques and sequences, for which there appears to be no readily available infrastructure. Thus, a 1D CNN modeling approach could potentially outperform the standard PLS approach, as only simple data
If extrapolation of the concentration is required, such as with higher starting concentrations of substrate or full conversion of substrate to product, it was assumed that the model would not achieve good predictions, given that this is a common issue for neural network models.
This scenario is shown in Figure 9, in which the equilibrium was shifted toward conversion through evaporation of the acetone, either through an improperly sealed lid (E) or by constant nitrogen sweeping (F). 15 It is noted that there are only Raman data for F up until approximately 8 h. This was due to the high evaporation rate of the reaction media itself, which caused the liquid level to drop below the immersion probe.
However, despite the model being unable to extrapolate, it is noted that the predictions of the two new runs in Figure 9
and mean square error (MSE) were applied for model training purposes. Model training was done on 80% of the data, while model validation was done on the remaining 20%, with 200 epochs for each model developed. The detailed 1D CNN architecture used can be seen in Table 1.
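A network of the general shape described in the methods (six 1D convolutional layers followed by two fully connected layers, ReLU activations, the Adam optimizer, and an MSE loss) could be sketched in Keras as follows. The filter counts, kernel size, pooling layers, hidden-layer width, and the linear output unit are hypothetical placeholders, since the actual values belong to the architecture table and are not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_POINTS = 3101  # 200 to 3300 cm^-1 in 1 cm^-1 steps

def build_1d_cnn(n_points=N_POINTS):
    """Build a 1D CNN regression model (illustrative hyperparameters only)."""
    model = keras.Sequential(name="raman_1d_cnn_sketch")
    model.add(keras.Input(shape=(n_points, 1)))
    # Six convolutional layers; filter counts, kernel size and pooling are hypothetical.
    for n_filters in (8, 8, 16, 16, 32, 32):
        model.add(layers.Conv1D(n_filters, kernel_size=5, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2))
    model.add(layers.Flatten())
    # Two fully connected layers; the linear output unit is an assumption so that
    # standardized (possibly negative) concentrations can be predicted.
    model.add(layers.Dense(32, activation="relu"))
    model.add(layers.Dense(1))
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
    return model

model = build_1d_cnn()
model.summary()
```

Training would then follow the usual pattern, for example `model.fit(x_cal[..., None], y_cal, validation_data=(x_val[..., None], y_val), epochs=200)`, where `x_cal`, `y_cal`, `x_val`, and `y_val` are hypothetical names for the standardized calibration and validation arrays.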

Figure 2. Consequently, only the consumption of 5-acetyl-2-methoxypyridine was modeled. However, considering that this was
ω-Transaminase is known to undergo two half-reactions and can be described by the nonsequential Ping-Pong Bi-Bi mechanism, as shown in the top mechanism in Scheme 2. Here, the amine donor (A in Scheme 2) is bound to the enzyme (E in Scheme 2) and converted to the coproduct (B in Scheme 2) in the first step. The coproduct is then released, activating the enzyme (E* in Scheme 2), before the ketone substrate (S in Scheme 2) is bound and converted to the amine product (P in Scheme 2) in a second step. 25 However, given that the amine donor was used in excess, the enzyme was assumed to be saturated with the amine donor, and the reaction mechanism could be simplified to include only the conversion of the substrate to the product. Given that ω-transaminase reactions are reversible, the bottom mechanism in Scheme 2, along with the ordinary differential equations (ODEs) in Figure 3 (II), was used as a kinetic model. An overview of the dataflow from HPLC concentrations to predicted concentrations for each Raman time point is illustrated in Figure 3, with model fitness depicted as parity plots in Figure 4. Based on Figure 3, it is clear that the ω-transaminase reaction is faster at pH 9 and 45°C compared with pH 7.5 and 30°C, but the outcome after 24 h of reaction is comparable. It is noted that with both
FIGURE 5 Three examples of outlier spectra (middle column, red) found after residual analysis based on the overall spectrum mean value. For comparison, the spectra collected immediately before (left column, blue) and after (right column, green) the outlier spectra are shown.

Figure 3 (II) shows good predictability; however, the R² is lower for runs C and D, given that these contained outliers, as seen in Figure 3. The outliers were excluded from the model development, given that these would bias the resulting model and thus the subsequent 1D CNN and PLS model development. The resulting model for each run was subsequently used to calculate the concentration at each time point for which a Raman spectrum was acquired, which could then be used in the subsequent 1D CNN and PLS modeling.
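The idea of fitting a kinetic model to sparse HPLC data and then evaluating it at every Raman time point can be sketched with SciPy. The sketch assumes the simplest reversible first-order form, dS/dt = -k_f·S + k_r·(S0 - S), which may differ from the ODEs actually used in Figure 3 (II); the rate constants, sampling times, and variable names are hypothetical, and the "HPLC" values are generated synthetically.

```python
import numpy as np
from scipy.optimize import curve_fit

def substrate_conc(t, k_f, k_r, s0):
    """Analytical solution of dS/dt = -k_f*S + k_r*(s0 - S) with S(0) = s0."""
    k_sum = k_f + k_r
    s_eq = k_r * s0 / k_sum                   # equilibrium substrate concentration
    return s_eq + (s0 - s_eq) * np.exp(-k_sum * t)

# Hypothetical HPLC samples (time in h, concentration in mM), generated from
# known constants so that the fit can be checked.
s0 = 10.0
t_hplc = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 24.0])
s_hplc = substrate_conc(t_hplc, 0.2, 0.3, s0)

# Fit the forward and reverse rate constants (h^-1) to the HPLC data.
(k_f, k_r), _ = curve_fit(
    lambda t, kf, kr: substrate_conc(t, kf, kr, s0),
    t_hplc, s_hplc, p0=[0.1, 0.1], bounds=(1e-6, np.inf),
)

# Evaluate the fitted model at every Raman time point (one spectrum per minute).
t_raman = np.linspace(0.0, 24.0, 1441)
s_raman = substrate_conc(t_raman, k_f, k_r, s0)
print(k_f, k_r, s_raman[-1])
```

In this way, a handful of off-line measurements yields one reference concentration per spectrum, at the cost of any kinetic model error being carried into the reference values used for training.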

Figure 5. For comparison, the spectra before and after the outlier are shown in the left-hand (blue) and right-hand (green) columns, respectively.

FIGURE 6 Dataflow for 1D CNN and PLS model development, with training data based on runs A-C from the kinetic model in Figure 4. The top represents the modeling done on raw spectra, the middle the use of preprocessing of the raw spectra before modeling, and the bottom the use of data augmentation of the raw spectra. The modeling was done either by 1D CNN or PLS. 1D, one-dimensional; CNN, convolutional neural network; PLS, partial least squares.
complexity during modeling. This was done by subtracting the mean and dividing by the standard deviation of the training dataset, both for the Raman spectra (at each Raman shift) and for the substrate concentration, as is commonly done and sometimes referred to as autoscaling or standardization. Subsequently, when predicting concentrations with the developed model, the new Raman spectra were standardized with the mean and standard deviation of the training data prior to prediction, while the predicted concentrations were likewise converted back with the mean and standard deviation of the substrate concentrations in the training data. Here, we opted to keep the term raw spectra despite the standardization described, as only the intensity values have been rescaled, with no changes to the spectra themselves. For the preprocessing pipeline, smoothing, baseline removal, and standard normal variate scaling were used on both training and test data.
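The standardization described above, fitted on the training data only and then reused for new spectra and for back-transforming predictions, can be sketched as follows. The class name `AutoScaler` and the synthetic arrays are hypothetical; scikit-learn's `StandardScaler` offers equivalent functionality.

```python
import numpy as np

class AutoScaler:
    """Column-wise standardization with statistics taken from the training data."""

    def fit(self, data):
        self.mean_ = data.mean(axis=0)
        self.std_ = data.std(axis=0)
        return self

    def transform(self, data):
        return (data - self.mean_) / self.std_

    def inverse_transform(self, scaled):
        return scaled * self.std_ + self.mean_

# Hypothetical stand-in data: 30 training spectra, 5 new spectra, 100 Raman shifts.
rng = np.random.default_rng(1)
x_train = rng.normal(500.0, 20.0, size=(30, 100))
y_train = rng.uniform(6.0, 10.0, size=30)     # substrate concentration (mM)
x_new = rng.normal(500.0, 20.0, size=(5, 100))

x_scaler = AutoScaler().fit(x_train)          # per Raman shift
y_scaler = AutoScaler().fit(y_train)          # substrate concentration

x_train_std = x_scaler.transform(x_train)
x_new_std = x_scaler.transform(x_new)         # new data uses the training statistics
y_train_std = y_scaler.transform(y_train)
y_back = y_scaler.inverse_transform(y_train_std)  # back to mM
```

Because the new spectra are scaled with the training statistics, a baseline offset that lies outside the range seen during training translates directly into shifted standardized inputs.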
ferences in fluorescence or light entering the reactor. This could potentially be detrimental for model transferability, given that the predictions were standardized based on the mean and standard deviation of the training data. If the absolute values of the baseline of the new dataset are not in the range of the training data, the standardization will skew the spectra and give unsatisfactory predictions. Commonly, this is solved by preprocessing, in which a baseline removal step is included. However, for very small changes, the typical preprocessing steps could potentially erase the change, making them unsuitable.
FIGURE 7 Parity plots of all runs with the measured concentrations from the kinetic model in Figure 4 versus the concentrations predicted by the 1D CNN model, with raw (blue), data-augmented raw (red), and preprocessed Raman spectra (purple), including runs A-C as training data and run D as independent test. The predictions (dots), linear fits (solid lines), and R² are shown, together with a 1:1 diagonal (dotted line). 1D, one-dimensional; CNN, convolutional neural network.
As an alternative, data augmentation with artificial baseline changes was tested, to force the model to search for patterns in the signal itself instead of the absolute signal value. The assumption was that the height of the Raman peaks would not change significantly despite the baseline shift, which should be valid until the point where the signal is overloaded by fluorescence. This generated data with more varied baselines, with the aim of making the model more robust and more transferable.
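A baseline-shift augmentation of this kind could look like the following NumPy sketch. The specific form of the artificial baseline (a random constant offset plus a random linear slope), the parameter names, and their magnitudes are assumptions for illustration and are not the settings adapted from Lebrun and coworkers.

```python
import numpy as np

def augment_baseline(spectra, targets, n_copies=5, max_offset=0.05,
                     max_slope=0.05, seed=0):
    """Append copies of each spectrum with a random offset and linear baseline tilt.

    Peak heights relative to the local baseline are left unchanged, and each copy
    keeps the concentration label of the spectrum it was derived from.
    """
    rng = np.random.default_rng(seed)
    n_spectra, n_points = spectra.shape
    ramp = np.linspace(-0.5, 0.5, n_points)        # unit-width linear ramp
    scale = spectra.max() - spectra.min()          # intensity range of the dataset
    aug_x, aug_y = [spectra], [targets]
    for _ in range(n_copies):
        offset = rng.uniform(-max_offset, max_offset, size=(n_spectra, 1)) * scale
        slope = rng.uniform(-max_slope, max_slope, size=(n_spectra, 1)) * scale
        aug_x.append(spectra + offset + slope * ramp)
        aug_y.append(targets)
    return np.vstack(aug_x), np.concatenate(aug_y)

# Hypothetical stand-in data: 4 spectra with 50 points each.
rng = np.random.default_rng(2)
spectra = rng.random((4, 50))
targets = np.array([10.0, 9.0, 8.0, 7.0])          # substrate concentration (mM)

x_aug, y_aug = augment_baseline(spectra, targets)
print(x_aug.shape, y_aug.shape)
```

Since only a smooth baseline term is added, the difference between an augmented copy and its original contains no peak-like structure, which is what pushes the model toward the band shapes rather than the absolute intensity level.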

Figure 8 shows a conventional PLS model implemented for comparison's sake, based on raw spectra (blue), data-augmented raw spectra (red), and preprocessed spectra (purple). The PLS model built on the raw data outperformed the PLS model built with the preprocessing pipeline, which demonstrates an important point: preprocessing can easily degrade the quality of the signal and needs proper calibration, which is time consuming. In comparison,

FIGURE 9 Prediction of 5-acetyl-2-methoxypyridine concentration (mM) with the 1D CNN model based on raw, data-augmented Raman spectra, in which extrapolation is required for two batch runs with slight evaporation (E) and nitrogen sweeping (F). The measured values from HPLC are illustrated as blue dots with error bars and the predicted values as red dots. 1D, one-dimensional; CNN, convolutional neural network.
development methods described here to build a new 1D CNN model, which could then be used for real-time monitoring of transaminase-catalyzed amine synthesis with Raman spectroscopy of subsequent runs in a similar setup.

4 | CONCLUSION
A 1D CNN has been developed for real-time monitoring of the ω-transaminase-catalyzed amine synthesis of a pharmaceutically relevant model precursor by means of Raman spectroscopy. The best performing model was based on data-augmented raw Raman spectra, with an R² of more than 0.99 for data included in the model development and 0.964 for an independent dataset. The presented work shows the possibilities of Raman spectroscopy for biocatalytic conversion in aqueous media, with the potential for model transfer or for prediction of unknown spectral datasets in similar experimental setups. However, the 1D CNN model developed did not perform well in terms of data extrapolation, and as such, additional work would need to be done on the model if higher or lower concentration predictions are required in further work.
TABLE 1 Detailed 1D CNN architecture used.
within the minimum and maximum of the model (~6-12 mM) are very accurate, despite these being completely new spectral datasets. This highlights the transferability of the 1D CNN model developed for this setup within the concentration bounds of the training data. Thus, assuming similar runs can be done with full conversion from a desired starting substrate concentration, it should be possible to use the model
TABLE 2 Overview of RMSE and R² values for each of the six models developed by 1D CNN and PLS, as shown in Figures 7 and 8.
Abbreviations: 1D, one-dimensional; CNN, convolutional neural network; PLS, partial least squares; RMSE, root mean square error.