Development of a rapid polarized total synchronous fluorescence spectroscopy (pTSFS) method for protein quantification in a model bioreactor broth

Protein quantification during bioprocess monitoring is essential for biopharmaceutical manufacturing and is complicated by the complex chemical composition of the bioreactor broth. Here we present the early‐stage development and optimization of a polarized total synchronous fluorescence spectroscopy (pTSFS) method for protein quantification in a hydrolysate‐protein model (mimics clarified bioreactor broth samples) using a standard benchtop laboratory fluorometer. We used UV transmitting polarizers to provide wider range pTSFS spectra for screening of the four different TSFS spectra generated by the measurement: parallel (||), perpendicular (⊥), unpolarized (T) intensity spectra and anisotropy maps. TSFS|| (parallel polarized) measurements were the best for protein quantification compared to standard unpolarized measurements and the Bradford assay. This was because TSFS|| spectra had a better analyte signal to noise ratio (SNR), due to the anisotropy of protein emission. This meant that protein signals were better resolved from the background emission of small molecule fluorophores in the cell culture media. SNR of >5000 was achieved for concentrations of bovine serum albumin/yeastolate 1.2/10 g L–1 with TSFS||. Optimization using genetic algorithm and interval partial least squares based variable selection enabled reduction of spectral resolution and number of excitation wavelengths required without degrading performance. This enables fast (<3.5 min) online/at‐line measurements, and the method had an LOD of 0.18 g L–1 and high accuracy with a predictive error of <9%.


| INTRODUCTION
Bioprocess monitoring is important in ensuring high productivity manufacturing of therapeutic proteins. Accurate protein quantification during industrial cell culture is considered one of the most important critical process parameters to be measured for upstream process control (Rathore & Winkle, 2009). This calls for fast, accurate, and robust analytical methods capable of real-time monitoring of biologic protein production in accordance with the process analytical technology (PAT) and quality by design (QbD) initiatives. During cell culture processes, protein product concentration increases from very low to relatively high values as the process progresses. The matrix in which the protein is produced is also chemically and physically complex, and it evolves continually. Cell culture media (Gronemeyer et al., 2014) contain amino acids, sugars, salts, and other small to medium-sized molecules, which change as the cells metabolize these nutrients. Furthermore, the presence of host cell proteins, whole cells, and cell debris produces a solution with continually varying physical properties. All these factors can severely limit the application of many conventional methods for protein quantification.
There are a variety of methods which can measure the total protein content, or quantify a single protein type, or determine multiple proteins simultaneously (Chutipongtanate et al., 2012).
Total protein quantitation usually relies on traditional colorimetric methods such as the Bradford, Lowry, and bicinchoninic acid (BCA) assays (Bradford, 1976; Smith et al., 1985), but these are more suitable for low protein concentrations (10-2000 μg mL⁻¹). Detection of specific proteins within complex mixtures is usually performed by enzyme-linked immunosorbent assay and western blot analysis; however, these usually have low accuracy and precision (Walker, 1996). More accurate methods include mass spectrometry based techniques (Reusch et al., 2015); however, these require extensive sample preparation, which makes them difficult to integrate into manufacturing for the online or in-line measurements required for rapid, real-time analysis. Spectroscopic techniques, like ultraviolet-visible (UV-vis) absorbance (Classen et al., 2017), near infrared (NIR) and mid-infrared (MIR) absorption (Cervera et al., 2009; Hakemeyer et al., 2013; Jose et al., 2011), and Raman scattering (B. Li et al., 2013), are noninvasive, nondestructive, fast methods capable of on-line protein content monitoring. UV-vis absorbance spectroscopy is fast and inexpensive, but it usually lacks the required sensitivity and specificity. MIR and NIR are more informative, but very sensitive to the presence of polar compounds and water (Ryder, 2018). Water contamination is less of an issue for Raman spectroscopy, which facilitates its use for bioprocess monitoring in solution (Buckley & Ryder, 2017). However, the sensitivity of the method is limited by the weak protein spectra from bioprocess broths (B. Li et al., 2013), background scatter, and fluorescence interference (Buckley & Ryder, 2017).
Fluorescence spectroscopy has several advantages for protein analysis in bioprocess monitoring: proteins have intrinsic fluorescence that can be discriminated from other fluorophores, and the technique is nondestructive, fast, relatively inexpensive, can require no sample handling, and can be applied either in-line or at-line (Groza et al., 2014; Teixeira et al., 2011). These characteristics meet the criteria for the development of a PAT (Read et al., 2010) in line with QbD principles for biopharmaceutical manufacturing (Rathore & Winkle, 2009). However, using single excitation wavelength fluorescence spectra limits the ability to resolve analyte fluorophores from strong background signals, as is the case with bioreactor monitoring. Multidimensional fluorescence (MDF) measurements generate more unique fingerprints of protein emission, which comprises the overlapped spectra of intrinsically fluorescent amino acids, mostly tryptophan (Trp) and tyrosine (Tyr). Phenylalanine is also fluorescent, but it has a very low quantum yield (0.02) and, if incorporated in proteins, tends to undergo Förster resonance energy transfer (FRET) to Trp and Tyr (Lakowicz, 2006); it is thus rarely observed in proteins or complex mixtures. There are two complications with bioprocess monitoring by fluorescence spectroscopy which also need to be taken into consideration. First, the population of small molecule fluorophores from the media and fluorescent metabolites produced will change very significantly during the process, contributing to a varying background interference signal. Second, the presence of fluorescent host cell proteins will also contribute to the fluorescence emission in the same regions as the product protein. Despite this, we have previously shown that excitation-emission matrix (EEM) measurements can be used to quantitatively model process performance in the presence of all these interfering signals.
MDF measurements can be made as an EEM or as a total synchronous fluorescence scan (TSFS) with most scanning-based spectrometers. One advantage of TSFS is that it avoids the collection of Rayleigh scatter observed in EEM spectra, which can assist in ensuring more accurate anisotropy measurements because scattered light is highly polarized. With polarized fluorescence spectroscopy, one can exploit the intrinsic protein emission anisotropy to achieve better resolution by discriminating the protein emission from that of the small molecule fluorophores in culture media or bioreactors. The anisotropy of protein emission arises from the slower rate of rotational diffusion, which is largely due to the much larger size of proteins (Lakowicz, 2006). However, multiple other factors such as the physicochemical environment, the presence of quenchers, and FRET also affect anisotropy, making this a multivariate analysis problem. Using anisotropy measurements, we previously demonstrated the ability to accurately quantify protein content (0.1-4 g L⁻¹) in solutions with fixed media concentration (Groza et al., 2014). However, full spectrum anisotropy-based measurements are slower and noisier than simple intensity-based measurements, making them a less suitable PAT option for dynamic and complex bioprocesses (Ryan et al., 2010). Some of the advantages and disadvantages of multidimensional fluorescence spectroscopy are summarized in Table S1. MDF measurements and chemometrics have been widely applied to bioprocess monitoring, such as the production of monoclonal antibodies (Ohadi et al., 2015; Schwab et al., 2016) and antigens (Zavatti et al., 2016), and the quality control of cell culture media (B. Li et al., 2011; Ryan et al., 2010). Still, MDF measurements generate large amounts of spectral data that require careful interpretation and data reduction while retaining the essential information, which is generally accomplished using chemometric multivariate analysis tools.
Curve resolution methods like PARAllel FACtor analysis (PARAFAC; Elcoroaristizabal et al., 2015; Murphy et al., 2013) or multivariate curve resolution-alternating least squares (MCR-ALS; Casamayou-Boucau & Ryder, 2018; de Juan et al., 2014) can sometimes resolve the spectrum of the analyte of interest from the background or interferents, but they are generally subject to significant matrix effects in complex, biogenic solutions.
Multivariate regression methods like unfolded (U-PLS) or N-way partial least squares (N-PLS) usually provide better predictive performance, but they have problems where the relationship between signal and concentration is very nonlinear, as in bioreactors (Olivieri, 2018). This nonlinearity is caused by multiple factors including high spectral overlap, primary and secondary inner filter effects (IFE), FRET, and noise (Ghisaidoobe & Chung, 2014). Nonlinear multivariate regression methods like artificial neural networks (ANN) may be more suitable (Chiappini et al., 2020; Melcher et al., 2015), but these are more complex, do not provide spectral information, and are more prone to overfitting when only a limited number of samples or a limited concentration range is available (Despagne et al., 2000). MDF spectra also generally contain many uninformative nonfluorescent spectral variables and highly correlated spectral variables (where there is emission) that should be removed, as these can decrease robustness and increase model error. There are many variable selection methods described in the literature, of which three of the most common are variable importance in projection (VIP; Svante Wold et al., 2002), interval partial least squares (iPLS; Nørgaard et al., 2000), and Genetic Algorithm (GA; Leardi et al., 2002).
optimization. This test system was designed to provide foundation level data about the efficacy of this measurement method, before progressing to the analysis of samples from real industrial bioprocesses. The long-term goal of this study is to produce a rapid, accurate, and robust quantification methodology for eventual on- or at-line protein quantification suitable for industrial use.

| Materials and samples
Yeastolate ultra-filtered (UF, 10 kDa molecular weight cut-off) was obtained from Becton Dickinson and used as received. BSA (>99%, lyophilized powder, globulin free), Bradford reagent (product number B6916), and phosphate buffered saline (PBS) tablets were from Sigma-Aldrich (Merck). HPLC grade water (Honeywell) was used to prepare PBS stock solutions. All materials and reagents were used without further purification. Yeastolate (50 g L⁻¹) and BSA (5 g L⁻¹) stock solutions were prepared in PBS buffer (pH 7.4 ± 0.01) and then membrane filtered (0.22 μm). Three sample sets were prepared: a calibration set (C-Set), a test set (T-Set), and an instrumental optimization set (IO-Set). The C-Set comprised 15 different mixtures, prepared in triplicate (n = 45), and was based on a D-optimal factorial design with yeastolate concentrations of 9.0-11.0 g L⁻¹ (n = 3, Δc = 1 g L⁻¹) and BSA concentrations of 0.4-1.2 g L⁻¹ (n = 5, Δc = 0.2 g L⁻¹). Three yeastolate concentration levels were selected to simulate the changing fluorescence emission as the cell culture evolves during a bioprocess. This BSA concentration range was selected as being representative of titer values for monoclonal antibody production (Shukla et al., 2017). Difference plots of the normalized yeastolate spectra at these concentrations show changes in spectral shape at long excitation wavelengths (320-380 nm) due to IFE. These changes in fluorescence intensity and spectral shape (Figure S1, SI) are characteristic of the spectral changes experienced during mammalian cell culture (B. Li et al., 2014). The external validation set (T-Set) comprised 6 mixtures, prepared in triplicate (n = 18), with BSA concentrations ranging from 0.5 to 0.9 g L⁻¹ (n = 3, Δc = 0.2 g L⁻¹) and yeastolate at 10 and 11 g L⁻¹ (n = 2, Δc = 1 g L⁻¹). The simple IO-Set used a fixed 10 g L⁻¹ yeastolate concentration with low (0.4 g L⁻¹) and high (1.2 g L⁻¹) BSA concentrations.
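The sample-set sizes quoted above can be checked with a short enumeration. This is only a sketch: it assumes that the 15 C-Set mixtures are the full 3 × 5 crossing of the stated yeastolate and BSA levels (which the stated counts are consistent with), and the function and key names are hypothetical.

```python
from itertools import product

def level_range(start, step, n):
    """Return n evenly spaced concentration levels, rounded to 0.1 g/L."""
    return [round(start + step * i, 1) for i in range(n)]

def build_set(yeastolate_levels, bsa_levels, replicates=3):
    """Cross every yeastolate level with every BSA level, with replicates.

    Assumption: a full crossing of the levels; the paper describes the
    design as D-optimal factorial without listing the mixtures here.
    """
    return [
        {"yeastolate_g_per_L": y, "bsa_g_per_L": b, "replicate": r}
        for y, b in product(yeastolate_levels, bsa_levels)
        for r in range(1, replicates + 1)
    ]

# C-Set: yeastolate 9-11 g/L (3 levels), BSA 0.4-1.2 g/L (5 levels)
c_set = build_set(level_range(9.0, 1.0, 3), level_range(0.4, 0.2, 5))
# T-Set: yeastolate 10 and 11 g/L, BSA 0.5-0.9 g/L (3 levels)
t_set = build_set(level_range(10.0, 1.0, 2), level_range(0.5, 0.2, 3))

print(len(c_set), len(t_set))  # 45 18
```

Note that the T-Set BSA levels (0.5, 0.7, 0.9 g L⁻¹) fall between the calibration levels, so the test samples probe interpolation within the calibrated range.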
Aliquots of each sample were dispensed into 2 ml Lobind tubes (Eppendorf) and stored at −70°C before use (over 3 weeks) to limit compositional changes.
Before analysis, samples were defrosted overnight at 2-8°C and checked for ice particles before use. This procedure was used to minimize preparation variation and ensure that all samples had the same freeze-thaw profile. Aseptic solution preparation was carried out in a laminar flow hood to minimize contamination. The Bradford assay was selected as a simple orthogonal method (Bradford, 1976) to validate the performance of the fluorescence spectroscopy based predictive models and experimental details are provided in the Supporting Information (SI).

| Instrumentation and data collection
Absorbance spectra were measured along the short cuvette axis using an Agilent Cary 60 UV-Vis (Agilent Technologies) spectrophotometer with corresponding PBS stock solutions as reference.
TSFS spectra were collected using a modified Cary Eclipse fluorometer (Agilent Technologies) fitted with polarizers and temperature control. Instrument settings were similar to those reported elsewhere (Steiner-Browne et al., 2019) and details are given in the SI.

| Data analysis
Quantification was done using partial least squares (PLS) regression (S. Wold et al., 2001) in the unfolded (U-PLS; Geladi, 2002) configuration. Different pre-processing methods were evaluated, including normalization and mean-centering. The optimum number of latent variables (LVs) was estimated during leave-one-out cross-validation by comparing the root-mean-square error of calibration (RMSEC) and of cross-validation (RMSECV; Olivieri, 2018) as a function of the number of LVs, and by the F-ratio criterion (Haaland & Thomas, 1988).
The Durbin-Watson test (Olivieri, 2018) was used to check the models for nonlinearity. For intensity-based methods, spectral normalization (to maximum intensity) was necessary to ensure PLS model linearity (Table S6, SI); consequently, all models were built using blank subtracted, normalized data. No spectral pretreatment was required or implemented for aniso-TSFS data.
Model performance was evaluated using the external validation test set (T-Set) based on determination coefficients (R²), root-mean-square error of prediction (RMSEP), and relative (to the mean value) error of prediction (REP). RMSEP gives the average error of the predicted BSA concentration in g L⁻¹ (Naes et al., 2002). Accuracy and precision of the different methods were compared using the elliptical joint confidence region (EJCR) test (Mandel & Linning, 1957). The EJCR was calculated to evaluate the slope and the intercept of the regression of the reference and predicted values at a 95% confidence level. If the ideal point (slope = 1, intercept = 0) lies inside the EJCR, it can be concluded that constant and proportional biases are absent. Significant differences between the predictive accuracy of the methods were also tested using a randomization test (van der Voet, 1994).
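As an illustration of the figures of merit named above, the following sketch computes RMSEP, REP, and a determination coefficient from paired reference and predicted concentrations. The function names and the example values are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption; the paper does not state which definition was used.

```python
import math

def rmsep(reference, predicted):
    """Root-mean-square error of prediction, in the units of the inputs."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def rep_percent(reference, predicted):
    """Relative error of prediction: RMSEP as a percentage of the mean reference."""
    mean_ref = sum(reference) / len(reference)
    return 100.0 * rmsep(reference, predicted) / mean_ref

def r_squared(reference, predicted):
    """Determination coefficient as 1 - SS_res / SS_tot (assumed definition)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Hypothetical BSA concentrations (g/L); not data from the study.
reference = [0.5, 0.7, 0.9]
predicted = [0.55, 0.68, 0.93]

print(f"RMSEP = {rmsep(reference, predicted):.4f} g/L")
print(f"REP   = {rep_percent(reference, predicted):.2f} %")
print(f"R2    = {r_squared(reference, predicted):.4f}")
```

Because REP is scaled by the mean reference concentration, it allows errors to be compared between data sets that cover different concentration ranges.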

| Variable selection
PLS model robustness and accuracy for quantitative analysis can be improved using variable selection, to reduce the influence of uninformative and/or nonlinear spectral data (Odman et al., 2010).
Details of VIP, iPLS, and GA methods used are in the SI.

| Instrument optimization
Response surface methodology (Jensen, 2017) was used to determine the instrumental parameters that maximize the signal quality, expressed as the signal to noise ratio (SNR), of the spectrally optimized TSFS|| measurements, evaluated at two BSA concentrations (IO-Set).
SNR was determined as the ratio of the average maximum BSA signal to the standard deviation of the noise, the latter being determined from a selected area of the spectra with no fluorescence signal (Figure S2, SI), and was calculated using the formula given by Skoog (1976). The influence of the emission bandwidth (2.5/5/10 nm), scan rate (120/1200/2400 nm min⁻¹), and PMT voltage (600/650/700 V) on SNR was examined using a Box Behnken design (Anderson-Cook et al., 2009). The design consisted of 15 test conditions with three levels per instrumental factor and was tested at the lowest and highest BSA concentration conditions (0.4 and 1.2 g L⁻¹ of BSA in 10 g L⁻¹ yeastolate). The order of experiments was randomized to avoid bias. A composite desirability function was used to determine the optimal instrumental settings. The individual desirability (d) evaluates how well the settings optimize each single response (SNR at each BSA concentration level), whereas the composite desirability (D) defines the settings that optimize both responses (at two BSA concentrations). A value of one for the desirability function indicates the ideal optimization (Derringer & Suich, 1980).
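The structure of a three-factor Box Behnken design can be sketched as follows. This is an illustrative construction using the factor levels quoted above. The choice of three center points is an assumption (it is the usual way to reach 15 runs with three factors), and the names and the random seed are hypothetical.

```python
import random
from itertools import combinations, product

# Factor levels from the text, ordered low / middle / high.
LEVELS = {
    "emission_bandwidth_nm": (2.5, 5, 10),
    "scan_rate_nm_per_min": (120, 1200, 2400),
    "pmt_voltage_V": (600, 650, 700),
}

def box_behnken(levels, n_center=3):
    """Build a Box Behnken design as a list of {factor: value} runs.

    For every pair of factors, the four (+/-1, +/-1) combinations are run
    with the remaining factor held at its middle level; center points
    (all factors at the middle level) are then appended.
    """
    names = list(levels)
    coded_runs = []
    for a, b in combinations(range(len(names)), 2):
        for sign_a, sign_b in product((-1, 1), repeat=2):
            run = [0] * len(names)
            run[a], run[b] = sign_a, sign_b
            coded_runs.append(tuple(run))
    coded_runs += [(0,) * len(names)] * n_center
    # Map coded levels (-1, 0, +1) onto the low / middle / high settings.
    return [
        {name: levels[name][code + 1] for name, code in zip(names, run)}
        for run in coded_runs
    ]

design = box_behnken(LEVELS)
random.Random(0).shuffle(design)  # randomized run order (seed is arbitrary)
print(len(design))  # 15
```

A Box Behnken design never sets all three factors to an extreme level at once, which keeps the instrument away from corner conditions such as the narrowest bandwidth combined with the fastest scan and lowest PMT voltage. The middle levels quoted in the text are not exactly midway between the extremes in physical units, so the coded −1/0/+1 spacing is only approximate.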

| RESULTS AND DISCUSSION
The yeastolate-protein model system used here was selected primarily as a model to facilitate measurement and data analysis method development using a chemically complex sample system.
This system was designed to simulate the chemical and optical complexity of clarified reactor broth samples. Clarified broth solutions are samples extracted from a bioreactor which have been filtered or processed to remove whole cells and large cell fragments. Furthermore, yeastolate and BSA are inexpensive, which facilitates the generation of large sample sets and replicate measurements. Laboratories everywhere can therefore easily prepare these samples and accurately compare the performance of different analytical techniques without the need to undertake costly and irreproducible cell culture processes. To replicate a real, unclarified bioprocess broth solution would require the addition of particulate matter to reproduce the scatter due to whole cells and cell fragments. This we feel is outside the scope of the present study, as it requires significant input from multiple sources (e.g., academia, industry, and standards agencies) to achieve consensus as to how a practical model system can be defined. An issue with developing new analytical methods is that, if one uses real processes, it is difficult to replicate the measurements and to compare measurement efficacy with data collected from widely differing bioprocesses (e.g., Escherichia coli fermentation and CHO mammalian cell culture). It should be noted that there are no defined standard bioprocess sample systems described in the literature, whereas in biomedical spectroscopy, another area of high optical complexity, the use of tissue phantoms is well established and standards agencies such as NIST are heavily involved (Lemaillet et al., 2016). Here, a relatively low protein concentration range was selected because this is the range where quantification is more difficult (B. Li et al., 2014), particularly for the early bioprocess stages, and because we wished to compare efficacy with a previous study (Groza et al., 2014).

| Spectral analysis
The model culture media and calibration set samples all have a very strong absorbance in the UV (A280 > 3 a.u. for C-Set, Figure S3, SI) because of overlap between the absorbance of the protein and that of cell culture media components like amino acids, which are present in relatively high concentrations, similar to clarified industrial bioprocess broths. This effectively prevents the use of absorbance spectroscopy as a quantitative method for in-process protein analysis. This high absorbance also causes very strong IFE, which has a significant impact on UV fluorescence.
BSA intrinsic emission (λex = 285-300 nm, Figure 1) mostly originates from Trp with a relatively small Tyr contribution and even less from phenylalanine, largely because of FRET processes (Lakowicz, 2006). Intrinsic yeastolate emission (like that of most cell culture media) is more complex because of the large number of small molecule fluorophores present. The emission can be split into three main regions (Figures 1 and S4, SI): R1 (λex = 240-300 nm), mostly related to emission from aromatic amino acids (B. Y. Li et al., 2012); R2 (λex = 320-360 nm), corresponding to larger fluorophores such as vitamins (Faassen & Hitzmann, 2015); and R3 (λex > 360 nm), corresponding to co-factors and flavins (Graf et al., 2019). The strong yeastolate and BSA spectral overlap (Figure 1, R1), coupled with the presence of multiple chemical constituents from yeastolate, like paramagnetic ions, histidine (Lakowicz, 2006), and also non-emitting chromophores (B. Li et al., 2010), significantly reduces BSA emission intensity through quenching and IFE processes. For instance, when the BSA (1 g L⁻¹) spectrum was compared to that of a 1:10 g L⁻¹ mixture, BSA emission intensity decreased by 58% (TSFS||), >53% (TSFS_T), and >50% (TSFS⊥) (Table S2, SI). The stronger quenching of protein emission observed in TSFS|| was due to higher IFE sensitivity and the small additional scattering contributions in these spectra. Furthermore, the intensity changes were nonlinear (Figure S4, SI), which makes univariate-based quantitation difficult.
Comparison of polarization modes (Figures S5-S8 and Table S2, SI) verified that the parallel polarization was more sensitive to protein emission, with the BSA/YST ratio (calculated as the ratio of the signal of 1 g L⁻¹ BSA in buffer to that of 10 g L⁻¹ YST) in pure solutions of 1.13 (TSFS||) > 0.99 (TSFS_T) > 0.93 (TSFS⊥). Despite TSFS_T measurements having the strongest absolute protein emission signal, the matrix contribution is also the highest, leading to a poorer protein signal to matrix ratio (BSA signal to BSA/YST matrix ratio, 1), which was highest for TSFS|| (1.02) > TSFS_T (0.92) > TSFS⊥ (0.86).
The changes in fluorescence intensity between C-Set samples were more pronounced in the tryptophan emission region and in the TSFS|| spectra. This suggests that the parallel polarization measurement is potentially more sensitive to changes in protein emission signal. The repeatability and intermediate precision of replicate pTSFS measurements were all acceptable, with RSD values of <1% and ≈5%, respectively (Tables S3 and S4, SI). BSA has a maximum anisotropy of ~0.22 at λex/Δλ = 310 nm/40-60 nm (λem = 350-360 nm); however, this varies very significantly across the emission space because of a combination of FRET and environmental effects (Lakowicz, 2006). These data agree with the literature value of 0.219 ± 0.002 (λex/λem = 340/470 nm, 20°C) recorded using dansyl labelled BSA (Flecha & Levi, 2003), and with intrinsic emission measurements at λex/λem = 294-305/324-363 nm (Groza et al., 2014).

| Deep UV MDF spectral features observed
Using wire grid polarizers (WGP) instead of standard thin film polarizers (TFP) enabled UV excitation below 280 nm. This led to slight changes in the shape and anisotropy profile compared with those of a similar model system studied previously (Groza et al., 2014). Excitation below 280 nm allowed for more light absorption by Tyr, resulting in higher emission from both Tyr and Trp, with additional emission from Trp (via hetero-FRET) and Tyr (via homo-FRET). There are also differences between the TSFS spectra obtained with WGP and those obtained with TFP. For TFP and WGP collected spectra, the most polarized emission for BSA occurred at λex/λem = 300/340 and 280-295/340 nm, respectively. This blue shift corresponds to increased Tyr emission, which is important for proteins that lack a Trp residue such as insulin.
Repeatability values (RSD = 4.0%-5.4%) for aniso-TSFS (Tables S3 and S4, SI) were worse, due to the error propagation associated with calculating anisotropy from multiple spectral measurements. This, along with the long analysis time required, made aniso-TSFS measurements a less viable PAT option, and as such we now focus on pTSFS measurements.
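The effect of error propagation on anisotropy can be illustrated with the standard steady-state expression r = (I|| − G·I⊥)/(I|| + 2G·I⊥) (Lakowicz, 2006). The sketch below is not the authors' processing code: it assumes independent noise on the two intensities, treats the G factor as exactly known, and uses hypothetical intensity values.

```python
import math

def anisotropy(i_par, i_perp, g=1.0):
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    return (i_par - g * i_perp) / (i_par + 2.0 * g * i_perp)

def anisotropy_sigma(i_par, i_perp, s_par, s_perp, g=1.0):
    """First-order propagated standard deviation of r.

    With D = I_par + 2*G*I_perp, the partial derivatives are
    dr/dI_par = 3*G*I_perp / D**2 and dr/dI_perp = -3*G*I_par / D**2.
    Assumes independent noise on I_par and I_perp and an exact G factor.
    """
    d = i_par + 2.0 * g * i_perp
    return (3.0 * g / d**2) * math.sqrt((i_perp * s_par) ** 2 + (i_par * s_perp) ** 2)

# Hypothetical intensities with 1% relative noise on each channel.
i_par, i_perp = 100.0, 60.0
r = anisotropy(i_par, i_perp)
s_r = anisotropy_sigma(i_par, i_perp, 0.01 * i_par, 0.01 * i_perp)

print(f"r = {r:.4f}, sigma_r = {s_r:.5f}, relative error = {100 * s_r / r:.1f} %")
```

In this example a 1% relative error on each intensity becomes a relative error of almost 3% on the anisotropy, because r depends on a difference of two similar intensities. In practice the G factor is itself measured, which adds further uncertainty that this sketch ignores.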
The explained variance for the BSA component in TSFS|| data was dramatically larger (~44%) than for TSFS_T. The third component (2%-3%) has its maximum excitation peak red shifted from 282 to 290 nm and was stronger in TSFS|| spectra, suggesting that it originated from BSA (Figures S9-S11, SI). This could be due to yeastolate components binding to BSA and thus changing its emission properties. Identifying specific binding processes is not feasible due to the large number of potential interactions (e.g., BSA can interact with vitamins B12 and B6 (Zhang et al., 2008), both of which are present in yeastolate (Mosser et al., 2015)). The fourth component (1%-2%) is weaker in the TSFS|| data, which might suggest that it originates from smaller yeastolate fluorophores. The best explanation at the moment is that this represents an IFE based effect, in which light absorption by BSA increases as its concentration increases. These MCR identified factors will potentially introduce nonlinearities into protein quantification regression models for complex media.

| Model performance
For all models, the optimum number of LVs required was four and these had similar attributions as above. For example, PLS loadings plots of TSFS|| measurements (Figure 2)

Note: LOF and SimI represent the lack of fit and similarity index, respectively. SimI was calculated for C1 and C2 using the pure spectra of yeastolate and BSA, respectively. λex/Δλ represents the excitation wavelength and offset of maximum fluorescence intensity.

BOATENG ET AL.
figures of merit such as the RMSEC, RMSECV, and RMSEP as well as the determination coefficient (R²). Considering the full spectral map using 2 nm spectral resolution, there were no significant differences between the different intensity based pTSFS models. aniso-TSFS was worse because of increased noise, and the increase of ~1.4 in REP% agrees with error propagation theory. This is also seen in the EJCR plots (Figure 3a), where aniso-TSFS shows an accuracy bias and lower precision compared to the other measurement methods. This confirms the unsuitability of the more complex aniso-TSFS based method due to the larger errors associated with anisotropy calculations. However, the aniso-TSFS plots do provide valuable insight into sample emission properties, which can help validate the basis for regression modelling of pTSFS data. Using full MDF measurements to build chemometric models is, however, not optimal because of the high degree of collinearity in this type of data. This means that many variables are redundant and make the predictive models less reliable. Therefore, one needs to reduce the number of variables to only those that are informative (B. Li et al., 2014).
FIGURE 2 Loadings plots from the PLS regression model built using the TSFS|| normalized data: (a) LV 1 (99.6% spectral variance explained), (b) LV 2 (0.4% spectral variance explained), (c) LV 3 (0.01% spectral variance explained), and (d) LV 4 (0.01% spectral variance explained). PLS, partial least squares; TSFS, total synchronous fluorescence spectroscopy; LV, latent variable

TABLE 2 Statistical parameters of the PLS model performance built using pTSFS and aniso-TSFS data
Note: All models used four LVs and pTSFS data were pre-processed by normalization to maximum intensity. No spectral pretreatment was used for the aniso-TSFS data. The best method is highlighted in bold.
We expected that this would enable us to refine and shorten the pTSFS data collection method to reduce measurement time. VIP scoring is a simple, fast, and popular variable selection method available in most commercial software (Farres et al., 2015). Generally, VIP scores > 1 are used as a criterion for variable selection (Chong & Jun, 2005; Farres et al., 2015); however, this cut-off threshold needs to be evaluated for optimal results. Here, selecting variables with VIP scores > 2 was found to improve accuracy by ~10% compared to the full pTSFS spectra. Using lower (>1) or higher (>3) thresholds yielded poorer prediction performance (Table S7, SI), probably because at higher thresholds the selected variables were concentrated in the BSA emission region only (Figure S12D-F, SI). This agrees with the MCR analysis, which indicated that secondary protein effects (LV3 and LV4) were important and that using only protein emission spectral changes was not appropriate.
Using GA variable selection provided a slight improvement (~15%) in prediction accuracy (Table 2) for TSFS|| based models.
However, the variables selected by both the GA and VIP-based methods were distributed across the emission space; they were therefore less suited to PAT implementation because they require too many excitation wavelengths. We then used iPLS for variable selection, as this was expected to concentrate the selected variables in fewer 2D spectra.
The TSFS|| model EJCRs showed that the ideal value (1,0) was located in a more central position (Figure 3b), indicating a lower bias compared to the other pTSFS models. Of more significance was that only ~25% of the original spectral variables were needed (~1200 from 4686), and these were evenly distributed across the spectrum in such a way that their selection could be associated with the minimum spectral resolution required (Figure S12C, SI). This suggested that using a lower 4 nm resolution would not compromise model quality and would also reduce measurement time. This was verified by manually selecting the data in 4 nm steps (in both excitation and emission) and then evaluating the performance of these reduced variable PLS models (Table 2). The results were not significantly different (p > .05, randomization test), showing that measurement time could be reduced by ~70% (from ~40 to ~8-12 min). The speed improvement will also enable spectral averaging, improving SNR and thus prediction accuracy.
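The link between a 4 nm step size and the ~25% variable count can be illustrated by taking every second point of a 2 nm grid in both dimensions. The 66 × 71 grid shape below is purely hypothetical: it is one factorization of the 4686 variables quoted above, and the actual numbers of excitation wavelengths and offsets are not given in this section.

```python
def subsample(grid, step=2):
    """Keep every `step`-th row and every `step`-th column of a 2D grid."""
    return [row[::step] for row in grid[::step]]

# Hypothetical grid shape: 66 x 71 = 4686 points at 2 nm resolution.
N_EXCITATION, N_OFFSET = 66, 71
grid = [[(i, j) for j in range(N_OFFSET)] for i in range(N_EXCITATION)]

reduced = subsample(grid, step=2)  # 2 nm -> 4 nm in both dimensions

n_full = sum(len(row) for row in grid)
n_reduced = sum(len(row) for row in reduced)
print(n_full, n_reduced, round(100 * n_reduced / n_full, 1))  # 4686 1188 25.4
```

Halving the resolution in both dimensions keeps roughly one quarter of the points, which is consistent with the ~1200 variables retained. Measurement time need not scale in exactly the same proportion, since it also depends on scan rate and instrument overheads.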

| Instrument optimization
FIGURE 3 Elliptical joint confidence regions (at 95% level) for the slope and intercept of the PLS pTSFS regression models using: (a) full spectra; (b) GA selected variables; and (c) TSFS|| full spectra (2 nm resolution), TSFS|| SNR optimized at 4 nm resolution, and Bradford Assay. PLS, partial least squares; pTSFS, polarized total synchronous fluorescence spectroscopy; SNR, signal to noise ratio

The pTSFS measurement settings used were based on historical values, and we needed to validate whether these were the optimal instrumental settings for producing single scan spectra with the best SNR and, consequently, the lowest measurement error. Spectral averaging should be an included variable here but was impractical with the current data acquisition speed. SNR optimization of TSFS|| measurements via instrumental settings was implemented using a compact Box Behnken design (Table S10, SI) with three parameters: scan rate (120-2400 nm min⁻¹), PMT voltage (600-700 V), and emission slit width (2.5-10 nm). The highest SNR (5732) was observed for BSA/YST concentrations of 1.2/10 g L⁻¹ using a scan rate of 1200 nm min⁻¹, a PMT voltage of 650 V, and an emission bandwidth of 5 nm, although using the same settings for a low BSA concentration yields a poorer SNR. Increasing the excitation slit width is the easiest way to significantly increase fluorescence intensity and hence SNR, but it also increases the amount of scattered and stray light being generated, which is a major problem with intrinsic emission measurements. Furthermore, this optimization process is limited by the detector dynamic range (Wiberg et al., 2004); therefore, in this design, the excitation slit was fixed at 10 nm to avoid detector saturation issues.
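The SNR definition used in this work (average maximum BSA signal divided by the standard deviation of a signal-free region) can be sketched as below. The function name and the numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the exact formula is given only by reference to Skoog (1976).

```python
import statistics

def signal_to_noise(peak_intensities, blank_region):
    """SNR = mean of the peak (maximum) intensities / std. dev. of the blank region.

    peak_intensities: maximum BSA intensities from replicate scans.
    blank_region: intensities from a spectral area with no fluorescence signal.
    Assumption: the sample (n - 1) standard deviation is used for the noise.
    """
    signal = statistics.mean(peak_intensities)
    noise = statistics.stdev(blank_region)
    return signal / noise

# Hypothetical intensities (arbitrary units), not data from the study.
peaks = [500.0, 502.0, 498.0]
blank = [0.1, -0.1, 0.1, -0.1]
print(round(signal_to_noise(peaks, blank), 1))
```

Because the noise is estimated from a region without emission, this figure reflects detector and electronic noise rather than the matrix background, which is why the text discusses matrix effects on SNR separately.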
From the ANOVA results, the scan rate and emission bandwidth were most significant at the higher BSA concentrations whereas the scan rate, PMT voltage, and the emission bandwidth were all significant for the lower BSA concentrations (Table S11, SI).
Among these, emission bandwidth was the most significant factor (p = .001), increasing both SNR responses without reducing accuracy of protein prediction (vide infra). This indicated that spectral resolution was not overly important. At low protein concentrations, the matrix effect was significantly higher, as expected, leading to poorer SNR.
Instrumental optimization was further assessed using the composite desirability function (Figure S13, SI), which provides an overall assessment of how the instrumental settings affected SNR. The optimal solution was found to be: scan rate 600 nm min–1, PMT voltage 650 V, and emission slit 8 nm, with a composite desirability of 0.99. To make the optimal solution compatible with experimental conditions, the emission slit was set to 10 nm. However, a 600 nm min–1 scan rate was considered too slow and was increased to 1200 nm min–1. The composite desirability reduced slightly to 0.84, but this instrumentally optimized TSFS|| model improved in prediction accuracy (25% RMSEP) compared to the model without instrumental optimization (Table 2). The results of this exercise showed that model accuracy could be substantially improved but also that SNR could not be significantly improved by modifying the instrumental parameters under single scan conditions. This means that the only option for improving pTSFS measurement SNR on these types of samples is to use spectral averaging. However, with scanning-based fluorometers that is not a particularly viable option, even with the reduced measurement time resulting from using a larger 4 nm step size.
EJCR plots (Figure 3c) were used to quickly visualize differences in the accuracy and precision of the TSFS|| method and the Bradford Assay. These showed that the (1,0) point was located within both ellipses, implying that both methods were accurate. However, ellipse sizes indicated that TSFS|| was more precise than the Bradford Assay, particularly for optimized TSFS|| data, which was supported by the figures of merit (Table 3). The TSFS|| method was clearly superior in terms of sensitivity (LOD and LOQ reduced by ~18% and 28%) and accuracy (REP ~5% lower for the TSFS|| method), and dramatically better in terms of predicting new test samples.
This is explained by the fact that the Bradford Assay is less able to handle interferences arising from the media matrix, and has a limited linear range (Walker, 1996).
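The EJCR test used in this comparison can be sketched as follows, assuming an ordinary least-squares line of predicted versus reference concentration. The function name, the synthetic data, and the choice to pass the critical F value as an argument are illustrative assumptions, not the authors' implementation. The ideal point (slope 1, intercept 0) lies inside the region when the quadratic form in the fitted coefficients does not exceed 2·s²·F, where s² is the residual variance and F has (2, n − 2) degrees of freedom.

```python
import numpy as np


def ejcr_contains(x, y, f_crit, point=(1.0, 0.0)):
    """Check whether `point` = (slope, intercept) lies inside the EJCR.

    x: reference concentrations; y: predicted concentrations.
    f_crit: critical F value with (2, n - 2) degrees of freedom at the
    chosen confidence level (from tables or a statistics package).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    s2 = np.sum(residuals**2) / (n - 2)  # residual variance
    d_slope = point[0] - slope
    d_int = point[1] - intercept
    # Quadratic form (beta - b)' X'X (beta - b) with X = [1, x]
    q = (n * d_int**2
         + 2.0 * np.sum(x) * d_int * d_slope
         + np.sum(x**2) * d_slope**2)
    return bool(q <= 2.0 * s2 * f_crit)


# Synthetic illustration only (not data from the study), in g/L.
x_ref = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
noise = np.array([0.01, -0.02, 0.015, -0.01, 0.02, -0.015])
y_unbiased = x_ref + noise
y_biased = 0.8 * x_ref + 0.1 + noise

F_CRIT = 6.944  # tabulated F(0.95; 2, 4) for n = 6 points
unbiased_ok = ejcr_contains(x_ref, y_unbiased, F_CRIT)
biased_ok = ejcr_contains(x_ref, y_biased, F_CRIT)
```

A smaller ellipse that still contains (1,0) corresponds to a method that is both unbiased and more precise, which is how the ellipse sizes in Figure 3c are read.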

| CONCLUSIONS
We have demonstrated the potential of pTSFS measurements as a rapid spectroscopic methodology for online or at-line bioreactor monitoring using a model protein-media system. High-resolution pTSFS measurements are time-consuming (~40 min per sample), and for online applications data acquisition needs to be much faster; variable selection was therefore employed to see how much the spectra could be reduced. GA variable selection analysis showed that spectral resolution could be safely reduced from 2 to 4 nm without degrading model accuracy, which reduced measurement time by 60%. iPLS variable selection further showed that fewer excitation wavelengths (15%) could be used to build models with similar performance to those using full pTSFS spectra. This is a demonstration of the most effective use of variable selection for chemometric modelling, namely the intelligent deresolving of complex high-resolution measurements (e.g., full TSFS spectra) into simpler datasets which can be acquired using simpler and faster measurement methods.
Further improvement in model performance requires better quality data and specifically higher SNR. Optimization of the instrument measurement using DoE was effective at improving measurement SNR somewhat and reducing model error for these single-scan-based experiments. It also showed that reducing scan speed was very significant because it increases the exposure time at each spectral point, leading to stronger signals. Unfortunately, this also increases the data acquisition time, making the measurement too time consuming (>10 min per spectrum) for off- or at-line measurements. A more practical solution is to use multichannel spectrometers, which can both use longer exposure times and undertake spectral averaging without making the overall measurement time too long.
TSFS|| was found to be best for measuring protein concentration because these spectra had improved protein signal SNR due to the anisotropic protein emission. This better discrimination of the analyte signal from the strong background emission of all the small molecule fluorophores in the cell culture media is the significant benefit compared to unpolarized MDF measurements, resulting in lower predictive errors and higher precision. It also means that only a single spectral measurement is required, making the pTSFS method sufficiently fast (<3.5 min) for online/at-line measurements. The optimized TSFS||-based measurement method outperformed the Bradford Assay, having an LOD of 0.18 g L–1 and high accuracy with a predictive error of less than 10%. In addition, it requires no reagent use or significant sample handling. Another advantage of MDF measurements, and thus TSFS||, is the potential to quantify two or more proteins simultaneously (Wiberg et al., 2004) and the potential to build models for other nonfluorescent bioprocess components such as glucose and ammonia concentration (Ohadi et al., 2014).
Note: R² = coefficient of determination, RMSE = root-mean-square error in g L–1, REP = relative error of prediction in %. LOD and LOQ were calculated as the ratio between 3.3 and 10 times the standard deviation of the intercept and the slope of the regression, respectively (ICH Guideline, 2005).
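The LOD and LOQ definition given in the table note (3.3 and 10 times the standard deviation of the intercept, divided by the slope) can be sketched for a simple straight-line regression. This is a minimal illustration under stated assumptions: the function name and data are hypothetical, and the standard deviation of the intercept is taken here as the ordinary least-squares standard error of the intercept, which may not be exactly how the authors obtained it for their PLS models.

```python
import numpy as np


def lod_loq(x, y):
    """LOD and LOQ from a straight-line regression y = a + b*x.

    LOD = 3.3 * s_a / b and LOQ = 10 * s_a / b, where s_a is the
    standard error of the intercept and b is the slope.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    s_res = np.sqrt(np.sum(residuals**2) / (n - 2))  # residual std. dev.
    sxx = np.sum((x - x.mean())**2)
    # Standard error of the intercept for an ordinary least-squares fit
    s_intercept = s_res * np.sqrt(np.sum(x**2) / (n * sxx))
    return 3.3 * s_intercept / slope, 10.0 * s_intercept / slope


# Synthetic illustration only (not data from the study), in g/L.
x_ref = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
y_pred = x_ref + np.array([0.01, -0.02, 0.015, -0.01, 0.02, -0.015])
lod, loq = lod_loq(x_ref, y_pred)
```

Because both figures share the same intercept error and slope, LOQ is always 10/3.3 times the LOD under this definition, so any improvement in one carries over to the other.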

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.