Location Prediction of Sperm Cells Using Long Short-Term Memory Networks

Intracytoplasmic sperm injection (ICSI) requires precise selection of a single sperm cell in a dish to be injected into an oocyte. This task is challenging due to high sperm velocity, collisions with other sperm cells, and changes in the imaging focus. Herein, a new model is proposed, based on a multilayer long short-term memory (LSTM) network combined with linear extrapolation, for predicting the future location of individual sperm cells from their previous paths. The model is trained with a unique loss function, constructed from the predicted location and trajectory, and yields predictions with low mean location error. The results are based on comparing different input sequence lengths, numbers of time frames ahead, and motility parameters of sperm cells. This model can provide faster and more accurate sperm motility predictions and better tracking, and can aid automatic sperm capturing technologies.


Introduction
Recent statistics show that 15% of couples experience infertility problems, characterized by an inability to conceive naturally over the course of one year. [1,2] Fertility treatments, namely, assisted reproductive technology (ART), involve medical procedures such as in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI). [3] In ICSI, a single sperm cell is selected in a dish and injected into the female oocyte. After several days of incubation and inspection, the resulting embryo is returned to the female body with the aim of achieving a successful pregnancy.
The current success rate of ICSI procedures ending in live birth depends on many factors and stands at 20-33%. [4] In ICSI, the embryologist identifies a potent sperm cell by inspecting both its morphology and motility, tracks it along its path, and eventually captures it using a micropipette. Due to the high motility of sperm cells, these tasks are difficult and are typically performed manually by the embryologist. Polyvinylpyrrolidone (PVP) solution may be used to slow down sperm motion so that the embryologist's capturing task becomes feasible. [5,6] However, some research shows that PVP might carry potential risks to the sperm cell and affect early embryo development. [7] … [10][11] Computerized motility feature extraction [12] facilitates fast detection and quality analysis of sperm cells. Researchers have also explored various microscopic imaging techniques, including quantitative phase imaging, together with deep learning algorithms, to predict the quality of sperm cells based on their morphology and to estimate the extent of their DNA fragmentation, [13][14][15][16] in addition to their motility. [17] Yet, sperm cell tracking faces many challenges, such as overlapping cells in the semen sample and sperm cells that go out of focus during motion. Moreover, capturing the cell, manually or automatically, requires the ability to predict where the cell will be in the future, when the capture occurs.
In recent years, there has been considerable research on modeling the physical movement of sperm cells. This research has primarily focused on defining accurate mathematical models that describe cell movement in relation to the cell's physical and morphological features. For example, Eamonn A. Gaffney et al. [18] reviewed mathematical models of sperm cells that use the fluid dynamics around the cell to describe its movement, and D. J. G. Pearce et al. [19] explored the movement of aggregating sperm cells compared with that of a single sperm using force and momentum equations. G. Dardikman-Yoffe et al. [20] used quantitative phase imaging videos of swimming sperm cells to reconstruct the 3D movement of the cell, and M. Nassir et al. [21] used dynamic numerical mechanical models to predict sperm progression in three dimensions and compared different sperm cell morphologies.
In contrast to these deterministic approaches for predicting the dynamic swimming of sperm cells, the present work focuses on predicting the future location of sperm cells using an automatically learned mathematical model, obtained through deep learning algorithms trained on different sperm swimming dynamics. Such a model has the advantage of learning from collective data to make predictions, providing faster and potentially more accurate results and overcoming the limitations of physics-based models in predicting the intricate swimming dynamics of the studied environment.
Recurrent neural networks (RNNs) have been used in various applications, such as natural language processing (NLP) and time-series prediction, due to their ability to induce recursive dynamics and thus introduce delayed activation dependencies across the processing elements of the network. [22] Long short-term memory (LSTM) is an RNN component composed of a memory cell, an input gate, an output gate, and a forget gate, designed to address the vanishing and exploding gradient problems; this makes LSTM networks suitable for many prediction tasks, with improved accuracy. [23,24] LSTM networks have previously been used for future location prediction in the macroscopic world. [25,26] In the present work, we use an LSTM model to predict the future locations of sperm cells in semen sample videos, with the purpose of assisting the embryologist's tasks and eventually improving the ICSI success rate. The following sections describe our dataset acquisition process, our prediction model, the experimental setup, and our results comparing different input sequence lengths and numbers of future frames. Finally, we present the study limitations and discuss possible further research.

Dataset Acquisition
Upon obtaining Tel Aviv University's IRB approval and informed consent from the donors, semen samples were obtained from three donors, marked as donors A, B, and C. The semen was liquefied at room temperature for 30 min; then, spermatozoa were isolated using the PureCeption Bi-layer kit (Origio, Målov, Denmark) in accordance with the manufacturer's instructions. The upper phases were discarded, and the pellet was resuspended in 5 mL of Quinn's Advantage (QA) medium with HEPES (ART-1023, CooperSurgical, US) and centrifuged at 500 × g for 5 min. Next, the supernatant was discarded and the pelleted sperm cells were resuspended in 0.1 mL of QA medium for donors A and B and in 0.2 mL for donor C. Multiple 50 μL samples per donor containing live moving cells were placed on a 60 × 20 mm #1 coverslip, which was then enclosed within a chamber made from a cut 22 × 22 mm #1 coverslip sealed with wax. To conduct concentration tests, the sperm cells obtained from donor C were resuspended in 0.04 and 2.0 mL of QA medium to generate concentrated and sparse semen samples, respectively. Additionally, for the polyvinylpyrrolidone (PVP) test (PVP is commonly used in the clinic to slow sperm swimming and ease analysis), 30 μL of the sperm cell solution obtained from donor C was supplemented with 20 μL of 7% PVP (MW 360 000, PVP360, Sigma), deposited onto the coverslip, and enclosed within the chamber.
The samples were placed in an inverted microscope (Olympus IX83) with a PlanApo N 60×/1.42 NA oil-immersion objective. In addition, to assess the performance of our model at different magnifications, we carried out experiments with UPLFLN 40× and UPLFLN 20× objectives while keeping all other experimental conditions the same. For each donor, multiple videos were recorded with a binning setting of 4, enabling an exposure time of 1 ms and resulting in a frame size of 616 × 514 pixels, with pixel sizes of 0.36 μm × 0.36 μm, 0.54 μm × 0.54 μm, and 1.08 μm × 1.08 μm for the 60×, 40×, and 20× microscope objectives, respectively. The frame rate was set to 70 fps for all microscope objectives.

Dataset Annotation
Previously available methods for tracking sperm cells have several limitations, such as cells going out of focus or colliding with one another, which lead to inaccurate cell tracking. Hence, to obtain accurate annotations as quickly as possible, a combination of automatic and manual annotation is necessary. To this end, we developed a custom annotation application. This application allows the user to select a specific frame from a given video and mark the coordinate of the sperm cell to be tracked. The cell is then tracked to the next frame using the mean of the k-th largest and k-th smallest pixel values within the area around the selected coordinate. If the predicted coordinate is incorrect, the user can manually select the correct location of the cell. The coordinates of the cell are then saved to a file containing all the annotations, and the user moves on to the next label. Video S1, Supporting Information, shows the process of annotating a sperm cell along its movement using our application.
For each sperm cell, the straight-line velocity (VSL) and curvilinear velocity (VCL) motility features were extracted. Additionally, the linearity (LIN = VSL/VCL) motility feature was calculated. [27,28] The VSL and VCL were calculated from our location annotations using the following formulas:

$\mathrm{VSL} = \dfrac{\lVert \mathbf{x}^{r}(N) - \mathbf{x}^{r}(0) \rVert}{N\,\Delta t}, \qquad \mathrm{VCL} = \dfrac{1}{N\,\Delta t} \sum_{t=1}^{N} \lVert \mathbf{x}^{r}(t) - \mathbf{x}^{r}(t-1) \rVert,$

where N is the number of annotated frames for a specific sperm cell, $\mathbf{X}^{r} = [\mathbf{x}^{r}(0), \ldots, \mathbf{x}^{r}(N)] = \left[ \begin{pmatrix} x^{r}(0) \\ y^{r}(0) \end{pmatrix}, \ldots, \begin{pmatrix} x^{r}(N) \\ y^{r}(N) \end{pmatrix} \right]$ is the path vector from frame t = 0 to frame t = N, and Δt is the time interval between two frames. After acquiring the videos, the cells were tracked manually using the application described above, creating the full dataset, where donor A's annotations were used to train the model and the annotations of donors B and C were used for evaluation. A total of 289 cells over 64 088 frames were annotated from donor A, 99 cells over 20 532 frames from donor B, and 112 cells over 29 468 frames from donor C across all the different experiments. The data distribution over the different videos and the motility features are presented in Table 1 and Figure 1.
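The VSL, VCL, and LIN definitions can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code; the function name `motility_features`, the micrometer input units, and the default frame rate of 70 fps (so that Δt = 1/70 s) are assumptions for the example.

```python
import numpy as np


def motility_features(path_um, fps=70.0):
    """Compute VSL and VCL (um/s) and LIN for one annotated path.

    path_um: array of shape (N + 1, 2) holding x_r(0), ..., x_r(N) in
    micrometers (hypothetical units); needs at least two points.
    """
    path_um = np.asarray(path_um, dtype=float)
    n_intervals = len(path_um) - 1          # N frame-to-frame intervals
    duration = n_intervals / fps            # N * delta_t, in seconds
    # VSL: straight-line distance from the first to the last point over the total time
    vsl = np.linalg.norm(path_um[-1] - path_um[0]) / duration
    # VCL: summed frame-to-frame segment lengths over the total time
    steps = np.diff(path_um, axis=0)
    vcl = np.linalg.norm(steps, axis=1).sum() / duration
    # LIN = VSL / VCL (defined as 0 for a cell that never moved)
    lin = vsl / vcl if vcl > 0 else 0.0
    return vsl, vcl, lin
```

For a V-shaped path (0, 0) → (3, 4) → (6, 0) sampled at 1 fps, this gives VSL = 3, VCL = 5, and LIN = 0.6, illustrating how lateral movement lowers linearity.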

Sequence Selection and Preprocessing
The entire future location prediction workflow is described in Figure 2.
First, for each sperm cell, we define the cell path vector from frame t = 0 to frame t = N, $\mathbf{X}^{r} = [\mathbf{x}^{r}(0), \ldots, \mathbf{x}^{r}(N)] = \left[ \begin{pmatrix} x^{r}(0) \\ y^{r}(0) \end{pmatrix}, \ldots, \begin{pmatrix} x^{r}(N) \\ y^{r}(N) \end{pmatrix} \right]$, where N is the number of annotated frames for the cell. Using the path vector, we define the input sequence vector $\mathbf{X}^{r}_{t-n,t} = [\mathbf{x}^{r}(t-n), \ldots, \mathbf{x}^{r}(t)]$, where n < N is an arbitrary number representing the length of the input sequence, such that $\mathbf{x}^{r}(t)$ is the current cell location for the network input.
We define the cell path origin and the last sample of the mean cell path, which serve as the normalization parameters of the partial path. The normalized partial path, which is used as our input sequence, is defined as $\bar{\mathbf{X}}_{t-n,t} = [\bar{\mathbf{x}}(t-n), \ldots, \bar{\mathbf{x}}(t)]$. Thus, for every annotated cell, the normalized input sequence lies between $\bar{\mathbf{x}}(t-n) = (0, 0)$ and $\bar{\mathbf{x}}(t) = (1, 1)$, creating a uniform direction for all input sequences.
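A minimal NumPy sketch of one normalization consistent with this description: each window is shifted by its origin and scaled per axis so that its first point maps to (0, 0) and its last point to (1, 1). The exact normalization equations are not reproduced in this text, so the per-axis scaling, the use of the window's last (smoothed) sample as the end point, and the helper names `normalize_window` and `denormalize` are assumptions for illustration.

```python
import numpy as np


def normalize_window(window, eps=1e-6):
    """Map a (n + 1, 2) window so that its first point -> (0, 0), last -> (1, 1).

    Assumption: element-wise (per-axis) scaling by the displacement between the
    window origin and its last sample. Returns the normalized window together
    with the parameters needed to undo the mapping.
    """
    window = np.asarray(window, dtype=float)
    origin = window[0]
    scale = window[-1] - origin
    # Degenerate axis (no net displacement): avoid division by zero; the
    # (1, 1) end-point property then no longer holds along that axis.
    scale = np.where(np.abs(scale) < eps, eps, scale)
    return (window - origin) / scale, origin, scale


def denormalize(points, origin, scale):
    """Map normalized points back to the image plane."""
    return np.asarray(points, dtype=float) * scale + origin
```

Because the mapping is affine per axis, a normalized network output can be returned to image coordinates with the same `origin` and `scale`, which is the role the normalization parameters play at inference time.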
We then define $\bar{\mathbf{x}}(t+k)$, the location of the sperm cell k frames in the future normalized using Equation (4), as the ground truth (GT). In the training process, the input sequence origin $(t-n)$ is taken randomly from the cell path vector in each training iteration. In the validation process, however, we take the input sequences from the beginning of the cell path in a rolling window, with origins from $\mathbf{x}^{r}(0)$ to $\mathbf{x}^{r}(N-n-k)$.
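The two sampling schemes (a random window origin in each training iteration and a rolling window for validation) can be sketched as follows. The function names and the use of integer frame indices are illustrative assumptions; each window holds the n + 1 points $\mathbf{x}^{r}(t-n), \ldots, \mathbf{x}^{r}(t)$ and is paired with its target $\mathbf{x}^{r}(t+k)$.

```python
import numpy as np


def random_training_window(path, n, k, rng):
    """Pick a random origin t - n such that both the window and x(t + k) exist.

    path: array of shape (N + 1, 2); requires N >= n + k.
    """
    path = np.asarray(path)
    start = rng.integers(0, len(path) - n - k)  # upper bound is exclusive
    t = start + n
    return path[start:t + 1], path[t + k]


def rolling_validation_windows(path, n, k):
    """Yield (window, target) pairs with origins x(0), ..., x(N - n - k)."""
    path = np.asarray(path)
    N = len(path) - 1
    for start in range(N - n - k + 1):
        t = start + n
        yield path[start:t + 1], path[t + k]
```

Random origins expose the network to many different segments of each path across epochs, whereas the rolling window evaluates every admissible segment exactly once.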
From the current position $\bar{\mathbf{x}}(t)$ to the future location $\bar{\mathbf{x}}(t+k)$, we calculate the trajectory velocity vector of the sperm cell, $\bar{\mathbf{v}}(t, t+k)$. We use this velocity vector to evaluate the sperm cell trajectory loss during network training and to estimate the future location of the sperm cell during inference. In addition, smoothing with a Gaussian filter is applied as a preprocessing stage to remove noise from the location vector, which could otherwise affect the learning and inference processes.
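The preprocessing and velocity computation can be sketched as below, under stated assumptions: the Gaussian width `sigma` is a hypothetical value (the filter parameters are not given), the smoothing is implemented directly in NumPy with edge padding, and the trajectory velocity is assumed to be the displacement from $\bar{\mathbf{x}}(t)$ to $\bar{\mathbf{x}}(t+k)$ divided by the k-frame interval.

```python
import numpy as np


def gaussian_smooth(path, sigma=2.0):
    """Smooth each coordinate of a (T, 2) path with a 1D Gaussian kernel."""
    path = np.asarray(path, dtype=float)
    radius = int(np.ceil(3 * sigma))          # truncate the kernel at 3 sigma
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                    # unit gain: constants are preserved
    # Edge padding keeps the output length equal to the input length
    padded = np.pad(path, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(path.shape[1])],
        axis=1,
    )


def trajectory_velocity(x_t, x_t_plus_k, k):
    """Assumed form: displacement from x(t) to x(t + k) per frame."""
    return (np.asarray(x_t_plus_k, dtype=float) - np.asarray(x_t, dtype=float)) / k
```

A symmetric, normalized kernel leaves straight-line motion unchanged away from the path ends while attenuating frame-to-frame annotation jitter.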

LSTM Model
The input sequence passes through a multilayer LSTM model. The LSTM exploits both distant-past and near-past data through a cell state and a hidden state, which addresses the vanishing gradient problem and allows the network to learn and infer future predictions efficiently. The cell state acts as a conveyor belt that carries information from one time step to the next.
The hidden state provides the output of the LSTM at each time step. It is a function of the current input, the previous hidden state, and the current cell state. [29] The last hidden state is the output of the LSTM network and serves as the input to the fully connected (FC) layer, which produces the outputs.
In a multilayer LSTM network, multiple LSTM chains are stacked such that each layer has its own cell state line, and the hidden state from each LSTM block at a given time point is also passed to the LSTM block at the same time point in the next layer, as illustrated in Figure 3.
In the multilayer LSTM model, the last output hidden vector of the last layer is used as the input to the FC layer. The network outputs are the predicted sperm cell location $\hat{\mathbf{x}}_{p}(t+k) = \begin{pmatrix} \hat{x}_{p}(t+k) \\ \hat{y}_{p}(t+k) \end{pmatrix}$ and the predicted trajectory velocity $\mathbf{v}_{p}(t, t+k) = \begin{pmatrix} v^{x}_{p}(t, t+k) \\ v^{y}_{p}(t, t+k) \end{pmatrix}$ between the current location $\bar{\mathbf{x}}(t)$ and the future location $\bar{\mathbf{x}}(t+k)$.
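A minimal PyTorch sketch of a stacked LSTM with a linear FC head, using the four layers, hidden size of 32, and dropout of 0.2 reported in the Experimental Setup. The class name `SpermLocationLSTM`, the 2D input (normalized x, y per time step), and the single FC layer emitting four values (location and velocity) are assumptions; note that `nn.LSTM` applies its `dropout` argument only between stacked layers.

```python
import torch
from torch import nn


class SpermLocationLSTM(nn.Module):
    """Stacked LSTM + linear FC head -> (x, y) location and (vx, vy) velocity."""

    def __init__(self, hidden_size=32, num_layers=4, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=2,            # normalized (x, y) per time step
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,         # applied between stacked LSTM layers
            batch_first=True,        # inputs shaped (batch, time, features)
        )
        # Linear activation: the FC output is used as-is
        self.fc = nn.Linear(hidden_size, 4)

    def forward(self, seq):
        # seq: (batch, n + 1, 2) normalized input sequence
        out, _ = self.lstm(seq)
        last_hidden = out[:, -1, :]  # last hidden state of the top layer
        y = self.fc(last_hidden)
        return y[:, :2], y[:, 2:]    # predicted location, predicted velocity
```

For a unidirectional LSTM, `out[:, -1, :]` equals the top-layer entry of the returned final hidden state, so either can feed the FC layer.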

Loss Function
Our loss function is constructed from three loss terms, each based on the mean squared error (MSE) between a predicted value and the GT. The first loss, $\mathcal{L}_{l}$, is an MSE loss on the future location vector, averaged over the B path vectors in the batch, where B is the batch size. The second loss, $\mathcal{L}_{v}$, is an MSE loss on the trajectory velocity vector. The third loss, $\mathcal{L}_{p}$, is an MSE loss enforcing the requirement that the predicted future location, combined with the predicted trajectory velocity, be back-projected onto the current location. The total loss of the network is the weighted combination of these losses, $\mathcal{L}(w_{l}, w_{v}, w_{p}) = w_{l}\mathcal{L}_{l} + w_{v}\mathcal{L}_{v} + w_{p}\mathcal{L}_{p}$, which can be customized by assigning weights to its different terms. The weights $w_{l}$, $w_{v}$, and $w_{p}$ correspond to the weights given to the location, velocity, and projection losses, respectively. By adjusting these weights, the algorithm can prioritize different aspects of the prediction process according to the specific requirements of the application.
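The three loss terms and their weighted sum can be sketched in PyTorch as follows. The exact form of the projection term is not given in the text, so the back-projection $\hat{\mathbf{x}}_{p}(t+k) - k\,\mathbf{v}_{p}(t,t+k) \approx \bar{\mathbf{x}}(t)$, which assumes a per-frame velocity, is an assumption, as is the function name `combined_loss`. `F.mse_loss` averages over both the batch and the two coordinates, which differs from a per-vector sum only by a constant factor.

```python
import torch
import torch.nn.functional as F


def combined_loss(loc_pred, vel_pred, loc_gt, vel_gt, x_t, k, w_l=1.0, w_v=1.0, w_p=1.0):
    """Weighted sum of location, velocity, and projection MSE losses.

    All tensors are (B, 2) in normalized coordinates; k is the number of
    future frames.
    """
    # Location loss: predicted vs. GT normalized future location
    loss_l = F.mse_loss(loc_pred, loc_gt)
    # Velocity loss: predicted vs. GT trajectory velocity
    loss_v = F.mse_loss(vel_pred, vel_gt)
    # Projection loss (assumed form): stepping back from the predicted future
    # location by k * velocity should land on the current location x(t)
    back_projected = loc_pred - k * vel_pred
    loss_p = F.mse_loss(back_projected, x_t)
    return w_l * loss_l + w_v * loss_v + w_p * loss_p
```

The projection term couples the two network outputs, so the predicted location and velocity are pushed toward mutual consistency even when each individually has a small error.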

Future Location Prediction
To determine the future location, we compute the mean of the predicted position and the position obtained by projecting the current location along the predicted velocity. Note that linear extrapolation of the predicted position contains significant information regarding the future location of the sperm cell. Moreover, the accuracy of the linear extrapolation grows as the length of the input sequence increases and as the number of future steps decreases. Therefore, we use a weighted average of the location prediction $\hat{\mathbf{x}}_{p}(t+k)$ and the linear extrapolation location estimate $\hat{\mathbf{x}}_{l}(t+k)$.
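The weighted average of the two estimates can be sketched as below, with the weight on the linear extrapolation written as γ. The exact formulas are assumptions here: γ = n / (n + k), which exceeds 0.5 exactly when n > k, and $\hat{\mathbf{x}}_{l}(t+k) = \bar{\mathbf{x}}(t) + k\,\mathbf{v}_{p}(t,t+k)$ with a per-frame velocity. The function name `blend_prediction` is hypothetical.

```python
import numpy as np


def blend_prediction(x_t, loc_pred, vel_pred, n, k):
    """Weighted average of the LSTM location and a linear extrapolation.

    Hypothetical choices: gamma = n / (n + k) and a linear extrapolation that
    steps from the current normalized location x(t) along the predicted
    per-frame velocity for k frames.
    """
    gamma = n / (n + k)                    # > 0.5 when n > k, < 0.5 when n < k
    x_lin = np.asarray(x_t, dtype=float) + k * np.asarray(vel_pred, dtype=float)
    return gamma * x_lin + (1.0 - gamma) * np.asarray(loc_pred, dtype=float)
```

With this choice, a long input sequence and a short horizon shift trust toward the extrapolation, and the reverse shifts it toward the LSTM output; equal n and k weight both estimates equally.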
We define a weight γ that relates the input sequence length n to the number of future frames k such that, when n > k, γ > 0.5 and a larger weight is given to the linear extrapolation, whereas when n < k, γ < 0.5 and a larger weight is given to the LSTM location prediction. We then calculate the final normalized future location prediction as the γ-weighted average of the two estimates. Finally, we use the parameters obtained from Equations (1) and (2) to denormalize the result into the image plane.

Results

Experimental Setup
The LSTM network was constructed with four layers and a hidden dimension of 32 for all experiments.
All the deep learning infrastructure was based on PyTorch v1.12.1+cu113. All training and inference runs were performed on the Run:ai platform with NVIDIA RTX A5000/A6000 GPUs.
The network was trained on donor A's dataset using the loss presented above, $\mathcal{L}(w_{l} = w_{v} = w_{p} = 1)$, for 1600 epochs with a batch size of 32 and a learning rate of λ = 0.001. The dropout rate was set to 0.2, and a linear activation function was used for the fully connected layer. To determine the optimal network parameters, we performed multiple experiments and reviewed the results manually, optimizing the number of epochs, batch size, learning rate, hidden layer size, and dropout rate. While this approach proved effective for our purposes, a more comprehensive optimization technique [30] could potentially identify even better parameters, at the cost of requiring more resources. For evaluation, inference was performed on donor B's dataset using the trained network, and the Euclidean distances in micrometers were calculated between the actual future annotation $\mathbf{x}^{r}(t+k)$ and the predicted future location $\hat{\mathbf{x}}^{r}(t+k)$.

Error Rate Calculation
The error is determined by computing the Euclidean distance between the predicted position $\hat{\mathbf{x}}^{r}(t+k)$ and the GT position $\mathbf{x}^{r}(t+k)$:

$e(t+k) = \lVert \hat{\mathbf{x}}^{r}(t+k) - \mathbf{x}^{r}(t+k) \rVert_{2}. \quad (15)$

Moreover, we also analyze the relative error, which is the quotient between the error obtained from Equation (15) and the distance $d(t, t+k)$ traversed by the sperm cell from the end of the input sequence to the future position at the given time:

$e_{\mathrm{rel}}(t+k) = \dfrac{e(t+k)}{d(t, t+k)}. \quad (16)$
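Equations (15) and (16) can be computed per prediction as sketched below. The text does not specify whether the traversed distance d(t, t+k) is measured along the annotated path or as a straight line, so both options are shown behind a hypothetical `straight_line` flag; the function name and micrometer units are also assumptions.

```python
import numpy as np


def location_errors(pred_um, path_um, t, k, straight_line=False):
    """Return (e, e_rel) for a prediction of x_r(t + k) made at frame t."""
    path_um = np.asarray(path_um, dtype=float)
    # Equation (15): Euclidean distance to the GT location x_r(t + k)
    error = float(np.linalg.norm(np.asarray(pred_um, dtype=float) - path_um[t + k]))
    segment = path_um[t:t + k + 1]
    if straight_line:
        # Option A: straight-line displacement from x_r(t) to x_r(t + k)
        traversed = float(np.linalg.norm(segment[-1] - segment[0]))
    else:
        # Option B: length of the annotated path between the two frames
        traversed = float(np.linalg.norm(np.diff(segment, axis=0), axis=1).sum())
    # Equation (16): relative error; undefined (inf) if the cell did not move
    rel = error / traversed if traversed > 0 else float("inf")
    return error, rel
```

For a curved trajectory the path length exceeds the straight-line displacement, so option B yields a smaller relative error for the same absolute error.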

Location Accuracy
Figure 4 illustrates heatmaps of the error and the relative error, as defined in Equations (15) and (16), as a function of the input sequence length n [s] and the future steps k [s]. As expected, the error tends to rise as the prediction step k becomes larger and to decrease as the input sequence length n increases. Specifically, for small k values (< 0.6 s), the mean error ranges from 4 to 8 μm, whereas for large k values (> 0.6 s), the mean error can reach 14 μm for short input sequences and 10 to 12 μm for long input sequences. The relative error, that is, the location error in proportion to the distance covered by the cell, is highest for small future steps k, ranging from 35% to 45%, and decreases as k increases. Specifically, for k = 1.4 s, the relative error is approximately 10% of the distance traversed by the cell.
If we look at the mean error relative to the average sperm cell head length (4.5 μm) and width (3 μm), [31] we see that for k < 0.8 s the ratio between the error and the sperm cell head dimensions is between one and two. For k > 0.8 s, the ratio can reach four. This ratio might need to be taken into consideration in future applications.
Since no established baseline algorithm for future sperm cell location prediction exists, particularly for our dataset, we compare our predictions with a simple linear extrapolation and with a polynomial extrapolation, [32] as shown in Figure 5. The extrapolations are performed by fitting a linear or polynomial function f(t) to the input sequence location points $\mathbf{x}(t-n), \ldots, \mathbf{x}(t)$ using regression and evaluating this function at the future frame, f(t + k).
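The regression-based baselines can be sketched with NumPy's `polyfit` and `polyval`, fitting each coordinate separately against the frame index. The polynomial degree used in the paper is not stated, so `degree` is a free, hypothetical parameter (degree 1 gives the linear baseline), and the function name is illustrative.

```python
import numpy as np


def extrapolate(window, k, degree=1):
    """Fit f(t) to the window x(t - n), ..., x(t) and evaluate f(t + k)."""
    window = np.asarray(window, dtype=float)
    frames = np.arange(len(window))        # t - n, ..., t re-indexed as 0, ..., n
    future = len(window) - 1 + k           # index of frame t + k
    return np.array([
        np.polyval(np.polyfit(frames, window[:, d], degree), future)
        for d in range(window.shape[1])    # x and y are fitted independently
    ])
```

Higher degrees fit the input window more closely but diverge quickly outside it, which is consistent with the rapid error growth of the polynomial baseline at large k.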
In Figure 5a, we compare the LSTM model and linear extrapolation across all input sequence lengths and future steps tested. This comparison shows the superiority of the LSTM model over linear extrapolation. In Figure 5b,c, we zoom into a specific input sequence length (n = 571 ms) and compare the LSTM model, linear extrapolation, and polynomial extrapolation. This comparison demonstrates the rapid increase of the polynomial extrapolation error, which exceeds 20 μm for high future-step k values. While the linear extrapolation error is relatively low (5 to 10 μm), especially for small future-step k values, our model still outperforms the extrapolation prediction. We also observe that our model consistently performs better than linear extrapolation by 5-10% in terms of relative error.

Robustness to Changes in Frame Rate and Magnification
Because of the relatively high frame rate of 70 fps used in our study, some microscopy instruments commonly used in clinics and current computer-assisted sperm analysis (CASA) applications may not be able to operate at such a high rate. Thus, it is important to investigate the robustness of our algorithm at lower frame rates. To achieve this, we applied bicubic interpolation to the cell path and reduced the number of frames to match the desired frame rate. For instance, if a cell was tracked over 140 frames at 70 fps, its 20 fps path would contain 40 frames. We retrained and evaluated our algorithm for all cells at 20 and 50 fps, which are similar to the frame rates used in various clinics and CASA applications.
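Frame-rate reduction of an annotated path can be sketched with SciPy's `CubicSpline`, interpolating each coordinate over time and resampling on the coarser time grid. Cubic-spline interpolation is used here as a stand-in for the interpolation described above, and the function name is hypothetical; for the 140-frame example at 70 fps, this sketch returns 40 samples at 20 fps.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def resample_path(path, fps_in=70.0, fps_out=20.0):
    """Resample a (T, 2) path recorded at fps_in onto an fps_out time grid."""
    path = np.asarray(path, dtype=float)
    t_in = np.arange(len(path)) / fps_in                # original frame times [s]
    # Number of output frames that fit within the original duration
    n_out = int(np.floor(t_in[-1] * fps_out + 1e-9)) + 1
    t_out = np.arange(n_out) / fps_out                  # new frame times [s]
    spline = CubicSpline(t_in, path, axis=0)            # one spline per coordinate
    return spline(t_out)
```

The same routine can lengthen a path by choosing `fps_out` larger than `fps_in`, since the spline is evaluated on an arbitrary time grid.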
Additionally, the magnification levels of microscopes used in clinics and CASA applications can vary, particularly in procedures such as ICSI, where embryologists need to detect the best cell within a wide field of view. Although we used a 60× objective in our study, we aimed to investigate the performance of our model at lower magnifications using 40× and 20× objectives. To achieve this, we created low-magnification datasets by scaling the annotated coordinates by 2/3 for the 40× objective and by 1/3 for the 20× objective, and then trained our model on these constructed datasets.
The accuracy of our algorithm is limited by the accuracy of the location-detection algorithm used to track the sperm movement, which in our case was performed manually. Lower magnification levels can reduce the accuracy of the detected locations. To simulate this effect, we added white Gaussian noise to the location signal during the training and evaluation processes with varying signal-to-noise ratio (SNR) values. Figure 6 compares the location prediction error of our LSTM model and the linear extrapolation method on donor B's test set simulated with different magnifications and frame rates, using an input sequence of 1 s. As shown in Figure 6a, the frame rate has a small impact on the prediction accuracy, with only a 4.3% mean difference (0.31 μm) between the 50 and 70 fps measurements and a 9.9% mean difference (0.77 μm) between the 20 and 70 fps measurements. Figure 6b illustrates the effect of varying the SNR of the additive white Gaussian noise on the accuracy at the different magnifications. We observe that lower SNR values (higher noise) lead to a significant decrease in accuracy, with an error of 13.9 μm for the original 60× objective compared with the original error of 9.48 μm. The low magnifications appear to be more sensitive to noise, with the 40× and 20× simulated paths achieving a higher accuracy of 3.2-7.2% and 5.6-10.6%, respectively, for low SNR values relative to the original measurements. For high SNR (low noise), we observe very similar results, with a slight advantage for the low magnifications, where the 20× simulated paths have a mean accuracy 2.7% better than the original 60× paths.
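Both simulations can be sketched in a few lines of NumPy: coordinates annotated at 60× are scaled by objective/60 (2/3 for 40×, 1/3 for 20×), and white Gaussian noise is added at a target SNR. How the signal power is measured is not stated, so taking the mean squared coordinate value of the path is an assumption, as are the function names.

```python
import numpy as np


def simulate_magnification(path_px_60x, objective):
    """Scale pixel coordinates annotated at 60x to a lower objective (e.g., 40 or 20)."""
    return np.asarray(path_px_60x, dtype=float) * (objective / 60.0)


def add_awgn(path, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = 10^(snr_db / 10)."""
    path = np.asarray(path, dtype=float)
    signal_power = np.mean(path ** 2)                   # assumption: measured power
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return path + rng.normal(0.0, np.sqrt(noise_power), size=path.shape)
```

For example, a path with signal power 100 px² at 20 dB SNR receives noise with a standard deviation of 1 px per coordinate.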
Additionally, we tested our model, trained on the simulated dataset, using annotations obtained from actual 20× and 40× videos captured from donor C, with an input sequence of 1 s and future steps of 1 s. A summary of the results is shown in Table 2. These results suggest that the performance of our model on the simulated 20× and 40× objectives is also a reliable indicator of its performance on real lower-magnification data.

Sample Environment
Because sperm cell concentration can affect the characteristics of sperm movement, we conducted additional experiments to assess the performance of our model across different sperm concentrations. Specifically, we wanted to ensure that our model, trained on a moderately concentrated sample, would also perform well on samples with high and low sperm concentrations, referred to as dense and sparse samples, respectively. To evaluate this, we tested the original trained network using a 1 s input sequence and predicted the cell locations 1 s into the future for both the dense and sparse samples obtained from donor C.
Moreover, in current clinical practice and protocols, PVP is commonly used to facilitate the capture of sperm cells by embryologists. As the movement of cells in PVP is considerably slower than their natural movement, we adapted our model training accordingly. Specifically, we retrained our model using an interpolated dataset that simulated cell movement at a reduced speed, approximately one-tenth of the original speed. Subsequently, we evaluated the performance of the model using a 1 s input sequence and predicted the locations of the cells 1.42 s into the future, based on the PVP-based sample annotations obtained from donor C.
The results of these tests are summarized in Table 3. Several noteworthy observations can be drawn from these results. First, the evaluation of the dense and sparse samples yields outcomes that are highly similar to the original results. Additionally, the movement of sperm cells in the sparse sample appears to be more linear than in the dense sample, as evidenced by the relative error between the linear extrapolation and our model (37.8% in the dense sample vs. 7.2% in the sparse sample). Furthermore, our model performs significantly better in the presence of PVP, outperforming the original results by 38.3% with the same input sequence and future step parameters. Interestingly, in the PVP scenario, the linear extrapolation method yields better results than our model. This suggests that, in certain cases, simple linear extrapolation may yield satisfactory predictions of future cell locations within the PVP environment.

VSL Prediction
The VSL is a critical motility feature that embryologists take into consideration during sperm analysis and selection. The VSL is the straight-line distance between the starting and ending points of the path taken by a sperm cell, divided by the total time taken. [33] During sperm analysis, the VSL is typically obtained by tracking the sperm cell for several seconds, on the assumption that the longer the sperm is tracked, the more accurate the VSL measurement will be. However, embryologists may require a method to predict the VSL of a sperm cell as if it had been tracked for a longer period. This approach could be particularly useful in the sperm selection process, where accurate and efficient evaluation of sperm motility is essential within a short inspection time.
To examine our model for VSL prediction, we produced the prediction vector using our model with an input sequence length of n = 70 frames and a future step of k = 70 frames, that is, one-second input and output. Then, we calculated the VSL at time t, from $\mathbf{x}(0)$ to $\mathbf{x}(t)$, and the predicted VSL at time t + k, from $\mathbf{x}(0)$ to $\hat{\mathbf{x}}(t+k)$. Finally, we compared these values with the actual VSL calculated at time t + k, from $\mathbf{x}(0)$ to $\mathbf{x}(t+k)$.
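The three VSL values compared here can be sketched as follows, using the straight-line definition of VSL. The function names, micrometer units, and default frame rate of 70 fps are assumptions for the example; `x_pred_um` stands for the model's denormalized prediction $\hat{\mathbf{x}}(t+k)$.

```python
import numpy as np


def vsl(x_start, x_end, elapsed_s):
    """Straight-line velocity between two points separated by elapsed_s seconds."""
    displacement = np.asarray(x_end, dtype=float) - np.asarray(x_start, dtype=float)
    return float(np.linalg.norm(displacement) / elapsed_s)


def vsl_comparison(path_um, x_pred_um, t, k, fps=70.0):
    """Return (present VSL at t, predicted VSL at t + k, actual VSL at t + k)."""
    path_um = np.asarray(path_um, dtype=float)
    present = vsl(path_um[0], path_um[t], t / fps)            # x(0) -> x(t)
    predicted = vsl(path_um[0], x_pred_um, (t + k) / fps)     # x(0) -> x_hat(t + k)
    actual = vsl(path_um[0], path_um[t + k], (t + k) / fps)   # x(0) -> x(t + k)
    return present, predicted, actual
```

For a cell moving uniformly in a straight line, all three values coincide; they diverge when the path curves, which is where a good location prediction brings the predicted VSL closer to the actual one.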
Figure 7a illustrates the present VSL and the predicted future VSL, compared with the actual future VSL. Our findings show that the VSL calculated using our location prediction is much closer to the actual future VSL than the VSL calculated using the current location of the sperm. In Figure 7b,c, we provide a closer look at the VSL comparison at specific times, t = 1.5 s and t = 3 s, respectively. In both plots, the linear regression lines show that the VSL predicted for t = 2.5 s and t = 4 s, respectively, using our model's location estimate, better matches the actual VSL calculated at these times. These results have the potential to enhance semen analysis applications and improve the efficiency of sperm selection by embryologists in the ICSI process.

Linearity Prediction
We compare the errors of our model and of the linear extrapolation in relation to the LIN (linearity) motility feature, which characterizes straight-line movement of the sperm cell with no lateral movement and is calculated as the VSL/VCL ratio. [1] Figure 8 presents our model results compared with the linear extrapolation results as a function of LIN. Figure 8a shows the minimum and maximum mean error difference between the linear extrapolation and our model, $|\hat{\mathbf{x}}(t+k) - \hat{\mathbf{x}}_{l}(t+k)|$, comparing sperm cells with low LIN (below the 20th percentile) and high LIN (above the 80th percentile). As expected, the error difference for the high-LIN sperm cells is much lower than for the low-LIN sperm cells, since linear extrapolation is more accurate for sperm cells with linear motion. In most cases, the maximum error difference for the high-LIN sperm cells is lower than the minimum error difference for the low-LIN sperm cells.
Figure 8b compares the mean error of our model and of the linear extrapolation as a function of the sperm cell LIN motility feature, for an input sequence length of 700 ms and future steps of 700 ms. The mean error between the predictions and the ground-truth values is larger for sperm cells with low LIN, and for these cells our model's error is notably lower than that of the linear extrapolation. However, as the LIN value of the sperm cells increases, the difference between the prediction error and the linear extrapolation error decreases, to the extent that the linear extrapolation error is lower than that of our model for very high LIN values.
As mentioned, the LIN motility feature of a sperm cell is one of the key parameters upon which embryologists decide whether to use that cell in the ICSI process. Estimating the linearity of a sperm cell as quickly as possible using its predicted future location can be of great benefit to embryologists searching for potent sperm cells in a semen sample. Moreover, it can be used to assess a larger number of cells in a small field of view faster than CASA algorithms, [27] which do not predict future sperm locations.

Real-Time Implementation
Videos S2-S14, Supporting Information, present example videos as detailed in Table 4.
For visualization purposes, all videos are presented in slow motion at 15 fps, compared with the actual frame rate of 70 fps. In each video, we tracked a sperm cell for different numbers of steps n, that is, $\mathbf{X}^{r}_{t-n,t} = [\mathbf{x}^{r}(t-n), \ldots, \mathbf{x}^{r}(t)]$, marked as red scatter points. The orange point in each video marks the current location $\mathbf{x}^{r}(t)$. Blue scatter points indicate our model predictions $\hat{\mathbf{x}}^{r}(t-n+k), \ldots, \hat{\mathbf{x}}^{r}(t+k)$ based on the input sequence $\mathbf{X}^{r}_{t-n,t}$. The cyan point marks the current location as predicted by our model, $\hat{\mathbf{x}}^{r}(t)$. In each frame, the time elapsed since the beginning of the video is shown in seconds (top-left corner), and a 5 μm scale bar is shown in the bottom-left corner. Starting from frame t = n + k, the error calculated as in Equation (15) is shown (cyan). The inference time on the GPU mentioned in the Experimental Setup section is 5 to 6 ms. Thus, when predicting a location 0.7 to 1.1 s into the future, inference is completed well within the prediction horizon.

Conclusions and Discussion
We designed a workflow model based on an LSTM network that learns the sperm cell motion and predicts its future location based on a given input sequence.Our dataset, constructed from manually annotated swimming sperm cell locations, and a unique loss function, were used to train and evaluate our model.
This study demonstrates the predictive performance of our model by evaluating the mean error with respect to the ground-truth locations, considering variations in input sequence length, future steps, frame rate, magnification, and cell-medium conditions. The comparison of our results with the linear extrapolation prediction shows not only that our model achieves better location accuracy in most cases, but also that in some cases linear extrapolation produces relatively good results. This observation supports a naive approach that can be applied on more primitive devices with relatively simple and fast calculations. Moreover, we show that sperm cell linearity can be predicted by comparing our model's prediction with the linear extrapolation prediction, which is an important aspect of the sperm cell selection process.
Although only 501 sperm cells were used to train and evaluate our model, our dataset spans a large number of frames (114 229). Our annotation method can be used in the future to annotate additional paths, allowing the model to be further trained and evaluated on a broader dataset.
We predicted the sperm location up to 1.43 s into the future. As the number of predicted future steps grows, the error increases and can become large relative to sperm cell dimensions. This work can serve as a baseline for future research directions. First, the LSTM network was chosen because of its known advantages over other RNN architectures, notably its mitigation of the vanishing-gradient problem that limits learning of long-range dependencies. There are currently many variants of the LSTM network itself, [34][35][36] in addition to other architectures such as transformers. Like RNNs, transformer networks are designed to process sequential data; however, unlike RNNs, they can process the entire input at once, making them more efficient than LSTMs for some tasks. [37,38] Replacing the baseline LSTM network in our model with these networks might improve its performance.
Second, using the location annotations, we could build a model that predicts not only from the location signal but also from the frame images themselves, forecasting both the future location and the future image. Networks such as FutureGAN [39] and convolutional LSTM [40] have demonstrated good results in next-frame video prediction and might be applied here, not only to improve location prediction but also to extract other important features of sperm cell movement that are relevant to current and future applications. These networks might require a larger dataset, which can be produced relatively quickly using our annotation method.
Furthermore, as discussed in the Introduction, extensive research has been conducted on modeling the biophysical movement of sperm cells. Integrating these findings with our work could not only enhance location prediction accuracy but also provide a comprehensive understanding of sperm cell motility across various morphologies, media, and concentrations.
Finally, integrating our model with an automated system that selects sperm cells based on their motility, morphology, or DNA fragmentation [12][13][14][15][16][17][41][42] is another important step toward an accurate, efficient, fully automatic sperm selection framework for ICSI, which will ultimately lead to a higher success rate.

Figure 1. Dataset motility feature distributions. a) Distribution of the VSL and VCL motility features for all cells in the dataset. b) Distribution of the LIN motility feature for all cells in the dataset.
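The motility features in Figure 1 can be computed from a tracked path. The sketch below assumes the common CASA definitions: VCL is the point-to-point (curvilinear) path length divided by the elapsed time, VSL is the straight-line distance between the first and last points divided by the same time, and LIN = VSL/VCL as a ratio (it is often reported as a percentage instead). The function name `motility_features` is hypothetical, and the text does not confirm that the dataset used exactly these definitions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def motility_features(path: List[Point], fps: float) -> Tuple[float, float, float]:
    """Return (VSL, VCL, LIN) for a path sampled at `fps` frames per second.

    Velocities are in path units per second (e.g., micrometers/s).
    """
    if len(path) < 2:
        raise ValueError("need at least two points")
    duration = (len(path) - 1) / fps  # time between first and last sample
    # Curvilinear length: sum of distances between consecutive points.
    curvilinear = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    # Straight-line length: distance from the first to the last point.
    straight = math.dist(path[0], path[-1])
    vcl = curvilinear / duration
    vsl = straight / duration
    lin = vsl / vcl if vcl > 0 else 0.0  # 1.0 for a perfectly straight path
    return vsl, vcl, lin
```

For the zigzag path `[(0, 0), (3, 4), (6, 0)]` sampled at 1 fps, the curvilinear length is 10 and the straight-line length is 6 over 2 s, giving VSL = 3, VCL = 5, and LIN = 0.6.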

Figure 2. Position prediction process. The $n$ points of the input sequence (red), $x^{r}(t-n), \ldots, x^{r}(t)$, are obtained from the sperm cell path and normalized; the future position $x(t+k)$ is obtained and normalized; the signal is denoised; the input sequence is fed into the LSTM model; the predicted position $x^{p}(t+k)$ and the predicted trajectory velocity $v^{p}(t, t+k)$ are calculated; $\hat{x}(t+k)$ is calculated as a weighted average of the predicted values and a linear extrapolation; and the result is denormalized to the original coordinates, $\hat{x}^{r}(t+k)$.
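The normalize, predict, blend, and denormalize steps of Figure 2 can be sketched as below. Everything here is an illustrative assumption: the normalization (shifting so that $x^{r}(t)$ becomes the origin and dividing by a fixed `scale`), the reading of $v^{p}(t, t+k)$ as a mean per-frame velocity, the blending weights, and the `model` interface are hypothetical, the denoising step is omitted, and the caption does not state the weights actually used.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical model interface: (normalized path, k) -> (x_p, v_p), where x_p is
# the predicted normalized position at t+k and v_p is the predicted mean
# per-frame velocity over (t, t+k].
Model = Callable[[List[Point], int], Tuple[Point, Point]]


def normalize(path: Sequence[Point], scale: float) -> Tuple[List[Point], Point]:
    """Assumed normalization: make x(t) the origin and divide by `scale`."""
    ox, oy = path[-1]
    return [((x - ox) / scale, (y - oy) / scale) for x, y in path], (ox, oy)


def denormalize(p: Point, origin: Point, scale: float) -> Point:
    """Invert `normalize` for a single point."""
    return (p[0] * scale + origin[0], p[1] * scale + origin[1])


def predict_location(path: Sequence[Point], k: int, model: Model,
                     scale: float = 10.0,
                     weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> Point:
    """Blend the model position, the velocity-implied position, and a linear
    extrapolation with hypothetical weights, then map back to raw coordinates."""
    if len(path) < 2:
        raise ValueError("need at least two input points")
    norm, origin = normalize(path, scale)
    x_p, v_p = model(norm, k)
    # After normalization x(t) is (0, 0), so every estimate is an offset from x(t).
    x_v = (k * v_p[0], k * v_p[1])
    # Linear extrapolation: mean velocity over the window is -norm[0] / n.
    n = len(norm) - 1
    x_l = (-k * norm[0][0] / n, -k * norm[0][1] / n)
    w_p, w_v, w_l = weights
    total = w_p + w_v + w_l
    blended = tuple((w_p * a + w_v * b + w_l * c) / total
                    for a, b, c in zip(x_p, x_v, x_l))
    return denormalize(blended, origin, scale)
```

A weighted blend of this kind lets the linear term stabilize predictions on nearly straight paths while the learned terms correct for curvature; how the actual weights were chosen is not specified in the caption.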

Figure 3. a) Visual representation of a multilayer LSTM model predicting sperm cell location. Location points are fed into the multilayer LSTM network. The last hidden vector of the last LSTM layer is used as the input to the fully connected (FC) layer, which outputs the predicted location vector $x^{p}(t+k)$ and the predicted velocity vector $v^{p}(t+k)$. b) LSTM block. The previous cell state $c_{t-1}$ and hidden state $h_{t-1}$ are processed together with the current input sample point $x_t$, producing the current cell state $c_t$ and current hidden state $h_t$.
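For reference, the standard LSTM cell without peephole connections updates its states as shown below. We assume, without confirmation from the text, that the block in panel (b) follows this common formulation. In the last line, $h^{(L)}_{t}$ (the last hidden vector of the last, $L$-th, layer), $W_{\mathrm{FC}}$, and $b_{\mathrm{FC}}$ are hypothetical symbols for the FC layer of panel (a); $\sigma$ is the logistic sigmoid and $\odot$ the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh\left(c_t\right)\\
\begin{bmatrix} x^{p}(t+k) \\ v^{p}(t+k) \end{bmatrix} &= W_{\mathrm{FC}}\, h^{(L)}_{t} + b_{\mathrm{FC}}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps, which is why longer input sequences remain trainable.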

Figure 4. Error-rate heat maps. a) Error rate and b) relative error rate as a function of the input sequence length $n$ [s] and the number of future steps $k$ [s].

Figure 5. LSTM model versus linear extrapolation. a) Mean location error of our LSTM model and of linear extrapolation as a function of input sequence length and future steps. b) Close-up of the red-line section in (a) at $n = 571$ ms, clearly demonstrating the superiority of our model over regular linear extrapolation. c) Relative error for $n = 571$ ms.

Figure 6. Evaluation of model performance under varying input frame rates and magnifications. a) Mean error of our model and of linear extrapolation for different frame rates as a function of future steps, with an input sequence of 1 s. b) Accuracy of our model for different magnifications and levels of additive white Gaussian noise (different SNR values), for an input sequence of 1 s and a future step of 1 s.

Figure 7. Predicted VSL versus present VSL. a) Average difference of the VSL at $t$ [s], and of the VSL predicted at $t$ [s] for $t+1$ [s], from the actual VSL calculated at $t+1$ [s]. b,c) Closer look at the predicted and actual VSL values at specific times ($t = 1.5$ s and $t = 3.0$ s, respectively) compared with the actual VSL calculated at later times ($t = 2.5$ s and $t = 4.0$ s, respectively).

Figure 8. Comparison by LIN motility feature. a) Minimum and maximum differences between the mean errors of our LSTM model and of linear extrapolation as a function of future steps, shown separately for sperm cells with low LIN and with high LIN. b) Mean error of our model and of linear extrapolation as a function of the sperm cell LIN motility feature (linearity), for an input sequence length of $n = 714$ ms and future steps of $k = 714$ ms.

Table 3. Results summary: donor C, different sperm cell concentrations, and PVP medium.

Table 4. Input sequence length and output future steps per video.