Application of computer-generated images to train pattern recognition used in semiquantitative immunohistochemistry scoring

Application of computer-generated images to train pattern recognition used in semiquantitative immunohistochemistry scoring. APMIS. 2022; 130: 26-33. This study aimed to clarify whether the pattern recognition involved in scoring proliferation fractions can be trained with abstract computerized images of virtual tissues. Twenty computer-generated images with randomly distributed blue or red dots were scored by 12 probands (all co-workers or collaborators of the Institute of Pathology, University of Bonn). Afterward, the probands underwent a training phase during which they received immediate feedback on the actual rate of positivity after each image. Finally, the initial testing series was rescored. In a second round with 15 different probands, 20 Ki-67 immunohistochemistry images of tonsil tissue were scored, followed by the same training phase with computer-generated images, before the immunohistochemistry slides were scored again. Paired t-tests were used to compare the differences in mean rates pre- and post-training. Concerning computerized images, untrained probands scored the percentages of positive dots with a mean deviation from the true rates of 8.2%. Following training, the same testing series was scored significantly better, with a mean deviation of 4.9% (mean improvement 3.3%, p < 0.001). In scoring real immunohistochemistry slides, the training with computerized images also improved correct estimations, albeit to a lesser degree (mean improvement 1%, p = 0.03). Abstract computerized images of virtual tissues may be a useful tool to train and improve the accuracy of the pattern recognition involved in semiquantitative scoring of immunohistochemistry slides. As a side result, this study highlights the value of computer-generated images for verifying the performance of image-analysis software.

Immunohistochemistry has become the most important ancillary technique for surgical pathologists [1]. Next to qualitative diagnostic markers that aid in tumor typing, a growing number of markers demands quantitation of expression, either of intensity (e.g., HER2) or of the number of positive cells (Ki-67, hormone receptors) [2][3][4]. Hence, a recurrent task of pathologists is the quantitation of immunohistochemistry slides. Accurate determination of these scores is often crucial for further therapy planning. For instance, the grade of neuroendocrine tumors is defined by either the mitotic count or the proliferative fraction (Ki-67) [5]. Clinicians may also consider the proliferative activity of a tumor in other tumor entities to better estimate the prognosis or to tailor adjuvant therapy [6][7][8][9]. With the growing application of personalized approaches in oncology, an increasing number of tests (e.g., PD-L1, tumoral T-cell infiltrate) will necessitate quantitation or semiquantitation by pathologists.
Immunohistochemistry itself is, strictly speaking, not a quantitative technique; hence, it shows marked variability in its results. These may depend on preanalytical or analytical factors, but probably more important is the influence of the observer and the assessment method, and many studies have noted significant rates of variability [10][11][12][13][14]. It is widely believed that computerized analyses of immunohistochemistry images will eventually standardize immunohistochemistry scores, and a variety of software solutions for quantitation are already available. Presently, the majority of surgical pathologists who do not work in a digitized laboratory adhere to the conventional analog techniques of either (a) counting negative and positive cells ('accountant approach'), which is cumbersome and tedious, or (b) simply estimating the rate of positive cells ('eyeballing' or 'cowboy approach'), which is faster but may be inaccurate [15]. Due to the increasingly pressing workloads of contemporary pathologists, most will practice a mixed approach, counting a small area and eyeballing the overall case. Clearly, estimating rates of positivity by eyeballing can be a very efficient way to read an immunostain.
The authors wondered whether the pattern recognition required for reading, for example, a proliferation fraction could be trained with computer-simulated patterns, by presenting randomly generated grids of negative and positive dots. We also addressed the question of whether training with computer-generated images would result in better estimates when scoring real immunohistochemistry slides. This study presents and validates a very simple computer program to evaluate and train the recognition of positivity rates.

Computer platform and program to generate artificial images
In his youth, the first author used a Sinclair ZX Spectrum (48K), one of the most popular home computers in the UK and Europe in the early to mid-1980s, for hobby programming [16]. For this project, the ZX Spectrum emulator FUSE (http://fuse-emulator.sourceforge.net), running on an Apple MacBook Pro (2015), was used to write a program that generates different screens with randomly assigned positive and negative dots, reminiscent of cells with different proliferation fractions.
The program was written in Sinclair Basic (listed in Fig. 1). First, a small solid ball representing a nucleus or a cell is added to the user-defined graphics (subroutine, lines 2000-2070); then, the user is asked to define the range (minimum, maximum) of positivity rates to evaluate. Afterward, the program fills a three-dimensional array encompassing 10 screens with randomly generated positivity rates in the user-defined range, distributed over the screen at random again (subroutine, lines 2200-2330). The program ensures that the exact calculated number of positive cells is displayed. Finally, the main routine of the program (lines 130-170) presents the user with a succession of these 10 screens and asks for the estimated percentage of positive 'cells' for each (Fig. 2A). Afterward, the program presents a line-wise comparison of real and estimated numbers (Fig. 2B). The program code is provided as a snapshot file upon request.
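The original program is in Sinclair Basic; for readers without an emulator, the same generation logic can be sketched in modern code. The following Python sketch (function and color names are ours, not taken from the original listing) reproduces the essential guarantee described above: a random target rate is drawn from the user-defined range, and the exact corresponding number of positive cells is placed at random positions, so the displayed rate always matches the ground truth.

```python
import random

def generate_screen(n_cells=640, min_rate=0, max_rate=100):
    """Generate one virtual 'screen' of cells, each 'red' (positive) or
    'blue' (negative), with an exactly realized random positivity rate."""
    # Draw a target rate in the user-defined range, then derive the exact
    # number of positive cells so the true rate is known precisely.
    rate = random.randint(min_rate, max_rate)
    n_positive = round(n_cells * rate / 100)
    cells = ['red'] * n_positive + ['blue'] * (n_cells - n_positive)
    random.shuffle(cells)  # random spatial distribution over the screen
    true_rate = 100 * n_positive / n_cells
    return cells, true_rate

# Ten screens, as in the original main routine:
screens = [generate_screen() for _ in range(10)]
```

A proband's estimate for each screen would then simply be compared line-wise against `true_rate`, as the BASIC program does after the tenth screen.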

Probands
Twelve voluntary probands, consisting of seven histopathology residents in training, three basic scientists, and two visitors with a non-medical background, were analyzed. In the second round, another 15 voluntary probands, consisting of 12 beginners (<3 years of histopathology training) and 3 histopathology consultants, participated. No ethics approval was necessary to conduct this study.

Validation of training effect in correctly estimating positivity rates of computer images
The testing range of the program was set to 0 to 100% positive cells. To speed up testing and training, screenshots were compiled in a PowerPoint presentation (File S1). The 20 test screens displayed various rates of positive cells, ranging from 2 to 93% (mean 40%).

Fig. 1. Listing of the ZX Spectrum program that generates virtual histology images. This program was used to create random screens of positive and negative cells, written in Sinclair Basic.
The probands' results for the first 20 cases were recorded in SPSS (v25), followed by a training phase of evaluating 30 cases, which was not recorded. In the training phase, the correct percentage of red cells was displayed after each individual image, upon pressing a key, to give the proband immediate feedback. Eventually, the first 20 cases were scored again and the results were recorded.

Validation of training effect in correctly estimating positivity rates of real Ki-67 immunohistochemistry images
To check whether training semiquantitative image analysis with computer-generated images results in improved estimations on real-world immunohistochemistry images, a second round of this experiment was conducted with a different set of probands.
A series of 37 images (63x lens, Olympus BX43 microscope, digital 'DP20' camera) was taken from Ki-67 immunohistochemistry stains of tonsil tissues retrieved from the archives of the Institute of Pathology, University Hospital Bonn. Ki-67 immunohistochemistry was conducted on a Medac Autostainer platform (clone Mib-1, 1:100, heat-induced antigen retrieval with citrate buffer pH 6.0). Twenty images were visually selected for further study, chosen to represent low, intermediate, and high proliferative fractions. To define the ground truth, the proliferation rates of these images were determined by two image-analysis programs: the commercial and clinically validated 'Ki-67 Quantifier' (VMScope GmbH, Berlin, Germany) [17] and the widely used academic 'QuPath' software package [18].
The probands were shown these 20 immunohistochemistry images sequentially (approximately 30 seconds each) and asked to record the positivity rate individually. Then followed the training phase with a set of 30 computer images, in which the probands were asked to guess and discuss their estimations before the correct value was revealed. Directly afterward, the initial series of 20 immunohistochemistry images was re-rated in fast succession, again with recording of the individual ratings (File S2).
Verification of positivity rates by computerized image analysis using the software packages QuPath and Ki-67 Quantifier

To verify whether even primitive computer-generated images could be used to validate image-analysis software, a set of 10 computer-generated images was evaluated by the software packages QuPath and Ki-67 Quantifier. To analyze the images with QuPath, the color vectors for DAB and hematoxylin were adjusted in the image data to allow correct detection (DAB: 0.048, 0.706, 0.706; hematoxylin: 0.996, 0.065, 0.067). Then, the positivity rate was measured with QuPath's fast cell counts function. For the image analysis with Ki-67 Quantifier, which is a software package optimized for the detection of DAB-stained nuclei, the colors of the images were adapted as well (red turned to brown, cyan to blue).
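The principle behind this verification step can be illustrated independently of any particular software package. The Python sketch below (all names and color values are our own illustrative assumptions, and the trivial color-rule classifier merely stands in for a real package such as QuPath) shows the core idea: because the synthetic image is built with an exactly known rate of positive dots, any analysis software can be checked against that known truth.

```python
import random

def make_test_image(n_dots=640, rate=25, seed=0):
    """Synthetic ground-truth 'image': n_dots dots, exactly `rate`%
    'stained' (brown) and the rest 'unstained' (blue), as RGB tuples."""
    random.seed(seed)
    n_pos = round(n_dots * rate / 100)
    brown, blue = (150, 75, 0), (70, 70, 200)
    dots = [brown] * n_pos + [blue] * (n_dots - n_pos)
    random.shuffle(dots)
    return dots

def measured_rate(dots):
    """Trivial stand-in for an image-analysis package: classify each dot
    by a simple color rule (more red than blue = positive), report %."""
    positive = sum(1 for r, g, b in dots if r > b)
    return 100 * positive / len(dots)

img = make_test_image(rate=25)
# The software's output can be compared directly with the known 25%.
```

In the actual study, the same comparison was made between the predefined rates of the ten computer-generated images and the outputs of QuPath and Ki-67 Quantifier.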

Statistical analysis
Paired t-tests were used to compare the proliferation rates estimated pre- and post-training for all participants. Nominal p-values were used to evaluate statistical significance, defined as p < 0.05. Due to the exploratory nature of the experiments, no procedures were applied to correct p-values for multiple comparisons. All statistics were calculated with SPSS v25 (IBM Corp., Armonk, NY, USA) and the R Language and Environment for Statistical Computing, version 4.1.0 (R Core Team, 2021).
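For readers unfamiliar with the paired design, the test statistic is computed on the per-proband differences between pre- and post-training deviations, with n - 1 degrees of freedom. A minimal Python sketch follows; the data shown are illustrative values invented for the example, not the study's measurements, and in practice the p-value would be looked up from the t distribution (e.g., via scipy.stats.t.sf or directly in SPSS/R).

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t-test statistic for matched samples (df = n - 1).
    Positive t means deviations decreased after training."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Illustrative per-proband mean deviations (percentage points):
pre  = [8.1, 9.4, 6.2, 12.9, 7.0, 6.15]
post = [4.9, 5.5, 3.6,  6.5, 4.2, 4.00]
t, df = paired_t(pre, post)
```

A large positive t at the given degrees of freedom corresponds to a small p-value, i.e., a significant improvement after training.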

Performance of the BASIC program
Even though the screen presentation is very schematic, the random presentation of blue or red cells on this matrix of 640 cells is sufficiently complex to discourage exact counting and to promote 'eyeballing' estimates (Fig. 2A). The program itself may be used to evaluate 10 screens and then presents a line-wise comparison of real and estimated data (Fig. 2B). The Microsoft PowerPoint presentation with the 80 images (testing/training/re-testing) was completed by all participants in about half an hour.

Estimates of percentages on computer images (pre- and post-training)
For the twelve participants of the first round, the maximum deviation of an estimate from the real value was 36 percentage points, and the mean deviation (over all 20 images) ranged from 6.15 to 12.90 percentage points (mean 8.2). The participants reported that estimating the extremes was subjectively easier than the middle region from 10 to 90%. This is confirmed by the statistics, which show a mean deviation of 2.57 percentage points for images with <10% or >90% positive dots, compared with 10.05 percentage points for the middle range (10-90%).
Following training, the overall estimation performance was markedly improved. The maximum deviation was 29 percentage points, and the mean deviation ranged from 3.6 to 6.5 percentage points (mean 4.9). The difference of means (pre- vs post-training) was highly significant (Fig. 4A, p < 0.001).
The individual analysis of the participants' estimations revealed differences in their learning success. The paired t-test demonstrated significant (p < 0.05) differences of mean estimations pre- and post-training for four individuals; three more reached p-values between 0.05 and 0.1, and five still had closer estimates post-training but p-values > 0.1.
Measurements and estimates of percentages on real immunohistochemistry images (pre- and post-training with computer images)

The proliferation rates measured by the two image-analysis programs are shown in Fig. 3B. To define the ground truth for further analysis of proband data, the mean proliferation rate of both algorithms was used. Thus, the proliferation rates of the 20 immunohistochemistry images ranged from 2 to 81% (median 15%).
For the 15 participants, the maximum deviation of their estimates from the real values was 32 percentage points, and the mean deviation (over all 20 images) ranged from 3.6 to 10.3 percentage points (mean 6.5). Following training, the overall performance of estimations was slightly improved, with lower deviation rates in 11 participants. The mean deviation ranged from 2.7 to 9.4 percentage points (mean 5.5), which was significantly lower than pre-training (Fig. 4B, p = 0.034). In the group of residents, more variation but also more marked improvements were seen than in the group of consultants, whose ratings were more stable (Fig. 4B, see probands 13-15).

Verification of positivity of computer-generated images using the software packages Qupath and Ki-67 Quantifier
In all ten computer-generated images, QuPath detected the exact number of cells (640) and of positive 'cells'. The Ki-67 Quantifier detected between 639 and 643 cells per image (mean 640.2), a minimal deviation from the ground truth of 640 that did not affect the correct percentage of positive cells. In all ten computer-generated images, the correct percentage of positive 'cells' was detected by both programs.

DISCUSSION
Determining fractions of positive cells in immunohistochemistry slides is a daily task of surgical pathologists, and different approaches, ranging from accurate counts to rough estimates by eyeballing, are being practiced. This study employed computer-generated images displaying positive and negative dots to evaluate and train the capacity of observers to estimate the correct fraction of positive dots. To our knowledge, this new concept of training the pattern recognition of pathologists using computer-emulated images as a model of tissues has not been established so far but ought to be of general interest to surgical pathologists.
The statistical evaluation of a dozen participants who scored 20 screens, underwent training, and rescored the initial series of 20 screens clearly validates a positive training effect, allowing them to improve the accuracy of their percentage estimates by an average of 3.3 percentage points. Also, the maximum deviations from the true values decreased after training. Considering this finding, a recurring and obvious question is whether this observation, based on highly artificial and crude computer graphics, has any value in the sense that it translates into improved estimations of positivity rates on real immunohistochemistry slides. Therefore, a second round with a different set of scorers was conducted, who individually evaluated a series of 20 real Ki-67 stains of lymphatic tissue, then underwent a training phase with computerized images, and finally rescored the initial IHC slides. Again, following the training phase with computer images, the second round of Ki-67 estimates was on average closer to the correct values, which had been determined beforehand using two different image-analysis programs. This verifies that this type of training with artificial images had indeed increased the accuracy of estimates even on real immunohistochemistry slides, which has, to our knowledge, not been shown before. We are not entirely surprised by this finding, as the principles of pattern recognition, whether applied to abstract color dots on a computer screen or to stained cells under the microscope, ought to be identical. We are confident that this novel approach provides a simple way for pathologists to train their accuracy in these estimations. It is also obvious that the rate of improvement on real IHC slides is lower (1%) than in scoring the computerized slides of the first series (3.3%).
We assume that this is due to a 'loss in translation' from crude computer graphics to real-life IHC images, which are more complex to read: cells differ in size, are unevenly distributed, partially overlap, and the staining intensity varies; all these factors contribute to the difficulty of estimating slides quickly and correctly. Future versions of this training program will have to simulate human tissues more realistically, and it will be of interest to see whether this further increases the training success, as verified on real slides. These percentual differences (pre- and post-training) may appear small but may in fact impact patient care, for example in neuroendocrine tumors, in which the proliferation rates are crucial to determine the tumor grade [5]. Although the number of participants in this study is rather small, which precludes more detailed stratified analyses, it has already proven that pattern recognition can be trained in an abstract form to improve the correct percentual estimation of 'positive' items.
The data set generated from the abstract images also allowed us to identify the problematic zone: estimations in the extremes (<10% or >90%) were significantly closer to the real numbers than in the intermediate zone, which has also been reported for real slides of Ki-67 stains [15].
As intra- and interobserver variability is a proven obstacle in semiquantitative immunohistochemistry, every tool that increases the accuracy of IHC reading ought to be welcome [19,20]. Even though the graphical user interface provided by this program, running on an emulated historical home computer, is rather crude and obviously over-simplified, it is already sufficient to allow successful training of correct estimates of positive cells. The participants also had fun scoring the screens provided and particularly liked the simple charm of the outdated graphics, which reminded them of the video games of the 1980s. So far, very few examples of medical software for the ZX Spectrum have been published, and this program is a rather late addition to this stock [21,22].
With the increasingly widespread introduction of digital pathology and applications of artificial intelligence to image analysis, many believe that quantification tasks performed by humans will soon be history [23]. This may or may not be so. Studies comparing human and computer-based expression analysis demonstrate the general feasibility of automated analysis, yet also provide reasons for caution, as results may differ and supervision by an experienced pathologist is still warranted [24][25][26]. In the interpretation of biomarker expression, the correct selection of the regions of interest (ROI) is crucial, especially for heterogeneously expressed biomarkers, and future computer algorithms may have to accommodate this and provide not only values of overall percentual positivity but also detect hot spot regions [27].
For the time being, pathologists have to read biomarkers from glass slides in their daily routine work, and any training tool that helps them accomplish this task better should be welcome. A number of image-analysis applications have been developed to date, for example for Ki-67, and the first benchmark studies have been carried out to validate these on human tissues [28,29]. These studies demonstrate a high rate of concordance, which we can confirm, but we also found a minimal systematic difference between the two platforms used here. This may possibly be adjusted by changing detection parameters. A remaining problem in these comparative studies, however, is the missing gold standard, so the resulting data from different algorithms are usually only correlated with each other and possibly with clinical outcome data. Images of virtual tissues with an exactly predefined rate of positivity, as exemplified in this prototypical study, may be a valuable tool to train and validate algorithms for tissue analysis. It is reassuring to see that all ten computer-generated images read by the widely used academic quantitative software package QuPath and by the Ki-67 Quantifier were correctly quantified.

CONCLUSIONS
In summary, this study exemplifies that the pattern recognition capabilities needed to score rates of positivity, which are so crucial in evaluating actual immunohistochemistry slides, can be trained and improved with computer-generated images of virtual tissues. Further improvements of this program on a contemporary computer platform are under development to display a more natural variability of cell distribution, shape, and size, and may also incorporate variation of color intensity as seen in real immunohistochemistry stains. Other staining qualities, such as membranous or cytoplasmic staining, may be addressed in future versions. Other common diagnostic problems, such as the discrimination of tumor cells from inflammatory or stromal cells, may also be implemented in future emulations, to allow for more realistic training of life-like diagnostic situations.
We are grateful to the histopathology residents and consultants of the Institute of Pathology of the University of Bonn for their time and enthusiasm for this project and for being voluntary probands. We thank Kai Saeger (VMScope, Berlin) for providing us with Ki-67 measurements of the immunohistochemistry images. This research did not receive grants from any funding agency in the public, commercial, or not-for-profit sectors.

CONFLICTS OF INTEREST
The authors declare no conflict of interest.