Online variational inference on finite multivariate Beta mixture models for medical applications

Technological advances have led to the generation of large-scale complex data. Thus, extraction and retrieval of information to automatically discover latent patterns have been studied extensively across various domains of science and technology. Consequently, machine learning has experienced tremendous development and various statistical approaches have been suggested. In particular, data clustering has received a lot of attention. Finite mixture models have proven to be one of the most flexible and popular approaches to data clustering. When considering mixture models, three crucial aspects should be addressed. The first issue is choosing a distribution flexible enough to fit the data. In this paper, a model based on multivariate Beta distributions is proposed. The two other challenges in mixture models are the estimation of the model's parameters and its complexity. To tackle these challenges, variational inference techniques have demonstrated considerable robustness. In this paper, two methods are studied, namely batch and online variational inference, and the models are evaluated on four medical applications: image segmentation of colorectal cancer, multi-class colon tissue analysis, digital imaging in skin lesion diagnosis and computer-aided detection of malaria.


INTRODUCTION
Over the past decades, fast progress in computational power and data storage has yielded a great deal of complex data, and machine learning methods have experienced considerable development to extract critical information from data efficiently and automatically with minimal human interaction. In order to cover the wide variety of data, such as text, image and video, and the problem types exhibited across different domains, a diverse array of machine learning algorithms has been developed [1]. Many of these algorithms focus on image processing and computer vision.
A critical scientific and practical goal for the majority of these algorithms is to characterize their capabilities and robustness. Supervised learning systems have been widely used over the past years. Deep learning platforms [2,3] have been demonstrated to outperform previous supervised machine learning techniques in several fields. Convolutional neural networks [4] and deep belief networks [5] are examples of currently remarkable techniques in applications such as image analysis [6], emotion detection [7], object detection [8][9][10][11], synthetic aperture radar image analysis [12,13], remote sensing [14], the Internet of Things [15] and smart cities [16]. Similarly, modern medical imaging has witnessed admirable progress and has become one of the attention-grabbing domains in research and technology. Consequently, statistical modelling has been applied successfully in this domain and achieved state-of-the-art performance in image segmentation and computer-aided detection (CAD) to assist professionals in the interpretation of medical images, digital pathology and other medical datasets [17]. Due to the increasing digitization of medical image results [18] and the prompt progression of artificial intelligence (AI) and machine learning (ML), various methods have been proposed [18]. However, the nature of medical data and the decision-making needs of healthcare teams have led to limited success in applying the current algorithms to routine clinical cases [19].

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
IET Image Process. 2021;15:1869-1882. wileyonlinelibrary.com/iet-ipr

FIGURE 1 Four examples of multivariate Beta distributions

It should be noted that some deep learning platforms achieve good results in classification tasks in various medical domains such as brain image analysis [20], pathological image analysis [21], cardiac image analysis [22], breast histology image analysis [23], blood cell analysis [24] and liver tumour analysis [25]. However, they may fail, as they are unpredictable and unexplainable [26][27][28]. It should be emphasized that deep learning models need large-scale labelled data for training, and the publicly available datasets are limited, since confidentiality is a core principle in healthcare. This is not the only issue: labelling medical data is a great obstacle, as it can only be performed by professional physicians and requires a considerable amount of budget, time and skill. It is noteworthy that medical data are heterogeneous by nature, and to arrive at a better decision the model should be able to deal with various types of data, such as patient history, images, videos and signals, simultaneously. These characteristics and demands motivated us to focus on unsupervised, label-free machine learning models. Clustering methods, especially finite mixture models, are among the best-known methods for modelling heterogeneous data comprising multiple distributions [29]. The first challenging aspect which should be carefully addressed is choosing the distribution that most accurately represents the corresponding components of the mixture when modelling the data. Gaussian mixture models (GMM) have been widely adopted in various applications [30]. However, in recent works other alternatives such as the Dirichlet [31,32] and generalized Dirichlet [33][34][35] distributions have demonstrated considerable flexibility and high potential to describe non-Gaussian data. Hence, in this paper we focus on multivariate Beta mixture models, which are built on a very flexible distribution that does not have a constant shape and is appropriate for modelling data skewness. Furthermore, given its bounded nature, it is better suited to compactly supported data. Figure 1 illustrates the high potential of this distribution.
To design a clustering algorithm, parameter estimation is a crucial step and has a significant impact on the performance of model learning. The majority of parameter estimation methods apply either deterministic or Bayesian techniques. The former is based on classic maximum likelihood (ML) inference, optimizing the model likelihood function via the expectation-maximization (EM) framework [36]. However, this method is sensitive to initialization and carries disadvantages such as over-fitting. To avoid such drawbacks, Bayesian techniques have been proposed. In this improved approach, prior knowledge is applied in a principled way and the parameter uncertainty is then marginalized via Laplace's approximation or Markov chain Monte Carlo (MCMC) simulation techniques [37,38]. Unfortunately, Bayesian inference faces some issues of its own. For instance, Laplace's approximation is generally imprecise and MCMC techniques are computationally expensive. Recently, several research efforts have focused on variational inference [39] as a preferable and efficient alternative technique for the learning of statistical models. Indeed, it can be viewed as an effective compromise between deterministic and Bayesian approaches. Variational inference approximates the model posterior distribution by minimizing the Kullback-Leibler (KL) divergence between the true posterior and an approximating distribution. Another crucial issue when using mixture models is defining the model structure, that is, the number of mixture components that describes the data without over-fitting or under-fitting. Model selection techniques such as MML or MDL [31,40,41] have been considered. However, they are time-consuming, since they must evaluate a given selection criterion for several numbers of mixture components, and this high computational cost has limited their application.
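As a concrete illustration of the quantity being minimized, the following sketch computes KL(Q ‖ P) for two small discrete distributions (hypothetical values, not the model posterior):

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions given as probability lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.4, 0.6]
p = [0.5, 0.5]
print(kl_divergence(q, q))       # 0.0: KL vanishes when Q equals P
print(kl_divergence(q, p) >= 0)  # True: KL is always non-negative
```

The two printed properties, non-negativity and equality to zero only when the distributions match, are exactly what makes maximizing the lower bound equivalent to approximating the true posterior.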
One of the advantages of variational inference is that it automatically determines the number of mixture components as part of the Bayesian inference procedure [42,43]. Variational learning can also be performed online [44], mainly motivated by the fact that such algorithms allow data instances to be processed sequentially, which is important for large-scale data and real-time applications. This technique is significantly faster than traditional variational learning. In this paper, we propose two novel algorithms for batch and online variational learning based on multivariate Beta mixture models. We evaluate the performance of the proposed frameworks on challenging medical applications and compare the results with batch and online variational learning for Gaussian mixture models.
The structure of the rest of this paper is as follows: Section 2 is devoted to the description of the finite multivariate Beta mixture model. Sections 3 and 4 describe the batch and online variational learning algorithms, respectively. We present the experimental results in Section 5, considering four real-world applications. Finally, we conclude in Section 6.

FINITE MULTIVARIATE BETA MIXTURE MODEL
In this section, we give a brief description of finite multivariate Beta mixture models. Let us assume that an observation following a multivariate Beta (MB) distribution [45,46] is defined by $\vec{X}_i = (x_{i1}, \ldots, x_{iD})$, a $D$-dimensional vector whose elements are all positive and less than one. $\Gamma(\cdot)$ denotes the Gamma function. The probability density function of the MB distribution is expressed by (1).
Let us consider a set of $N$ independent, identically distributed vectors $\mathcal{X} = \{\vec{X}_1, \ldots, \vec{X}_N\}$ generated from a multivariate Beta mixture model composed of $M$ different clusters. The multivariate Beta mixture model is represented by (2), where $\vec{\pi} = (\pi_1, \ldots, \pi_M)$ is the set of mixing coefficients with two constraints, $\sum_{j=1}^{M} \pi_j = 1$ and $\pi_j \geq 0$. $\vec{\alpha}_j$ and $\pi_j$ are the shape parameters and weight of component $j$, where $j = 1, \ldots, M$. The likelihood function for $N$ samples is then given by (3). Four examples of multivariate Beta mixture models (MBMM) are shown in Figure 2.
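Since the displayed mixture density and likelihood did not survive extraction, their standard finite-mixture forms are reconstructed below from the surrounding definitions; $\mathrm{MB}(\cdot \mid \vec{\alpha}_j)$ denotes the multivariate Beta density of component $j$:

```latex
p(\vec{X}_i \mid \vec{\pi}, \vec{\alpha}) = \sum_{j=1}^{M} \pi_j \, \mathrm{MB}(\vec{X}_i \mid \vec{\alpha}_j),
\qquad
p(\mathcal{X} \mid \vec{\pi}, \vec{\alpha}) = \prod_{i=1}^{N} \sum_{j=1}^{M} \pi_j \, \mathrm{MB}(\vec{X}_i \mid \vec{\alpha}_j)
```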
In mixture models, we define an auxiliary variable $\mathcal{Z}$ to allocate each sample to one of the $M$ components. Thus, we introduce $\vec{Z}_i = (Z_{i1}, \ldots, Z_{iM})$, where $Z_{ij}$ is a binary random variable such that $Z_{ij} = 1$ if $\vec{X}_i$ belongs to cluster $j$ and $Z_{ij} = 0$ otherwise. The distribution of $\mathcal{Z} = \{\vec{Z}_1, \ldots, \vec{Z}_N\}$, as a set of "membership vectors", is specified by (4) in terms of the mixing coefficients $\vec{\pi}$ [47].
Thus, the conditional probability of the data given $\mathcal{Z}$ is expressed by (5).
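The two displayed equations referenced here were lost in extraction; their standard forms, implied by the binary membership variables defined above, are (equation numbers inferred from the surrounding references):

```latex
p(\mathcal{Z} \mid \vec{\pi}) = \prod_{i=1}^{N} \prod_{j=1}^{M} \pi_j^{Z_{ij}}
\quad (4),
\qquad
p(\mathcal{X} \mid \mathcal{Z}, \vec{\alpha}) = \prod_{i=1}^{N} \prod_{j=1}^{M} \mathrm{MB}(\vec{X}_i \mid \vec{\alpha}_j)^{Z_{ij}}
\quad (5)
```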

BATCH VARIATIONAL LEARNING
Variational approaches have been widely applied to approximate the posterior distributions of a variety of statistical models. In this section, as a first step, we develop a batch variational learning algorithm for the proposed mixture model. Our main objective is an optimized method capable of estimating the parameters of the mixture model and determining its structure and complexity simultaneously.

Prior specification
A crucial challenge in variational learning is placing prior distributions over the parameters. To simplify this approach, we would like a conjugate prior for the $\vec{\alpha}$ parameters. Unfortunately, a conjugate prior does not exist. In this case, we adopt a Gamma prior as an approximation, assuming that the parameters are statistically independent [48,49]. The probability density function of $\alpha_{jl}$ is described by (6), where $u_{jl}$ and $v_{jl}$ are positive hyperparameters.
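The Gamma prior referenced as (6) takes the standard form below; the rate symbol $v_{jl}$ is our reading of the second hyperparameter, whose Greek letter was lost in extraction but is consistent with the $\Delta v^{*(t)}$ updates appearing later:

```latex
p(\alpha_{jl}) = \mathcal{G}(\alpha_{jl} \mid u_{jl}, v_{jl})
             = \frac{v_{jl}^{\,u_{jl}}}{\Gamma(u_{jl})} \, \alpha_{jl}^{\,u_{jl}-1} \, e^{-v_{jl}\alpha_{jl}}
\quad (6)
```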
The prior over the model parameters $\vec{\alpha}$ is given by (7). Thus, the joint distribution of all random variables is given by (8).

Learning algorithm
In order to estimate the parameters of the model and select the correct number of components, we estimate the mixing coefficients $\vec{\pi}$ by maximizing the marginal likelihood $p(\mathcal{X} \mid \vec{\pi})$ expressed by (9).
As the marginalization in this equation is intractable, we apply variational inference [50] to calculate a lower bound on $p(\mathcal{X} \mid \vec{\pi})$. The variational lower bound $\mathcal{L}(Q)$ of the logarithm of the marginal likelihood $p(\mathcal{X} \mid \vec{\pi})$ is defined by (10), where $\Theta = \{\mathcal{Z}, \vec{\alpha}\}$ and $Q(\Theta)$ is an approximation to the true posterior distribution $p(\Theta \mid \mathcal{X}, \vec{\pi})$. This approximation is determined by computing the KL divergence between $Q(\Theta)$ and $p(\Theta \mid \mathcal{X}, \vec{\pi})$, defined by (11).
The KL divergence represents the dissimilarity between the true posterior and its approximation. We have $\mathrm{KL}(Q \,\|\, P) \geq 0$, with equality if and only if $Q(\Theta) = p(\Theta \mid \mathcal{X}, \vec{\pi})$. Considering the above equations, it is clear that $\mathcal{L}(Q) \leq \ln p(\mathcal{X} \mid \vec{\pi})$; thus $\mathcal{L}(Q)$ is a lower bound on $\ln p(\mathcal{X} \mid \vec{\pi})$. By maximizing the lower bound, the KL divergence is minimized and hence the true posterior distribution is approximated. Consequently, we consider a restricted, tractable family of distributions $Q(\Theta)$ that is still flexible enough to approximate the true posterior properly. We apply a common method, namely mean-field theory, to adopt factorization assumptions restricting the form of $Q(\Theta)$, so that the posterior distribution $Q(\Theta)$ can be factorized [48]. We find the variational solution for $\mathcal{L}(Q)$ with respect to each of the parameters in order to maximize the lower bound; for a specific parameter $\Theta_s$, the optimal solution can be expressed by taking the expectation of the log joint distribution. By taking the exponential of both sides of this equation and normalizing, we obtain the optimal factor, where $\langle \cdot \rangle_{i \neq s}$ denotes the expectation with respect to all parameters other than $\Theta_s$. The solutions for the optimal variational posteriors, as derived in Appendix A, are given by (16) and (17), where $\tilde{R}_j$ follows [51] and its calculation is presented in Appendix A.
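The mean-field factorization and the optimal-factor formulas referenced here are the standard results, reconstructed for readability since the displayed equations were garbled in extraction:

```latex
Q(\Theta) = Q(\mathcal{Z})\, Q(\vec{\alpha}),
\qquad
\ln Q_s(\Theta_s) = \big\langle \ln p(\mathcal{X}, \Theta) \big\rangle_{i \neq s} + \mathrm{const},
\qquad
Q_s(\Theta_s) = \frac{\exp \big\langle \ln p(\mathcal{X}, \Theta) \big\rangle_{i \neq s}}
                     {\int \exp \big\langle \ln p(\mathcal{X}, \Theta) \big\rangle_{i \neq s} \, d\Theta_s}
```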
$\Psi(\cdot)$ and $\Psi'(\cdot)$ in the above equations represent the digamma and trigamma functions. The expectations of the quantities mentioned above are given by (23) to (27). In variational learning, we trace convergence systematically by monitoring the variational lower bound during the re-estimation step. Indeed, at each step of the iterative updating procedure, the value of $\mathcal{L}(Q)$ should not decrease. Thus, we terminate the optimization when the lower bound increases by less than a threshold compared to the previously estimated value. The lower bound in (10) is evaluated as explained in detail in Appendix A. The complete batch variational learning procedure is summarized in Algorithm 1.
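The convergence monitoring described above can be sketched as a generic coordinate-ascent loop. The `update_step` and `elbo` callables below are placeholders standing in for the actual variational updates of $Q(\mathcal{Z})$ and $Q(\vec{\alpha})$ and the bound evaluation; the toy stand-in merely demonstrates the stopping rule:

```python
def run_variational_loop(update_step, elbo, threshold=1e-6, max_iter=500):
    """Coordinate-ascent skeleton: repeat the variational updates until the
    increase in the lower bound L(Q) falls below a threshold."""
    prev = None
    for iteration in range(1, max_iter + 1):
        update_step()        # stand-in for updating Q(Z) and Q(alpha) in turn
        current = elbo()     # monitor the lower bound after each full sweep
        if prev is not None and current - prev < threshold:
            return current, iteration
        prev = current
    return prev, max_iter

# Toy stand-in: a bound that approaches 0 from below, halving the gap each sweep.
state = {"bound": -1.0}
step = lambda: state.update(bound=state["bound"] / 2)
final, iters = run_variational_loop(step, lambda: state["bound"])
print(final, iters)
```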

ONLINE VARIATIONAL LEARNING OF MULTIVARIATE BETA MIXTURE MODELS
In this section, we extend the classic variational inference approach [49] to the online setting for learning multivariate Beta mixture models by adopting the framework proposed in [44], since real-world observations arrive in an online manner. Thus, we assume that a certain amount of data has been observed up to time $t$, and the corresponding lower bound is defined by (29) [52]. In this method, the current variational lower bound expressed by (29) is maximized consecutively. In more detail, consider a set of observations $\{\vec{X}_1, \ldots, \vec{X}_{t-1}\}$. When a new observation $\vec{X}_t$ arrives, we maximize and update the current lower bound $\mathcal{L}^{(t)}(Q)$ with respect to $Q(\vec{Z}_t)$, while $Q(\vec{\alpha})$ and $\pi_j$ are held at $Q^{(t-1)}(\vec{\alpha})$ and $\pi_j^{(t-1)}$, respectively. The variational solution for $Q(\vec{Z}_t)$ then follows the batch solution, with $\tilde{R}_j$ calculated as in Appendix A. Next, applying the gradient method, we hold $Q(\vec{Z}_t)$ fixed so that the lower bound is maximized with respect to $Q^{(t)}(\vec{\alpha})$ and $\pi_j^{(t)}$. The natural gradients are estimated by multiplying the gradients of the parameters by the inverse of the coefficient matrix, which is then removed so that the natural gradients of the posterior probabilities can be computed for an efficient online learning framework. This yields the optimal solutions for the parameter updates, including the solution for the mixing coefficients, where $\eta_t$ denotes the learning rate [53] described by (37), subject to the two constraints $\kappa \in (0.5, 1]$ and $\tau \geq 0$.
The main idea of the learning rate is to discount earlier, less accurate estimations of the lower bound and accelerate the convergence rate. The natural gradients then follow, with $\Delta v^{*(t)}$ and its counterparts defined in terms of $\langle \ln \alpha_{jd} \rangle$ and $\langle (\ln \alpha_{jd} - \overline{\ln \alpha_{jd}})^2 \rangle$, which are similar to (25) and (26), respectively. When a new data point arrives, an additional distribution is added to the lower bound.
Two constraints, expressed by (41), ensure the convergence of the lower bound, since the online learning framework can be considered a stochastic approximation. Our model is described completely in Algorithm 2. We applied k-means to initialize the parameters.
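The decaying step size used in this style of online learning [44,53] is commonly written as $\eta_t = (\tau + t)^{-\kappa}$; the symbols $\kappa$ (forgetting rate) and $\tau$ (delay) are our reading of the two constraints above, whose Greek letters were lost in extraction. A minimal sketch:

```python
def learning_rate(t, kappa=0.7, tau=64.0):
    """Step size eta_t = (tau + t) ** (-kappa); kappa in (0.5, 1] and tau >= 0
    give the usual stochastic-approximation guarantees (the sum of the rates
    diverges while the sum of their squares converges)."""
    assert 0.5 < kappa <= 1.0 and tau >= 0.0
    return (tau + t) ** (-kappa)

rates = [learning_rate(t) for t in range(1, 6)]
print(rates)  # strictly decreasing step sizes
```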

EXPERIMENTAL RESULTS
In this section, we validate the performance of online variational learning of the multivariate Beta mixture model (OVMBMM) on four strong candidate real-world medical applications, namely image segmentation of colorectal cancer, multi-class colon tissue analysis, digital imaging in skin lesion diagnosis and computer-aided detection (CAD) of malaria. It is worth mentioning that our main motivation for focusing on medical applications is that advanced analytical and statistical methods provide more precise information to healthcare systems, which is a valuable asset for patient care: more information, better understanding and improved analysis result in proper decisions at different steps such as screening, diagnosis and treatment. The significance of machine learning in healthcare applications is especially pronounced in the development of high-performance medical image processing systems. Computer-aided detection (CADe) locates clinically significant objects in medical images, while computer-aided diagnosis (CADx) generally involves processing and analysing high-dimensional datasets beyond the scope of human capability. In both of these domains, advanced clinical insights ultimately lead to improved quality of service, better outcomes, lower healthcare costs and increased patient satisfaction. In some disciplines, such as radiology and pathology, identifying abnormalities and marking the critical areas are vital to improving the efficiency, reliability and accuracy of diagnosis. Moreover, such medical testing techniques generate large-scale datasets for which online variational inference is a proper modelling method.
Here, we compare four algorithms, namely batch variational multivariate Beta mixture model (BVMBMM), online variational multivariate Beta mixture model (OVMBMM), batch variational Gaussian mixture model (BVGMM) and online variational Gaussian mixture model (OVGMM), in terms of their accuracy based on the confusion matrix and, for image segmentation, the Jaccard similarity index.
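The Jaccard similarity index used for the segmentation comparisons is the ratio of the intersection to the union of the predicted and ground-truth foreground. A minimal sketch on toy binary masks (the masks below are illustrative only):

```python
def jaccard_index(mask_a, mask_b):
    """Jaccard similarity |A intersect B| / |A union B| for binary
    segmentation masks given as flat sequences of 0/1 labels."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 or b == 1)
    return inter / union if union else 1.0

pred  = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
print(jaccard_index(pred, truth))  # 2 shared foreground pixels out of 4 -> 0.5
```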

Image segmentation in colorectal cancer
According to World Health Organization (WHO) reports, cancer is the second leading cause of death globally, taking the life of 1 in 6 people and accounting for an estimated 9.6 million deaths in 2018 [54]. Colorectal cancer, with 1.80 million cases, ranks third among the most common cancers and second among the most common causes of cancer death, with 862,000 deaths. Early detection and treatment have a great impact on reducing cancer mortality. Through early identification and avoidance of delays in care, the patient is more likely to survive by responding effectively to treatment. This goal is achieved by awareness, access to clinical evaluation, diagnosis and access to treatment [54]. One valuable way to avoid late-stage detection is screening, which aims to find individuals with abnormalities or pre-cancer who have not yet developed symptoms. As one of the main steps of screening, tissue or cell samples can be taken from the intestine or stomach to determine the causes of abnormalities or the presence and effects of cancer. Hence, histopathology analysis has a significant role and poses a critical challenge: biological tissues have various structures, and precise tumour segmentation and accurate pattern detection are tough tasks for humans. In recent years, since tissue specimens have been digitized, automated analysis of histopathology slides [55] has become a key requirement for assessing quantitative morphology, grading cancer aggressiveness and reliably differentiating tumour types, which is reflected by the formation and architecture of glands. Machine learning techniques have subsequently demonstrated superior performance over conventional methods [56]. Here we focus on two applications related to colorectal cancer, the first being image segmentation of a publicly available collection of microscopy images of colon cancer cells from the Broad Bioimage Benchmark Collection (BBBC018v1) [57,58].
The image set consists of 56 fields of view (four from each of 14 samples). Because there are three channels, there are 168 image files. The samples were stained with Hoechst 33342, pH3 and phalloidin. Hoechst 33342 is a DNA stain that labels the nucleus. Phospho-histone H3 indicates mitosis. Phalloidin labels actin, which is present in the cytoplasm. This image set is accompanied by ground truth data against which automated image analysis can be tested. The ground truth consists of outlines of nuclei and cells. In Figure 4, some examples of tissues and nuclei with their corresponding ground truth are illustrated.

FIGURE 5 Sample images from colon dataset
The results of validating our proposed frameworks, based on the Jaccard similarity index, are presented in Table 1 and show that our model outperforms the three other alternatives.

Multiclass colon tissue analysis
The second application is a multiclass tissue clustering problem: categorization of a collection of textures in histological images of human colorectal cancer. The term texture refers to the specific properties, pattern and structure of image regions. In medical image analysis, texture analysis methods are applied to classify tissue types. Human solid tumours are complex structures in which several distinct tissue types are typically integrated, including non-malignant tissue, necrotic regions, tumour stroma, immune-cell infiltration and islets of remaining tumour cells. Moreover, tumour progression over time leads to changes in tissue architecture. In digital pathology, automatic recognition of different tissue types helps estimate the tumour/stroma ratio on histological samples and can provide quantitative, high-throughput analysis of the tumour tissue.
To assess performance, we evaluate our models on a publicly available collection of textures in colorectal cancer histology [59,60]. It includes 5000 histological images of human colorectal cancer covering eight different types of tissue. In Figure 5, three samples of the eight tissue classes are shown, exhibiting a variety of illumination, stain intensity and tissue textures. These classes are: tumour epithelium; simple stroma (homogeneous composition, including tumoural or extra-tumoural stroma, smooth muscle and single tumour or immune cells); complex stroma (containing single tumour cells and/or few immune cells); immune cells (including immune-cell conglomerates and sub-mucosal lymphoid follicles); debris (including necrosis, haemorrhage and mucus); normal mucosal glands; adipose tissue; and background without any tissue. As an important step, we extracted the features of each image using one of the most popular techniques, namely the scale-invariant feature transform (SIFT) [61] with a bag of visual words (BOVW). The general idea of this method is to represent an image as a set of features comprising keypoints and descriptors. The keypoints of each image are invariant to geometrical transformation and illumination, the descriptors describe these points, and both are extracted by SIFT. We then construct a vocabulary from the keypoints and descriptors to represent each image as a frequency histogram of features, which can be applied in image categorization to find images with similar patterns, so that tissues can be differentiated as in histopathological evaluation and the tissue composition quantified. The results of testing our algorithms are presented in Table 2 and clearly show the superior performance of OVMBMM.
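In practice the SIFT descriptors come from a library such as OpenCV and the vocabulary from k-means; the quantization step that turns an image's descriptors into a BOVW frequency histogram can be sketched as follows (the two-word vocabulary and 2-D "descriptors" are toy stand-ins for 128-D SIFT vectors):

```python
def bovw_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (squared
    Euclidean distance) and return a normalized frequency histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        dists = [sum((di - wi) ** 2 for di, wi in zip(d, w)) for w in vocabulary]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

vocab = [(0.0, 0.0), (1.0, 1.0)]              # toy 2-word vocabulary
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)]  # toy local descriptors
hist = bovw_histogram(descs, vocab)
print(hist)
```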

Digital imaging in Melanoma lesion detection and segmentation
As stated by the WHO, 1.04 million cases of skin cancer were reported in 2018, ranking it as the fifth most common cancer [54]. The major cause of death from skin cancer is malignant melanoma, which is caused by the abnormal multiplication of cells; however, it is far less prevalent than non-melanoma skin cancers. This type of cancer is primarily diagnosed visually. After initial clinical screening and dermoscopic analysis, a biopsy and histopathological sample is analysed. Digital imaging can help recognize and treat melanoma in its earliest stages, which reduces melanoma mortality, as early disease is readily curable. Automated diagnosis and digital images of skin lesions can aid the diagnosis of melanoma through teledermatology. The quality of skin lesion imaging has a great impact on early detection and improves the efficiency, effectiveness and accuracy of melanoma diagnosis. Unprofessional screening, by contrast, results in unnecessary biopsies and excisions of benign skin lesions. It is also difficult to distinguish early-stage melanoma from benign skin lesions with similar structure, which may lead to missed positive cases, useless advanced clinical examinations and misclassification of benign and malignant melanoma. Thus, the expertise of the examiner and the clinical setting play significant roles. The evolution of digital imaging in skin lesion diagnosis permits the early detection of atypical lesions; therefore, unnecessary biopsies of benign tumours are decreased or avoided. Recent enhancements in computer vision, machine learning algorithms and digital dermoscopic techniques can assist in image segmentation and retrieval, facilitate follow-up and reduce diagnostic bias and misclassification rates. These admirable advantages have attracted the attention of researchers and increased the focus on computer-aided systems over the last few decades.
To evaluate automated melanoma region segmentation using dermoscopic images, we tested our models on a public ISIC dataset [62] containing 23,906 images of skin lesions with their corresponding ground truths. In Figure 6, six samples of this dataset and their ground truths are illustrated. As in the previous experiments, we compared four models. The results, based on the Jaccard similarity index, are presented in Table 3. As shown, OVMBMM is more accurate than the other algorithms.

Computer-aided detection of malaria
Malaria is a serious infectious disease caused by a blood parasite injected into the human body by the female Anopheles mosquito. According to statistics announced by the WHO, 219 million malaria cases and 435,000 malaria deaths were reported. To manage and monitor this disease efficiently, it is crucial to diagnose it promptly and accurately, as misdiagnosis can lead to significant morbidity and mortality. With the help of parasitological and clinical microscopy, considered the mainstay of parasite-based diagnosis, the infection can be identified and confirmed precisely. Microscopy examination for malaria, the most prevalent and commonly practised method, involves visual examination of blood smears to test for the presence or absence of parasites in the blood, quantification of parasitemia, species identification and life-cycle classification. However, we should bear in mind that an acceptable microscopy service with consistently accurate results is time-consuming and costly and depends on the qualification of the experts and the load of samples. The WHO reported that more than 208 million patients were tested by microscopic examination in 2017. Such a massive number of ongoing examinations indicates the significance of automating the analysis of samples. To overcome issues such as error-prone and time-consuming procedures, CAD and mathematical morphology are applied as effective tools for computer-aided malaria detection. These techniques are widely used for image processing purposes and have been employed successfully in biomedical image analysis. However, computer vision techniques for diagnosis, recognition and differentiation between non-parasitic and infected samples represent a relatively new domain of research. In our work, we applied our models to a dataset provided by the NIH, consisting of thin blood smear slide images from the Malaria Screener research activity [64].
The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.
A few examples of this dataset are illustrated in Figure 7 including six parasitized and six uninfected blood smear samples.
In this experiment, the features are extracted by BOVW and SIFT. Finally, to evaluate the performance of our method, we compared the results of the four models, illustrated in Table 4, indicating that OVMBMM produces the most accurate outputs. It is worth mentioning that these results show online variational learning to be a robust method, as physicians analyse large amounts of pathological samples.
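Since the clustering output is unlabelled, accuracy against a confusion matrix requires mapping each cluster to a ground-truth class first. A simple majority-vote mapping (illustrative, not necessarily the exact protocol used here) can be sketched as:

```python
from collections import Counter

def clustering_accuracy(cluster_ids, true_labels):
    """Map every cluster to its majority ground-truth label, then score the
    fraction of samples whose mapped label matches the truth."""
    mapping = {}
    for c in set(cluster_ids):
        members = [t for ci, t in zip(cluster_ids, true_labels) if ci == c]
        mapping[c] = Counter(members).most_common(1)[0][0]
    correct = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
    return correct / len(true_labels)

clusters = [0, 0, 0, 1, 1, 1]
labels   = ["parasitized", "parasitized", "uninfected",
            "uninfected", "uninfected", "uninfected"]
acc = clustering_accuracy(clusters, labels)
print(acc)  # 5 of 6 samples mapped correctly
```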

CONCLUSION
This article introduces a novel unsupervised learning approach based on variational inference for the finite multivariate Beta mixture model, with a main focus on medical applications. Considering the rich and various forms of medical information, artificial intelligence has a great impact on the diagnosis and treatment of diseases. We developed our models within the variational Bayesian inference framework as a powerful alternative to deterministic methods such as maximum likelihood and to conventional Bayesian inference, which has a high computational cost. In our proposed method, convergence and the simultaneous estimation of parameters and model complexity are guaranteed within an iterative process. We then employ online variational learning as an extension of the classic method, which not only keeps the advantages of the previous models but also speeds up the convergence rate significantly. Indeed, the online algorithm has a great capability to handle demanding large-scale datasets in real time.
The additive constant term includes any term which is independent of $Q_s(\Theta_s)$. $Q(\mathcal{Z})$ and $Q(\vec{\alpha})$ are derived from the logarithm of the joint distribution $p(\mathcal{X}, \Theta)$.

A.1 Proof of Equation (16): variational solution of $Q(\mathcal{Z})$
As $R_j$ is intractable and has no closed form, standard variational inference cannot be applied directly. Thus, we approximate the lower bound to obtain a closed-form expression via a second-order Taylor series expansion. The function $R_j$ is approximated about $\vec{\alpha}$; $\tilde{R}_j$ and $(\bar{\alpha}_{j1}, \ldots, \bar{\alpha}_{jD})$ denote the approximations of $R_j$ and $\vec{\alpha}$, respectively. The approximation of $R_j$ is proved in [32], and after replacing $R_j$ by $\tilde{R}_j$, the optimization of (A.2) is tractable. The optimal solution for $Q(\mathcal{Z})$ can then be derived. By taking the exponential of both sides of (A.5) and normalizing the distribution, $Q(\mathcal{Z})$ follows, where the responsibilities $r_{ij}$ are positive and sum to one.
Thus, the standard result for $Q(\mathcal{Z})$ follows.

A.2 Proof of Equation (17): variational solution of $Q(\vec{\alpha})$

Under the assumption that the parameters $\alpha_{jl}$ are independent, $Q(\vec{\alpha})$ can be factorized over its components. Considering a specific factor $Q(\alpha_{jl})$, the variational optimization is derived by taking the logarithm of the optimized factor. As in the other cases, the logarithm of the variational solution $Q(\alpha_{jl})$ follows; this approximation is also a strict lower bound of $\mathcal{L}(\alpha_{jl})$, and (A.15) is the logarithmic form of a Gamma distribution. By taking the exponential of both sides, we obtain the optimal solutions for the hyperparameters $u_{jl}$ and $v_{jl}$:
$u_{jl}^* = u_{jl} + \varphi_{jl}, \qquad v_{jl}^* = v_{jl} - \vartheta_{jl} \quad$ (A.19)
where $\varphi_{jl}$ and $\vartheta_{jl}$ denote the corresponding correction terms obtained in the derivation.