Constrained tolerance rough set in incomplete information systems

National Natural Science Foundation of China, Grant/Award Numbers: 61662001, 61762002; Young and Middle‐aged Talents Training Program of National Ethnic Affair Commission, Grant/Award Number: 2016GQR06; Ningxia First‐class Construction Discipline Program, Grant/Award Number: NXYLXK2017B09; Open Foundation of Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Grant/Award Number: 2019KLBD006

Abstract
The tolerance rough set was developed as one of the outstanding extensions of Pawlak's rough set model under incomplete information, and the limited tolerance relation was developed to overcome the problem that objects satisfy the tolerance relation too leniently. However, the classification based on the limited tolerance relation cannot reflect the matching degree of uncertain information of objects. In this article, we explore the influence of null values in an incomplete system and propose the constrained tolerance relation based on the matching degree of uncertain information of objects. The proposed rough set based on the constrained tolerance relation can provide a more detailed structure of an object class through a threshold. Proofs and example analyses further show the rationality and superiority of the proposed model.


1 | INTRODUCTION
The classical rough set model [1,2], proposed by Pawlak in the early 1980s, is a powerful mathematical tool for data analysis. Rough set theory has been widely used in pattern recognition, machine learning, decision analysis, knowledge acquisition and data mining [3–10]. In the past few decades, owing to the diversity of data and the differing requirements of analysis, many extended rough set models have been developed, such as the variable precision rough set model [11], the probability rough set model [12,13], the game-theoretic rough set [14,15], the fuzzy rough set model [16,17] and the local neighborhood rough set [18].
However, two factors limit the application of the rough set. Firstly, the classical rough set model and most of its extensions are based on the equivalence relation, which is reflexive, symmetric and transitive. The equivalence relation is a relatively strict condition in many practical applications, and classes formed by this relation cannot reflect well the natural characteristics of overlapping data sets. Secondly, the classical rough set requires the information of the processed objects to be complete, whereas quite a few data objects in practical applications are incomplete or inconsistent, and may even contain null values [19].
Many scholars have studied substitutes for the equivalence relation [20–23]; some scholars also describe the target concept through multiple indiscernibility relations and propose multi-granularity rough set models [24–27]. Among these works, Skowron and Stepniuk [28] replaced the equivalence relation with the tolerance relation and proposed the tolerance approximation spaces, and Kryszkiewicz [19] defined a similarity relation in incomplete information systems. Kryszkiewicz's similarity relation is an extension of Skowron's tolerance relation; therefore, later researchers refer to both collectively as the tolerance relation. The tolerance relation discards the transitivity requirement of the indiscernibility relation in the classical rough set and relaxes the symmetry requirement for incomplete information. Hence, the tolerance classes can reflect well the overlapping relation between groups of objects. Dai [29] defined the fuzzy tolerance relation on complete numerical data sets and established the fuzzy tolerance rough set; Kang and Miao [30] proposed an extended version of the variable precision rough set model based on the granularity of the tolerance relation. Xu et al. [27] extended the single-granulation tolerance rough set model to two types of multigranulation tolerance rough set models from a granular computing view.
Stefanowski and Tsoukias [20] introduced a non-symmetric similarity relation, which can refine the results obtained using the tolerance relation approach, and they also proposed a valued tolerance relation in order to provide more informative results. However, Wang [21] found that the symmetric similarity relation may lose some important information and that the valued tolerance relation requires an accurate probability distribution of all attributes in advance; Wang then proposed the limited tolerance relation. Deris et al. [31] used conditional entropy to classify data flexibly and precisely under the limited tolerance relation. Some scholars have also studied alternatives to missing values. Nakata and Sakai [32] used possible equivalence classes to approximate the set of attributes having missing values. Yang [33] computed attribute reduction with the related family. Hu and Yao [34] introduced a logic formula to describe incomplete information tables.
In this article, we propose the constrained tolerance rough set model in terms of the matching degree of incomplete information. The rest of the article is organized into four parts. In Section 2, we review some related concepts. In Section 3, we present the constrained tolerance relation as an improved version of the limited tolerance relation and analyse the properties of the proposed rough set model. In Section 4, a method of measuring the uncertainty of the proposed rough set model is given and the superiority of the model is further verified. Finally, Section 5 concludes the paper.

2 | RELATED CONCEPTS
In this section, we review some basic concepts, namely the information system, Pawlak's rough set, the tolerance rough set and the limited tolerance rough set.

Definition 2.1 [19,31]. An information system (IS) is a 4-tuple $S = (U, TA, V, f)$, where $U = \{x_1, x_2, \ldots, x_{|U|}\}$ is a non-empty finite set of objects, $TA = \{a_1, a_2, \ldots, a_{|TA|}\}$ is a non-empty finite set of attributes, $V = \bigcup_{a \in TA} V_a$, where $V_a$ is the value set of attribute $a$, and $f: U \times TA \to V$ is a total function such that $f(x, a) \in V$ for every $(x, a) \in U \times TA$, called the information function. If $U$ contains at least one object with an unknown or missing value (a so-called null value), then $S$ is called an incomplete information system (IIS). The unknown value is denoted as "*" in the incomplete information system. In this article, we also use the quadruple $S = (U, TA, V, f)$ to denote an incomplete information system. If $TA = C \cup D$, where $C$ is the set of condition attributes and $D$ is the set of decision attributes, then $S$ is called a decision information system.
Each subset of attributes $A \subseteq TA$ determines a binary indiscernibility relation $IND(A)$ as follows:

$$IND(A) = \{(x, y) \in U \times U \mid \forall a \in A,\ a(x) = a(y)\}.$$

The relation $IND(A)$ is an equivalence relation since it is reflexive, symmetric and transitive.
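As a minimal illustration, the Python sketch below groups the objects of a small complete table into the equivalence classes of IND(A), assuming the standard reading in which two objects are indiscernible when they agree on every attribute in A. The table, the attribute names and the function name `indiscernibility_classes` are hypothetical and serve only as an example.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Partition the objects of a complete table by their values on attrs."""
    groups = defaultdict(set)
    for name, row in table.items():
        # Objects with identical value vectors on attrs are indiscernible.
        key = tuple(row[a] for a in attrs)
        groups[key].add(name)
    return list(groups.values())


# Hypothetical complete table: three objects, two attributes.
table = {
    "x1": {"a1": 0, "a2": 1},
    "x2": {"a1": 0, "a2": 1},
    "x3": {"a1": 1, "a2": 1},
}

classes = indiscernibility_classes(table, ["a1", "a2"])
print(sorted(sorted(c) for c in classes))
```

Because every object falls into exactly one group, the classes form a partition of U; this is the property that is lost once null values are admitted and the relation is no longer transitive.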
The pair formed by the lower and upper approximations of $X$ is referred to as Pawlak's rough set of $X$ with respect to the set of attributes $A$.
Obviously, $T$ is reflexive and symmetric, but not transitive. The tolerance class $I_A^T(x)$ of an object $x$ with reference to an attribute subset $A$ is defined as $I_A^T(x) = \{y \mid y \in U \wedge T_A(x, y)\}$.
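The following Python sketch illustrates the tolerance class on a toy incomplete table. It assumes the usual form of the tolerance relation for incomplete information systems, in which two objects are tolerant when, on every attribute in A, their values are equal or at least one of them is the null value "*". The data and the names `tolerant` and `tolerance_class` are hypothetical.

```python
NULL = "*"


def tolerant(x, y, attrs):
    """True when x and y agree on every attribute, treating "*" as a wildcard."""
    return all(x[a] == y[a] or x[a] == NULL or y[a] == NULL for a in attrs)


def tolerance_class(table, name, attrs):
    """All objects that are tolerant with the object called name."""
    x = table[name]
    return {other for other, y in table.items() if tolerant(x, y, attrs)}


attrs = ["a1", "a2"]
# Hypothetical incomplete table.
table = {
    "x1": {"a1": 1, "a2": 2},
    "x2": {"a1": 1, "a2": NULL},
    "x3": {"a1": 1, "a2": 3},
}

for name in table:
    print(name, sorted(tolerance_class(table, name, attrs)))
```

In this example x1 is tolerant with x2 and x2 is tolerant with x3, yet x1 and x3 differ on a2, which shows concretely why the relation is not transitive and why tolerance classes may overlap.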
Definition 2.4 [31]. Let $S = (U, TA, V, f)$ be an IIS, $A \subseteq TA$, and let $T$ be a tolerance relation. The lower and upper approximations of an arbitrary subset $X$ of $U$ with reference to the attribute subset $A$ can be defined in the same way as for Pawlak's rough set, with the tolerance class $I_A^T(x)$ in place of the equivalence class. The resulting pair of approximations is referred to as the tolerance rough set of $X$ with respect to the set of attributes $A$.
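Definition 2.5 [21]. Let $S = (U, TA, V, f)$ be an IIS, $A \subseteq TA$, and $P_A(x) = \{a \mid a \in A \wedge a(x) \neq *\}$. A binary relation $L$ (the limited tolerance relation) defined on $U$ is given as

$$L_A(x, y) \Leftrightarrow \forall a \in A\,(a(x) = a(y) = *) \ \vee\ \big((P_A(x) \cap P_A(y) \neq \emptyset) \wedge \forall a \in A\,((a(x) \neq * \wedge a(y) \neq *) \rightarrow a(x) = a(y))\big).$$

$L$ is reflexive and symmetric, but not transitive. The limited tolerance class $I_A^L(x)$ of an object $x$ with reference to an attribute subset $A$ is defined as $I_A^L(x) = \{y \mid y \in U \wedge L_A(x, y)\}$.

Definition 2.6 [31]. Let $S = (U, TA, V, f)$ be an IIS, $A \subseteq TA$, and let $L$ be a limited tolerance relation. The lower and upper approximations of an arbitrary subset $X$ of $U$ with reference to the attribute subset $A$ can be defined in the same way as for the tolerance rough set, with the limited tolerance class $I_A^L(x)$ in place of the tolerance class. The resulting pair of approximations is referred to as the limited tolerance rough set of $X$ with respect to the set of attributes $A$.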
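The sketch below shows how the limited tolerance class and the corresponding approximations might be computed. It rests on two assumptions that should be checked against [21] and [31]: that two objects are limited tolerant when either all their values on A are null, or they share at least one known attribute and agree on every attribute known in both; and that the lower (upper) approximation collects the objects whose class is contained in (intersects) X. All data and function names are hypothetical.

```python
NULL = "*"


def known(x, attrs):
    """P_A(x): the attributes of A on which x has a non-null value."""
    return {a for a in attrs if x[a] != NULL}


def limited_tolerant(x, y, attrs):
    """Assumed form of the limited tolerance relation L_A(x, y)."""
    if not known(x, attrs) and not known(y, attrs):
        return True  # both objects are null on every attribute
    shared = known(x, attrs) & known(y, attrs)
    # At least one commonly known attribute, and agreement on all of them.
    return bool(shared) and all(x[a] == y[a] for a in shared)


def limited_class(table, name, attrs):
    x = table[name]
    return {n for n, y in table.items() if limited_tolerant(x, y, attrs)}


def approximations(table, attrs, target):
    """Lower and upper approximations of target under the assumed definitions."""
    lower = {n for n in table if limited_class(table, n, attrs) <= target}
    upper = {n for n in table if limited_class(table, n, attrs) & target}
    return lower, upper


attrs = ["a1", "a2"]
# Hypothetical incomplete table.
table = {
    "x1": {"a1": 1, "a2": NULL},
    "x2": {"a1": 1, "a2": 2},
    "x3": {"a1": NULL, "a2": 3},
    "x4": {"a1": NULL, "a2": NULL},
    "x5": {"a1": NULL, "a2": NULL},
}

lower, upper = approximations(table, attrs, {"x1", "x2", "x4"})
print(sorted(lower), sorted(upper))
```

Here x1 and x3 share no known attribute, so they are not limited tolerant although the plain tolerance relation would relate them, while the two all-null objects x4 and x5 fall into one class of their own.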

3 | ROUGH SET BASED ON CONSTRAINED TOLERANCE
From Definition 2.5, we can easily derive an equivalent form of the limited tolerance relation. It means that objects whose attributes are all null will be judged to be limited tolerant and should then be grouped into the same limited tolerance class. However, in practical applications, the risk of misclassifying objects whose attributes contain a considerable number of null values rises sharply. In fact, we prefer to keep the proportion of null-valued attributes within a certain range. Meanwhile, the more attributes on which two objects share the same value, the greater the probability that they belong to the same class and the higher the classification accuracy. However, the limited tolerance relation may group objects that share the same value on only one attribute into the same class.
We can illustrate the above phenomena with the following example, in which an IIS is described by Table 1. Table 1 is an IIS, where $x_1, x_2, \ldots, x_{16}$ are objects, $a_1, a_2, a_3, a_4$ are four condition attributes and $d$ is a decision attribute. The domains of the four condition attributes are all {0, 1, 2, 3}. The domain of the decision attribute $d$ is {H, J}. Let $A = \{a_1, a_2, a_3, a_4\}$; we can easily obtain the following results by analysing Table 1 with the limited tolerance relation.

Example 3.1 Suppose
For data analysis, the elements in the lower approximation of the data set are expected to be representative of the final classes. In Table 1, all condition attributes of $x_{15}$ and $x_{16}$ are null; intuitively, they should be classified as outliers (special classes) in most practical applications. From the above results, however, the particularity of $x_{15}$ and $x_{16}$ is not shown in the lower approximation of 'H' or 'J', because the classification based on the limited tolerance relation is not able to distinguish the degree of influence of null-valued attributes.
In order to improve the accuracy of object classification based on tolerance relations and to reflect the degree of influence of null values, in this article, we propose the constrained tolerance relation. From Definition 3.2, we know that the inequality $\rho(x, y) \le 1$ is always true when $\tau = 1$. If $\rho(x, y) = \alpha$, where $\alpha$ is a constant for a given pair $(x, y)$ and $\alpha < 1$, then from Proposition 3.1 we have $(x, y) \in T^{c}_{\tau}(A) \Rightarrow (x, y) \in L(A)$. If $\rho(x, y) = 1$, it means that, for any $a \in A$, at least one of $a(x)$ and $a(y)$ is null. At this point, $x$ and $y$ become outliers of each other's class in terms of the constrained tolerance class or the limited tolerance class.
That is, when $\tau = 1$, the constrained tolerance relation degenerates into the tolerance relation and the constrained tolerance class degenerates into the tolerance class.
Proof.
(2) $\forall y \in I$ …
… is referred to as the constrained tolerance rough set of $X$ with respect to the set of attributes $A$.
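To make the role of the threshold $\tau$ concrete, the Python sketch below computes constrained tolerance classes under an explicitly assumed form of the relation, which is a guess and not a transcription of Definition 3.2: $\rho(x, y)$ is taken to be the fraction of attributes in $A$ on which at least one of $a(x)$ and $a(y)$ is null, and $x$ and $y$ are related when they are tolerant and $\rho(x, y) \le \tau$. This guess is consistent with the remarks that $\rho(x, y) = 1$ exactly when every attribute is null on at least one side and that $\tau = 1$ recovers the tolerance relation, but the exact definition may differ. All data and names are hypothetical.

```python
NULL = "*"


def tolerant(x, y, attrs):
    """Plain tolerance: equal values or a null on at least one side."""
    return all(x[a] == y[a] or NULL in (x[a], y[a]) for a in attrs)


def null_ratio(x, y, attrs):
    """Assumed rho(x, y): share of attributes null in x or in y."""
    return sum(NULL in (x[a], y[a]) for a in attrs) / len(attrs)


def constrained_class(table, name, attrs, tau):
    """Objects tolerant with name whose assumed rho does not exceed tau."""
    x = table[name]
    return {n for n, y in table.items()
            if tolerant(x, y, attrs) and null_ratio(x, y, attrs) <= tau}


attrs = ["a1", "a2", "a3", "a4"]
# Hypothetical incomplete table with an increasing number of null values.
table = {
    "x1": dict(zip(attrs, [1, 2, 0, 3])),
    "x2": dict(zip(attrs, [1, 2, 0, NULL])),
    "x3": dict(zip(attrs, [1, NULL, NULL, NULL])),
    "x4": dict(zip(attrs, [NULL, NULL, NULL, NULL])),
}

for tau in (0.0, 0.25, 0.75, 1.0):
    print(tau, sorted(constrained_class(table, "x1", attrs, tau)))
```

Under this assumption the class of x1 grows monotonically with $\tau$: objects with many null values, such as x3 and the all-null x4, enter the class only at large thresholds, and at $\tau = 1$ the class coincides with the tolerance class.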

Proposition 3.3 Given an IIS $S = (U, TA, V, f)$, $A \subseteq TA$, then A T c
Proof.
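From Definition 3.4, we have the following properties of the constrained tolerance rough set. … From (1), we know that $\underline{A}_{T^{c}_{\tau}}(\emptyset) \subseteq \emptyset$, and $\emptyset \subseteq \underline{A}_{T^{c}_{\tau}}(\emptyset)$ (because the empty set is a subset of any set). Hence, $\underline{A}_{T^{c}_{\tau}}(\emptyset) = \emptyset$.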

Proposition 3.4 Given an IIS S
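(2b) Suppose $\overline{A}_{T^{c}_{\tau}}(\emptyset) \neq \emptyset$. Then there exists $x \in \overline{A}_{T^{c}_{\tau}}(\emptyset)$, hence $I_A^{T^{c}_{\tau}}(x) \cap \emptyset \neq \emptyset$. This contradicts the fact that the intersection of the empty set with any set is empty. Thus, the assumption does not hold. Therefore, $\overline{A}_{T^{c}_{\tau}}(\emptyset) = \emptyset$. … since the constrained tolerance relation is symmetric, we then have … Therefore, $A_{T^{c}_{\tau_2}}(X) \subseteq A_{T^{c}_{\tau_1}}(X)$.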