Address correspondence to C. Ferri, DSIC, Universitat Politècnica de València, Valencia, Spain; e-mail:


Distance-based and generalization-based methods are two families of artificial intelligence techniques that have been successfully applied to a wide range of real-world problems. In the first family, general algorithms can be applied to any data representation simply by changing the distance. The metric space defines the search and learning space, which is generally instance-oriented. In the second family, models are obtained for a given pattern language and can be comprehensible. The generality-ordered space defines the search and learning space, which is generally model-oriented. However, the concepts of distance and generalization clash in many different ways, especially when knowledge representation is complex (e.g., structured data). This work establishes a framework in which these two fields can be integrated consistently. We introduce the concept of distance-based generalization, which connects the generalized examples so that all of them are reachable inside the generalization by straight paths in the metric space. This makes the metric space and the generality-ordered space coherent (or even dual). We also introduce a definition of minimal distance-based generalization that can be seen as the first formulation of the Minimum Description Length (MDL)/Minimum Message Length (MML) principle in terms of a distance function. We instantiate and develop the framework for the most common data representations and distances, and show that consistent instances can be found for numerical data, nominal data, sets, lists, tuples, graphs, first-order atoms, and clauses. As a result, general learning methods that integrate the best of distance-based and generalization-based methods can be defined and adapted to any specific problem by appropriately choosing the distance, the pattern language, and the generalization operator.
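As an informal illustration of the reachability idea for numerical data, the following Python sketch checks whether a pattern covers every element lying on a straight path between each pair of examples. It assumes that "z lies on a straight path from x to y" can be read as d(x, z) + d(z, y) = d(x, y); this reading, the finite integer universe, and all names (`abs_distance`, `between`, `is_distance_based`, `interval`, `exact_set`) are assumptions of the sketch rather than definitions taken from this abstract.

```python
from itertools import combinations


def abs_distance(a, b):
    """Absolute difference, a common distance for numerical data."""
    return abs(a - b)


def between(x, z, y, dist):
    """Assumed reading of 'straight path': z is between x and y
    when the triangle inequality holds with equality."""
    return dist(x, z) + dist(z, y) == dist(x, y)


def is_distance_based(covers, examples, universe, dist):
    """Check that the pattern `covers` includes every element of
    `universe` lying between any two of the given examples."""
    for x, y in combinations(examples, 2):
        for z in universe:
            if between(x, z, y, dist) and not covers(z):
                return False
    return True


# Hypothetical toy setting: integers 0..10 and three examples.
universe = range(0, 11)
examples = [2, 5, 8]


def interval(v):
    """Pattern: the closed interval [2, 8]."""
    return 2 <= v <= 8


def exact_set(v):
    """Pattern: exactly the three examples and nothing else."""
    return v in {2, 5, 8}


interval_ok = is_distance_based(interval, examples, universe, abs_distance)
exact_set_ok = is_distance_based(exact_set, examples, universe, abs_distance)
```

Under these assumptions, the interval pattern passes the check because every integer between two examples is covered, whereas the pattern that lists only the three examples fails, since 3 lies between 2 and 5 but is not covered.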