Probabilistic key for identifying vegetation types in the field: A new method and Android application

Quick identification of vegetation types in the field, based on species composition but not requiring time-consuming plot sampling, is often needed for vegetation mapping, conservation assessment, teaching and other applications of vegetation classification. Here, we propose a new method that identifies the probability of belonging to the units of an established vegetation classification for vegetation stands encountered in the field. The method is based on calculating the probability that a few species observed in the field would co-occur in a priori defined vegetation types, using the existing information on species occurrence frequency in these types. The method has been implemented in a freely available Android application called Probabilistic Vegetation Key, which makes it possible to employ it in the field using smartphones or tablets, even in the absence of internet access.


| INTRODUCTION
Vegetation science has a long tradition of classifying vegetation into types defined by species composition (De Cáceres et al., 2015; Mucina et al., 2016). Such vegetation types can be used for vegetation mapping, monitoring, conservation planning and assessment, or for defining study systems in basic research. The key question for the users of vegetation classification systems is how to identify vegetation types, either in the field or in databases of vegetation plots. Several methods have been developed for the identification of vegetation types in databases, some of them based on similarity in species composition between a single plot record and a set of plots previously classified to the types (Gégout & Coudun, 2012; Hill, 1989; Tichý, 2005; van Tongeren, Gremmen, & Hennekens, 2008), others based on expert systems comprising formal definitions of types (Bruelheide, 1997; Kočí, Chytrý, & Tichý, 2003; Landucci, Tichý, Šumberová, & Chytrý, 2015; Tichý, Chytrý, & Landucci, 2019). However, there are hardly any tools that would enable quick identification of vegetation types directly in the field.
Vegetation scientists have long recognized that usually only relatively small subsets of species from the total species composition are important for characterizing or discriminating vegetation types (Dale, Beatrice, Venanzoni, & Ferrari, 1986; Jancey, 1979). Therefore, a few species should be sufficient to identify vegetation types in the field, provided they are characteristic of the considered vegetation types. Identification based on a few species found on the spot would be very useful for vegetation or habitat mapping, conservation assessment and other tasks that do not allow enough time for sampling complete species composition and abundances in vegetation plots.
Here, we introduce a new method to identify vegetation types based on a few observed species, called Probabilistic Vegetation Key, and provide a software application for smartphones or tablets that allows the use of this method in the field.

The identification of vegetation types in the Probabilistic Vegetation Key is based on the probability that an observed set of species co-occurs in a particular vegetation type. Existing systems of vegetation classification are usually characterized by synoptic tables of species composition, which include occurrence frequencies (also called constancies) of each species in each vegetation type. These frequencies can be used as estimates of the probability of occurrence of each individual species in each type. For more than one species, the probability of their co-occurrence in a stand of a vegetation type can be defined as a product of occurrence probabilities (relative occurrence frequencies) of individual species:

$$V_i = p_1 \cdot p_2 \cdot p_3 \cdots p_n = \prod_{j=1}^{n} p_j$$

where $V_i$ is the probability of co-occurrence of $n$ selected species in vegetation type $i$ and $p_1, p_2, p_3, \ldots, p_n$ are the probabilities of occurrence of species $1, 2, 3, \ldots, n$ in this type, expressed as their relative occurrence frequencies in this type on the scale from 0 to 1.
Using this approach, a vegetation type can be erroneously excluded from consideration if a user selects a generalist species that can occur in the type but is not present in the list of species for this type, or if a user enters a misidentified species not present in the type. To make the algorithm more robust to such errors, the minimum relative frequency for all species was arbitrarily set to 0.0001, so that no vegetation type is excluded because of a single species.
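The product of frequencies and the effect of the 0.0001 floor can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the frequency values are assumptions made for illustration only; they are not taken from the application or from the Czech classification.

```python
import math

MIN_FREQUENCY = 0.0001  # floor stated in the text


def cooccurrence_probability(type_frequencies, observed_species, floor=MIN_FREQUENCY):
    """Product of relative occurrence frequencies of the observed species in one type."""
    probability = 1.0
    for species in observed_species:
        # A species missing from the type's list has frequency 0 before flooring.
        frequency = type_frequencies.get(species, 0.0)
        probability *= max(frequency, floor)
    return probability


# Hypothetical frequencies for one vegetation type (illustrative values only).
oak_forest = {"Quercus petraea": 0.90, "Sorbus torminalis": 0.40, "Poa nemoralis": 0.70}

# The third species is not listed for this type.
observed = ["Quercus petraea", "Sorbus torminalis", "Urtica dioica"]

v_floored = cooccurrence_probability(oak_forest, observed)
v_unfloored = cooccurrence_probability(oak_forest, observed, floor=0.0)

print(v_floored)    # small but positive, so the type stays in consideration
print(v_unfloored)  # zero, so the type would be excluded by a single species
```

With the floor, one unlisted species lowers the probability by several orders of magnitude but does not remove the type from the comparison.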
The probabilities of co-occurrence of the selected set of species in vegetation types $1, 2, 3, \ldots, k$ ($V_1, \ldots, V_k$) are summed across all $k$ vegetation types defined in the classification. The relative probability $R_i$ (in percentages) estimated for the vegetation type $i$ is then calculated as the species co-occurrence probability $V_i$ for this type divided by the sum:

$$R_i = 100 \cdot \frac{V_i}{\sum_{j=1}^{k} V_j}$$

The vegetation type for which this value is the highest is the most plausible identification of the vegetation stand in which the observed set of species was found. Unlike probability $V$, which decreases with an increasing number of observed species for all vegetation types, the relative probabilities across all vegetation types $R_1, \ldots, R_k$ always add up to 100%.
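The normalization step can be sketched in Python as follows. The function name and the three co-occurrence values are hypothetical and serve only to show that the relative probabilities sum to 100% and yield a ranking of types.

```python
import math


def relative_probabilities(cooccurrence_by_type):
    """Convert co-occurrence probabilities V_i into relative probabilities R_i (%)."""
    total = math.fsum(cooccurrence_by_type.values())
    if total <= 0.0:
        raise ValueError("at least one vegetation type must have a positive probability")
    return {name: 100.0 * v / total for name, v in cooccurrence_by_type.items()}


# Hypothetical V_i values for three vegetation types (illustrative only).
v_values = {"Type A": 0.036, "Type B": 0.003, "Type C": 0.001}
r_values = relative_probabilities(v_values)

# Rank types from the most to the least plausible identification.
ranking = sorted(r_values, key=r_values.get, reverse=True)
for name in ranking:
    print(f"{name}: {r_values[name]:.1f}%")
```

Because every type is divided by the same sum, ordering types by the relative probability gives the same order as ordering them by the co-occurrence probability itself.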
The probabilistic identification is based on species composition. In addition, physiognomically different vegetation types, such as forest vs non-forest, can be analyzed independently, which decreases the misidentification rate. The classification accuracy could be further improved by adding environmental or geographical information on individual types. However, as each such variable has a different importance for discriminating different vegetation types, we decided not to include them in the identification procedures. Instead, they can be used as external criteria for verification of the classification results.
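One possible reading of the independent analysis of forest and non-forest types is that relative probabilities are computed only among the types of the chosen physiognomic group. The sketch below assumes this interpretation; the function name, the type names, the group labels and the values are all hypothetical.

```python
def relative_within_group(cooccurrence_by_type, group_by_type, group):
    """Relative probabilities (%) computed only among types of one physiognomic group."""
    selected = {
        name: v
        for name, v in cooccurrence_by_type.items()
        if group_by_type[name] == group
    }
    total = sum(selected.values())
    if total <= 0.0:
        raise ValueError("no type with a positive probability in this group")
    return {name: 100.0 * v / total for name, v in selected.items()}


# Hypothetical co-occurrence probabilities and group labels (illustrative only).
v_values = {"Oak forest": 0.02, "Beech forest": 0.005, "Dry grassland": 0.025}
groups = {"Oak forest": "forest", "Beech forest": "forest", "Dry grassland": "non-forest"}

forest_only = relative_within_group(v_values, groups, "forest")
print(forest_only)
```

In this made-up example, the grassland type would take half of the total probability if all types were compared together; restricting the comparison to forests removes that competitor.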

| A TEST WITH REAL DATA
We used data from the national vegetation classification of the Czech Republic (Chytrý, 2007–2013), which includes 496 phytosociological associations. Occurrence frequencies of all species in all associations were calculated based on vegetation plots from the Czech National Phytosociological Database (Chytrý & Rafajová, 2003), which were classified to associations by the expert system developed as a part of the national vegetation classification, and then resampled within geographic strata nested within associations in order to reduce local oversampling. The resampled data set contained 30,115 plots.
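As a rough illustration of how relative occurrence frequencies can be derived from classified plots, the following Python sketch counts, for each association, the share of its plots in which each species occurs. It deliberately omits the geographically stratified resampling described above, and the function name, data layout and plot contents are assumptions.

```python
from collections import Counter, defaultdict


def occurrence_frequencies(plots):
    """Return {association: {species: relative frequency}} from (association, species) pairs."""
    plot_counts = Counter()
    species_counts = defaultdict(Counter)
    for association, species in plots:
        plot_counts[association] += 1
        # Each species counts at most once per plot.
        species_counts[association].update(set(species))
    return {
        association: {sp: n / plot_counts[association] for sp, n in counts.items()}
        for association, counts in species_counts.items()
    }


# Hypothetical classified plots (illustrative only).
plots = [
    ("Association X", {"Species a", "Species b"}),
    ("Association X", {"Species a"}),
    ("Association Y", {"Species b", "Species c"}),
]

frequencies = occurrence_frequencies(plots)
print(frequencies["Association X"])
```

Species that never occur in an association are simply absent from its dictionary, which corresponds to a frequency of zero before any minimum value is applied.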
To test the degree of success of the identification of associations, we prepared two selections of species. First, we randomly selected 10,000 vegetation plots (separately for forest and non-forest plots), and in each of them, we randomly selected groups of 2, 3, 4, 5 and 6 co-occurring species. These species combinations included both rare and common species, both specialists and generalists, and species typical of different associations. We also recorded the assignment of each plot to the association. This data set represented the case when a person records species in the field, having no idea about their relationship to vegetation types. Second, we asked fourteen vegetation scientists to select three phytosociological associations familiar to them and list six species for each association in decreasing order of their estimated importance for differentiation of the association. This data set represented the case when a person has field experience with the studied vegetation, can recognize local vegetation types and understands which species characterize each type, but does not know the name of the type according to the regional/national vegetation classification system. For both data sets, we calculated the mean relative probability of the selected species combination to co-occur in the respective association, and ordered the associations by this probability (Table 1). An example of a specific species combination and related probabilities is shown in Appendix S1.
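The random-selection test can be approximated by the sketch below, which draws a fixed number of species from each plot and records the position of the plot's own association in the ranking. This is a simplified per-plot rank and not necessarily the exact statistic reported in Table 1; all names and data are hypothetical, and the 0.0001 floor is the one stated in the method description.

```python
import math
import random

MIN_FREQUENCY = 0.0001  # minimum relative frequency from the method description


def rank_of_correct(synoptic, observed, correct):
    """1-based position of the correct type when types are sorted by co-occurrence probability."""
    scores = {
        name: math.prod(max(freqs.get(sp, 0.0), MIN_FREQUENCY) for sp in observed)
        for name, freqs in synoptic.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(correct) + 1


def mean_rank(synoptic, plots, n_species, rng):
    """Mean rank of the correct type over plots, using n_species random species per plot."""
    ranks = []
    for correct, species in plots:
        if len(species) < n_species:
            continue  # plot too species-poor for this group size
        observed = rng.sample(sorted(species), n_species)
        ranks.append(rank_of_correct(synoptic, observed, correct))
    if not ranks:
        raise ValueError("no plot has enough species")
    return sum(ranks) / len(ranks)


# Hypothetical synoptic table and classified plots (illustrative only).
synoptic = {
    "X": {"a": 0.9, "b": 0.6, "c": 0.05},
    "Y": {"b": 0.5, "c": 0.8, "d": 0.7},
}
plots = [("X", {"a", "b"}), ("Y", {"b", "c", "d"})]

print(mean_rank(synoptic, plots, n_species=2, rng=random.Random(0)))
```

Sorting by the unnormalized product is sufficient here because dividing all types by the same sum does not change their order.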
For both species selections, and for both forest and non-forest vegetation, the probability of correct assignment to the association increased with the number of species used. With the expert-based species selections, the method was able to identify the correct association out of 496 (or at least to place it in second position) using only 2–4 species. When species were selected randomly, more species were usually needed for correct identification, but this test made it clear that even a botanist with no knowledge of the classification of vegetation types is able to narrow a large number of vegetation types down to a small group of possibly correct types. The final identification can be made by using additional information (e.g., physiognomy, ecology, distribution, and a full list of indicator species) about the vegetation types that were identified as the most probable.

| APPLICATION
The Probabilistic Vegetation Key can identify vegetation types or habitat types at any level of the classification hierarchy. It does not require recording full species composition in vegetation plots. Therefore, it is useful for fast identification of vegetation types directly in the field, for example in vegetation mapping, for a preliminary exploration of vegetation diversity in an area, and in field courses of vegetation science. To enable the application of the method in the field, an Android application that can be used on mobile phones or tablets was developed by the first author of this report. This application, called the Probabilistic Vegetation Key, is freely available at https://play.google.com/store/apps/details?id=com.test.tichy.vegkey. Its structure and interconnection are described in Appendix S2. This application uses several predefined databases. One of them is a list of species, from which the user can select those observed in the field and specify which of them have a high cover and whether they were found in forest or non-forest vegetation.

Lubomír Tichý https://orcid.org/0000-0001-8400-7741
Milan Chytrý https://orcid.org/0000-0002-8122-3075

TABLE 1 The success of the identification of phytosociological associations using lists of 2, 3, … 6 species, either selected randomly from a vegetation-plot database or selected by vegetation scientists with a priori knowledge of these associations; the mean order of the correct association on the list of 496 associations is based on the decreasing mean relative probability that the selected group of species co-occurs in this association
Column headings: Mean relative probability (%); Mean order of the correct association

Appendix S1 An example of the identification process of an association with successive addition of one to six species. The correct association is Sorbo torminalis-Quercetum. Percentage probabilities for forest associations from the Czech national vegetation classification, sorted in decreasing order, are shown, excluding those with a probability lower than 3%. After adding the first species, the correct association scored second in the order of probabilities, but the probability of correct assignment was very low (6.0%). Since the addition of the third species, the correct association scored first, and after adding the sixth species, the probability of correct assignment increased to