A new subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny


With close to 770 genera and over 19,500 species (LPWG, 2013a), the Leguminosae is the third-largest angiosperm family in terms of species numbers, after Asteraceae and Orchidaceae. Economically, Leguminosae is second in importance only to Poaceae. It is estimated, for example, that total world exports of pulses (i.e., legume crops harvested for their dry seeds) more than doubled between 1990 and 2012, expanding from 6.6 to 13.4 million tons, and in 2012 the value of pulse exports was estimated at US$ 9.5 billion (Food and Agriculture Organization [FAO]: http://www.fao.org/pulses-2016/en/). The United Nations General Assembly designated 2016 the International Year of Pulses to promote awareness of their nutritional benefits and of their importance for food security, sustainable agriculture, and mitigating biodiversity loss and climate change. Legumes are important food crops, providing highly nutritious sources of protein and micronutrients that can greatly benefit health and livelihoods, particularly in developing countries (Yahara & al., 2013). Legumes have been domesticated alongside grasses in different areas of the world since the beginnings of agriculture and played a key role in its early development (Gepts & al., 2005; Hancock, 2012). Legumes are also uniquely important as fodder and green manure in both temperate and tropical regions, and are used for their wood, tannins, oils and resins, in the manufacture of varnishes, paints, dyes and medicines, and in the horticultural trade.
Legumes are cosmopolitan in distribution, representing important ecological constituents in almost all biomes across the globe, and occur in even the most extreme habitats (Schrire & al., 2005a, b). They are significant elements, in terms of both species diversity and abundance, of lowland wet tropical forests in Africa, South America and Asia (Yahara & al., 2013); they dominate dry forests and savannas throughout the tropics (DRYFLOR, 2016); and they also occur in Mediterranean, desert and temperate regions, extending to high latitudes and high elevations. They can be large emergent tropical trees with buttresses, small ephemeral annual herbs, climbing annuals or perennials with tendrils, desert shrubs, geoxylic subshrubs, woody lianas and, less commonly, aquatics. Flower symmetry spans the full range from radial (actinomorphic) to bilateral (zygomorphic) and even asymmetric, and flowers are adapted to a wide range of pollinators, including insects, birds and bats. The ability of the majority of legume species to fix atmospheric nitrogen in symbiosis with soil rhizobia is perhaps the best-known ecological characteristic of the family; however, not all legumes form associations with nitrogen-fixing bacteria. Overall, the family is exceptionally diverse morphologically, physiologically and ecologically, representing one of the most spectacular examples of evolutionary diversification in plants.
All of these characteristics have long fascinated legume biologists, who have studied the biology, diversity and evolution of the family, the evolution of functional traits, and the ecology and biogeography of the family (e.g., Stirton & Zarucchi, 1989; Schrire & al., 2005a, b; Sprent, 2007, 2009; Champagne & al., 2007; Simon & al., 2009; Bouchenak-Khelladi & al., 2010; Cannon & al., 2010, 2015; Pennington & al., 2010; Doyle, 2011; Simon & Pennington, 2012; Koenen & al., 2013; Oliveira-Filho & al., 2013; Moncrieff & al., 2014; Werner & al., 2014, 2015; Dugas & al., 2015; BFG, 2015).
Here we propose a new subfamilial classification of the family Leguminosae that takes into account the phylogenetic pattern that is consistently resolved in numerous recent studies. This new classification is proposed and endorsed by the legume systematics community, as reflected in the use of the Legume Phylogeny Working Group (LPWG) as the authority for all new names proposed. The Legume Phylogeny Working Group was established explicitly to develop and foster collaborative research towards a comprehensive phylogeny and classification for Leguminosae (LPWG, 2013a).

Keywords: Caesalpinioideae; Cercidoideae; Detarioideae; Dialioideae; Duparquetioideae; mimosoid clade; Papilionoideae; plastid matK phylogeny

Supplementary Material: Electronic Supplement (Fig. S1), voucher information (Table S1)

Version of Record
The new classification proposed here follows a traditional Linnaean approach but is compatible with and complementary to emerging clade-based classifications of individual legume subfamilies (Wojciechowski, 2013). Rank-free naming of clades within (and across) subfamilies is already well established and increasingly prevalent in the legume literature (e.g., Dalbergioid clade, Lavin & al., 2001; inverted repeat [IR]-lacking clade, Wojciechowski & al., 2000; Umtiza clade, Herendeen & al., 2003; Acacia s.l. clade, Miller & al., 2014), and additional important clades will continue to be named even after a fully fledged and stable subfamily and tribal classification is established. As noted by Wojciechowski (2013), use of Linnaean names does not preclude a system that also defines and names clades and their overall relationships outside the traditional Linnaean framework. Instead, the two are considered complementary and necessary for developing a stable, flexible and useful classification of legumes.

THE NEW SUBFAMILY CLASSIFICATION
The monophyly of the family Leguminosae is strongly supported in all molecular phylogenetic analyses, regardless of taxon or gene sampling (see LPWG, 2013a and references therein). Indeed, despite uncertainty over its closest relatives (cf. Dickison, 1981; APG III, 2009; Bello & al., 2009), the monophyly and distinctiveness of the Leguminosae have never been questioned in terms of morphology since the family was first established (Adanson, 1763; Jussieu, 1789; Polhill & Raven, 1981; Polhill, 1994; Bello & al., 2012). The most conspicuous characteristic of the family is, with only a few exceptions, a single superior carpel with one locule, marginal placentation, and usually two to many ovules in two alternating rows on a single placenta. However, legume systematists have long been aware of the discrepancy between the current subfamily classification and emerging phylogenetic results (Irwin, 1981; Käss & Wink, 1996; Doyle & al., 1997), most notably the long-known paraphyly of subfamily Caesalpinioideae, as well as many other problematic issues, such as the lack of monophyly of many tribes and subtribes. This means that the phylogenetic structure of the family is not directly reflected in the current classification. Consequently, legume biologists studying particular clades have invented and used informal clade names that are biologically meaningful and appropriate for their study questions. This has resulted in a proliferation of informally named clades that can be inconsistent, ad hoc and sometimes contradictory across studies, and which can lead to nomenclatural confusion unless they are properly defined (LPWG, 2013a, b; Wojciechowski, 2013). Properly defined names matter not just within the legume taxonomic community but also for legume biology, genomics and, indeed, the wider evolutionary biology community as a whole (e.g., Cannon & al., 2015).
In contrast to some other large angiosperm families where the subfamily rank is perhaps not as widely recognised or used outside the immediate taxonomic community (e.g., Poaceae, Grass Phylogeny Working Group, 2001; Asteraceae, Panero & Funk, 2002; Funk & al., 2009), in legumes the subfamily has always been a widely used and central rank. The three currently recognised subfamilies have long been considered distinct groups and have often been recognised at family rank (e.g., Hutchinson, 1964; Cronquist, 1981). In 1825, in his Prodromus, Candolle subdivided the Leguminosae into four suborders (= subfamilies), naming for the first time the three present-day subfamilies in addition to a fourth "suborder", Swartzieae, now included in subfamily Papilionoideae. This system was elaborated upon by Bentham (1865), who recognised three major groups within Leguminosae and whose classification formed the basis for all subsequent classifications of the family over the following 140 years (from, e.g., Taubert, 1891, to Polhill, 1994). In his Families of flowering plants (1926) and Genera of flowering plants (1964), Hutchinson raised the three subfamilies to family rank but grouped them in the order Leguminales, a system that has been followed in a number of Floras (e.g., Hutchinson & Dalziel, 1928; Görts-van Rijn, 1989; Orchard & Wilson, 1998; Mori & al., 2002; see also Lewis & Schrire, 2003). In the first volume of Advances in legume systematics (Polhill & Raven, 1981), the three groups were recognised at subfamily rank. Regardless of rank, these three groups have been used since the 19th century to identify and classify genera and species in Floras and herbaria throughout the world. They are taught in botany, floristics and taxonomy courses, and are consistently used by agronomists, horticulturalists and ecologists worldwide. As remarked by Polhill & al. (1981: 24), "the basic classification of the family has remained remarkably stable and sensible. Users of classifications provide a strong selective force […]". Indeed, although the generic membership of the three subfamilies has changed somewhat over the centuries, these iconic groupings have remained useful concepts for identifying this diverse group of plants. Our objective here is to retain the utility of these well-known groups as far as possible while at the same time proposing a new classification that correctly reflects evolutionary relationships and emphasises the distinctive features of each subfamily.
Despite tremendous progress in understanding phylogenetic relationships across the family (LPWG, 2013a), uncertainty remains regarding relationships amongst the six first-branching lineages of legumes and within certain clades (Figs. 1 & 2; Bruneau & al., 2008; LPWG, 2013a). For example, relationships among early-branching papilionoids (Cardoso & al., 2012a, 2013b), within the large so-called Mimosoideae-Caesalpinieae-Cassieae clade, or MCC clade sensu Doyle (2011) (Bruneau & al., 2008; Manzanilla & Bruneau, 2012), and within the Ingeae-Acacia s.str. clade (Luckow & al., 2003; Simon & al., 2009) all lack resolution and support in analyses of conventional DNA sequence datasets (i.e., a few kilobases of plastid DNA sequence data). However, there is no uncertainty surrounding the paraphyly of subfamily Caesalpinioideae and hence the need for a new subfamilial classification (LPWG, 2013a, b). All adequately sampled phylogenetic analyses of the family indicate that the monophyletic Mimosoideae and Papilionoideae are nested within a paraphyletic assemblage of caesalpinioid lineages. This is perhaps no surprise. Already in 1981, in the preface to Advances in legume systematics volume 1, H.S. Irwin noted, on the basis of morphology alone, that Caesalpinioideae remained the most troublesome segment of the family and that, inevitably, a greater number of higher-level groups would need to be recognised.

Fig. 1. A, … (Table 2) of Leguminosae (for 30 species, multiple varieties or subspecies were included) and 100 outgroup taxa (uncoloured) spanning core eudicots (see Appendix 1, Table S1). Branch lengths are proportional to numbers of matK substitutions. All subfamilies are supported with 1.0 posterior probability (indicated as thicker lines) and 100% maximum likelihood bootstrap values (Fig. S1). Support is weak across the backbone of the grade subtending the mimosoid clade, and this grade includes five or more lineages that would need to be recognised as additional small subfamilies if Mimosoideae were retained at subfamilial rank. Duparquetioideae forms a polytomy with Cercidoideae, Detarioideae and the clade that groups the other three subfamilies (but see Fig. 2, where Duparquetioideae is sister to the clade comprising Dialioideae, Caesalpinioideae and Papilionoideae based on analysis of a much larger plastid gene set). Numbers of genera and species (+ infraspecific taxa) sampled / currently recognised are indicated for each subfamily. The phylogenetic tree can be visualised (e.g., with FigTree [http://tree.bio.ed.ac.uk/software/figtree/] or Dendroscope [http://dendroscope.org/]; Huson & Scornavacca, 2012) and downloaded from Supplementary Data: Data file B. B, Schematic phylogeny based on the matK Bayesian analysis showing the six-subfamily classification of the Leguminosae, with clade sizes proportional to number of species. A schematic figure illustrating the diversity of the six subfamilies is available for download as a poster (Figs. S2, S3).

Fig. 2. Phylogeny and subfamily classification of the Leguminosae, depicted on a 95% majority-rule Bayesian consensus tree based on analysis of peptide sequences from 81 plastid-encoded proteins, subsampling representative taxa from forthcoming phylogenomic analyses (E.J.M. Koenen & al., in prep.). This analysis resolves the relationships of Duparquetioideae (cf. Fig. 1, based on analysis of matK alone). The tree is unresolved in just a few places, including the root of the family and amongst clades in the Caesalpinioideae. All other nodes received 1.0 posterior probability, except the two nodes marked with an asterisk, which have 0.99 posterior probability. The tree was inferred using PhyloBayes v.1.6j (Lartillot & al., 2009) with the CAT-GTR model, running two independent chains until they reached convergence. The six subfamilies are indicated by the coloured boxes to the right of the phylogeny.

The three traditional subfamilies were based essentially on a small set of conspicuous floral characters, particularly petal aestivation patterns. While some of these floral characters may be useful for defining Papilionoideae and Mimosoideae, they are extremely variable across the traditional Caesalpinioideae (Tucker, 2003; Bruneau & al., 2014), which cannot be defined or diagnosed on the basis of these characters. Furthermore, even for Papilionoideae and Mimosoideae, most of these floral traits are now known to be homoplasious (Pennington & al., 2000). For example, radially symmetric flowers, in individual species or in whole clades, have been derived independently multiple times across basal Papilionoideae, a large assemblage of florally heterogeneous lineages dominated by bilaterally symmetric flowers (Figs. 7-9; Pennington & al., 2000; Cardoso & al., 2012b, 2013a; Ramos & al., 2016). Similarly, while Mimosoideae are the most conspicuous and diverse clade with radially symmetric flowers, other closely related lineages scattered across the MCC clade also have radially symmetric, mimosoid-like flowers (Fig. 5). Thus, despite the central importance of floral characters in the traditional subfamilial classification, phylogenetic results over the past 20 years favour giving less weight to floral morphology, because it is so prone to evolutionary modification and convergence, especially in the transition from radial to bilateral floral symmetry, which can be achieved in different ways.
There has been broad consensus within the legume systematics community about the need for a new classification since the first molecular phylogenies of the family were published (Käss & Wink, 1996; Doyle & al., 1997). However, the multilineage paraphyletic structure of subfamily Caesalpinioideae with respect to the monophyletic Mimosoideae and Papilionoideae raises significant questions about how many subfamilies should be recognised. Furthermore, until recently, incomplete sampling of many key genera in phylogenies suggested the need for caution before establishing a new subfamilial classification. More recent and densely sampled phylogenies (Luckow & al., 2003; Lavin & al., 2005; Bruneau & al., 2008; Simon & al., 2009; Cardoso & al., 2012a, 2013b), as well as the matK phylogeny with near-complete sampling of genera that we present here (Figs. 1 & S1; Appendix 1), now provide adequate taxon sampling and phylogenetic support to reveal the overall phylogenetic structure of the family in sufficient detail, allowing us to evaluate the options properly and arrive at the best solution for translating the phylogenetic tree into a new classification. Moreover, the main clades resolved in the matK phylogeny are also fully supported in whole plastid genome sequence analyses (Fig. 2; E.J.M. Koenen & al., in prep.), and are corroborated by phylogenetic analyses of orthologous nuclear genes derived from representative sampling of multiple transcriptomes of all subfamilies except Duparquetioideae (E.J.M. Koenen & al., in prep.).
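The distinction between a monophyletic group and a paraphyletic assemblage that drives this discussion can be made concrete. The following is a minimal Python sketch, assuming a rooted tree encoded as nested tuples of genus names; the toy topology is a hypothetical schematic loosely following the subfamily relationships in Fig. 2, with a few placeholder genera per subfamily and an arbitrary arrangement inside Caesalpinioideae, and it is not the published tree. A group is treated as monophyletic when some node subtends exactly its members.

```python
from typing import FrozenSet, Iterable, List, Union

# A rooted tree as nested tuples; leaves are genus names (strings).
Tree = Union[str, tuple]

# Hypothetical schematic, NOT the published phylogeny: placeholder genera,
# root left unresolved as in Fig. 2, arbitrary arrangement in Caesalpinioideae.
TOY_TREE: Tree = (
    ("Cercis", "Bauhinia"),                      # Cercidoideae
    ("Detarium", "Afzelia"),                     # Detarioideae
    ("Duparquetia",                              # Duparquetioideae
     (("Dialium", "Labichea"),                   # Dialioideae
      (("Caesalpinia", ("Mimosa", "Acacia")),    # Caesalpinioideae incl. mimosoids
       ("Lupinus", "Pisum")))),                  # Papilionoideae
)


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names subtended by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node in the tree, root included."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if some node subtends exactly the given taxa."""
    return frozenset(taxa) in clades(tree)


# Traditional Caesalpinioideae: every non-mimosoid, non-papilionoid genus here.
traditional = {"Cercis", "Bauhinia", "Detarium", "Afzelia", "Duparquetia",
               "Dialium", "Labichea", "Caesalpinia"}
# Recircumscribed Caesalpinioideae: includes the nested mimosoid clade.
recircumscribed = {"Caesalpinia", "Mimosa", "Acacia"}

print(is_monophyletic(TOY_TREE, traditional))           # False
print(is_monophyletic(TOY_TREE, recircumscribed))       # True
print(is_monophyletic(TOY_TREE, {"Mimosa", "Acacia"}))  # True
```

On this toy tree the traditional grouping fails the test because no single node subtends it without also subtending the mimosoid and papilionoid genera.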
A concerted effort to arrive at a new classification was initiated at the 6th International Legume Conference in Johannesburg, South Africa, in January 2013. There was general consensus that sufficient data, in terms of taxon sampling and phylogenetic support, were available to propose a new subfamilial classification for Leguminosae, and universal agreement that the number of subfamilies needed to be increased (LPWG, 2013b). There was also broad agreement that several caesalpinioid clades (Cercideae, Detarieae, Duparquetia, Dialiineae s.l.) could be appropriately, uncontroversially and usefully recognised as new subfamilies, alongside Papilionoideae. The central problem for a new subfamilial classification was how to deal with the large clade that includes the "Umtiza clade" or "grade", the "Caesalpinia Group clade", the "Cassia clade", the "Peltophorum clade" and scattered Dimorphandra Group genera, and which has Mimosoideae nested within it, i.e., the large MCC clade (sensu Doyle, 2011, 2012). Several participants suggested that the whole MCC clade should be recognised as a single subfamily (making a total of six subfamilies), but with the disadvantage that mimosoids, in the traditional sense, would no longer be recognised as a subfamily, which made some legume systematists uncomfortable. The alternative, whereby Mimosoideae is retained as a subfamily, entails recognition of six to eight (or more) additional small subfamilies to account for the multiple lineages that make up the large paraphyletic assemblage subtending mimosoids (Figs. 1, 2). However, many recognised that although resolution and support across this grade remain relatively weak in current phylogenies (Fig. 1; Bruneau & al., 2008; Manzanilla & Bruneau, 2012), improved resolution and support from larger datasets (e.g., Fig. 2; E.J.M. Koenen & al., in prep.) would not by themselves solve the problem of 6 vs. 11 or more subfamilies.
These two main options for a new classification were summarised, the points of agreement noted, and the foundations for furthering the discussion presented in LPWG (2013b).
The advantages and disadvantages of these two main options for a new subfamily classification (6 vs. 11 or more subfamilies) were specifically discussed and evaluated at a subsequent legume systematics symposium, held during the Latin American Botanical Congress in October 2014, in Bahia, Brazil. A document summarising the advantages and disadvantages was then drafted and circulated for further discussion and opinion to an LPWG electronic discussion group with wide, international membership. The comments received on this draft were taken into account when developing the classification presented here; subfamily descriptions were discussed at a legume morphology workshop in Botucatu, Brazil (November 2015); and draft manuscripts were circulated again to the LPWG membership for further comment prior to submission of this paper for publication.

After broad consultation within the legume systematics community, it was generally agreed that a six-subfamily classification is the most appropriate option for naming subfamilies in a Linnaean system (Figs. 1, 2, S2 & S3). The six-subfamily option is based on a set of clades with robust support (1.00 Bayesian posterior probabilities and 100% maximum likelihood bootstrap values in Figs. 1, 2 & S1) that are each subtended by long branches: Cercidoideae, Detarioideae, Duparquetioideae, Dialioideae, Caesalpinioideae and Papilionoideae. In addition to the molecular support, all six subfamilies have support from morphological data (Table 1). While morphological circumscription of the six subfamilies is not entirely straightforward given the complex and homoplasious nature of most morphological characters (Table 1; see Taxonomy below), it is certainly no more difficult or problematic than for the traditional three subfamilies, for which the supposedly diagnostic morphological (mainly conspicuous floral) characters are beset by numerous exceptions, and where Caesalpinioideae, as traditionally circumscribed, lacks obvious diagnosability. Although Papilionoideae and the recircumscribed Caesalpinioideae are still large and heterogeneous clades, the former retains its current definition and generic membership (Polhill, 1994; Table 2), while the latter is now more homogeneous, including, for example, all legumes with bipinnate leaves and most with extrafloral nectaries on the petiole and rachis (Fig. 2; Table 1; e.g., Marazzi & al., 2012). The six subfamilies have similar stem ages, all having apparently diverged soon after the first appearance of the family (Lavin & al., 2005; Bruneau & al., 2008; Simon & al., 2009).
The major disadvantage of adopting a six-subfamily classification, namely abandoning the well-known Mimosoideae, is mitigated by continuing to recognise this lineage as a distinct clade, informally referred to for now as the mimosoid clade, but with scope for it to be formally named as a tribe within a new Linnaean tribal classification and/or in a rank-free, clade-based phylogenetic classification of the recircumscribed Caesalpinioideae, once relationships within this subfamily are better resolved. It is also worth noting that options recognising fewer than six subfamilies would both reduce morphological diagnosability and result in subfamilies with even more unwieldy morphological heterogeneity. The six-subfamily option minimises the number of new Linnaean names, which is likely to make it more easily accepted by a wider user community, and we consider this option more likely to remain stable through time. With a six-subfamily system, we are ensuring greater nomenclatural stability than with a system that would describe 11 or more new subfamilies, particularly as several of the additional subfamilies that would need to be recognised are subtended by short branches and lack robust support in current phylogenies (Figs. 1, 2 & S1; Bruneau & al., 2008; Manzanilla & Bruneau, 2012; E.J.M. Koenen & al., in prep.), and might later need to be changed.
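Why retaining Mimosoideae multiplies subfamilies can be shown with a short calculation. If a clade nested inside a larger clade keeps its own rank, every lineage branching off the path from the larger clade's root down to the retained clade is left over and must be named separately for all groups to stay monophyletic. The Python sketch below assumes a hypothetical, fully resolved ladder for the grade subtending the mimosoid clade, with placeholder labels taken from the informal group names in the text; real relationships across this grade remain poorly resolved, so the count it returns is illustrative only.

```python
from typing import FrozenSet, List, Optional, Union

Tree = Union[str, tuple]

# Hypothetical ladder-like grade inside the recircumscribed Caesalpinioideae.
# Leaf labels are placeholders for informal groups; the topology is assumed.
CAESALPINIOIDEAE_TOY: Tree = (
    "Umtiza clade",
    ("Cassia clade",
     ("Caesalpinia Group",
      ("Peltophorum clade",
       ("Dimorphandra Group (part)",
        ("Mimosa", "Acacia"))))),  # stand-ins for the mimosoid clade
)


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names subtended by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def grade_lineages(clade: Tree, retained: FrozenSet[str]) -> Optional[List[FrozenSet[str]]]:
    """Walk from the root of `clade` towards the node subtending exactly
    `retained`, collecting every sibling lineage passed on the way.
    Returns None if `retained` is not a clade within `clade`."""
    node = clade
    side_lineages: List[FrozenSet[str]] = []
    while leaves(node) != retained:
        if isinstance(node, str):
            return None  # reached a leaf without finding the clade
        next_node = None
        for child in node:
            if retained <= leaves(child):
                next_node = child  # the path continues through this child
            else:
                side_lineages.append(leaves(child))
        if next_node is None:
            return None  # retained taxa are split across children
        node = next_node
    return side_lineages


mimosoids = frozenset({"Mimosa", "Acacia"})
extra = grade_lineages(CAESALPINIOIDEAE_TOY, mimosoids)
# Cercidoideae, Detarioideae, Duparquetioideae, Dialioideae, Papilionoideae
other_subfamilies = 5
print(len(extra))                          # 5
print(other_subfamilies + 1 + len(extra))  # 11
```

With the mimosoids kept inside Caesalpinioideae instead, the same count is simply 5 + 1 = 6; any additional resolved lineage in the grade would raise the alternative total above 11.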
In our new classification, three subfamily names are new at this rank. We ascribe these names to the collective known as the "Legume Phylogeny Working Group". This practice, uncommon in botanical nomenclature, does not prevent valid publication of the names under the botanical code, as stipulated in Chapter VI, Section 1 (Author citations). Although we could have adopted a modification of Recommendation 46C.2, which suggests citing the first author followed by "et al." (and, at the first appearance of that authority, listing all 97 authors), we considered that ascribing authorship to the Legume Phylogeny Working Group is more straightforward, more clearly gives due credit to the legume systematics community, and better reflects the collaborative approach used to arrive at this new classification. At a time when systematics papers may have increasing numbers of authors, for example as genomic datasets become routine, we expect that authorship ascribed to research groups and communities rather than individuals will become more commonplace.

INTEGRATING TRIBAL AND CLADE-BASED CLASSIFICATIONS
In addition to the need for a new Linnaean subfamily classification, there are important questions about the best approach to naming clades within subfamilies. New phylogenies of many legume groups have unequivocally demonstrated the inadequacies of the tribal classifications of Polhill & Raven (1981), Polhill (1994) and … because most of the traditionally recognised tribes are not monophyletic (LPWG, 2013a). In addition, questions remain about the monophyly and placement of several genera, with considerable ongoing uncertainty surrounding generic delimitation and relationships (LPWG, 2013a; …). However, numerous phylogenetic studies are ongoing, and revised tribal classifications of subfamilies will be forthcoming in the near future. The emergence of clade-based phylogenetic classification systems provides an additional option for the rank-free naming of robustly supported legume clades under the draft International Code of Phylogenetic Nomenclature (ICPN). Such clade-based classifications can easily be integrated with traditional Linnaean rank-based classification to name additional clades coinciding with the evolution of key biological traits hypothesised to be synapomorphies. For example, several important legume clades corresponding to biologically important apomorphies (sometimes in the form of deep homologies), including nodulation, bipinnate leaves (here corresponding to the redefined Caesalpinioideae), extrafloral petiolar or leaf rachis nectaries, pollen in tetrads / polyads, and valvate petal aestivation (mimosoid clade), could be named in this way, as pursued by Wojciechowski (2013) for Papilionoideae using many of the recommendations of the ICPN. We believe this approach, integrating Linnaean ranks alongside clade-based ICPN classification, will greatly enhance the biological meaning and utility of future classifications, with significant benefits for effective communication across a wide spectrum of biological disciplines.
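As an illustration of how rank-free names can be tied to a tree, a node-based definition (termed a minimum-clade definition in the PhyloCode) names the smallest clade containing a set of specifier taxa. The Python sketch below resolves such a definition on a tree encoded as nested tuples; the tree and the specifiers are hypothetical placeholders chosen for illustration, not proposed definitions for any legume clade.

```python
from typing import FrozenSet, Iterable, List, Union

Tree = Union[str, tuple]

# Hypothetical schematic tree with placeholder genera, not a published topology.
TOY_TREE: Tree = (
    ("Cercis", "Bauhinia"),
    ("Detarium", "Afzelia"),
    ("Duparquetia",
     (("Dialium", "Labichea"),
      (("Caesalpinia", ("Mimosa", "Acacia")),
       ("Lupinus", "Pisum")))),
)


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names subtended by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node in the tree, root included."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def minimum_clade(tree: Tree, specifiers: Iterable[str]) -> FrozenSet[str]:
    """Smallest clade (as a leaf set) that contains every specifier."""
    wanted = frozenset(specifiers)
    containing = [c for c in clades(tree) if wanted <= c]
    if not containing:
        raise ValueError(f"specifiers not all in tree: {sorted(wanted)}")
    # Clades that share members are nested, so the smallest one is unique.
    return min(containing, key=len)


print(sorted(minimum_clade(TOY_TREE, ["Mimosa", "Caesalpinia"])))
# ['Acacia', 'Caesalpinia', 'Mimosa']
print(sorted(minimum_clade(TOY_TREE, ["Caesalpinia", "Lupinus"])))
# ['Acacia', 'Caesalpinia', 'Lupinus', 'Mimosa', 'Pisum']
```

The first call shows the key property of such definitions: membership follows from the tree, so Acacia belongs to the named clade even though it was not a specifier.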
A new classification is clearly needed for the recircumscribed subfamily Caesalpinioideae, which has been the most difficult and controversial subfamily to delimit in the new subfamilial classification because of the inclusion of the formerly recognised and morphologically distinctive subfamily Mimosoideae. Because relationships amongst major groups within the recircumscribed Caesalpinioideae remain poorly resolved (Figs. 1, 2 & S1), we refrain from establishing a new tribal and / or clade-based classification for this subfamily here. Although most mimosoids are morphologically distinct (Fig. 6), the morphological distinctions between some members of the mimosoid clade and subtending caesalpinioid lineages are not always clear-cut. For example, Dinizia Ducke, once considered to be in Mimosoideae, is placed outside the mimosoid clade in molecular phylogenetic analyses (Luckow & al., 2003; Bruneau & al., 2008), and Chidlowia Hoyle (Fig. 6A), which has always been considered a caesalpinioid legume, is placed within the mimosoid clade in recent molecular phylogenetic analyses (Manzanilla & Bruneau, 2012; E.J.M. Koenen & al., in prep.). For these reasons, we refrain from formally naming this clade until relationships amongst lineages within Caesalpinioideae can be better resolved, and refer to the former subfamily Mimosoideae DC. simply as the "mimosoid clade" for the time being.
In Cercidoideae and Dialioideae, both of which have relatively few genera (Lewis & Forest, 2005; Sinou & al., 2009; E. Zimmerman, unpub. data), infra-subfamilial classifications may not be needed, and Duparquetioideae is monospecific. In Detarioideae, phylogenetic relationships amongst basal lineages have until now been too poorly resolved to permit their classification (Bruneau & al., 2001, 2008; Fougère-Danezan & al., 2007), but ongoing studies are leading to better resolution, with the possibility of recognising clades as tribes and / or formally named clades (M. de la Estrella & al., unpub. data). Similarly, ongoing studies in Papilionoideae and in the recircumscribed Caesalpinioideae should help resolve key relationships, with the ultimate outcome that names of strongly supported and biologically meaningful clades will be proposed in forthcoming publications.

REFERENCE PHYLOGENY
The classification proposed here uses as its framework the most comprehensively sampled phylogenetic analysis of legumes to date (Figs. 1, S1; Table S1; Methods described in Appendix 1). This new phylogeny is based on plastid matK gene sequences because this gene region is the most widely sequenced across the legumes (cf. LPWG, 2013a) and it is sufficiently variable to resolve generic membership of many strongly supported higher-level clades as demonstrated by a large number of studies such as those referenced herein. Although this analysis is based on a single plastid locus, the topology observed and the groups that are supported have been consistently resolved in numerous previous phylogenetic analyses of the entire family or of particular clades within the family using diverse plastid (trnL-F, trnD-T, rbcL, rps16, rpl16) and nuclear loci (e.g., rDNA ITS, SucS) (see LPWG, 2013a and references therein). In recent analyses of all 81 plastid genes (Fig. 2) and of a large nuclear gene dataset derived from transcriptome sequences (E.J.M. Koenen & al., in prep.), all five of the non-monospecific subfamilies are strongly supported, and the relationships amongst them do not conflict with the matK analyses (see below), although the nuclear gene dataset does not include Duparquetioideae.

The analysis presented here includes 3696 legume species (with an additional 48 infraspecific taxa) representing 698 of the 765 currently recognised legume genera (Figs. 1 & S1; Tables 2 & S1; Appendix 1). Subfamilies Cercidoideae and Duparquetioideae are fully sampled at the generic level. In Detarioideae, five genera, all of them monospecific, are not sampled; in Dialioideae, two monospecific genera are missing; and in Caesalpinioideae, two genera are not sampled (Table 2, missing genera identified with *). Papilionoideae are represented by 445 genera, with most of the 58 missing genera belonging to tribe Loteae and the phaseoloid clades. The phylogenetic trees and the underlying alignment and voucher data are available to browse and download from the online Supplementary Data (Table S1; Data Files A-F) and on Data Dryad (DOI: https://doi.org/10.5061/dryad.61pd6).
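The sampling figures in this paragraph can be cross-checked with simple arithmetic. The Python sketch below uses only counts stated in the text: the number of unsampled Papilionoideae genera is not entered but derived from the family totals, species coverage is an upper bound because the family has over 19,500 species, and the terminal count assumes (our assumption) that each infraspecific taxon is a separate terminal.

```python
# Counts copied from the text; only the Papilionoideae figures are derived.
RECOGNISED_GENERA = 765
SAMPLED_GENERA = 698
SAMPLED_PAPILIONOID_GENERA = 445
SAMPLED_SPECIES = 3696
INFRASPECIFIC_TAXA = 48
SPECIES_IN_FAMILY = 19_500  # "over 19,500", so species coverage is an upper bound

# Unsampled genera per subfamily, as stated for all but Papilionoideae.
unsampled = {
    "Cercidoideae": 0,
    "Duparquetioideae": 0,
    "Detarioideae": 5,
    "Dialioideae": 2,
    "Caesalpinioideae": 2,
}

unsampled_total = RECOGNISED_GENERA - SAMPLED_GENERA
# Whatever is not accounted for above must be missing from Papilionoideae.
unsampled["Papilionoideae"] = unsampled_total - sum(unsampled.values())
papilionoid_genera = SAMPLED_PAPILIONOID_GENERA + unsampled["Papilionoideae"]

# Assumes each infraspecific taxon is a separate terminal in the tree.
ingroup_terminals = SAMPLED_SPECIES + INFRASPECIFIC_TAXA

print(unsampled_total)                               # 67
print(unsampled["Papilionoideae"])                   # 58
print(papilionoid_genera)                            # 503
print(f"{SAMPLED_GENERA / RECOGNISED_GENERA:.1%}")   # 91.2%
print(f"{SAMPLED_SPECIES / SPECIES_IN_FAMILY:.1%}")  # 19.0%
print(ingroup_terminals)                             # 3744
```

Generic sampling is thus above 90%, whereas species sampling covers at most about a fifth of the family.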
Bayesian (Fig. 1) and maximum likelihood (Fig. S1) analyses of the matK sequence data resolve the Leguminosae as monophyletic with 1.0 posterior probability and 100% bootstrap support. Each of the five non-monospecific subfamilies of Leguminosae is also supported with 1.0 posterior probability and 100% bootstrap support. Relationships amongst subfamilies Cercidoideae, Detarioideae, Duparquetioideae and the clade that groups the remaining legumes (i.e., the other three subfamilies) are unresolved, forming a basal polytomy (Fig. 1). Dialioideae is resolved as sister to Caesalpinioideae + Papilionoideae (1.0 posterior probability, Fig. 1; 100% bootstrap support, Fig. S1), which are sister to each other. In the full plastid analyses of E.J.M. Koenen & al. (in prep.), Duparquetioideae is robustly supported as sister to the Dialioideae + Caesalpinioideae + Papilionoideae clade, but the relationship of this clade to Cercidoideae and Detarioideae remains unresolved (Fig. 2). Many genera of Leguminosae are supported as monophyletic in the matK analysis, with notable exceptions for certain large genera that are the focus of ongoing taxonomic and phylogenetic studies (e.g., Bauhinia s.l. in Cercidoideae; several genera of Detarioideae, of the mimosoid clade and of tribe Millettieae in Papilionoideae). In the mimosoid clade, and in other parts of Caesalpinioideae and Detarioideae, genera are often not supported as monophyletic, and generic-level relationships are often poorly resolved. This can likely be attributed in part to striking substitution rate heterogeneity in plastid genes, and hence variable phylogenetic resolution across legumes, as previously noted by Lavin & al. (2005) and Dugas & al. (2015) (see also Figs. 1 & 2).
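Support values such as those above summarise how often a clade recurs across a set of trees: the posterior probability of a clade is commonly estimated as the proportion of post-burn-in MCMC trees that contain it, and bootstrap support as the proportion of bootstrap-replicate trees that contain it. The minimal Python sketch below computes that proportion for trees encoded as nested tuples; the four "sampled" trees are invented placeholders in which the placement of Duparquetia varies, mimicking the kind of uncertainty shown in Fig. 1, and do not come from the actual analyses.

```python
from typing import FrozenSet, Iterable, List, Sequence, Union

Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names subtended by a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node in the tree, root included."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def clade_frequency(trees: Sequence[Tree], taxa: Iterable[str]) -> float:
    """Proportion of trees in which `taxa` form a clade."""
    if not trees:
        raise ValueError("need at least one tree")
    target = frozenset(taxa)
    hits = sum(1 for tree in trees if target in clades(tree))
    return hits / len(trees)


# Hypothetical placeholders, NOT samples from the published analyses.
CERC = ("Cercis", "Bauhinia")
DET = ("Detarium", "Afzelia")
REST = (("Dialium", "Labichea"), (("Mimosa", "Acacia"), ("Lupinus", "Pisum")))

# Four "sampled" trees in which the position of Duparquetia varies.
samples = [
    (CERC, DET, ("Duparquetia", REST)),
    (CERC, (DET, "Duparquetia"), REST),
    ((CERC, "Duparquetia"), DET, REST),
    (CERC, DET, ("Duparquetia", REST)),
]

rest_taxa = leaves(REST)
print(clade_frequency(samples, rest_taxa))                    # 1.0
print(clade_frequency(samples, rest_taxa | {"Duparquetia"}))  # 0.5
```

A clade present in every sampled tree scores 1.0, while an unstable placement such as Duparquetia's here receives only partial support.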
Several recent large-scale angiosperm and rosid phylogenetic analyses (Zanne & al., 2014; Li & al., 2016) included thousands of legume nuclear, plastid and, in some cases, mitochondrial sequences. These analyses contain many taxa that were misidentified or labelled with outdated names, or that are based on apparent sequence contaminants deposited in GenBank without proper checking and annotation. These inaccuracies, compounded by large amounts of missing data (e.g., 80% in Zanne & al., 2014), interact to cause unpredictable and chaotic problems in phylogenetic analyses, a phenomenon highlighted by McMahon & Sanderson (2006) in their supermatrix analysis of papilionoid legumes. Unfortunately, such potentially flawed topologies have been used as the basis for several recent large-scale evolutionary studies of key legume characteristics, for example the origins of nodulation and nitrogen fixation (e.g., Werner & al., 2014, 2015; Li & al., 2016). Even a cursory examination of many of these large-scale phylogenies reveals a number of unusual and demonstrably inaccurate relationships, and using such flawed phylogenies can lead to weak or even erroneous conclusions about the evolution of particular traits (cf. Doyle, 2016). In contrast, the phylogeny presented here uses a fully curated set of sequences that are vouchered and taxonomically validated by the legume systematics community.
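To make a figure such as "80% missing data" concrete: in a supermatrix, missing data is usually expressed as the share of taxon-by-character cells that hold no observed nucleotide, and it accumulates quickly when most taxa are sequenced for only a few of the concatenated loci. The sketch below computes that share for a small hypothetical matrix. The taxa, loci, lengths and sequences are invented, and two conventions are assumptions: an unsequenced locus counts as entirely missing, and alignment gaps ("-") count as observed rather than missing, since studies differ on this point.

```python
# Hypothetical locus lengths (aligned characters per locus).
LOCUS_LENGTHS = {"matK": 10, "rbcL": 8, "ITS": 6}

# Hypothetical supermatrix: taxon -> {locus: aligned sequence}.
# A locus absent from a taxon's dict was never sequenced for that taxon.
supermatrix = {
    "taxon_1": {"matK": "ATGCATGCAT", "rbcL": "ATGGCTAC", "ITS": "GGCC??"},
    "taxon_2": {"matK": "ATGCATGC??"},
    "taxon_3": {"rbcL": "ATGGCTAC"},
    "taxon_4": {"matK": "ATGCAT-CAT"},
}

def missing_fraction(matrix, locus_lengths, missing_chars="?N"):
    """Share of taxon-by-character cells with no observed nucleotide."""
    total = 0
    missing = 0
    for loci in matrix.values():
        for locus, length in locus_lengths.items():
            seq = loci.get(locus)
            total += length
            if seq is None:
                missing += length  # whole locus unsequenced for this taxon
            else:
                # Count ambiguous/unknown characters; gaps are not counted.
                missing += sum(ch in missing_chars for ch in seq.upper())
    return missing / total

print(f"missing data: {missing_fraction(supermatrix, LOCUS_LENGTHS):.0%}")
```

Here 48 of the 96 cells are empty, so half of this toy matrix is missing even though every taxon has at least one sequenced locus.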
The phylogenetics of legumes, like that of any major clade, is a work in progress. New densely sampled phylogenies at the species, generic and higher levels, based on full plastome sequences, transcriptomes and hundreds of nuclear loci, are being generated and will in due course supersede the phylogeny presented here. Nonetheless, the taxonomically validated tree presented here can be used for downstream analyses that require an accurate and densely sampled phylogenetic framework for the Leguminosae.
Currently 12 genera and ca. 335 species; mainly tropical, with Cercis in the warm temperate Northern Hemisphere.

Currently 84 genera and ca. 760 species; almost exclusively tropical, with Schotia Jacq. in subtropical South Africa.
Caesalpinioideae in its emended circumscription contains 148 genera and ca. 4400 species. Pantropical and common in both wet and dry regions; a handful of species extend into the temperate zone, and fewer still are frost-tolerant (Gleditsia, Gymnocladus and some species of Desmanthus Willd. and Senna).

The mimosoid clade
Although the mimosoid clade (Fig. 6) is not formally named here, it is morphologically distinct and can be defined as the most inclusive crown clade containing all Leguminosae with radially symmetrical flowers having valvate petal aestivation, homologous to those found in Pentaclethra macrophylla Benth. and Inga edulis Mart.
The mimosoid clade contains all genera previously assigned to subfamily Mimosoideae plus Chidlowia, previously considered a member of the former Caesalpinioideae but now shown to belong to the mimosoid clade (Manzanilla & Bruneau, 2012; E.J.M. Koenen & al., in prep.). This clade of 3300+ species is morphologically highly distinctive, with radially symmetrical flowers showing valvate aestivation of the calyx and corolla (except in Parkia, which has partially imbricate calyx lobes). Flowers are typically aggregated into spicate or capitate inflorescences, which are often in turn combined into compound inflorescences (e.g., a panicle of globose heads). Pantropical and common in both wet and dry regions, with a handful of species extending into the temperate zone and fewer still into frost-prone regions.