Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Linear Context-Free Rewriting Systems
  5. 3. Mild Context-Sensitivity
  6. 4. Parsing
  7. 5. Data-Driven Parsing
  8. 6. LCFRS and Dependency Grammar
  9. 7. Conclusion
  10. Short Biography
  11. Works Cited

This paper introduces Linear Context-Free Rewriting Systems, a mildly context-sensitive grammar formalism that displays a range of interesting formal properties and that has attracted a lot of interest in the context of natural language parsing recently because of its capacity to describe discontinuous constituents and non-projective dependencies. Besides a presentation of the formalism, the paper includes a discussion of recent LCFRS applications in parsing and dependency grammar.


1. Introduction

Linear Context-Free Rewriting Systems (LCFRSs) (Vijay-Shanker et al. 1987) extend context-free grammars (CFGs) in a natural way such that non-terminals can span tuples of strings that need not be adjacent. This is an attractive property for modelling natural languages, both for discontinuous constituents and for non-projective dependencies. This property, together with the fact that many CFG parsing techniques can be extended to LCFRS, has recently led to an increased interest in using LCFRSs for natural language processing.

There are different reasons why LCFRSs are interesting for computational linguistics. Firstly, LCFRS represents an important complexity class concerning natural languages. In the 1980s it became clear that natural languages are not context-free. The question that arose was how far beyond CFG one has to go in order to capture natural languages. In this context, (Joshi 1985) proposed the class of mildly context-sensitive languages as a class that contains natural languages while still being computationally tractable. The formalism LCFRS has always been linked to the notion of mild context-sensitivity, which refers to extensions of CFG that can be parsed in polynomial time and that generate languages of constant growth. LCFRS belongs to this class and, so far, we do not know of any other mildly context-sensitive formalism generating a larger set of languages. Therefore LCFRS is tacitly assumed to characterize mild context-sensitivity.

Another reason why LCFRSs are interesting is that there is a range of grammar formalisms that has been proposed to model natural language phenomena and that turned out to be equivalent to LCFRS. These are, among others, set-local multicomponent TAG (MCTAG) (Joshi 1985; Joshi et al. 1975; Weir 1988) and Minimalist Grammar (MG) (Stabler 1997). This supports the assumption that LCFRS represents an important language class in the context of natural language modelling.

Finally, as already mentioned, LCFRS extends CFG such that non-terminals can span a tuple of strings that need not be adjacent. In other words, the yield of a non-terminal can be discontinuous. Since discontinuities occur frequently in natural languages, this property makes LCFRS an interesting formalism for natural language processing. For this reason, there has been a recent increase of interest in LCFRS in the context of parsing (Gildea 2010; Gómez-Rodríguez et al. 2009; Kallmeyer 2010b; Kallmeyer and Maier 2010; Kallmeyer et al. 2009; Kuhlmann and Satta 2009; Levy 2005; Maier 2010; Maier and Kallmeyer 2010). Related to this is the capacity of lexicalized LCFRSs to describe non-projective dependencies that has been investigated by (Kuhlmann 2010) and that can be exploited for data-driven grammar-based dependency parsing (Kuhlmann and Satta 2009; Maier and Kallmeyer 2010).

A sample construction with discontinuous constituents that CFGs cannot deal with properly is the cross-serial dependency construction in Dutch, as in (1), and in Swiss German (Bresnan et al. 1982; Shieber 1985).

(1) ... dat Jan Piet de kinderen zag helpen zwemmen
    ... that Jan Piet the children saw help swim
    ‘... that Jan saw Piet help the children swim’

In (1), we have a sequence of three noun phrases followed by three verbs where the first NP is an argument of the first verb, the second an argument of the second verb and the third NP an argument of the third verb. Furthermore, the VP of the last verb is embedded under the VP of the second verb which is, in turn, embedded under the S constituent headed by the first verb. If we adopt trees with crossing branches, this gives the constituency structure in Figure 1. This tree cannot be a CFG derivation tree but we can give LCFRS rules that generate it (see Figure 5).

Figure 1.  Constituency structure for (1).

This paper aims at introducing LCFRS and describing its current applications in natural language processing. The structure of the paper is as follows: Section 2 introduces the formalism, Section 3 discusses its mild context-sensitivity, Section 4 gives a bottom-up chart parsing algorithm for LCFRS, Section 5 reports on data-driven constituency parsing using LCFRS and Section 6 sketches the relation between LCFRS and dependency grammar.

2. Linear Context-Free Rewriting Systems

Originally, LCFRSs were introduced in the context of Tree Adjoining Grammars (TAGs) (Joshi et al. 1975, Joshi and Schabes 1997) and mild context-sensitivity (Joshi 1985). More or less at the same time the equivalent Multiple Context-Free Grammars (MCFGs) were developed by (Seki et al. 1991). A third formalism that is equivalent to MCFG and LCFRS and that can roughly be considered a syntactic variant of them is simple Range Concatenation Grammar (SRCG) (Boullier 2000).

2.1. Intuition

LCFRS (Vijay-Shanker et al. 1987; Weir 1988) is an extension of CFG where the non-terminals can span tuples of possibly non-adjacent strings. This is illustrated in Figure 2. On the left, we have a CFG derivation tree. The node with the non-terminal A yields a single string γ. In contrast to this, in the LCFRS derivation tree on the right, the node with category A spans three non-adjacent strings $\gamma_1$, $\gamma_2$ and $\gamma_3$.

Figure 2.  Yields of non-terminals in context-free grammar (CFG) and linear context-free rewriting system (LCFRS).

Discontinuities occur frequently in natural languages, in particular in so-called free word order languages such as German. In the German NeGra treebank (Skut et al. 1997), approximately 25% of the sentences display discontinuous constituents. A few examples are given in (2) with the discontinuous constituents in italic. See (Kallmeyer et al. 2009; Maier and Lichte 2011) for more examples.

(2) a. Fronting:
       Darüber muss nachgedacht werden. (NeGra)
       Thereof must thought be
       ‘One must think of that’
    b. Extraposed relative clauses:
       …ob auf deren Gelände der Typ von Abstellanlage gebaut werden könne, der… (NeGra)
       …whether on their terrain the type of parking facility built get could, which
       ‘…whether one could build on their premises the type of parking facility, which…’

Discontinuous constituents also occur in languages with a rather fixed word order such as English, resulting for instance from long-distance movements. (3) shows examples from the Penn Treebank (PTB) (Marcus et al. 1994). See (Evang 2011; Evang and Kallmeyer 2011; Levy 2005) for further examples.

(3) a. Long extractions:
       Those chains include Bloomingdale's, which Campeau recently said it will sell.
    b. Extraposed relative clauses:
       They sow a row of male-fertile plants nearby, which then pollinate the male-sterile plants.

Originally, LCFRSs were introduced as generalized CFGs that generate terms which, in turn, can be interpreted in different ways (Weir 1988). They can, for instance, yield trees or other graphs or strings. In the following, we restrict ourselves to the string-generating form of LCFRS. This is the form of LCFRS that is widely used nowadays. Instead of using the notation as generalized CFGs for LCFRSs, we will use the syntax of the weakly equivalent SRCGs (Boullier 1998, 2000).

Let us illustrate the idea of LCFRS with the grammar from Figure 3. In LCFRSs (as in CFGs) the rules describe how to compute the yield of the left-hand side non-terminal from the yields of the right-hand side non-terminals and further terminal symbols. The specification of this computation is encoded in the components of the left-hand side of the rule. For instance, the first rule A(a,b,c) → ɛ in Figure 3 tells us that the triple 〈a,b,c〉 is in the yield of the non-terminal A. The rule A(aX,bY,cZ) → A(X,Y,Z) specifies that one can compute a new tuple in the yield of A from an already existing one by concatenating an a to its first, a b to its second and a c to its third component. The variables X, Y, Z stand for the three components of the already existing tuple in the yield of A that feeds into the computation of the left-hand side yield. The third rule says that from every triple in the yield of A we can obtain a string in the yield of S by concatenating the three components. As a result, we obtain the language $\{a^n b^n c^n \mid n \geq 1\}$ as the yield of the start symbol S. Figure 4 shows the syntactic tree obtained for $a^3b^3c^3$.

Figure 3.  An LCFRS for $\{a^n b^n c^n \mid n \geq 1\}$.

Figure 4.  Tree obtained for $a^3b^3c^3$.

Figure 3 also gives the original LCFRS notation with term-generating rules that yield strings. For aabbcc for instance, we generate the term f(g(h())) which yields f(g(h())) = f(g(〈a,b,c〉)) = f(〈aa,bb,cc〉) = 〈aabbcc〉.
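The evaluation of the term f(g(h())) can be replayed with a minimal Python sketch. It assumes, following the evaluation steps given in the text, that h corresponds to the rule A(a,b,c) → ɛ, g to A(aX,bY,cZ) → A(X,Y,Z) and f to the S rule; representing yields as Python tuples of strings is a choice made here for illustration only.

```python
# Yields are modelled as tuples of strings; f, g, h mirror the term f(g(h())).

def h():
    """A(a,b,c) -> epsilon: the base triple."""
    return ("a", "b", "c")

def g(t):
    """A(aX,bY,cZ) -> A(X,Y,Z): prefix one a, b and c to the three components."""
    x, y, z = t
    return ("a" + x, "b" + y, "c" + z)

def f(t):
    """S rule: concatenate the three components into a single string."""
    x, y, z = t
    return (x + y + z,)

print(f(g(h())))       # ('aabbcc',)
print(f(g(g(h()))))    # ('aaabbbccc',)
```

Each additional application of g adds one a, one b and one c, which is why the resulting strings stay of the form $a^n b^n c^n$.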

The LCFRS rules that generate the cross-serial dependencies exemplified in Figure 1 are given in Figure 5. The second VP rule for instance tells us that a potentially discontinuous VP can consist of an NP in its first component and the head verb in its second component. This is the case for the VP de kinderen zwemmen in (1). The S rule tells us that we can have an embedded discontinuous VP that is wrapped around the head verb. This is the case for the embedded VP Piet de kinderen helpen zwemmen that has a gap containing the verb zag.

Figure 5.  An LCFRS that generates the cross-serial dependencies from Figure 1.

The number of yield components is fixed for each non-terminal. It is called the fan-out of the non-terminal. In Figure 3, S has fan-out 1 and A fan-out 3.

2.2. Definitions

In the following, we assume that there is a set of variables V that we use for our LCFRS rules.

Definition 1 (LCFRS)

An LCFRS is a tuple 〈N,T,P,S〉 where

  • (a)
    N is a finite set of non-terminals with a function dim: N → ℕ that determines the fan-out of each A  ∈  N;
  • (b)
    T is a finite set of terminals disjoint from V;
  • (c)
    S  ∈  N is the start symbol with dim(S) = 1;
  • (d)
    P is a finite set of rules
    • $A(\alpha_1,\ldots,\alpha_{\mathit{dim}(A)}) \rightarrow A_1(X^{(1)}_1,\ldots,X^{(1)}_{\mathit{dim}(A_1)}) \cdots A_m(X^{(m)}_1,\ldots,X^{(m)}_{\mathit{dim}(A_m)})$
    for m ≥ 0 where $A, A_1,\ldots,A_m \in N$, $X^{(i)}_j \in V$ for 1 ≤ i ≤ m, 1 ≤ j ≤ $\mathit{dim}(A_i)$ and $\alpha_i \in (T \cup V)^*$ for 1 ≤ i ≤ dim(A). For all r  ∈  P, it holds that every variable X occurring in r occurs exactly once in the left-hand side and exactly once in the right-hand side of r.

Now we define the tuples of terminal strings yielded by a non-terminal in an LCFRS. As a special case, the components of the yield of the start symbol are then the strings generated by the grammar. In our example from Figure 3 we have for instance $\mathit{yield}(A) = \{\langle a^n,b^n,c^n\rangle \mid n \geq 1\}$ and $\mathit{yield}(S) = \{\langle a^n b^n c^n\rangle \mid n \geq 1\}$.

Definition 2 (Yield, language)

Let G = 〈N,T,P,S〉 be an LCFRS.

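  • 1
     For every A  ∈  N, we define the yield of A, yield(A), as follows:
    • (a)
      For every rule $A(\alpha_1,\ldots,\alpha_{\mathit{dim}(A)}) \rightarrow \varepsilon \in P$, $\langle\alpha_1,\ldots,\alpha_{\mathit{dim}(A)}\rangle \in \mathit{yield}(A)$;
    • (b)
      For every rule
      • $A(\alpha_1,\ldots,\alpha_{\mathit{dim}(A)}) \rightarrow A_1(X^{(1)}_1,\ldots,X^{(1)}_{\mathit{dim}(A_1)}) \cdots A_m(X^{(m)}_1,\ldots,X^{(m)}_{\mathit{dim}(A_m)}) \in P$
      and all $\langle w_{i,1},\ldots,w_{i,\mathit{dim}(A_i)}\rangle \in \mathit{yield}(A_i)$ for 1 ≤ i ≤ m: $\langle f(\alpha_1),\ldots,f(\alpha_{\mathit{dim}(A)})\rangle \in \mathit{yield}(A)$ where f is defined as follows:
      • (i)
        f(t) = t for all t  ∈  T,
      • (ii)
        $f(X^{(i)}_j) = w_{i,j}$ for all 1 ≤ i ≤ m, 1 ≤ j ≤ $\mathit{dim}(A_i)$ and
      • (iii)
        f(xy) = f(x)f(y) for all $x,y \in (T \cup V)^+$.
    • (c)
      Nothing else is in yield(A).
  • 2
     The language of G is then L(G) = {w | 〈w〉  ∈  yield(S)}.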
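Definition 2 can be read as a least-fixpoint computation. The following Python sketch illustrates this for the grammar generating $\{a^n b^n c^n \mid n \geq 1\}$. The rule encoding (`RULES`, variables as index pairs), the function names and the length cut-off `max_len` are assumptions made for this illustration; they are not part of the formalism.

```python
from itertools import product

# Hypothetical encoding: a rule is (A, lhs_components, rhs_nonterminals).
# A terminal is a str; a variable is a pair (i, j) that stands for
# component j of the i-th right-hand side non-terminal (0-based).
RULES = [
    ("A", (("a",), ("b",), ("c",)), ()),
    ("A", (("a", (0, 0)), ("b", (0, 1)), ("c", (0, 2))), ("A",)),
    ("S", (((0, 0), (0, 1), (0, 2)),), ("A",)),
]

def yields(rules, max_len):
    """Least fixpoint of the yield definition, cut off at total length max_len."""
    result = {}
    for lhs, _, rhs in rules:
        for name in (lhs, *rhs):
            result.setdefault(name, set())
    changed = True
    while changed:
        changed = False
        for lhs, comps, rhs in rules:
            # Pick one already derived tuple for every RHS non-terminal.
            for chosen in product(*(tuple(result[b]) for b in rhs)):
                new = tuple(
                    "".join(s if isinstance(s, str) else chosen[s[0]][s[1]]
                            for s in comp)
                    for comp in comps
                )
                if sum(map(len, new)) <= max_len and new not in result[lhs]:
                    result[lhs].add(new)
                    changed = True
    return result

def language(rules, max_len, start="S"):
    """Words of L(G) up to length max_len, shortest first."""
    words = (w for (w,) in yields(rules, max_len)[start])
    return sorted(words, key=lambda w: (len(w), w))

print(language(RULES, 9))   # ['abc', 'aabbcc', 'aaabbbccc']
```

The cut-off is safe here because every variable is used exactly once on the left-hand side, so a tuple can only be built from tuples that are no longer than itself.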

We distinguish different types of LCFRS depending on the maximal fan-out of the non-terminals: An LCFRS G is of fan-out k if the fan-outs of its non-terminals are less than or equal to k. If k is the fan-out of G, then G is called a k-LCFRS. A language that can be generated by a k-LCFRS is called a k-LCFR language (k-LCFRL). Furthermore, the right-hand side length of a rule p  ∈  P is called the rank of p and an LCFRS G is said to be of rank r if the rank of its rules is bounded by r.

2.3. Formal Properties

There are special types of rewriting rules that facilitate formal proofs and parsing techniques. Firstly, one would like to be able to assume that the order of the components in the yields of the non-terminals corresponds always to their order in the input. This can be guaranteed if all the rules in the grammar are monotone. A rule r  ∈  P is called monotone (Michaelis 2001b) if for every right-hand side non-terminal A in r and each pair $X_1$, $X_2$ of components of A in the right-hand side of r, $X_1$ precedes $X_2$ in the right-hand side iff $X_1$ precedes $X_2$ in the left-hand side. Secondly, it is an advantage to have a grammar where none of the left-hand side components in the rules is empty. A rule with an empty left-hand side component is called an ɛ-rule. An LCFRS is called ɛ-free if it either contains no ɛ-rules or S(ɛ) → ɛ is the only rule with empty components and S does not appear in any right-hand side.

Concerning these properties, it has been shown that for every k-LCFRS G there is 1. an equivalent ɛ-free k-LCFRS $G'$ (Boullier 1998; Seki et al. 1991) and 2. an equivalent monotone k-LCFRS $G_{mon}$ (Kracht 2003; Michaelis 2001b). However, the size of the grammar can increase exponentially in the size of the original grammar when transforming it into its monotone or ɛ-free equivalent.

In the context of non-deleting MCFGs, which are exactly like LCFRSs, the property monotone is called non-permuting (Kanazawa 2009), while for simple RCGs it is called ordered (Villemonte de La Clergerie 2002).
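The monotonicity condition is easy to test on a single rule. The sketch below assumes a hypothetical encoding in which the left-hand side is a tuple of components and every variable is a pair (i, j), meaning component j of the i-th right-hand side non-terminal; the function name `is_monotone` is likewise chosen for this illustration.

```python
def is_monotone(lhs_components):
    """Check that the variables of each RHS non-terminal keep their order in the LHS.

    Variables are pairs (i, j): component j of the i-th RHS non-terminal.
    Terminals are plain strings and are ignored.
    """
    last_seen = {}
    for comp in lhs_components:
        for sym in comp:
            if isinstance(sym, str):
                continue
            i, j = sym
            if j < last_seen.get(i, -1):
                return False
            last_seen[i] = j
    return True

# S(XY) -> A(X,Y) keeps the order of A's components: monotone.
print(is_monotone((((0, 0), (0, 1)),)))   # True
# S(YX) -> A(X,Y) swaps them: not monotone.
print(is_monotone((((0, 1), (0, 0)),)))   # False
```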

As shown in (Seki et al. 1991), k-LCFRLs are closed under substitution, union, concatenation, Kleene star and intersection with regular languages. Furthermore, (Seki et al. 1991) show a pumping lemma that states that for every infinite k-LCFRL, we can find a word in this language containing 2k iterable parts. In combination with the closure properties, this lemma can, for instance, be used to show that the so-called (2k + 1)-counting language $\{a_1^n a_2^n \cdots a_{2k+1}^n \mid n \geq 1\}$ is not a k-LCFRL. A (k + 1)-LCFRS for this language can be given. The rules $S(XYZ) \rightarrow A(X,Y,Z)$, $A(a_1Xa_2, a_3Ya_4, a_5Z) \rightarrow A(X,Y,Z)$, $A(a_1a_2, a_3a_4, a_5) \rightarrow \varepsilon$ for example constitute a 3-LCFRS for the 5-counting language. This result concerning the counting languages shows that the classes of k-LCFRLs (k ≥ 1) form a proper hierarchy.

2.4. Well-Nested LCFRS

A restricted form of LCFRS that has attracted a lot of interest recently is well-nested LCFRS (Gómez-Rodríguez et al. 2010; Kanazawa 2009; Mönnich 2010). Originally, the term ‘well-nested’ stems from dependency grammar. Intuitively, an LCFRS is well-nested if it does not allow two pairs of components from two different non-terminals to be in a cross-serial configuration. In other words, the rules $A(\alpha_1,\ldots,\alpha_{\mathit{dim}(A)}) \rightarrow A_1(X^{(1)}_1,\ldots,X^{(1)}_{\mathit{dim}(A_1)}) \cdots A_m(X^{(m)}_1,\ldots,X^{(m)}_{\mathit{dim}(A_m)})$ must be monotone and for all 1 ≤ i,j ≤ m, i ≠ j and 1 ≤ k < $\mathit{dim}(A_i)$: if $X^{(i)}_k$ precedes $X^{(j)}_1$ and $X^{(j)}_1$ precedes $X^{(i)}_{k+1}$ in $\alpha_1 \cdots \alpha_{\mathit{dim}(A)}$, then $X^{(j)}_{\mathit{dim}(A_j)}$ precedes $X^{(i)}_{k+1}$ in $\alpha_1 \cdots \alpha_{\mathit{dim}(A)}$. In Figure 6 for example, in the first rule, the composition of the yields of A and B is ill-nested.

Figure 6.  A non well-nested LCFRS for the language $\{a^n b^m a^n b^m\}$.

It seems that in many cases where we might want to use LCFRSs, well-nested LCFRSs are enough (Gómez-Rodríguez et al. 2010). There are, however, cases of ill-nested structures, as pointed out in (Chen-Main and Joshi 2012; Maier and Lichte 2011). Examples are the extraposed relative clause constructions from (2) and (3).
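For a monotone rule, the well-nestedness condition of this section amounts to forbidding an interleaving of the form i … j … i … j between the variables of two right-hand side non-terminals. The following sketch checks this on a single rule. The encoding (variables as pairs (i, j)) and the two example rules are hypothetical; they are not the rules of Figure 6.

```python
from itertools import combinations

def is_well_nested(lhs_components):
    """True if no two RHS non-terminals interleave as i .. j .. i .. j.

    Assumes the rule is monotone. Variables are pairs (i, j): component j
    of the i-th RHS non-terminal; terminals (str) are ignored.
    """
    order = [s[0] for comp in lhs_components for s in comp
             if not isinstance(s, str)]
    for i, j in combinations(sorted(set(order)), 2):
        collapsed = []
        for k in order:
            if k in (i, j) and (not collapsed or collapsed[-1] != k):
                collapsed.append(k)
        # Four or more alternations mean the two yields cross.
        if len(collapsed) >= 4:
            return False
    return True

# S(XUYV) -> A(X,Y) B(U,V): cross-serial configuration, ill-nested.
print(is_well_nested((((0, 0), (1, 0), (0, 1), (1, 1)),)))   # False
# S(XUVY) -> A(X,Y) B(U,V): B sits inside the gap of A, well-nested.
print(is_well_nested((((0, 0), (1, 0), (1, 1), (0, 1)),)))   # True
```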

3. Mild Context-Sensitivity

A central problem in computational linguistics is the determination of the complexity of natural languages. In the 1980s, it became clear that CFGs were not powerful enough to describe all natural language phenomena (Bresnan et al. 1982; Huybregts 1984; Shieber 1985) and, as a consequence, the question of the appropriate context-sensitive formalism for natural languages arose. In an attempt to characterize the amount of context-sensitivity required, Joshi (1985) introduced the notion of mild context-sensitivity. Mild context-sensitivity characterizes formalisms via the class of languages that they generate. A formalism $\mathcal{F}$ that generates the class $\mathcal{L}$ of languages is mildly context-sensitive if

  • 1
    $\mathcal{L}$ contains all context-free languages.
  • 2
    $\mathcal{L}$ contains languages describing a limited amount of cross-serial dependencies.
  • 3
     The languages in $\mathcal{L}$ are polynomially parsable, i.e., $\mathcal{L} \subset \mathrm{PTIME}$.
  • 4
     The languages in $\mathcal{L}$ have the constant growth property.

In this case, $\mathcal{L}$ is called a mildly context-sensitive class of languages.

The constant growth property roughly means that, if we order the words (i.e., sentences in the case of natural language) according to their length, then the length grows in a linear way. $\{a^{2^n} \mid n \geq 0\}$ for example does not have the constant growth property. See (Weir 1988) for a formal definition.
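The difference between the two kinds of growth can be made concrete by looking at the gaps between consecutive word lengths. The helper `length_gaps` below is a hypothetical name used for this small illustration; it only inspects a finite prefix of each language and is not a decision procedure for the property.

```python
def length_gaps(lengths):
    """Differences between consecutive distinct word lengths, in increasing order."""
    ls = sorted(set(lengths))
    return [b - a for a, b in zip(ls, ls[1:])]

# Word lengths of {a^(2^n) | n >= 0}: the gaps double, so no constant bounds them.
print(length_gaps([2 ** n for n in range(6)]))     # [1, 2, 4, 8, 16]
# Word lengths of {a^n b^n c^n | n >= 1}: every gap is 3.
print(length_gaps([3 * n for n in range(1, 6)]))   # [3, 3, 3, 3]
```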

One of the weakest mildly context-sensitive formalisms is Tree Adjoining Grammar (TAG) (Joshi et al. 1975; Joshi and Schabes 1997). The languages generated by TAGs are contained in the set of 2-LCFRLs. This makes TAG rather efficient to process but sometimes too limited to describe all natural language phenomena in an adequate way.

So far, it has not been possible to identify a grammar formalism that generates the largest possible mildly context-sensitive class of languages. The closest approximation we know of are LCFRSs. Their mild context-sensitivity has been shown by (Vijay-Shanker et al. 1987).

Joshi's hypothesis that natural languages are mildly context-sensitive has been questioned by two natural language phenomena that seem to display non-constant growth when being iterated, namely case stacking in Old Georgian (Michaelis and Kracht 1996) and Chinese number names (Radzinski 1991). The analyses of Old Georgian, however, are based on very few data since there are no speakers of Old Georgian today. Therefore, it is hard to tell whether there is really an infinite progression of case stacking possible. Concerning Chinese number names, it is not totally clear to what extent this constitutes a syntactic phenomenon. Furthermore, the constant growth property (Joshi 1985, Weir 1988) is an existential condition since it only requires that there be some constructions where iteration under constant growth is possible. It does not say that all iterations must be of constant growth. Therefore so far there is no reason to doubt that natural languages are mildly context-sensitive.

There has been a lot of discussion recently around the notion of mild context-sensitivity. In particular its tacit identification with LCFRS has been questioned.

Several authors have observed that the class of well-nested LCFRSs is important because of its formal properties, its equivalence to various other formalisms and because of the fact that its expressive capacity might be enough to account for natural languages (Gómez-Rodríguez et al. 2010; Kanazawa 2009; Mönnich 2010). If we assume that mild context-sensitivity is a term that specifies the complexity class necessary for natural languages, this seems to call for a revision of the notion of mild context-sensitivity towards well-nested LCFRS.

Another perspective points in the other direction, namely beyond LCFRS. There are languages that are polynomial and of constant growth without being LCFRLs. This seems to call for the search for other mildly context-sensitive formalisms that extend LCFRS (Kallmeyer 2010a).

4. Parsing

Many recent LCFRS applications concern parsing (Chiang 2004; Evang and Kallmeyer 2011; Gildea 2010; Gómez-Rodríguez et al. 2010; Kallmeyer and Maier 2010; Levy 2005; Maier 2010). Besides using LCFRS for data-driven parsing, which we will sketch in Section 5, LCFRS is also used as an intermediate formalism for parsing Stabler's (1997) MG (exploiting the equivalence between LCFRS and MG (Michaelis 2001a, c)), for parsing TAG (Boullier 1998, 1999) and multicomponent extensions of TAG (Parmentier et al. 2008) and for parsing the Grammatical Framework (Ljunglöf 2004).

In the following, we present the non-directional bottom-up parsing algorithm (CYK parser) for LCFRS from (Seki et al. 1991), applied to grammars of rank 2. We formulate the algorithm using deduction rules (Pereira and Warren 1983, Shieber et al. 1995, Sikkel 1997). The algorithm will be extended to probabilistic LCFRS (Kallmeyer and Maier 2010; Maier and Kallmeyer 2010) in Section 5. Other LCFRS parsing algorithms can be found in (Burden and Ljunglöf 2005; Villemonte de La Clergerie 2002).

As already mentioned, we can assume our grammars without loss of generality to be ɛ-free and monotone. We further assume that the left-hand sides of the rules either contain only variables or only terminal symbols. Finally, we even assume that the grammars are binarized, which means that their rules are of rank  ≤ 2. This can be assumed since every LCFRS can be transformed into an LCFRS of rank 2 (Gildea 2010; Gómez-Rodríguez et al. 2009; Kallmeyer 2010b). We will discuss different binarization strategies in Section 5.3. We furthermore assume unary rules (rank 1) to be such that the left-hand side fan-out equals the right-hand side fan-out.

Our items are interpreted with respect to the input string w. They have the form $[A,\langle\langle l_1,r_1\rangle,\ldots,\langle l_{\mathit{dim}(A)},r_{\mathit{dim}(A)}\rangle\rangle]$ and represent a tuple in the yield of a non-terminal A where the index pairs give the start and end positions of the components of the tuple. The first item in the table in Figure 7 tells us that the part between positions 0 and 1 (i.e., the first symbol) of the input is in the yield of $T_a$.

Figure 7.  Sample CYK parsing.

We will explain the details of the algorithm while going through the example in Figure 7. The items we find are stored in a table, a so-called chart.

There are two types of rules in this algorithm that serve to deduce new items. The scan rule is applied with respect to a terminating rule (i.e., a rule with an empty right-hand side and only terminal symbols in the left-hand side components). For a rule A(α1,…,αk) → ɛ, whenever we find substrings of the form α1,…,αk in the input (in this order), we conclude that these substrings can be the components of an A category and we add a corresponding item to our chart. This gives the first three items in Figure 7.

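  • Scan: $\dfrac{}{[A,\langle\langle l_1,r_1\rangle,\ldots,\langle l_k,r_k\rangle\rangle]}$ where $A(\alpha_1,\ldots,\alpha_k) \rightarrow \varepsilon \in P$, each $\alpha_i$ is the part of the input between positions $l_i$ and $r_i$, and $r_i \leq l_{i+1}$ for 1 ≤ i < k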

The rules unary and binary move bottom-up from the completed right-hand side of a rule to its left-hand side. A unary rule has the form A(X1,…,Xk) → B(X1,…,Xk). It is applied whenever we already have a B item. From this, we can deduce an A-item with the same yield:

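  • Unary: $\dfrac{[B,\vec{\rho}]}{[A,\vec{\rho}]}$ where $A(X_1,\ldots,X_k) \rightarrow B(X_1,\ldots,X_k) \in P$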

For the application of a binary rule, we need the notion of instantiation of a rule with respect to a given input (Boullier 2000). A rule instantiation specifies the computation of a left-hand side yield element from elements in the yields of the right-hand side non-terminals based on the corresponding vectors of index pairs. An instantiated rule is a rule in which variables and arguments are consistently replaced by index pairs. This means that adjacent variables must be mapped onto index pairs whose concatenation is defined. Binary performs a bottom-up parsing step using a binary rule. Examples are given in Figure 7.

  • Binary: $\dfrac{[B,\vec{\rho}_B],\ [C,\vec{\rho}_C]}{[A,\vec{\rho}_A]}$ where $A(\vec{\rho}_A) \rightarrow B(\vec{\rho}_B)\, C(\vec{\rho}_C)$ is an instantiated rule

The goal item of our algorithm is [S,〈〈0,|w|〉〉] since this item represents an S category spanning the entire input. The last item in Figure 7 is a goal item.

This algorithm is polynomial in the length of the input string. More precisely, it has the following complexity: Let us assume that the fan-out of the binarized LCFRS is k. Then we have at most 2k range boundaries in each of the antecedent items of the rule binary. For variables $X_1$, $X_2$ occurring in the same left-hand side argument of the applied rule such that $X_1$ is to the left of $X_2$ with no other variables in between, the right boundary of $X_1$ immediately gives us the left boundary of $X_2$. In the worst case, A, B, C all have fan-out k and each left-hand side argument contains only two variables. This leads to 3k independent range boundaries and consequently to a time complexity of $\mathcal{O}(n^{3k})$ for the entire algorithm, where n is the length of the input. Note, however, that this is only the complexity in the length of the input string. The universal recognition problem (i.e., recognition in the size of the grammar and the input word) is NP-hard for LCFRS (Kaji et al. 1992; Satta 1992).
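The deduction system described in this section (scan, unary, binary, goal) can be sketched as an agenda-driven recognizer. The code below is a minimal illustration under several assumptions that are not taken from the source: terminating rules have fan-out 1 and a single terminal (as for $T_a$), variables are encoded as pairs (i, j), and the sample grammar for $\{a^n b^m c^n d^m \mid n, m \geq 1\}$ with its non-terminal names is invented for the example. The sketch is not optimized and only returns a yes/no answer.

```python
def combine(args, left, right):
    """Instantiate the LHS of a binary rule; return None if the ranges do not fit."""
    spans, out = (left, right), []
    for comp in args:
        i, j = comp[0]
        l, r = spans[i][j]
        for i, j in comp[1:]:
            l2, r2 = spans[i][j]
            if l2 != r:                  # adjacent variables must concatenate
                return None
            r = r2
        if out and out[-1][1] > l:       # components must follow the input order
            return None
        out.append((l, r))
    return tuple(out)

def recognize(word, lexical, unary, binary, start="S"):
    """Items are (non-terminal, tuple of (left, right) index pairs)."""
    chart, agenda = set(), []
    for i, t in enumerate(word):                         # scan
        agenda += [(a, ((i, i + 1),)) for a in lexical.get(t, ())]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        b, rho = item
        agenda += [(a, rho) for a in unary.get(b, ())]   # unary
        for a, args, labels in binary:                   # binary
            for other in list(chart):
                for left, right in ((item, other), (other, item)):
                    if (left[0], right[0]) == labels:
                        rho_a = combine(args, left[1], right[1])
                        if rho_a is not None:
                            agenda.append((a, rho_a))
    return (start, ((0, len(word)),)) in chart           # goal item

# Hypothetical binarized 2-LCFRS for {a^n b^m c^n d^m | n, m >= 1}.
X, Y, U, V = (0, 0), (0, 1), (1, 0), (1, 1)
LEXICAL = {"a": ["Ta"], "b": ["Tb"], "c": ["Tc"], "d": ["Td"]}
UNARY = {"AC": ["A"], "BD": ["B"]}
BINARY = [
    ("AC", ((X,), (U,)), ("Ta", "Tc")),      # AC(X,U) -> Ta(X) Tc(U)
    ("BD", ((X,), (U,)), ("Tb", "Td")),      # BD(X,U) -> Tb(X) Td(U)
    ("A", ((X, U), (Y, V)), ("AC", "A")),    # A(XU,YV) -> AC(X,Y) A(U,V)
    ("B", ((X, U), (Y, V)), ("BD", "B")),    # B(XU,YV) -> BD(X,Y) B(U,V)
    ("S", ((X, U, Y, V),), ("A", "B")),      # S(XUYV) -> A(X,Y) B(U,V)
]

print(recognize("aabccd", LEXICAL, UNARY, BINARY))   # True
print(recognize("aabcd", LEXICAL, UNARY, BINARY))    # False
```

The function `combine` plays the role of rule instantiation: it succeeds only if adjacent variables map onto index pairs whose concatenation is defined.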

5. Data-Driven Parsing

The use of LCFRS for data-driven parsing was first proposed by (Levy 2005) in order to deal with long-distance dependencies in English and the discontinuities that arise out of them. The first efficient LCFRS parser usable for treebank-based parsing is rparse (Kallmeyer and Maier 2010; Maier 2010; Maier and Kallmeyer 2010). It has been used for parsing German (using NeGra and Tiger) (Kallmeyer and Maier 2010; Maier 2010; Maier and Kallmeyer 2010) and English (using the PTB) (Evang and Kallmeyer 2011).

5.1. PLCFRS Parsing

The definition of a probabilistic LCFRS is a straightforward extension of the definition of PCFG (Kato et al. 2006; Levy 2005):

Definition 3 (PLCFRS)

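A probabilistic LCFRS (PLCFRS) 〈N,T,P,S,p〉 is such that 〈N,T,P,S〉 is an LCFRS and p: P → [0..1] a function such that for all A  ∈  N:

  • $\sum_{A(\vec{\alpha}) \rightarrow \vec{\Phi} \,\in\, P} p(A(\vec{\alpha}) \rightarrow \vec{\Phi}) = 1$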

The deduction rules for the probabilistic version of the CYK parser presented above are shown in Figure 8. The weight is the logarithm of the inside probability of a category and its span. (Kallmeyer and Maier 2010; Maier 2010) perform a weighted deductive parsing (Nederhof 2003), based on these deduction rules.

Figure 8.  Weighted CYK deduction system.

5.2. Grammar Extraction

(Maier and Søgaard 2008) propose a straightforward extraction of LCFRS rules from trees with crossing branches that occur for instance in the German NeGra and Tiger treebanks. The extraction algorithm roughly interprets the treebank trees as LCFRS derivation trees. Every internal node together with its daughters yields a rewriting rule. Consider for instance the tree in Figure 9. The S node has two daughters, a VMFIN node and a VP node. This yields a rule S → VP VMFIN. The VP is discontinuous with two components that wrap around the yield of the VMFIN. Consequently, the LCFRS rule is S(XYZ) → VP(X,Z) VMFIN(Y). In addition, one has to distinguish between occurrences of the same non-terminal with different fan-outs, for instance a VP without a gap (fan-out 1), a VP with a single gap (fan-out 2), and so on. In the corresponding LCFRS, one has different non-terminals $VP_1$, $VP_2$ and so on.
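The step from a tree node to a rule can be sketched as follows. This is a simplified illustration of the idea described above, not the published extraction algorithm: the function names, the input format (each daughter given as a label with the set of terminal positions it covers) and the encoding of variables as pairs (i, j) are assumptions, and the positions in the example are invented to mimic the S node with a VP wrapped around a VMFIN.

```python
def blocks(positions):
    """Split a set of terminal positions into maximal contiguous runs."""
    runs = []
    for p in sorted(positions):
        if runs and runs[-1][1] + 1 == p:
            runs[-1][1] = p
        else:
            runs.append([p, p])
    return [tuple(r) for r in runs]

def extract_rule(label, children):
    """children: list of (label, set of covered terminal positions).

    Returns (LHS label, LHS components, RHS labels); a variable (i, j) is
    component j of the i-th daughter, and labels carry their fan-out.
    """
    # Convention: order the daughters by the position of their first component.
    children = sorted(children, key=lambda c: min(c[1]))
    pieces = []
    for i, (_, pos) in enumerate(children):
        for j, (start, end) in enumerate(blocks(pos)):
            pieces.append((start, end, i, j))
    pieces.sort()
    lhs, prev_end = [], None
    for start, end, i, j in pieces:
        if prev_end is not None and prev_end + 1 == start:
            lhs[-1].append((i, j))      # adjacent to the previous block
        else:
            lhs.append([(i, j)])        # a gap starts a new LHS component
        prev_end = end
    rhs = [f"{lab}{len(blocks(pos))}" for lab, pos in children]
    return f"{label}{len(lhs)}", tuple(tuple(c) for c in lhs), rhs

# A VP covering positions 0 and 2 wraps around a VMFIN at position 1.
print(extract_rule("S", [("VMFIN", {1}), ("VP", {0, 2})]))
# ('S1', (((0, 0), (1, 0), (0, 1)),), ['VP2', 'VMFIN1'])
```

Read with X = (0, 0), Y = (1, 0) and Z = (0, 1), the result corresponds to S(XYZ) → VP(X,Z) VMFIN(Y), with the fan-out attached to each label.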

Figure 9.  A sample tree from NeGra.

The probabilities are then computed based on the frequencies of rules in the treebank, using a Maximum Likelihood estimator (MLE). Such an estimation has been used before (Kato et al. 2006; Levy 2005).
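A relative-frequency estimate of this kind can be sketched in a few lines. The rule counts below are invented for illustration, and representing a rule as a pair of its left-hand side non-terminal and a printable string is an assumption of this sketch.

```python
from collections import Counter

def mle(rule_counts):
    """Relative frequency: count(rule) / count(all rules with the same LHS)."""
    lhs_totals = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in rule_counts.items()}

# Hypothetical treebank counts; a rule is (LHS non-terminal, rule as text).
counts = {
    ("S1", "S1(XYZ) -> VP2(X,Z) VMFIN1(Y)"): 3,
    ("S1", "S1(XY) -> NP1(X) VP1(Y)"): 1,
    ("VP2", "VP2(X,Y) -> NP1(X) VVINF1(Y)"): 2,
}
probs = mle(counts)
print(probs[("S1", "S1(XYZ) -> VP2(X,Z) VMFIN1(Y)")])   # 0.75
```

By construction, the probabilities of all rules with the same left-hand side non-terminal sum to 1, as required by Definition 3.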

For a treebank such as the PTB that annotates long-distance dependencies using traces, a transformation into a crossing branches format is needed first (Evang and Kallmeyer 2011) in order to apply the (Maier and Søgaard 2008) extraction algorithm.

5.3. Binarization

Similarly to the transformation of a CFG into Chomsky Normal Form (CNF), an LCFRS can be binarized, i.e., transformed into an LCFRS of rank 2. As in the CFG case, in the transformation, we introduce a non-terminal for each right-hand side longer than 2 and split the rule into two rules, using this new intermediate non-terminal. This is repeated until all right-hand sides are of length ≤2 (Gómez-Rodríguez et al. 2009; Kallmeyer 2010b). A rule S(XYZUVW) →  A(X,U) B(Y,V) C(Z,W) for instance, when binarizing such that a new non-terminal C1 for B and C is introduced, is replaced with two rules S(XPUQ) →  A(X,U) C1(P,Q) and C1(YZ,VW) →  B(Y,V) C(Z,W).
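The binarization step on the example rule can be replayed with the following sketch. It assumes a hypothetical encoding in which a rule is a triple of left-hand side label, left-hand side components and right-hand side labels, with variables written as pairs (i, j); the function names and the naming scheme for new non-terminals (C1, C2, …) are chosen for this illustration. The strategy shown always keeps the first right-hand side element and groups the rest, which is only one of the possible binarization orders.

```python
def split_rule(lhs, args, rhs, new_label):
    """One binarization step: keep rhs[0], move rhs[1:] into new_label."""
    top_args, new_args = [], []
    for comp in args:
        top_comp, run = [], None
        for i, j in comp:
            if i == 0:
                top_comp.append((0, j))
                run = None
            else:
                if run is None:          # start a new component of new_label
                    run = []
                    new_args.append(run)
                    top_comp.append((1, len(new_args) - 1))
                run.append((i - 1, j))   # re-index relative to rhs[1:]
        top_args.append(tuple(top_comp))
    top = (lhs, tuple(top_args), (rhs[0], new_label))
    rest = (new_label, tuple(tuple(c) for c in new_args), tuple(rhs[1:]))
    return top, rest

def binarize(rule):
    """Repeat split_rule until the remaining rule has rank <= 2."""
    lhs, args, rhs = rule
    result, n = [], 0
    while len(rhs) > 2:
        n += 1
        top, (lhs, args, rhs) = split_rule(lhs, args, rhs, f"C{n}")
        result.append(top)
    result.append((lhs, args, tuple(rhs)))
    return result

# S(XYZUVW) -> A(X,U) B(Y,V) C(Z,W)
rule = ("S", (((0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)),), ("A", "B", "C"))
for r in binarize(rule):
    print(r)
# ('S', (((0, 0), (1, 0), (0, 1), (1, 1)),), ('A', 'C1'))
# ('C1', (((0, 0), (1, 0)), ((0, 1), (1, 1))), ('B', 'C'))
```

The two printed rules correspond to S(XPUQ) → A(X,U) C1(P,Q) and C1(YZ,VW) → B(Y,V) C(Z,W).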

A crucial difference between CFG and LCFRS is that in the latter the right-hand sides of rules are in principle not ordered. As a convention, we order them according to the linear precedence of the first components. However, any other order is possible and furthermore, during binarization, any partition of the right-hand side can lead to a new intermediate rule. This leaves room for different binarization strategies. (Gómez-Rodríguez et al. 2009) treat the aspect of finding a binarization that yields a minimal fan-out while (Gómez-Rodríguez and Satta 2009; Sagot and Satta 2010) treat the special case of binarizing a 2-LCFRS, if possible without increasing its fan-out. Besides these proposals that aim at reducing the number of variables per rule as far as possible, one can also choose a head-outward binarization along the lines of (Klein and Manning 2003). In this binarization strategy, the head is the deepest embedded element. Starting from the head, we add first the sisters to the right and then the ones to the left. Note that here again, in principle any order is possible for adding the non-head right-hand side elements. (Crescenzi et al. 2011) show that the problem of finding the optimal choice in terms of minimizing the fan-out is NP-hard.

The binarization sketched above introduces unique new non-terminals for every rule that needs to be binarized. This produces a large number of non-terminals and fails to capture certain generalizations. Therefore, (Maier and Kallmeyer 2010) apply Markovization (Collins 1999) to LCFRS binarization: They introduce only a single new non-terminal for the new binarizing rules and add vertical and horizontal context from the original trees to each occurrence of this new non-terminal. As vertical context, the first v labels on the path from the root node of the tree that gets binarized to the root of the entire treebank tree are used. As horizontal context, during binarization of a rule $A(\vec{\alpha}) \rightarrow A_0(\vec{\alpha_0}) \cdots A_m(\vec{\alpha_m})$, for the new non-terminal that comprises the right-hand side elements $A_i \ldots A_m$ (for some 1 ≤ i ≤ m), we add the first h elements of $A_i, A_{i-1}, \ldots, A_0$. Figure 10 gives an example (the superscript is the vertical context and the subscript the horizontal context of the new non-terminal X).

Figure 10.  Sample Markovization with v = 1,h = 2.

For parsing German, after experimenting with different binarization orders, different Markovizations and a small amount of category splitting, Kallmeyer and Maier (2010) report a labeled F-score of 77 with the EVALB evaluation metric on the NeGra treebank. This is comparable to PCFG results on the same data. Evang and Kallmeyer (2011) use PLCFRS parsing for English, experimenting with the PTB, and obtain a labeled F-score of 79, which is slightly better than a PCFG parser on the same data after removal of traces, using the same Markovization setting and category splits.

6. LCFRS and Dependency Grammar

6.1. LCFRS and Non-projective Dependencies

Kuhlmann (2010) has investigated the relations between dependency languages and lexicalized grammars. In this context, he has shown that the dependency languages induced by lexicalized k-LCFRSs are exactly the regular dependency languages over dependency structures with block-degree at most k, i.e., with at most k non-adjacent blocks in the yield of a lexical item. See (Kuhlmann 2010) for more details. This interesting result opens a new perspective on grammars describing non-projective dependencies.
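
To make the notion of block-degree concrete, the following minimal sketch computes it for a dependency tree given as a list of head indices. The encoding (`heads[i]` is the position of the head of word i, with -1 for the root) and the function names are assumptions made for this illustration, as is the dependency analysis of the Dutch cross-serial example used in the usage lines.

```python
def yields(heads):
    """Return, for every node, the set of positions it dominates (itself included)."""
    result = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        j = heads[i]
        while j != -1:  # walk up to the root, adding i to every ancestor
            result[j].add(i)
            j = heads[j]
    return result


def blocks(positions):
    """Split a set of positions into maximal contiguous intervals."""
    out = []
    for p in sorted(positions):
        if out and out[-1][1] == p - 1:
            out[-1][1] = p  # extend the current block
        else:
            out.append([p, p])  # start a new block
    return [tuple(b) for b in out]


def block_degree(heads):
    """Maximal number of blocks in the yield of any node."""
    return max(len(blocks(y)) for y in yields(heads))


# Assumed analysis of: Jan Piet de kinderen zag helpen zwemmen
#                      0   1    2  3        4   5      6
heads = [4, 5, 3, 6, -1, 4, 5]
print(blocks(yields(heads)[5]))  # yield of "helpen": [(1, 3), (5, 6)]
print(block_degree(heads))       # 2
```

Under this analysis the yields of helpen and zwemmen each consist of two blocks, so the structure has block-degree 2 and falls within what a lexicalized 2-LCFRS can induce.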

In order to illustrate the intuition behind this result, consider the lexicalized LCFRS rules in Figure 11 that describe our initial example (1) of cross-serial dependencies, repeated here as (4). If the non-terminals are understood as dependencies that link the left-hand side lexical head to the lexical head of the respective righthand side non-terminal, we obtain the desired non-projective dependency structure.

Figure 11.  Lexicalized LCFRS rules for cross-serial dependencies.

(4) ... Jan Piet de  kinderen zag helpen zwemmen
    ... Jan Piet the children saw help   swim
    ‘... Jan saw Piet help the children swim’

6.2. Data-Driven LCFRS Dependency Parsing

Kuhlmann and Satta (2009) propose to exploit the ideas from (Kuhlmann 2010) for data-driven dependency parsing. They show that dependency treebanks allow for the extraction of LCFRSs in a rather straightforward way. As mentioned above, the advantage of using an LCFRS (over a CFG) is that non-projective dependencies can be described. As an example, consider the dependency tree and corresponding LCFRS rules in Figure 12 (from the dependency version of NeGra).

Figure 12.  LCFRS rules extracted from a dependency tree.
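
The general idea of reading LCFRS rules off a dependency tree can be sketched as follows. This is a simplified illustration and not the extraction algorithm of Kuhlmann and Satta (2009): each node contributes one lexicalized rule whose lefthand side has one argument per block of the node's yield, and each argument lists the head word and the variables of the dependents' blocks in surface order. The function names, the variable naming scheme (`label` + position + `.` + block index), the category labels and the dependency analysis in the usage lines are all assumptions.

```python
def blocks(positions):
    """Split a set of positions into maximal contiguous intervals."""
    out = []
    for p in sorted(positions):
        if out and out[-1][1] == p - 1:
            out[-1][1] = p
        else:
            out.append([p, p])
    return [tuple(b) for b in out]


def extract_rules(words, labels, heads):
    """Return one rule (lhs_label, lhs_arguments, rhs_labels) per word."""
    n = len(words)
    children = [[c for c in range(n) if heads[c] == u] for u in range(n)]
    yld = [{i} for i in range(n)]
    for i in range(n):
        j = heads[i]
        while j != -1:
            yld[j].add(i)
            j = heads[j]
    rules = []
    for u in range(n):
        owner = {u: words[u]}  # position -> terminal or variable name
        for c in children[u]:
            for k, (lo, hi) in enumerate(blocks(yld[c])):
                for p in range(lo, hi + 1):
                    owner[p] = f"{labels[c]}{c}.{k}"  # k-th block of child c
        args = []
        for lo, hi in blocks(yld[u]):  # one argument per block of u
            arg = []
            for p in range(lo, hi + 1):
                if not arg or arg[-1] != owner[p]:
                    arg.append(owner[p])
            args.append(tuple(arg))
        rules.append((labels[u], tuple(args), tuple(labels[c] for c in children[u])))
    return rules


words = ["Jan", "Piet", "de", "kinderen", "zag", "helpen", "zwemmen"]
labels = ["N", "N", "D", "N", "V", "V", "V"]  # hypothetical categories
heads = [4, 5, 3, 6, -1, 4, 5]                # assumed analysis
print(extract_rules(words, labels, heads)[5])
# ('V', (('N1.0', 'V6.0'), ('helpen', 'V6.1')), ('N', 'V'))
```

The rule for helpen has two lefthand side arguments, i.e., fan-out 2, because its yield is interrupted by zag. A practical extraction would additionally encode the fan-out and the edge labels in the non-terminal names.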


Gildea (2010) and Maier and Lichte (2011) have used this extraction algorithm in order to investigate the LCFRSs arising from dependency treebanks. It turned out that the resulting LCFRSs are ill-nested, although only a very small percentage of the structures in the treebanks are actually ill-nested.
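
Whether a given dependency structure is ill-nested can be checked with a short sketch. It assumes the usual definition, under which a structure is ill-nested if two disjoint subtrees interleave, i.e., if there are positions a1 < b1 < a2 < b2 with a1, a2 in one subtree and b1, b2 in the other. The encoding of the tree as a list of head indices (-1 for the root) and the function names are assumptions made for this illustration; the quadratic pairwise check is chosen for readability, not efficiency.

```python
def yields(heads):
    """Return, for every node, the set of positions it dominates (itself included)."""
    result = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        j = heads[i]
        while j != -1:
            result[j].add(i)
            j = heads[j]
    return result


def is_well_nested(heads):
    """True if no two disjoint subtrees interleave."""
    ylds = yields(heads)
    n = len(heads)
    for u in range(n):
        for v in range(u + 1, n):
            if ylds[u] & ylds[v]:
                continue  # one node dominates the other
            # Tag every position with its subtree and read the tags left to right.
            tagged = sorted([(p, 0) for p in ylds[u]] + [(p, 1) for p in ylds[v]])
            runs = [t for i, (_, t) in enumerate(tagged)
                    if i == 0 or tagged[i - 1][1] != t]
            if len(runs) >= 4:  # pattern u v u v (or v u v u): interleaving
                return False
    return True


print(is_well_nested([4, 5, 3, 6, -1, 4, 5]))  # cross-serial example: True
print(is_well_nested([4, 4, 0, 1, -1]))        # subtrees {0, 2} and {1, 3}: False
```

The first call uses an assumed analysis of the Dutch cross-serial example, which is non-projective but well-nested; the second is a constructed minimal ill-nested structure.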

Maier and Kallmeyer (2010) have used a dependency version of the NeGra treebank and the Prague Dependency Treebank 2.0 for experiments with grammar-based dependency parsing using LCFRS. The accuracy is more than 20 percentage points lower than that of state-of-the-art dependency parsers. However, this is probably to a large extent due to the lack of meaningful non-terminals in the extracted LCFRS. There might therefore be room for improvement if state-of-the-art splitting techniques are applied.

7. Conclusion

This paper has introduced LCFRS, a formalism that is interesting for natural language processing because of the complexity class that it represents and because of its capacity to describe discontinuous constituents and non-projective dependencies. The latter has recently led to an increased interest in using LCFRS for data-driven parsing.

Short Biography

Laura Kallmeyer is a professor of Computational Linguistics at the University of Düsseldorf. She received her PhD in Computational Linguistics from the University of Tübingen with a thesis on model-theoretic definitions of Tree Adjoining Grammars (TAG). Afterwards, she spent 2 months as a postdoctoral researcher at the University of Pennsylvania and 5 years at the University Paris 7. During this time her research focussed on semantics in TAG. After that, she directed an Emmy Noether Research group at the University of Tübingen, concentrating on tree-based grammars for free word order languages and on approaches to parsing using Linear Context-Free Rewriting Systems.

Works Cited

  • Boullier, P. 1998. A Proposal for a Natural Language Processing Syntactic Backbone. Technical Report 3342, INRIA.
  • Boullier, P. 1999. On TAG parsing. TALN 99, 6e conférence annuelle sur le Traitement Automatique des Langues Naturelles, 75–84. Corse: Cargèse.
  • Boullier, P. 2000. Range concatenation grammars. Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT2000), 53–64. Italy: Trento.
  • Bresnan, J., R. M. Kaplan, S. Peters, and A. Zaenen. 1982. Cross-serial dependencies in Dutch. Linguistic Inquiry 13(4). 613–35. Reprinted in (Savitch et al. 1987).
  • Burden, H., and P. Ljunglöf. 2005. Parsing linear context-free rewriting systems. IWPT’05, 9th International Workshop on Parsing Technologies, 11–17. Canada: Vancouver.
  • Chen-Main, J., and A. Joshi. forthcoming. A dependency perspective on the adequacy of tree local multi-component tree adjoining grammar. Journal of Logic and Computation.
  • Chiang, D. 2004. Mildly context-sensitive grammars for estimating maximum entropy parsing models. Proceedings of FGVienna: The 8th conference on Formal Grammar, ed. by G. Jaeger, P. Monachesi and G. Penn. Vienna: CSLI.
  • Collins, M. 1999. Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania, PA, USA.
  • Crescenzi, P., D. Gildea, A. Marino, G. Rossi, and G. Satta. 2011. Optimal head-driven parsing complexity for linear context-free rewriting systems. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ed. by Y. Matsumoto and R. Mihalcea, 450–9. Oregon: Portland.
  • Evang, K. 2011. Parsing discontinuous constituents in English. Master's thesis, Eberhard Karls Universität Tübingen, Tübingen, Germany.
  • Evang, K., and L. Kallmeyer. 2011. PLCFRS parsing of English discontinuous constituents. Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), ed. by H. Bunt, J. Nivre and Ö. Çetinoǧlu, 104–16. Dublin, Ireland.
  • Gildea, D. 2010. Optimal parsing strategies for linear context-free rewriting systems. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ed. by R. Kaplan, J. Burstein, M. Harper and G. Penn, 769–76. Los Angeles, California: Association for Computational Linguistics.
  • Gómez-Rodríguez, C., and G. Satta. 2009. An optimal-time binarization algorithm for linear context-free rewriting systems with fan-out two. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ed. by K.-Y. Su, J. Su, J. Wiebe and H. Li, 985–93. Suntec, Singapore: Association for Computational Linguistics.
  • Gómez-Rodríguez, C., M. Kuhlmann, G. Satta, and D. Weir. 2009. Optimal reduction of rule length in linear context-free rewriting systems. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ed. by M. Ostendorf, M. Collins, S. Narayanan, D. W. Oard and L. Vanderwende, 539–47. Boulder, Colorado: Association for Computational Linguistics.
  • Gómez-Rodríguez, C., M. Kuhlmann, and G. Satta. 2010. Efficient parsing of well-nested linear context-free rewriting systems. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ed. by R. Kaplan, J. Burstein, M. Harper and G. Penn, 276–84. Los Angeles, California: Association for Computational Linguistics.
  • Huybregts, R. 1984. The weak inadequacy of context-free phrase structure grammars. Van Periferie naar Kern, ed. by G. de Haan, M. Trommelen and W. Zonneveld, 81–99. Dordrecht, Holland: Foris.
  • Joshi, A. K. 1985. Tree adjoining grammars: how much context-sensitivity is required to provide reasonable structural descriptions? Natural Language Parsing, ed. by D. Dowty, L. Karttunen and A. Zwicky, 206–50. Cambridge, UK: Cambridge University Press.
  • Joshi, A. K., and Y. Schabes. 1997. Tree-adjoining grammars. Handbook of Formal Languages, ed. by G. Rozenberg and A. Salomaa, 69–123. Berlin: Springer.
  • Joshi, A. K., L. S. Levy, and M. Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences 10. 136–63.
  • Kaji, Y., R. Nakanishi, H. Seki, and T. Kasami. 1992. The universal recognition problems for multiple context-free grammars and for linear context-free rewriting systems. IEICE Transactions on Information and Systems E75–D(1). 78–88.
  • Kallmeyer, L. 2010a. On mildly context-sensitive non-linear rewriting. Research on Language and Computation 8(4). 341–63.
  • Kallmeyer, L. 2010b. Parsing beyond context-free grammars. Heidelberg: Springer.
  • Kallmeyer, L., and W. Maier. 2010. Data-driven parsing with probabilistic linear context-free rewriting systems. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), ed. by C.-R. Huang and D. Jurafsky, 537–45. Beijing, China: Coling 2010 Organizing Committee.
  • Kallmeyer, L., W. Maier, and G. Satta. 2009. Synchronous rewriting in tree banks. Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09), ed. by É. Villemonte de la Clergerie, H. Bunt and L. Danlos, 69–72. Paris: Association for Computational Linguistics.
  • Kanazawa, M. 2009. The pumping lemma for well-nested multiple context-free languages. DLT 2009, Vol. 5583 of LNCS, ed. by V. Diekert and D. Nowotka, 312–25. Berlin Heidelberg: Springer.
  • Kato, Y., H. Seki, and T. Kasami. 2006. Stochastic multiple context-free grammar for RNA pseudoknot modeling. Proceedings of The Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+8), ed. by T. Becker and L. Kallmeyer, 57–64. Sydney, Australia: Association for Computational Linguistics.
  • Klein, D., and C. D. Manning. 2003. Accurate unlexicalized parsing. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 423–30. Sapporo, Japan: Association for Computational Linguistics.
  • Kracht, M. 2003. The mathematics of language, number 63 in studies in generative grammar. Berlin: Mouton de Gruyter.
  • Kuhlmann, M. 2010. Dependency structures and lexicalized grammars, Vol. 6270 of LNCS. Berlin, Heidelberg, New York: Springer.
  • Kuhlmann, M., and G. Satta. 2009. Treebank grammar techniques for non-projective dependency parsing. Twelfth Conference of the European Chapter of the Association for Computational Linguistics, ed. by L. Lascarides, C. Gardent and J. Nivre, 478–86. Athens, Greece: Association for Computational Linguistics.
  • Levy, R. 2005. Probabilistic models of word order and syntactic discontinuity. PhD thesis, Stanford University, Stanford, CA, USA.
  • Ljunglöf, P. 2004. Expressivity and complexity of the grammatical framework. PhD thesis, Department of Computer Science, Gothenburg University and Chalmers University of Technology, Gothenburg, Sweden.
  • Maier, W. 2010. Direct parsing of discontinuous constituents in German. Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, ed. by D. Seddah, S. Kübler and R. Tsarfaty, 58–66. Los Angeles, CA, USA: Association for Computational Linguistics.
  • Maier, W., and L. Kallmeyer. 2010. Discontinuity and non-projectivity: using mildly context-sensitive formalisms for data-driven parsing. Proceedings of the Tenth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10), ed. by S. Bangalore, R. Frank and M. Romero, 119–26. New Haven: Linguistics Department, Yale University.
  • Maier, W., and T. Lichte. 2011. Characterizing discontinuity in constituent treebanks. Formal Grammar: 14th International Conference, FG 2009, Bordeaux, France, July 25-26, 2009, Revised Selected Papers, Vol. 5591 of Lecture Notes in Artificial Intelligence, ed. by P. de Groote, M. Egg and L. Kallmeyer, 167–82. Berlin/Heidelberg/New York: Springer-Verlag.
  • Maier, W., and A. Søgaard. 2008. Treebanks and mild context-sensitivity. Proceedings of FG 2008: The 13th conference on Formal Grammar, ed. by P. de Groote, 61–76. Hamburg, Germany: CSLI Publications.
  • Marcus, M., G. Kim, M. A. Marcinkiewicz, R. MacIntyre, A. Bies, M. Ferguson, K. Katz, and B. Schasberger. 1994. The Penn Treebank: annotating predicate argument structure. HLT ’94: Proceedings of the Workshop on Human Language Technology, 114–9. Morristown, NJ, USA: Association for Computational Linguistics.
  • Michaelis, J. 2001a. Derivational minimalism is mildly context-sensitive. Logical Aspects of Computational Linguistics, Vol. 2014 of LNCS/LNAI, ed. by M. Moortgat, 179–98. Berlin, Heidelberg: Springer.
  • Michaelis, J. 2001b. On formal properties of minimalist grammars. PhD thesis, Potsdam University, Potsdam, Germany.
  • Michaelis, J. 2001c. Transforming linear context-free rewriting systems into minimalist grammars. Logical Aspects of Computational Linguistics, Vol. 2099 of LNCS/LNAI, ed. by P. de Groote, G. Morrill and C. Retoré, 228–44. Berlin, Heidelberg: Springer.
  • Michaelis, J., and M. Kracht. 1996. Semilinearity as a syntactic invariant. Logical Aspects of Computational Linguistics, Vol. 1328 of LNCS/LNAI, ed. by C. Retoré, 329–45. Berlin, Heidelberg: Springer.
  • Mönnich, U. 2010. Well-nested tree languages and attributed tree transducers. Proceedings of the Tenth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10), ed. by S. Bangalore, R. Frank and M. Romero, 35–43. New Haven: Linguistics Department, Yale University.
  • Nederhof, M.-J. 2003. Weighted deductive parsing and Knuth's algorithm. Computational Linguistics 29(1). 135–43.
  • Parmentier, Y., L. Kallmeyer, W. Maier, T. Lichte, and J. Dellert. 2008. TuLiPA: a syntax-semantics parsing environment for mildly context-sensitive formalisms. Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9), 121–8. Tübingen.
  • Pereira, F. C. N., and D. Warren. 1983. Parsing as deduction. 21st Annual Meeting of the Association for Computational Linguistics, 137–44. Cambridge, MA: Association for Computational Linguistics.
  • Radzinski, D. 1991. Chinese number-names, tree adjoining languages, and mild context-sensitivity. Computational Linguistics 17. 277–99.
  • Sagot, B., and G. Satta. 2010. Optimal rank reduction for linear context-free rewriting systems with fan-out two. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ed. by J. Hajič, S. Carberry, S. Clark and J. Nivre, 525–33. Uppsala, Sweden: Association for Computational Linguistics.
  • Satta, G. 1992. Recognition of linear context-free rewriting systems. Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE, USA: Association for Computational Linguistics.
  • Savitch, W. J., E. Bach, W. Marsh, and G. Safran-Naveh. (eds.) 1987. The formal complexity of natural language. Studies in Linguistics and Philosophy. Dordrecht, Holland: Reidel.
  • Seki, H., T. Matsumura, M. Fujii, and T. Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science 88(2). 191–229.
  • Shieber, S. M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8. 333–43. Reprinted in (Savitch et al. 1987).
  • Shieber, S. M., Y. Schabes, and F. C. N. Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming 24(1 and 2). 3–36.
  • Sikkel, K. 1997. Parsing schemata, texts in theoretical computer science. Berlin, Heidelberg, New York: Springer-Verlag.
  • Skut, W., B. Krenn, T. Brants, and H. Uszkoreit. 1997. An annotation scheme for free word order languages. Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP), 88–95. Washington, D.C: Association for Computational Linguistics.
  • Stabler, E. P. 1997. Derivational minimalism. Logical Aspects of Computational Linguistics, number 1328 in Lecture Notes in Computer Science, ed. by C. Retoré, 68–95. NY: Springer-Verlag.
  • Vijay-Shanker, K., D. J. Weir, and A. K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics. Stanford: Association for Computational Linguistics.
  • Villemonte de La Clergerie, E. 2002. Parsing mildly context-sensitive languages with thread automata. COLING 2002: The 19th International Conference on Computational Linguistics.
  • Weir, D. J. 1988. Characterizing mildly context-sensitive grammar formalisms. PhD thesis, University of Pennsylvania, Philadelphia, USA.