Abstract
- Top of page
- Abstract
- 1. Introduction
- 2. Linear Context-Free Rewriting Systems
- 3. Mild Context-Sensitivity
- 4. Parsing
- 5. Data-Driven Parsing
- 6. LCFRS and Dependency Grammar
- 7. Conclusion
- Short Biography
- Works Cited
This paper introduces Linear Context-Free Rewriting Systems, a mildly context-sensitive grammar formalism that displays a range of interesting formal properties and that has attracted a lot of interest in the context of natural language parsing recently because of its capacity to describe discontinuous constituents and non-projective dependencies. Besides a presentation of the formalism, the paper includes a discussion of recent LCFRS applications in parsing and dependency grammar.
1. Introduction
Linear Context-Free Rewriting Systems (LCFRSs) (Vijay-Shanker et al. 1987) extend context-free grammars (CFGs) in a natural way such that non-terminals can span tuples of strings that need not be adjacent. This is an attractive property for modelling natural languages, both for discontinuous constituents and for non-projective dependencies. This property, together with the fact that many CFG parsing techniques can be extended to LCFRS, has recently led to increased interest in using LCFRSs for natural language processing.
There are several reasons why LCFRSs are interesting for computational linguistics. Firstly, LCFRS represents an important complexity class concerning natural languages. In the 1980s it became clear that natural languages are not context-free. The question that arose was how far beyond CFG one has to go in order to capture natural languages. In this context, Joshi (1985) proposed the class of mildly context-sensitive languages as a class that contains natural languages while still being computationally tractable. The formalism LCFRS has always been linked to the notion of mild context-sensitivity, which refers to extensions of CFG that can be parsed in polynomial time and that generate languages of constant growth. LCFRS belongs to this class and, so far, we do not know of any other mildly context-sensitive formalism generating a larger set of languages. Therefore, LCFRS is tacitly assumed to characterize mild context-sensitivity.
Another reason why LCFRSs are interesting is that a range of grammar formalisms that have been proposed to model natural language phenomena have turned out to be equivalent to LCFRS. These are, among others, set-local multicomponent TAG (MCTAG) (Joshi 1985; Joshi et al. 1975; Weir 1988) and Minimalist Grammar (MG) (Stabler 1997). This supports the assumption that LCFRS represents an important language class in the context of natural language modelling.
A sample construction with discontinuous constituents that CFGs cannot deal with properly is the cross-serial dependency construction in Dutch, as in (1), and in Swiss German (Bresnan et al. 1982; Shieber 1985).
| (1) | ... dat | Jan | Piet | de | kinderen | zag | helpen | zwemmen |
|  | ... that | Jan | Piet | the | children | saw | help | swim |
|  | ‘... that Jan saw Piet help the children swim’ |
In (1), we have a sequence of three noun phrases followed by three verbs where the first NP is an argument of the first verb, the second an argument of the second verb and the third NP an argument of the third verb. Furthermore, the VP of the last verb is embedded under the VP of the second verb which is, in turn, embedded under the S constituent headed by the first verb. If we adopt trees with crossing branches, this gives the constituency structure in Figure 1. This tree cannot be a CFG derivation tree but we can give LCFRS rules that generate it (see Figure 5).
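The crossing pattern described for (1) can be made concrete with a small sketch. It pairs the i-th noun with the i-th verb and checks which of the resulting arcs cross. Representing each NP by its head noun (e.g., "kinderen" for "de kinderen") and the helper name `crosses` are assumptions made for illustration only.

```python
from itertools import combinations

# Tokens of example (1); positions are 0-based.
tokens = ["dat", "Jan", "Piet", "de", "kinderen", "zag", "helpen", "zwemmen"]
nouns = ["Jan", "Piet", "kinderen"]     # NP heads (illustrative simplification)
verbs = ["zag", "helpen", "zwemmen"]

# The i-th noun is an argument of the i-th verb.
arcs = [(tokens.index(n), tokens.index(v)) for n, v in zip(nouns, verbs)]

def crosses(a, b):
    """Two arcs cross if exactly one endpoint of one lies strictly inside the other."""
    (l1, r1), (l2, r2) = sorted([a, b])
    return l1 < l2 < r1 < r2

crossing_pairs = [(a, b) for a, b in combinations(arcs, 2) if crosses(a, b)]
print(arcs)
print(len(crossing_pairs), "crossing arc pairs")
```

Every pair of noun-verb arcs crosses here, which is the pattern that a tree without crossing branches cannot express.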
This paper aims at introducing LCFRS and describing its current applications in natural language processing. The structure of the paper is as follows: Section 2 introduces the formalism, Section 3 discusses its mild context-sensitivity, Section 4 gives a bottom-up chart parsing algorithm for LCFRS, Section 5 reports on data-driven constituency parsing using LCFRS and Section 6 sketches the relation between LCFRS and dependency grammar.
3. Mild Context-Sensitivity
A central problem in computational linguistics is the determination of the complexity of natural languages. In the 1980s, it became clear that CFGs were not powerful enough to describe all natural language phenomena (Bresnan et al. 1982; Huybregts 1984; Shieber 1985) and, as a consequence, the question of the appropriate context-sensitive formalism for natural languages arose. In an attempt to characterize the amount of context-sensitivity required, Joshi (1985) introduced the notion of mild context-sensitivity. Mild context-sensitivity characterizes formalisms via the class of languages that they generate. A formalism $F$ that generates the class $\mathcal{L}$ of languages is mildly context-sensitive if
- 1. $\mathcal{L}$ contains all context-free languages.
- 2. $\mathcal{L}$ contains languages describing a limited amount of cross-serial dependencies.
- 3. The languages in $\mathcal{L}$ are polynomially parsable, i.e., $\mathcal{L} \subseteq \mathrm{PTIME}$.
- 4. The languages in $\mathcal{L}$ have the constant growth property.
In this case, $\mathcal{L}$ is called a mildly context-sensitive class of languages.
The constant growth property roughly means that, if we order the words (i.e., sentences in the case of natural language) according to their length, then the length grows in a linear way. The language $\{a^{2^n} \mid n \geq 0\}$, for example, does not have the constant growth property. See Weir (1988) for a formal definition.
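The following sketch illustrates the intuition behind constant growth by comparing the gaps between consecutive word lengths in two languages. It is an informal illustration, not the formal definition; the comparison language $\{a^n b^n c^n \mid n \geq 1\}$ and the helper name `length_gaps` are chosen here for the example.

```python
def length_gaps(lengths):
    """Differences between consecutive distinct word lengths, in increasing order."""
    ordered = sorted(set(lengths))
    return [b - a for a, b in zip(ordered, ordered[1:])]

# Word lengths of {a^(2^n) | n >= 0}: 1, 2, 4, 8, ...
exponential = [2 ** n for n in range(8)]
# Word lengths of {a^n b^n c^n | n >= 1}: 3, 6, 9, ...
triple_count = [3 * n for n in range(1, 9)]

exp_gaps = length_gaps(exponential)       # gaps double at every step
triple_gaps = length_gaps(triple_count)   # gaps stay at 3

print("a^(2^n):     ", exp_gaps)
print("a^n b^n c^n: ", triple_gaps)
```

For the first language the gaps grow without bound, so no finite set of constants can bridge from one word length to the next; for the second, a single constant suffices.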
One of the weakest mildly context-sensitive formalisms is Tree Adjoining Grammar (TAG) (Joshi et al. 1975; Joshi and Schabes 1997). The languages generated by TAGs are contained in the set of 2-LCFRLs. This makes TAG rather efficient to process but sometimes too limited to describe all natural language phenomena in an adequate way.
So far, it has not been possible to identify a grammar formalism that generates the largest possible mildly context-sensitive class of languages. The closest approximation we know of is LCFRS. Its mild context-sensitivity has been shown by Vijay-Shanker et al. (1987).
Joshi's hypothesis that natural languages are mildly context-sensitive has been challenged on the basis of two natural language phenomena that seem to display non-constant growth when iterated, namely case stacking in Old Georgian (Michaelis and Kracht 1996) and Chinese number names (Radzinski 1991). The analyses of Old Georgian, however, are based on very little data since there are no speakers of Old Georgian today. It is therefore hard to tell whether an infinite progression of case stacking is really possible. Concerning Chinese number names, it is not entirely clear to what extent they constitute a syntactic phenomenon. Furthermore, the constant growth property (Joshi 1985; Weir 1988) is an existential condition since it only requires that there be some constructions where iteration under constant growth is possible. It does not say that all iterations must be of constant growth. Therefore, so far there is no reason to doubt that natural languages are mildly context-sensitive.
There has been a lot of discussion recently around the notion of mild context-sensitivity. In particular its tacit identification with LCFRS has been questioned.
Several authors have observed that the class of well-nested LCFRSs is important because of its formal properties, its equivalence to various other formalisms and because of the fact that its expressive capacity might be enough to account for natural languages (Gómez-Rodríguez et al. 2010; Kanazawa 2009; Mönnich 2010). If we assume that mild context-sensitivity is a term that specifies the complexity class necessary for natural languages, this seems to call for a revision of the notion of mild context-sensitivity towards well-nested LCFRS.
Another perspective points in the other direction, namely beyond LCFRS. There are languages that are polynomial and of constant growth without being LCFRLs. This seems to call for the search for other mildly context-sensitive formalisms that extend LCFRS (Kallmeyer 2010a).
4. Parsing
As already mentioned, we can assume without loss of generality that our grammars are ɛ-free and monotone. We further assume that the left-hand sides of the rules contain either only variables or only terminal symbols. Finally, we assume that the grammars are binarized, which means that their rules are of rank ≤ 2. This can be assumed since every LCFRS can be transformed into an LCFRS of rank 2 (Gildea 2010; Gómez-Rodríguez et al. 2009; Kallmeyer 2010b). We will discuss different binarization strategies in Section 5.3. We furthermore assume unary rules (rank 1) to be such that the left-hand side fan-out equals the right-hand side fan-out.
Our items are interpreted with respect to the input string w. They have the form $[A, \langle\langle l_1, r_1\rangle, \ldots, \langle l_{\dim(A)}, r_{\dim(A)}\rangle\rangle]$ and represent a tuple in the yield of a non-terminal A where the index pairs give the start and end positions of the components of the tuple. The first item in the table in Figure 7 tells us that the part between positions 0 and 1 (i.e., the first symbol) of the input is in the yield of $T_a$.
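One possible encoding of such items is sketched below: a non-terminal name paired with a tuple of (start, end) index pairs. The class name `Item`, the helper `yield_of` and the sample input string are assumptions for illustration; they do not come from the paper.

```python
from typing import NamedTuple, Tuple

Span = Tuple[int, int]  # (start, end) positions in the input, end exclusive

class Item(NamedTuple):
    """A chart item [A, <<l1, r1>, ..., <lk, rk>>]."""
    nonterminal: str
    spans: Tuple[Span, ...]

def yield_of(item: Item, w: str) -> Tuple[str, ...]:
    """The tuple of substrings of w that the item's index pairs pick out."""
    return tuple(w[l:r] for l, r in item.spans)

w = "aabb"                                    # hypothetical input string
first = Item("Ta", ((0, 1),))                 # first symbol is in the yield of Ta
discontinuous = Item("A", ((0, 1), (2, 3)))   # two non-adjacent components

print(yield_of(first, w))
print(yield_of(discontinuous, w))
```

Because the items are plain tuples, they are hashable and can be stored directly in a set that serves as the chart.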
We will explain the details of the algorithm while going through the example in Figure 7. The items we find are stored in a table, a so-called chart.
There are two types of rules in this algorithm that serve to deduce new items. The scan rule is applied with respect to a terminating rule (i.e., a rule with an empty right-hand side and only terminal symbols in the left-hand side components). For a rule A(α1,…,αk) → ɛ, whenever we find substrings of the form α1,…,αk in the input (in this order), we conclude that these substrings can be the components of an A category and we add a corresponding item to our chart. This gives the first three items in Figure 7.
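The scan step can be sketched as follows. For a terminating rule whose left-hand side components are terminal strings, the function enumerates every way of finding these strings in the input in left-to-right order without overlap. The function name `scan`, the tuple encoding of items and the decision to allow directly adjacent components are assumptions of this sketch.

```python
def scan(nonterminal, components, w):
    """Items for a terminating rule nonterminal(components) -> epsilon over input w.

    components is a list of non-empty terminal strings (the grammar is epsilon-free).
    """
    items = []

    def extend(i, start, spans):
        if i == len(components):
            items.append((nonterminal, tuple(spans)))
            return
        alpha = components[i]
        # Try every position at or after `start` where alpha occurs.
        for l in range(start, len(w) - len(alpha) + 1):
            if w[l:l + len(alpha)] == alpha:
                extend(i + 1, l + len(alpha), spans + [(l, l + len(alpha))])

    extend(0, 0, [])
    return items

print(scan("Ta", ["a"], "aabb"))
print(scan("X", ["a", "b"], "aabb"))
```

With a single-component rule such as Ta(a) → ɛ, this yields one item per occurrence of the terminal in the input.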
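The rules unary and binary move bottom-up from the completed right-hand side of a rule to its left-hand side. A unary rule has the form A(X1,…,Xk) → B(X1,…,Xk). It is applied whenever we already have a B-item. From this, we can deduce an A-item with the same yield:
$$\frac{[B, \vec{\rho}]}{[A, \vec{\rho}]} \quad \text{where } A(X_1,\ldots,X_k) \to B(X_1,\ldots,X_k) \text{ is a rule of the grammar}$$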
For the application of a binary rule, we need the notion of instantiation of a rule with respect to a given input (Boullier 2000). A rule instantiation specifies the computation of a left-hand side yield element from elements in the yields of the right-hand side non-terminals based on the corresponding vectors of index pairs. An instantiated rule is a rule in which variables and arguments are consistently replaced by index pairs. This means that adjacent variables must be mapped onto index pairs whose concatenation is defined. The rule binary performs a bottom-up parsing step using a binary rule. Examples are given in Figure 7.
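A minimal sketch of rule instantiation for a binary rule is given below. Variables are bound to the index pairs of the two antecedent items, and each left-hand side argument is accepted only if the index pairs of its adjacent variables can be concatenated. The rule encoding, the function name `instantiate` and the sample rule A(X1 Y1, X2 Y2) → B(X1, X2) C(Y1, Y2) are assumptions of the sketch; it also assumes that every right-hand side variable occurs exactly once in the left-hand side.

```python
def instantiate(rule, b_item, c_item):
    """Return the left-hand side item of an instantiated binary rule, or None."""
    (lhs, lhs_args), (b, b_vars), (c, c_vars) = rule
    if b_item[0] != b or c_item[0] != c:
        return None
    # Consistently replace variables by index pairs.
    binding = dict(zip(b_vars, b_item[1]))
    binding.update(zip(c_vars, c_item[1]))
    spans = []
    for arg in lhs_args:
        left, right = binding[arg[0]]
        for var in arg[1:]:
            l, r = binding[var]
            if l != right:          # concatenation undefined: not adjacent
                return None
            right = r
        spans.append((left, right))
    # Components of the new item must be ordered and must not overlap.
    if any(r1 > l2 for (_, r1), (l2, _) in zip(spans, spans[1:])):
        return None
    return (lhs, tuple(spans))

rule = (("A", (("X1", "Y1"), ("X2", "Y2"))), ("B", ("X1", "X2")), ("C", ("Y1", "Y2")))
ok = instantiate(rule, ("B", ((0, 1), (2, 3))), ("C", ((1, 2), (3, 4))))
bad = instantiate(rule, ("B", ((0, 1), (2, 3))), ("C", ((2, 3), (3, 4))))
print(ok)
print(bad)
```

In the second call the index pair for Y1 does not start where X1 ends, so no instantiation exists and the sketch returns None.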
The goal item of our algorithm is [S,〈〈0,|w|〉〉] since this item represents an S category spanning the entire input. The last item in Figure 7 is a goal item.
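Putting the steps together, the sketch below runs scan, unary and binary over an agenda until no new items arise and then checks for the goal item. The toy grammar for $\{a^n b^m c^n d^m \mid n, m \geq 1\}$, the rule encoding and all function names are hypothetical and chosen only for illustration; this is not the grammar of Figure 7, and the agenda loop is one simple control strategy among several.

```python
# Hypothetical toy grammar: terminating rules Ta(a), Tb(b), Tc(c), Td(d) -> epsilon
# are handled directly in scan_items; the remaining rules are unary or binary.
UNARY = [("A", "A0"), ("B", "B0")]  # A(X1,X2) -> A0(X1,X2), B(X1,X2) -> B0(X1,X2)
BINARY = [
    (("A0", (("X",), ("Y",))), ("Ta", ("X",)), ("Tc", ("Y",))),
    (("B0", (("X",), ("Y",))), ("Tb", ("X",)), ("Td", ("Y",))),
    (("A", (("X1", "Y1"), ("X2", "Y2"))), ("A0", ("X1", "X2")), ("A", ("Y1", "Y2"))),
    (("B", (("X1", "Y1"), ("X2", "Y2"))), ("B0", ("X1", "X2")), ("B", ("Y1", "Y2"))),
    (("S", (("X1", "Y1", "X2", "Y2"),)), ("A", ("X1", "X2")), ("B", ("Y1", "Y2"))),
]

def scan_items(w):
    """One item [T_x, <<i, i+1>>] per input symbol x at position i."""
    return [("T" + ch, ((i, i + 1),)) for i, ch in enumerate(w)]

def binary(rule, left, right):
    """Bottom-up step with a binary rule; None if no instantiation exists."""
    (lhs, args), (b, b_vars), (c, c_vars) = rule
    if left[0] != b or right[0] != c:
        return None
    binding = dict(zip(b_vars, left[1]))
    binding.update(zip(c_vars, right[1]))
    spans = []
    for arg in args:
        l, r = binding[arg[0]]
        for var in arg[1:]:
            if binding[var][0] != r:   # adjacent variables must concatenate
                return None
            r = binding[var][1]
        spans.append((l, r))
    if any(r1 > l2 for (_, r1), (l2, _) in zip(spans, spans[1:])):
        return None
    return (lhs, tuple(spans))

def recognize(w, start="S"):
    chart, agenda = set(), scan_items(w)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        for lhs, rhs in UNARY:                     # unary: same yield, new label
            if item[0] == rhs:
                agenda.append((lhs, item[1]))
        for rule in BINARY:                        # binary: combine with chart items
            for other in list(chart):
                for pair in ((item, other), (other, item)):
                    new = binary(rule, *pair)
                    if new is not None:
                        agenda.append(new)
    return (start, ((0, len(w)),)) in chart        # goal item [S, <<0, |w|>>]

print(recognize("aabccd"), recognize("aabcd"))
```

The loop terminates because only finitely many items exist for a given input, and an item is processed at most once.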
This algorithm is polynomial in the length of the input string. More precisely, it has the following complexity: Let us assume that the fan-out of the binarized LCFRS is k. Then we have at most 2k range boundaries in each of the antecedent items of the rule binary. For variables X1, X2 in the same left-hand side argument of the applied rule such that X1 is immediately to the left of X2, the right boundary of X1 immediately gives us the left boundary of X2. In the worst case, A, B, C all have fan-out k and each left-hand side argument contains only two variables. This leads to 3k independent range boundaries and consequently to a time complexity of $\mathcal{O}(n^{3k})$, where $n = |w|$, for the entire algorithm. Note, however, that this is only the complexity in the length of the input string. The universal recognition problem (i.e., recognition in the size of the grammar and the input word) is NP-hard for LCFRS (Kaji et al. 1992; Satta 1992).
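The boundary count in this argument can be checked on concrete rules with a small sketch. An argument that concatenates m variables contributes m + 1 independent boundaries, since each inner boundary is shared by two neighbouring variables. The function name and the two sample rules are assumptions for illustration.

```python
def independent_boundaries(lhs_args):
    """Number of independent range boundaries in an instantiated rule.

    lhs_args lists, for each left-hand side argument, the variables it concatenates.
    """
    return sum(len(arg) + 1 for arg in lhs_args)

# Worst case for fan-out k = 2: A(X1 Y1, X2 Y2) -> B(X1, X2) C(Y1, Y2)
worst_case = independent_boundaries([["X1", "Y1"], ["X2", "Y2"]])
# A start rule of fan-out 1: S(X1 Y1 X2 Y2) -> A(X1, X2) B(Y1, Y2)
start_rule = independent_boundaries([["X1", "Y1", "X2", "Y2"]])

print(f"worst case: n^{worst_case}, start rule: n^{start_rule}")
```

For k = 2 the worst-case rule has 6 = 3k independent boundaries, matching the bound stated above, while the start rule needs only 5.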