Applying modelling experiences from the past to shape crop systems biology: the need to converge crop physiology and functional genomics


Author for correspondence:
Xinyou Yin
Tel: +31 317 482348
Fax: +31 317 485572


Functional genomics has been driven greatly by emerging experimental technologies. Its development as a scientific discipline will be enhanced by systems biology, which generates novel, quantitative hypotheses via modelling. However, in order to better assist crop improvement, the impact of developing functional genomics needs to be assessed at the crop level, given a projected diminishing effect of genetic alteration on phenotypes from the molecule to crop levels. This review illustrates a recently proposed research field, crop systems biology, which is located at the crossroads of crop physiology and functional genomics, and intends to promote communications between the two. Past experiences with modelling whole-crop physiology indicate that the layered structure of biological systems should be taken into account. Moreover, modelling not only plays a role in data synthesis and quantitative prediction, but certainly also in heuristics and system design. These roles of modelling can be applied to crop systems biology to enhance its contribution to our understanding of complex crop phenotypes and subsequently to crop improvement. The success of crop systems biology needs commitments from scientists along the entire knowledge chain of plant biology, from molecule or gene to crop and agro-ecosystem.


A major agricultural challenge facing the world today is providing sufficient food, feed, fibre and fuel to meet the demand by a growing population, while maintaining the sustainability of various agricultural ecosystems, without simultaneously increasing the pressure on land use and resources. Breeding programmes, combined with agronomic management, have been successful in achieving this goal over recent decades. However, the pace at which progress has been made in achieving higher crop yields has become stagnant for several major crops, compared with that required by the growing demands (Cassman, 1999). Therefore, proposals have been made for holistic approaches to correct the imbalance between the two paces.

Plant systems biology has been proposed as such a new approach (Minorsky, 2003). In analogy to ‘systems biology’ for other and general biological systems (Hood, 1998; Ideker et al., 2001; Kitano, 2002), ‘plant systems biology’ aims to synthesize complex datasets from so-called ‘wet’ experiments in various plant genomic hierarchies (genome, transcriptome, proteome, metabolome and cellome) into useful mathematical models. It seeks to explain biological functioning in terms of ‘how things work’ in (sub)cellular units of plants.

There is no doubt that such systems biology, as currently defined, is challenging in the domain of molecular sciences. Functional genomics aims to discover the function of genes, typically through high-throughput experimental studies combined with bioinformatics tools for data analysis, in addition to functional analysis using ‘loss of function’ mutant genotypes or using naturally occurring genetic variation. Systems biology will facilitate the development of functional genomics as a scientific discipline by generating novel hypotheses via quantitative modelling. The scientific community is currently collecting new and detailed information at a much faster rate than the rate with which it can get a grip on this new knowledge, and can learn to understand, integrate and use it, at least partly because it does not tap into the resources provided by modelling at higher hierarchical scales. Hammer et al. (2004) have argued that the current definition of plant systems biology not only largely overlooks the rich history of crop systems modelling, but is probably also not the best approach to solve the real-world problems related to crop improvement for increased production, the ultimate goal that plant systems biology (Minorsky, 2003) wants to achieve in an effective and fast manner. The sense of optimism surrounding systems biology today is similar to the situation during the heyday of crop physiology in the 1970–1980s, when physiologists expected to bring significant advances to breeding programmes in major crops (Fischer, 2007). Physiological research has actually supported breeding and management, but the impact has probably been slower in coming, and smaller in extent, than physiologists would have imagined. In few cases has the physiological approach been applied to develop crop cultivars (Sinclair et al., 2004) and it is likely that the proponents of systems biology today will also underestimate the difficulties of the challenge to raise crop yields.

In this paper we first discuss the complexity of crop phenotypes relevant to the real-world attempt to increase crop production. We then review past experiences of crop physiologists in modelling the whole crop. We also discuss why it is important to integrate modern functional genomics with traditional crop physiology in order to achieve a solution to the growing demand for increased crop production. To this end, we stress the significance of a recently proposed approach – crop systems biology.

Complexities of phenotypes at the crop level

Many crop traits related to agricultural production are quantitative and complex in nature. These traits affect, or are reflected in, crop yield in one way or another. Crop physiologists have distinguished potential yields from actual yields, with the former being the yield level in the absence of abiotic (water, nutrients, temperature) and biotic (pathogens, insects, weeds, parasites) stresses, and the latter being the yield in the presence of these stresses, because approaches to raise yield depend on whether these environmental limitations exist, which of them exist, and to what extent (Penning de Vries et al., 1989). There is considerable pressure to increase the yield potential of major crops because the yield in farmers’ fields is increasingly approaching the ceiling of existing cultivars (Cassman, 1999; Fischer, 2007). In the face of resource scarcity (e.g. water) and pressure to reduce environmental pollution arising from fertilizers and pesticides, improving the use efficiency of various resources has also been receiving attention in agriculture.

Phenotypes at the crop level, irrespective of yield per se or resource use efficiencies, are extremely complex, regulated by multiple interacting genes whose effects and expression may be highly dependent on environmental conditions and developmental stages. These phenotypes are achieved not only by molecular pathways but also through multiple intermediate component processes and orchestrated feedback mechanisms, by both intra- and interplant competition, and by interactions between stress factors. Because of this competition, a change of one component may result in an often unexpected, but negative, consequence on other components, and yield ha−1 of a crop (i.e. the community of plants) cannot be simply predicted from the yield of its plants grown in isolation. In the case of cereal yield, crop physiologists and agronomists have used the following simple equation to analyse limitations to yield formation:

Grain yield ha−1 = (plants ha−1 × ears per plant) × (grains per ear × proportion of filled grain) × single filled-grain weight  (Eqn 1)

where the two components in brackets are sometimes combined into ears ha−1 and filled grains per ear, respectively. Similar equations can be derived for other crops and traits. Because a series of interactions and feedbacks operate along a crop developmental cascade, the negative correlations among these yield components have been observed for a single genotype grown across a range of environmental conditions or management practices, or among multiple genotypes (e.g. individual lines of a segregating population) when grown in the same environment (Fig. 1). In other words, a significant increase of one component may not necessarily result in an increase of crop yield ha−1.
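The compensation among yield components can be made concrete with a small numerical sketch of Eqn 1. All component values below are hypothetical and chosen only for illustration; they are not taken from Yin et al. (2002) or from any other dataset.

```python
def grain_yield(plants_per_ha, ears_per_plant, grains_per_ear,
                filled_fraction, grain_weight_g):
    """Eqn 1: grain yield (kg ha-1) as the product of its components."""
    ears_per_ha = plants_per_ha * ears_per_plant
    filled_grains_per_ear = grains_per_ear * filled_fraction
    return ears_per_ha * filled_grains_per_ear * grain_weight_g / 1000.0  # g -> kg

# Hypothetical cereal-like values (assumptions, not measured data).
base = grain_yield(2.5e6, 2.0, 25.0, 0.90, 0.040)

# 20% more grains per ear, all other components unchanged.
no_compensation = grain_yield(2.5e6, 2.0, 30.0, 0.90, 0.040)

# Same gain in grain number, partly offset by poorer filling and lighter grains.
with_compensation = grain_yield(2.5e6, 2.0, 30.0, 0.85, 0.036)

print(base, no_compensation / base, with_compensation / base)
```

With these assumed numbers, a 20% gain in grains per ear raises yield by 20% only if the other components stay constant; modest compensation in grain filling and grain weight reduces the gain to about 2%.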

Figure 1.

The correlations of three yield components (ears ha−1, filled grains per ear, and filled-grain weight) among barley recombinant inbred lines (based on the data of Yin et al., 2002).

Recent genomic studies have often claimed that genes identified for one or a few of these yield components are highly significant for a new ‘green revolution’ to increase yield. However, the negative correlations among yield components have rarely received appropriate attention in these genomic studies.

For example, Ashikari et al. (2005) identified a QTL (quantitative trait locus) for grain number in rice (Oryza sativa L.), which was found to encode cytokinin oxidase/dehydrogenase (OsCKX2), an enzyme that degrades the phytohormone cytokinin. Reduced expression of OsCKX2 causes cytokinin accumulation in inflorescence meristems and increases the number of grains per ear. However, the impact of this gene on other yield components (e.g. single-grain weight) was not reported, so its impact on yield, even for isolated plants, remains unclear.

Song et al. (2007) reported the characterization of GW2, a new QTL that controls rice grain width and weight. GW2 was found to encode a RING (Really Interesting New Gene)-type protein with E3 ubiquitin ligase activity, which is known to function in protein degradation via the ubiquitin-proteasome pathway. Loss of GW2 function increased cell number, resulting in a wider spikelet hull, an accelerated grain filling rate and increased grain weight. Their data already showed that increased grain weight had a negative impact on the number of grains per ear, so the positive impact of GW2 on yield per single rice plant was marginal, considering the large standard error of this trait (cf. their Fig. 4), let alone its impact on yield per ground area if plants were grown in a community.

Figure 4.

Days from sowing to flowering for plants of three rice cultivars transferred from short-day (SD, 10 h d−1) to long-day conditions (LD, 12.5 h d−1 for indica cv. Carreon and 14 h d−1 for japonica cvs Nipponbare and Lao Lai Qing) (circles) or from LD to SD (squares) at various times after sowing (data of Yin et al., 1997). Plants remained under the new photoperiod condition once they were transferred. The standard errors of observations were predominantly smaller than symbols, and are therefore not shown in the figure. The first point of SD-to-LD transfer and LD-to-SD transfer series in each panel of the figure corresponds to the plants continuously grown in LD and SD, respectively. The lines for ‘Carreon’ were drawn according to a model to which the data of both sets of transfers were simultaneously fitted (R2 = 0.978). This model presumes the period from sowing to flowering is in three subphases, and the photoperiod-sensitive subphase is flanked by the pre- and postsensitive subphases (Yin et al., 1997). No lines were drawn for ‘Nipponbare’ and ‘Lao Lai Qing’, because their observed flowering dates for plants of earlier SD-to-LD transfers exhibited unexpected delays, which do not agree with this model. Among a total of 20 cultivars tested, seven showed this pattern of delay, whereas the flowering dates of the others could be predicted by the model. Of those seven cultivars with the delay, three (including ‘Nipponbare’) were tested in two seasons, and two (again including ‘Nipponbare’) were tested using different photoperiods for SD; and the delay patterns were shown to be consistent across seasons and across photoperiods used for the SD condition.

Another example is the study to identify a senescence-regulating NAC gene (underlying a previously mapped grain protein content QTL, known as Gpc-B1) and its effect on improving grain protein, zinc and iron concentration in wheat (Uauy et al., 2006) – important traits for improving human nutrition and health. The wild wheat allele encodes a NAC transcription factor (NAM-B1) that accelerates senescence and increases nutrient remobilization from leaves to developing grains, whereas modern wheat cultivars carry a nonfunctional NAM-B1 allele. While the study elucidated direct molecular evidence for the link between the regulation of senescence and nutrient remobilization, long recognized by crop physiologists (Sinclair & de Wit, 1975), no information was provided on possible pleiotropic effects of this gene on yield components other than grain weight; hence its implication for contributing to breeding for food with enhanced nutritional value remains unknown.

Similarly, for resource use efficiency, Karaba et al. (2007) reported that introduction of HARDY, an Arabidopsis drought and salt tolerance gene, into rice can improve its water-use efficiency (WUE). Their calculation of WUE was based on biomass production of single isolated rice plants, and hence it is unclear whether the gene can result in a real improvement of WUE of a plant stand in terms of grain yield on a ground area basis.

There is no doubt about the scientific value of these few genomic studies on yield or resource use efficiency-related traits in major food crops. However, if simple principles of crop physiology had been applied in these genomic studies, the efficiency with which they could be used to enhance our understanding of crop physiology and subsequently to improve crops would have been greatly enhanced. Obviously, the link between functional genomics and crop physiology is missing, and badly needs to be established.

Experience of crop physiology in modelling the whole crop

While the simple relationship outlined in Eqn 1 is very useful in identifying factors limiting yield formation, it is not sufficient to unravel the underlying physiology and feedback mechanisms of crop growth. To this end, de Wit (1965, 1978) introduced the ‘general systems theory’ of von Bertalanffy (1933, 1969) and simulation methodologies of Forrester (1961) into crop physiology, and initiated research for modelling the whole crop (see Fig. 2 for an example of such a model). The emergence of the whole-crop physiology modelling in the late 1960s was analogous to the initiative to develop systems biology today, because of the need for instruments that could summarize increasing quantities of experimental data. The main difference between them lies in the scale of organizational hierarchy: data used for the whole-crop physiology modelling were largely from crop fields, whereas those of systems biology today are from ‘wet’ laboratories. In view of this and modelling efforts in other disciplines such as ecology, the concept ‘systems biology’ is not entirely new; its appearance merely exemplifies the long-held propositions that biological systems are layered and that simulation of biological systems is a broad field.

Figure 2.

A conceptual whole-crop model that captures the conservation and balance of energy, water, carbon (C) and nitrogen (N) assimilations. The scheme is represented using the standard symbols of Forrester (1961) (i.e. shaded boxes are state variables, valves are rate variables, ellipses are intermediate variables, small crossed circles are environmental inputs, solid lines are flows of material, and dotted lines are flows of information). A whole-crop physiology model captures a number of underlying processes (de Wit, 1978; Yin et al., 2004). Each process is quantified in relation to other processes and to environmental factors. For model computation, the rate of change of processes is assumed constant during a short time-step (usually 1 d). Crop growth rate at a time-step is computed, depending on the actual crop status and current environmental conditions. The biomass formed in a time-step equals the product of the growth rate and the length of the time-step. This is added to the quantity of biomass already present. The growth rate is then recalculated for the next time-step. Calculating growth rates and updating the quantity of biomass are repeated in sequence until the entire growing season is predicted to end by the developmental submodel. The quantity of biomass is a state variable and the growth rate is its rate variable. Both tangible quantities (e.g. biomass weight, nitrogen content, leaf area) and abstract quantities (e.g. development stage) can be considered as state variables. Biomass weight of plant organs is distinguished as separate state variables; the partitioning of the newly produced assimilates at each time-step among growing organs then needs to be described, largely based on their sink strength. The key issue in model development is to describe rate variables, for which one equation may not be enough and many intermediate variables are to be introduced. Environmental input variables of a crop model include climatic factors (e.g. radiation, temperature) and edaphic variables (water and nutrient availabilities, which are influenced by management practices). Genetic coefficients are another type of model input but these are not shown in the diagram. This figure is reproduced from Yin & van Laar (2005) for the generic model GECROS.
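The computation cycle described in this caption (compute rates from the current state and environment, hold them constant over one time-step, update the states, repeat until the developmental submodel ends the season) can be sketched as a simple Euler integration. The sketch below is not GECROS: it swaps in a deliberately simplified radiation-use-efficiency growth rate, and every parameter name and default value is a hypothetical assumption made for illustration.

```python
import math

def simulate_crop(daily_par, rue=3.0, k_ext=0.6, sla=0.02,
                  leaf_fraction=0.4, dev_rate=0.01, w0=5.0, dt=1.0):
    """Euler integration of a toy crop model (all parameters hypothetical).

    daily_par: incoming PAR per day (MJ m-2 d-1), the environmental input.
    Returns total biomass (g m-2) at the end of each simulated time-step.
    """
    biomass = w0                       # state variable (g m-2)
    leaf_weight = w0 * leaf_fraction   # state variable (g m-2)
    stage = 0.0                        # abstract state: 0 = emergence, 1 = maturity
    history = []
    for par in daily_par:
        if stage >= 1.0:               # developmental submodel ends the season
            break
        lai = sla * leaf_weight                             # intermediate variable
        intercepted = par * (1.0 - math.exp(-k_ext * lai))  # intermediate variable
        growth_rate = rue * intercepted                     # rate variable (g m-2 d-1)
        # Rates are held constant within the step; states are then updated.
        biomass += growth_rate * dt
        leaf_weight += leaf_fraction * growth_rate * dt
        stage += dev_rate * dt
        history.append(biomass)
    return history

season = simulate_crop([10.0] * 200)
print(len(season), round(season[-1], 1))
```

Partitioning is reduced here to a fixed leaf fraction; in a fuller model it would depend on the sink strength of the growing organs, as the caption notes.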

Biological systems are hierarchically organized with layers: molecules, membranes, organelles, cells, tissues, organs, plants, crops, and so on. Functional genomics and systems biology emphasize the ‘wholeness’ below the cellular level, so biological systems are extended downwards to levels such as genome, transcriptome, proteome, metabolome and cellome. Each layer has its own language and concepts. For example, for the crop level, crop physiologists successfully used the concept ‘leaf area index’ (LAI) – the ratio of the total area of all green leaves to the area of the ground on which the crop grows – to indicate the size of the crop stand canopy. Passioura (1979) discussed properties of the layered systems and their implications for modelling –‘our understanding of a biological phenomenon is incomplete unless we can relate it to (or translate it into) phenomena in the adjoining levels of the organizational scale’. Specifically, any given phenomenon may be the centre of our attention (level n); to model it, we need to find its explanations at a lower level (n – 1); otherwise our knowledge on level n is merely descriptive or superficial. Modelling the phenomenon at level n should also achieve significance at a higher level (n + 1); otherwise our work on level n is likely to be called trivial and irrelevant.

The practice of the whole-crop physiology modelling during recent decades followed this principle of the layered system. The central attention of this practice is ‘crop growth and development’ (level n), whose ultimate goal (significance) is, among other things, to achieve an accurate prediction of crop productivity (level n + 1). To model crop growth and development mechanistically, it is necessary to seek the underlying mechanisms at a lower level (n – 1) such as canopy development, canopy photosynthesis, respiration, transpiration, nitrogen uptake, phenology (vernalization and photoperiodism), assimilate partitioning and remobilization (Fig. 2). Each of these underlying processes is described by differential equations for rate variables of the model; and very often one equation may not be enough, and many intermediate equations are introduced as needed. The same principle can be applied to particular underlying processes along the chain, for example photosynthesis of crop canopy – photosynthesis of leaf – reactions of photosynthesis in chloroplast – energy conversion in thylakoid (Table 1). Irrespective of the level of organization where the central attention is set, roles of mechanistic modelling have been recognized in synthesis, prediction, heuristics, and design.

Table 1.  Representation of biological phenomena at various levels of photosynthesis-related hierarchy, and the relations between them
Significance (n + 1) | CH2O production | Steady-state CO2 uptake | Steady-state or transient CO2 uptake | O2 evolution
Attention (n) | Canopy photosynthesis | Leaf photosynthesis | Chloroplast reactions | Light reactions
Explanation (n – 1) | Light distribution, light scattering, leaf photosynthesis | Leaf absorbance, CO2 diffusions, chloroplast reactions | Light reactions, dark reactions, alternative metabolisms | Light harvesting, excitation partitioning, electron transport, proton translocation, ATP synthesis


Bridging the gap between a large body of experimental data and knowledge and a potentially useful mathematical model is not trivial. We will highlight several successful examples of synthesizing empirical data into a physiological model, which will also be used in later parts of the paper. The principles of designing algorithms in these examples would also be instrumental to systems biology modelling.

A most compelling example is the development of the widely used model of Farquhar et al. (1980) for net photosynthesis (A) in C3 leaves as affected by chloroplast CO2 concentration (Cc), O2 concentration (O) and absorbed irradiance (I):

A = min(Ac, Aj)  (Eqn 2)
[equation image not recovered]  (Eqn 2a)
[equation image not recovered]  (Eqn 2b)

where S is the Rubisco specificity factor, KmC and KmO are the Michaelis-Menten coefficients of Rubisco for CO2 and O2, respectively, Rd is the day respiration (mitochondrial respiration that continues in the light), α is the linear electron transport efficiency under limiting light, and θ is a curvature factor. Building on the work of others, Eqn 2 considers that photosynthesis cannot go faster than a Rubisco carboxylation capacity (Vcmax)-limited rate (Ac), nor than an electron transport capacity (Jmax)-limited rate (Aj). Preceding this model, a great body of work had accumulated describing the responses of CO2 exchange by leaves under a wide range of conditions (e.g. CO2 concentration, O2 concentration, and irradiance). These responses were quite consistent but difficult to explain. The model provides a tool for making links between steady-state gas exchange and leaf biochemistry (von Caemmerer & Farquhar, 1981). Clearly, the model makes no attempt to describe all the processes of photosynthesis from light harvesting to metabolism; rather it is a synthesis, a summary of the knowledge of the contributing mechanisms that focuses on a small number of key processes.
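A minimal sketch of the minimum-of-two-rates structure of Eqn 2 follows, written with the symbols defined above. It uses one commonly quoted textbook form of the Farquhar et al. (1980) model; the exact forms of Eqns 2a and 2b in this paper may differ. In particular, the stoichiometric coefficients 4 and 8 in the electron-transport-limited rate, the non-rectangular hyperbola for the electron transport rate J, and all default parameter values and units are assumptions made for illustration.

```python
import math

def electron_transport(irradiance, jmax, alpha=0.3, theta=0.7):
    """Smaller root of theta*J^2 - (alpha*I + Jmax)*J + alpha*I*Jmax = 0 (assumed form)."""
    b = alpha * irradiance + jmax
    return (b - math.sqrt(b * b - 4.0 * theta * jmax * alpha * irradiance)) / (2.0 * theta)

def net_photosynthesis(cc, o, irradiance, vcmax, jmax, rd,
                       s=2800.0, kmc=405.0, kmo=278000.0, alpha=0.3, theta=0.7):
    """Return (A, Ac, Aj) in umol CO2 m-2 s-1.

    cc, o, kmc, kmo are in ubar; s is dimensionless. Defaults are illustrative.
    """
    gamma_star = 0.5 * o / s   # CO2 compensation point without day respiration
    ac = (cc - gamma_star) * vcmax / (cc + kmc * (1.0 + o / kmo)) - rd
    j = electron_transport(irradiance, jmax, alpha, theta)
    aj = (cc - gamma_star) * j / (4.0 * cc + 8.0 * gamma_star) - rd
    return min(ac, aj), ac, aj

# High light: Rubisco limits at low CO2, electron transport limits at high CO2.
low_co2 = net_photosynthesis(150.0, 210000.0, 1500.0, vcmax=100.0, jmax=200.0, rd=1.0)
high_co2 = net_photosynthesis(800.0, 210000.0, 1500.0, vcmax=100.0, jmax=200.0, rd=1.0)
print(low_co2, high_co2)
```

The switch between the two limiting rates as CO2 rises is the kind of behaviour that a single fitted curve would describe but not explain.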

The success of the model of Farquhar et al. (1980) indicates that summarizing data from experiments is not merely an empirical procedure of using a proper equation to statistically fit the measured data. A simple empirical curve-fitting procedure will probably not reveal biological insights. To minimize the empiricism in translating experimental knowledge, the mathematical formalism of the model should be generic and its parameters should have an unambiguous biological meaning and represent certain genetic characteristics. For example, Yin et al. (2000) developed an equation for describing LAI (L) in relation to amount of canopy nitrogen (N) as:

[equation image not recovered]  (Eqn 3)

The equation was derived on the basis of two widely observed experimental results: the profile of leaf nitrogen content in the canopy follows an exponential function of leaf area counted from the top of the canopy (Field, 1983); and there is a base nitrogen content (nb), at or below which leaf photosynthetic rate at saturating light is zero (Evans, 1983; Sinclair & Horie, 1989; also cf. Fig. 3a). Equation 3 has two parameters, nb and kn (the coefficient for the extinction of leaf nitrogen in the canopy); both may represent characteristics of crop species or their genotypes. Not only is the meaning of the two parameters unambiguous, but the equation also engenders a simple yet robust method for predicting the onset and quantity of leaf senescence, based on crop carbon and nitrogen balance. The LAI calculated by Eqn 3 can be designated as the nitrogen-limited LAI (LN). Conventionally, LAI has simply been calculated from leaf carbon or dry matter (Penning de Vries et al., 1989), denoted here as LC. The rate of the LAI decrease as a result of senescence, ΔL, can be formulated as follows (Yin et al., 2000; Yin & van Laar, 2005):

Figure 3.

(a) Leaf photosynthesis of C3 (dashed lines) and C4 (solid lines) rice at high (thick lines, 1000 µmol m−2 s−1) and low (thin lines, 100 µmol m−2 s−1) light intensities in relation to leaf nitrogen content; (b) time course of simulated canopy photosynthesis of C3 (thin line) and C4 (thick line) rice using the 1992 weather data; (c) simulated grain yields of C3 and C4 rice from 1979 to 2005.

[equation image not recovered]  (Eqn 4)

where Δt is the length of time step for dynamic simulation. Equation 4 avoids the use of time-dependent, empirical leaf-turnover coefficients, as used in many earlier models (Penning de Vries et al., 1989). It also agrees with a coherent biological picture of leaf senescence in relation to the decreasing amount of nitrogen in the canopy resulting from nitrogen remobilization to support growing seeds or grains (Sinclair & de Wit, 1975; Hirel et al., 2001; Uauy et al., 2006).
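The two premises stated for the LAI equation are enough to sketch how a nitrogen-limited LAI and a senescence rate could be computed. If leaf nitrogen declines exponentially with cumulative leaf area and the bottom leaf sits exactly at nb, integrating the profile gives N = nb (exp(kn L) − 1) / kn, which inverts to L = ln(1 + kn N / nb) / kn. This derivation, the max(0, LC − LN)/Δt form of the senescence rate, and the parameter values are assumptions for illustration; they may not match the published Eqns 3 and 4 term for term.

```python
import math

def nitrogen_limited_lai(canopy_n, nb=0.3, kn=0.4):
    """L_N from total canopy leaf nitrogen (assumed form).

    canopy_n: g N per m2 ground; nb: g N per m2 leaf; kn: m2 ground per m2 leaf.
    """
    return math.log(1.0 + kn * canopy_n / nb) / kn

def canopy_n_from_lai(lai, nb=0.3, kn=0.4):
    """Inverse relation: integral of the exponential profile with bottom leaf at nb."""
    return nb * (math.exp(kn * lai) - 1.0) / kn

def lai_senescence_rate(lai_carbon, canopy_n, dt=1.0, nb=0.3, kn=0.4):
    """LAI lost per unit time when the carbon-based LAI exceeds what nitrogen can support."""
    l_n = nitrogen_limited_lai(canopy_n, nb, kn)
    return max(0.0, lai_carbon - l_n) / dt

l_n = nitrogen_limited_lai(6.0)
print(round(l_n, 2), lai_senescence_rate(6.0, 6.0), lai_senescence_rate(4.0, 6.0))
```

Senescence starts only once remobilization has lowered canopy nitrogen enough for LN to fall below LC, so no time-dependent leaf-turnover coefficient is needed.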

The situation is more complicated if a variable is to be described by two or more other variables. Guided by statistical analysis, one may simply think of a model in which the main effects of the variables and all possible two-way and/or higher-order interactions between these variables are summed; an example is a recent systems biology study on seedling morphology as affected by the fluence rates of red or far-red light and exogenous brassinolide (Nemhauser et al., 2003). Parameters of such a model are extremely hard to interpret biologically. An alternative approach is to assume a multiplicative form of the effects of individual factors. For example, using this form Jarvis (1976) described the nonlinear dependence of stomatal conductance (gs) on a number of environmental variables: irradiance, temperature, water vapour deficit, CO2 concentration and soil moisture content. Such a model gives a multidimensional response surface, which is complex because responses to individual variables are nonlinear. On the basis of earlier model developments and experimental observation of the apparent close link between gs and leaf net CO2 assimilation rate (A), Leuning (1995) proposed a more robust model, as follows:

[equation image not recovered]  (Eqn 5)

where Γ is the CO2 compensation point for A, gso is a residual gs at the light compensation point for A, Cs is leaf-surface CO2 concentration, a is the coefficient related to intercellular CO2 concentration at saturating light, and b reflects the sensitivity of the stomata to water vapour deficit (D). In Eqn 5, stomatal conductance and CO2 assimilation are interlinked, and hence the effect of temperature, irradiance and leaf nitrogen on gs can be reflected via their effect on A. While more variants of the model were later developed, Eqn 5, when combined with a model for A of Farquhar et al. (1980), that is, Eqn 2, was found to be able to reproduce the observed behaviour of leaf stomatal conductance over a wide range of environmental conditions (Leuning, 1995).
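The way a single assimilation term carries the effects of temperature, irradiance and leaf nitrogen into stomatal conductance can be sketched with the symbols defined above. The functional form used here, gs = gso + a A / ((Cs − Γ)(1 + D/b)), is the commonly quoted form of the Leuning (1995) model and is assumed rather than copied from Eqn 5; the parameter values and units are likewise illustrative assumptions.

```python
def stomatal_conductance(a_net, cs, d, gso=0.01, a=10.0, b=1.5, gamma=40.0):
    """Stomatal conductance (mol m-2 s-1) from net assimilation (assumed form).

    a_net: umol CO2 m-2 s-1; cs and gamma: umol mol-1; d and b: kPa.
    """
    if a_net <= 0.0:
        return gso   # only the residual conductance remains
    return gso + a * a_net / ((cs - gamma) * (1.0 + d / b))

# Same assimilation rate under humid and dry air (hypothetical values).
humid = stomatal_conductance(20.0, 400.0, 0.5)
dry = stomatal_conductance(20.0, 400.0, 3.0)
print(round(humid, 3), round(dry, 3))
```

In a coupled model, a_net would itself come from a photosynthesis model such as Eqn 2, and the two would be solved together because A also depends on gs through the CO2 supply.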


When algorithms for individual components are developed, they need to be assembled into a model for predicting processes or a phenomenon at the higher adjacent level. The assembling itself is also an upscaling process, which sometimes involves both spatial and temporal integrations; for example, extension of instantaneous leaf photosynthesis to daily total of canopy photosynthesis. In such a scaling exercise, different time-steps for integration may be needed for different processes, especially for those where nonlinearity is dominant in the involved algorithms.
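The upscaling from instantaneous leaf photosynthesis to a daily canopy total can be illustrated with a two-stage numerical integration: over canopy depth, then over the day. The sketch assumes a Beer's-law light profile, a negative-exponential leaf light response and a sinusoidal diurnal course of irradiance; these forms and all parameter values are hypothetical stand-ins, not the algorithms of any particular crop model.

```python
import math

def leaf_photosynthesis(i_abs, amax=30.0, eff=0.05):
    """Saturating light response (umol CO2 m-2 leaf s-1); hypothetical parameters."""
    return amax * (1.0 - math.exp(-eff * i_abs / amax))

def canopy_photosynthesis(i_top, lai, k=0.6, layers=50):
    """Spatial integration over cumulative leaf area by the midpoint rule."""
    dl = lai / layers
    total = 0.0
    for n in range(layers):
        depth = (n + 0.5) * dl
        i_leaf = k * i_top * math.exp(-k * depth)   # light absorbed per unit leaf area
        total += leaf_photosynthesis(i_leaf) * dl
    return total   # umol CO2 m-2 ground s-1

def daily_canopy_photosynthesis(i_noon, lai, daylength_h=12.0, steps=48):
    """Temporal integration over a sinusoidal day (umol CO2 m-2 ground d-1)."""
    dt = daylength_h * 3600.0 / steps
    total = 0.0
    for n in range(steps):
        i_top = i_noon * math.sin(math.pi * (n + 0.5) / steps)
        total += canopy_photosynthesis(i_top, lai) * dt
    return total

integrated = daily_canopy_photosynthesis(1500.0, lai=4.0)
# Shortcut: apply the daytime mean irradiance (2/pi of the noon value) for the whole day.
mean_light = canopy_photosynthesis(1500.0 * 2.0 / math.pi, lai=4.0) * 12.0 * 3600.0
print(integrated < mean_light)
```

Because the light response is concave, the shortcut based on mean irradiance overestimates the integrated total, which is why nonlinear processes may need shorter integration steps than the model's daily time-step.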

Because the model is based on the assembled algorithms for underlying low-level processes, the model is able to predict the variables at a higher organizational level beyond the situations from which the model parameters have been estimated. Whole-crop physiology models have widely been used to support various crop studies at the field level (Kropff et al., 2001). In their application to breeding programmes, model-input parameters are also called ‘genetic coefficients’, as evidenced by studies that the genetic factors underlying the variation of these ‘coefficients’ can be identified, via either a regression approach (White & Hoogenboom 1996) or a genetic QTL approach (Yin et al., 1999). The models therefore have the potential to analyse or predict the relative magnitude of a particular gene or QTL in relation to environmental schemes and management conditions, that is, the ‘G × E × M’ problem (Yin et al., 2004) – a well-known enigmatic phenomenon in crop breeding and agronomy. Several studies have shown the potential of this model approach, at least for relatively simple crop traits such as leaf elongation rate (Reymond et al., 2003; Tardieu, 2003), flowering time (Nakagawa et al., 2005; Yin et al., 2005), and fruit quality (Quilot et al., 2005).


When different algorithms are combined, the rules by which elements interact can translate correlations into causality and give rise to system behaviour and emergent properties. The emergent behaviours are sometimes expected; for example, the linear relations between both Vcmax and Jmax and leaf nitrogen can give rise to a nonlinearity between Amax (irradiance-saturated leaf photosynthetic rate) and leaf nitrogen (Fig. 3a), as often observed experimentally (Evans, 1983; Sinclair & Horie, 1989). However, in many cases, the emergent properties may well be unexpected and even counterintuitive. For example, using sensitivity analysis based on an extended version of the Farquhar et al. (1980) model, Yin et al. (2006) identified that even under limiting irradiance, cyclic electron transport around photosystem I (CET) can operate as a ‘brake’ for linear electron transport (LET) to match the efficiency of the light reaction with the efficiency of the dark reaction of photosynthesis. This result is surprising, given that for C3 vascular plants, CET has been supposed to play a role (if any) only under stress or at high irradiance (Johnson, 2005). Although it is hard to confirm experimentally (CET is cyclic in nature and a cyclic process is not amenable to direct measurement because it does not involve any gas exchange, especially not under limiting irradiance), an indirect measuring procedure was proposed, based on the extended model (Yin et al., in press). Therefore, the model and model-based analysis can generate novel hypotheses and methods, which can guide further experimental research to increase understanding of particular elements and/or the system as a whole. This ‘symbiosis’ between modelling and experimentation means that successful model development is often an iterative process. The role of models as a heuristics engine is a criterion to distinguish mechanistic and empirical models. Empirical models can perform quite well in terms of prediction within a certain range of conditions, but have little use in heuristics.


Model-based systems design is strongly analogous to the concept of ‘design principles’ in engineering. When a mechanistic model is well tested, the model can be used for system design. Crop models have been used, for example, to optimize agronomic management strategies and to design sustainable agro-ecosystems (Kropff et al., 2001). Within the design role for plant breeding, modellers and physiologists have explored potential uses of whole-crop physiology models in various aspects of genetic analysis and breeding (Boote et al., 2001; Yin et al., 2004), including identification of main yield-determining traits (Bindraban, 1997); defining optimum selection environments (Aggarwal et al., 1997); evaluation of selection efficiency (Chapman et al., 2003); design of crop ideotypes for a target environment (Kropff et al., 1995); and assisting multi-environment testing (Dua et al., 1990).

Sinclair et al. (2004) summarized three cases (i.e. improved WUE of wheat, cowpea heat-stress tolerance, and soybean nitrogen fixation tolerance to water deficit), where physiological research, when integrated into breeding selection, did lead to improved crop cultivars with increased yield. A further notable example of the design approach influencing breeding is the development of super rice in China. Setter et al. (1995) used a crop model and showed that thick erect final leaves displayed well above a low panicle are essential for high yields. This ideotype concept, combined with other approaches (i.e. heterosis and wider genetic resource utilization), was successfully realized in China (Normile, 1999). Several super rice varieties, both inbred and hybrid, have now been released to farmers, and some are widely grown in China.

Implications of crop modelling experience in systems biology

Biological modelling and simulation is a broad field, given the plethora of layers of biological systems. The success of a modelling study needs to be assessed in its robustness to play roles for synthesis, prediction, heuristics and design. To fulfil these roles, it is important to think at three relevant levels simultaneously. The current definition of systems biology is overwhelmingly focused on the cellular level and below. The definition and resultant actions of research are driven by the search for an ‘explanation’ at low levels, without a proper balance with a concern for ‘significance’ at high levels. This drives one's research further and further down the organizational levels, and increasingly tends to distort the knowledge base of plant biology (Lawlor, 2003). The movement is facilitated by what Passioura (1979) called ‘molecular chauvinism’, the view that the only worthwhile explanation of biological phenomena is at the molecular (now, the ‘omic’) level. This proposition is nonsense (Passioura, 1979) in the context of the layered structure of biological systems, in which no level of organization is more important than any other for understanding the behaviour of the entire system, especially considering that the plant systems biology initiative (Minorsky, 2003) aims to provide fast means of solving some imminent real-world problems related to increased global crop production.

Crop production is a complex process, and a change at the molecular level may not result in the anticipated change at the crop level, as revealed by the following two modelling examples for soybean and rice, respectively. Both analyses highlighted roles of modelling in synthesis, prediction, heuristics and design. Sinclair et al. (2004) analysed the impact on yield of hypothetical improvements in the molecular capacity of photosynthesis, given that increasing leaf photosynthetic rates had been proposed as a straightforward way of increasing crop yields. Their model calculation began by assuming that soybean leaves can be transformed to produce 50% more mRNA than currently for synthesis of the subunits of Rubisco. This increase in mRNA production should result in the synthesis of 37% more Rubisco, which, in turn, was calculated to result in a 33% increase in light-saturated leaf photosynthetic rates, and 30% increase of isolated plant photosynthesis. Photosynthesis for an isolated plant is increased less than leaf photosynthesis because c. 40% of the leaves of an isolated plant are not exposed to photosynthetically saturating light intensities. A further decrease in advantage occurs in moving from photosynthesis of the isolated plant to assimilation by a community of plants (i.e. crop) where there is competition for light, so an increase of crop canopy carbon gain by 50% increase in mRNA was only 18%. Finally, the benefit of seed mass accumulation was estimated. As the seed growth requires both carbon and nitrogen to form the essential components of grain, the estimated seed yield depended on assumptions about nitrogen uptake scenarios. If nitrogen is readily available in the soil to meet the increased nitrogen requirement of grain growth, a 6% increase in crop yield was predicted. 
If no additional nitrogen accumulation occurred, the calculated yield increase of the transformed crop was negative, because the transformed crop had larger vegetative organs that needed more nitrogen to be incorporated into their structural components, and consequently less nitrogen was available for subsequent transfer to the seed and so seed growth became nitrogen-limited.
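The damping along the soybean cascade can be made explicit with a small calculation. The sketch below only re-expresses the percentage gains quoted above from Sinclair et al. (2004) for the case of ample soil nitrogen; the variable and function names are hypothetical, and the ratios are simple descriptive quantities, not part of the original model.

```python
# Percentage gains at each organizational level, as quoted in the text
# from Sinclair et al. (2004); the last entry assumes ample soil nitrogen.
CASCADE = [
    ("mRNA for Rubisco subunits", 50.0),
    ("Rubisco content", 37.0),
    ("light-saturated leaf photosynthesis", 33.0),
    ("isolated-plant photosynthesis", 30.0),
    ("crop canopy carbon gain", 18.0),
    ("seed yield (ample soil N)", 6.0),
]


def retained_fraction(cascade):
    """Gain at each level relative to the initial molecular-level gain."""
    base = cascade[0][1]
    return [(level, gain / base) for level, gain in cascade]


def step_ratios(cascade):
    """Gain at each level relative to the level immediately below it."""
    return [
        (cascade[i + 1][0], cascade[i + 1][1] / cascade[i][1])
        for i in range(len(cascade) - 1)
    ]


if __name__ == "__main__":
    for level, frac in retained_fraction(CASCADE):
        print(f"{level:40s} retains {frac:.0%} of the mRNA-level gain")
    for level, ratio in step_ratios(CASCADE):
        print(f"{level:40s} step ratio {ratio:.2f}")
```

Under these figures, only about one-eighth of the molecular-level gain is still visible in seed yield, and the largest single losses occur at the steps from isolated plant to canopy and from canopy carbon gain to seed yield.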

Rice is a major food crop that differs from soybean in terms of nitrogen requirement for grain growth. It has been suggested that supercharging photosynthesis is the only way to improve yield potential substantially in rice whilst not increasing the demand for water and nitrogen (Mitchell & Sheehy, 2006). These authors strongly proposed adding the C4 biochemical pathway and modifying leaf anatomy so that the C4 system works at its best in rice. To examine this proposition, we conducted a model analysis for rice, similar to the analysis Sinclair et al. (2004) did for soybean, outlined earlier, to assess the impact if the full C4 system were to be introduced into rice successfully. Equation 2, used for C3 photosynthesis, and the equations adapted from von Caemmerer & Furbank (1999) for C4 photosynthesis, combined with a stomatal conductance model similar to Eqn 5, were incorporated into the mechanistic crop model GECROS (cf. Fig. 2 for its conceptual structure), which can simulate the impact of the first ‘green revolution’ genes (dwarfing and photoperiod-insensitivity genes) on grain yield (Yin & van Laar, 2005). The CO2 concentrating mechanism, mimicked largely in the C4-photosynthesis model via the use of low bundle sheath conductance, is illustrated by a significant increase of leaf photosynthesis in C4 rice under high irradiance (approx. 67% at high leaf nitrogen, Fig. 3a). The simulations were conducted for the conditions of the International Rice Research Institute (IRRI) experimental farm during the dry seasons of the years 1979 to 2005, assuming that there is no other change but the photosynthetic routine, that is, crop duration of c. 110 d, plant height of c. 1 m, and the same timing and amount (= 220 kg N ha−1, typical for the dry season at IRRI, Setter et al., 1994) of nitrogen uptake by the crop under irrigated conditions. 
The advantage of the C4 crop in canopy photosynthesis fluctuated, depending mainly on daily radiation; and the canopy photosynthesis was increased in C4 rice largely during the pre-flowering period, whereas during the second part of the post-flowering period, the canopy photosynthesis of C4 rice was even lower (Fig. 3b), because of an increased carbon : nitrogen ratio that resulted in more senescence in C4 rice (cf. Eqn 4). This mimics, in a phenomenological way, the photosynthetic acclimation as revealed by many long-term CO2 enrichment studies (Woodward, 2002). Simulated yield advantage of C4 vs C3 rice varied with years (Fig. 3c); on average, grain yield increased from 10.5 t ha−1, the maximum rice yield currently observed at the IRRI farm, to 13.1 t ha−1 for C4 rice, that is, an increase of 25%, lower than the 50% increase hoped for (Mitchell & Sheehy, 2006). In contrast to the suggestion of Mitchell & Sheehy (2006) that C4 rice of 15 t ha−1 does not require an increased demand for nitrogen, higher nitrogen uptake is needed to predict a higher yield, because C4 photosynthesis responds more to leaf nitrogen than the C3 routine under high-light conditions (Fig. 3a). This requires that plants develop a larger capacity to hold carbon reserves from pre-flowering and early post-flowering photosynthesis. In short, many mechanisms, in addition to the full C4 system, need to function in concert to substantially increase rice yield potential.

The damping effect simulated in both studies is confirmed by experiments at the isolated plant level (Makino et al., 2000), and by a large-scale field FACE (free air CO2-concentration enrichment) experiment of Long et al. (2006) for several C3 crops (including soybean and rice) where the effect of CO2 enrichment on photosynthesis is equivalent to that of transformation for increased photosynthesis. Therefore, there is a series of interactions and feedbacks operating along a crop developmental cascade which constrain the realization of alterations made at lower levels. This explains the negative correlations among yield components in Eqn 1, because these yield components are formed during different developmental phases. This damping effect has important implications for the establishment of plant systems biology. Undoubtedly, systems biology, although still in its infancy in terms of its robustness to play the modelling roles discussed earlier (i.e. synthesis, prediction, heuristics, and design), will ultimately yield valuable information for gene functions, gene interaction and genetic regulatory networks. Discovery of gene functions is a basic task in functional genomics (Miflin, 2000); however, it is not sufficient for crop improvement and is probably of little use for enhancing selection for quantitative traits (Bernardo, 2001). The approach also pays little attention to the intra- and interplant competition and to the modulation by (multiple) varying environments as perceived by the whole crop, and therefore is a long way from helping to explain the connections between multiple genes and complex phenotypes, such as grain yield and quality traits, that are crucial for agriculture (Lawlor, 2003). Therefore, systems biology, if only rooted in the ‘omics’ area, will result in an ever-increasing gap between genotypes and crop phenotypes. 
Although the importance of horizontal, cross-discipline cooperation (between biology, mathematics, bioinformatics, chemistry, computer science, etc.) is well recognized (Minorsky, 2003; Gutiérrez et al., 2005), the importance of the vertical dimension, that is, cooperation across the various biological scales, has been less recognized. Systems biology needs to reach the crop scale if it is to contribute to improving food production and energy supply.

Crop systems biology as a platform for communications across scales

We have discussed how using simple principles of crop physiology can enhance the potential relevance of genomic studies (Ashikari et al., 2005; Uauy et al., 2006; Karaba et al., 2007; Song et al., 2007) to the improvement of complex yield- or resource use efficiency-related traits in major crops. A more systematic approach is needed, in view of the following: the need to bring the information from functional genomics to the crop level, as discussed earlier; the need to better understand the organization, intra- and interplant competition of the whole crop and its response to environmental conditions; the need to fill the vast middle ground between ‘omics’ and relatively simple crop models (White & Hoogenboom, 2003); the concern about the lack of true biological mechanisms in many current crop models (Lawlor, 2003; Long et al., 2006); and the need to promote communications across scales. Yin & Struik (2007) therefore proposed a viable concept, ‘crop systems biology’. Crop systems biology aims at modelling complex crop-level traits relevant to global food production and energy supply, by building links between ‘omics’-level information, underlying biochemical understanding, and physiological component processes.

To develop crop systems biology, it is necessary to map the organization levels and the communication systems between these levels for the different key processes. A model to achieve this is likely to become too complicated. Examples shown earlier in the ‘Synthesis’ section indicate that a robust model may not necessarily be a complex one. Much of the fine detail may not be needed (Hammer et al., 2006), and certain details of organization may be skipped as irrelevant or unnecessary to develop a prototype model. For example, Eqn 2 and associated algorithms are, in principle, valid for modelling photosynthesis at the chloroplast level (Farquhar et al., 1980), but have been widely used to predict photosynthesis successfully at the leaf level without using additional algorithms. Similarly, by making use of the concept of LAI, detailed algorithms for light perceived by individual organs at the plant level may be skipped for calculating canopy photosynthesis. Thus, crop systems biology models at this step may not necessarily be more complex in structure, nor in their computational requirement, than existing whole-crop physiology models. However, there is a need to make more comprehensive synthesis of the rich and robust biological understanding of the functional relations between carbon and nitrogen metabolism (Noctor & Foyer, 1998), between sources and sinks, between shoots and roots, and between structural and nonstructural components. For a comprehensive model such as this, a modular design is needed to ensure that changes or extensions of a sub-model will not affect other parts of the model. Algorithms at the process level are then assembled and scaled up to the crop level in a similar way to temporal and spatial integrations as practised in existing whole-crop physiology models. 
In relation to crop improvement, a key element would be to identify the parts of mechanisms that are conservative in energy and water transfer, and carbon and nitrogen metabolism, and the parts of mechanisms that show genetic variation and are potentially amenable to selection and engineering. The prototype models should allow identification and quantitative assessment of specific parts of metabolic pathways and processes which could be altered to achieve trait improvement.
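The scaling-up step mentioned above, in which the concept of LAI replaces organ-by-organ light calculations, can be sketched as follows. This is a deliberately simplified illustration and not the GECROS routine: the exponential light extinction with canopy depth, the rectangular-hyperbola leaf light response, all parameter values and all function names are assumptions made for this example.

```python
import math


def leaf_photosynthesis(absorbed_light, a_max=30.0, efficiency=0.05):
    """Leaf rate from a rectangular-hyperbola light response (assumed form)."""
    return a_max * efficiency * absorbed_light / (a_max + efficiency * absorbed_light)


def canopy_photosynthesis(incident_light, lai, k=0.6, layers=50):
    """Integrate the leaf rate over canopy depth expressed as cumulative LAI.

    Light is assumed to decline exponentially with cumulative LAI
    (extinction coefficient k); the integral is approximated with the
    midpoint rule over `layers` thin leaf layers.
    """
    if lai <= 0.0:
        return 0.0
    layer_lai = lai / layers
    total = 0.0
    for i in range(layers):
        depth = (i + 0.5) * layer_lai                      # cumulative LAI above layer midpoint
        absorbed = k * incident_light * math.exp(-k * depth)  # light absorbed per unit leaf area
        total += leaf_photosynthesis(absorbed) * layer_lai
    return total


if __name__ == "__main__":
    for lai in (1.0, 3.0, 6.0):
        print(f"LAI = {lai:.0f}: canopy rate = {canopy_photosynthesis(1500.0, lai):.1f}")
```

The point of the sketch is structural: the leaf-level function is a self-contained module that could be swapped for a biochemical model without changing the canopy integration, which is the kind of modular design argued for above.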

Genomic tools and techniques are beginning to reveal the mechanisms that underlie change of crop phenotypes in response to environmental variables. For instance, molecular drivers for the increase of soybean leaf area at elevated CO2 were investigated using cDNA microarrays (Ainsworth et al., 2006). They demonstrated that at the transcript and metabolite levels, elevated CO2 stimulates the respiratory breakdown of carbohydrates, which likely provides increased fuel for leaf expansion at elevated CO2, in addition to the well-documented explanation that leaf expansion results from increased photosynthetic carbon fixation at elevated CO2. It can be envisaged that the functioning of the whole crop can, in simple terms, be described by integrating responses across multiple levels of biological organization. There have been debates about whether models need to be made across more than three scales (White & Hoogenboom, 2003) even if computational time is hardly a constraint. In our view, the answer to this question depends on the goal of the research. Such a multiscale model might be of little use in terms of prediction of crop-level phenotypes because of its high input requirements, which may add uncertainties to the model. However, such coupled models could be very useful in terms of heuristics and systems design, since they enable in silico assessment of crop response to genetic fine-tuning under defined environmental scenarios, thereby being powerful tools in designing breeding and engineering for complex crop traits. For example, by simulating the spatial and temporal heterogeneity of light flux in crop canopies (using a reverse ray-tracing algorithm) and incorporating the cost of delayed recovery in photosystem II (PSII) photochemical efficiency on transfer from high to low light, Zhu et al. (2004) analysed the benefit of selection or engineering for genotypes better able to recover rapidly from the photoprotected state of PSII. 
Because photoprotection is at the level of the cell, not at the level of the leaf, light was simulated for small points of 10⁴ µm rather than as an average for a leaf. They predicted an increase of daily carbon uptake of at least 13% for a typical crop canopy of LAI = 3. If the approach is embedded in a whole-crop model, potential crop yield gain under various field conditions could be assessed. This type of multilevel modelling will integrate knowledge of processes at various scales, and translate it into tools and methodologies that will facilitate the use of the outputs of genomics research in a plant breeding programme (Dwivedi et al., 2007).

Reaching down to lower organizational levels is most likely to be done one process at a time. For almost all processes, it is important to harness the massive amount of information for molecular understanding based on the model plant species Arabidopsis. A first candidate process is the most understood trait: flowering time. Based on the qualitative, genetic characterization of major flowering time genes in Arabidopsis (Koornneef et al., 1998), Welch et al. (2003) have proposed a preliminary quantitative neural network model of flowering time control in this model species. Similar modelling may be explored for phenology in crop species, given that homologues of some Arabidopsis flowering genes have been found in major crops (Imaizumi & Kay, 2006). It is worth noting here that scaling down to a lower level is not a one-directional process to which only ‘omicists’ ‘donate’. Crop physiologists can contribute to the whole process. So, again, a consideration of the systems framework of the n − 1, n, and n + 1 layers (i.e. gene regulatory network, flowering induction pathways and flowering date, respectively) will facilitate the elucidation of gene–gene and gene–environment interactions of flowering control. In Arabidopsis, recent advances have indicated that the core of the day-length measurement mechanism of the plant lies in the circadian regulation of CONSTANS (CO) expression and the subsequent photoperiodic induction of the expression of the FLOWERING LOCUS T (FT) gene, which might encode a major component of ‘florigen’ long postulated from classical grafting experiments (Imaizumi & Kay, 2006; Corbesier et al., 2007). Both CO and FT genes have homologues in rice, HEADING DATE 1 (HD1) and HD3a, respectively (Cremer & Coupland, 2003), and the Hd3a protein may be the rice florigen (Tamaki et al., 2007). The FT gene also has a homologue in wheat and barley (Yan et al., 2006). 
The question is to what extent this CO-FT module can explain a diverse response to day length in different species. In a crop physiological experiment of reciprocal transfers between long-day (LD) and short-day (SD) photoperiods to identify the onset and end of photoperiod-sensitive phase and to quantify photoperiod sensitivity, Yin et al. (1997) found that there was an unexpected delay of flowering for plants of early SD-to-LD transfers, relative to the flowering date for the plants grown continuously in LD, in some but not in other rice cultivars (Fig. 4). A 5 d difference between two consecutive transfers around the time the number of SD inductive cycles was fulfilled resulted in up to 145 d difference in flowering in var. Lao Lai Qing (Fig. 4c). The delay was found mostly in japonica rice varieties, including var. Nipponbare, which was a parent of some mapping populations used to detect rice HD genes (Yano et al., 2001). Understanding the underlying gene regulation for the observed increasing delay of flowering date by early SD-to-LD transfers in some but not other cultivars (Fig. 4) poses a major research question for flowering biologists. Also we should explore whether the CO-FT module, combined with the effect of other characterized gene cascades (Corbesier & Coupland, 2006), can generate a large spectrum of quantitative variation in photoperiod sensitivity and basic vegetative phase among genotypes, such as those observed by Yin et al. (1997).

For more complex candidate traits/processes such as carbon assimilation, nitrogen assimilation, structural-stem formation and stress tolerance, for which the full understanding of underlying gene regulatory control in response to environmental perturbation is not expected in the near future, it is more important to try to understand and model the molecular-physiological basis of these traits/processes (Struik et al., 2007). The rich history of such physiological and biochemical studies should provide the necessary information for the development of crop systems biology models. For example, combined modelling of photosynthetic electron transport, the Benson–Calvin cycle and the photorespiratory cycle has been published (Laisk et al., 2006). To better understand nitrogen use efficiency (Hirel et al., 2001), the carbon assimilation model could be extended to include the stoichiometry of nitrogen assimilation, in relation to the activity of key enzymes (e.g. nitrate reductase and glutamine synthetase), given the close coupling between carbon and nitrogen assimilation in plants (Noctor & Foyer, 1998). With the future development of functional genomics, combined studies of physiological components with gene expression profiles should illustrate the function of genes, biochemical pathways and cellular processes that are affected in a coordinated manner. Such studies should lay the groundwork for extending models to include regulatory networks and linkages among gene products, biochemistry and whole-plant physiology. Obviously, different developmental, temporal, spatial and structural scales are required for different components, pathways and processes of the system. Ultimately, crop systems biology may develop into a highly computer-intensive discipline, and crop systems biology models will act as predictive and heuristic engines and lead to innovations for crop improvement.


Manipulation of a relatively small number of genes (notably, dwarfing and photoperiod-insensitivity genes in many crops) using the conventional breeding approach resulted in the first ‘green revolution’. For the next ‘green revolution’ to happen, we have to deal with many genes so that they work in concert. As pointed out by the father of the first ‘green revolution’, the Nobel Prize laureate Norman E. Borlaug in his address ‘Challenges facing crop scientists in the 21st century’ at the 2007 meeting of American Society of Agronomy, while conventional breeding will continue to contribute, great benefit from biotechnology is expected in the coming decades. Advances in ‘omics’ are assisting researchers to address complex biological issues of significant agricultural importance. However, the rate, scale and scope of use of genomics in crop breeding programmes have continually lagged behind expectations (Struik et al., 2007). There is an increasing recognition from geneticists and breeders (Tuberosa & Salvi, 2006; Dwivedi et al., 2007) of an immediate need for computational tools to help breeders more effectively translate and integrate the outputs from genomics research and to help them efficiently select the best technology interventions and associated breeding systems for their target traits.

Plant systems biology was proposed as a possible rapid means to solve some imminent food-, feed-, and energy-related, ‘real-world’ problems (Minorsky, 2003). Although this approach combined with functional genomics might offer a few shortcuts, its proponents may have underestimated the difficulties of a successful programme to develop improved crop cultivars. As pointed out by Miflin (2000), ‘farmers cultivate phenotypes’. Phenotypes at the crop level, even without biotic or abiotic stresses, are extremely complex, because they are achieved not only by molecular pathways but also through multiple environment-dependent intermediate processes and orchestrated developmental feedback mechanisms, and by intra- and interplant competition. Alterations made at the genome level, although substantial, could have little effect on the crop-level phenotypes. Hence, systems biology should not be considered the privilege of those working on molecular, subcellular or cellular levels; instead, it can and should be applied across the whole spectrum of plant biological hierarchies. To allow systems biology to have significant impact on the next ‘green revolution’, the information from ‘omics’ should reach up to the crop level, and thus a viable approach, crop systems biology, should be established. Crop systems biology can narrow the genotype-to-phenotype gaps, enhance the link between traditional and modern sciences, and restore the balance of the distorted base of plant biology.

Crop systems biology is not meant to replace systems biology; rather, it will strengthen the position of systems biology to exploit genomics and systems approaches for crop improvement. While the success of crop systems biology-based breeding approaches cannot be guaranteed, given the complexities of crop phenotypes, a recent comprehensive review (Dwivedi et al., 2007) gives grounds for optimism that knowledge-led, design-driven breeding, supported by computational tools, is a feasible option to enable the potential impact of genomics to be realized. For that to happen, funding organizations need to recognize the long-term, multidisciplinary efforts towards crop improvement (Sinclair et al., 2004; Wollenweber et al., 2005; Struik et al., 2007). Several international symposia (Weiss, 2003; White et al., 2004; Cooper & Hammer, 2005; Spiertz et al., 2007) have been organized to address similar topics, and crop scientists have shown great enthusiasm to be involved in cooperation with ‘omicists’. What about vice versa?


We gratefully acknowledge the organisers of the 17th New Phytologist Symposium for inviting X. Y. to speak; that presentation formed the basis for the present paper. We also thank Profs M. Koornneef, M. J. Kropff, P. Stam and F. A. van Eeuwijk of Wageningen University for their contribution, and Prof. A. Weiss of the University of Nebraska-Lincoln for valuable discussions.