Out of the woods. Ash dieback and the future of emergent pathogenomics

The recent emergence of the ash dieback fungus Hymenoscyphus pseudoalbidus (anamorph Chalara fraxinea) in the UK, after a long and devastating journey across Europe, has prompted new approaches to how pathogenomics studies are carried out. These approaches could transform a very active field and provide a new, ultra-collaborative, ultra-open model of working across all of plant pathology, and all of biology, that mirrors the model used in massively collaborative projects in the physical sciences.

The people of the UK have a strong cultural connection to their woodland. The greenwood has been a major figure in the mythology and traditions of these isles for millennia, and many of our most enduring legends, such as the Green Man, King Arthur and Robin Hood, link us back to the ancient forests. Other customs and traditional children's games, such as ‘conkers’ (Wikipedia Contributors, 2013), mean that even those living in urban areas have a lifelong fondness for the trees among which they live. Within living memory, we have witnessed a massive reduction in the number of elm trees, once prominent in parks and urban environments, as a consequence of Dutch elm disease. We have also suffered from other highly publicized agricultural disasters in the same time frame. Therefore, when ash dieback was identified in the east of England in 2012, the nation took the subject to heart: there was a large public and media outcry, and legal challenges forced the government into taking emergency action (Press Association, 2012). The need for a fast response was recognized at the highest level.

Concurrent with the public debate, research groups at The Sainsbury Laboratory and The John Innes Centre, situated together only 10 miles from the first sighting of ash dieback in the wild, quickly began genomic analyses of the pathogen. Recognizing the need for a quick response and seeing an opportunity to try a new way of working, we began an open-access research programme into the genomics of ash dieback.

Mechanisms for a Fast Response to Emergent Plant Pathogens – Ash Dieback in the UK as a Test Case

To kick-start genomic analyses across the pathogenomics community, we took a pioneering approach that inverts the normal way in which we carry out scientific work. It is usual for scientists to generate data, analyse them and keep them away from all but a few select collaborators until they have had sufficient use from them to publish (and are required by journals to make their data available). In the case of ash dieback, the tactic has been to release genomics data as soon as they have been created, before beginning any analysis. Speed is an essential component in responding to rapidly emerging, dangerous plant pathogens, and we believe that, by immediately releasing data to the wider community, to the people who can analyse them, we will bring the heads of many experts together sooner, and allow for concerted community access and a more rapid overall response. Working like this is called crowdsourcing, a form of potentially massively parallel collaboration whose power comes from the sheer number of people whose efforts can be included as part of the work towards the ultimate goal of the project.

What Has Been Achieved So Far

The ash dieback crowdsourcing effort has drawn contributions from many independent groups across the globe. The initial step was to create a website, http://oadb.tsl.ac.uk (MacLean et al., 2013), that would act as a crowdsourcing hub through which we could distribute and gather together data and analyses and publicize them as widely as possible. The first data released in this way were sequences of internal transcribed spacer (ITS) and calmodulin that positively identified the pathogen as H. pseudoalbidus.

A series of ‘interaction-transcriptome’ (Birch and Kamoun, 2000) data from samples collected in three woods in the east of England were used to assess mating type (MAT) loci, revealing at least two mating types in a small geographical area, indicating that the UK H. pseudoalbidus population is of heterogeneous origin. Transcriptome assemblies from these sequence reads were provided by groups in Canada and the UK, and later contributions from Forest Research, a governmental agency, identified both MAT 1-1 and MAT 1-2 mating types in tree nurseries, recent plantings and natural spread sites (woodland), further suggesting a role for the seed and sapling trade in the spread of the disease. Ratios of the mating types were close to 1:1, indicating that the fungus was likely to be regularly sexually outcrossing.
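The reasoning from a near 1:1 mating-type ratio can be made concrete with an exact binomial test. The following is only a minimal sketch: the counts are hypothetical placeholders, not data from the project, and the function name `binomial_two_sided_p` is our own.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical isolate counts for illustration only.
mat1_1, mat1_2 = 27, 23
p_value = binomial_two_sided_p(mat1_1, mat1_1 + mat1_2)
print(f"MAT 1-1: {mat1_1}, MAT 1-2: {mat1_2}, two-sided p = {p_value:.3f}")
```

A large p-value here means only that the counts are compatible with a 1:1 ratio; it does not by itself demonstrate sexual outcrossing.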

Within 2 months of the hub website going live, The Genome Analysis Centre contributed two libraries of genomic DNA from H. pseudoalbidus and a draft assembly of the approximately 63-Mb genome in 1600 scaffolds. Gene model prediction de novo and by the incorporation of the transcriptome data gave us a set of preliminary annotations that facilitated searches by a group of chemists specializing in compounds with potential fungicidal activity. A BLAST search of these annotations using a trypanosomal alternative oxidase revealed the genome to contain a sequence with close identity that also contained a mitochondrial targeting sequence, thus providing a candidate target for fungicides. Specialists in fungal secondary metabolic pathways at UC Riverside's Fungal Evolutionary Genomics group used the genomics data in pipelines for the identification of potential toxins. They uncovered pathways for numerous members of the polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) families, which may be indicative of particular toxic interactions with the ash tree host.

Our own initial analyses of carbohydrate active enzyme protein domains encoded in the genome indicated the presence of reasonable numbers of genes involved in carbohydrate metabolism, and suggested that H. pseudoalbidus could be entering the tree by decaying its wood. In an interesting example of live ‘publish and filter’ peer review facilitated by discussions on the hub website, the analysis was rethought and refocused to compare more widely with genomes of fungi that rot wood and, in comparison, H. pseudoalbidus was found to be rather sparse in the types of enzymes that wood-rotting fungi use to degrade wood.

Easy access to the sequence reads that were used to generate the draft assembly of the pathogen meant that secondary analyses on the structure of the genome could be performed. In doing so, it was discovered that particular parts of the genome were highly AT-enriched, and that these regions coincided with very high back-aligned read coverage in sequence with strong identity to known repeats. Together, these results are suggestive of collapsed repeat regions in the assembly, and are indicative of a repeat-enriched genome reminiscent of those in fast-evolving plant filamentous pathogens, such as the oomycete Phytophthora infestans and the fungus Leptosphaeria maculans (Raffaele and Kamoun, 2012). Effector protein-coding genes were searched for in the genome and catalogues of secreted candidates were produced.
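The kind of secondary analysis described above can be sketched as a sliding-window scan that flags windows which are both AT-rich and covered far more deeply than the rest of the assembly. This is an illustrative sketch rather than the method actually used: the window size, the thresholds and the function names are assumptions, and the sequence and depth values are toy data.

```python
from statistics import median


def at_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are A or T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / called


def flag_windows(seq, depth, window=1000, at_min=0.7, depth_fold=3.0):
    """Return (start, end, at, mean_depth) for suspicious windows.

    A window is flagged when its AT fraction is at least `at_min` and
    its mean read depth is at least `depth_fold` times the median
    per-base depth of the whole scaffold.
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    baseline = median(depth)
    flagged = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        at = at_fraction(seq[start:end])
        mean_depth = sum(depth[start:end]) / (end - start)
        if at >= at_min and mean_depth >= depth_fold * baseline:
            flagged.append((start, end, at, mean_depth))
    return flagged


# Toy scaffold: a balanced flank, an AT-only block with 5x depth,
# then another balanced flank.
toy_seq = "ACGT" * 5 + "AT" * 10 + "ACGT" * 5
toy_depth = [10] * 20 + [50] * 20 + [10] * 20
candidates = flag_windows(toy_seq, toy_depth, window=20)
print(candidates)
```

In practice the per-base depth would come from aligning the released reads back to the draft assembly, and flagged windows would still need to be checked against known repeat sequences before being interpreted as collapsed repeats.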

Expression data from varied combinations of tissues have been collected. RNAseq of primordial fruiting bodies that have not developed into mature structures may help to reveal the pathways interrupted when compared later with similar datasets that have been collected from mature fruiting bodies. Interaction-transcriptome RNAseq data from tree and fungus co-samples have been used in multiple studies and with tree-only RNAseq data to assess the transcript profiles of the tree with and without infection.

Contributions of reads from samples of H. pseudoalbidus from France sequenced by GenePool in Edinburgh and some from Japan allowed us to carry out phylogenetic analysis, and revealed a very mixed invasion pattern into the UK, and perhaps even mainland Europe, that is still being worked out. Further metatranscriptomic analyses from Fera, another governmental agency, have suggested that the fungus itself suffers from infection by various mitoviruses which may affect its pathogenicity and confound other analyses.

Open Access Open Science: a New Way of Working

Working in the open on openly available data provides us with many advantages. Peer review can be inverted and democratized, so that we need not wait until a narrative suitable for a paper-sized unit of publication is ready; having a community closely watch the release of data and results from them as they appear means that review, criticism and improvement can happen much more quickly. Our predominant ‘filter then publish’ model, in which work has to be approved by a very small number of people before it is available to a restricted readership in traditional scientific journals, is a major bottleneck we can do without in time-sensitive cases. With the Internet, we can take advantage of a ‘publish then filter’ model, which allows us to progress much more quickly as a community of researchers.

Dealing with emergent pathogens in a useful time scale means that we cannot wait to go through the normal grind of the scientific funding, work and paper publication cycle, and we need to find new ways of tracking and valuing what scientists do so that they can continue to contribute without having to worry about obtaining funding or names on papers. It is absolutely vital to the development of the careers of scientists that they receive appropriate credit for the contributions they make in all aspects of their work, and it is equally vital, therefore, to repay the good faith of workers contributing to any crowdsourcing project by honouring this. To ensure that appropriate accreditation is given in our project, we have made the cornerstone of our crowdsourcing database a ‘git’ repository mirrored and hosted on the associated website GitHub. Git is essentially a version control system for keeping track of the contributions of individual programmers to large software projects, which often involve hundreds of people working on the same project simultaneously. Its two main functions are to maintain accurate records of changes to text files and to track the author of each change; together, these allow us to compare and contrast updates as we go and help to ensure that credit will be given where it is due, instantly and without prejudice about the size of the contribution.
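As an illustration of how git's authorship records can be turned into a credit list, the sketch below parses the per-author summary that `git shortlog -sne HEAD` prints (a commit count followed by the author's name and e-mail address). The helper name `parse_shortlog` and the contributors shown are hypothetical; this is not a description of how the project's hub itself tallies contributions.

```python
def parse_shortlog(text):
    """Parse `git shortlog -sne` output into (name, email, commits) tuples.

    Each input line looks like '    12<TAB>Jane Doe <jane@example.org>'.
    Lines that do not start with a commit count are skipped.
    """
    tallies = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, who = int(parts[0]), parts[1]
        name, email = who, ""
        if who.endswith(">") and "<" in who:
            name, email = who[:-1].rsplit("<", 1)
        tallies.append((name.strip(), email.strip(), count))
    return tallies


# Hypothetical output, standing in for a real repository's history.
sample = (
    "    12\tJane Doe <jane@example.org>\n"
    "     1\tA. Student <student@example.org>\n"
)
credits = parse_shortlog(sample)
for name, email, commits in credits:
    print(f"{name} <{email}>: {commits} commit(s)")
```

Because every commit is listed regardless of size, a single small correction appears in the tally alongside the largest contributions.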

Genomics projects are enriched immeasurably by contributions of knowledge from experts in other communities. Capturing the work of specialists in cellular processes or on particular gene and protein families on top of automated gene finding performed by genomicists would be a massive advantage. Doing so requires that genomicists and bioinformaticians develop new tools through which non-bioinformaticians can easily add information; these people, too, need to have their contributions recognized, and web-based tools are starting to appear that enable this (Ramirez-Gonzalez et al., 2011).

Moving towards successful crowd- and cross-community-based pathogenomics projects therefore requires that we, as scientists, re-assess what we believe constitutes a citable and therefore valuable contribution. There is an increasing number of ways in which micro or atomic publication can be performed and cited reliably. Websites, such as figshare (http://figshare.com), provide digital object identifiers (reliable, persistent and therefore citable addresses) for individual figures, presentations or posters that each receive their own ‘impact’ style metric. This measurement includes not only web views of the figure, but also reposts through social media, etc. Figures in our ash dieback project presented in this way have generated hundreds of views, indicating that this is an effective way to produce strong and fast impact. Recent changes in the way in which datasets are viewed in the biological sciences and forward thinking by some publishers have resulted in data publications becoming possible. These are papers that contain a description of a large and valuable data resource and often only a perfunctory analysis. The data themselves are then citable, the generator of the data receives credit for releasing early, other groups can carry out further analyses and the impact of the data is felt more widely.

The Story Stays the Same

Amidst all this, there is still a place for the scientific narrative, the deeper story that makes up any traditional paper. One strategy for the creation of a paper from the separate strands of analysis that emerge from an open access way of working is to prepare preprints. These are a common product of the physical, chemical and mathematical sciences, particularly those that work in very large collaborative communities and with very large datasets. Preprints are essentially early versions of papers hosted on publicly accessible servers that all involved can comment upon and bring together as a community; the development of a preprint is an important and valuable part of the peer review process. A common concern amongst scientists contemplating preprints is that they remove the apparently crucial aspects of novelty and priority from groups and limit their ability to publish in the journals with the highest impact factor, and thereby to receive most recognition for the work. Again, the physical sciences have a working model for us. Preprints are considered as the working version and crucially also establish priority over the ideas and narrative, but publication will eventually occur in journals of record appropriate to the field of work. The impact factor of the journal is a lesser concern, and ‘glamour’ journals, such as Nature and Science, are not considered to be the premier place for publishing high-quality work, but are largely vehicles for establishing good PR. Interestingly, in physics, the existence of a preprint does not preclude publication in Nature or Science; there is no reason why this situation cannot apply to plant pathogenomics too.

Will Plant Pathogenomics Change the World?

We will not change the world by finding a cure for cancer. This is the incendiary and downright provocative statement that one of my colleagues once offered as a conversation starter to a friendly cancer doctor. Exactly what sort of atmosphere that created for the remaining nine-and-a-half-hour transatlantic flight they had both just boarded is not recorded, although it is easy to imagine that the doctor would be at least a little stunned by such a statement. That claim is not just bravura; in many ways, the world is ready for the cure for cancer; the global health and logistical systems are in place to disseminate a cure rapidly to where it is needed and its impact would be quickly felt. Probably no-one would think of objecting to the nature of that cure, in whatever form it may arrive, and, although many lives would be extended and much suffering would be avoided, and as wonderful as a cure would be, nothing at grass-roots level would need to change. The world will not work in a fundamentally different way when a cure for cancer arrives.

In plant pathology, we work in a vastly different world to the cancer doctors. We find ourselves continually faced with new, emerging pathogens that threaten ecologically and economically important plants. The spread of known diseases and the rise of new ones are accelerated and made less predictable by familiar factors, such as shifting climate patterns, population growth and movement, changing land-use patterns and global trade. Plant pathologists have been working for many years and with great success to produce and deliver solutions to these grand challenges, but, as the rate of emergence, and therefore the pressure to find new solutions to new problems, increases, we find that time too becomes a major factor. The speed at which science can react in emergencies is stifled by the day-to-day business of doing science, and many of our current mechanisms for responding to emergent pathogens will prove to be too slow (Kamoun, 2012). In plant pathology, we have a grander challenge yet; as well as finding the cures for these plant diseases, we must also change the world in which we work. With projects such as Open Ash Dieback, plant pathogenomics is in the vanguard of a new movement, and the steps that we have taken in responding to the ash dieback threat are pointing us in a new direction. If we can succeed in following the path further, it will undoubtedly help us to realize our lofty goal of being able to change the world for the better.
