Epistemic misalignments in microbiome research

We argue that microbiome research should be more reflective about the methods it relies on to build its datasets, because of the danger of a methodological problem we call "epistemic misalignment." An epistemic misalignment occurs when the method used to answer a specific scientific question does not track justified answers to it, due to the material constraints imposed by the method itself. For example, relying on 16S rRNA to answer questions about the function of the microbiome generates an epistemic misalignment, because of the mismatch between the temporal scales that 16S rRNA provides information about and the temporal scales required to know about the functionality of some microorganisms. We show how misalignments of this kind exist in contemporary microbiome science and urge microbiome scientists to take measures to avoid them, as they may call into question the credibility of the field as a whole.


INTRODUCTION
The microbiome is usually conceived as the set of microorganisms of different species that reside in a specific environment.[1] These environments include distinct body sites of a so-called "host" (e.g., skin microbiome, gut microbiome, vaginal microbiome), or different parts of an ecosystem (e.g., soil microbiome).[2] Microbiome research is the study of the microbiome (e.g., its dynamical stability or how it changes over time), as well as of how the microbiome influences the biology of the hosts associated with it. This includes aspects such as host health, but also host development, host physiology, and host evolution. Microbiome research thus requires gathering information about the microbiome itself, as well as about its relationship to the host. To do so, scientists employ different methodologies, and information about the microbiome keeps growing over time.
Given this context, the purpose of this paper is twofold. On the one hand, we aim to show that the preference for certain methodologies in microbiome research should primarily depend on the research question being addressed and on the opportunities that the methodologies offer for surrogate reasoning. Basically, surrogate reasoning is a form of reasoning about the world that is assisted or mediated by the use of tools such as graphs or models.[3] We use the concept of "research question" as a technical theoretical term to refer to the type of scientific inquiry that a scientific community is concerned with, and the type of explananda that such inquiry requires.[4] We show how scientists need to choose methodologies that allow surrogative reasoning about the research question they are inquiring about, and we contend that a failure to do so leads to what we call epistemic misalignment, that is, a situation in which, even though the data are extremely rich, they are useless for the research question under investigation, and their use may lead to scientific failure. On the other hand, we have the more global purpose of showing how philosophical investigation can be extremely useful for improving scientific investigation.

METHODOLOGIES IN MICROBIOME RESEARCH
While microbial research before the 2000s was based solely on the possibility of independently culturing bacteria (culturomics), microbiome research has relied on several methods to reconstruct microbiome composition since the advent of high-throughput methodologies. In this section, we review the main approaches and systematically link them to the type of research questions they were designed to answer (see refs. [5, 6] for extensive reviews).
Next-generation sequencing is probably the most used family of high-throughput methodologies in contemporary microbiome research. The first (and still most used) set of methodologies relies on the amplification and sequencing of the 16S rRNA gene as the phylogenetic marker of bacterial species; studies based on it thus ground their insights about the composition of the microbiome on this specific genetic marker. However, due to specific limitations of 16S rRNA in gaining enough resolution at the strain or even at the species level, other genes have also been used in recent years, including the 23S rRNA or the internal transcribed spacer. More recent research relies on shotgun sequencing, which sequences the whole DNA of a sample instead of a specific gene or genetic region. This allows a better and more fine-grained analysis of the strains composing a microbiome, even though it only provides resolution at the genetic level and not at the functional or activity levels, with the clear limitations that this imposes. This, in turn, poses a second significant challenge, for it is hard to rule out that the detection of microorganisms using any of these methods results from cross-contamination, which makes it necessary to use other methods that allow better detection.
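To make concrete what an amplicon survey of this kind does (and does not) deliver, the following minimal Python sketch turns raw per-taxon read counts into relative abundances, the typical "who is there" output of 16S-based studies. The taxon names and counts are invented for illustration; this is a toy sketch, not any specific published pipeline.

```python
# Minimal, hypothetical sketch of the compositional "who is there" summary
# that 16S rRNA amplicon counts support. Taxa and counts are invented.
from typing import Dict

def relative_abundances(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-taxon read counts into relative abundances.

    Note: this compositional summary says nothing about microbial
    activity or function, which is the gap behind the epistemic
    misalignment discussed in the text.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: count / total for taxon, count in read_counts.items()}

sample = {"Bacteroides": 620, "Faecalibacterium": 250, "Escherichia": 130}
abundances = relative_abundances(sample)  # e.g. Bacteroides -> 0.62
```

The point of the sketch is the limit it makes visible: the output is a marker-gene count profile, so any inference from it to function or activity already requires a further methodological step.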
Multi-omics approaches,[7] in contrast with the previous approaches, allow both genomic and functional analysis of the different microorganismal species in a given sample. For instance, metagenomics allows discovering the different bacterial species in a specific sample, that is, "who is there." A problem with this technology is that it may result in several false positives. In contrast, functional multi-omics approaches allow uncovering whether the microorganisms discovered in a specific sample play a certain role in the microbiome, hence substantially reducing the probability of false positives. In short, in contrast with metagenomic analysis, which roughly detects who is there (i.e., the genes or phylogenies of the microorganisms composing the microbiome), functional multi-omics approaches detect what is being done by whoever is there. Currently, several multi-omics approaches are used to study microbiome composition. Metatranscriptomics, for example, analyzes the mRNA of a sample to determine which genes are actually transcribed and, therefore, which genes play a role in the function of the microorganisms in a specific microbiome. Metabolomics goes a step further, directly detecting the metabolites produced by a given sample, thus reflecting the dependency of the microbial species on one another as well as on their environment, and providing information about the very physiological and ecological activities of the microorganisms. Metabolomics allows detecting microbial products such as short-chain fatty acids (acetate, butyrate, etc.), whose specific role in the microbial host can then be studied independently from the microorganism producing them. A final interesting and rather recent example is the so-called culturomics, a high-throughput analysis of the composition of microbial cultures. The main problem of culturomics is the impossibility of cultivating some bacterial species.
In addition to these molecular methods,[8,9] recent microbiome research has also relied on in vitro models, which basically seek to recapitulate the conditions of the host environment in order to study microbiome assembly and behavior. These include the Simulator of the Human Intestinal Microbial Ecosystem (SHIME), the Human Microbial Cross Talk, or the Rapid Assay of Individual Microbiome.[10] In contrast with sequencing technologies or multi-omics approaches, these in vitro models seek to simulate the conditions of a specific tissue or organ (mostly the gut) to understand microbiome assembly and behavior under "real" conditions. While these methods do not directly allow investigating microbiome composition (i.e., they rely on the other methods to determine phylogenetic composition and/or microbial function), they serve to test microbiome reactions to specific substances (e.g., drugs) or extra microbial components.
Direct interventions on the microbiome are also possible. For example, therapeutic interventions such as fecal microbial transplantation, currently extremely popular in microbiome research, are considered a method to conduct experimental research on the microbiome. As in the case of in vitro methodologies, these approaches do not directly provide information on microbiome composition. However, they enable investigation of how the microbiome changes due to changes in the specific ecological equilibria reached by its component microorganisms.
Finally, the advancement of computational methods is triggering the appearance of network analyses of the microbiome. These network analyses seek to understand the ecological principles allowing the generation of stable microbial communities that are able to synthesize specific components or produce certain microbial functions. An advantage of these network approaches is that they allow uncovering the general principles guiding microbiome assembly and, in doing so, allow the detection of biologically important clusters.[11,12] These methods can even rely on the use of machine learning.[7] The analysis of functional networks in turn allows the development of so-called minimal synthetic microbiomes, such as the microbial ecosystem therapeutics or the synthetic gut community.[13] The term "minimal" derives from the necessity of a specific number of species to make the community stable. An important problem with this approach, though, is that several bacterial species still remain uncharacterized, and even if minimal microbiomes give important insights into the host-microbiome relationship, their potential to allow direct interventions remains to be explored.
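As an illustration of the kind of co-occurrence reasoning behind such network analyses, the sketch below links two taxa whenever their abundance profiles across samples are strongly correlated. Everything here (taxon labels, abundance values, the 0.8 threshold) is an invented toy, not a published method; real pipelines must additionally correct for the compositional nature of abundance data, which this sketch does not do.

```python
# Toy co-occurrence network: edge between two taxa if their abundance
# profiles across samples correlate strongly. All numbers are invented.
from math import sqrt
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Each profile lists one taxon's abundance in four successive samples.
profiles = {
    "taxon_A": [0.50, 0.55, 0.60, 0.65],
    "taxon_B": [0.25, 0.27, 0.30, 0.33],  # tracks taxon_A closely
    "taxon_C": [0.25, 0.18, 0.10, 0.02],  # declines as taxon_A grows
}

# Keep only strong (positive or negative) associations as network edges.
edges = [
    (t1, t2, round(pearson(profiles[t1], profiles[t2]), 2))
    for t1, t2 in combinations(profiles, 2)
    if abs(pearson(profiles[t1], profiles[t2])) >= 0.8
]
```

A negative edge (as between taxon_A and taxon_C here) is what such analyses read as potential competition or exclusion; the text's caution applies directly, since correlation on marker-gene counts is a surrogate for ecological interaction, not a measurement of it.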

Scientific methodologies and scientific aims
From our perspective, what is most relevant in this section is to understand that each of these methods has been developed with different scientific aims in mind. That is, the scientists developing these technologies did so, ultimately, to answer a specific set of research questions.
For example, 16S rRNA was chosen as the most appropriate gene to determine phylogenetic relationships.[3,14] In fact, 16S rRNA was key to discovering the three domains of life, and its choice as the chronometer for bacterial species classification is grounded in properties of the molecule that are extremely important for that purpose. However, it soon became clear that 16S rRNA sequencing was not fine-grained enough, because different bacterial strains may share the same 16S rRNA profile. Thus, new methods were required to reach the strain level, which is extremely important for correct phylogenetic comparisons. In this spirit, technologies such as shotgun sequencing were introduced, as they allow more fine-grained analyses.
The case of multi-omics is important, since these technologies were originally developed due to the realization that knowing "who is there" (i.e., the phylogenetic composition of the microbiome) does not provide enough information to know what each microorganism is doing and why what it is doing is relevant.[15,16] In fact, the information captured by metabolomics is considered the best indicator of a host's health, being a reliable source for measuring the probability of dysbiosis. Additionally, some have even argued that metabolomics holds the key to the improvement of personalized medicine, as the metabolites in an organism directly relate to many of its potential diseases. In this case, then, the development of multi-omics was mainly triggered by the need for better tools to investigate the relationship between a host's health and its microbiome.
In vitro models, in contrast, were introduced as soon as it was perceived that the host environment has an important influence on the behavior of the microbes. Thus, testing drugs in an in vitro model was a good way of ensuring that the results would be trustworthy before patients were treated.
Finally, network models were introduced to account for the ecological interactions between the members of the microbiome, as well as to uncover potential biological clusters. While they are limited due to their reliance on bioinformatics, they can be extremely useful for predicting the usefulness of some treatments, as well as for building synthetic stable microbiome clusters.

SURROGATIVE REASONING AND EPISTEMIC MISALIGNMENT IN SCIENTIFIC RESEARCH
In the pursuit of their research, scientists do not merely reason in the abstract when they have to explain a phenomenon, account for its behavior, or "simply" think about it.
Instead, they often adopt particular tools. These tools may include certain models (whether embodying a mechanism, as is classically the case in molecular biology, or formally representing relationships within phenomena, as in systems biology or bioinformatics), specific experimental approaches, or peculiar instruments (such as those relying on the amplification of the 16S rRNA) that act as surrogates for reasoning about the object of investigation. In other words, such objects help scientists to reflect on a certain problem in a more circumscribed, controlled, and precise manner than simply thinking in the abstract. In this sense, therefore, one speaks of "surrogacy."[3] The choice of a surrogate is not only the subject of a deliberate/arbitrary decision on the part of the researcher, but is also shaped and conditioned by material constraints, especially in the case of specific technologies or experimental techniques. In this sense, not all the tools that scientists choose to use in lieu of abstract reasoning can serve this purpose. For example, genes that are subject to selection pressures are known not to be the best choice for phylogenetic hypotheses, while structural genes are much better for this purpose, yet not so useful for predicting the traits that an organism will express.[14] If the material conditions (such as specific instruments, experimental apparatuses, and protocols, but also the choice of model organisms, homologous genes, etc.) did not shape the choice and effectiveness of surrogates, every instance of surrogate reasoning would be equivalent and would always serve its purpose. If this were the case, not only would one lose the specificity of a particular instance in relation to another, but one would not even be able to understand how a specific resource contributes differently to providing a greater understanding of the phenomena at play. Indeed, for a certain technological device to fulfill the characteristics of surrogate reasoning, it must exhibit certain material features, such that the surrogate and the phenomenon investigated by means of it follow the same pattern of temporal change. This is especially the case in microbiome research, where "big questions" are answered by relying on highly heterogeneous surrogates (section 2). This is so in spite of the fact that various investigation programs in the field of microbiome research are exploratory in nature (see, e.g., [17,18] for the notion of exploratory research). Exploratory research usually presents greater epistemic freedom regarding experimental design and protocols. Yet it would nevertheless be simplistic to ascribe total methodological freedom to such projects with regard to theoretical frames of reference. As Waters has indeed shown, such scientific endeavors are multidimensional and combine more exploratory aspects with others falling within a precise theoretical framework. Therefore, the problem of possible epistemic misalignment applies even more in such a scenario (on this point, see also [19]).
Indeed, epistemological analysis has thoroughly investigated these dimensions, producing various concepts and analytical tools to account for them. Based on these reflections, we will argue that for an object or procedure to function as a surrogate, or to enter into surrogate reasoning adequately, it must be capable of answering those research questions that are congenial to the research field. When a tool is used to answer questions for which it was not designed or constructed, from the point of view of surrogate reasoning it is as if category errors were committed. In other words, it is as if the inferential process were compromised and "out of focus." It follows that the knowledge content offered by surrogate reasoning cannot fulfill the aims embedded in the experimental hypothesis. When this happens, one speaks of epistemic misalignment. This misalignment, as a matter of fact, makes effective and efficient surrogate reasoning practically impossible.

EXAMPLES OF EPISTEMIC MISALIGNMENT AND WHY IT AFFECTS SURROGATIVE REASONING
In this section, we briefly examine some aspects showing how an uncritical or poorly reasoned application of certain approaches or methods can be described as an epistemic misalignment. Moreover, we show how this situation impacts the conclusions of scientific practice.
Especially in medical microbiome research, the need to characterize the microbiome and its functionality appears increasingly crucial.
Given the influence of the microbiome (especially the gut microbiome) on human health,[20-22] researchers have focused their investigations on determining what in the microbiome contributes to a healthy condition of the organism and, conversely, when the microbiome contributes to the onset or development of pathological conditions.
For this reason, and because of the compositional nature of the microbiome, a considerable amount of research has focused on determining the structure of the human microbiome. The search for a "microbiome-molecular signature" of the healthy microbiome soon became extremely relevant, to the point of developing the two concepts of "eubiosis" and "dysbiosis" to represent healthy and deleterious conditions, respectively (a distinction that has also received a lot of criticism; more on this later).[23,24] This experimental and theoretical stance (i.e., understanding "who does what") has obviously stimulated and contributed to the production of specific methodological approaches (e.g., the study of the molecular pathways involved in host-microbiome interactions) and technological apparatuses (e.g., 16S rRNA amplification or the omics) to answer these specific questions concerning microbial composition.[25] Let us now briefly summarize the main problems of the mismatch between taxonomy and function.
• In the study of molecular mechanisms, which are, in fact, representations of causal networks, it is essential to know the actors in these architectures. Many mechanistic studies have indeed succeeded in cataloging a great variety of microbial species associated with humans. However, the functioning of the complex interactive structure of symbiont-host functionality does not appear to be individually derivable from this bulk of knowledge. This is also due to the fact that taxonomic characterization is based on the presence or absence of certain genes, but tells us little about their transcriptional context and, thus, their expression. In order to understand function, it is therefore necessary to consider not only the metagenomic aspect (e.g., "who is there") but also the metatranscriptomic, metaproteomic, and metabolomic aspects (e.g., "who is activated, what is produced, and how they interact with other gene products").[24] The discrepancy between metagenomic and other (more dynamic and interactional) omics profiling shows how the results of the two types of analysis are not immediately superimposable (but rather should be integrated). Furthermore, considering composition and function from an evolutionary perspective, and including the co-evolutionary aspects between the microbiome and the human species, it is possible that what is conserved is not a specific assemblage of populations and species but certain "interaction patterns" that are gradually modified and reconstructed.[26]
• The friction between taxonomic classification and functionality is particularly interesting when discussing, for example, the functional distinction between beneficial bacteria and pathogenic organisms. Some authors have already systematically explained how this distinction presents methodological difficulties when the nature of associated microorganisms is considered from an essentialist perspective.[27] In this respect, and concerning the issue of misalignment (from a methodological perspective), an interesting case in point is constituted by some studies on the role of Enterococcus faecalis in relation to certain oncological diseases. The difficulty arises because part of the literature considers E. faecalis as potentially baneful, while other studies suggest a positive role. In a recent review,[28] some researchers argued that this seemingly paradoxical effect could result from studying different strains of E. faecalis in isolation. E. faecalis (like perhaps many other microbial species) is in fact subject to frequent mutations in its different populations, both in response to environmental conditions and interaction with different systems of the host organism, and by horizontal gene transfer (HGT). Moreover, such contextual differences can easily cause its functionality to "fluctuate" (from beneficial to detrimental to the host organism) even if the concentration of E. faecalis in the population remains stable.
• The ecological nature of the microbiome, and the fact that this feature is central to future therapeutics. Indeed, another central aspect concerns the ecological nature of the microbial communities associated with the host organism. Their activity (both in terms of their own genetic functionality and that of the context in which they operate) is closely dependent on environmental and interactional dynamics that make experimental manipulation in a strictly mechanistic key much more difficult. This means that the study of possible therapeutic interventions (such as fecal transplantation, prebiotic and probiotic modulation, or the design of engineered bacterial strains) necessarily depends on the ability to study the capabilities/properties of such ecological communities.[31,32]
• The evolutionary nature of the host-microbiome relationship requires taking into account phenomena that are not always considered key markers of evolution. For example, understanding microbiome-host evolution requires knowing community genetics and understanding how inter-species epistatic effects may be a mark of joint selection.[4] Additionally, evolutionary effects on the microbiome include phenomena such as horizontal gene transfer, which has been documented as enhanced in the context of a microbiome, as well as a signature of joint host-microbiome evolution.[16,33] When these phenomena occur, phylogenetic methods become insufficient to understand how the host-microbiome relationship will work, as phylogenetic markers are usually not involved in HGT or epistasis.
• The friction between function and taxonomy is not just a theoretical controversy. A recent perspective paper on sequencing techniques and other computational methods sums up very well how answering certain questions of a functional nature by mechanically using methods designed to answer other types of questions is anything but undisputed. As these researchers write: "Overall, it is intuitive and generally thought that function is much more informative than taxonomic information since it is what the organisms do that we care about and not who they are. Indeed, it has been noted by several groups that function seems more highly conserved across samples than across taxa, suggesting that function is more resilient across communities than the individual strains that come and go. However, comparisons of taxa and function conservation that were more technical and philosophical in nature have suggested that these comparisons are not meaningful due to their being based on completely different scales. For example, are metabolic pathways equivalent to taxonomic phyla, genera, or species? The problem is that, although taxa and function are linked, it is impossible to access them on similar scales. As expected, when using comprehensive gene families instead of broadly conserved functional pathways, functional conservation disappears. Therefore, describing functions as being more conserved than taxa is an artifact of the methods and databases used in the comparison rather than an actual biological statement. Nonetheless, function provides information on possible mechanisms present between microbes and in microbe-host interactions. These interactions are essential for understanding and modeling microbial communities, especially with respect to the various microbiome-related diseases."[34] Last, but not least, such a change in perspective therefore also requires updating, modifying, and possibly integrating research methodologies. This clearly shows how new types of questions can only be researched after new research tools have been put in place.
Regarding this point (see also [24,35]), it is also crucial to remember, from both a theoretical and a practical point of view, that the use of new methodologies and the integration of different approaches is never a straightforward operation. Large-scale computational studies certainly offer the possibility of mapping many different elements within a common framework. However, the correlations (even strong ones) that these approaches can produce are based on specific assumptions that always and primarily concern the phylogenetic dimension. Moreover, such studies are not immune to potential methodological errors that also reveal a "misalignment" between the technological tools deployed and the research questions of interest. For example, a recent study showed how the precise association of certain species or genera of microorganisms with certain types of cancer revealed some crucial problems.[36] Firstly, the genome databases used reported sequences of bacterial species never associated with the human species, but rather with extreme environments or other species. Perhaps more attention to the context of the biological assumptions should have raised suspicions about this. Secondly, false positives in the bacterial reads on the one hand, and the amplification by machine learning models of errors in the data processing on the other, led to the creation of false categories that clearly associated certain types of tumors with a precise "microbial signature." Without wishing to go into too much detail, this case also shows the need to think in more detail, already at the experimental design stage, about what questions one wants to answer and why certain approaches or experimental tools would be the most suitable and adequate ones to do so.

REDUCING THE RISK OF EPISTEMIC MISALIGNMENT: POTENTIAL AVENUES
The presence of biases and epistemic errors in research settings is a relatively common phenomenon, both ubiquitous and not completely eradicable, since some of them derive from theoretical and cultural biases that are not simply errors in reasoning.[37] This does not entail that "anything goes" (pace Feyerabend), or that methodological rigor and careful discussion of the experimental design should not be a primary concern of a scientific researcher. As a matter of fact, reflection on these very biases may allow researchers to overcome their negative epistemic effects.
Particularly, in scientific studies (even in so-called data-driven research), the choice and discussion of initial hypotheses, models, and methodological approaches remains a crucial element of the research.
This includes discussions of questions concerning the nature of the evidence, the methods used, and the degree of generalizability of the results (and thus genuinely epistemological questions, see [38]), such as "what is evidence?", "what is good evidence?", "what counts as evidence?", "why is this the adequate method to address this issue?", and, finally, "how far can we extend these results to similar scenarios?"
In this regard, as recently discussed in relation to the misuse and distorted use of statistical methods in scientific research,[39] including microbiome research, we believe that a first way for a researcher to diminish the risk of epistemic misalignment is to ask critically whether the technologies and apparatuses they intend to use are really useful for answering the questions they are interested in, or whether these technologies are the most appropriate for capturing their working hypotheses (see also [40]). This entails that researchers must ask questions in addition to the ones they already ask when deciding to use a specific methodology to answer a research question. For example, if their inquiry is about ecological relationships, they may wonder whether the use of network models is a good method by itself, or whether they need to complement it with something else. Or, if the research concerns uncovering the effects of some microorganisms on host health, it may be useful to ask whether 16S rRNA is a good surrogate for this question, or whether it may be better to rely on metabolomics. Scientists must thus ask: what information exactly does this methodology provide? What is it a surrogate for? Does it capture the rate of change that I am trying to measure? What is the probability of cross-contamination? Another important potential avenue consists in asking about the possibility of integrating different methods to answer the research questions. A well-known position in philosophy called integrative pluralism suggests that the sciences of complexity, including biology, require explanations and research combining methods at different scales.
Applied to the case of microbiome research, the idea of integrative pluralism[41] is that molecular data need to be made coherent with ecological, physiological, and metabolic data to obtain an empirically adequate scientific image of what is going on in the microbiome and how the microbiome affects its hosts. While integration is not easy to achieve, it is still a good way of avoiding undesirable misunderstandings, as well as mistakes that may arise from an exclusive reliance on a single method or a single scale. Cross-talk between scales can definitely diminish the risk of epistemic misalignments, as different scales will show opposing results when such misalignments occur.
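As a toy illustration of how such cross-scale comparison can expose a misalignment, the snippet below contrasts a hypothetical DNA-based abundance profile with a hypothetical transcript-activity profile and flags taxa where the two scales diverge sharply. All taxon names, numbers, and the 3x threshold are invented for illustration; this is a sketch of the cross-checking idea, not a real analysis method.

```python
# Hypothetical sketch: flagging taxa whose DNA-level abundance and
# RNA-level activity disagree. All values and the 3x ratio are invented.

def discordant(dna, rna, ratio=3.0):
    """Return taxa whose presence and activity signals diverge by more than `ratio`."""
    flagged = []
    for taxon in dna:
        hi = max(dna[taxon], rna[taxon])
        lo = min(dna[taxon], rna[taxon])
        if lo > 0 and hi / lo > ratio:
            flagged.append(taxon)
    return flagged

# Bacteroides is abundant in the 16S-style profile but transcriptionally
# quiet: relying on presence data alone would overstate its functional role.
dna_abundance = {"Bacteroides": 0.60, "Escherichia": 0.20}
rna_activity = {"Bacteroides": 0.10, "Escherichia": 0.25}

flags = discordant(dna_abundance, rna_activity)  # ["Bacteroides"]
```

The sketch makes the integrative-pluralist point operational: a taxon flagged here is exactly a case where answering a functional question from the compositional scale alone would be misaligned.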
In this vein, we contend that the epistemic and non-epistemic biases that may arise in microbiome research due to the presence of epistemic misalignments can be overcome. One way of doing so consists in asking more sophisticated questions and reflecting more deeply on the real potential of the methodology being used. Another consists in trying to integrate the information derived from the use of different methodologies across different scales.

CONCLUSION
Overall, our paper shows why microbiome scientists must, in the interest of the reliability and soundness of their outcomes, take into account the types of inferences that are allowed by the technological tools they employ before using the datasets produced by them to answer a research question. This is particularly important for two reasons. On the one hand, failing to do so may generate answers that are ultimately misaligned with the questions being asked. This may lead to scientific failure, for even if the answer happens to be correct, it would be so for the wrong reasons (meaning also that the conclusion is not adequately justified). While it is obviously possible and legitimate that science produces wrong answers, since this is inherent to the nature of scientific research and methodology, it is not permissible that it does so for the wrong reasons. On the other hand, because of this first point, the field as a whole may lose respectability, and thus the extremely valuable insights that it may produce would be ignored due to the epistemic problems related to the (mis)use of the different methodologies.
To avoid any of these consequences, we advise microbiome researchers to be more explicit about the reasons why a given technology has been selected or used to answer a concrete question.
Particularly, scientists should reflect on what exactly the datasets generated by a technology ultimately track before reusing them to answer microbiome-related questions. In this way, the perils of epistemic misalignment that we have described would become minimal, and microbiome science would become a more reflective field. Overall, then, microbiome research would become more solid and would gain the relevance that it really deserves.