An empirical exploration of the subjectivity problem of information qualities

Information qualities such as usefulness, accuracy, and comprehensiveness are to some extent subjective. Information resources have different meanings to different people and at different moments. This apparent subjectivity hinders indexing based on qualities for retrieval and filtering purposes. We conceptualize this as the subjectivity problem and address it through two studies. Study One explores whether, on public fora, people consider qualities as claims they should agree upon. Study Two explores, through a vignette study, which conditions foster this inter-subjective validity of quality claims. We conclude that information qualities become agreeable given the right set of conditions. We discuss the need for transparency about information qualities and quality considerations in order to offer these conditions to end users.


| INTRODUCTION
The quality of information is seemingly hard to agree upon. Well documented are cases of echo chambers where users hold distinctive views on information quality, as well as cases of misinformation where users seem to disagree about the veracity of disseminated knowledge (Mößner & Kitcher, 2017; Nguyen, 2020). Within academia, unwarranted rejections of high-impact works are likewise not uncommon (Campanario, 1996). Such disagreements can both restrict the distribution of important findings and facilitate the distribution and uncritical acceptance of information of questionable quality. The societal implications of these disputes are similarly well documented. Problems associated with information quality have been identified as sources of political polarization, vaccine hesitancy, and the proliferation of conspiracy theories (Kahan, 2015; Kidd & Birhane, 2023). Furthermore, a perceived lack of consensus stands as a core factor hindering progress on several critical issues, including climate change, responsible consumption, and public health (Oreskes & Conway, 2011). Intriguingly, the extent of dispersion and disagreement regarding information quality may even play a role in solidifying political regimes (Helland et al., 2021). This makes the agreeableness of information quality and qualities a salient topic with profound societal relevance.
Information qualities likewise play a pivotal role in the operation of information retrieval and recommender systems. These systems tend to group items together because they share certain qualities, making them relevant to similar user requests and similar users. For information retrieval, this idea is chiefly captured by Van Rijsbergen's (1979) clustering hypothesis: "closely associated documents tend to be relevant to the same requests" (p. 30), which asserts that similar items behave similarly with respect to a query or information need, presumably because these items share certain qualities. For recommender systems, this idea is illustrated by item-to-item collaborative filtering, where items are matched because they engage the same users, presumably because they share certain engaging qualities. Conversely, in user-to-user collaborative filtering, users are matched because they presumably share preferences or criteria for resources. The extent to which qualities are agreed upon is empirically assessed through these operations, utilizing implicit feedback to measure whether users react similarly to an item. Such operations suggest that retrieval and recommendation rely at least in part on the existence of certain qualities that are perceived as such by users and shared by items.
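The item-to-item matching described above can be illustrated with a minimal sketch. The interaction matrix and function below are hypothetical toy examples, not the mechanism of any particular system: items whose columns of user engagements point in similar directions receive a high cosine similarity, which is the core intuition behind item-to-item collaborative filtering.

```python
import math

# Hypothetical binary user-item interaction matrix
# (rows: users, columns: items; 1 = the user engaged with the item).
interactions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]

def item_cosine(m, i, j):
    """Cosine similarity between the engagement columns of items i and j."""
    col_i = [row[i] for row in m]
    col_j = [row[j] for row in m]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm = (math.sqrt(sum(a * a for a in col_i))
            * math.sqrt(sum(b * b for b in col_j)))
    return dot / norm if norm else 0.0

# Items 0 and 1 are engaged by largely the same users, so they score high;
# items 0 and 3 share no users, so they score zero.
print(round(item_cosine(interactions, 0, 1), 2))  # → 0.82
print(round(item_cosine(interactions, 0, 3), 2))  # → 0.0
```

Under this reading, a high similarity between two items is implicit evidence that users responded to them in the same way, presumably because the items share engaging qualities.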
The importance of information qualities, however, varies amongst users and across requests. This variability is evident in the diverse criteria users employ during their search processes (Barry & Schamber, 1998; Saracevic, 2007; Schamber, 1994). These criteria cover expectations about the qualities an information resource ought to have; for instance, that a resource be understandable, interesting, useful, and culturally appropriate.
Seen from this perspective, information qualities are an instantiation of relevance criteria. Users might hold a range of criteria for the information resources they seek, whereas the qualities of a resource determine whether it aligns with these criteria (Van der Sluis, 2022). This perspective implies that information quality is highly subjective and contingent on the specific criteria users apply at any given moment. The notion of information quality, however, emphasizes that information resources must not only meet the criteria established by individual end users but also conform to the standards within a specific domain (Hjørland, 2011; Mai, 2013b). Recognizing the reciprocity between criteria and qualities, it is well acknowledged that qualities are subjective in nature (Borlund, 2003; Cosijn & Ingwersen, 2000; Ingwersen & Järvelin, 2005; Mizzaro, 1998; Saracevic, 2007) yet simultaneously dependent on a complex relation between authors, readers, inter-textual knowledge, and cultural norms and values (Mai, 2013b).
Both in contemporary conceptualizations and implementations, qualities are considered highly subjective and contextualized, bounded by certain users or certain requests (Mai, 2013b; Saracevic, 2007; Van Rijsbergen, 1979). This makes it unlikely that users perceive the quality and qualities of resources in the same way. This apparent subjectivity of information qualities complicates the possibility of encoding qualities as attributes of information resources; they "remain difficult to operationalize especially those that reflect more subjective assessments of information" (Pennington & Ruthven, 2018, p. 141). In fact, even seemingly more objective attributes like the topicality or aboutness of documents give rise to vastly different interpretations (Rafferty, 2018). We refer to this as the subjectivity problem: the gap between those qualities that users and communities seek in their information resources and what ostensibly can be encoded as attributes for retrieval and recommendation purposes.
Empirical studies illustrate the apparent subjectivity of information qualities. Qualities are often studied as subjective phenomena (Ghasemaghaei & Hassanein, 2016), and studies generally report low levels of agreement between raters when judging qualities. Amongst others, Arazy and Kopak (2011) found an interrater agreement of 6%-16% (intra-class agreement, analogous to Fleiss' kappa) for 3-6 raters on four qualities, with some differences between qualities; Shah and Pomerantz (2010) found that 5 raters agree in approximately 27% of cases when asked to rate 13 qualities; Kakol et al. (2013) found an overall kappa of 20% for an average of 28 credibility judgments per document, but also noted an overall higher level of alignment between expert raters; and Mao et al. (2016) found a Cohen's kappa of 32% between user and annotator assessments of usefulness. These findings suggest that users overall might not agree on assigning qualities to resources, but also point at factors, like the expertise of assessors and the context of the assessment, that might heighten their potential agreement.
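The kappa statistics cited above correct raw agreement for the agreement expected by chance, which is why they can be low even when raters often coincide. As a minimal sketch, Cohen's kappa for two raters can be computed as follows; the rater labels are invented for illustration only.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies.
    """
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical usefulness judgments by two raters over six resources.
r1 = ["useful", "useful", "useless", "useful", "useless", "useful"]
r2 = ["useful", "useless", "useless", "useful", "useful", "useful"]
print(round(cohens_kappa(r1, r2), 2))  # → 0.25
```

Here the raters agree on 4 of 6 items (67% raw agreement), yet kappa is only 0.25 because both raters use "useful" frequently, making chance agreement high. This gap between raw and chance-corrected agreement is worth keeping in mind when comparing the percentages reported across studies.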
Little empirical work has investigated why information qualities are seemingly difficult to agree upon. We consider two possible sources of difficulty explaining the seemingly marginal levels of agreement over qualities. We evaluate: (a) to what extent people consider quality judgments as claims that have inter-subjective validity, that is, to be valid not only for the one who judges but also for others; and (b) to what extent different qualities and different conditions heighten or lower inter-subjective validity. Essentially, this investigation asks whether quality judgments are considered the product of our individual and subjective experience or whether they claim to be true for other people (that is, whether they have inter-subjective validity), and under which conditions that might be the case.
We divide this investigation over two studies. First, in Study One (Section 2) we present a qualitative study of quality discussions on Internet fora. We explain how people support their quality claims, using practice theory as a novel theoretical framework for interpreting quality discussions. Second, in Study Two (Section 3) we present a quantitative study of the perceived inter-subjective validity of quality claims. We explore how to increase inter-subjective validity using vignettes as a novel method to scale a range of conditions in a structured way and quantify their contribution to perceived inter-subjectivity. This mixed-method approach gives us rich, detailed observations together with structured, quantitative data to draw a comprehensive picture of the inter-subjective validity of quality claims. Finally, in Section 4 we argue for the importance of transparency on quality claims and the conditions in which they are made in order to align information resources and users alike.

| STUDY ONE
Study One provides an initial exploration of the inter-subjectivity of quality claims and the supporting context by examining discussions on information quality in public fora. Such discussions indicate that interlocutors assume that their claims apply to others. The arguments presented in these discussions furthermore shed light on the extent of their validity, indicating whether they are confined to the participating interlocutors or are supported by more widely accepted, contextual references.
Previous studies have shown that qualities are commonly discussed online (Savolainen, 2011, 2023; Stvilia et al., 2008). A well-known example is Wikipedia's talk pages, where an article's contributors argue in a collaborative setting about a pre-specified set of qualities that Wikipedia articles are expected to have (Stvilia & Gasser, 2008; Xiao & Askin, 2014). Several studies furthermore showed that qualities and quality issues are also raised on public, online discussion fora. Notably, Savolainen (2011, 2023) focused on online debates (e.g., about racism) and controversial topics (e.g., COVID-19 vaccines) that provoke arguments about the correctness of information and the believability of claims presented by participants. Explicit judgments of quality were common on these fora, with 971 posts containing 1479 judgments about either information quality or author credibility (Savolainen, 2011). Different qualities were referred to for different topics, which suggests each respective community sets its own quality standards. Online debates (Savolainen, 2011) and discussions on controversial topics (Savolainen, 2023), however, showed a strong tendency to target the arguer rather than the argument, discrediting the author rather than attempting to create a mutual understanding of quality. This raises doubts on the extent to which interlocutors assume the inter-subjective validity of quality claims beyond collaborative settings such as Wikipedia.
In contrast to examining collaborative settings, online debates, and controversial topics, Study One takes a different approach by analyzing discussions about information quality on more everyday topics. Specifically, we focus on topics that exemplify a practice, drawing from MacIntyre's theory of practice and virtue. According to MacIntyre (2007), a practice refers to a coherent and complex form of socially established cooperative human activity that realizes internal goods associated with that activity. In other words, it extends beyond individual tasks and encompasses a broader collective system of virtues and standards of excellence. For instance, while "throwing a football with skill" or "planting tulips" are not considered practices, "football" and "gardening" are recognized as such (MacIntyre, 2007, p. 187). Participants within a practice strive to uphold the adopted standards of excellence and derive intrinsic, internal goods from their engagement. Given the collective character of practices, practice theory likely aligns well with the open nature of public online fora. By adopting a practice theory perspective (cf. Mai, 2013a), we provide a novel framework for comprehending and interpreting quality discussions on public, online fora.
We expect quality claims to be particularly prevalent and well-defined within practices. Within a practice, it can be objectively right or wrong to call something good. What is good becomes a matter of fact given a specific purpose, tradition, or function (MacIntyre, 2007). Provided that practices are socially established, formal, and objective (Donozo, 2014), anyone discussing a practice must likewise acknowledge the rules and criteria associated with it. As a consequence, quality discussions may be settled without reference to subjective feelings or an all-knowing authority. They can rather be settled with reference to the standards of excellence shared by its collective. This suggests that, within a practice, quality judgments should be regarded as claims, not feelings, that can be defended and debated and are applicable to other people: a practice plausibly offers the context needed for inter-subjective validity.
Study One surveys quality discussions on four online fora: cooking, fashion, football, and politics. Each of these topics can be thought of as reflecting a practice in the MacIntyrean sense. They are coherent, in the sense of a meaningful whole that is directed toward a common goal: respectively, to have a good meal, to look good, to have a good match, or to achieve good governance. They are cooperative, where participants are helped and guided by others, following a particular set of rules of conduct. They are complex, arising from an interaction between the present and a rich history of concepts, theories, and events passed on through traditions. And, finally, they are socially established, long-lasting, and continuous activities performed by groups of people. Given the alignment of these fora with these different notions of what a practice entails, we expect quality assessments to be prevalent on these fora, similar to their prevalence in comparable studies (Savolainen, 2011, 2023; Stvilia et al., 2008).
Study One serves to characterize information quality considerations on the chosen fora. The goal is to analyze discussions arising about qualities and to interpret them using practice theory. The defense of quality claims, and especially the use of references to constitutive elements of their respective practices, would support the inter-subjective validity of these claims as well as offer context to their inter-subjectivity. Furthermore, we will evaluate the prevalence of certain qualities over others to inform which qualities to consider in Study Two. We will approach these goals in an exploratory manner. While a sizeable sample is described, the objective is not to provide a comprehensive "quantitative picture of the nature of the judgement of information quality and credibility" (Savolainen, 2011, p. 1249) or an exhaustive and generalizable picture of information quality discussions within a given practice. Rather, we choose a sample from each practice based on the recency and popularity of discussion threads for an initial exploration of how qualities are discussed in their respective practice.

| Methodology
The methodology for this study involved the analysis of internet fora, focusing on judgments of information quality and author credibility. The following steps were applied: (a) 16 large discussion threads were selected from four public online fora; (b) between 500 and 800 posts were systematically sampled per thread, until a dominant quality appeared with at least 15 mentions; (c) posts that explicitly referred to the quality of another post or the credibility of its authors were selected for analysis; and (d) Savolainen's (2011) coding scheme was applied to classify these remarks by quality and valence (positive or negative), using 13 coding labels for information qualities and 13 for author credibility.
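Step (b) describes sampling with a stopping rule: keep sampling posts at a fixed interval until one quality dominates with at least 15 mentions. The sketch below illustrates one way such a rule could operate; the thread structure, step size, and function names are hypothetical, not the authors' actual sampling procedure.

```python
import random
from collections import Counter

def sample_until_dominant(posts, step=3, min_mentions=15, cap=800):
    """Walk a thread at a fixed interval, tallying quality mentions,
    until one quality reaches min_mentions or the post cap is hit."""
    counts = Counter()
    sampled = []
    for idx in range(0, len(posts), step):
        sampled.append(posts[idx])
        for quality in posts[idx]["qualities"]:
            counts[quality] += 1
        top = counts.most_common(1)
        if (top and top[0][1] >= min_mentions) or len(sampled) >= cap:
            break
    return sampled, counts

# Hypothetical thread: each post mentions zero, one, or two qualities.
random.seed(7)
labels = ["correctness", "usefulness", "comprehensiveness"]
thread = [{"qualities": random.sample(labels, k=random.randint(0, 2))}
          for _ in range(2000)]
sampled, counts = sample_until_dominant(thread)
```

The fixed interval keeps the sample systematic rather than clustered at the top of the thread, while the stopping rule bounds annotation effort once a dominant quality has clearly emerged.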
The data used for this analysis were collected from the American discussion platform Reddit, one of the largest and most popular online fora in the English-speaking world. A detailed corpus selection process was employed to ensure the selection of relevant and popular discussion threads. The study focused on posts from four different subfora (politics, football, fashion, and cooking) sampled across four timeframes (April-June 2022, July-September 2022, October-December 2022, January-March 2023). Preliminary analyses from the sample of April-June 2022 were reported in Van der Sluis et al. (2023). Coding was conducted through an iterative process with code verification overseen by a reviewer. This approach maintained the accuracy and consistency of assigned codes. In total, 21 coding labels were encountered, allowing for the identification of dominant qualities within each discussion board.
A total of 676 explicit quality or credibility judgments were observed in 480 posts. The resulting selection covered 159 judgments for cooking, 161 for fashion, 186 for football, and 170 for politics. This extensive selection allowed for the analysis of quality claims and their argumentation. A detailed description of the full methodology, including the coding scheme and corpus details, is provided in Supplemental Material Section S1.

| Results and interpretation
Figure 1 displays the occurrence counts of codes for both positively and negatively valenced posts. In total, 496 (73.4%) quality judgments and 180 (26.6%) author credibility judgments were observed, with 263 (38.9%) showing positive valence and 413 (61.1%) negative valence. Amongst these, the topics of football and politics demonstrated increased negativity rates (79.03% and 74.71%, respectively), but retained an overall focus on information quality rather than author credibility (25.27% and 24.12% credibility judgments, respectively). The most popular qualities overall were correctness-falseness (281 total occurrences), comprehensiveness-narrowness (139 total occurrences), similarity-dissimilarity (75 total occurrences), author expertise/lack of expertise (56 occurrences), and usefulness-uselessness (49 occurrences). We will provide detailed descriptions of these qualities; however, we will exclude the quality of similarity-dissimilarity as it pertains to utterances of (dis)approval without further argumentation. Any instances of argumentation will be categorized under other relevant qualities.
In the following, we will highlight quotes to illustrate the quality judgments as well as their relation to the practice in which they are embedded. Quotes were selected to be representative of the topics, with a preference for shorter posts to ensure a condensed format of presentation.

| Correctness/falseness
The correctness/falseness of information covers "The extent to which information provides a true description of reality" (Savolainen, 2011, p. 1248). With 78 judgments in fashion and 77 in football, against 64 in cooking and 62 in politics, this quality attribute was prevalent in all practices surveyed. Quality judgments on this attribute were of mixed valence, with 154 positively valenced and 127 negatively valenced. Comments in fashion and cooking were overall positively oriented (95 positive against 47 negative), while similar comments in football and politics were overall negatively oriented (80 negative against 59 positive).
Comments on correctness-falseness relied on various sources of knowledge. Participants primarily raised personal observations to underscore or refute a previously shared description of reality:

"Omg, I could have written this comment word for word. I can't explain how it happens so often." (fashion, oct-dec 2022)

"I don't know what you're talking about, I've made red onion soup before and it came out fantastically" (cooking, jan-mar 2023)

"I've personally witnessed 2 coworkers lose a non-compete. I work in the healthcare industry and no we are not doctors. Both lost and owes $50,00" (politics, jan-mar 2023)

correctness/falseness: personal observations

The first comment confirms a similar experience with a type of clothing, the second a different experience with an ingredient, while the third shared a different observation on non-compete clauses. In doing so, participants relied on firsthand knowledge gained from active participation in the practice to contribute to a collective understanding and evaluation of a depiction of reality.
Rather than personal observations, common knowledge of a practice was often highlighted in discussions on football and politics:

"this is incorrect: the ref is and always was able to overrule the linesman" (football, apr-jun 2022)

"And it's not true. Very few things can be passed in the senate with 50 votes." (politics, apr-jun 2022)

"(…) A labour market without checks and balances is not healthy for neither employers nor employees." (politics, jan-mar 2023)

correctness/falseness: common knowledge

The first two comments appealed to knowledge about specific rules and procedures. The third comment emphasized a commonly accepted narrative in governance. These comments appealed to knowledge that others should possess in order to make valid contributions to the ongoing discourse.
FIGURE 1 Occurrence of codes as counted in 16 discussion threads covering four topics. Counts are multiplied by their valence, either positive (1) or negative (−1), and coding labels are categorized as either credibility (C) or quality (Q). Coding labels are divided into groups based on their total sums, separating those with lower totals from those with higher totals.
Both sets of quotes demonstrate how participants in these discussions draw upon personal or common knowledge to assess the accuracy of statements. This knowledge is derived and validated from their engagement with the practice, either through firsthand participation or by acquiring a comprehensive understanding of it. By highlighting their active involvement and knowledge, participants not only contribute to a shared understanding but also assert their own authority in evaluating and challenging claims.

| Comprehensiveness/narrowness
The comprehensiveness/narrowness of information covers "the extent to which information covers a broad range of facts and opinions" (Savolainen, 2011, p. 1248). Quality judgments on this coding label were overwhelmingly negative (135 negative against 4 positive judgments) and particularly prevalent in the forum threads on politics (53 judgments) and football (39 judgments), while less common in fashion (30 judgments) and cooking (17 judgments).
Within football, and to a lesser extent within other topics, many commenters emphasized the incompleteness of other messages:

"We're not playing shit tho, we're playing good. The finishing is shit but look at the stats, we dominated. Tempo fell off after the 4th." (football, apr-jun 2022)

"First male, Marta of Brazil has already done it on the womens side" (football, oct-dec 2022)

"Not necessarily the older ones, just the more educated ones." (politics, apr-jun 2022)

comprehensiveness/narrowness: incompleteness

The first quote, discussing a recently lost match, refers to other standards of excellence in football besides winning. The second quote contributes to a discussion about the primacy of a recent achievement and puts it into the perspective of other, earlier events. The third comment emphasizes that an earlier post incompletely specified a group of beneficiaries.
Within the other topics, quality discussions mainly supplied extra arguments in favor of or against a previous argument:

"Rich people are not the only people who rely on the stock market. Its the only way regular people will retire." (politics, apr-jun 2022)

"Also your ability to taste bitter diminishes with age so even if the Brussel sprouts were the same, you would perceive them differently." (cooking, oct-dec 2022)

"And white sport socks with black leather slip on shoes." (fashion, jul-sep 2022)

comprehensiveness/narrowness: additional arguments

Here, the first comment offers an additional group that benefits from stock market participation, the second comment adds a reason for changing food preferences, and the third adds to a list of bad fashion choices.
The illustrated comments about comprehensiveness each referenced standards of excellence, past events, or narratives characteristic of their practice, such as past matches in football, fashion standards, or known discourses in politics. The comments overall did not explicitly show any form of agreement or disagreement. They rather contributed relevant knowledge omitted in the original message in a constructive way.

| Usefulness/uselessness
The usefulness/uselessness of information covers "the extent to which information is considered as helpful to meet the need of a person or a group" (Savolainen, 2011, p. 1248). Judgments on this attribute were of mixed valence (25 negative and 24 positive judgments) and more prevalent during discussions about cooking, fashion, and football (15 occurrences each) than politics (6 occurrences).
Within fashion and cooking, commenters mostly expressed gratefulness for earlier recommendations:

"This is very helpful and gives me a good idea of what to look for, thank you!" (fashion, jan-mar 2023)

"Thank you for all the suggestions! I do currently wear mens shirts, which work great! I will also check out the plain t-shirt sections at hobby stores (hobby lobby/Michaels) next time! Ya'll rock! :)" (fashion, apr-jun 2022)

"I just tried this. Thank you for the recommendation! It was a hit with everybody" (cooking, apr-jun 2022)

usefulness/uselessness: helpfulness

The first two comments shared gratitude for a shopping or fashion suggestion, the third for a recipe. All referred to how it benefited their (planned) activity and helped achieve their goal(s).
Across topics, commenters criticized the relevance of others' comments:

"Who cares. They still on top. One loss means nothing. Learn from it and move on." (football, apr-jun 2022)

"That isn't relevant to the discussion of whether they look good outside the show which is what this post is about" (fashion, jul-sep)

"I'm sorry if you can't understand what that's like, but belittling struggling citizens who are disappointed that they were lied to by the current admin is not an effective strategy to get out the vote" (politics, apr-jun 2022)

usefulness/uselessness: irrelevance

The first comment contended that the primary objective in football is winning the championship, the second emphasized the goal of looking fashionable, and the third highlighted the overall goal of motivating voting. In doing so, these comments delineate the discussions by referring to the goal(s) of the practice.
These comments either formulated usefulness in terms of (ir)relevance to the ongoing discussion or helpfulness to an activity. In both cases, usefulness was justified by explicitly referring to the goals or purpose common to the practice. They framed the usefulness of information in terms of enabling participants to achieve a certain level of excellence while performing their respective activities.

| Expertise/lack of expertise
The expertise/lack of expertise of authors covers "the extent to which the author is considered as competent in a specific area" (Savolainen, 2011, p. 1248). Judgments on this attribute were mostly negative (51 against 5 positive judgments) and were found in discussions about football (25 occurrences), cooking and politics (both 14 occurrences), and to a lesser extent fashion (3 occurrences).
Most comments on expertise were not supported by arguments. In a few cases, arguments were raised:

"It's funny how people memed Xavi at first, turns out he knows more about football than our FM fans haha." (football, apr-jun 2022)

"It's clear you've never officiated a game before in your life, and if you have, you didn't pay attention in any of the classes." (football, apr-jun 2022)

"I assume that you either have really bad knives or really bad knife skills because that makes no sense." (cooking, apr-jun 2022)

expertise/lack of expertise

The first comment points at the expertise of a professional football manager ("Xavi") whilst criticizing the expertise of forum participants as deriving from playing a video game ("Football Manager"). The second comment calls into question the expertise of a participant by questioning their active participation in football. The third post criticized someone's ability to question the usefulness of a particular tool to the practice.
When supported by arguments, judgments of expertise refer to particular things a person should have experienced, participated in, or known before being justified to make their respective statement. These experiences are practice-specific, as they indicate that participants have committed themselves to the demands and rules that are particularly required in the practice. In turn, these experiences offer a normative rightness to help and guide others on these topics (Donozo, 2014).

| Discussion
Study One analyzed 16 discussion threads, each embedded in one of four different practices, on their judgments of 21 information qualities. The observations were subsequently analyzed and interpreted according to practice theory. The findings indicate that, with a range of 20-46 judgments per thread, quality and credibility discussions were common amongst all threads. Differences between topics were found in the attributes judged and the valence of judgments (see Figure 1). Overall, with 73.4% quality judgments and 26.6% credibility judgments, participants were more likely to discuss quality claims than to (dis)credit authors. And, with 38.9% positive and 61.1% negative judgments, participants also somewhat regularly supported others' statements. These findings extended to the potentially more sensitive topics of football and politics. They contrast with discussions specifically selected for debates, where author credibility and negative judgments dominated (72.8% and 93.3%, respectively, in Savolainen, 2011). Instead, for discussions set within a practice, quality judgments prevail over author credibility judgments and are overall more positive.
Practice theory offered a particularly suitable conceptual framework to analyze and interpret the observations. Even though the surveyed threads discussed vastly different topics and were embedded in different practices, the analyses through practice theory revealed commonalities between them. Practice-specific experiences provided credentials to confirm or challenge accuracy, while practice-specific traditions, identifiable events, concepts, and narratives collectively defined the boundaries of a comprehensive (or narrow) discourse. Additionally, practice-specific standards of excellence informed judgments about the usefulness (or uselessness) of information, tools, and actions within each practice. These and similar commonalities highlight the value of practice theory. Practice theory explains how qualities and quality discussions are settled with reference to the standards and traditions established within a practice.
The findings indicate variations in the prevalence of certain qualities across different practices and topics (see Figure 1). Notably, discussions about fashion rarely involved judgments of expertise. A possible explanation is that expertise in fashion primarily stems from personal experiences with specific clothing styles, rather than extensive knowledge of the practice. Additionally, the assessment of usefulness was not frequently observed in political discussions. Given that the analyzed forum threads focused on the actions and accomplishments of established practitioners, it is likely that participants were more observers than active practitioners in the political practice. This suggests that active engagement in the practice is necessary for making comments about usefulness, which these discussion threads did not demand. Such disparities between topics underscore the significance of specific yet currently mostly uncharted preconditions necessary for judging certain qualities.
The findings overall support the proposition that qualities have inter-subjective validity. In particular, the many discussions about (a lack of) quality show that people can and do argue about quality, instead of considering qualities as subjective judgments or discrediting the arguer. This suggests that, at least within a practice, quality judgments can be regarded as claims, not feelings, that can be defended and debated and are applicable to other people. This also applies to the quality of usefulness, which is generally considered of a more subjective nature (Vakkari, 2020). These findings offer an initial reinterpretation of the low inter-rater agreement values found for qualities (Arazy & Kopak, 2011; Kakol et al., 2013; Mao et al., 2016). Such disagreements are, in our observations, met by arguments and contestation from interlocutors. The observed arguments show that, within a practice, contested quality claims can be resolved by referencing the greater standards of excellence and knowledge common to the practice. These observations contribute to the proposition that, within a practice, it can be objectively right or wrong to call something good.

| STUDY TWO
Whereas Study One showed the existence of quality claims and disagreements in discussions embedded within practices, Study Two examines the extent to and conditions under which people are willing to accept a quality judgment. Whether qualities can be agreed upon depends on whether people consider qualities as normative claims about information resources. As a normative claim, the claimed quality is taken to be universally right or wrong. This contrasts with a relativist claim, which is bound to some individual, group, place, or time, and with an expressivist claim, which relies solely on the individual making the judgment. Depending on their normative stance toward information qualities, people consider quality claims to be valid not only for the one who judges but also for others.
The inter-subjectivity of qualities has been studied in experimental philosophy, for example on the objectivity of morals, conventions, and ordinary facts (Nichols, 2004) as well as on the inter-subjectivity of aesthetic judgments (Cova, 2018; Cova & Pain, 2012). Results indicate that most people employ apparently normativist moral intuitions, but that these intuitions are bounded by culture. The more distant the culture, the more likely people employ relativist beliefs (Nichols, 2004). On the other hand, results indicate that most people do not consider aesthetic judgments to be normative; they rather consider them to be either relativist or expressivist (Cova, 2018; Cova & Pain, 2012). These findings overall indicate the importance of contextual conditions to the inter-subjectivity of judgments, but also suggest that people's stance depends strongly on the type of judgment they consider. Whether and how these findings translate to information qualities, however, is unclear.
Findings on information seeking and use provide possible insights into the normativity of quality judgments. In particular, the criteria users consider during seeking indicate which qualities they regularly judge and in which context they do so. For example, convergent and divergent information behaviors are associated with distinct goals and criteria for the information sought. Whereas convergent behavior is directed at resolving a need and furthering a task, divergent behavior is directed at intrinsic epistemic or hedonic rewards (Björneborn, 2008; Borlund & Dreier, 2014; Van der Sluis et al., 2014; Xu, 2007). This suggests assessors are more entitled to judge qualities such as usefulness given convergent, goal-directed behaviors. Similarly, users judge the expertise and authority of a source through, amongst others, a source's ranking and reputation (Fogg et al., 2001; Kammerer & Gerjets, 2014). In particular, a website that is linked to a well-known and reputable organization has a strong influence on the perception of source expertise (Fogg et al., 2001; Kammerer & Gerjets, 2014). These and similar heuristics possibly contextualize quality judgments.
Study Two investigates whether and when information quality judgments can be regarded as normative claims, similar to studies on the normativity of moral and aesthetic judgments. Specifically, we investigate conditions of quality judgments that promote their inter-subjective validity. We expect that the inter-subjective validity of information qualities lies somewhere between that of moral and aesthetic judgments. Information qualities are seemingly hard to agree upon in terms of inter-rater agreement, suggesting an expressivist stance, yet are seemingly considered as being true to other people in Study One, suggesting a normativist stance. We additionally expect that, with the right set of boundaries and conditions, the inter-subjective validity of information qualities can increase above the somewhat meager levels of inter-rater agreement typically observed.
To investigate these hypotheses, Study Two sets up a vignette study in which participants evaluate the inter-subjective validity of quality judgments. The vignettes describe a situation in which two interlocutors disagree about a quality judgment and query participants' normative stance toward the described disagreement. Participants indicate whether one of them is right (normative), both are right (relativist), or neither is right (expressivist) (Cova, 2018; Cova & Pain, 2012; Nichols, 2004). Judgments pertain to the quality of an information resource and are set within one of four practices analyzed in Study One. The vignettes additionally vary on information behavior, source reliability, assessor expertise, and the quality judged. The use of vignettes enables the controlled study of a variety of situations that would be difficult or impossible to study through observation or experimentation. Even though they offer an abstraction from the situations encountered in Study One, they promote more reflection than attitude prompts otherwise would (Finch, 1987).
Study Two serves to benchmark the inter-subjective validity of information qualities as well as explore factors that might enhance it, using 2128 evaluations of 96 vignettes. The goal is to quantify the importance of these factors and the range of inter-subjective validity they allow for. The goal is not to make general claims about laypeople's normative stance toward information qualities vis-à-vis their stance toward moral or aesthetic judgments. Neither is it to give a comprehensive analysis of possible conditions conducive to such a stance. Rather, we build on the findings from Study One to scale the evaluation of different conditions and evaluate their influence on the perceived inter-subjective validity of information qualities.

| Methodology
In Study Two, we adapted a methodology originally developed by Cova and Pain (2012) for evaluating the inter-subjective validity of judgments to assessments of information qualities. The primary approach involved the use of a series of vignettes, each describing a disagreement related to information quality. An example of a vignette used is: "Mary is at a political meeting with some friends and talks about an article she read in a journal of political sciences. The article described the political views of the candidate who organizes the meeting. Mary claims that the article is accurate but her friend Sarah, who has a major in political science and works as a political journalist, disagrees."
In it, two interlocutors (Mary and Sarah) disagree over a quality assessment made by an assessor (Mary). Vignettes were varied along 4 practices, 2 types of behavior, 3 quality labels, 2 levels of source reliability, and 2 levels of expertise, giving 4 × 2 × 3 × 2 × 2 = 96 vignettes. These variations represent various cues that are expected to influence whether the discussed quality should be considered agreeable, such as knowing that one of the interlocutors has relevant expertise.
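The full-factorial design described above can be sketched as a Cartesian product of the five factors. The factor labels below are illustrative reconstructions from the description and from the reference levels reported in the results, not the exact wording used in the vignettes:

```python
from itertools import product

# Hypothetical factor levels; the study's exact labels may differ.
practices = ["cooking", "fashion", "football", "politics"]
behaviors = ["convergent", "divergent"]
qualities = ["useful", "accurate", "comprehensive"]
sources = ["reliable", "unreliable"]
expertise = ["low", "high"]

# Every combination of the five factors yields one vignette variant.
vignettes = list(product(practices, behaviors, qualities, sources, expertise))
assert len(vignettes) == 96  # 4 x 2 x 3 x 2 x 2
```

Each tuple in `vignettes` corresponds to one of the 96 vignette variants evaluated by participants.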
The study included 36 participants (17 males and 19 females) who evaluated 58.5 vignettes on average. These participants had diverse socio-demographic backgrounds and were located in North-Western Europe, specifically Denmark (25 participants), France (9 participants), and England (2 participants). The experiment was administered via an online platform (Gorilla.sc), and participants assessed multiple vignettes in a within-subjects design. The order of vignettes was randomized, and participants had the option to exit the experiment after evaluating 48 vignettes. After reading each vignette, participants were asked to assess the presented disagreement using a four-option instrument following Cova and Pain (2012), Cova (2018), and Nichols (2004): "According to you: (1) one of them is right and the other is not; (2) both are right; (3) both are wrong; (4) neither is right or wrong; it makes no sense to speak in terms of correctness in this situation."
A comprehensive description of the full methodology, including vignette variations and the experiment setup, can be found in Supplemental Material Section S2. This methodological approach allowed for the systematic evaluation of various conditions and their possible influence on the inter-subjectivity of quality claims.

| Results and analyses
The data were analyzed using a generalized logit mixed-effects model. Mixed-effects models are ideal for within-subjects measurements with multiple trials per participant. They make it possible to separate fixed effects of interest from random effects like unexplained differences between participants. The model included practice, information behavior, source, judgment, and expertise as fixed effects. Participant and trial (presentation order) were initially included as random effects. The latter, order of presentation, had no influence on the model (an estimated variance of 0.00) and was subsequently dropped to resolve near-singularity during modeling. The resulting model is described in Table 1.
Table 1 presents estimates (betas) in the form of odds ratios. Odds ratios indicate the expected change in outcome value for a given predictor relative to a reference value. The reference value adopted for practice is "cooking," for information behavior "convergent," for source "reliable," for judgment "useful," and for expertise "low." For example, high expertise approximately doubled the expected inter-subjectivity, with a factor of 2.06 (p < 0.001), as contrasted to low assessor expertise. No significant changes in inter-subjective validity were found for the different practices or for source reliability. A small, negative influence of −21% (p < 0.05) on expected inter-subjective validity was found for divergent information behavior. Other significant effects of judged quality and high expertise furthermore appear to be strong, though with some possible variation indicated by their confidence intervals.
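As a quick illustration of how to read these odds ratios, the sketch below converts a baseline probability into the probability implied by a given odds ratio. The baseline of 0.30 is an assumed value for illustration only, and the odds ratio of 0.79 is inferred from the reported −21% effect (1 − 0.21), not taken directly from Table 1:

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Convert a baseline probability to odds, scale by the odds
    ratio, and convert back to the implied probability."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# With an assumed baseline acceptance probability of 0.30, the reported
# odds ratio of 2.06 for high expertise implies roughly 0.47, while an
# odds ratio of 0.79 (the -21% effect) implies roughly 0.25.
p_high_expertise = apply_odds_ratio(0.30, 2.06)
p_divergent = apply_odds_ratio(0.30, 0.79)
```

Note that an odds ratio of 2.06 does not double the probability itself; it doubles the odds, which translates into a smaller shift in probability the further the baseline moves from 0.5.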
Post-hoc Tukey contrasts confirmed statistically significant pairwise differences between all three qualities (−7.90 < z < −4.12, p < 0.001). Pairwise comparisons for practice confirmed a lack of differences between practices (−1.81 < z < 0.10, 0.27 < p < 1.00). The intra-class correlation (ICC) furthermore indicates the proportion of total variance that is explained by differences between participants. At ICC = 0.32, about one-third of variance can be attributed to a baseline normative stance that participants might have.
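For a logistic mixed-effects model, the ICC follows directly from the variance components reported in Table 1: the between-participant variance τ00 over the total variance, with the residual variance fixed at π²/3. A minimal sketch, in which the value of τ00 is back-solved from the reported ICC of 0.32 rather than read from the table:

```python
import math

# Standard residual variance assumed for the logistic distribution (~3.29).
LOGISTIC_RESIDUAL = math.pi ** 2 / 3

def icc_logistic(tau00: float) -> float:
    """Proportion of total variance attributable to between-participant
    differences in a logistic mixed-effects model."""
    return tau00 / (tau00 + LOGISTIC_RESIDUAL)

# Back-solving from ICC = 0.32 gives a participant variance of ~1.55.
tau00 = 0.32 / (1 - 0.32) * LOGISTIC_RESIDUAL
```

The same computation is what R packages report alongside a fitted `glmer` model when the outcome is binary.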
Figure 2 provides a graphical illustration of the model. It shows the estimated means for different combinations of information behavior, expertise, and quality label. The figure demonstrates the additive effects of these predictors. It reveals that judgments of comprehensiveness peak at 60.65%, while judgments of accuracy rank third highest at 48.69% under the appropriate conditions. However, in the absence of these conditions, judgments of comprehensiveness decrease to 37.13% and of accuracy to 26.67%. Judgments of usefulness rank overall lowest, ranging between 17.97% and 36.38% inter-subjective validity depending on the conditions set. These additive effects are estimated with reasonable certainty as indicated by their corresponding confidence intervals.

Note to Table 1: The top part of the table shows predictors with their estimated odds ratios (Est.), confidence intervals (CI), and p values. The bottom part shows the random effects: σ², the error variance, assumed at π²/3, and τ00 participant, the participant variance. The model was estimated using the lme4 package (Bates et al., 2015) in R, based on 2128 observations from 36 participants. Intraclass correlation ICC = 0.32.

| Discussion
Study Two evaluated whether and when users are willing to accept a quality judgment. It estimated the inter-subjective validity to lie between 17.97% and 60.65%. This provides an initial benchmark of the inter-subjective validity of information qualities. It furthermore showed that the perceived validity depends on the quality judged and on the expertise and goals of the judges. The lower end of estimates is in line with findings on the inter-rater agreement of information qualities (kappa values of 6%-32%). The higher end, however, shows that users are an estimated 1.63 to 2.02 times more likely to accept a quality judgment given the right conditions. This can be regarded as a considerably high level of perceived inter-subjective validity when compared against earlier reports of inter-rater agreement, especially considering the rudimentary information offered by the vignettes.
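The 1.63 to 2.02 range can be recovered from the Figure 2 estimates by dividing each quality's estimate under favorable conditions by its estimate without them:

```python
# Upper and lower estimated inter-subjective validity per quality,
# taken from the Figure 2 estimates reported above (percentages).
estimates = {
    "comprehensiveness": (60.65, 37.13),
    "accuracy": (48.69, 26.67),
    "usefulness": (36.38, 17.97),
}

# Ratio of the favorable to the unfavorable estimate for each quality.
ratios = {q: round(hi / lo, 2) for q, (hi, lo) in estimates.items()}
# ratios: comprehensiveness 1.63, accuracy 1.83, usefulness 2.02
```

The range reported in the text thus spans from comprehensiveness (1.63) to usefulness (2.02), with accuracy in between.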
The resulting order of qualities (see Figure 2) can be interpreted using insights from Study One. Comprehensiveness was ranked highest, presumably because it relies on standards and narratives that form the core of a practice. Accuracy was ranked second, likely because accuracy claims are defended with various sources of knowledge, including personal experience with and common knowledge of a practice. Some of these sources of knowledge might be contested to a higher degree than the standards and narratives of a practice. Usefulness was ranked lowest, even though in Study One standards of excellence were appealed to in judgments of usefulness and these standards can be considered core to a practice. A likely explanation for this discrepancy is that, in order to judge usefulness, interlocutors need to actively participate in the practice. The same was observed in Study One, where usefulness was mainly discussed when forum participants were actively involved in the practice.
Two of the three conditions manipulated showed a significant influence on perceived inter-subjectivity. Their significance embeds the question of inter-subjective validity within well-known findings on information seeking and use. Only source reliability, which was expected to support a judgment, did not show any significant effect. This suggests that participants solely evaluated the context of the presented judgments. They relied on the interlocutors rather than trying to form their own judgment heuristically. This finding is in line with practice theory, which argues that people trust virtuous others in assessing what to believe (Donozo, 2014). The findings furthermore suggest that people bring a default normative stance to the vignettes, with about one-third of variance attributable to individual differences. The importance of individual differences indicates that, even though a specific sample from the population was reached, study participants nevertheless did not share a default normative stance. It is likely that additional factors related to the interlocutors or to the study participants, such as their epistemic beliefs (Kammerer et al., 2013; Kammerer & Gerjets, 2012), can further explain perceived inter-subjectivity.
Surprisingly, no differences were found between practices in Study Two. This contrasts with the findings of Study One, which showed that different practices judged different qualities and that judgments were supported with knowledge common to that practice. A possible explanation is that the participants in Study Two were not part of the respective practices themselves; they were rather reflecting on the validity of a judgment set within a practice. This likely left them unaware of practice-specific considerations that participants in Study One experienced.

(Note to Figure 2: Means and confidence intervals were estimated using the effects package in R (Fox, 2003) based on the generalized mixed-effects model in Table 1. Only significant predictors were varied; other factors were held constant.)
The findings overall support the value of vignettes as a method for studying the inter-subjective validity of information qualities. Although the vignettes used in Study Two may lack the depth and immersion experienced by participants in Study One, they effectively prompt participants to imagine and evaluate various situations. This approach contrasts with qualitative studies, such as Study One, which provide a (quantifiable) description of observed quality conversations but lack a structured manipulation of the conditions influencing those conversations. Similarly, studies on inter-rater agreement measure the distance between raters but not their ability or willingness to accept each other's ratings. Despite vignettes offering simplified representations of reality, they excel in their ability to quantify normative questions. Through the introduction of vignettes, we were able to systematically scale a range of conditions and quantify their impact on perceived inter-subjectivity.

| GENERAL DISCUSSION
Through two studies we addressed the question of whether and when information qualities have inter-subjective validity. Study One showed that, in online discussions set within practices, quality claims were made and contested. Study Two showed that, when observers assess disagreements between interlocutors, the perceived inter-subjectivity depends on a specific set of conditions. Both studies indicate that quality claims can be considered normative (that is, applying to others) as long as certain conditions are met. This combination of two studies, one qualitative and observational and one quantitative and relational, resulted in a comprehensive picture of the inter-subjectivity of information qualities. It offered initial empirical results on a question that has, hitherto, received limited empirical scrutiny.
The present studies are the first to interpret quality conversations from the perspective of practices as well as to benchmark and examine the inter-subjective validity of information qualities. There are some notable differences between the approach taken in the present studies and previous studies on information quality. Previous observational studies typically surveyed quality conversations set within debates rather than practices (Savolainen, 2011, 2023). By introducing practice theory in interpreting quality conversations, we showed that qualities are defined and defended with reference to the standards and traditions established within a practice. Previous relational studies typically evaluated inter-rater agreement without considering raters' willingness to accept each other's ratings. By introducing vignettes as a method, we set an initial, preliminary benchmark of the inter-subjective validity and confirmed boundary conditions to it. The upper bound of 60.65% can be regarded as a substantial gain on the typical inter-rater agreement of 6%-32% (Arazy & Kopak, 2011; Kakol et al., 2013; Mao et al., 2016). Taken together, these results confirm the value of practice as a concept and vignettes as a method to the study of information qualities and their acceptance.
Even though both studies are complementary, with one observing realistic situations and the other structurally manipulating a range of conditions, they are both limited in the number of situations and type of conditions that could be studied. Notably, neither study elucidated the influence of argumentation on the acceptance of claims. Even though Study One did observe and interpret arguments supporting quality (dis)agreements, no quantified analysis could be made of the role of argumentation in the acceptance of these claims. Reasoning and argumentation can likely contribute to trust in the assessment and the assessor, but can also complicate matters beyond general understandability (Kizilcec, 2016). Similarly, traditional markers of expertise, like a title or an education as used in Study Two, are not equally trusted across different thought communities (Nguyen, 2020). In online environments they are rather surpassed by gamified indexes that reference online accomplishments instead (Mößner & Kitcher, 2017). It remains an open question how arguments, markers, and trust influence the acceptance of quality claims, especially when quality discussions cross the boundaries of thought communities (Nguyen, 2020).
Whereas the results of Study One suggest that different practices emphasize different qualities, Study Two did not show any influence of practice on perceived inter-subjectivity. One possible explanation is that information qualities are important irrespective of practices: they can and should be upheld across practices. Any differences found in Study One might rather be attributable to how vulnerable and exposed these qualities are in a practice, which increases the likelihood of debates about them. This aligns with domain analysis studies showing that specific academic fields focus on specific criteria. Historians, for example, prefer primary over secondary sources and relics over narratives as evidence when judging the quality of documents (Hjørland, 2009b). This suggests that qualities such as correctness and comprehensiveness have domain-specific specifications which make them more difficult to achieve and accordingly more contested in some domains than in others, but nonetheless sought after across domains.
In comparison to domains of thought, practices offer an alternative set of boundaries within which qualities are defined and discussed. Domains are typically (though not always; Morado Nascimento & Marteleto, 2008) specified in pragmatic terms, based on the set of information systems, resources, and processes shared by a community with common concerns, viewpoints, and terminology (Hjørland, 2009a). Practices, at least in the MacIntyrean sense, are specified in moral terms, as a socially established, shared activity within which moral agents try to achieve a virtuous life through accepting and advancing the norms and standards associated with it. These specifications overlap, but might lead to slightly different boundaries. For example, a political party might form a discourse community, but be part of a larger practice of politics and policy making. The quality of such discourse is not only defined within the discourse community, but also within the practice the discourse pertains to. Following practice theory, quality discussions can run across various thought communities involved in a practice, which will lead to broader expectations about who should join a quality conversation.
The boundaries of a practice are, however, not determined theoretically. Rather, as we saw in Study One, the applicability of standards as well as the recognition of expertise is a recurrent theme of conversation. These findings overall support a conversational perspective on information quality, where quality is "determined publicly and socially in shared forums" (Mai, 2013b, p. 685). Through discussions, the boundaries of practices and the validity of quality claims are sought. This also allows us to reformulate the subjectivity problem of information qualities. As the present findings show, qualities are no longer subjective or inscrutable when considered through conversations and between interlocutors. Conversations offer the conditions needed to raise the perceived inter-subjectivity and therewith expand the scope of qualities beyond individual, like-minded users. This makes the inter-subjective validity of information qualities an empirical question, about conversations, claims, and their agreeableness.
Our findings suggest that solving the subjectivity problem of information qualities necessitates transparency regarding their supporting claims. Currently, ranking systems conceal the reasoning that went into a particular ranking of information resources. They offer a ranked list to users with little transparency on the considerations that went into that list. The reasoning of experts creating relevance or quality assessments as ground truth for information retrieval (Google, 2023; Voorhees, 2002) or of users giving explicit and implicit feedback for recommender and tagging systems (Aggarwal, 2016; Rafferty, 2018) remains concealed. This lack of transparency limits the possibility of reaching an agreement about how a resource should be indexed (Feinberg, 2006; Mai, 2013b), leading to limited levels of inter-rater agreement (Voorhees, 2002), a proliferation of tags (Rafferty, 2018), and heightened levels of personalization (Robertson et al., 2023). This makes the inter-subjective validity of information qualities also a design question, about creating environments which make quality assessment a shared goal and promote or even enforce some form of conversation, alignment and, possibly, agreement (Van der Sluis, 2022).
Raising the awareness and acceptance of information qualities has the potential to substantially impact users' interactions with information resources. It can direct users' engagement toward items that may otherwise be overlooked (Robertson et al., 2023), as users may accept that a new item has certain qualities when other, less similar individuals claim it to be so. Moreover, it can expand users' engagement when there are accuracy, completeness, or other quality concerns. Individuals actively select different items (Mena, 2019), consult multiple sources, and explore more evidence (Tormala, 2016), all driven by their awareness of potential quality concerns and a desire to mitigate uncertainties in their knowledge (Kahan, 2015). This ties the inter-subjective validity of information qualities to contemporary issues surrounding individuals' overconfidence in their own knowledge and beliefs (Kidd & Birhane, 2023; Robertson et al., 2023). Quality discussions and assessments, rather, communicate an inherent uncertainty about knowledge and knowing.
Current information retrieval and filtering systems are by and large agnostic (with the exception of multi-criteria implementations; Adomavicius & Kwon, 2015) to the criteria that align users and the qualities that align resources. Our findings indicate that this alignment can increase with transparency about qualities and their assessments. Users can acknowledge and accept qualities and the criteria they uphold given the right set of conditions, such as the expertise of the assessor or the situation of the assessment. This means that qualities are neither inscrutable nor subjective, but rather a matter of the right set of conditions and the extent of their inter-subjective validity. Through the contribution of a novel methodology, theoretical framing, and initial empirical results, we show that it is possible to study this inter-subjectivity of qualities and, at least partially, unveil the conditions that make resources and users align.

FIGURE 2 Estimated marginal means of inter-subjective validity with confidence intervals are shown in the top graph. The means are marginal on the combinations of factors shown in the bottom part. Combinations are sorted and ranked according to their estimated means.