Naïve and informed views on the nature of scientific inquiry in large‐scale assessments: Two sides of the same coin or different currencies?

Funding information: Deutsche Forschungsgemeinschaft, Grant/Award Number: NE 2105/1-1

Abstract: Many models in the field of epistemic cognition conceptualize students' views as being on a continuum between the poles of naïve and informed views. Against this background, the aim of the present study was to find out whether views on the nature of scientific inquiry (NOSI views) should be conceptualized and quantitatively assessed in a more multiplistic manner, considering naïve and informed views in their own, separate dimensions. Based on a competence model defining three inquiry methods, we developed a Likert-scaled questionnaire containing 10 scales, each assessing one NOSI view. We administered the questionnaire to a sample of 802 students in the lower and upper levels of secondary school. Based on structural equation modeling, the analyses confirmed a 10-dimensional model, distinguishing between naïve and informed views, as the only adequate representation of the data. Latent class analysis and interview data revealed four profiles of NOSI views in the data, which differed with regard to their agreement or disagreement with different naïve and informed views. We interpret these findings as evidence that supports more multiplistic models, with relevance to conceptualizing, measuring, and fostering NOSI views. We derive future directions for nature of scientific inquiry research.

[Correction added on October 11, 2019, after first online publication: The article title was updated to include a question mark.]

…scientists" (naïve view, negatively pooled item) (p. 1342). Based on the works of Pomeroy (1993), Tsai (2000) administered items like "legitimate scientific ideas sometimes come from dreams and hunches" (empiricist view, negatively pooled item) or "scientists rigorously attempt to eliminate the human perspective from observations" (constructivist view, positively pooled item) (p. 199). Using a similar instrument, Huang, Tsai, and Chang (2005) noted that "Some items in the PNSS stated from an empiricist-oriented (or non-constructivist-oriented) perspective were scored in a reverse manner" (p. 648). These studies implicitly assume that students holding informed views would agree with the positively pooled items and disagree with the negatively pooled items, and vice versa.
In any case, students would have to decide on a particular statement in either a naïve or an informed sense, or choose a medium category. The items do not allow students to evaluate both views independently.
In multiple-choice instruments, rather naïve and rather informed views are used to create answer options that students have to decide on. In a modified "Views on Science-Technology-Society" version, Dogan and Abd-El-Khalick (2008) asked students whether scientific "observations made by competent scientists will usually be different if the scientists believe different theories." Students could, inter alia, choose between a naïve answer option, "no, because observations are as exact as possible. This is how science has been able to advance", or an informed option, "yes, because scientists will experiment in different ways and will notice different things" (p. 1108). Ibrahim, Buffler, and Lubben (2009), to give another example, used the Views About Scientific Measurement instrument and asked students to choose an answer option between "nature follows exact laws and scientists discover these laws" and "scientists construct theories to explain what they observe in nature" (p. 251). Again, students would have to decide between different options, leading to instruments that are not able to depict the possibly multiplistic nature of NOSI views.

STUDY PURPOSE
On the basis of our previous work in the context of SI, the purpose of this study was to find out whether multidimensionality in NOSI assessment, using Likert-scaled items, should be addressed according to a more fragmented profile of naïve and informed views or according to a continuum over different dimensions, with naïve and informed views being part of this continuum. Whereas the latter perspective would correspond to the title metaphor of naïve and informed views being two sides of the same coin, the former perspective would correspond to different currencies.

Competence-based framework incorporating naïve and informed NOSI views
In order to build upon a theoretical foundation, we used an internationally published competence model (Nehring et al., 2015; Nowak et al., 2013) defining three inquiry methods as a framework for clustering different naïve and informed NOSI views reported in the science education literature. The three inquiry methods are "observing as theory-driven activity," "experimenting as manipulation of variables," and "using models as tools for inquiry," which a validation study confirmed to structure the competences in the field of inquiry by functioning as partial competences.
"Observing as theory-driven activity" means selecting observation criteria based upon theory, testing a hypothesis using collected data, carrying out theory-driven observations using auxiliary equipment if necessary, or deriving classifications if more than two objects or phenomena are observed (Gropengießer, 2009; Hofstein & Lunetta, 2004). "Experimenting as manipulation of variables," in the sense of the competence model, corresponds to applying the control-of-variables strategy (CVS; Chen & Klahr, 1999; Schwichow, Croker, Zimmerman, Höffler, & Härtig, 2016) in science experiments. Going beyond observations, this means identifying dependent, independent, and control variables, actively intervening with research objects, manipulating variables, and keeping control variables constant. "Using models as tools for inquiry" corresponds to using models as research tools by carrying out an investigation with the help of models (Crawford & Cullin, 2005; Krell, Upmeier zu Belzen, & Krüger, 2014; Schwarz et al., 2009; Upmeier zu Belzen & Krüger, 2010). That means formulating hypotheses based on models and carrying out model experiments, drawing conclusions on the underlying original theory by using substitute objects, and testing the model with regard to the data of the underlying original object.
Based on these three partial competences, we identified five rather naïve and five rather informed NOSI views. We chose views that would enable or hinder students in more or less successfully implementing the inquiry methods during scientific investigations (Table 1). We are aware that we could have chosen many more views from the huge body of NOSI literature. For practical reasons and because of sample requirements, however, we limited our selection to 10 views that, from a theoretical point of view, seemed particularly relevant to enabling or hindering students in implementing the inquiry methods.
Following the approaches in the NOS literature, naïve views correspond to a more positivist, unsystematic, or empiricist perspective on science, whereas informed views correspond to a more constructivist, systematic, and relativist perspective on science (for further details, see Abd-El-Khalick, 2012; Demir & Abell, 2010; Deng, Chen, Tsai, & Chai, 2011; Lederman et al., 2014; Osborne, Collins, Ratcliffe, Millar, & Duschl, 2003). Moreover, the criterion for choosing the views was that implementing one of the three methods according to the naïve views would decrease the chance of gaining valid results in scientific investigations, whereas following the informed views would increase it.
TABLE 1 Overview of the naïve and informed NOSI views addressed in this study

Informed views:
(1) Scientists are guided by ideas and plan observations.
(2) Scientists interpret observations using theory.
(3) Scientists change only one variable at a time for valid experiments.
(4) Scientists carry out investigations on models. They test hypotheses about an original object using models.
(5) Scientists use models as tools for communicating their investigations and results.

Naïve views:
(6) While observing, scientists try out and see if it works.
(7) Scientists discover laws or theories within observations (randomly).
(8) Observations that do not confirm a hypothesis are useless.
(9) Scientists change many variables at a time for valid experiments.
(10) Scientists use models as an exact copy of reality.
As shown in Table 1, we numbered the views from 1 to 10. In the following, we explain the 10 views, give examples of questionnaire items, and rationalize the connections between views and competences. Note, however, that views, as operationalized in our study, tend to sit on a meta-level, whereas competences relate to being able to solve concrete problems that occur during scientific investigations. We suppose that views and competences are interrelated on different levels. On an intraindividual level, the interplay of views and competences might be complex and include several processes. When it comes to solving problems that occur during scientific investigations, views might contribute to meta-knowledge about the inquiry methods (Künsting, Wirth, & Paas, 2011). Students holding adequate views might be able to activate and apply certain of these views as research strategies, making them more successful during scientific investigations. This might help them to implement certain steps that contribute to more expedient problem solving. Moreover, acting according to one's views contributes to a coherent perspective on one's own actions, whereas acting in discord with one's views might induce a feeling of cognitive dissonance. Finally, research suggests that students develop views from inquiry activities, especially under the condition of explicit reflection (Khishfe & Abd-El-Khalick, 2002; Vorholzer, von Aufschnaiter, & Boone, 2018). More adequate and less naïve views might, of course, be the result of inquiry-based learning activities in school science, so that views and competences relate to one another in this direction as well. In any case, literature reviews identify relating views to inquiry competences as one of the major gaps in the field of research on inquiry (Rönnebeck, Bernholt, & Ropohl, 2016).
Readers should note, however, that this study concerns conceptualizing views on an interindividual sample level. The question of how views and competences play together on an intraindividual (and not only interindividual) level, or the question of how to model views and competences qualitatively or quantitatively, may be the subject of future studies (see Section 6). These are the views included in this study:

1. Scientists are guided by ideas and plan observations: This informed view corresponds to a rather systematic and knowledge-based perspective on science, with investigations not just following a random procedure. Although there is no single scientific method, observations are guided by questions (Lederman et al., 2014), and scientists have an idea that guides them during the planning process (Carey, Evans, Honda, Jay, & Unger, 1989). Observing corresponds to a planned behavior and not just to "looking." Example item in our Likert-based questionnaire: When scientists carry out an investigation, they want to find out if their assumption matches the observations or measurements. Connection to the partial competence "observing as theory-driven activity": This view describes the process of the inquiry method of "observing" almost directly. As observing is not just "looking" but a planned and theory-based activity, students holding this view are likely better able to carry out investigations that require a planned and theory-based approach. They may use this view to solve problems based on planned behavior rather than applying less successful strategies such as trial-and-error procedures.

2. Scientists interpret observations using theory: This informed view refers to the theory-laden NOS, indicating that existing knowledge guides observations and helps to interpret data derived from observations. Observations and inferences are distinct (Lederman, 2007), and observations can have meaning for theory. Example item in our Likert-based questionnaire: When scientists carry out an observation or a measurement, a theory helps them to understand the results of the observation or measurement. Connection to the partial competence "observing as theory-driven activity": This view also refers to a particular property of the inquiry method "observing." Students holding this view have understood that scientific knowledge should be used in order to make sense of what can be observed on a phenomenological level. If they distinguish between observation and inference, they are more successful in interpreting data from observations or measurements. Holding this view could consequently be a factor supporting students in interpreting the behavior of a phenomenon with regard to scientific theories during investigations.

3. Scientists change only one variable at a time for valid experiments: This view corresponds to the widely researched CVS (Chen & Klahr, 1999; Schwichow, Croker, Zimmerman, Höffler, & Härtig, 2016), where isolating and controlling variables leads to valid results within experiments. Example item in our Likert-based questionnaire: When scientists carry out experiments to find out whether a certain variable affects a property, they change only one variable at a time. Connection to the partial competence "experimenting as manipulation of variables": This view is very close to a concrete strategy that can be applied for carrying out controlled experiments. It is evident that an understanding of the CVS should help students to identify, vary, and control variables purposefully.

4. Scientists carry out investigations on models: This informed view on modeling procedures corresponds to the research-based aspects of modeling practice (Grosslight, Unger, Jay, & Smith, 1991; Upmeier zu Belzen & Krüger, 2010). Students holding this view would agree that scientists carry out observations and experiments on models in order to explain and to predict properties of an original phenomenon. Example item in our Likert-based questionnaire: When scientists work with a model, they test an assumption with the help of the model. Connection to the partial competence "using models as tools for inquiry": An understanding that models are used as tools for inquiry is a prerequisite for deriving new knowledge based on models. Students should understand that models are theoretical reconstructions that can be used as auxiliary tools for research. Within model competence as conceptualized by Upmeier zu Belzen and Krüger (2010) or Krell, Upmeier zu Belzen, and Krüger (2014, p. 114), this view refers "to the 'predictive nature of models' which reflects the scientific perspective of models as research tools". It goes hand in hand with a higher model competence, indicating that students might be more able to perform investigations based on models.

5. Scientists use models as tools for communicating their investigations and results: This informed view corresponds to the communicative aspects of modeling practice (Schwarz et al., 2009). Scientists use models as tools for presenting ideas or results. Example item in our Likert-based questionnaire: When scientists work with a model, they explain an original to other scientists. Connection to the partial competence "using models as tools for inquiry": This view corresponds to the second important perspective on models and model competence (Krell, Upmeier zu Belzen, & Krüger, 2014; Upmeier zu Belzen & Krüger, 2010). Besides the perspective of models as tools for inquiry, scientists use models as tools for communicating. Although this view is not conceptualized as being part of higher levels of model competence, it is part of a more holistic understanding of the role models play in science, so that it might be easier for students holding this view to work with models in the context of inquiry.

6. While observing, scientists try out and see if it works: This naïve view goes back to the study of Carey, Evans, Honda, Jay, and Unger (1989). It includes views of a scientist as someone who "tries something to see if it 'works' or 'reacts' or to 'find out about the thing they're experimenting on'" (p. 523). Carrying out observations is seen as a random procedure. Example item in our Likert-based questionnaire: When scientists carry out an investigation, they start the investigation without preparation. Connection to the partial competence "observing as theory-driven activity": Students holding this view might not proceed in a very planned manner. If they apply a try-out-and-see strategy, it is likely that they will not be able to solve problems during investigations as successfully as students who are aware of a planned approach.

7. Scientists discover laws or theories within observations (randomly): This naïve view contrasts with the theory-laden nature of observations (Chen, She, Chou, Tsai, & Chiu, 2013) and corresponds to an empiricist view on SI. Students holding this view would think that scientists carry out their investigations without a theory or an idea behind them, so that scientists would simply discover theories within observations. Example item in our Likert-based questionnaire: When scientists carry out an investigation, they discover the result without a theory. Connection to the partial competence "observing as theory-driven activity": Students holding this view might not proceed in a very planned manner. If they apply a try-out-and-see strategy, it is likely that they will not be able to solve problems during investigations as successfully as students who are aware of a planned approach.

8. Observations that do not confirm a hypothesis are useless: This view corresponds to the well-documented confirmation bias, where students seek to confirm a hypothesis even under conflicting findings or think that the purpose of investigations is to seek confirmation (Eberbach & Crowley, 2009; Poletiek, 1996). Example item in our Likert-based questionnaire: When scientists carry out an investigation, the investigation is worthless if it contradicts their assumptions. Connection to the partial competence "observing as theory-driven activity": This view, too, might hinder students from carrying out purposeful investigations. Students who proceed accordingly apply the confirmation bias and only seek observations that confirm their existing knowledge. This might hinder them from testing hypotheses or generating further hypotheses that go beyond their existing knowledge (Chinn & Brewer, 1993).

9. Scientists change many variables at a time for valid experiments: This naïve view corresponds to experimenting as an activity in which the scientist, for example, changes many variables at once to create a certain effect. Often, this goes hand in hand with an engineering mode of experimentation rather than a science mode, in which the interplay between observations and ideas is central (Schauble, Klopfer, & Raghavan, 1991). Example item in our Likert-based questionnaire: When scientists carry out experiments to find out whether a certain variable affects a property, they change all the variables that could have an effect at once. Connection to the partial competence "experimenting as manipulation of variables": This view corresponds to confounded experiments and changing too many variables at a time. Students who proceed accordingly would not be able to design valid experiments (Kuhn & Dean, 2005).

10. Scientists use models as an exact copy of reality: This naïve view corresponds to the well-documented conception of models as an enlarged or a reduced copy of an original. Accordingly, models should match the original as closely as possible in all properties (Grosslight, Unger, Jay, & Smith, 1991; Grünkorn, Upmeier zu Belzen, & Krüger, 2014). Example item in our Likert-based questionnaire: When scientists work with a model, the model displays all the characteristics of the original. Connection to the partial competence "using models as tools for inquiry": Thinking of models as an exact copy of reality is associated with the lowest levels of model competence (Krell, Upmeier zu Belzen, & Krüger, 2014; Upmeier zu Belzen & Krüger, 2010). Although this view might not be associated with lower performance while working on one concrete model, students holding this view might be less successful when it comes to changing a model with regard to underlying theoretical assumptions. Also, choosing a suitable model out of several candidate models of the same phenomenon will not be easy for these students. This might reduce competence in situations that require procedures contradicting this view.

Deriving unidimensional and multidimensional candidate models
In the following, we follow the two-step approach proposed by Burnham and Anderson (2002): formulating a set of candidate models and then selecting a model to be used in making inferences. This approach was also applied by Neumann, Neumann, and Nehm (2011) and by Harrison, Duncan Seraphin, Philippoff, Vallin, and Brandon (2015) in the context of NOS and NOSI views. In the present study, each model represents different implicit assumptions about the interrelatedness of naïve and informed NOSI views and about the understanding of NOSI, and thereby addresses the question of whether it is appropriate to picture a simultaneous presence of naïve and informed views in assessment. We derived five models; they are shown in Table 2 together with the underlying assumptions about the interrelatedness of views and the understanding of NOSI, as well as the implications for the metaphor appearing in the title. Note that the metaphorical description (a finance metaphor) of the model assumptions refers to a Likert-scaled assessment assigning a "more or less" to students. We are aware that understanding is a much more complex process than shifting between two poles or having more or less money, for example. We think, however, that the metaphor is a helpful tool for communicating the essence of each model.

TABLE 2 Overview of the candidate models tested in this study

Model 1, unidimensional. Assumptions: no particular structure between naïve and informed views; understanding as shifting on a continuum between naïve and informed views. Metaphor: understanding as having more or less money.

Model 2, two-dimensional (naïve and informed views). Assumptions: distinguishing between naïve and informed views; understanding as promoted informed and decreased naïve views. Metaphor: naïve and informed views as two sides of the same coin.

Model 3, three-dimensional (inquiry methods). Assumptions: distinguishing between the three inquiry methods; understanding as shifting on a continuum between naïve and informed views within each inquiry method. Metaphor: understanding as more or less money in three currencies.

Model 4, six-dimensional (naïve and informed views within inquiry methods). Assumptions: distinguishing between naïve and informed views within each inquiry method; understanding as promoted informed and decreased naïve views within each inquiry method. Metaphor: naïve and informed views as different sides of a coin in three currencies.

Model 5, ten-dimensional (each view corresponds to one dimension). Assumptions: distinguishing between all 10 naïve and informed views. Metaphor: naïve and informed views as different currencies.
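The grouping logic behind the five candidate models can also be sketched as a plain data structure. The view-to-method assignment follows the connections stated in the views section (views 1, 2, 6, 7, and 8 relate to observing; 3 and 9 to experimenting; 4, 5, and 10 to modeling); the dictionary keys and the encoding itself are our own illustration, not part of the study:

```python
# Views 1-5 are informed, views 6-10 are naive (numbering as in Table 1).
informed, naive = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]
methods = {
    "observing": [1, 2, 6, 7, 8],
    "experimenting": [3, 9],
    "modeling": [4, 5, 10],
}

# Each candidate model maps latent dimensions to the views loading on them.
candidate_models = {
    "M1_unidimensional": {"all_views": informed + naive},
    "M2_naive_vs_informed": {"informed": informed, "naive": naive},
    "M3_methods": dict(methods),
    "M4_method_x_valence": {
        f"{m}_{val}": [v for v in vs if (v <= 5) == (val == "informed")]
        for m, vs in methods.items()
        for val in ("informed", "naive")
    },
    "M5_ten_dimensional": {f"view_{v}": [v] for v in informed + naive},
}
```

Counting the keys of each entry recovers the dimensionalities 1, 2, 3, 6, and 10 listed in Table 2.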

Developing the questionnaire
In order to gain data on students' views, we developed a 4-point Likert-scaled questionnaire (4 = "strongly agree," 3 = "agree," 2 = "disagree," 1 = "strongly disagree") containing 47 items in total. By using a 4-point scale, the students are forced to decide between agreeing and disagreeing; there is no neutral position, which would be harder to interpret. Nine views were assessed using five-item scales each. Due to a very good performance in the prestudies (see below), one view ("scientists change only one variable at a time for valid experiments") was assessed using a three-item short scale. Each scale was laid out within a common table, leading to blocks of five or three items. The students could express their agreement or disagreement on the 4-point scale. Short texts explaining the characteristics of the inquiry methods were presented to the students before each group of three or four item blocks.
In order to contextualize this information, a school example from chemistry or biology was given. This was due to the fact that we had applied and validated the competence model in the contexts of these two disciplines. The questionnaires contained either the chemistry or the biology examples and were assigned randomly to the students. Figure 1 shows the example for the inquiry method "experimenting." In order to distribute possible sequence effects across the scales, we created eight booklets containing different sequences of inquiry methods and different sequences of scales, with scales on naïve views following scales on informed views and the other way around.
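The booklet design described above, eight booklets crossing the sequence of the inquiry-method sections with the order of naïve and informed scale blocks, can be sketched as follows. The labels, the choice of which four method sequences to use, and the rotation-based assignment are illustrative assumptions, not the study's actual randomization procedure:

```python
import itertools

# Hypothetical section labels and scale-block orders.
methods = ["observing", "experimenting", "modelling"]
scale_orders = ["naive_block_first", "informed_block_first"]

# Four of the six possible method sequences crossed with the two scale
# orders yield the eight booklets mentioned in the text.
method_sequences = list(itertools.permutations(methods))[:4]
booklets = [
    {"method_sequence": seq, "scale_order": order}
    for seq in method_sequences
    for order in scale_orders
]

def assign_booklet(student_index: int) -> dict:
    """Rotate through the booklets so each sequence occurs about equally often."""
    return booklets[student_index % len(booklets)]
```

A rotation like this balances booklet versions within each class, which is one simple way to spread sequence effects evenly across the sample.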

Qualitative and quantitative prestudy
The questionnaire was developed in two draft versions and evaluated in two preliminary studies; it was also judged by linguists and science educators. The first draft version contained 57 Likert items in total, with 31 items going back to two informed views (Views 2 and 4) and 26 items going back to two naïve views (Views 6 and 4). The wording of the items was kept as close to the students' wording as possible. Teachers and science educators were asked to evaluate the wording. The basic structure was the same as described above.

[Figure 1: Example text for communicating the characteristics of the inquiry method "experimenting," containing the chemistry example. The text reads: "How do scientists carry out an experiment to find out if a property is affected by a certain variable? Description: Perhaps you have already learned that properties can change because they are influenced by certain quantities. Whether a quantity influences a property or not can be determined by experiments. For example, in chemistry, pH can affect the colour of a solution. This can be displayed in a diagram: The pH (variable) influences the colour of a solution (property)."]
For the quantitative prestudy, the first draft questionnaire was administered to a sample of n = 135 students from the 9th and 10th grades in two secondary schools in the federal states of Berlin and Brandenburg in 2015. It was evaluated with regard to internal consistency, item discrimination, and factor loadings in an exploratory factor analysis (EFA). The results showed 43 items having an item discrimination of r_it > .20. The internal consistency of a scale composed of these items was Cronbach's α = .85. EFA using the method of parallel analysis (Field, 2009; O'Connor, 2000) indicated a four-factor solution with loadings that ranged, after promax rotation, between .33 < λ_i < .82. Based on these results, we used these 43 items for further item development in a second draft version and constructed, on this basis, new items for further views.
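For readers who want to reproduce item statistics of this kind, corrected item-total correlation and Cronbach's α can be computed in a few lines of NumPy. This is a generic sketch on a toy matrix, not the study's analysis script:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_students, n_items) matrix of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation: each item vs. the sum of the remaining items."""
    total = items.sum(axis=1)
    return np.array(
        [np.corrcoef(items[:, j], total - items[:, j])[0, 1]
         for j in range(items.shape[1])]
    )

# Sanity check on perfectly parallel items: alpha and all discriminations are 1.
X = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]])
alpha = cronbach_alpha(X)
discriminations = corrected_item_total(X)
```

With real Likert data, items whose corrected item-total correlation falls below the .20 threshold used in the prestudy would be flagged for removal or revision.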
In order to ensure that the wording of the items met the understanding of the students in our sample, the second draft version was evaluated in a qualitative study using think-aloud protocols and semistructured interviews. Four students from the 8th grade, two students from the 10th grade, and two students from the 11th grade (eight students in total) were asked to fill out the questionnaire in a one-to-one situation with the interviewer. While filling out the questionnaire, students were asked to think aloud in order to gain data about their reasoning while choosing answer options. The interview sessions were carried out in September 2016. All students were interviewed in an individual setting. After each block of 5 or 10 items, the students were explicitly asked about their understanding of central notions such as "model," "original," "property," "theory," "assumption," "hypothesis," or "variable" and were asked to justify their decisions. The students were also asked to reword the introductory texts. The interviews were analyzed with special regard to cognitive validity (Thelk & Hoole, 2006). That is, we tried to gain data on whether the reasoning processes corresponded to the construct the items purport to assess, which in our case is NOSI views. This means that for each item, we analyzed whether the agreement or disagreement on the Likert scale was in line with the reasoning processes or the justifications given by the students. To perform the analysis, we used the software ELAN, which is freely available and supplied by the Max Planck Institute for Psycholinguistics (Hellwig, 2018).
Using a simple dichotomous two-step classification system, we classified whether a student agreed or disagreed with a certain item and whether the agreement or disagreement involved reasoning associated with the view the item is supposed to assess (process validity given) or whether the reasoning showed aspects that had nothing to do with the view, such as misinterpretation of the wording (process validity not given). Notions that were not understood by students or that showed potential to bias the response behavior were replaced. Items that showed signs of lowered validity were reworded after an interview, so that each student read a slightly adapted version of the questionnaire. The interview questions concerning central notions and introductory texts, however, stayed the same. After a final iteration, the questionnaire was evaluated by a team of linguists at the Leibniz Universität Hannover who specialize in German as a second language (see Acknowledgement section). Their analysis focused on barriers to understanding for students with a migration background. In particular, recommendations for simplifying syntax were made and adopted. Both prestudies suggested that the main study could be carried out with an instrument having a comparatively high degree of reliability, objectivity, and validity. It should be noted, however, that this validated version is in the German language; the translations in this study, which readers find in the Supporting Information Appendix, would have to be revalidated if a reader wanted to use the questionnaire.
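The dichotomous two-step coding scheme applied to the interview data can be expressed as a minimal sketch. The field and function names are illustrative, not taken from the study's codebook:

```python
from dataclasses import dataclass

@dataclass
class CodedSegment:
    agrees: bool                # step 1: agreement or disagreement with the item
    reasoning_fits_view: bool   # step 2: does the think-aloud reasoning match the target view?

def classify(segment: CodedSegment) -> str:
    """Process validity hinges on step 2: the direction of agreement is recorded,
    but validity is granted only when the reasoning relates to the intended view."""
    if segment.reasoning_fits_view:
        return "process validity given"
    return "process validity not given"
```

The point of the two steps is that an item can fail the validity check regardless of whether the student agreed or disagreed, for example when the wording was misread.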

Sample
During the main study, data were collected in secondary schools of the highest educational track ("Gymnasien") in Hannover (Germany) and in Berlin (Germany) in 2017. In total, 802 students from the lower and upper secondary levels participated in the study. The students were aged between 13 and 18 years; 55% of the students were girls. Table 3 gives an overview of the sample.
The sample is a convenience sample comprising students in schools. These students were involved in school science learning during the period of data collection and were consequently part of a population of interest. As in medical or psychological experiments, convenience samples allow testing hypotheses on correlations or identifying types of persons, which was the purpose of this study; they do not allow conclusions on a population level, which was not the purpose of this study. Each student participated voluntarily and anonymously in the study. The study was carried out according to the legal requirements of the federal states of [state of authors] and Berlin. The questionnaire was administered in chemistry or biology lessons with the consent and in the presence of the chemistry or biology teachers.

| Data analysis
The analysis was carried out with the R package lavaan (Rosseel, 2012), Version 0.6-1. A robust maximum likelihood estimator was used to deal with nonnormal distributions. Missing data were handled with the full information maximum likelihood algorithm (Collins, Schafer, & Kam, 2001). Following the recommendations of Little (2013) and Urban and Mayerl (2014), we used measurement models of three items per view in order to avoid biased evaluations of model fit: three items build a just-identified measurement model and help to avoid arbitrarily improved model fit. For every single view, we chose the items with the highest factor loadings in the measurement evaluations. Three reasons justify this three-item approach. First, the items within each scale are formulated very homogeneously (e.g., "if scientists carry out an experiment to find out whether a variable affects a property, then …they change all the variables that could have an impact at once," with Items 1–3 worded almost identically), so we are confident that content validity concerning each view is maintained with three items as well. Second, the analysis of item discriminations (corrected item-total correlations) for each of the 10 scales revealed an average (calculated as median) discrimination of r mdn (items) = .51, confirming a comparatively homogeneous functioning of the scales. Third, reliability analysis showed McDonald's ω values in the range 0.64 < ω < 0.73, indicating that, for a relatively "fuzzy" construct like views, the three-item scales are quite homogeneous. As we administered Likert items, the probability that a student who marks "I agree" (= 3) on one item also chooses "I agree" (= 3) on the other items is quite high. As we then used the scale means (and not the scale sums) for the LCA, we assume that little information about the students' views is lost by this approach.
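The two psychometric checks mentioned above, corrected item-total correlations and the per-student scale means later fed into the LCA, can be sketched in a few lines. This is an illustrative sketch in Python with hypothetical response data; it is not the authors' actual analysis code (which used R and lavaan), and the function names are our own.

```python
import numpy as np

def corrected_item_total(responses):
    """Corrected item-total correlation (item discrimination) per item:
    each item is correlated with the sum of the *other* items in its scale."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    discriminations = []
    for i in range(n_items):
        rest = np.delete(responses, i, axis=1).sum(axis=1)
        discriminations.append(float(np.corrcoef(responses[:, i], rest)[0, 1]))
    return discriminations

def scale_means(responses):
    """Per-student scale mean (not sum), used as the per-view LCA input."""
    return np.asarray(responses, dtype=float).mean(axis=1).tolist()

# Hypothetical 4-point Likert responses of six students on a three-item scale
data = [[3, 3, 2], [4, 4, 4], [1, 2, 1], [2, 3, 2], [4, 3, 4], [1, 1, 2]]
```

A scale would be considered homogeneous if its discriminations are acceptably high; the scale means then serve as the per-view scores entering the mixture models.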
Moreover, as shown in Section 5, the factor loadings are in a comparatively good range. The items remained stable in all five final model estimations. Readers should note that we also carried out the structural equation modeling (SEM)-based analysis using five items per scale.
The results and implications are in line with those reported in the following and are part of the Supporting Information Appendix. In order to evaluate model fit, we used the comparative fit index (CFI), the root-mean-square error of approximation (RMSEA), and the standardized root-mean-square residual (SRMR). While an RMSEA <0.05 indicates a good model fit, Schreiber, Nora, Stage, Barlow, and King (2006) argue that an RMSEA <0.06 and a CFI >0.90 (or, better, >0.95) should be given. We also relied on the χ²-difference test, the Akaike information criterion (AIC), and the sample size-adjusted Bayesian information criterion (BIC). Latent correlations were used for analyzing the correlation structure between naïve and informed views.
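For orientation, the two cut-off-based indices can be computed from the model and baseline χ² values. Below is a minimal Python sketch using the standard ML-based formulas; the numerical example values are invented for illustration and are not the fit values from Table 4.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """Root-mean-square error of approximation for a model with
    test statistic chi2, degrees of freedom df, and sample size n."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index: noncentrality of the target model (m)
    relative to the baseline/null model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0

# Invented example: chi2 = 450 on df = 400 with n = 802 students,
# baseline model chi2 = 5000 on df = 435
print(rmsea(450, 400, 802))      # below the 0.05 cut-off
print(cfi(450, 400, 5000, 435))  # above the 0.95 cut-off
```

Software packages such as lavaan report these (and robust variants) directly; the sketch only makes the cut-off logic in the text tangible.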
Based on the chosen models, we carried out LCA using MPLUS 8 and fitted mixture models (Hickendorff, Edelsbrunner, McMullen, Schneider, & Trezise, 2018) containing one to eight classes. To ensure that these models converge, we used 10,000 random sets of starting values for the initial stage and 20 iterations for the final stage. We compared the models with regard to their theoretical implications and different empirical indices: the AIC and BIC values as well as the Lo-Mendell-Rubin adjusted likelihood ratio test (LMR), the parametric bootstrapped likelihood ratio test (BLRT), and the Vuong-Lo-Mendell-Rubin likelihood ratio test (VL-LRT) (Asparouhov & Muthén, 2012; Nylund, Asparouhov, & Muthén, 2007). Additionally, we used the entropy index, which ranges between 0 and 1, as an indicator of how certain the classification of single individuals is, with 1 indicating better classification. Based on these indices and theoretical interpretability, we decided on the number of classes. The approach of using scale means, after having checked for reliability, has been applied in the field of epistemic cognition by Kampa, Neumann, Heitmann, and Kremer (2016) and in the field of conceptual understanding by Schneider and Hardy (2013).

Table 4 presents the model fit indices for each of the five models. The results show that only Model 5, including all 10 views as 10 scales, fulfills the aforementioned SEM-based fit criteria (see Section 4). Its CFI is >0.95, its RMSEA is <0.05, and it shows comparatively small BIC and AIC values. Although it is the most complex model, and could fit better merely due to its complexity, the information criteria, which penalize more complex models, also indicate accepting this model as representing the data most adequately.
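The relative entropy index used above for judging classification certainty is computed from the posterior class-membership probabilities. The following is a minimal Python sketch of the common definition, applied to hypothetical posteriors rather than the study's Mplus output.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy index for an N x K matrix of posterior
    class-membership probabilities. Returns a value in [0, 1]:
    1 = perfectly certain assignment, values near 0 = classes
    hardly separable at the individual level."""
    n, k = len(posteriors), len(posteriors[0])
    if k == 1:
        return 1.0
    h = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))

# Hypothetical posteriors for four students in a two-class model
certain = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
```

With the `certain` posteriors the index is exactly 1; with the maximally `fuzzy` posteriors it drops to 0, which is why higher entropy values support a more trustworthy profile interpretation.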

| Comparing factor loadings
The standardized factor loadings for each of the five models are presented in Table 5. All loadings are significant at the p < .001 level.
The data show that the factor loadings are higher within the higher dimensional models. Items based on naïve or informed views can have negative loadings. This occurs in the first and third models, that is, the models that comprise naïve and informed views in one latent dimension (in the metaphor: models with naïve and informed views as two sides of one coin). This would suggest that naïve items can serve as indicators for informed views. Descriptively, however, the loadings are lower when an item is used as a negatively pooled indicator in the lower dimensional models, indicating that these items are less adequate indicators in models that merge naïve and informed views.

| Analyzing latent correlations
Following this analysis, we used Model 5 to analyze the latent correlations (Table 6). Table 6 shows highly significant positive correlations within the informed views and within the naïve views. These correlations range from a moderate to a high level. Confirming our expectations, informed and naïve views are negatively correlated at a highly significant level. To give a more tangible overview of these results, we calculated medians for these correlations. This reveals that informed views are intercorrelated at a higher level (Mdn = 0.53) than naïve views (Mdn = 0.31) or informed and naïve views (Mdn = −0.30). Although the informed views go hand in hand with each other to a certain extent, and holding informed views reduces the chances of holding naïve views, the correlations are not so high that holding informed views would automatically mean not holding naïve views.
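The median summaries reported above amount to taking medians over three blocks of the latent correlation matrix (within informed views, within naïve views, between blocks). Here is a small Python sketch with an invented symmetric 4 × 4 matrix (two informed, two naïve views), not the actual 10 × 10 matrix of Table 6.

```python
import numpy as np

def median_within(corr, idx):
    """Median of the off-diagonal correlations among the views in idx."""
    vals = [corr[i][j] for a, i in enumerate(idx) for j in idx[a + 1:]]
    return float(np.median(vals))

def median_between(corr, idx_a, idx_b):
    """Median of the correlations between two disjoint blocks of views."""
    return float(np.median([corr[i][j] for i in idx_a for j in idx_b]))

# Invented latent correlation matrix: views 0-1 informed, views 2-3 naïve
corr = [[1.0, 0.5, -0.3, -0.2],
        [0.5, 1.0, -0.4, -0.1],
        [-0.3, -0.4, 1.0, 0.3],
        [-0.2, -0.1, 0.3, 1.0]]
```

In this toy matrix the within-informed median (0.5) exceeds the within-naïve median (0.3), while the between-block median is negative, mirroring the pattern described in the text.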

| Comparing latent classes in search of profiles of NOSI views
To follow up on this interpretation, we tried to identify profiles of students' views in an LCA. Table 7 gives an overview of the model fit statistics for the eight calculated mixture models. Each model contains 10 dimensions, in accordance with the results of the SEM-based analysis.
On a descriptive level, the fit indices (log likelihood) and information criteria (AIC and BIC) show the best fit for the model containing eight classes. However, the relative group frequency is comparatively small, and the LMR and VL-LRT indicate that this model does not fit significantly better than the seven-class model. The same goes for the models containing seven, six, and five classes. The indices for Model 4 indicate that this model fulfills the requirements of a comparatively better fit and class frequencies with a more relevant percentage of students than just 1% or 2%. With regard to these criteria, we chose this model for depicting profiles between naïve and informed views. Based on the mean scores for each of the 10 dimensions, Figure 2 presents these profiles. (Note on Table 6: all correlations are significant at the p < .001 level, except the correlations between Views 2 and 10 and Views 4 and 10 (p = .002), as well as between Views 3 and 7 and Views 10 and 7 (p = .001).)

Table 7: Comparison of fit statistics, indices, and likelihood ratio tests for LCA

Figure 2 shows that the four profiles differ regarding the extent of agreement or disagreement with naïve and informed views. While Class 1 students are able to distinguish clearly between naïve and informed NOSI views, Class 4 students seem to consider naïve as well as informed views as equally plausible when it comes to describing a scientist's inquiry. Class 3 students seem to be unsure about the informed views as well as about whether scientists try something out and see if it works; in this dimension, they differ from Class 2 students. Figure 2 also shows that the profiles overlap and do not simply follow the same trend as different levels on a continuum, which, again, confirms a certain multiplicity of naïve and informed NOSI views.
In the Supporting Information Appendix, we use interview data from the qualitative pre-study to provide insights into the reasoning processes that underlie students' response behavior and their belonging to a certain profile.
6 | DISCUSSION AND IMPLICATIONS

| Summary
With regard to NOSI approaches that conceptualize views in this field as a continuum between naïve and informed views, we examined whether naïve and informed views should instead be conceptualized as separate dimensions, at least when it comes to large-scale assessment. The results indicate relatively clearly that treating naïve and informed views as separate dimensions is justified. In the metaphor of the article's title, naïve and informed views correspond more to coins of different currencies than to two sides of the same coin. The results of the SEM-based model testing, the correlation and factor loading analyses, the LCA, and the interview data all support this conclusion for our approach. In the following, we discuss these results in terms of three implications and derive directions for future research.
6.2 | Implication 1: Negatively and positively worded items may decrease construct validity in NOSI assessments and lead to a loss of information

Our results imply that researchers may gain more relevant information on students' views by distinguishing between naïve and informed views, both in the conception of the views themselves and in the conception of their instrument. Studies that do not take these views into account independently of each other may lose relevant information. Using positively and negatively worded items for assigning students to one continuous NOSI scale in a particular dimension may mix up naïve and informed views. We illustrate this idea using three fictive students. One student with an average score of 2 on a single continuous scale could correspond to a student (Number 1) who rather agrees with informed views and strongly agrees with naïve views, to a student (Number 2) who strongly disagrees with informed views and rather disagrees with naïve views, or to a student (Number 3) who rather disagrees with informed views and rather agrees with naïve views. Thus, a single scale merging naïve and informed views may conflate different knowledge profiles without being able to distinguish them. Figure 3 presents this fictive example graphically.
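This collapse of distinct profiles onto one merged score can be made concrete in a few lines. The sketch below assumes a four-point Likert scale coded 1 (strongly disagree) to 4 (strongly agree) with naïve items reverse-coded onto the informed pole; this coding is our illustrative assumption, not the study's published scoring.

```python
def merged_score(informed, naive, scale_max=4):
    """Mean of an informed item and a reverse-coded naïve item on a
    1..scale_max Likert scale, i.e., a single merged continuum."""
    return (informed + (scale_max + 1 - naive)) / 2

# (informed response, naïve response) for the three fictive students
students = {
    "student_1": (3, 4),  # rather agrees informed, strongly agrees naïve
    "student_2": (1, 2),  # strongly disagrees informed, rather disagrees naïve
    "student_3": (2, 3),  # rather disagrees informed, rather agrees naïve
}
scores = {name: merged_score(i, n) for name, (i, n) in students.items()}
# All three merged scores come out identical (2.0), although the three
# separate (informed, naïve) response patterns all differ.
```

Keeping informed and naïve responses on separate scales, by contrast, preserves the three distinct profiles that the merged score erases.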
Correspondingly, quantitative studies with many students in the middle of a scale might have students with different profiles between informed and naïve views in their sample. They might not be able to identify these students because naïve and informed views are put together on one single dimension. This applies to internationally published studies using Likert-scaled items, such as the studies on views and beliefs by Conley, Pintrich, Vekiri, and Harrison (2004), Kampa, Neumann, Heitmann, and Kremer (2016), Harrison, Duncan Seraphin, Philippoff, Vallin, and Brandon (2015), and Neumann, Neumann, and Nehm (2011), who refer to the instrument of Lombrozo, Thanukos, and Weisberg (2008). It might be fruitful to deepen the analyses in these studies, or to develop these or further instruments, with regard to a more differentiated treatment of negatively and positively worded items. It may be discussed, or even questioned, whether these studies represent the constructs of naïve and informed views adequately with regard to construct validity. Future questionnaire development may increase construct validity by using separate scales for naïve and informed views. This may be one important piece in the puzzle of increasing the validity of Likert-based instruments in the field of students' epistemology (Sandoval, Greene, & Bråten, 2016).

6.3 | Implication 2: Models for describing students' cognition based on a "ladder metaphor" may be oversimplified

In a broader view, one may even ask how models for NOS and NOSI views that go back to a "ladder metaphor" can be justified with regard to this evidence. These models, which describe students' cognition in the field of epistemology, define ordered levels of increasingly adequate understanding rather than a continuum.
For example, Carey, Evans, Honda, Jay, and Unger (1989) describe Levels 1–3 that correspond to making no distinction between ideas and experiments, distinguishing between ideas and experiments, and recognizing the cyclic, cumulative NOS. Mesci and Schwartz (2017, p. 334) describe levels of a NOS continuum from naïve, over mixed, to increasing levels of understanding. The authors state that if "participants demonstrate inconsistent views about an aspect, they were placed in the (+) range of the continuum and considered to hold 'mixed' views". Also, Lederman et al. (2014) describe, for the widely used VASI questionnaire, three levels for categorizing views, distinguishing between naïve, transitional, and informed views within an NOSI aspect. Regarding these levels, Mesci and Schwartz (2017, p. 335) argue that the "use of a continuum enables identification of the 'in between' to be represented as such. 'In between' are those perspectives that do not fully align with 'naïve' or 'informed'. Likewise, the continuum enables relative representation of views within the 'informed' range (less informed/more informed/even more informed etc.)".

Figure 3: Three fictive students show how researchers might lose information and represent students' views inadequately by mixing up items containing adequate and naïve views. All three students would get the same score on a common scale although they might hold different adequate and naïve views. [Color figure can be viewed at wileyonlinelibrary.com]
With regard to this study, it could be justified to treat the "steps of the ladder" as single dimensions, leading to models that rely on a "more of the informed and less of the naïve" logic. Thus, naïve and informed views could be conceptualized independently of each other. Before questioning the ladder models, however, we believe that more qualitative and quantitative research on the interrelatedness of naïve and informed views is necessary.
It might be that more advanced students, such as Hendrik from our example (see Supporting Information Appendix), are able to relate the contradicting and noncontradicting naïve and informed dimensions. They might realize that if they consider a view in one informed dimension as being informed, they cannot consider the views in other naïve dimensions as also being informed. Less advanced students, such as Niklas (see Supporting Information Appendix), might not be able to recognize these implicit contradictions. They may hold both naïve and informed views at the same time, as they are not able to connect and to compare both views. From a modeling perspective, this means that higher dimensional models might be more fruitful in describing less advanced NOSI learners, whereas lower dimensional models might be sufficient to describe more advanced NOSI learners. Future research would have to gather more data on these assumptions with regard to discussing the appropriateness of models that use a "ladder metaphor" for describing students' NOSI views.
With regard to learning NOSI views, it would be very promising to think about how to foster informed views and reduce naïve views in an effective manner.
6.4 | Implication 3: Adequate and naïve NOSI views should (quantitatively) be conceptualized as interrelated but distinct constructs

In the following, we describe and systematize the four profiles we chose in the LCA.

Class 1—Advanced learners holding informed NOSI views: These students are able to distinguish clearly between naïve and informed NOSI views. They are very sure about informed views and disagree strongly with naïve views, including the view that scientists only try something out while investigating. These students strongly disagree with views about discovering theories randomly, about the confirmation bias, about confounded experiments, and about models being a copy of reality. Based on the most likely latent class membership, this class comprises the highest number of students (48.9%).

Class 2—Slightly unsure learners holding more informed than naïve NOSI views: These students are able to distinguish between naïve and informed NOSI views. However, they tend to only slightly agree or disagree with these views, except for the view of scientists trying something out and seeing if it works, where they are very sure about their disagreement. The peak for confounded experiments, almost reaching the scale mean, shows that they are not very sure whether scientists manipulate many variables at a time, although they are quite sure about the CVS in experiments. Based on the most likely latent class membership, this class comprises 31.2% of the students.

Class 3—Unsure learners being rather sure about informed NOSI views and tending to agree with unplanned investigations and confounding variables: These students are able to distinguish between naïve and informed NOSI views and tend to slightly agree or disagree with them. Contrary to Class 2, this also goes for the view of scientists trying something out and seeing if it works; they may think that scientists investigate randomly in some but not all cases. They are quite sure about the CVS in experiments.
However, they only disagree very slightly with confounded experiments. Based on the most likely latent class membership, this class comprises 13.8% of the students.

Class 4—Learners holding informed and naïve NOSI views at the same time: These students are not able to distinguish between informed and naïve NOSI views. Although there might be a slight trend toward informed views, their agreement and disagreement range around the scale mean, indicating that, for them, naïve and informed views are equally likely to describe how scientists carry out investigations. Based on the most likely latent class membership, this class comprises 6.1% of the students.
6.5 | Would relying on contradicting naïve and informed views have led to different results?
Critical readers may state that this result may be due to the fact that we did not define naïve and informed views as direct opposites, but in a more or less complementary manner. Our criterion was whether a naïve or informed view would contribute to a purposeful implementation of scientific investigations by the students or not. Looking at the correlation table (Table 6), we observe much higher negative correlations for views that are more direct conceptual opposites, such as View 3 ("scientists change only one variable at a time for valid experiments") and View 9 ("scientists change many variables at a time for valid experiments"), with a correlation of r = −.57 (p < .001), or View 1 ("scientists are guided by ideas and plan observations") and View 6 ("while observing, scientists try something out and see if it works"), with a correlation of r = −.59 (p < .001). These correlations are much higher than those between complementary naïve and informed views. However, it has to be stated that even the negative correlations between opposite views are very far from r = −1.0. Even opposite naïve and informed views may have a complementary function within the questionnaire, as students may express a more or less certain view by only rather agreeing with the one and rather disagreeing with the other, which would correspond to the conception that "sometimes scientists do it like that and sometimes they do it like this."

6.6 | Limitations and further future directions

One basic limitation, and the basic strength of this study, lies in its quantitative approach. Although it cannot take into account the individual understanding of a single student, it provides a clear confirmatory approach to deriving and testing models against data from a comparatively large sample. We have invested considerable effort in ensuring the validity of the questionnaire.
One limitation, which applies to all Likert-scaled questionnaires, is that the students do not have to produce an answer but rather evaluate statements that go back to different views. The evidence gained in this study therefore applies most strongly to large-scale assessment (as mentioned in the title); implications for further aspects of NOSI research have the status of data-based assumptions. We see our approach as firmly situated in this particular area, but think that our results may shed an interesting light on further NOSI research.
In addition to the possibilities mentioned above, we derive three main directions for future research. First, reanalyses of existing instruments could test "two-sides-of-the-same-coin" models against models that distinguish between naïve and informed items or views, using existing data. This could be a first step toward replicating the findings of our study. Second, mixed-method studies may try to validate the multiple informed and naïve NOSI profiles. We consider the interview data presented in this article a small first step in this direction. Having more data on the reasoning processes within these dimensions would add a substantial new perspective to the quantitative findings. It would be fruitful, for example, to learn more about how students justify their views and which scientific examples they would use in their argumentation. Third, experimental studies involving students in learning processes that focus on informed views, or on informed and naïve views, could examine which NOSI dimensions change, to what extent which students reduce naïve views when learning about informed views, and which profiles react to which intervention. This would link intervention studies on NOSI learning with basic research, and possibly trigger a new branch of NOS and NOSI research.