VisualNeuro: A Hypothesis Formation and Reasoning Application for Multi‐Variate Brain Cohort Study Data

We present an application, and its development process, for interactive visual analysis of brain imaging data and clinical measurements. The application targets neuroscientists interested in understanding the correlations between active brain regions and physiological or psychological factors. The application has been developed in a participatory design process and has subsequently been released as the free software ‘VisualNeuro’. From initial observations of the neuroscientists' workflow, we concluded that while existing tools provide powerful analysis options, they lack effective interactive exploration and therefore require the use of many tools side by side. Consequently, our application has been designed to simplify the workflow by combining statistical analysis with interactive visual exploration. The resulting environment comprises parallel coordinates for effective overview and selection, Welch's t‐test to filter out brain regions with statistically significant differences and multiple visualizations for comparison between brain regions and clinical parameters. These exploration concepts enable neuroscientists to interactively explore the complex bidirectional interplay between clinical and brain measurements and easily compare different patient groups. A qualitative user study has been performed with three neuroscientists from different domains. The study shows that the developed environment supports simultaneous analysis of more parameters, provides rapid pathways to insights and is an effective tool for hypothesis formation.


Introduction
The process of understanding complex brain-related diseases is to an increasing degree requiring a diverse set of study data to be collected and analysed. In addition to brain imaging data, e.g. functional magnetic resonance imaging (fMRI) data, it is necessary to collect physical measurements, such as blood samples, as well as descriptions of psychological factors and states, such as depression. The last decades have seen several efforts in collecting data from a large number of subjects, including the Human Connectome Project [VEUA*12], OpenNeuro [GES*17] and the Consortium for Reliability and Reproducibility (CoRR) [ZAB*14], to name a few.
These efforts are paving the way for exploratory research in which data-driven hypotheses are formed. Exploration provides clues to multiple influences underlying disease and is, therefore, of utmost importance in the understanding of complex disease models. However, the mix of spatial neurometric data and heterogeneous clinical measurements, and the highly iterative nature of the exploratory process, make it challenging for neuroscientists to analyse and discover correlations and causal connections residing in the data. The goal of this work is to support the iterative analysis process involving selection of subject groups, for which a causal connection is suspected, computation of the brain regions that differ between them and analysis of the results. Based on findings, the process continues and is refined, spawning further questions, such as whether the found causal connection between regions is apparent in other clinical parameters. Interactive exploration of the linked chain of questions relating spatial and multi-variate data is necessary for understanding the data and supporting the formation of further hypotheses.
Various statistical tools, such as Statistical Parametric Mapping (SPM [PFA*11]) or Connectivity toolbox (CONN [WGNC12]), are commonly used to perform these types of analyses. These are mature tools supporting many statistical methods and various visualizations of the resulting statistical computations. However, they require processing time between each analysis, and thus significantly slow the iterative analysis process which ultimately hinders hypothesis formation and reasoning.
To aid neuroscientists in hypothesis formation and reasoning about their study data, we present an interactive visual environment that integrates statistical computations with interactive visualization components (see Figure 1). The visual environment provides effective ways of selecting subgroups of study subjects through the use of parallel coordinates. Using brushing and linking concepts, the view of clinical data is connected with slice views and a volume rendering of statistically filtered brain imaging data. The resulting visual environment enables the neuroscientists to, in a round-trip manner, pose queries about brain imaging data, starting from clinical measurements, and make queries about clinical measurements starting from brain regions. These concepts are illustrated in Figure 2. The resulting insights can then be further analysed in the existing toolset used by the neuroscientists. The environment is the result of an iterative participatory design process involving neuroscientists and visualization experts over the course of 3 years. This process has resulted in both knowledge about questions relevant for interactive analysis as well as which types of insights can be gained from this interactive round-trip analysis. The main contributions of this work can be summarized as:
• An understanding of what types of insights can be gained through interactive correlation analysis between spatial regions in the brain and clinical measurements.
• A qualitative study demonstrating that the participants can analyse more parameters concurrently using our environment than with existing tools, and that the presented environment rapidly leads to an intuitive understanding of multi-variate data.
• A set of lessons learned from creating an interactive visual analysis tool for the neuroscience domain.
• Release of the free software VisualNeuro, an interactive visual environment supporting round-trip bidirectional analysis of spatial and abstract data.

Related Work
The neuroscience discovery process involves analysis of a wide range of data types for multitudes of patients and healthy controls. Specifically, this work deals with functional brain activity scans in combination with clinical measurements, such as blood samples. For a more in-depth description of this type of data and the typical steps involved in the process, we refer to Jönsson et al. [JBA*19].
Most tools used for neuroscience analysis focus on statistical analysis approaches (SPM [PFA*11], Gift [CAS*05], SPSS [GS16]). The CONN [WGNC12] tool differs slightly in the sense that it also includes a set of visual representations connected to brain imaging, such as network connectivity graphs or volume slicing of functional connectivity. While CONN can be used to analyse correlations between clinical parameters and brain regions, it requires significant manual interaction when specifying which parameters or brain regions to include in a comparison. Similarly, the Freesurfer [Fis12] toolbox supports group analysis and visualization, but relies on command-line input to specify arguments. Exploring data from studies with tens or hundreds of clinical parameters is thus cumbersome using these advanced tools. The herein described visual environment uses output data from CONN, Gift and SPM and can, therefore, be seen as a complement to the tools mentioned above. The results of the analysis in our environment can also be pipelined back into these tools, in which further analysis or confirmation can be performed. The developed visual environment can thus be used as an exploratory step in the neuroscientists' existing analysis pipeline.
Figure 2: Illustration of the presented concepts for understanding patient group data. The application combines volume rendering fusion techniques for visualizing brain atlas (top left), magnetic resonance imaging (MRI) and functional MRI-derived data with statistical filtering to highlight group-differences (top right). Linked abstract visualization techniques are used to analyse and filter patient groups based on their associated multi-variate clinical data.
There is a range of tools and methods focusing on visualizing brain imaging data [JKRY12, RB19, LM19, FBKC*12]. In particular, Nguyen et al. [NEO*10] demonstrated a real-time pipeline for analysing the 3D fMRI signal during the scanning process, using an approximated method for treating the fMRI signal as an emissive light source and fusing it with a co-registered magnetic resonance imaging (MRI) scan [HLY07]. Later it has also been shown how to more accurately simulate this emissive light transport interactively in brain imaging data [JY17]. These methods, fusing fMRI and MRI data using light transport, form the basis for the 3D views in the herein presented work and we refer the reader to the multi-modal medical visualization data survey by Lawonn et al. [LSBP18] for an overview of more techniques in this domain. There are also general systems for prototyping visualizations such as Comvis [MFGH08], Paraview [Aya15] and Inviwo [JSS*19]. These tools can only provide the foundation for a tailored application and, in fact, the presented application is built using the Inviwo [JSS*19] visualization system. However, none of them link the statistically based filtering to the brain imaging data views, which we identified as important for subject group comparison. Huismann et al. [HvM*17] presented a tool using t-distributed stochastic neighbourhood embeddings (t-SNE) to compare gene expressions and their location on brain-slices. While the t-SNE views are linked with the spatial brain regions, similar to our abstract and spatial linking, their tool is designed for molecular neuroscience and can, therefore, be seen as orthogonal to the work presented herein. Radoš et al. [RSM*16] linked statistics with the parallel coordinates plot for improving interaction and, while it would not aid in analysing the 3D brain imaging data statistics, it could further enhance the understanding of, and interaction with, the clinical parameters.
This work is an extension and result of the design process and user study presented in [JBF*19]. The prototype visual environment described in [JBF*19] has since been substantially improved into a production-ready application. This process, which combines the findings of the user study with further neuroscientist collaboration, is described in Section 9.

Neuroimaging Fundamentals
Our work targets applications that investigate correlations between brain activity and other physiological or psychological factors. The brain activity is captured using fMRI, which will be briefly described in the following.
MRI is an imaging technique that uses a strong magnetic field and radio waves to measure tissues, e.g. fat, grey/white brain matter. Due to its comparably high spatial resolution, it is often used as a spatial context onto which other information is overlaid during brain analysis.

Figure 3: Illustration of the key elements, concepts and requirements identified during the design process (figure labels: clinical parameters, brain imaging, interactivity, statistics, group selection, spatial context - brain atlas). These form the basis for the visual environment targeting neuroscientists.
fMRI measures changes in blood flow to detect brain activity [OLNG90]. A scan of a subject's brain results in a 3D spatial map of the brain activity during the scan. Resting-state fMRI is captured while the subject is not performing a specific task and can be used as a baseline for comparing if a region is active during a task.
Arterial spin labelling (ASL) uses MRI to measure blood flow (perfusion) through blood vessels (arterial) [DLWK92]. It tracks blood water as it travels through the brain and captures one 'label' image and one control image. The difference between the two images is used to compute the brain (cerebral) perfusion.
Note that while the techniques discussed above measure different physiological aspects, they can be treated similarly from a visualization and analysis perspective since they are all 3D scalar fields.

The Design Process
The development of the visual analysis environment has been performed in five phases. The first phase investigated the process of understanding complex brain-related diseases through a series of interviews and observations of the workflow of two neuroscientists and one gastroenterologist. This team of scientists researched connections between the gut and the brain to better understand irritable bowel syndrome (IBS). The first phase resulted in the identification of major bottlenecks and needs in this process. In the second phase, a visual environment, addressing the major bottlenecks and needs from the first phase, was developed using an iterative participatory design process [SR12]. The third phase focused on expanding the visual analysis environment to the needs of a broader group of neuroscientists facing similar analysis challenges through a joint workshop. The workshop resulted in a prioritized list of new challenges, requirements and analysis components that would support a broader use. Figure 3 illustrates the key concepts and requirements identified during the first three phases of the design process. The fourth phase deals with validating the usefulness of the implementation of these concepts using the developed prototype application. After validating the usefulness, phase five deals with the long process of making the application production-ready.

Phase 1: Understanding the needs of neuroscientists
The aim of this phase is to both identify needs of the collaborating neuroscientists and to create a common understanding between visualization experts and the neuroscientists. This was done using interviews as well as observations of the neuroscientists while they were using their tools and lasted for about 6 months. In the following, we first briefly describe the research topic of the neuroscientists and conclude with a summary of the identified problems they face.
The group of neuroscientists involved in the first phase investigate the cause(s) of the IBS disorder. IBS is a chronic disorder affecting 7-12% of the general population and is characterized by symptoms such as pain in the gut, alterations of bowel habits, as well as psychological comorbidities such as anxiety and depression. Due to the clinical heterogeneity of IBS, there are no clear biomarkers for the disorder, so it is believed to be caused by a combination of multiple factors [CKE15]. To better understand which factors influence the development of IBS, the domain experts performed studies with, in our case, approximately 100 subjects. The study involved the collection of a wide range of measurements ranging from neurometric imaging (fMRI brain scans), to questionnaires about psychological conditions and physiological parameters. The result is a large collection of data with heterogeneous structures and formats.
The analysis of this study data can be divided into first-level analysis, on a subject-specific level, and a more demanding second-level analysis, on group-specific levels. Second-level analysis involves statistical comparison of patient groups with healthy controls through hypothesis tests such as two-sample t-tests (used to test whether the difference between two population means is significant) or ANOVAs (Analysis of Variance tests). Using statistics to identify differences between groups is essential in the neuroscience analysis process. Currently, the domain experts use SPM [PFA*11], Gift [CAS*05], SPSS [GS16] and CONN [WGNC12] for their analysis. These are all advanced tools that have good support for statistical analysis of one patient group using tables, scatter plots and colour-coded slices of the brain. However, iterating over queries is tedious and time-consuming. Each iteration includes refining patient groups, specifying spatial brain regions for comparison, finding outliers and investigating multiple medical parameters at the same time.
In conclusion, investigating the interrelation of the neurometric and the clinical data is one of the most laborious parts of the domain experts' current workflow. According to our observations, the most important aspects of this workflow are:
• Specifying and comparing patient groups or spatial brain regions is essential for the analysis. In current tools, this is a noninteractive process requiring manual configuration of study subject identifiers, the design matrix for the statistical comparison and/or the brain region in question.
• Statistical computations are essential for all steps of the analysis. Current tools have good support for statistical analysis of one patient group. However, more evolved queries are tedious and time-consuming, e.g. finding outliers requires going back to the tabular display and, if identified, potentially removing the subject(s) from the patient group followed by recalculating the statistics.
• Understanding complex diseases requires the analysis of multiple parameters at the same time. In current tools, this requires manually iterating over many possible combinations of parameters one by one. The result is that the domain experts mostly focus on a few subject groups and a few parameters during each session.
In summary, the lack of immediate visual representation hinders the ability to explore the study data and reason about hypotheses. Thus, combining interactive subject group selection based on clinical parameters and linking it to views of the brain imaging data has been identified as the most important goal for the next phase.

Phase 2: Iterative application design
The second phase focused on designing a visual environment addressing the two most pertinent aspects identified during the first phase: improved support for interaction and integration of statistical analysis. This phase was conducted using short iteration cycles and rapid prototyping of new features based on obtained feedback. The duration of this phase was approximately 1 year. The key aspects in the previous phase were addressed in the following ways:
• Group selection: Criteria based on clinical parameters are specified using interaction techniques in parallel coordinates.
• Interactive statistics: Statistics are continuously updated in background threads according to the current subject group selection. There is no button for computing statistics; the computation is linked to the selection interaction. Neurometric differences between groups are shown by highlighting regions that differ with statistical significance. This statistical filtering was inspired by CONN, where t-values are shown but there is a lack of interaction possibilities.
• Iterative analysis: Means for multi-variate clinical parameter analysis are provided through parallel coordinates. Outliers can be detected and filtered away interactively. Scatter plot matrices were also evaluated for interacting with the multi-variate data but were discarded due to screen cluttering and because the neuroscientists already use scatter plots in their existing toolset, so they did not provide much new information.
In terms of the spatial data, a key aspect is to provide context for spatial queries of brain imaging data using brain atlas label regions.
While the resulting visual environment was tailored towards the analysis of IBS, the combination of brain imaging and clinical data is common in neuroscience and presents a typical analysis challenge. The next step is, therefore, to identify what would be necessary to make the environment more generic.

Phase 3: Broadening the use cases
To explore the usefulness for a broader set of applications, and define further development of the tool, we arranged a workshop with six neuroscientists. During the workshop, each neuroscientist gave a short presentation of their research problem from a data-centred point of view including workflow, typical analysis, commonly used tools and finally the main challenges they face in their analysis steps. Our visual environment was then introduced, using data from the existing collaboration, followed by discussions. It is worth noting that there were skeptical opinions on the idea of interactive exploration in the beginning of the workshop. The immediate reaction was that it would introduce bias into their analysis. However, the participants discovered the opportunities that interactive exploration can offer during the presentation, and the workshop resulted in a prioritized list of concepts and improvements for the tool. Notably, there were two concepts that were not supported by their current tools, which also ended up highest on the prioritized list:
• The ability to interactively select a region in the brain and see how it correlates with all the clinical parameters.
• The ability to interactively select a clinical parameter and see which regions in the brain it is correlated with.
The combination of these two correlations would provide a way to analyse the clinical parameters and the brain in a bidirectional manner, as illustrated in Figure 3, i.e. seeing how a specific clinical parameter is correlated with the different regions in the brain or seeing how a specific region in the brain is correlated with all the clinical parameters.
The two identified key concepts were implemented after the workshop along with other minor suggestions for improvement and support for modalities of brain data used by multiple neuroscientists, such as resting state fMRI and independent component analysis of fMRI. The visual environment for second-level analysis presented in the following section is based on the progression of requirements and solutions presented in the three phases described above.

Phase 4: Validation
To make sure that the implementation of the identified concepts is useful to the neuroscientists, a set of open tasks was designed. These tasks were performed in a qualitative user study, which resulted in many new insights with respect to both usability and the type of statistical methods that should be used during the analysis. This phase is reflected by Sections 6-8.

Phase 5: Making a production-ready application
Based on the results of the user study, it was decided to take the prototype application into a state where it could be used on a regular basis by the neuroscientists. This is a long process involving everything from improving the functionality based on the user study to adding auxiliary features such as installers, web page for distribution, saving and loading functionality. The resulting application will be shown side-by-side with the prototype used in the user study for reference. Section 9 deals with the description of this phase with a focus on the improved functionality based on the user study.

Prototype Visual Analysis Environment
The interface to the visual environment contains two linked main parts, one representing spatial data and one representing multivariate clinical data (see Figure 4). The top region visualizes the spatial data through 2D slices and 3D volume renderings. They aim to answer questions relating to where in the brain the subject groups differ. Three different types of spatial data are used to support the spatial analysis: a template MRI brain [ABEC06] acting as a map, brain atlas label regions [MLKB03] supporting queries about specific parts in the brain and finally the subjects' brain imaging data collected or derived from scans. The subjects' brain imaging data are shown as the aggregated group average or correlation values depending on the current interaction state described in this section. The template MRI brain is shown in the background, while the atlas regions are fused using additive blending in the 2D slices and accumulation level intermixing [CS99] in the 3D volume rendering. The lower region of the application interface visualizes the multivariate clinical data and correlations through a parallel coordinates plot and a box plot. The right-hand side contains a range of settings as well as a list of the atlas label regions available. The visualization techniques described above are publicly available in the Inviwo [JSS*19] visualization framework.
The input data to the environment are clinical parameters supplied via a spreadsheet of all subjects and brain imaging data for each subject. The supported brain imaging inputs are perfusion scans, measuring the amount of blood uptake at each location in the brain, functional connectivity, e.g. correlation between one region/voxel in the brain and all other regions/voxels, and principal component analysis of fMRI, a decomposition of the most prominent signal in an fMRI sequence. Each scan is registered to either the Montreal Neurological Institute (MNI) or the Talairach coordinate space. All the presented components are linked based on subject or spatial region to provide coordinated views and answers to spatial and multivariate queries. Typically, the user interaction starts with the selection and comparison of groups, followed by deeper bidirectional analysis of brain and clinical data correlations. This section describes these user interaction flows in the context of the available components.

Defining and comparing subject groups through parallel coordinates
Forming a subject group from a data-centric perspective can be seen as putting criteria on each clinical parameter, e.g. a group can consist of subjects with low scores on depression parameters while another group consists of subjects with high scores. The simplest way of dividing patients into two groups would be to specify one threshold for each parameter. However, this approach does not convey overview information and therefore does not easily allow for outliers to be excluded. We, therefore, chose the parallel coordinates plot, which provides better flexibility and has been shown to be well suited for rapidly specifying these types of criteria [SR06]. Each clinical parameter, selected by the end user, is here represented by an axis in the plot. Furthermore, each subject is represented as one line intersecting each axis at its corresponding parameter value (see bottom part of Figure 4). Each axis has movable handles in the upper and lower parts for brushing the group selection criterion. As subjects are filtered in the parallel coordinates plot, their corresponding brain imaging data are also removed from the 2D slices and 3D rendering, enabling linked direct visual feedback on the group's average neurometric data.
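The brushing-based group selection and the linked group average can be illustrated with a minimal Python sketch. It is not the VisualNeuro implementation (which is built on Inviwo). The function names `brush` and `group_average`, the dictionary layout and the toy values are assumptions made purely for illustration, with each brain volume flattened into a list of voxel values.

```python
from statistics import fmean

def brush(subjects, ranges):
    """Return ids of subjects whose values lie inside every brushed axis range.

    subjects: {subject_id: {parameter_name: value}} (hypothetical layout)
    ranges:   {parameter_name: (lower_handle, upper_handle)}
    """
    return [
        sid
        for sid, params in subjects.items()
        if all(lo <= params[name] <= hi for name, (lo, hi) in ranges.items())
    ]

def group_average(volumes, ids):
    """Voxel-wise mean over the flattened volumes of the selected subjects."""
    if not ids:
        return []
    return [fmean(values) for values in zip(*(volumes[sid] for sid in ids))]

# Toy data: three subjects, two clinical parameters, two voxels per volume.
subjects = {
    "s1": {"depression": 3.0, "age": 34},
    "s2": {"depression": 12.0, "age": 41},
    "s3": {"depression": 4.5, "age": 29},
}
volumes = {"s1": [0.2, 0.8], "s2": [0.9, 0.1], "s3": [0.4, 0.6]}

# Brushing the depression axis to [0, 5] keeps s1 and s3.
low_depression = brush(subjects, {"depression": (0.0, 5.0)})
average_volume = group_average(volumes, low_depression)
```

Each brushed axis acts as one range criterion, and a subject belongs to the group only if it satisfies all of them, which is what makes excluding an outlier a matter of moving one handle.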
Two separate groups can be specified and compared at any given time. The active (displayed) group is set through the controls on the right-hand side (see Figure 5). While it could be useful to specify many groups at the same time it would complicate the implementation, so it was decided to first investigate ways to compare two groups before implementing support for multiple groups. Early experiments showed that spotting the brain imaging differences between two groups using side-by-side comparison was difficult, causing us to instead choose direct comparison through switching between groups. We note that analysing how different the groups are instead of where would benefit from side-by-side comparison, but would also require more space [WBWK00]. Here, 3D rotations and slice positions are unaffected when switching groups to enable direct view-based comparisons.
Visual comparison of two groups based only on their averages is still hard and it is also not possible to judge if the differences are statistically significant. For this purpose, we employ Welch's t-test [Wel47], which supports unequal sample sizes and variances. The t-test is applied to each voxel and used to filter out areas of low statistical significance through a user-specified p-value:
\[ t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{s_A^2}{N_A} + \frac{s_B^2}{N_B}}} \qquad (1) \]
Here, $\bar{X}_A$ and $\bar{X}_B$ represent the mean of the voxel in groups A and B, respectively. $s_A^2$ and $s_B^2$ are the variances for the same voxel, while $N_A$ and $N_B$ are the sample sizes of the two groups. The result of applying this equation and filtering by p = 0.25 (default is p = 0.05) can be seen in Figure 5(a). This type of comparison can be seen as a combination of temporal juxtaposition and explicit encoding [GAW*11]. While the computations can take a few seconds, they are performed in a separate thread and, therefore, do not affect the interactivity of other modules in the visual environment.
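As a rough illustration of the per-voxel filtering, the following Python sketch computes Welch's t statistic for each voxel and keeps only voxels whose two-sided p-value falls below a threshold. It is a sketch under stated assumptions rather than the application's code: the function names are hypothetical, the degrees of freedom use the standard Welch-Satterthwaite approximation (not spelled out in the text), and the p-value is obtained by numerically integrating the Student t density so that only the standard library is needed.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) of Welch's unequal-variance t-test for two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # s^2 / N for each group
    if va + vb == 0.0:                           # constant voxel in both groups
        return 0.0, float(na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the Student t density."""
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                            # steps must be even
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                         # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

def significant_voxels(group_a, group_b, p_threshold=0.05):
    """Boolean mask over voxels; each group is a list of flattened volumes."""
    mask = []
    for v in range(len(group_a[0])):
        a = [subject[v] for subject in group_a]
        b = [subject[v] for subject in group_b]
        t, df = welch_t(a, b)
        mask.append(two_sided_p(t, df) < p_threshold)
    return mask

# Toy data: voxel 0 differs clearly between the groups, voxel 1 does not.
group_a = [[1.0, 0.5], [1.1, 0.7], [0.9, 0.4], [1.05, 0.6]]
group_b = [[3.0, 0.55], [3.2, 0.6], [2.9, 0.5], [3.1, 0.65]]
mask = significant_voxels(group_a, group_b)
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly, which would usually be preferable to the hand-written integration shown here.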

Single clinical parameter and spatial data correlation
To support the neuroscientist in deeper analysis of a clinical parameter without introducing additional visual components, axis selection in the parallel coordinates plot is used as an interaction concept for enabling information on the selected parameter and its correlation with the brain imaging data. The axis selection is linked to the brain imaging views where the Pearson correlation coefficient [Fis30] between the parameter's values and each voxel's values belonging to the active group is shown (see Figure 6a):
\[ \rho_{\mathrm{voxel},\mathrm{parameter}} = \frac{\operatorname{cov}(X_{\mathrm{voxel}}, Y_{\mathrm{parameter}})}{\sigma_{X_{\mathrm{voxel}}} \, \sigma_{Y_{\mathrm{parameter}}}} \]
Here, $\operatorname{cov}(X_{\mathrm{voxel}}, Y_{\mathrm{parameter}})$ is the covariance between the voxel's and parameter's values, while $\sigma_{X_{\mathrm{voxel}}}$ and $\sigma_{Y_{\mathrm{parameter}}}$ are their standard deviations. The computations result in a volume where each voxel has a correlation coefficient $\rho_{\mathrm{voxel},\mathrm{parameter}} \in [-1, 1]$. The Spearman correlation is computed per voxel in the same way, and VisualNeuro uses it by default because it does not assume a linear relationship. In addition, only statistically significant correlation values are displayed (see Figure 6b). Here, −1 represents negative correlation and is mapped to blue colour, 0 represents no correlation and is mapped to transparent while 1 represents positive correlation and is mapped to red colour. The blue-red colour mapping has been chosen to match the conventions in the neuroscience domain where blue commonly represents negative values while red represents positive values. Again, the computations are performed in a separate thread to ensure interactivity and, once available, the result is shown to the user in the upper part of the environment.
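The per-voxel correlation described above could be sketched in Python as follows. This is an illustrative sketch, not the application's code. All function names are hypothetical. The Spearman coefficient is computed as the Pearson coefficient of average ranks, which is its standard definition. The linear opacity ramp in `diverging_rgba` is an assumption, since the text only states that −1 maps to blue, 0 to transparent and 1 to red. The p-value filtering of the correlations is omitted.

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # 0 for constant input

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))

def correlation_map(volumes, parameter, corr=spearman):
    """Correlate every voxel (across subjects) with one clinical parameter.

    volumes:   list of flattened volumes, one per subject in the active group
    parameter: list of the selected parameter's values, in the same order
    """
    n_voxels = len(volumes[0])
    return [corr([vol[v] for vol in volumes], parameter) for v in range(n_voxels)]

def diverging_rgba(rho):
    """Blue for -1, transparent for 0, red for +1 (assumed linear opacity)."""
    rho = max(-1.0, min(1.0, rho))
    return (1.0, 0.0, 0.0, rho) if rho >= 0 else (0.0, 0.0, 1.0, -rho)

# Toy data: voxel 0 rises monotonically but non-linearly with the parameter.
volumes = [[1.0, 4.0], [4.0, 3.0], [9.0, 2.0], [16.0, 1.0]]
parameter = [1.0, 2.0, 3.0, 4.0]
rho_spearman = correlation_map(volumes, parameter)
rho_pearson = correlation_map(volumes, parameter, corr=pearson)
```

In the toy data the first voxel grows quadratically with the parameter, so its rank-based coefficient is 1 while its Pearson coefficient is slightly lower. This illustrates why a rank-based measure is a reasonable default when linearity cannot be assumed.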

Atlas region correlation with clinical parameters
For neuroscientists who are interested in particular regions in the brain and would like to understand if, and how, they are correlated with the different clinical parameters, the environment provides a list with all atlas label regions (see Figure 7). Selecting one or multiple atlas regions in this list causes the correlation coefficient to be computed for the voxels in the selected regions and each clinical parameter. Thus, each parameter includes correlations with all voxels in the selected regions, resulting in a distribution for each parameter. There are many ways of visualizing such distributions, e.g. as histograms or simply plotting each point in a scatter plot. We primarily chose to use box plots [MTL78] since the neuroscientists had been using these in publications before. The box plot visualizes the median, first and third quartile and min and max of the calculated correlations.
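A minimal Python sketch of how the box plot statistics could be derived from the region correlations is given below. The names `region_boxplots` and `five_number_summary`, the flattened-volume layout and the choice of the inclusive quartile method are assumptions for illustration. The application's exact quartile convention is not stated in the text, and Pearson correlation is used here only to keep the example short.

```python
import math
from statistics import fmean, quantiles

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def five_number_summary(values):
    """Min, first quartile, median, third quartile and max (needs >= 2 values)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def region_boxplots(volumes, region_voxels, parameters):
    """One box plot summary per clinical parameter for the selected region(s).

    volumes:       list of flattened volumes, one per subject
    region_voxels: voxel indices belonging to the selected atlas regions
    parameters:    {parameter_name: list of per-subject values}
    """
    summaries = {}
    for name, values in parameters.items():
        correlations = [
            pearson([vol[v] for vol in volumes], values) for v in region_voxels
        ]
        summaries[name] = five_number_summary(correlations)
    return summaries

# Toy data: four subjects, three voxels in the selected region.
volumes = [[1.0, 4.0, 1.0], [2.0, 3.0, 3.0], [3.0, 2.0, 2.0], [4.0, 1.0, 4.0]]
parameters = {"score": [1.0, 2.0, 3.0, 4.0]}
summary = region_boxplots(volumes, [0, 1, 2], parameters)
```

Each clinical parameter thus receives one distribution of correlation coefficients, with one value per voxel in the selected regions, and the box plot summarizes that distribution.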

Structure-functional regions
The anatomical atlases are fused into the 2D and 3D views of the spatial data as contours (line in 2D and surface in 3D) to make spatial orientation easier. The atlas is displayed based on the position of the 2D slices and is highlighted using a line in 2D and a surface in the 3D view (see Figure 7 or 8). In addition, the atlas region is displayed in text along with the anatomical position for more detailed information and compatibility with other tools.

User Study
A user study was performed to investigate the effectiveness of the interactive visual environment in its intended application area. The objective was to evaluate the potential usefulness, receive feedback on fulfillment of previous requirements and gather requests for future development. A primarily qualitative approach was considered appropriate in relation to our objective [IIC*13, Yin17]. Therefore, we chose to conduct a study based on the 'think-aloud-method' in combination with a flexible interview format and observation of use.
Participants also responded to a quantitative questionnaire to further gauge their subjective opinions. Three neuroscientists from different domains took part and the study was set up to mimic realistic cases of exploration and analysis of data in their respective domains.

Participants
The three participants in the study were all employed as neuroscientists at different departments within Linköping University, colocated with the University Hospital. The participants conduct research on causes of brain-related complex diseases and are potential end users of the tool. Details of the participants and their respective data domains are presented in Table 1. All three were familiar with the tool and its potential as a concept since they were involved in the workshop (Section 4.3). Only the participant in the second row of Table 1 had been involved in the entire participatory design process and had used versions of the system prior to the study. The participants received no compensation for taking part in the study.

Procedure
Prior to the study each participant provided the experimenter with their own data which was loaded into the system. Each participant took part in the study on an individual basis, one at a time. Demographic background information was obtained, and each participant also signed an informed consent form. Participants then reviewed written instruction material and illustrations of the modules in the system before the actual session using the system began. Participants were also explicitly encouraged not to feel any pressure to give positive feedback when working with the system, although they knew that two of the experimenters were also the developers. The session using the system consisted of two parts. To begin with, one of the experimenters demonstrated the system. This scripted walkthrough provided an overview and a step-by-step demonstration of the interface components and their functionality. Participants were encouraged to ask questions during the demonstration for clarification if needed.

Figure 6: Differences between a clinical parameter and regions in the brain can be seen by selecting an axis in the parallel coordinates plot, causing (a) the Pearson correlation or (b) the Spearman correlation filtered by its associated p-value to be displayed. Here, blue and red colours depict negative and positive correlation, respectively.
The participants were given four tasks, described in detail in Section 6.3, and instructed to work with these one at a time. Order of presentation of the four tasks was not a factor, as it was clear that order of tasks would not have a negative impact on the outcome and all participants executed them in numerical order.
While working with each case, the neuroscientists were asked to 'think aloud', meaning that they should describe what they did, why they did it, what they would like to do, etc. The main objective was to obtain as much information as possible on how the system supported the neuroscientists and if anything was preventing them in their workflow and analysis processes. Therefore, they were encouraged to ask questions and request assistance at any time. One of the experimenters demonstrated the system and assisted the participant when needed. This also included asking questions to the participant when relevant. A second experimenter took notes and documented parts of the sessions by voice recording. A third experimenter was also present to assist asking and answering questions.
An interview guide was used during the study. It consisted of a set of questions covering various aspects of the requirements and concepts presented in Section 4 and Figure 3. Some of the questions were answered while participants were working. The remaining questions were discussed in a post-experiment conversation where all experimenters engaged together with the participant. Finally, each participant completed a subjective satisfaction questionnaire. The subjects responded to 11 statements and rated their satisfaction on a five-point Likert scale ranging from 1 (Not at all good/Definitely not) to 5 (Very good/Yes absolutely), with the middle point 3 meaning being unsure. The results of the questionnaire along with the covered issues are presented in Section 7.4. For each participant, the study lasted for about 2 h including all parts described above.

(2) The correlations between the voxels in the selected region(s) and every parameter are then calculated and shown in a box plot.

Tasks
Each of the four tasks was defined based on the needs and requirements identified in Section 4, while at the same time being exploratory and adaptable to the participant's own agenda. Each task is first presented here, followed by our reasoning behind the task. The participants only saw the tasks during the study.

Task 1: Select two patient groups that are relevant to your research.
This task was designed to investigate whether selecting groups through filtering in the parallel coordinates plot is intuitive, easy to do and provides enough flexibility. We were also interested in understanding the effects of interactive iterative group selection on hypothesis formation, as it was identified in Section 4.1 as important for understanding complex diseases.

Task 2: Which regions in the brain are different between the patient groups with respect to brain activity?
This case primarily investigates how to present differences between groups in terms of brain activity. Having selected two groups, the participant needs to check the 'Compare groups' option seen in Figure 5. The statistically significantly different areas between the two groups are highlighted while the other ones are suppressed. The aim is to find out what type of knowledge the neuroscientist gains from this highlighting/suppression and if there are alternative visualizations that could be useful.

Task 3: For one of the patient groups, explore correlations between the regions in the brain with the clinical parameters.
Being able to find out how and with which clinical parameters a brain region is correlated was identified at the workshop described in Section 4.3 as potentially useful. The end user needs to select an axis in the parallel coordinates plot to cause the correlation between the selected parameter and each voxel, for the selected group, to be displayed in the 2D and 3D views. This task aims to identify whether parallel coordinates axis selection is a good way of interacting, whether the presented information is useful and, if so, what type of knowledge can be gained.

Task 4: For the other patient group, and a variety of clinical parameters important to your work, explore which regions in the brain they are correlated with.
Previous tasks involve analysis going from the clinical parameters to the different regions in the brain. The purpose of this task is to investigate what insights can be gained from analysing the opposite direction, i.e. going from spatial to abstract space. The end user needs to select one or more atlas regions in the brain in order to investigate their correlation with each clinical parameter using the presented box plots. Again, this was identified as one of the novel concepts during the workshop described in Section 4.3. In addition to understanding which insights can be gained, we are also interested in finding out whether the box plot provides the expected and sufficient information.

Study Results
The three sessions performed during the user study account for about 4 h of usage. The results of this usage and the prepared interview guide are first presented, followed by the quantitative results from the questionnaire. The main focus here is on aspects that can be useful to others designing similar environments. The structure follows the order of the performed tasks, cf. Section 6.3, which also coincides with the concepts of interactive visual analysis [WH14].

Group selection
The participants with prior knowledge about their data started with selecting two groups with known correlations. They verified that the difference between the two groups matched their expectations by locating the highlighted regions and confirming that they were positioned correctly. Exploring further, a common use case was to separate two groups based on the high and low values of a particular parameter, for example comparing patients having IBS with high depression versus patients having IBS but low depression. Being able to store a patient group to return to was noted by the participants as desirable as they progressed in their group selection.
By observing the participants while using the parallel coordinates plot to select groups, it could be concluded that all participants understood how to interact with the environment. Furthermore, while the participants seemed to appreciate the overview of all subjects in the plot, e.g. through the lines crossing each axis, they did not seem to use the correlation information available when observing relationships among neighbouring axes. Providing more training or cheat sheets [WSMRB19] on how the techniques work might be necessary for better utilization. Regarding whether the tool supported them in getting an overview, one comment was 'Great overview of a huge data set and its interactions with each other. And then it leads you to more specific hypothesis'. One suggestion for improving the parallel coordinates plot was to provide additional information about aggregate values of each parameter, such as mean and quartiles, to make it easier to determine where to set filter values.
All participants stated that the visual environment allowed them to interactively analyse more parameters at the same time compared to their existing tools. In particular, they stated that seeing all the variables at the same time helped them to form hypotheses and see cross-correlations and relationships to the brain regions. As for being able to analyse even more clinical parameters, one participant suggested adding a scroll bar and the ability to move axes to keep the ones they wanted visible within the viewport. When asked how many parameters they would need to see at the same time, the answers varied between 10 and 20.
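The group selection described above, where two groups are separated by the high and low values of one parameter, amounts to applying range filters on the parallel coordinates axes. The following minimal sketch illustrates the idea. The record layout, the parameter names and the cut-off values are hypothetical and chosen only for illustration.

```python
def select_group(subjects, filters):
    """Return the subjects whose values lie inside every axis range.

    subjects: list of dicts mapping parameter name -> value.
    filters:  dict mapping parameter name -> (low, high), both inclusive.
    """
    return [
        s for s in subjects
        if all(low <= s[name] <= high for name, (low, high) in filters.items())
    ]


# Hypothetical toy cohort; 'ibs' is 1 for patients and 0 for controls.
subjects = [
    {"id": 1, "ibs": 1, "depression": 18},
    {"id": 2, "ibs": 1, "depression": 4},
    {"id": 3, "ibs": 0, "depression": 20},
    {"id": 4, "ibs": 1, "depression": 15},
]

# IBS with high depression versus IBS with low depression.
high = select_group(subjects, {"ibs": (1, 1), "depression": (12, 30)})
low = select_group(subjects, {"ibs": (1, 1), "depression": (0, 11)})
```

Storing the `filters` dictionary rather than the resulting subject list would be one simple way to support the requested ability to save a group and return to it later.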

Group comparison
The participants analysed the magnitude and type of difference in brain activity between two groups, e.g. whether the brain activity of both groups was decreasing at the same time or whether one was increasing while the other was decreasing, by switching between the two groups to view their respective averages. It was pointed out that the difference can also be represented by the t-statistic and colour-coded by its sign, which makes it possible to directly see the magnitude of the statistical differences, although it loses information about the original values. Two participants thought that both of these representations are useful and that it would be beneficial to be able to switch between them, with the t-statistic being the default, while the remaining participant did not mention anything about it.
One participant combined the atlas regions with the brain difference between two groups to understand dualities within the atlas region. This duality is illustrated in Figure 8, where the different colours in the same region indicate that the region is responsible for more than one task. Task dualities within regions provide knowledge about which regions should be divided, and how, when performing functional connectivity analysis. Mirroring of functional connectivity on two sides of the brain was also explored, for example by examining whether the colours are different in the corresponding positions of the anterior (front) and posterior (back) sides of the brain. It was noted that the functional brain activity does not always correspond to the structural brain regions given by the atlas, so it could be useful to supply regions by either drawing a sphere or providing a custom segmentation volume.
When observing the participants comparing differences in the brain, they mostly used the 2D slice views. When asked about the 3D view, the participants thought that it was useful for ensuring that they did not miss information and that it helped them to orient themselves. A richer set of interactions in the 3D view, such as being able to click to place the orthogonal slices and create/place cut planes, was pointed out as potential improvements.
Two of the participants expressed concerns with respect to verifying hypotheses since they have actively selected what they are investigating, i.e. introduced bias, and therefore cannot use, for example, the p-values for reporting. However, it could help to confirm hypotheses in cases when there are several data sets with similar patient groups available, in larger studies, or when exploring pilot studies. Furthermore, it was also suggested to add non-parametric tests due to the common case of non-normally distributed brain data.
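The between-group comparison discussed in this section rests on Welch's t-test applied per voxel. As a minimal sketch, the statistic and the Welch-Satterthwaite degrees of freedom can be computed with the Python standard library as shown below. The sketch filters on a caller-supplied critical value of |t|, which is a simplification and an assumption made here: the environment filters on the p-value, which additionally requires the cumulative t-distribution (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import statistics
from math import sqrt


def welch_t(group_a, group_b):
    """Return (t, df) of Welch's unequal-variance t-test.

    Each group needs at least two values and the groups must not both
    have zero variance, otherwise the statistic is undefined.
    """
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na  # squared standard error, group A
    vb = statistics.variance(group_b) / nb  # squared standard error, group B
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def filtered_t_map(voxels_a, voxels_b, t_critical):
    """Keep the t-value of each voxel whose |t| reaches t_critical, else None.

    voxels_a / voxels_b: one list of per-subject values per voxel.
    """
    result = []
    for a, b in zip(voxels_a, voxels_b):
        t, _ = welch_t(a, b)
        result.append(t if abs(t) >= t_critical else None)
    return result


# Hypothetical toy data: two voxels, five subjects per group.
voxels_a = [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.1, 0.9, 1.0, 1.0]]
voxels_b = [[2.0, 4.0, 6.0, 8.0, 10.0], [3.0, 3.1, 2.9, 3.0, 3.0]]
t_first, df_first = welch_t(voxels_a[0], voxels_b[0])
t_map = filtered_t_map(voxels_a, voxels_b, t_critical=2.0)
```

In the toy data the first voxel is suppressed because its groups overlap strongly, while the second voxel is kept with a negative sign, meaning that the first group has the lower mean.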

Within-group parameter/brain correlation
The third and fourth tasks investigate the bidirectional analysis between the clinical parameters and the brain. Because they are linked, we present the results of both jointly. We start with the parameter to brain analysis, where selecting a parameter in the parallel coordinates plot causes the Pearson correlation with each voxel to be visualized. While two of the participants stated that this analysis could be useful and, for example 'tells me what subgroups to focus on', they were not convinced that the Pearson correlation was the right information to present. They would like to apply a filter based on the p-value, similar to how group comparison is performed, to ensure that only statistically significant regions are displayed. For interaction, clicking the axis was sometimes mistakenly done when trying to filter groups. It was suggested that this could instead be done using a checkbox. Adding the ability to control for one or multiple other parameters was suggested as an improvement for supporting more advanced analysis.
The ability to select an atlas region and see its correlation with all clinical parameters was deemed useful by the participants. For example, to investigate a hypothesis about people with chronic pain with respect to the insula region and using the results in the next study with the same covariables. The results of the correlation with each parameter were visualized using a box plot showing the first and third quartiles, but not all participants were familiar with quartiles. Here, it was suggested to add confidence intervals. Furthermore, it was also suggested to add the ability to see a scatter plot of the data to aid the analysis of outliers' influence on the correlation computation. As noted previously, the atlas regions do not necessarily correspond to the functional regions, so the participants would also like to be able to specify customized regions in the brain.
Two participants noted that there is a risk of introducing bias when analysing the correlation between a brain region and all parameters. One participant stated that it would not currently be possible to confirm hypotheses using Pearson correlation values, but that using non-parametric correlation along with a report on the tests used could potentially change that.
All participants said that the tool supported them in gaining new insights. For example, one participant stated 'Definitely. If I had access to this tool without a doubt this would be my first phase of investigation. Before running correlations, before running the raw stats I would play with this for a couple of days and then write a whole list of specific hypotheses and define all my anatomical regions, write down some specific coordinates and then take it to real analysis. I would clean my data set with this'.

Quantitative results
The answers to the subjective satisfaction questionnaire for each participant are depicted in Figure 9. The wording in the figure has been slightly changed to clarify reporting. However, it clearly represents the statements presented in the questionnaire and answered by the participants. In some cases, notes were added by the participants to clarify their statements. For example, the ease of use was very good when 'adequately explained'. In this case, the participant was used to having a pre-defined group for comparison, while here the notion of a group is flexibly determined by the parallel coordinates filtering. The participant had no issues with forming groups once this concept was better explained. When it came to the value of interaction all participants rated the highest score and comments included 'Switching quickly between analyses … is a strong plus'. One participant stated that using the system could save weeks of work compared to using traditional methods and that the quality of the work would improve due to the ability to more rapidly explore hypotheses. Another participant was unsure if it would save time since 'I might end up exploring too much', but stated that the quality of the work would improve. The ability to confirm hypotheses was doubtful or unsure since the exploratory process might bias the result (see Section 8.1 for a discussion on this issue). The participant rating five pointed out that in case the study is based on a large enough number of subjects it would be possible to use the exploratory approach on a subset of the data and confirm the hypothesis on the other part. All participants would like to use the system in their work and would also recommend it to colleagues.

Study Discussion
The study showed that the flexibility in selecting subject groups made it possible for the participants to form and reason about hypotheses in studies including brain imaging and clinical data. While the number of participants in the study is low, they span a wide range of use cases and each had brain data stemming from their own research. The rapid hypothesis formation process was pointed out as an important factor for improving the quality of their work. A practical example provided by one of the neuroscientists revolved around the two clinical parameters 'anxiety' and 'history of abuse'. It was, through the visual display, directly possible to see that they have a potential dependency and are not independent variables, which consequently affects how they are used in regression models.
The different types of analysis, group comparison and within-group comparison between brain and clinical parameters, supported several types of queries. It could be seen both where in the brain two groups are different and in which way they are different. The brain atlas provided context but could also be used to identify whether an atlas region was involved in multiple tasks. Queries about the connection between an atlas region and clinical parameters could be explored through box plots. The formed hypotheses can serve as a starting point for further analysis in other tools, future studies or to visualize other interactions between parameters. The participants all agreed that the interactive visual environment enabled them to gain insight.

Lessons learned
The key lessons learned resulting from the iterative design process, workshop and user study are listed below, followed by more general conclusions:
• Parallel coordinates work well for subject group selection. However, axis-selection needs to be carefully implemented such that it does not conflict with filtering operations.
• Filtering based on statistical significance (p-value) is essential when exploring neurometric study data.
• Brain atlases provide context and enable richer spatial analysis but should be possible to customize.
• The atlas regions do not always overlap with the functional activity areas of the brain. Thus, it can be concluded that providing voxel-level information is important, and that aggregating the information on atlas region level for comparison could hide important information. Thus, creating individual correlation plots between clinical parameters and brain regions would not suffice for exploration.
• Linked interactive views for spatial and abstract data can be combined with statistical computations to enable rapid hypothesis formation and reasoning.
• Integration and reporting of specific statistical tests can under some circumstances enable hypothesis confirmation and thereby extend the use of the environment.
• End user needs to be aware of the balance between exploration and 'p-hacking' (discussed below).
We further conclude that interactive exploration can save many hours of time compared to using traditional methods. The integration of statistical analysis is essential for the tool to be useful, since it otherwise is difficult to analyse and compare the many 3D volumes contained within groups of patients. In addition, providing contextual information in a way familiar to the end users helps them to get started and orient themselves in the exploration environment.

Data Dredging
When investigating data to find patterns and correlations there is a risk that the data are presented as statistically significant even though there is no underlying effect. Misusing the analysis and presenting the results as statistically significant is often referred to as data dredging or p-hacking. All participants pointed out this risk, but also that, in order to know what to investigate in the next study, there is a need to explore the existing data. Thus, in reality, data exploration must be performed while, most importantly, being aware that the results of the analysis must be verified later.
The efforts during the last decades in collecting data from large numbers of subjects may decrease these concerns. Still, many smaller studies, adapted to find the cause of specific diseases, will continue to exist and have needs for exploration. In light of this, we have identified three scenarios for which the presented visual environment can be used. First, in an exploratory phase for pilot studies as described by Moore et al. [MCNS11], where the idea is to find potential patterns that can contribute to the design of larger studies. Second, in cases when there are multiple independent studies having similar data available. The tool can then have one study as input, while the results are verified in another. Third, in studies or collections of data where the sample size is large enough for the exploration to be performed on a subset of the samples. The created hypotheses can then be verified using the remaining samples.
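The third scenario, exploring on one subset and verifying on the remaining samples, can be sketched as a reproducible random split of the subject identifiers made before any exploration takes place. The function name, the 50/50 default and the fixed seed are assumptions made for illustration only.

```python
import random


def split_subjects(subject_ids, explore_fraction=0.5, seed=0):
    """Randomly split subjects into an exploration and a confirmation set.

    A fixed seed keeps the split reproducible, so that the confirmation
    set is never touched during exploration.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * explore_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical cohort of 20 subjects identified by the integers 0..19.
explore, confirm = split_subjects(range(20))
```

Hypotheses would then be formed interactively on `explore` only, and each selected hypothesis tested once on `confirm`.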
The participants included in the study were experienced neuroscientists, but future end users might not have the same awareness in their analysis. This spawns new questions on how interactive visual environments that are based on statistical group analysis should deal with data dredging. Certainly, including better statistics at the cost of computations and thereby possibly interaction, is one way forward. Another possibility is to make the end user aware of the potential pitfalls. We believe that both alternatives are interesting research questions on their own but go beyond the scope of this work.

Limitations
The study also showed that there are many areas that can be improved. All participants in the study were used to working with a variety of tools with different ways of representing information. Adapting the look and feel of the presented visual environment to be more similar to these tools would make it easier for the neuroscientists to understand how to use it. Examples include being able to change colour mappings, using t-statistics for representing difference magnitudes and switching between neurological and radiological display conventions. Other points for improvement include the ability to store settings for multiple groups, to control for one or multiple variables and to show aggregate data in connection with an axis in the parallel coordinates plot. The number of clinical parameters that could be analysed at the same time was larger than in their existing tools. The participants stated that, at any given moment, it typically would not be useful to view and interact with more than 10 to 20 parameters at the same time. However, it would help them if they could easily access all available, possibly hundreds of, parameters. The participants suggested using a horizontal scroll bar in the parallel coordinates plot for this purpose or having the ability to quickly switch the underlying spreadsheet. Another option could be to list all the parameters along with a checkbox indicating whether each should be included in the parallel coordinates plot.

From Research Prototype to Production Application
The user study demonstrated the visual analysis environment's potential in aiding the neuroscientists in their work. To further understand the long-term effects it could have on the neuroscience workflow and discovery process it is necessary to provide a ready-to-use application going beyond the research prototype used in the user study. We have therefore continuously improved the visual environment and will release the 'VisualNeuro' application at www.visualneuro.com. In the following, we provide design decisions and experiences made from taking the research prototype to a release version, which may seem like a straightforward minor task but in reality is a long and continuous process.

Data import and colour map automation
The application requires two sources of information, brain imaging and clinical data, which vary from study to study. The major task here has been to identify the information required by the user and then provide an easy way for its specification.
Brain imaging data consist of possibly hundreds of files and studies may differ in imaging type, e.g. functional connectivity or blood flow measurements. We have automated the imaging data specification such that the user only needs to provide a folder in which the subject scans reside along with what type of data the volumes represent. The specified folder is automatically searched for supported volume data files, e.g. NIfTI files, and an appropriate transfer function based on ColorBrewer [Bre0x] is selected depending on the type of data. A perceptually linear sequential transfer function is used for blood flow data while a diverging transfer function is used for ICA and functional connectivity data (see Figure 2).
The user can optionally select the clinical parameter to use for colour mapping in the parallel coordinates plot. The system then chooses an appropriate colour map based on the parameter data (categorical, continuous, diverging), again using colour maps based on ColorBrewer [Bre0x]. Categorical data have no magnitude difference, e.g. healthy or sick, so here perceptually distinct colours with only hue variation are used. Continuous data are assumed to have numerical values going from low to high, so a sequential colour map with perceptually linear intensity variation is used. Diverging data are assumed to have numerical values going from negative to positive, so the colour map is split at the zero crossing with two different colours varying in intensity.
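The automatic colour map choice described above can be sketched as a small classification step. The rule used here, non-numeric values are categorical and numeric values spanning zero are diverging, is an assumption made for illustration, since the text does not state how the application detects the data kind. The returned scheme names follow ColorBrewer's three scheme types.

```python
def classify_parameter(values):
    """Return (data kind, ColorBrewer scheme type) for a non-empty parameter."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        # No magnitude difference: distinct hues only.
        return "categorical", "qualitative"
    if min(values) < 0 < max(values):
        # Values on both sides of zero: split the map at the zero crossing.
        return "diverging", "diverging"
    # Values running from low to high: single-hue intensity ramp.
    return "continuous", "sequential"


# Hypothetical clinical parameters.
kinds = {
    "diagnosis": classify_parameter(["healthy", "sick", "sick"]),
    "age": classify_parameter([23, 41, 35]),
    "score_change": classify_parameter([-1.5, 0.3, 2.0]),
}
```

A production version would probably also need to handle numerically coded categories, such as 0/1 group labels, which this simple rule would classify as continuous.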

Improving usability
Several changes have been made in the Visual Analysis Environment based on the user study. This includes small changes in the interface as well as novel functionalities, which are described in the following.
Improved group comparison-As concluded after the user study, making comparisons based on the p-value filtered group average was not intuitive to the participants (cf. Section 7.2 and see Figure 5a). Thus, an improvement made in VisualNeuro is to show the t-value in Equation (1) instead of the group mean (see Figure 5b). The main usage obstacle identified in the user study has thus been removed by integrating p-value based t-value filtering in the parameter to brain comparison. Another benefit of this approach is that there is no longer any need to switch between groups during comparison as the t-value is the same independent of the active group, which reduces the application complexity.
Display mode selection-It was noticed that the participants sometimes had difficulties in keeping track of the brain imaging information that was displayed, e.g. is statistical filtering applied or not. To make the type of spatial information currently displayed more clear, a drop-down menu replaces the checkbox for enabling/disabling filtering. While this may sound subtle, it reduces the cognitive burden of thinking in terms of on/off by simply stating the mode instead. At the same time, it enables the user to switch between more modes (group average, group t-test, parameter t-test), which was not possible before.
Complexity reduction of the interface-While the brain region selection view and statistical comparison with clinical parameters (box plots) were deemed useful, they were not used all the time. Thus, it was decided to move the box plots into a separate window that can be opened on demand (see right part of Figure 1). Note that linking and brushing techniques are still applied, meaning that selected regions are highlighted in the spatial views. The application thus becomes less cluttered. Other improvements for reducing complexity include adding search/sort functionality to the brain region list, since it contains more than 100 items, computing and moving the cross-hair to the centre of the selected region upon selection, and providing only two parameters for changing the transfer function, threshold and opacity. The user-supplied threshold and opacity are mapped to move the individual control points in the transfer function under the hood, regardless of whether the data are diverging or not. The threshold σ_t adapts to diverging transfer functions σ(v) by filtering both negative and positive values v. The user-supplied opacity, on the other hand, linearly adjusts all transfer function control points except the threshold control point(s), which are kept at zero opacity.
Flexible look and feel configuration-Based on further iterations with the neuroscientists after the user study, a set of additional improvements have been integrated. One example is the ability to switch MRI background in the 2D view. Here T1, white and dark matter were deemed useful while others, such as fat and myelin, were disregarded. Another example is the ability to save and load states, since this is necessary for day-to-day usage.
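The two-parameter transfer function control described under complexity reduction can be illustrated with a strongly simplified sketch. It assumes that the threshold is applied to the magnitude of the voxel value, so that negative and positive values are filtered symmetrically, and that opacity rises linearly from zero at the threshold to the user-supplied opacity at a maximum magnitude. Both the linear ramp and the `v_max` parameter are assumptions; the application moves the control points of its own transfer function, which is not reproduced here.

```python
def voxel_opacity(v, threshold, user_opacity, v_max):
    """Opacity of voxel value v under a hypothetical simplified transfer function.

    threshold:    magnitudes at or below this value are fully transparent.
    user_opacity: opacity reached at magnitude v_max (scales the whole ramp).
    v_max:        largest expected magnitude in the data.
    """
    magnitude = abs(v)  # same treatment for negative and positive values
    if magnitude <= threshold or v_max <= threshold:
        return 0.0  # threshold control points stay at zero opacity
    ramp = (magnitude - threshold) / (v_max - threshold)
    return user_opacity * min(ramp, 1.0)


# Hypothetical diverging data in [-1, 1] with threshold 0.2 and opacity 0.8.
samples = [-1.0, -0.1, 0.1, 0.6, 1.0]
opacities = [voxel_opacity(v, 0.2, 0.8, 1.0) for v in samples]
```

For sequential data, which contain no negative values, taking the magnitude has no effect, so the same function covers both cases.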

Improving rigour and analysis
Two main technical aspects were pointed out during the user study when it came to being able to use the tool for analysis. First, the need for the t-value in between-group comparison. This point was addressed in Section 9.2. Second, non-parametric statistics should be used instead of Pearson correlation to be able to use the information in the brain to parameters (box plot) comparison, cf. Section 7.3. For this purpose, VisualNeuro integrates the Spearman rank correlation [Spe04], see Figure 7(b), which reflects the monotonic relationship between two variables regardless of whether it is linear or not. Furthermore, it was determined essential to also filter the correlations by their associated p-value to only show the statistically significant ones. Thus, the region to parameter (Figure 7b) and parameter to region (Figure 6b) views only display the correlations whose p-values are below the specified p-value. By making these large integration changes, the two main technical statistical analysis obstacles identified during the user study have been removed.
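The difference between the two correlation measures can be made concrete with a short sketch: the Spearman rank correlation is the Pearson correlation computed on the ranks of the values, with tied values sharing their average rank. The code below is a standard-library illustration of this definition, not the implementation used in VisualNeuro, and it leaves out the p-value computation.

```python
import statistics
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0  # assumption: a constant input is treated as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / denom


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (y = x cubed).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

For this toy example the Spearman coefficient reaches one because the relationship is perfectly monotonic, whereas the Pearson coefficient stays below one because the relationship is not linear.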
In terms of visualization rigour, VisualNeuro uses perceptually motivated colour maps based on ColorBrewer [Bre0x]. In addition, temporal juxtaposition has been reduced in favour of spatial juxtaposition due to its superiority in visual comparison [GAW*11]. More specifically, box plots of each group are shown next to each other instead of requiring the user to change the active group as seen in Figure 7 and, as discussed before, the use of t-value reduces the need for switching groups during comparison.

Neuroscientist feedback
According to the collaborating neuroscientists, the most important improvement of the production application is the deeper integration of statistical filtering. They further state that the use of p-values is deeply integrated into their work and that having this filtering directly incorporated with the interactive visualization tools is essential for inclusion in their workflow. Also, the group t-test provides a general visual idea about the regions of the brain that could be affected and then, in combination with the other parameters, they can start to make associations with other parameters within each group. For example, if they see that group differences in brain regions seem to gather in subregions of the emotional/limbic system of the brain, and that these regions are more closely associated with, for example, anxiety or depression than they are with a bacterium from the microbiome, then they can begin to generate hypotheses concerning the patient population in terms of other factors such as a history of child abuse or trauma.

Conclusion
Exploration-driven research is becoming increasingly important as the number of parameters and subjects grows in neuroscientific studies. However, as identified during the presented design process, existing tools provide versatile statistical measures for hypothesis confirmation but are not designed for interactive hypothesis formation and reasoning.
We identified that a round-trip query process, in which questions relate brain imaging data to clinical parameters and vice versa, has the potential to drive a hypothesis formation and reasoning process if supported by statistical measures and filters. An interactive visual environment was designed to demonstrate this round-trip query process and a qualitative user study was performed to confirm its usability and identify the different types of insights gained during its usage. The qualitative user study showed that the interactive workflow, combining visualization and statistics, is an enabling factor for the participants in exploring their neuroscience study data. A variety of MRI/fMRI-derived data was used in the study, including cerebral blood flow, functional connectivity and Principal Component Analysis (PCA) of fMRI, which also demonstrates the applicability of the environment in a wide range of neuroscience use cases. Examples of insights gained are which brain regions differ between groups, symmetries in these differences and parcellations of atlas regions with respect to functional connectivity. The fact that all participants would like to use the presented visual environment for gaining insights about their study data stresses the importance of the exploratory gap covered by the presented interactive round-trip query process. We provide a summary of lessons learned during this process that can be used by others developing tools for interactive neuroscience data analysis.
The concepts implemented in the prototype visual environment were shown to be highly useful to neuroscientists, but the prototype also had limitations: it was not suited for use on a day-to-day basis, and it needed other types of statistics for better comparison of groups and for filtering during within-group comparison. The major limitations of the research prototype were addressed and a large effort was made to make the tool available for use on a regular basis. We described the major changes made to take the research prototype to the production-ready VisualNeuro application, which is made freely available in connection with this work.