A systematic mapping study on crowdsourced requirements engineering using user feedback

Crowdsourcing is an appealing concept for achieving good enough requirements and just‐in‐time requirements engineering (RE). A promising form of crowdsourcing in RE is the use of feedback on software systems, generated through a large network of anonymous users of these systems over a period of time. Prior research identified implicit and explicit user feedback as key for RE practitioners to discover new and changed requirements and to decide which software features to add, enhance, or abandon. However, a structured account of the types and characteristics of user feedback useful for RE purposes is still lacking. This research fills the gap by providing a mapping study of the literature on crowdsourced user feedback employed for RE purposes. On the basis of the analysis of 44 selected papers, we found nine pieces of metadata that characterized crowdsourced user feedback and that were employed in seven specific RE activities. We also found that the published research has a strong focus on crowd‐generated comments (explicit feedback) to be used for RE purposes, rather than on application logs or usage‐generated data (implicit feedback). Our findings suggest a need to broaden the scope of research effort in order to leverage the benefits of both explicit and implicit feedback in RE.

Prior research indicates that implicit and explicit user feedback is key for RE practitioners to discover new and changed requirements and to decide what features to add, enhance, or abandon. 1 However, a structured account of the sources and types of user feedback useful for RE purposes, and of the characteristics of those feedback types, is still lacking. This paper addresses the gap by providing a systematic mapping study that reports the state of the art in using crowdsourced user feedback in RE. Following the guidelines proposed by Kitchenham et al,2 we analysed the empirical evidence published in the literature on crowdsourced user feedback in order to consolidate the understanding of this topic and to map out areas that would benefit from future research. This research makes two contributions. First, it contributes to the emerging literature on the role of crowdsourced feedback in RE. Specifically, it consolidates the empirical research published on the sources of crowdsourced user feedback employed in RE and on the ways in which this use benefited RE activities. Second, it indicates the RE subareas on which much research work has focused and those in which research is scant. To researchers, we indicate directions in which research efforts should expand. To practitioners, we present types of techniques for which considerable evidence indicates that they are safe to use and appear to be good candidates for inclusion in the practitioners' toolbox. To educators, we indicate example concepts that they might include in their RE courses in order to make students aware of those RE activities that leverage the available crowdsourced user feedback.
The rest of this paper is structured as follows: Section 2 reviews related work. Section 3 presents the scope of our review, the research questions, and the research process employed. Sections 4 and 5 present the results of this review and our discussion of them, respectively. Section 6 discusses implications, Section 7 addresses threats to validity, and Section 8 draws conclusions.

| RELATED WORK
In the areas of software engineering (SE) and information systems research, 11 literature reviews form the related work for our study [3][4][5][6][7][8][9][10][11][12][13] . For more information on those reviews, we refer interested readers to Table 9 in Appendix B. As part of preparing this paper, we compared them with respect to their research goals and scope. The 11 reviews took a variety of perspectives, eg, an organizational perspective 12 , an innovation process perspective 13 , a technical design perspective 6 , and a software development process perspective 9 . Below, we summarize these related works.
The review of Asghar et al 3 compared the methods and tools deployed for analysing user feedback and extracting app features and sentiments.
Next, the review of Guo et al 4 identified and evaluated a number of data mining techniques used in crowdsourcing environments. However, neither study is concerned with the actual use of the feedback in RE activities.
Furthermore, the systematic literature reviews (SLRs) of Hosseini et al 7 and Ghezzi et al 13 examined the crowdsourcing phenomenon across multiple disciplines and the various ways in which the notion of crowdsourcing was conceptualized in these disciplines. In addition, Zuchowski et al 12 explored the phenomenon of internal crowdsourcing in organizations, in which an organization's employees form the crowd that provides feedback used to improve the organizational systems.
Among the 11 reviews in Table 9 in Appendix B, two 8,9 explicitly looked at crowdsourcing in software development. Leicht et al 8 investigated how crowdsourcing in software development was framed as a phenomenon in the published literature and, on the basis of their analysis, developed a first theoretical understanding of crowdsourcing. They concluded that most research dealt with the development of sociotechnical systems that enable and support crowdsourcing in terms of collecting feedback from the crowd. Next, Mao et al 9 examined the challenges for crowdsourced SE and mapped out which of those challenges were addressed by existing work and which needed further research. Although the review of Mao et al 9 includes a few literature sources devoted to RE, its central focus is the broad spectrum of SE activities. Consequently, it treats RE from a holistic SE perspective rather than from the perspective of individual RE activities, which represent a finer level of granularity for literature analysis.
We found only two SLRs 5,11 that explicitly investigated empirical evidence on using crowdsourced user feedback for the purpose of requirements evolution. Both examined techniques applicable to user review repositories for apps: Rizk et al 11 examined automated tools for sentiment analysis by means of natural language processing, while Genc-Nayebi and Abran 5 focused on opinion mining techniques for the purpose of requirements evolution. Both studies focus on tools and technical solutions without discussing (a) the nature of the crowdsourced user feedback, eg, implicit/explicit, or (b) the specific ways in which feedback is used in RE activities. The present mapping study addresses these two aspects.

| Definition of key concepts and formulation of the research questions
To clarify the scope of this mapping study, we define and explain two key concepts, "crowdsourced user feedback" and "RE activity," before formulating the research questions.

| Crowdsourced user feedback
In crowdsourced RE, the involvement of end users can take a variety of forms. For example, users may generate information that becomes freely available to requirements specialists for requirements elicitation, or they may participate in distributed problem solving, finding workarounds in an application that in turn may shape the requirements for a subsequent release. In the empirical RE literature, quite a few studies report the use of repositories of voluntarily generated user feedback for discovering new and changed requirements and for deciding which features to add, enhance, or abandon. 1 For the purpose of this research, we define crowdsourced user feedback as the result of various RE tasks that the end users of a software system can perform voluntarily and communicate to other users or to the software development organization. 21 Crowdsourced user feedback is also a critical type of knowledge in crowdsourced RE. From a knowledge management perspective, a distinction is often made between two types of knowledge: implicit and explicit. 22 Furthermore, in 23 , requirements knowledge is defined to "consist of implicit or explicit information that is created or needed while engineering, managing implementing, or using requirements, and that is useful for answering requirements-related questions in any phase of a software project." Accordingly, crowdsourced feedback can be explicit or implicit. Since the literature seldom defines explicit and implicit user feedback, we draw on 24 to define explicit and implicit crowdsourced user feedback as follows:

Explicit user feedback
If the crowdsourced user feedback that the crowd provides after interacting with the software consists of visual and readable expressions (eg, text and emoticons), we call it "explicit." A typical example of explicit feedback is the comments that app users post in the Apple App Store, which are written in natural language.

Implicit user feedback
If the crowdsourced feedback is in a nonverbal format and is obtainable by monitoring application usage and context, we call it "implicit." Examples of implicit feedback are the streams of data generated by an Internet-of-things system that indicate, eg, the intensity of usage or the quality of the services provided by this system.
For these definitions, we preferred the work of Jawaheer et al 24 as a reference over other classifications (eg, Claypool et al 25 ) because of its popularity in the crowdsourcing literature, its recency, and its suitability to our research context. This systematic mapping study investigates how crowdsourced user feedback, be it explicit or implicit, is used for various RE purposes according to the published RE literature.

| RE activities
Because we are exclusively concerned with the use of crowdsourced feedback in RE activities, we define the meaning of "RE activity" as well.
We draw on the conceptualization of RE in the Guide to the Software Engineering Body of Knowledge (SWEBOK). 26 Therein, the software requirements knowledge area is concerned with the elicitation, analysis, specification, and validation of software requirements, as well as the management of requirements during the whole life cycle of a software product. SWEBOK defines these RE activities as follows:
• Requirements elicitation (RElic) is concerned with the origins of software requirements and how software engineers can collect them. 26 This activity aims to identify sources of information about the system and to discover the requirements from these sources. 27
• Requirements analysis (RA) is concerned with the process of analysing requirements to (a) detect and resolve conflicts between requirements; (b) discover the bounds of the software and how it must interact with its organizational and operational environment; and (c) elaborate system requirements to derive software requirements. 26 This activity helps developers and other stakeholders not only to understand the requirements, their overlaps, and their conflicts, but also to reconcile conflicting views and generate a consistent set of requirements. 27
• Requirements specification (RSp) establishes the basis for agreement between customers and contractors or suppliers on what the software product is to do, as well as what it is not expected to do. 26 This activity writes down the requirements in a way that stakeholders and developers can understand 27 and provides a standardized expression of software requirements.
• Requirements validation (RV) is concerned with the process of validating requirements to ensure that the software engineer has understood the requirements and to verify that a requirements document conforms to company standards and is understandable, consistent, and complete. 26 This activity checks whether the requirements are what the stakeholders really need. 27
• Requirements management (RMgt) is the control of the requirements changes that will inevitably arise. 27 This activity supports the planning of software requirements, involving communication between project team members and stakeholders, and adjustment to requirements changes throughout the course of the project. 28
These five RE activities are deemed essential to all types of RE processes, regardless of the process model an organization chooses to follow. 26 In this mapping study, we adopt the SWEBOK understanding of these RE activities. The scope of our mapping study includes research on the use of crowdsourced user feedback in these RE activities. Research on crowdsourcing platforms and their design is outside the scope of this review.

| Research questions
As already indicated, we want to explore the state of the art of existing research on the use of crowdsourced user feedback for RE purposes. To this end, we set out to answer four research questions. The answer to RQ1 is needed to investigate the sources of explicit and implicit feedback used for RE purposes. So far, crowdsourced user feedback used for RE purposes has been reported for different types of software systems and comes from various sources. We want to know which sources of user feedback researchers and practitioners are working on and which types of software prompted the generation of crowdsourced user feedback for RE purposes. Next, RQ2 is motivated by the need to understand which parts of user feedback matter to requirements specialists for RE activities. Crowdsourced user feedback usually provides diverse pieces of information, eg, comments as text and timestamps. In this paper, the information describing aspects of user feedback is defined as the "metadata" of crowdsourced user feedback. Because of the diversity of user feedback, we want to know which metadata of user feedback have been collected and utilized for various RE purposes. Furthermore, RQ3 is expected to shed light on those RE activities that actually employed crowdsourced user feedback to achieve a specific goal in RE. Answering this RQ would help us understand how useful the crowdsourced feedback was to organizations with regard to the five essential RE activities. Finally, RQ4 speaks to the generalizability of the published findings. For example, if it turned out that the majority of publications came from particular contexts (eg, geographic regions and types of software systems), then our knowledge of the use of crowdsourced feedback in RE would be limited to those contexts.

| Study search
We employed an automatic search method to search for studies in two digital libraries, Scopus and Web of Science (WoS). We chose these two electronic databases because recent bibliographic research 29,30 identified Scopus and WoS as the most comprehensive and user-friendly databases. They helped us obtain a diverse set of publications on the subject of crowdsourced user feedback.
The search strategy is crucial for a mapping study, since it affects the quality and completeness of the retrieved studies as well as the time needed to select the primary studies. Based on the study topic, we created the following search query by joining keywords with their possible synonyms. We also restricted the publication period to January 2006 through December 2017, since the term "crowdsourcing" was coined in 2006. 31

(ALL (requirements) AND TITLE (user OR app OR software) AND TITLE (review OR comment OR feedback)) AND (TITLE (requirements) AND TITLE (crowd OR crowdsourced OR crowdsourcing OR data-driven))
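To make the boolean structure of this query concrete, the Python sketch below evaluates it against a single bibliographic record. It is a hypothetical illustration, not how Scopus or WoS actually process queries: the `Record` type, its `title` and `all_text` fields (the latter standing in for the ALL field), and the whole-word, case-insensitive matching are our own simplifying assumptions. The real databases apply their own tokenization and stemming, so, for instance, this sketch does not match the plural "apps" against the term "app". The two bracketed groups are joined with AND, exactly as in the query above.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record used only for this illustration."""
    title: str
    all_text: str  # stands in for the ALL field (title, abstract, keywords, full text, ...)


def has_any(text: str, terms: list[str]) -> bool:
    """True if any term occurs in text as a whole word, ignoring case."""
    return any(re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE) for t in terms)


def matches_query(r: Record) -> bool:
    # Group 1: ALL(requirements) AND TITLE(user OR app OR software)
    #          AND TITLE(review OR comment OR feedback)
    group1 = (
        has_any(r.all_text, ["requirements"])
        and has_any(r.title, ["user", "app", "software"])
        and has_any(r.title, ["review", "comment", "feedback"])
    )
    # Group 2: TITLE(requirements) AND TITLE(crowd OR crowdsourced OR crowdsourcing OR data-driven)
    group2 = has_any(r.title, ["requirements"]) and has_any(
        r.title, ["crowd", "crowdsourced", "crowdsourcing", "data-driven"]
    )
    # The query joins the two groups with AND, so a record must satisfy both.
    return group1 and group2


hit = Record("Crowdsourcing user feedback for requirements engineering",
             "We study requirements elicitation from app stores.")
miss = Record("Mining app reviews", "requirements")
print(matches_query(hit), matches_query(miss))
```

Because every group is a conjunction, a record with "requirements" only in its abstract and no crowd-related term in its title is excluded, which is the intended narrowing of the result set.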

| Study selection
To make the study selection as objective as possible, we defined selection criteria for the study selection process. The inclusion criteria (IC) and exclusion criteria (EC) listed in Table 1 were used in the three rounds of study selection to decide whether a study should be included.
As shown in Figure 1, the first author manually reviewed the titles of the 876 publications returned from the two digital libraries by the study search (see Section 3.2). This step excluded 185 publications as duplicates, so the first selection round started with 691 papers (see Figure 1). Of these, 502 papers were excluded in the first round. The first and second authors then started the second selection round, based on abstracts. They reviewed the abstracts of the remaining 189 papers and excluded a further 98. In the third round, the two authors independently reviewed the full text of the remaining 91 papers and checked them for relevance to the four RQs. This third round resulted in the final set of 44 papers included in this mapping study.
Where the two authors' recommendations for inclusion or exclusion differed, the disagreements were resolved through discussion. Notably, the majority of the excluded papers either proposed technological solutions for collecting user feedback or applied feedback in areas other than RE.
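The selection funnel can be double-checked with a few lines of Python. This is only an arithmetic sketch: the counts come from the text, the stage labels are our own, and the number of papers removed in the third round (47) is not stated explicitly but derived as 91 - 44.

```python
def funnel(retrieved: int, removals: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return (stage, papers remaining after that stage) for each stage, in order."""
    remaining = retrieved
    result = []
    for stage, removed in removals:
        if removed > remaining:
            raise ValueError(f"{stage}: cannot remove {removed} of {remaining} papers")
        remaining -= removed
        result.append((stage, remaining))
    return result


# Counts as reported in the text; the round-3 removal is derived (91 - 44), not stated.
stages = [
    ("duplicate removal", 185),
    ("round 1", 502),
    ("round 2 (abstracts)", 98),
    ("round 3 (full texts)", 47),
]

for stage, left in funnel(876, stages):
    print(f"{stage:<22}{left:>4}")
```

The remaining counts it prints (691, 189, 91, 44) match the numbers reported for each round.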

| Data extraction
To answer the four RQs defined in Section 3.1, we used the data extraction form in Table 2 to extract data items from the 44 primary studies.
Specifically, for RQ1 and RQ3, the data items are derived from the research goals and/or main contributions of the selected studies. The data items answering RQ2 are identified directly in the empirical evaluations or experiments reported in the 44 primary studies. For RQ4.1 and RQ4.2, the data items are taken directly from the publication information and the authors' affiliations of the selected studies.

TABLE 1
Inclusion and exclusion criteria
EC1 The paper addresses the use of user feedback for machinery, not for software or information systems.
IC2 The title and abstract refer to the review topic.
EC2 The paper does not address approaches, studies, or platforms for using or processing user feedback, but rather new approaches and tools that are claimed to produce and collect feedback with the help of crowds.
IC3 The paper addresses the research questions.
EC3 The paper is a research plan or a literature review.
IC4 The paper is published in a peer-reviewed journal, conference, or workshop.
EC4 The paper is about the use of feedback for non-RE purposes, including the improvement of recommending or selecting services, apps, etc.
EC5 User feedback is not used for software requirements, either explicitly or implicitly.
EC6 The full paper version is not available for download.
Abbreviations: EC, exclusion criteria; IC, inclusion criteria; RE, requirements engineering.

TABLE 2
Data extraction form
… In which year was the study published?
RQ4.2 Author's affiliation: Which organizations were all the authors of the study working with, and in which countries were the authors' affiliations located?

| STUDY RESULTS
We performed the mapping study according to the steps described in Section 3. In this section, we report its results to answer each of the RQs defined in Section 3.1.

| Sources of crowdsourced user feedback (RQ1)
This subsection presents the distribution of explicit and implicit user feedback employed in RE, according to the origins of these two types of crowdsourced feedback. On the basis of the definitions of explicit and implicit user feedback (Section 3.1), we first classified the selected studies as shown in Table 3. Of the included studies, 93.2% (41 of 44) employed explicit crowdsourced user feedback, such as online user reviews of apps or other software products, in RE activities. In contrast, only three studies used implicit feedback for specific RE purposes.

TABLE 3
Classification of the selected studies by type of crowdsourced user feedback
Explicit: 41 studies (S1–S29, S31–S36, S38–S43)
Implicit: 3 studies (S30, S37, S44)

FIGURE 2
Distribution of the selected studies over the platforms of user feedback

Furthermore, Figure 2 zooms in and presents the distribution of the included studies over the sources of user feedback. In this mapping study, we focused on the types of software systems that the crowdsourced user feedback reported on and the platforms through which the feedback was collected. The left part of Figure 2 shows that 32 of the 44 studies used explicit user feedback on apps, ie, online app reviews. More specifically, the Apple App Store and Google Play are the two most popular app repositories in our primary studies: six of the 41 studies employed user feedback from the Apple App Store, 11 used Google Play, and 12 employed both. In addition, two studies (S1 and S23) used user feedback from unspecified Android markets, and only one study (S39) extended the collection of user feedback to the Microsoft Store.
The right half of Figure 2 shows the other platforms from which crowdsourced user feedback was collected for RE purposes. Websites (eg, Amazon, used in S27) and platforms/forums (eg, the Steam forum for action games, used in S28, and SourceForge.net for open source software, used in S21 and S22) were the two main sources from which researchers and practitioners crawled explicit user feedback on other types of software systems, accounting for seven and four studies, respectively. Only one study used user feedback from social media (ie, LinkedIn in S40).
Regarding implicit user feedback, S30 and S44 used the user behaviors of a specific app and the user logs of a specific application, respectively. Although S37 employed service performance, usage, and feedback knowledge for requirements management, it did not specify any details of the user feedback.

| Metadata of crowdsourced user feedback (RQ2)
In general, a collected dataset of crowdsourced user feedback may contain several pieces of information. However, in the set of 44 included studies, we found that not all pieces of information stored in the database were used for RE purposes. This subsection presents the metadata of user feedback that the published literature reports using for RE purposes.
In the 41 included studies employing explicit user feedback, Table 4 shows six pieces of metadata that were explored to support RE activities. All 41 studies employed text content analysis at the level of words, phrases, and/or whole sentences. However, there are some particularities: eight of these 41 studies exploited the length of the text (see the second row of record No. 1), and two studies considered the tense of verbs in the content of the user feedback (see the third row of record No. 1).
In addition, 23 of the 41 studies complemented text content analysis with the rating of the user feedback, seven with the submission date of the feedback, seven with the version number of the software system that the feedback refers to, six with the title of the feedback, and five with the total number of feedback items. Notably, some included studies employed more than one piece of metadata for RE purposes. For example, in row 4 of Table 4, six of the seven studies that account for the version number of the software (all except S25) also account for the submission date. Likewise, the five studies that report the total number of user feedback items (row 6) also specify the software version.
This combination makes good sense in the context of these six studies, because there the purpose of analysing crowdsourced user feedback is to check whether a new software release actually implemented the requirements collected from the crowd on the basis of the crowd's experience with the previous release.
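The metadata in Table 4, and the per-release comparison just described, can be illustrated with a small Python data structure. This is a hypothetical sketch: the `AppReview` and `VersionSummary` types, their field names, the assumed 1-5 star rating scale, and the toy reviews are our own inventions, not data or code from the primary studies, and real app stores expose these fields under their own names and formats. Text length is computed as a simple word count; the verb-tense feature is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class AppReview:
    """Hypothetical review record holding the explicit-feedback metadata of Table 4."""
    title: str
    text: str          # text content
    rating: int        # star rating, assumed to be on a 1-5 scale
    submitted: date    # submission date
    app_version: str   # version of the app the review refers to

    @property
    def text_length(self) -> int:
        """Length of the review text, measured here as a word count."""
        return len(self.text.split())


@dataclass(frozen=True)
class VersionSummary:
    count: int         # total number of reviews for the version
    mean_rating: float
    first_review: date


def summarize_by_version(reviews: list[AppReview]) -> dict[str, VersionSummary]:
    """Group reviews by app version so that consecutive releases can be compared."""
    groups: dict[str, list[AppReview]] = defaultdict(list)
    for r in reviews:
        groups[r.app_version].append(r)
    return {
        version: VersionSummary(
            count=len(rs),
            mean_rating=mean(r.rating for r in rs),
            first_review=min(r.submitted for r in rs),
        )
        for version, rs in groups.items()
    }


# Toy data: a crash reported against 2.0 and acknowledged as fixed in 2.1.
reviews = [
    AppReview("Crashes", "App crashes when I upload a photo", 1, date(2017, 3, 1), "2.0"),
    AppReview("Dark mode?", "Would love a dark mode option", 3, date(2017, 3, 5), "2.0"),
    AppReview("Fixed!", "Upload works now, thanks", 5, date(2017, 4, 2), "2.1"),
]
for version, s in sorted(summarize_by_version(reviews).items()):
    print(version, s.count, s.mean_rating, s.first_review)
```

Grouping by version first is what makes rating, submission date, and review count comparable across releases; a real analysis would additionally link individual requests to the release that addresses them.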
Regarding the three studies (S30, S37, and S44) using implicit crowdsourced user feedback, S30 employed behavior records of Apps and user context data (eg, location and motion information), S37 concentrated on service performance, usage, and feedback knowledge, and S44 employed user logs of the specified software.

| Types of the RE activities using user feedback (RQ3)
This subsection presents those RE activities that used crowdsourced user feedback and realized its benefits according to the published literature.
We applied the data extraction strategies defined in Section 3.4 to identify the RE activities mentioned in the included studies. Figure 4 shows the distribution of the included studies over RE activities and time, where the number in brackets under the name of each RE activity denotes the number of included studies that explicitly reported this activity. RElic and RA are the two most frequently reported RE activities, covered by 38 and 33 of the 44 studies, respectively. RElic, for example, appears as one of the keywords of both S12 and S13, is represented as "identification of requirements" in the title of S39, and is derived from the aims of S24: "we propose SAFE, a novel uniform approach to extract app features from the single app pages, the single reviews and to match them." Similarly, S14, S15, and S20 report "requirements classification," a subactivity of RA, in their titles and research objectives. Eight of the 44 studies used crowdsourced user feedback to support RMgt, whereas only two studies (S18 and S38) addressed RSp and only one study (S38) addressed RV. Table 5 gives examples of how the included studies used explicit/implicit feedback in different RE activities.
Furthermore, 72.7% of the included studies (32 studies) reported more than one type of RE activity.

TABLE 5
Examples of how the included studies used explicit/implicit user feedback in RE activities
RElic, S24, explicit user feedback: "We propose SAFE, a novel uniform approach to extract app features from the single app pages, the single reviews and to match them."
RElic, S30, implicit user feedback: "The approach captures behavior records contributed by a crowd of mobile users and automatically mines context-aware user behavior patterns, … based on the mined user behaviors, emergent requirements or requirements changes can be inferred from the mined user behavior patterns … ."
RA, S20, explicit user feedback: "This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and ratings."
RA, S37, implicit user feedback: "… frequent analysis of the SKU (service knowledge utilization) reports potentially discloses end-user's configuration and severe bugs earlier and may help to reproduce bugs, possibly resulting in shorter overall development cycles."
RSp, S18, explicit user feedback: "About one third of the feedback includes topics on software requirements and user experience, varying from shortcomings and feature requests to scenarios in which the app was helpful … feedback like feature descriptions or how-tos can be used as starting-point for documentation."
RSp, S38, explicit user feedback: "Documentation: This activity transforms the raw requirements of the experience forum into a form best suited for the software development process."
RV, S38, explicit user feedback: "Verification/validation: The inspection of form and content is supported by the experience Forum's context information."
RMgt, S19, explicit user feedback: "We devise an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings."
RMgt, S37, implicit user feedback: "Using service performance, usage and feedback knowledge to support the software development and maintenance processes and make software vendors more flexible and responsive to service performance and usage changes. … we show that by using this approach, vendors can make informed decisions with respect to software requirements management and maintenance."

We make three observations from Figure 4: (a) RElic and RA are the two RE activities in which the role of crowdsourced user feedback is most extensively explored; (b) the other RE activities, RSp, RV, and RMgt, also benefit from crowdsourced user feedback, although with less evidence; and (c) RElic, RA, and RMgt have received increasing attention in the RE community over the last 3 years.
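S20 classifies app reviews into four types (bug reports, feature requests, user experiences, and ratings) with probabilistic techniques. As a generic illustration of that kind of technique, the Python sketch below implements a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is not the implementation or feature set used in S20; the training reviews, the tokenizer, and the class name `NaiveBayesReviewClassifier` are hypothetical, and a realistic classifier would need far more labelled data and richer preprocessing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; real pipelines add stemming, stop-word removal, etc."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesReviewClassifier:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of training reviews per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, text: str, label: str) -> float:
        """Unnormalised log P(label | text) = log P(label) + sum of log P(token | label)."""
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical labelled reviews, two per review type.
train = [
    ("The app crashes every time I open it", "bug report"),
    ("Login fails with an error after the update", "bug report"),
    ("Please add a dark mode option", "feature request"),
    ("It would be great to add export to PDF", "feature request"),
    ("I use it daily to plan my runs and it helps a lot", "user experience"),
    ("Helped me track my spending during the trip", "user experience"),
    ("Great app five stars", "rating"),
    ("Awesome love it", "rating"),
]
clf = NaiveBayesReviewClassifier().fit([t for t, _ in train], [y for _, y in train])
for review in ["App crashes after the latest update", "Please add an option to export data"]:
    print(review, "->", clf.predict(review))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and smoothing keeps a single unseen token-class pair from zeroing out a class.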

| Study classification by publication venue and year (RQ4.1)
This subsection shows the venues and years in which the primary studies were published.
The 44 included studies were published in three publication types: conference, journal, and workshop. Table 7 shows the distribution of the included studies across these types. Conference papers are the most common publication type (34 studies, 77.3%), whereas workshop papers and journal papers are less common, with five studies (11.4%) each.
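The shares of the three publication types can be recomputed directly from the study counts (34 conference, five workshop, and five journal papers). The short Python sketch below does this as a consistency check; with 44 studies in total, it yields 77.3% for conference papers and 11.4% each for workshop and journal papers.

```python
counts = {"Conference": 34, "Workshop": 5, "Journal": 5}
total = sum(counts.values())
assert total == 44, "the three types should cover all 44 included studies"

# Percentage share of each publication type, rounded to one decimal place.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
for kind, pct in shares.items():
    print(f"{kind:<11}{counts[kind]:>3} studies {pct:>6.1f}%")
```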
Publication venue (type): number of included studies
International conference on evaluation and assessment in software engineering (Conference): 3
International conference on information science and security (Conference): 1
International symposium on foundations of software engineering (Conference): 2
International conference on computer and information science (Conference): 1
International conference on software maintenance and evolution: 1
International conference on software analysis, evolution, and reengineering (Conference): 1
Workshop on software evolution and evolvability at international conference on automated software engineering (Workshop): 1
International symposium on empirical software engineering and measurement (Conference): 1
Annual computer software and applications conference: …

Furthermore, Figure 5 shows …

Figure 6 shows the distribution of the included studies by authors' affiliations. We found that 41 of the 44 selected studies (93%) were authored by academic researchers from universities or research institutes. Two studies (S2 and S34, 5%) came out of industry-university collaboration, and only one study (S4, 2%) came from industry alone.

FIGURE 6
Distribution of the included studies over authors' associations

Next, Figure 7 shows the distribution of the included studies over the countries of the authors' affiliations. The authors' affiliations span 15 countries active in the crowd-based RE community. In particular, Germany, China, and the United States are the three main contributors, with 10, nine, and six studies, respectively. Moreover, five European countries (Germany, Italy, Switzerland, the Netherlands, and Poland) contributed 18 studies (40.9% of the 44 studies). Finally, eight of the 44 selected studies resulted from collaborations between countries.

| DISCUSSION
This section presents our reflection on the answers to the four RQs.

| The use of explicit and implicit feedback (RQ1)
The results of this mapping study suggest that, despite the acknowledged importance of user feedback, our knowledge of the use of explicit and implicit user feedback in RE is limited. This calls for more investigation in multiple directions.
First, regarding the use of explicit feedback, we found that more than three quarters of the included studies on explicit user feedback (32 of 41 studies) concentrated on app reviews. In fact, we know very little about the use of explicit feedback in the context of other types of software systems. More research in a variety of contexts is therefore needed to gain a more complete understanding of the benefits and applicability of user feedback in RE.
Furthermore, 29 of the 32 studies using app reviews were published between 2014 and 2017. This trend matches the worldwide rise of app culture and indicates that RE researchers are actively engaged in research on this topic.
While reflecting on the use of explicit feedback, we asked whether researchers preferred a particular platform over others. Our findings (Figure 2) do not suggest so: the numbers of studies based on the Apple App Store (six), on Google Play (11), and on both (12) do not vary much. This may indicate that neither researchers nor practitioners have a strong preference for either of these repositories.
Second, our results clearly call for more investigation into employing implicit user feedback in RE. As observed in Section 4.1, only 7% of the included studies (three of 44) used implicit user feedback. One reason could be that implicit feedback, such as usage data and user behavior, is normally stored in private databases maintained by software vendors; unlike explicit user feedback, it is not easily accessible to researchers and/or practitioners. Another possible reason is that researchers working on implicit feedback publish in other venues that have their own terminology and vocabulary and, hence, do not use the keywords that the RE community uses. For example, living labs that generate streams of behavior data may well be leveraged for RE purposes. A hint of this is the paper of Coetzee et al,30 which describes the principles of a living lab approach to design-level requirements for information platforms in rural areas of Africa. As living labs are a component of many interuniversity projects funded by European Union research agencies, we expect that more RE research based on crowdsourced feedback in living lab settings may appear.
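The percentages quoted in this subsection follow directly from the study counts. As a small arithmetic check, the sketch below recomputes them from the (part, whole) pairs reported in the text; for instance, 3 of 44 studies is about 6.8%, which the text rounds to 7%. The helper name `share` is ours, introduced only for illustration.

```python
# Recompute the shares quoted in the RQ1 discussion from the raw study counts.
# The helper `share` is a hypothetical name used only for this illustration.

def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# (part, whole) pairs taken from the counts reported in the text.
COUNTS = {
    "explicit-feedback studies based on app reviews": (32, 41),
    "app-review studies published 2014-2017": (29, 32),
    "included studies using implicit feedback": (3, 44),
}

if __name__ == "__main__":
    for label, (part, whole) in COUNTS.items():
        print(f"{label}: {part}/{whole} = {share(part, whole)}%")
```

Keeping the raw counts next to the rounded shares makes it easy to see, for example, that "more than three quarters" rests on 32 of 41 studies rather than on the full set of 44.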

| Attributes of user feedback (RQ2)
Regarding the different attributes of crowdsourced user feedback, the results presented in Section 4.2 identified nine pieces of feedback metadata.
Specifically, text content was the metadata reported the most frequently (in 41 out of 44 studies). This is not surprising, because in RE, content analysis has been widely used in the past to process the verbal output of requirements elicitation interviews.

FIGURE 7 Distribution of the selected studies over countries

Furthermore, more than 60% of the primary studies employing explicit user feedback (26 of 41 studies) combined at least two pieces of metadata for various RE purposes. The effectiveness of such combinations may well depend on the RE activity that the feedback is meant to support. However, more empirical research is needed to understand how to combine the identified metadata of user feedback to support various RE purposes, and which combinations are most effective in which research contexts and for which purposes of using crowdsourced feedback in RE.
Moreover, six of the seven studies using the software release number also used the submission date, and three studies (S5, S19, and S26) that used the submission date also reported using both the software version and the total number of reviews collected for that software. This suggests that there may be patterns in how metadata of user feedback are combined for specific RE activities. Confirming this requires more empirical research with companies.
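Statements such as "six of the seven studies using the release number also used the submission date" are co-occurrence counts over a study-by-metadata coding sheet. A minimal sketch of how such pairwise patterns could be tallied is shown below. The study IDs and metadata sets in `CODING` are hypothetical placeholders, not the data extracted in this mapping study, and `pair_counts` and `conditional_share` are names we introduce for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding sheet: study ID -> metadata the study reports using.
# These rows are placeholders, NOT data extracted in this mapping study.
CODING: dict[str, set[str]] = {
    "S_a": {"text content", "submission date", "release number"},
    "S_b": {"text content", "submission date", "release number",
            "total number of reviews"},
    "S_c": {"text content", "total number of reviews"},
    "S_d": {"text content", "submission date"},
}


def pair_counts(coding: dict[str, set[str]]) -> Counter:
    """Count, for each unordered metadata pair, how many studies use both."""
    counts: Counter = Counter()
    for metadata in coding.values():
        # Sorting gives each pair one canonical order, so (a, b) == (b, a).
        for pair in combinations(sorted(metadata), 2):
            counts[pair] += 1
    return counts


def conditional_share(coding: dict[str, set[str]], given: str,
                      also: str) -> tuple[int, int]:
    """Return (studies using both `given` and `also`, studies using `given`)."""
    using_given = [m for m in coding.values() if given in m]
    return sum(also in m for m in using_given), len(using_given)


if __name__ == "__main__":
    for (a, b), n in pair_counts(CODING).most_common(3):
        print(f"{a} + {b}: {n} studies")
    both, total = conditional_share(CODING, "release number", "submission date")
    print(f"release number -> submission date: {both} of {total} studies")
```

Pair counts reveal frequent combinations, while the conditional share mirrors the "six of seven" style of statement; linking either to specific RE activities would additionally require coding each study's activity.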

| Activities employing user feedback (RQ3)
As observed in Section 4.3, RElic and RA were the two most popular RE activities in which crowdsourced user feedback was employed. We see two possible reasons for this. First, leveraging crowdsourced feedback for RElic and RA makes clear business sense. Second, these two activities are relatively easy to investigate with the empirical research methods available for feedback analysis.
Unlike RElic and RA, RSp, RV, and RMgt have received little attention from researchers. We therefore think that more exploration is needed of how crowdsourced feedback could support these three RE activities.
Furthermore, nearly three quarters of the studies (32 of 44) reported that the use of feedback was beneficial for more than one RE activity. For example, S29 addresses requirements identification (part of RElic) and classification (part of RA) in its title and research objectives, while RElic and RMgt can be inferred from the main contribution and aims of S7, which state "We use natural language processing techniques to identify fine-grained app features in the reviews" and "The extracted features were coherent and relevant to requirements evolution tasks," respectively.
This opens up the question of which other activities could be combined to jointly benefit from the crowdsourced feedback in order to maximize its value in the end-to-end RE process.

| Demographics of the studies (RQ4)
This mapping study clearly indicates that the application of crowdsourced user feedback for RE purposes is a rising research area. Despite this growth, producing more diverse evidence on how user feedback is useful, and tracing its use to SWEBOK's fundamental RE activities, seems urgent if we want to fully understand the range of roles that feedback may play in RE. Next, it is disappointing that only two studies (2 of 44, about 4.5%) were authored by practitioners from industry. This is surprising, given that app development organizations have been actively using crowdsourced feedback for years. It also raises the question of the extent to which the results of the studies are industry-relevant and realistic. We therefore think that more case study research with companies is necessary to understand which aspects of crowdsourced feedback best feed into RE and how this happens in real life.
In addition, the 44 included studies were published in 32 different venues across multiple disciplines by authors from 15 countries in Asia, Europe, and North America. This indicates that the topic attracts researchers with a broad range of research interests and with affiliations on different continents. However, 22 of the 32 publication venues belong to the fields of RE and SE, indicating that the current application domains of this research topic are narrow and focused, although exploration in other disciplines is emerging.

| IMPLICATIONS
This mapping study has some implications for RE researchers, practitioners, and RE educators. For RE researchers, two implications stand out. First, given that RE is by nature a social process, it is surprising that no study addresses organizational questions about how RE takes place in companies that exploit crowdsourced user feedback, such as app development firms. We therefore think that the RE community needs to tackle the use of crowdsourced user feedback in RE from a "process perspective": if RE researchers want their research to be industry-relevant, they need to position their studies on the exploitation of user feedback within a larger organizational process, namely, the RE process, which is composed of coordinated activities with their own inputs and outputs. One might assume that RE based on crowdsourced feedback calls for coordination models different from those used in the RE processes described in RE textbooks. Confirming or disconfirming this requires more research. Second, from the standpoint of generalizability,33 our current knowledge of crowdsourced RE is skewed.
This is because the published scientific evidence comes from academic research that mostly treated the use of explicit feedback and originated in European and Asian contexts. The results could and should become more realistic if empirical research efforts focus on studies with industry and in other countries.
Our study has some implications for practicing requirements engineers. First, our results suggest that employing crowdsourced feedback in requirements elicitation and requirements analysis for app development is safe. We therefore think that practitioners should consider crowdsourced user feedback in their RE processes as a viable option for learning about users' needs. One can imagine that, this way, companies could quickly assemble improvement ideas for the evolution of their apps, independently of the specific life cycle models they use for app development itself. Second, an interesting question from practitioners' perspective is how to balance the voice of the crowd against the requirements that RE specialists could collect through user focus groups or user surveys. We encourage practitioners to try out these alternative techniques and compare how the resulting requirements are prioritized later on. Only then will practitioners know whether crowdsourced user feedback can repeatedly provide the most relevant requirements in a cost-effective way. Third, from a software development organization's perspective, crowdsourced feedback is a resource available for systematic use that practitioners should incorporate into the larger RE process. However, knowledge of how to leverage crowdsourced user feedback organizationally, in terms of roles, responsibilities, and processes, seems to remain tacit within organizations. This means that a practitioner who wants to learn RE techniques that use crowdsourced user feedback might do better to join an app development organization that uses such feedback than to search for RE how-to textbooks.
Our study has some implications for RE educators. To date, RE textbooks have no content dedicated to RE processes that rely on crowdsourced user feedback. In turn, many teachers in computer science schools do not provide their students with adequate knowledge of the potential value of crowd-based RE techniques. Assuming that companies' adoption of these techniques will grow, especially in the era of Internet-of-Things systems, we as teachers need to prepare students for current market developments and provide them with matching skills. This implies making students aware that using explicit and implicit crowdsourced feedback in RE is a viable approach and informing them about its strengths and limitations. At a minimum, our students should know how crowdsourced feedback is used in specific RE activities such as RElic and RA.

| THREATS TO VALIDITY
The results of this mapping study may be influenced by the coverage of the study search, bias in study selection, and personal judgment in data extraction. Therefore, following the guidelines of Shull et al and Wohlin et al,33,34 we discuss four types of threats to the validity of the review results below.

| Conclusion validity
First, for our systematic mapping study to be reproducible by other researchers, we developed a study protocol defining our search strategy and a study selection procedure based on inclusion criteria (IC) and exclusion criteria (EC). However, different researchers may interpret these criteria differently and, in turn, obtain different selection results. To reduce researcher bias, all authors discussed the study protocol, which ensured a common understanding of study selection. In addition, in the second round of study selection, the first author performed a pilot selection, and the other two authors joined the discussion to reach a consensus on the selection criteria. The second-round selection was then conducted by the first two authors. Furthermore, in the third round of study selection, the first and second authors conducted the selection in parallel and independently, and then harmonized their results to mitigate personal bias caused by individual judgments.
Second, our data extraction might influence the classification of the selected studies, as it involved the researchers' personal judgment. To mitigate this, we used a template (following Dybå et al35) to describe the data retrieved from the primary studies. The template served as input to our qualitative synthesis.
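The third-round procedure, in which two reviewers screen the same candidates independently and then harmonize, can be made auditable by listing the disagreements to be discussed and quantifying agreement beforehand. The sketch below does this for include/exclude decisions. The paper does not report an agreement statistic; using Cohen's kappa here is our assumption of a common choice, and `compare_selections` and the paper IDs are hypothetical.

```python
# Compare two reviewers' independent include/exclude decisions.
# Cohen's kappa is our choice of agreement statistic; the study reports none.

def compare_selections(first: dict[str, bool], second: dict[str, bool]):
    """Return (observed agreement, Cohen's kappa, IDs to harmonize).

    Only candidates screened by both reviewers are compared.
    True means "include", False means "exclude".
    """
    ids = sorted(first.keys() & second.keys())
    if not ids:
        raise ValueError("no candidates screened by both reviewers")
    n = len(ids)
    p_observed = sum(first[i] == second[i] for i in ids) / n
    # Expected chance agreement from each reviewer's inclusion rate.
    rate1 = sum(first[i] for i in ids) / n
    rate2 = sum(second[i] for i in ids) / n
    p_chance = rate1 * rate2 + (1 - rate1) * (1 - rate2)
    # p_chance == 1 only if both reviewers made the same decision on every
    # candidate; we report kappa = 1.0 by convention in that case.
    kappa = 1.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)
    disagreements = [i for i in ids if first[i] != second[i]]
    return p_observed, kappa, disagreements


if __name__ == "__main__":
    # Hypothetical decisions on five candidate papers.
    reviewer_1 = {"P1": True, "P2": True, "P3": False, "P4": False, "P5": True}
    reviewer_2 = {"P1": True, "P2": False, "P3": False, "P4": False, "P5": True}
    p_o, kappa, to_discuss = compare_selections(reviewer_1, reviewer_2)
    print(f"agreement={p_o:.2f} kappa={kappa:.2f} discuss={to_discuss}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most candidates are excluded; the disagreement list is what the reviewers would then harmonize.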

| Construct validity
In this work, user feedback, crowdsourcing, and requirements are the most important terms. To ensure that all authors shared an interpretation of these key terms, we discussed their definitions and reached a consensus. Moreover, to ensure high coverage of potentially relevant studies in the automatic search, we refined the search terms on the basis of a trial search conducted before the formal search.

| Internal validity
Since the data analysis in this systematic review only uses descriptive statistics, the threats to internal validity are minimal.

| External validity
The results of this mapping study concern the application of user feedback in crowd-based RE. Therefore, the presented classification of the selected studies and the conclusions drawn are valid only for this review topic. The predefined protocol helped us collect studies representative of this topic.

| CONCLUSIONS
On the basis of 44 selected publications, this mapping study provided an overview of the types of user feedback that have been employed in crowdsourced RE activities. Our study revealed the following. First, current research has mainly concentrated on explicit user feedback on apps; our knowledge of the use of feedback in other types of systems is, in turn, incomplete, and little is known about the use of implicit feedback in RE. Second, we identified nine pieces of metadata in the selected studies using explicit user feedback; however, nearly half of them are rarely used. Third, requirements elicitation and requirements analysis are the most investigated RE activities in which crowdsourced feedback was employed.
Fourth, only 7% of the included studies in this review came from industry settings, which indicates a lack of industry-university research collaborations that lead to published empirical research outcomes. Fifth, crowd-based RE using user feedback has received attention mostly from university researchers spread across 12 countries, located mainly in Europe and Asia.

APPENDIX B
COMPARISON BETWEEN RELEVANT RESEARCH AND OUR WORK
To understand the crowdsourcing research area from a system point of view.
To identify components and functions of a crowdsourcing system that can be conceptualized.
SLR on the basis of the guidelines of Kitchenham.14
No. The technical design of crowdsourcing systems is the focus of the study.
Ashgar et al 3 To understand the state-of-the-art techniques of feature extraction in sentiment analysis and opinion mining.
(1) To identify approaches for feature extraction in sentiment analysis and opinion mining.
(1) To identify the challenges and opportunities of data mining tasks using crowdsourcing, and to summarize their frameworks. (2) To formulate a general framework of crowdsourcing for data mining, which includes question design, mining process, and quality control.
Type of study and guidelines are not explicitly stated.
No. Data mining techniques are examined in terms of their own characteristics when used in crowdsourcing environments.
Leicht et al 8 To understand the state-of-the-art in crowdsourcing in software development.
To propose a fundamental framework with dimensions to structure the existing insights of crowdsourcing in the context of software development and to derive a research agenda to guide further research.
Rizk et al 11 To understand the state-of-the-art methods and tools of using sentiment analysis on mobile apps users' reviews in order to extract user requirements for building new applications or enhancing existing ones, ie, requirements evolution.
To identify published studies on sentiment analysis techniques and tools used on mobile apps reviews of users, for the purpose of requirements evolution.
SLR based on the guidelines of Kitchenham.14
Yes. The authors examine those techniques that explicitly extract user requirements for building new applications or enhancing existing ones, ie, requirements evolution.
Hosseini et al 7 (1) To understand the current status and the degree of maturity of the field and how it is being applied, when, where and by whom. (2) To identify gaps in the literature where additional research has to be conducted, as well as areas which got popularity and emphasis more than others.
(1) To identify the fields of study of the papers which have proposed a definition for crowdsourcing, the types of research conducted in those papers, the forms of study those papers have followed, the venues those papers have been published in, the demographics the researchers of those papers represent, and the commonality
SMS based on the guidelines of Petticrew and Roberts17
No. Crowdsourcing as a phenomenon across multiple fields is examined.
To come up with a definition of internal crowdsourcing and a framework for understanding.
SLR based on the guidelines of Kitchenham.14
No. The so-called "internal crowdsourcing" as an organizational phenomenon is examined.
Mao et al 9 To provide a comprehensive survey of the crowdsourcing in SE.
(1) To provide a comprehensive survey of the current research progress on using crowdsourcing to support SE activities.
(2) To summarize the challenges for crowd-sourced SE and to reveal to what extent these challenges were addressed by existing work.
Type of study and guidelines are not explicitly stated.
No. The authors examined definitions of the term "crowdsourcing" and the application of the crowdsourcing concept to the area of software engineering.
Genc-Nayebi and Abran 5 To identify proposed solutions for mining online opinions in app store user reviews, challenges and unsolved problems in the domain, any new contributions to software requirements evolution, and future research directions.
To identify mobile app store studies, the challenges faced when mining app store data according to these studies, how these challenges have been overcome, and any unsolved challenges.
SLR based on the guidelines of Kitchenham.14
Yes. The authors examined those techniques that mine opinions for requirements evolution purposes.
Morschheuser 10 To provide a comprehensive overview and outlook of the usage and study of gamification in crowdsourcing systems.
To examine and compare the characteristics of gamified crowdsourcing systems, the effectiveness of gamification in crowdsourcing, and use this information to formulate an agenda for future research.
Mapping study based on the guidelines of Webster and Watson 14 and Boell and Cecez-Kecmanovic 18 .
No. Gamification is examined in crowdsourcing systems.
Ghezzi et al 13 To revisit the definition of crowdsourcing, identify controversies and common patterns, and highlight strengths and weaknesses in current research in the field, as reported in management and social sciences.
Focus on articles with management-type implications.
Takes an innovation-as-a-process perspective.
SLR based on the guidelines of Davis19 and Short.20
No. Crowdsourcing is examined as an organizational phenomenon through a management theory lens.