A framework for managing ethics in data science projects

The field of data ethics is concerned with the ethical considerations surrounding data, algorithms, and associated practices, with the aim of identifying ethical solutions. Applying ethical principles to the handling of data, algorithms, and practices can help identify and delineate ethical quandaries within the domain of data science. This study focuses on data ethics, specifically as it pertains to data collection, data model construction, evaluation, and deployment. It introduces a comprehensive framework designed to facilitate the management of ethical considerations in data science projects. To validate the framework, we conducted a case study and present our observations on its practical implementation. The scope of future research is also described.

results for business and society, such as risk management, the detection of tax fraud, the prediction of terrorist acts, or in a commercial context, boosting profitability, raising revenues, or saving money.
Data science has enabled citizens to benefit from better, more effective services. However, as with any technology, data science also has drawbacks, including increased privacy invasion, data-based discrimination against vulnerable groups, and data-driven decision-making without justification. 8-10 The algorithms that underlie automated systems in data science projects frequently produce unfair results. 11,12 This can have a devastating effect, even on products developed in data science projects with the best of intentions. 12 A data scientist uses preloaded libraries to call on existing datasets, and data science projects require critical decisions at every step from data collection through model development. Ethics in data science is arguably even more important for managers in businesses where data science practices are a key asset. 13 Because ethical considerations are becoming increasingly relevant, the members of a data science project team need to be familiar with the theories, procedures, and cases of ethics in data science. Data scientists and managers are not inherently unethical, but neither are they trained to think these issues through. Well-known examples include Microsoft's racist chatbot; Google Photos' labeling of a picture of Black people as gorillas; 14,15 Apple Card's slow response to accusations of discrimination against women; 16 Amazon's apparent discrimination against women; 17 and the Cambridge Analytica debacle involving Facebook data. 18
Being ethical has been promoted as a life goal, but it also brings significant societal and corporate benefits. The reputational and financial hazards of data science ethics are enormous. If data scientists do not get the ethical issues right, they risk having their growth (or possibly the growth of their firm) completely halted and running into difficulty during investment discussions or due diligence. Financial hazards are closely correlated with reputational risks: lawsuits and settlements can result in significant financial losses, just as unethical data science can cause emotional and bodily suffering. Ethical reasoning may also improve data science models, possibly yielding more accurate forecasts or greater user acceptance. Beyond better data models, ethical behavior can be a powerful marketing tool, as Apple's increasing emphasis on the privacy of its products shows. Thus, data science ethics can increase business value through higher profits, lower costs, or increased revenue.
Data ethics is the study and promotion of ethical practices with regard to data (including its creation, use, sharing, dissemination, processing, curation, and collection), algorithms (such as robotics, deep learning, machine learning, intelligent agents, and artificial intelligence), and corresponding practices (such as professional codes, hacking, programming, and responsible innovation), with the aim of promoting right conduct or right values. 6 Thus, data science ethics can be grouped under the ethics of data, algorithms, and practices. 6 In this paper, we focus primarily on data ethics (data collection, building a data model, and its evaluation and deployment).
The paper is organized as follows: Section 2 discusses previous research on ethical issues in data science initiatives. Section 3 describes our ethics framework for data science projects. Section 4 presents the case study conducted to evaluate the framework. Section 5 outlines our observations on the use of the framework, followed by a conclusion and the scope for future research.

REVIEW OF LITERATURE
We searched the Scopus digital library (www.scopus.com) for the most recent articles on data ethics and related problems, using the following search string built from descriptive keywords related to our area of interest: (data science AND (ethics OR data ethics OR ethical concerns) AND ("data science projects" OR "data privacy" OR "data scientists" OR "data models" OR "data gathering")). Our literature review followed the guidelines of Kuhrmann et al., 19 and, as these researchers recommend, we supplemented the Scopus search with snowballing. However, our objective was not to conduct a comprehensive systematic literature review. 20 Instead, we sought to track down some of the most up-to-date publications on the topics of interest to us, namely data ethics and the related ethical concerns in data science projects, to highlight the variety of research focuses in the data science community. To create our framework for managing ethics in data science projects, we sought only to identify the essential elements of data ethics in previously published research. To reach our study goals, we also review findings from relevant work.
It is no surprise that several literature reviews have been devoted to the topic of ethics in computer science. 26 However, none has focused on data scientists and the ethical challenges they confront today. [...] 28,29 It is uncertain whether data science companies have the breadth and depth of expertise to deliver this training easily, but they have been advised to provide guidance and support, as well as interactive ethics exams that help staff examine ethical challenges. 5 If data science ethics goes unaddressed, it may introduce new business risks. In this context, some researchers have remarked that none of the existing codes of conduct adequately addresses the full spectrum of ethical concerns a data science team may confront. 26,30 Existing practices and approaches to data ethics management therefore appear insufficient for data science projects. Furthermore, the need for ethics was reaffirmed by a coordinated team of data scientists who created a code of professional conduct for the field.
More and more data are being created, archived, and made accessible to data scientists, who analyze patterns in the data and extrapolate information about the future. Creating an ethical framework, for instance, has been proposed as a means of establishing precise terminology for discussing moral dilemmas arising from data science. 30,31 Data science teams may benefit from a comprehensive framework for dealing with data science's ethical issues. 30 Many authors have stated that no existing code of ethics fully covers what is needed, 5,10,31 and that a more general code of ethics would not be useful because it would not be specific enough.
Data access or collection does not imply morally acceptable data use. There are also "upstream" ethical concerns, such as the privacy consequences of how big data are initially collected. 32 Feelings, perspectives, and the correctness of data processing make it difficult to know whether consent was actually given for the data in question. 33 Regarding how the data are used, the data scientist must verify their "fitness for purpose"; otherwise, data might be used inappropriately or contrary to the data provider's intentions. 4 It is important to pay attention to how the results of an analysis are presented as well as to how the data were analyzed, and the people who design the analytics must fully comprehend and articulate how their analytics will affect the data. 34,35 Data cleaning, data modeling, and model deployment were recognized as the three main data-related sources of data ethics issues in a data science project. 4,36,37
Following the recommendations of Kitchenham & Charters, 38 we developed a data extraction process to find pertinent data in previously published research (46 related papers in total) relevant to our goal of identifying the essential components for managing ethics in a data science project. As part of this process, we designed a form to record the ideas, views, contributions, and conclusions of the 46 publications. After extraction, we used content analysis 39,40 to examine the main ethical ideas covered in each paper, and each of these key ideas was recorded as part of the extracted data.
By examining the papers through an iterative process of item surfacing, refining, and regrouping, we characterized the ethical issues raised in previous research. Finally, we conducted an inter-rater study, as suggested by Fleiss et al., 41 to check how well our data extraction and categorization procedures held up to replication. Two separate coders examined the research papers to determine the inter-rater agreement, and after training they agreed on 84% of their coding decisions. Disagreements were then discussed and resolved to obtain a final set of coded data.
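The paper reports raw percent agreement between its two coders. As a hedged illustration only, the sketch below computes that figure alongside Cohen's kappa, a common chance-corrected statistic for exactly two coders (Fleiss' kappa generalizes chance correction to more than two raters). The category labels and the ten coded items are hypothetical and are not the study's data.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for what two coders would reach by chance alone."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: both coders independently pick the same category at their own base rates.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:  # both coders used one identical category throughout
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned to ten paper excerpts.
coder_a = ["privacy", "privacy", "consent", "fairness", "fairness",
           "deployment", "privacy", "consent", "fairness", "deployment"]
coder_b = ["privacy", "consent", "consent", "fairness", "fairness",
           "deployment", "privacy", "privacy", "fairness", "deployment"]

print(percent_agreement(coder_a, coder_b))  # 0.8
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.73
```

Kappa comes out lower than raw agreement because some matches would occur by chance; reporting both gives readers a fuller picture of a figure such as 84%.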
In their research on the ethical issues affecting data science projects, many researchers 1,3-5,34,36,37 concurred that elements such as data privacy, data ownership, the definition of target variables in a data model, fair evaluation of the model that has been built, and foreseeing the potential consequences of the deployed data model are all likely to raise ethical concerns in a data science project. However, no practical answer or paradigm addressing these issues has emerged from these studies. In addition, we found that little work has been done to organize ethical actions into a unified procedure represented by a framework. To our knowledge, the study by Saltz & Dewar 4 is the only one in the field of data science ethics that has resulted in a published theory.
We combined the specific insights we gained from earlier research on ethical considerations and issues in data science initiatives and summarize them below as S-1 through S-3. These abridged findings served as the basis for our proposed framework for managing ethics in a data science project.
S-1: During data collection and data cleaning, addressing data privacy and obtaining informed consent from the data owners is essential to maintaining data ethics in practice.
S-2: Properly defining the target variables in a data model and benchmarking the built model are required to keep the model fair and transparent.
S-3: At the deployment stage, foreseeing the potential consequences of the data model will curtail ethical concerns.
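One way a project team might operationalize S-1 through S-3 is as a per-stage checklist that blocks deployment while any ethical item is still open. The sketch below is a hypothetical encoding, not part of the published framework; the stage names follow the summaries above, and the individual check names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class StageReview:
    """Ethical checks for one project stage; each check maps to True once it is done."""
    stage: str
    checks: dict = field(default_factory=dict)

    def open_items(self):
        return [name for name, done in self.checks.items() if not done]


def build_review():
    # Hypothetical checklist derived from S-1 (collection/cleaning),
    # S-2 (modeling/evaluation), and S-3 (deployment).
    return [
        StageReview("data collection and cleaning",
                    {"data privacy reviewed": False,
                     "informed consent recorded": False}),
        StageReview("data modeling and evaluation",
                    {"target variables defined": False,
                     "model benchmarked": False,
                     "fairness and transparency assessed": False}),
        StageReview("model deployment",
                    {"potential consequences assessed": False,
                     "access restricted and logged": False}),
    ]


def ready_to_deploy(review):
    """Deployment is allowed only when no stage has an open ethical item."""
    return all(not stage.open_items() for stage in review)


review = build_review()
print(ready_to_deploy(review))  # False
for stage in review:
    for name in stage.checks:
        stage.checks[name] = True
print(ready_to_deploy(review))  # True
```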

DATA ETHICS MANAGEMENT FRAMEWORK
Drawing on our understanding of data science projects and their associated ethical concerns, and on the observations of previous researchers on data ethics (summarized in Section 2), we derive the elements of our proposed framework for managing ethics in a data science project. Figure 1 shows the proposed framework. As Figure 1 shows, most data science projects involve three primary stages: data cleansing, data modeling, and model deployment. Data collection precedes these stages and is the first step for data scientists. All else being equal, the quality of the data determines the accuracy and reliability of the analyses. Data cleaning is therefore an essential step in fostering an organization-wide culture that values fact-based decision-making.
Companies are gathering more data as the cost of producing it has decreased. However, acquiring more information from many sources may only increase the volume and give the appearance of more evidence; rather than adding to the narrative, it can reinforce systematic error. Data become less trustworthy as inaccurate records become more prevalent. Data scientists should therefore investigate any potential issues with the raw data.
Data cleaning is the process of removing erroneous, corrupted, incorrectly formatted, duplicate, or incomplete data from a dataset, or replacing such data with accurate data. When multiple data sources are combined, there is a high probability of errors such as duplicated or mislabeled records. Results and algorithms based on inaccurate data are unreliable, even when they appear correct. Because the specific data cleaning procedures differ from dataset to dataset, they cannot be prescribed in a single, universal fashion. However, it is crucial to create a template for the data cleaning approach so that it is followed consistently every time. From a data ethics perspective, the framework indicates that data privacy and informed consent are integral to the datasets that data scientists use for analytics.
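The text recommends a reusable template so that cleaning is applied the same way every time. A minimal sketch of such a template follows, assuming records arrive as Python dictionaries with hypothetical `id`, `email`, and `age` fields; the email pattern is a deliberately loose placeholder. Rejected records are kept with a reason rather than silently dropped, which leaves an audit trail for later review.

```python
import re

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose placeholder check


def clean_records(records, required=("id", "email", "age")):
    """Hypothetical cleaning template: the same steps, in the same order, on every run."""
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        # 1. Reject incomplete records rather than guessing missing values.
        if any(rec.get(key) in (None, "") for key in required):
            rejected.append((rec, "missing field"))
            continue
        # 2. Normalize formatting so duplicates from different sources match.
        email = str(rec["email"]).strip().lower()
        # 3. Reject malformed values.
        if not EMAIL_PATTERN.fullmatch(email):
            rejected.append((rec, "malformed email"))
            continue
        try:
            age = int(rec["age"])
        except (TypeError, ValueError):
            rejected.append((rec, "non-numeric age"))
            continue
        # 4. Drop duplicates, keyed on the normalized email.
        if email in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(email)
        cleaned.append({"id": rec["id"], "email": email, "age": age})
    return cleaned, rejected


records = [
    {"id": 1, "email": " A@Example.com ", "age": "34"},
    {"id": 2, "email": "a@example.com", "age": 34},
    {"id": 3, "email": "not-an-email", "age": 20},
    {"id": 4, "email": "b@example.com", "age": None},
    {"id": 5, "email": "c@example.com", "age": "x"},
]
cleaned, rejected = clean_records(records)
print(cleaned)  # [{'id': 1, 'email': 'a@example.com', 'age': 34}]
print([reason for _, reason in rejected])
# ['duplicate', 'malformed email', 'missing field', 'non-numeric age']
```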
FIGURE 1 A framework for data ethics management in data science projects.
By "informed consent," we mean that human subjects must be made aware of the experiment, must permit it to proceed, and must be able to revoke their consent at any time by notifying the data scientists. Consent must be given voluntarily, without coercion, so that data analytics can proceed without ethical trouble. Any data science project that involves people should, ideally, take into account the guidance of an Institutional Review Board (IRB). An IRB is made up of members from different backgrounds, some of whom are not scientists; it approves research on human subjects, weighs the risks to the subjects against the benefits to science, and handles situations in which informed consent cannot be given. Concerning voluntary consent, also called "voluntary disclosure," the people whose information is shared should be told that anything they share voluntarily with others is much less safe than information they keep to themselves.
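The informed-consent requirements above (awareness, permission, and the ability to revoke at any time) can be enforced in code by checking consent immediately before data are used. The following is a minimal in-memory sketch; the `ConsentRegistry` class, the purpose strings, and the `subject_id` field are hypothetical, and a production system would need durable, access-controlled storage.

```python
from datetime import datetime, timezone


class ConsentRegistry:
    """Hypothetical registry: consent is per subject and per purpose, and revocable."""

    def __init__(self):
        # (subject_id, purpose) -> timestamps of the grant and of any revocation
        self._entries = {}

    def grant(self, subject_id, purpose):
        self._entries[(subject_id, purpose)] = {
            "granted_at": datetime.now(timezone.utc), "revoked_at": None}

    def revoke(self, subject_id, purpose):
        entry = self._entries.get((subject_id, purpose))
        if entry is not None:
            entry["revoked_at"] = datetime.now(timezone.utc)

    def has_consent(self, subject_id, purpose):
        entry = self._entries.get((subject_id, purpose))
        return entry is not None and entry["revoked_at"] is None

    def filter_records(self, records, purpose):
        """Keep only records whose subject currently consents to this purpose."""
        return [r for r in records if self.has_consent(r["subject_id"], purpose)]


registry = ConsentRegistry()
registry.grant("s1", "churn-model")
registry.grant("s2", "churn-model")
registry.revoke("s2", "churn-model")  # s2 withdraws consent

records = [{"subject_id": "s1"}, {"subject_id": "s2"}, {"subject_id": "s3"}]
print(registry.filter_records(records, "churn-model"))  # [{'subject_id': 's1'}]
```

Because consent is checked per purpose, s1's data still cannot be used for an unrelated purpose without a new grant.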
Data privacy is likely to be the first topic raised in any discussion of data science ethics, and in the present digital age the right to privacy has assumed significant importance. A common fallacy among data scientists is that open data is freely available for copying. Businesses and startups often have ideas about how to use an existing public dataset, but they should take care when obtaining such data. Two factors govern data privacy here: the database rights and the policies of the database.
Under database rights, it is unlawful to extract or reuse all or a substantial part of a database without the owner's permission. These rights exist because creating a database can require a significant financial investment, and database rights recognize this investment. Under European law, a database is a "collection of independent works, data, or other resources that are arranged in a systematic or methodical form and are individually accessible by electronic or other means." When substantial investment has gone into a database, the database as a whole is protected, even if its individual entries are not themselves copyrighted. Searching a public database is therefore allowed, but copying large portions of it is not.
Data cleaning is followed by building a data model with well-defined target variables. Analytics models abstract the real world; this abstraction deliberately distances analytics results from reality to support higher-level decision-making. However, unintended omissions or inadequate models can unnecessarily widen the gap between model and reality. Here, management expertise, broader world knowledge, and analytics can come together to support decisions that are better than any one of them could produce on its own. Data scientists should review the variables used in the data models and redefine them as appropriate.
The "target variable" of a dataset in a data science project's data model is the aspect of the data about which you want to learn more. Supervised machine learning applied to existing data can help reveal relationships between the target and other variables. Data scientists are expected to benchmark the model against similar data models to verify and validate any trade-offs between the input variables and the target variable; the model is then evaluated. Data science modeling can incorporate privacy in many ways. Imagine, for example, that you are a data scientist tasked with building a range of prediction models from datasets acquired from various data suppliers. You must prevent sensitive variables from being predicted from these datasets: political preference, for instance, might not be recorded explicitly in a dataset but could still be inferred from other variables.
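The political-preference example suggests a simple pre-modeling check: before training, measure how well each available feature on its own predicts a sensitive attribute that is not supposed to be used. The sketch below is a crude, hypothetical heuristic (majority vote per feature value, compared with always guessing the most common value); the column names, data, and 0.2 threshold are illustrative assumptions. Because it is computed in-sample, it overstates predictability for high-cardinality features such as identifiers, so a held-out estimate would be preferable in practice.

```python
from collections import Counter, defaultdict


def proxy_strength(rows, feature, sensitive):
    """Accuracy of guessing `sensitive` from `feature` alone, minus the accuracy of
    always guessing the most common sensitive value (a crude, in-sample heuristic)."""
    overall = Counter(row[sensitive] for row in rows)
    baseline = overall.most_common(1)[0][1] / len(rows)
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[sensitive]] += 1
    # Predict the majority sensitive value within each feature value.
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows) - baseline


def flag_proxies(rows, features, sensitive, threshold=0.2):
    """Features whose lift over the baseline exceeds a (hypothetical) threshold."""
    return [f for f in features if proxy_strength(rows, f, sensitive) > threshold]


rows = [  # hypothetical records; political_preference is not a model input
    {"news_source": "A", "region": "north", "political_preference": "X"},
    {"news_source": "A", "region": "south", "political_preference": "X"},
    {"news_source": "A", "region": "north", "political_preference": "X"},
    {"news_source": "B", "region": "south", "political_preference": "Y"},
    {"news_source": "B", "region": "north", "political_preference": "Y"},
    {"news_source": "B", "region": "south", "political_preference": "Y"},
]
print(flag_proxies(rows, ["news_source", "region"], "political_preference"))  # ['news_source']
```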
Particular ethical preferences may be incorporated into the model during the modeling step. The fundamental justification is that the data do not necessarily reflect the desired outcomes. This can be useful when dealing with uncommon situations or when positive discrimination toward specific groups is desired. For example, a data science-driven prediction model (such as one steering a self-driving vehicle) may at some point have to choose between running over one object (say, A) and another (say, B). This preference should be the subject of an explicit ethical discussion, because such events are so rare that they will hardly ever, if ever, appear in the data.
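One explicit way to encode such a preference, rather than leaving it to the raw data, is to reweight training examples so that rare groups or situations are not drowned out. The sketch below assigns inverse-frequency weights using the same formula as scikit-learn's `class_weight="balanced"` heuristic (n_samples / (n_classes * count)), applied here to hypothetical group labels. Whether, and how strongly, to reweight is itself the kind of ethical decision the text says should be discussed openly.

```python
from collections import Counter


def balancing_weights(groups):
    """Inverse-frequency weights: every group receives the same total weight."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


groups = ["common"] * 8 + ["rare"] * 2  # hypothetical group labels
weights = balancing_weights(groups)
print(weights[0], weights[-1])  # 0.625 2.5
```

Many scikit-learn estimators accept such per-example weights through the `sample_weight` argument of `fit`.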
It has been said that debugging is harder than programming. By analogy, data scientists who build models at the very limit of their expertise may lack what is needed to evaluate those models properly. Benchmarking involves comparing the inputs and outputs of one model with projections based on different sets of internal or external data or models, and it can be incorporated both into model building and into ongoing monitoring. A common benchmarking error is testing and optimizing a metric that does not reflect the business problem: in fraud detection and medical diagnosis, for instance, a false negative can be far more expensive than a false positive. Dataset leakage is one of the sneakiest ways in which a benchmark can produce misleading results. When trained on a dataset with leakage, more sophisticated models, such as deep neural networks, random forests, and gradient-boosted machines, may outperform simpler models on holdout sets even though they have no more predictive power on production data.
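The fraud-detection point can be made concrete: a model can win on accuracy and still be the worse choice once errors are priced. In the hypothetical sketch below, a false negative is assumed to cost ten times as much as a false positive; the labels, predictions, and cost ratio are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def total_cost(y_true, y_pred, cost_fp=1.0, cost_fn=10.0):
    """Misclassification cost under assumed (hypothetical) per-error costs."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn


y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 1 = fraudulent transaction
model_a = [0] * 10                         # never flags fraud
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # catches both frauds, plus three false alarms

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, pred), total_cost(y_true, pred))
# A 0.8 20.0
# B 0.7 3.0
```

Accuracy favors model A, while the cost-weighted benchmark favors model B, which is the business-relevant choice under these assumed costs.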
In general, evaluating data science models is a challenging process. It involves deciding what to monitor, interpreting the results, and producing reports using data analytics. Data science projects must be carried out ethically and in line with industry norms, and data scientists must consider fairness and transparency when assessing a data model. 2 During the review stage, the model is assessed against these fairness standards, including privacy and the absence of discrimination against vulnerable groups. To make this assessment possible, sensitive attributes such as ethnicity, age, and marital status should be made available to the data scientists, so that they can check how fairly the model treats each sensitive group.
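As a minimal sketch of how available sensitive attributes can be used during evaluation, the code below computes, per group, the selection rate and the true positive rate of a binary classifier, and reports the largest gap across groups for each (often called the demographic parity gap and the equal opportunity gap). The labels, predictions, and group names are hypothetical; which metric matters, and what gap is acceptable, depends on the application.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate and true positive rate for binary labels (0/1)."""
    counts = defaultdict(lambda: {"n": 0, "selected": 0, "positives": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["selected"] += p
        c["positives"] += t
        c["tp"] += int(t == 1 and p == 1)
    return {
        g: {"selection_rate": c["selected"] / c["n"],
            "tpr": c["tp"] / c["positives"] if c["positives"] else None}
        for g, c in counts.items()
    }


def fairness_gaps(y_true, y_pred, groups):
    """Largest between-group difference in selection rate and in true positive rate."""
    rates = group_rates(y_true, y_pred, groups).values()
    selection = [r["selection_rate"] for r in rates]
    tprs = [r["tpr"] for r in rates if r["tpr"] is not None]
    return {"demographic_parity_gap": max(selection) - min(selection),
            "equal_opportunity_gap": max(tprs) - min(tprs) if tprs else None}


groups = ["a"] * 4 + ["b"] * 4  # hypothetical sensitive-group labels
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(group_rates(y_true, y_pred, groups))
print(fairness_gaps(y_true, y_pred, groups))  # both gaps are 0.5
```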
Transparency plays several significant roles in model evaluation. The first concerns evaluating models properly, which is important enough to have ethical implications, because different performance measures may lead to different model decisions. Consider estimating a company's share price. If the prediction is based on the firm's share prices for a single week in the preceding month, and the price rose during that week, the prediction will tend to reflect faster share price growth in the future. If, however, the data model considers the data on weekly, monthly, quarterly, half-yearly, and yearly bases, it may anticipate quite different share values.
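The share-price example can be reproduced with a deliberately naive model: fit a least-squares straight line to the most recent prices and extrapolate one step. With the hypothetical price series below (a long decline followed by a short rise), a 3-period window forecasts a further rise while the full 8-period window forecasts a fall. This illustrates window sensitivity only; it is not a recommended forecasting method.

```python
def linear_forecast(prices, window, steps_ahead=1):
    """Fit a least-squares line to the last `window` prices and extrapolate."""
    if window < 2 or window > len(prices):
        raise ValueError("window must be between 2 and len(prices)")
    ys = prices[-window:]
    xs = list(range(window))
    x_mean = sum(xs) / window
    y_mean = sum(ys) / window
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * (window - 1 + steps_ahead)


prices = [120, 116, 112, 108, 104, 100, 102, 104]  # hypothetical weekly prices
print(linear_forecast(prices, 3))            # 106.0: short window sees a rise
print(round(linear_forecast(prices, 8), 2))  # 96.36: full window sees a decline
```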
Data model deployment follows evaluation. Every company and individual must balance ethical concerns against the utility of data, and how these are weighed determines the ethical equilibrium and the resulting best practices. To avoid future ethical issues, data scientists must weigh the benefits of their data model against its potential negative consequences. When a data science system is deployed, who has access to it must be considered. Access may be restricted to particular people or places for many reasons, the first being that companies must restrict access to sensitive and private data. In addition to the obvious confidentiality, integrity, and access control measures, a logging system that records each data access is crucial. For instance, if a banker checks a famous person's payment history, the banker should be held accountable and be prepared to justify the action. Build in this logging functionality whenever you anticipate access to such sensitive data, and inform your staff of its existence and of the consequences of viewing the data without authorization.
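The banker example can be supported by making justification and logging unavoidable at the point of access. The sketch below wraps a data-access function so that each call must name a user and a reason, and every attempt, whether allowed or refused, is appended to an audit trail. The `audited` decorator, the `payment_history` resource, and the in-memory `AUDIT_TRAIL` list are hypothetical stand-ins; a real deployment would combine this with authentication, role-based authorization, and append-only, tamper-evident storage.

```python
import functools
from datetime import datetime, timezone

AUDIT_TRAIL = []  # in-memory stand-in for append-only, tamper-evident storage


def audited(resource):
    """Require a user and a stated reason for every access, and record each attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, reason, *args, **kwargs):
            allowed = bool(reason and reason.strip())
            AUDIT_TRAIL.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user, "resource": resource,
                "reason": reason, "allowed": allowed,
            })
            if not allowed:
                raise PermissionError(f"{user} must justify access to {resource}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@audited("payment_history")
def get_payment_history(customer_id):
    # Placeholder lookup; a real system would query the protected data store.
    return {"customer_id": customer_id, "payments": []}


record = get_payment_history("banker.01", "customer-reported dispute #123", "C-42")
try:
    get_payment_history("banker.02", "   ", "C-42")
except PermissionError as err:
    print(err)  # banker.02 must justify access to payment_history
print(len(AUDIT_TRAIL))  # 2
```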
Some data science tools lend themselves to substantial, potentially unethical uses, so access to them must be tightly restricted. Companies and governments that embrace data science gain a new kind of power because they have extensive discretion over who has access to such systems. Again, being open and honest about the decision-making process, including the reasons for decisions and the ethical considerations involved, is critical. Managers can be misled despite their best efforts, so defense in depth, that is, considering what happens if one line of defense collapses, is a helpful concept.
An additional layer of protection for data-driven managerial decisions is to acknowledge that the data themselves may still be misleading. If the data are insufficient or unreliable, what are the potential positive and negative consequences in terms of cost, time, prediction accuracy, and other factors? Setting out a plan to mitigate excessive, undesirable, or unfavorable effects can help prevent data science initiatives from suffering major losses. These issues deserve careful consideration, because dealing with the consequences of misleading data and recovering from them can take a long time and be difficult. In the next section, we discuss the preliminary evaluation of the proposed framework, followed by our observations and lessons learned.

PRELIMINARY EVALUATION OF THE PROPOSED FRAMEWORK
Our framework (Figure 1) for data ethics management in data science projects emerged from an extensive review of the literature on data ethics management in data science projects. We now evaluate this framework through a case study, conducted in a mid-size IT services company (pseudonym: DS-Tech) during 2020-2021. The company is headquartered in India, has branches in the United States and Singapore, and manages several data science projects for client organizations in India. The case study methodology was adopted from Yin. 42 The primary purpose of the case study was to obtain a first look at how well the proposed framework for ethical data management in data science projects holds up in actual use in the field. Table 1 describes the case study company. A total of 27 data science project team members from DS-Tech (12 data scientists, 3 IT/business analysts, 2 team leads, 4 data engineers, and 6 programmers) participated in this evaluation. As suggested by the case study methodologist Yin, 42 the following procedures (1-5) were used in this case study:
Step 1: Prepare a draft questionnaire (Evaluation Questions, EQ) to interview 43
Our case study respondents were chosen based on the following criteria: they had been involved in data science projects at the case study company for at least 2-3 years in the recent past and were familiar with database management, software development, and data science model development and deployment. The first author works at a leading technological institute in southern India, one of whose alumni founded DS-Tech. This allowed the authors to work easily with the company on data collection and analysis for evaluating the proposed framework.
We obtained our data through focused semi-structured interviews, following King & Horrocks. 43 By "focused," we mean that the interviews and our other interactions with the participants centered on their responses to our evaluation questionnaire about the applicability of the proposed framework. The interviews were "semi-structured" in that they followed a set of predetermined questions, while the interviewer had the leeway to improvise additional questions designed to elicit more specific responses from the interviewee.
The data collection process was as follows. First, the proposed framework for data ethics management in data science projects was shared with the 27 selected case study participants, and its working principle was explained to them. They were then asked to apply the framework to their upcoming data science project to ascertain its suitability in practice, and they were given seven weeks to do so. The same participants were then met individually for an interview and informal interaction. At this point, each participant was given the assessment questionnaire (Appendix A), and responses were gathered on a Likert scale, 44 with respondents choosing a number between 1 ("strongly disagree") and 5 ("strongly agree") to indicate whether the framework manages the team members' ethical concerns in practice.
The interview process, including informal interactions, took 9 h in total over 2 days at DS-Tech. To determine how the data science project members who used the framework in the field viewed it, the participants' responses (scores) were recorded in an MS Excel sheet and examined graphically. Figure 2 displays the descriptive statistics of these scores.
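The descriptive statistics reported in Figure 2 (mean, lowest, and highest score per evaluation question) are straightforward to reproduce outside a spreadsheet. A minimal sketch follows; the scores are hypothetical placeholders, not the study's responses.

```python
from statistics import mean


def describe_likert(responses):
    """Map each question to (mean, lowest, highest) of its 1-5 Likert scores."""
    summary = {}
    for question, scores in responses.items():
        if not scores or any(s not in (1, 2, 3, 4, 5) for s in scores):
            raise ValueError(f"{question}: scores must be integers from 1 to 5")
        summary[question] = (round(mean(scores), 3), min(scores), max(scores))
    return summary


# Hypothetical scores from four respondents (not the study's data).
responses = {"EQ1": [3, 4, 5, 4], "EQ2": [4, 4, 5, 4]}
for question, (avg, low, high) in describe_likert(responses).items():
    print(question, avg, low, high)
```

Because Likert responses are ordinal, reporting the median or the full response distribution alongside the mean can also be informative.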

DISCUSSION
This section covers the findings from our initial assessment of the framework for managing data ethics in data science projects. Overall, our case study participants perceived the framework as appropriate for use in a data science project. They agreed that the framework makes it easier for the data science project team to address ethical issues during data pre-processing, data cleaning and preparation, data modeling, evaluation, and deployment. The framework's attributes appear generally to have met the company's expectations, which is encouraging for its potential application in the company's future data science projects. From Figure 2, we infer that the mean, lowest, and highest scores given by the case study participants for the evaluation questions (EQ1-EQ10) vary between 3 and 5. The evaluation questions EQ9 and EQ10 obtained mean scores of 3.22 and 3.478, respectively, whereas the mean scores for the other EQs are considerably higher. These two questions concern the framework's suitability for achieving a data ethics equilibrium and for addressing ethical concerns about complex data models. The evaluation questions EQ1, EQ2, EQ5, EQ6, and EQ8 uniformly obtained a lowest score of 3, while for the other EQs the lowest score is 4.
Similarly, the evaluation questions EQ2, EQ3, EQ4, EQ7, and EQ8 uniformly obtained a highest score of 5, while for the other EQs it is 4. Overall, the Likert scale scores given by our case study participants appear to favor using the proposed framework in practice to manage ethical concerns consciously in data science projects. Participants agreed that to build a strong data analytics product, data science companies must strike a balance between ethical considerations and data utilization. However, they felt that benchmarking our data model against previously developed models did not need to be given more weight. This view may be reasonable because data science and related technologies are developing quickly, and rapid change can also affect the benchmarking of data models: rather than comparing apples to apples, benchmarking may occasionally end up comparing apples to limes.
Our evaluation results support the conclusions of prior studies 4,6 that addressing the ethical issues surrounding data collection, modeling, evaluation, and deployment is crucial for a data science project. Participants also noted that the framework has distinct parts for identifying and addressing ethical concerns regarding the secure collection of data from numerous sources, building a model with well-defined target variables, comparing it with models already in use, and finally assessing its impact from a behavioral science perspective. Notably, this result concurs with earlier studies. 4,36,37
The primary objective was to establish a preliminary assessment of the proposed framework for overseeing ethical considerations in data science initiatives, as outlined by Wieringa. According to research methodologists, an initial evaluation study is the first step toward gradually scaling a novel approach up to practical circumstances. This study aims to demonstrate the practical implementation of the approach in a real-life scenario, so that researchers and practitioners can learn from the experience and compile a set of desirable attributes for the approach. These attributes can be taken into account when improving, extending, or refining the approach.
The growing dependence on data acquisition, examination, and application across diverse fields, including technology, commerce, healthcare, and governance, has created the need for a data ethics framework. As data collection and use have become more prevalent, awareness has grown of how critical ethical considerations are to responsible and beneficial data usage. Data ethics frameworks protect individuals' privacy rights by establishing suitable practices for data collection, storage, and sharing. Using a standard data ethics framework, the data science project team should establish guidelines for obtaining informed consent, applying anonymization techniques, and preventing unauthorized access to or misuse of data. If not appropriately designed and monitored, data-driven systems can perpetuate biases or discriminate against specific individuals or groups.
Implementing an ethical framework can help address concerns about algorithmic bias, fairness, and transparency, thereby promoting fair treatment and reducing discriminatory outcomes. A comprehensive data ethics framework also plays a pivotal role in fostering trust among industries, individuals, and society at large. When data collection practices are transparent and adhere to ethical principles, stakeholders are more likely to trust how their data is collected, used, and disseminated. Implementing a data ethics framework also prompts organizations to consider the wider societal implications of their data-related activities, ensuring that these activities are consistent with social norms and contribute to the collective welfare. Furthermore, such a framework establishes mechanisms for ensuring accountability and promoting responsible conduct in data-related activities.
Data ethics frameworks also address the security and governance dimensions of data management. They provide best practices for safeguarding data against breaches, unauthorized access, and cyber threats. Following the framework, explicit guidelines should be defined for responsible data stewardship, sharing, and retention to prevent the misuse or abuse of sensitive information. Data ethics frameworks can also help address the challenges of operating in a global context, in which data is transferred across international borders.
Beyond demonstrating the framework's suitability for managing data ethics in a real-world data science project, the assessment aimed to collect first-hand accounts and gain insight into the types of ethical considerations the data science project team should take into account. Following Wieringa and Daneva, 45 we have divided our lessons into three groups: (1) understanding the data ethics challenges faced by the data science project team; (2) the effort required by the data science project team to apply our framework; and (3) the experience of the case study company's practitioners in using the framework in our preliminary study. What we learned is summarized below.
First, participants explained that even an experienced data science project team could not afford the significant additional time needed to comply with all of the rules, regulations, and norms that regulatory authorities in different countries have established for data protection and privacy. This was the first obstacle the case study company's participants encountered when attempting to address the ethical considerations present in a typical data science project. In most cases, it results from a combination of two factors: a shortage of trained staff and a lack of awareness. Several participants also reported that, because they are committed to delivering the data analytics solutions their clients want, they found it very difficult to balance ethical concerns against the use of data. This was one of the motivations for our study. To tackle such practical issues successfully, the data science project team needs a simplified framework to guide and control its process flow and workflow.
Second, it is difficult to compare our framework with other techniques in terms of the human effort and time required to use it for data ethics management. This is because past studies on data ethics management, although they emphasized the importance of such a framework for data science project teams, did not propose one. Third, our analysis showed that the proposed framework required comparatively little time and effort for practitioners to use, an observation several participants made explicitly during the evaluation. However, they also suggested that members of the data science project team receive regular training to keep up with the laws, rules, regulations, and standards that regulatory bodies in different countries have enacted for data protection and privacy, as well as for data analysis and reporting.

CONCLUSIONS AND FUTURE RESEARCH
Data science has had an extremely positive impact on people's lives and businesses, and it is quickly replacing some traditional business practices at all but a small number of businesses. However, it also carries the risk of unintended, costly, and severe negative consequences. As many cautionary tales have shown, anyone working at the cutting edge of data science technologies will inevitably encounter ethical issues. Data science ethics takes time, effort, and training. Open discussion and an understanding of potential issues, ideas, and methods are crucial. Because research in both data science and data science ethics is expanding, keeping up with best practices will require time and resources. A likely prerequisite, therefore, is senior management's willingness to support data science ethics and to recognize its significance.
The framework for managing data ethics proposed in this study builds on prior studies that discuss the need for ethical considerations in data science initiatives. Following a thorough review of the literature on data ethics, we developed a preliminary framework for managing ethics in a data science project. The framework focuses primarily on data cleansing and data modeling, as well as their evaluation and deployment. We assessed the framework's usability and applicability through a case study with 27 participants from a data science project team at a mid-sized IT services company, who provided input on our evaluation questions through a questionnaire and informal conversations. Our proposed framework may be viewed as a theory outlining the crucial elements for resolving ethical issues in a typical data science project. From the practitioners' perspective, data science project teams should adopt a systematic framework such as the one proposed in this study as a requirement. We recognize that these teams were unable to adopt a strictly formal approach, but they do need to begin considering at least the minimum required processes and project documentation.
This study also has implications for future research. Additional case studies with IT organizations of varying sizes in different domains (for instance, banking, healthcare, and insurance), involving other support team members as well, are needed to better understand the ethical data management activities in their projects and to further evaluate the proposed framework. Future research should also focus on risk and impact assessments for biased data, discriminatory outcomes, and privacy breaches.
EQ9. The framework facilitates the data science project team in balancing ethical concerns with the utility of data. (Model deployment)
EQ10. The framework covers the important aspects of data ethics required for a complex model in a data science project. (Model deployment)

the case study participants and get their responses on our framework.
Step 2: Test the questionnaire with two to three interviewees.
Step 3: Use the final questionnaire (Appendix A) to interview the chosen participants.
Step 4: Assess the practicality of our framework. This questionnaire should answer the framework's evaluation questions.
Step 5: Engage with case study participants to learn more about the framework.
TABLE 1 Profile of case study company.
Evaluation score on the framework for data ethics management.