Walking forward or on hold: Could ChatGPT be applied for seeking health information in neurosurgical settings?

Abstract Self-management is important for patients suffering from cerebrovascular events after neurosurgical procedures. An increasing number of artificial intelligence (AI)-assisted tools have been used in postoperative health management. ChatGPT is a new dialog-based chatbot that could serve as a supplemental tool for seeking health information. In this exploratory study, responses from ChatGPT versions 3.5 and 4.0 to 13 questions raised by experienced neurosurgeons were blindly evaluated for consistency and appropriateness by three other neurosurgeons. The readability of the response text was investigated quantitatively by word count and the Gunning Fog and Flesch-Kincaid indices. Results showed that the chatbot provided relatively stable output across the two versions in terms of consistency and appropriateness (χ² = 0.348). As for readability, the 4.0 version demanded more of readers to comprehend the output text (higher word counts, lower Flesch-Kincaid reading ease scores, and higher Flesch-Kincaid grade levels). In general, the capacity of ChatGPT to deliver effective health information is still under debate.


INTRODUCTION
Effective management of chronic illnesses (e.g., hypertension and stroke) requires patients' involvement in their healthcare beyond limited visits with health providers. In such cases, search engines, social media, and web-based tools aided by artificial intelligence (AI) are becoming the main resources where people seek and share health information.1,2 ChatGPT, trained with Reinforcement Learning from Human Feedback (RLHF), is a new dialog-based AI language model3 that can generate a direct response to complex queries from users within a short time. Therefore, this chatbot may have the potential to be used as a supplemental tool in the management of chronic cases. This study aimed to deliver a preliminary investigation, with a focus on cerebrovascular diseases in neurosurgical settings, into the utility of ChatGPT for seeking health information.
According to clinical experience, 13 questions (Table 1) about cerebrovascular events in neurosurgical settings that the general population may be concerned about were listed and input into the online ChatGPT 3.5 and 4.0 interfaces (https://chat.openai.com/), three times each at different times. All responses were recorded and reviewed blindly by two independent reviewers (S.-Y. Y., Y.-F. L.) for consistency of the key information (defined as the information that directly answers the question) and the supporting details (defined as the information that further elucidates the key information so that readers can understand the answers better). Responses to the same question were defined as consistent if their key information and supporting details, respectively, were not mutually exclusive of each other; a set of responses was considered consistent only if both neurosurgeons agreed. Three additional experienced neurosurgeons (L.-L. X., X. H., R. G.) were then assigned all sets of responses to blindly grade them as appropriate (all three responses graded as appropriate) or inappropriate (any of the three responses graded as inappropriate) and to judge whether any information might mislead readers. The Gunning Fog and Flesch-Kincaid indices (https://www.webfx.com/tools/read-able/) were used to evaluate the readability of the responses4 (Figure 1). The χ² test for the consistency and appropriateness analyses and the paired-sample t test (p < 0.05, two-tailed) for the readability analysis were performed in SPSS (version 26, IBM Corp.) to compare quantitative measures between ChatGPT 3.5 and 4.0.
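The study used a web tool to compute the readability indices, but the underlying formulas are standard and easy to reproduce. The sketch below is a minimal approximation, not the tool actually used: in particular, the vowel-group syllable counter is a naive heuristic, so its scores will differ slightly from commercial implementations.

```python
import re

def count_syllables(word: str) -> int:
    # Naive approximation: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        "word_count": len(words),
        # Flesch Reading Ease: higher scores = easier text
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
        # Gunning Fog index: years of education needed to follow the text
        "gunning_fog": 0.4 * (wps + 100 * complex_words / len(words)),
    }
```

Applied to each recorded response, this yields the word count and the three indices compared between the two model versions.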

DISCUSSION
Different from other attempts,5,6 this study reveals that the capacity of this new AI interactive model to deliver effective health information is still under debate. Although most of the responses contained cautious information, and the key information of every set of three responses was consistent and appropriate for the general population, there is still a certain risk of misleading readers owing to the manner of expression and inaccurate supporting details. In most cases, the supporting details for the same question differed (10/13 [76.9%] vs. 11/13 [84.6%]). Thus, the output patients receive from ChatGPT may not be consistent. Unlike traditional search engines, which may struggle with ambiguous medical terminology and inundate users with redundant information, ChatGPT can be harnessed effectively to enhance the long-term management of cerebrovascular diseases and other chronic conditions. It serves as a valuable resource for individuals seeking basic medical advice, particularly in noncritical situations such as compliance and rehabilitation. Besides, the chatbot has the potential to improve the doctor-patient relationship and patient education in neurosurgical clinical practice by decreasing communication costs as a complementary health information resource. At the same time, however, there are rising concerns about problems such as the reliability, effectiveness, and abuse of the chatbot,7 and the model is not designed solely for medical purposes. Therefore, further investigation and supervision should be brought to the forefront among developers, health-related professionals, and users.

FIGURE 1 The workflow of accessing and evaluating responses from ChatGPT.
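The version-to-version comparison of consistency and appropriateness rests on a Pearson χ² test over counts of consistent versus inconsistent response sets. A self-contained sketch of the statistic (the 2 × 2 table below is purely illustrative, not the study's data):

```python
def chi_square(table):
    # Pearson chi-square statistic for an r x c contingency table
    # of observed counts (no continuity correction).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = ChatGPT 3.5 and 4.0,
# columns = consistent vs. inconsistent response sets.
stat = chi_square([[10, 3], [11, 2]])
```

A small statistic (well below the critical value of 3.84 at one degree of freedom) indicates no significant difference between the versions.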
The iteration speed of AI tools is fast, and an updated version of ChatGPT may be a way to remove these concerns and provide more accurate responses to the questions input into the chatbot. Indeed, the results showed that the new version had a higher rate of accuracy, which is in line with its development trajectory.8 Nevertheless, the text output by the upgraded chatbot demanded more of readers to comprehend the messages they receive (higher word counts, p < 0.0001; lower Flesch-Kincaid reading ease scores, p < 0.001; and higher Flesch-Kincaid grade levels, p = 0.036), which may undermine its advantages and confuse readers with complex texts. This could be attributed to the new version of the chatbot being trained with updated information as well as new algorithms. Even so, users can obtain a more concise response by asking the chatbot repeatedly.
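The readability comparisons above rely on a paired-sample t test, which is simple to reproduce once each question's responses are matched across versions. A minimal sketch (the word counts below are hypothetical, not the study's measurements):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    # Paired-sample t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    # where d holds the per-item differences, e.g., word counts of
    # responses to the same question from two model versions.
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical word counts for the same four questions:
v40 = [412.0, 388.0, 455.0, 401.0]  # ChatGPT 4.0 responses
v35 = [310.0, 295.0, 340.0, 312.0]  # ChatGPT 3.5 responses
t_stat = paired_t(v40, v35)
```

The resulting statistic is compared against the t distribution with n − 1 degrees of freedom to obtain the two-tailed p value reported by SPSS.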
This study has some limitations. The chatbot was trained using RLHF to reduce harmful outputs, but its responses still lack credible sources and evidence. The reliability and consistency of answers beyond the questions covered in this study cannot be guaranteed, and the chatbot may provide misleading information for specific populations. The hysteretic nature of the data used to train the AI model also prevents it from offering the latest medical information to users.3 Besides, this exploratory study assessed the AI model mainly in terms of consistency, appropriateness, and readability; a more comprehensive and technical approach should be adopted, conducting the investigation within a large-scale user population instead of cerebrovascular-related professionals with subjective preferences and potential bias. Further studies are needed to compare the performance of this AI model with other AI tools and conventional sources of medical advice in a quantitative manner.

AUTHOR CONTRIBUTIONS
Rui Tian had full access to all data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. Si-Yu Yan designed the study, acquired data, reviewed for consistency of data, analyzed data, and drafted the manuscript. Yi-Fan Liu designed the study, acquired data, reviewed for consistency of data, analyzed data, and revised the manuscript. Ling-Long Xiao reviewed for appropriateness of data. Xin Hu reviewed for appropriateness of data. Rui Guo reviewed for appropriateness of data. Lu Ma supervised the study and revised the manuscript. Chao You supervised the study. Rui Tian designed the study, supervised the study, and revised the manuscript.
TABLE 1 Evaluation of responses to 13 questions about cerebrovascular disease from ChatGPT 3.5 and 4.0.
a ChatGPT version 3.5.
b ChatGPT version 4.0.
c Example response: "An intracranial aneurysm, also known as a cerebral or brain aneurysm, is a bulging or ballooning in a weakened area of an artery in the brain. This weakened area of the artery is usually caused by a congenital defect or damage from injury or disease. Intracranial aneurysms are often asymptomatic and are discovered incidentally during imaging tests for other conditions. However, in some cases, they can cause symptoms such as headache, nausea, vomiting, neck pain, blurred or double vision, or even rupture, which can cause a life-threatening condition called subarachnoid hemorrhage. Risk factors for intracranial aneurysm include smoking, high blood pressure, family history, and certain medical conditions such as polycystic kidney disease. Treatment options depend on the size, location, and severity of the aneurysm and may include observation, medications, or surgery."
*p < 0.05; **p < 0.001; ***p < 0.0001.