Embracing the future—is artificial intelligence already better? A comparative study of artificial intelligence performance in diagnostic accuracy and decision‐making

Abstract

Background and purpose: The integration of artificial intelligence (AI) in healthcare has the potential to revolutionize patient care and clinical decision-making. This study aimed to explore the reliability of large language models in neurology by comparing the performance of an AI chatbot with neurologists in diagnostic accuracy and decision-making.

Methods: A cross-sectional observational study was conducted. A pool of clinical cases from the American Academy of Neurology's Question of the Day application was used as the basis for the study. The AI chatbot used was ChatGPT, based on GPT-3.5. The results were then compared with those of neurology peers who also answered the questions (a mean of 1500 neurologists/neurology residents).

Results: The study included 188 questions across 22 different categories. The AI chatbot demonstrated a mean success rate of 71.3% in providing correct answers, with varying levels of proficiency across different neurology categories. Compared to neurology peers, the AI chatbot performed at a similar level, with a mean success rate of 69.2% amongst peers. Additionally, the AI chatbot achieved a correct diagnosis in 85.0% of cases and provided an adequate justification for its correct responses in 96.1%.

Conclusions: The study highlights the potential of AI, particularly large language models, in assisting with clinical reasoning and decision-making in neurology and emphasizes the importance of AI as a complementary tool to human expertise. Future advancements and refinements are needed to enhance the AI chatbot's performance and broaden its application across various medical specialties.

John McCarthy defined artificial intelligence (AI) as 'the science and engineering of making intelligent machines' [3,4].
Artificial intelligence began as a simple series of 'if, then' rules and has advanced over several decades to include more complex algorithms that perform tasks in ways loosely analogous to the human brain [5]. By harnessing the power of machine learning and data analytics, these algorithms can now analyse vast amounts of medical data, uncover patterns and generate valuable insights. AI has emerged as a promising technology with vast potential to transform the healthcare industry [6]. Disease diagnosis and treatment have been a focus of AI since at least the 1970s, when MYCIN was developed at Stanford University for the identification of blood-borne bacterial infections [7].
Although these systems showed promise for accurately diagnosing and treating diseases, they were not substantially better than human diagnosticians, and they were poorly integrated with clinician workflows and medical record systems [7-12].
In the context of natural language processing (NLP), IBM created an open-domain question-answering system in 2007, named Watson, that used various searches to analyse unstructured content and generate probable answers [13]. These developments led to IBM Watson's victory on Jeopardy! in 2011, still a landmark for the AI field. In 2017, a team of researchers at Google introduced the transformer architecture; in 2018, OpenAI released GPT, a large language model (LLM), that is, a machine learning model trained on vast amounts of text that can generate realistic and coherent output; GPT-2 followed in 2019 and GPT-3 in 2020. More recently, a vast number of LLMs with different numbers of parameters have been created, such as GPT-4, PaLM 2, Claude v1 and Cohere. All these advancements led to superior performance in language tasks, bridging the gap in human-computer interaction and making AI systems more adaptable and integrated into real-world applications. The result has been a surge in AI capabilities, from chatbots and voice assistants to advanced text analytics and generation [14]. ChatGPT is amongst the most widely used AI chatbots. It makes use of the Generative Pre-trained Transformer (GPT) autoregressive transformer model, which uses 175 billion parameters [14]. This generation of LLMs has demonstrated exceptional performance in a wide range of NLP tasks, including language translation, summarization and question-answering. The transformer-based architecture is one of the latest breakthroughs in NLP and introduced an attention mechanism that allows contextual relationships between words/tokens in a sentence to be captured [15]. This attention mechanism enables transformers to consider the entire input sequence at once, addressing the limitations of previous models that processed text sequentially. The training procedure is composed of two main steps: (1) pre-training on a large set of corpora such as Wikipedia, scientific journals and news articles, where the main task is to predict the next word in a sentence; (2) fine-tuning, in which smaller, human-curated examples of correct behaviour are used in reinforcement learning from human feedback, employing proximal policy optimization as the learning method [15]. By drawing information from a patient's electronic medical record and other electronic resources, this technology could be applied to provide evidence-based medicine responses. As such, it has opened new possibilities in evidence-based clinical decision-making [6,13,16]. In 2017, Bakkar et al. [17] used IBM Watson to successfully identify new RNA-binding proteins that were altered in amyotrophic lateral sclerosis.
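To make the attention step concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention, the core operation of the transformer architecture described above. It is a didactic toy (random vectors, a single head, no learned projections), not the implementation of ChatGPT or any production model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output token is a weighted mix of all value vectors,
    with weights derived from query-key similarity (softmax)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                # context-aware representations

# Toy sequence of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)           # self-attention
print(out.shape)  # (4, 8): one contextualized vector per token
```

Because every token attends to every other token in a single matrix operation, the whole input sequence is processed at once rather than sequentially, which is precisely the limitation of earlier recurrent models that the attention mechanism removes.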
Given this momentum, along with improved computer hardware and software, digitalized medicine became more readily available and AI applied to medicine started to grow rapidly. In June 2023, an article published by Bitkina et al. [18] provided an overview of the current state of artificial intelligence in the field of medical technologies, reviewing 89 papers, 86 of which were published after 2011. The main area of AI application in medicine was oncology (55%); neurology ranked sixth, with 3% of the published papers on this topic, mainly focusing on neurodegenerative diseases.
Since its launch in November 2022, ChatGPT has been widely used by individuals for recreational and professional purposes. Many studies have suggested its potential application in the medical field, including in patient diagnosis and medical education, but major limitations and ethical considerations still exist [19].
In 2020, the American Academy of Neurology (AAN) released the Question of the Day application [20] as an educational tool for neurologists and neurology residents. These are multiple choice questions created for neurologists that follow a curriculum based on the American Board of Psychiatry and Neurology Maintenance of Certification content outline. Each question describes a real-life scenario involving a patient presenting with specific symptoms or clinical findings. The questions require careful interpretation of the provided information and knowledge about the underlying condition, diagnostic tests and treatment options in order to select the correct answer. They are two-step questions consisting of clinical vignettes designed to assess the ability to interpret symptoms and to arrive at a diagnosis and/or an appropriate management strategy, such as the best diagnostic test or treatment. The two steps required for each question are (1) interpretation of the clinical vignette to reach a diagnosis (this is an implicit step, and no answer is required); (2) correctly answering the multiple choice question (i.e., selecting the most appropriate treatment). Only one choice is correct for each multiple choice question. A recent article estimated that, in 2022, the application had a total of 11,000 users per year, with an average of 1500 daily users from 123 countries [21].

Aim
The aim was to explore the reliability of LLM-based AI software in healthcare clinical reasoning, diagnostics and decision-making.

Study design
This was an exploratory cross-sectional observational study. A pool of questions in the form of clinical cases with multiple choice questions from the Question of the Day application (which provides daily board-examination-style multiple choice questions) was used. Topics include a wide variety of neurology subspecialties that are divided by the app into 22 categories. LLM-based AI software (AI chatbot) responses were generated by ChatGPT (GPT-3.5 version) and neurology peers' response data (relative frequency of right answers) were provided by the app. Examples of questions (followed by ChatGPT's answers) are provided in the Results section (Figures 1-5).

Primary end-point
The primary end-point was established as the mean success rate, defined as the mean percentage of correct answers per category to multiple choice questions. The objective was to assess the reliability of ChatGPT (GPT-3.5 version) in correctly interpreting neurology-related clinical vignettes and answering complex multiple choice questions, compared to neurology peers.

Secondary end-points
The first secondary end-point was the success rate per category, defined as the percentage of correct answers per category. This end-point measures ChatGPT's accuracy across distinct categories, benchmarked against neurology peers.
The second end-point was correct justification, intended to validate the correct answers given by the AI chatbot and to exclude luck/randomness by asking it to justify its correct answers. To this end, two investigators, blinded to each other's decisions, simultaneously read the justifications and decided whether they were adequate. When immediate agreement was not reached, the case was discussed with a third investigator and the decision was based on the consensus of two out of the three investigators.
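The two-out-of-three adjudication procedure just described can be sketched as follows; `adjudicate` is a hypothetical helper written purely for illustration and was not part of the study's materials.

```python
def adjudicate(rater1: bool, rater2: bool, rater3=None) -> bool:
    """2-of-3 consensus for the 'correct justification' end-point:
    two blinded raters decide independently; on disagreement a
    third rater breaks the tie by simple majority."""
    if rater1 == rater2:
        return rater1                  # immediate agreement
    if rater3 is None:
        raise ValueError("disagreement: a third rater is required")
    return sum([rater1, rater2, rater3]) >= 2   # majority of three

print(adjudicate(True, True))          # immediate agreement -> True
print(adjudicate(True, False, False))  # tie broken by third -> False
```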
The final end-point was the correct diagnosis, designed to assess ChatGPT's capability to deduce the correct diagnosis from clinical scenarios when the diagnosis is not explicitly mentioned in the clinical vignette. This was provided within the 'justification' section of the AI chatbot's answer.

Data analysis
The results were obtained from the AI chatbot by entering the following prompt: 'answer the following multiple choice question about neurology' (to assess the primary end-point [success rate] and the first secondary end-point [success rate per category]) 'and justify' (to assess the remaining secondary end-points). The answers were obtained for 10 questions per category, except when fewer than 10 questions were available, in which case all the available questions were used. The questions were randomly chosen amongst those published from 2020 until 30 April 2023, starting from the most recent.

Clinical vignette:
A 49-year-old woman has had multiple episodes of dizziness that started about 2 months ago. Episodes are characterized by a sudden onset of vertigo that lasts a few seconds; they occur mostly at rest and can happen multiple times a day. She feels slightly off balance and reports some pressure/numbness in her right ear during episodes. History includes hypertension and anxiety, for which she takes lisinopril and bupropion. Neurologic and neuro-otologic examinations are unremarkable. Brain MRI with constructive interference in steady state (CISS) shows bilateral neurovascular cross-compression between the vestibular nerve and the anterior inferior cerebellar artery. Which of the following medications is most likely to help with the patient's vertigo? A. betahistine B. carbamazepine C. hydrochlorothiazide D. meclizine E. propranolol

Answer (AI-chatbot):
The most likely medication to help with the patient's vertigo in this case is B (carbamazepine). The patient's history and imaging findings are consistent with neurovascular cross-compression, also known as vestibular paroxysmia. This condition is characterized by recurrent, brief episodes of vertigo, often triggered by head movements and occurring multiple times a day. The episodes are typically short in duration and may be accompanied by pressure or numbness in the ear. The neurovascular compression, in this case between the vestibular nerve and the anterior inferior cerebellar artery, can lead to aberrant firing of the vestibular nerve and result in the characteristic symptoms.
Carbamazepine is an anticonvulsant medication that can help alleviate the symptoms of neurovascular cross-compression. It stabilizes neuronal membranes and can reduce the abnormal firing of the vestibular nerve, thereby reducing the frequency and severity of the vertigo episodes. Other anticonvulsant medications such as oxcarbazepine and gabapentin may also be effective in managing the symptoms. Betahistine (option A) is often used for the treatment of Meniere's disease but is not typically effective for neurovascular cross-compression. Hydrochlorothiazide (option C) is a diuretic commonly used for the management of hypertension and is not indicated for the treatment of vertigo. Meclizine (option D) is an antihistamine medication commonly used for the treatment of motion sickness and vertigo associated with inner ear problems. While it may provide some symptomatic relief, it is not specifically indicated for neurovascular cross-compression. Propranolol (option E) is a beta-blocker that is often used for the treatment of migraines and essential tremor. While it may have some effects on reducing symptoms related to anxiety, it is not the primary treatment for neurovascular cross-compression.
Therefore, carbamazepine is the most appropriate medication to help with the patient's vertigo in this case.

F I G U R E 2
Example of a clinical vignette and the corresponding answer. Category: Movement disorders. Published 25 February 2023.

Clinical vignette:
A 65-year-old man presents with 2 years of progressive gait difficulty with frequent falls. He sometimes chokes while eating or drinking. His family is concerned as he always has a "worried expression." They also report he suddenly starts laughing or crying for no reason. Examination shows slow, slurred, monotonous speech as well as axial rigidity, symmetric rigidity of his arms and legs, and a stiff gait with pivot turning. He is unable to look up or down, but this can be overcome by cervical-ocular reflex maneuvers.
Given the most likely diagnosis, which of the following MRI findings would be expected? A. asymmetric frontoparietal atrophy B. midbrain and superior cerebellar peduncle atrophy C. olivopontocerebellar atrophy D. bilateral T2 hyperintensity in the posterolateral putamen E. diffuse cerebral atrophy with relative preservation of occipital volume

Answer (AI-chatbot):
B (midbrain and superior cerebellar peduncle atrophy). In progressive supranuclear palsy (PSP), the characteristic MRI findings include midbrain atrophy, often referred to as the "hummingbird sign" or "penguin sign," due to the slimmed appearance of the midbrain. Additionally, atrophy of the superior cerebellar peduncle can also be observed. These findings are helpful in distinguishing PSP from other neurodegenerative disorders. The midbrain and superior cerebellar peduncle atrophy seen in PSP is a result of the underlying degenerative changes in these regions of the brain.

F I G U R E 3
Example of a clinical vignette and the corresponding answer. Category: Neuroinfectious diseases. Published 6 December 2022.

Clinical vignette:
A 47-year-old man with a history of HIV develops severe pain described as electrical and shooting from his back into the legs. He also reports diarrhea.

Answer (AI-chatbot):
The most likely cause of the patient's symptoms is HIV-associated sensory polyneuropathy with concomitant CMV infection. The appropriate management for this condition involves treating both HIV and CMV infections. The correct medication combination for the management of HIV and CMV infections in this case is option D (ganciclovir and foscarnet). Justification:
- HIV-associated sensory polyneuropathy is a common neurological complication in patients with advanced HIV infection. It typically presents with distal symmetric or asymmetric sensory loss and pain in the lower extremities.
- CMV (cytomegalovirus) is a common opportunistic infection in patients with advanced HIV. It can cause various manifestations, including polyradiculopathy, which can lead to the patient's symptoms of severe pain and lumbosacral nerve root enhancement on MRI.
- Ganciclovir and foscarnet are antiviral medications used to treat CMV infection. The combination of these drugs is effective against CMV and is the recommended treatment for CMV-related neurological complications in patients with HIV.
- Ceftriaxone and vancomycin (option A) are antibiotics used to treat bacterial infections and are not appropriate for the management of HIV-associated sensory polyneuropathy with CMV infection.
- Amphotericin and fluconazole (option B) are antifungal medications and are not the treatment of choice for CMV infection or HIV-associated sensory polyneuropathy.
- Efavirenz and tenofovir (option C) are antiretroviral medications used to treat HIV infection but do not have specific activity against CMV or provide direct relief for the symptoms of sensory polyneuropathy.
- Pyrimethamine-sulfamethoxazole (option E) is a medication combination used to treat infections caused by Toxoplasma gondii, which is not the most likely cause of the patient's symptoms.
In summary, for the management of the patient's symptoms related to HIV-associated sensory polyneuropathy with concomitant CMV infection, the appropriate medication combination is ganciclovir and foscarnet (option D). However, treatment decisions should be made in consultation with a healthcare provider, taking into account the individual patient's condition and other relevant factors.

Answer (AI-chatbot):
In a patient with transverse myelitis, the presence of aquaporin-4 antibodies (option D) in the laboratory results is most likely to be diagnostic. Explanation: Transverse myelitis is an inflammatory disorder that affects the spinal cord, resulting in neurological deficits such as weakness, sensory loss, and dysfunction of the bladder and bowel. It can be idiopathic or associated with various underlying conditions. One of the important considerations in the evaluation of transverse myelitis is determining whether it is associated with neuromyelitis optica spectrum disorder (NMOSD), which is characterized by the presence of aquaporin-4 antibodies. Aquaporin-4 (AQP4) is a water channel protein found primarily in the central nervous system, including the spinal cord and optic nerve. The presence of AQP4 antibodies, detected through serological testing, is highly specific for NMOSD. These antibodies target and damage AQP4 channels, leading to inflammation and demyelination in affected areas.
While the other options (A, B, C, and E) represent various autoimmune markers that can be tested in different neurological conditions, the presence of aquaporin-4 antibodies specifically points toward NMOSD as the underlying cause of transverse myelitis in this patient. Antiamphiphysin antibodies (option A) are associated with paraneoplastic neurologic disorders. Antineutrophilic cytoplasmic antibodies (option B) are associated with vasculitides such as granulomatosis with polyangiitis (formerly Wegener granulomatosis) and microscopic polyangiitis. Antinuclear antibody (option C) is a nonspecific marker of autoimmune diseases. CSF oligoclonal bands (option E) can be seen in multiple sclerosis but are not specific to transverse myelitis or NMOSD. Therefore, in this case, the presence of aquaporin-4 antibodies would provide the most diagnostic information.
The questions requiring only the interpretation of a provided image were excluded.
The percentage of correct answers was calculated for each category ('success rate per category'), and the primary end-point ('success rate') was calculated as the mean of the percentages of correct answers per category. Five categories had fewer than 10 questions available, and hence only three, one, three, nine and two questions respectively were included for them. Two questions were excluded because the answer relied on image interpretation. The results obtained by the AI chatbot are presented in Table 1.
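As a sketch of how these end-points are computed, the snippet below derives the per-category success rates and their mean from hypothetical tallies; the category names and numbers are illustrative, not the study's data.

```python
# Hypothetical per-category tallies: category -> (correct answers, questions asked)
results = {
    "headache": (8, 10),
    "epilepsy": (7, 10),
    "anatomy/localization": (1, 1),
}

# Secondary end-point: success rate per category, as a percentage
per_category = {cat: 100 * ok / n for cat, (ok, n) in results.items()}

# Primary end-point: mean of the per-category percentages
# (a macro average, so a 1-question category weighs as much as a 10-question one)
success_rate = sum(per_category.values()) / len(per_category)
print(round(success_rate, 1))  # -> 83.3
```

The macro averaging shown here is worth noting: because every category contributes equally regardless of how many questions it holds, categories with very few questions can pull the overall mean disproportionately, which is relevant to the small-category caveats discussed later.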
Results obtained per category and the overall performance of the AI chatbot and neurology peers are shown in Table 2. In the correct justification analysis, the AI chatbot provided an adequate explanation for its correct answers in 123 out of 128 cases (96.1%).

DISCUSSION

AI chatbot performance overview
The difference in performance between the AI chatbot and the human experts was not statistically significant, meaning that ChatGPT, on average, performed on a par with human neurologists/neurology residents (an estimated 1500 users) in correctly answering complex multiple choice questions through the interpretation of neurology-related clinical vignettes. This is notable because it suggests that, at least in the context of these questions, LLM-based AI models can match human performance.
Despite these results, the AI chatbot showed variable success across different categories, ranging from 30% to 100% (see Table 1).
It performed better in categories such as normal structure process and development through life cycle, anatomy/localization and ethics/professionalism. It should be noted that these were categories with particularly few questions (three, one and three respectively), as was the case for cranial nerve palsies with only two questions available, making them prone to bias. Therefore, care should be taken in drawing conclusions about AI performance in such domains. It was nonetheless interesting to note the high success rate for ethics/professionalism, as ethical problems often require whole-picture analysis of difficult and nonlinear scenarios. In categories where the AI chatbot showed a success rate above 60%, it was consistently better than the human experts. On the other hand, it performed worse than humans in categories with a success rate ≤60%, particularly in neuroimmunological and paraneoplastic disorders of the central nervous system and demyelinating disorders (30% vs. 53.5%). The variability in performance across different categories may be attributed to differences in the complexity of the various neurological fields, to particularly challenging conditions addressed by the selected questions and/or to limitations and scarcity of the training data. Categories with lower AI chatbot performance, such as neuromuscular diseases/disorders and neuroimmunological and paraneoplastic disorders of the central nervous system and demyelinating disorders, can be particularly challenging due to the fast evolution and growing knowledge in pathophysiology and available treatments, as well as recent updates in classification systems. Not only are the AI chatbot's data outdated in some fields (the current publicly available version has information up to September 2021), but access to reliable scientific literature is also frequently restricted to paid platforms such as scientific journals, making it impossible for the AI chatbot to access the knowledge necessary to correctly answer some questions. Further refinement and optimization of the AI chatbot's algorithms and training processes are essential to enhance its performance across all neurology domains.

Diagnostic capability and answer justification
The results demonstrate that the AI chatbot is able to infer the correct diagnosis from clinical information in a large proportion of scenarios (85%), a crucial step in clinical practice for making assertive decisions about treatments, selecting appropriate examinations and defining prognosis. Although knowing the diagnosis was not sufficient to correctly answer the multiple choice question, it seemed a necessary condition, as shown by the fact that the chatbot failed all questions for which it did not reach a correct diagnosis. Indeed, this was the main reason for failure, accounting for 42% of cases. By asking ChatGPT to 'justify' its correct answers, an extra layer of validity was also added to this investigation. LLMs like ChatGPT reach an answer by associating words in a neural network architecture that was pre-trained to choose the most probable text continuation as an output [14]. They do not actually 'know' the concepts related to the diseases. By asking for the justification, it is believed possible to analyse the process of 'reasoning', clarifying and confirming whether the model uses correct information to generate its answer, thus making the results more trustworthy and valuable (a confirmatory test), and to demonstrate that it is capable of sequential reasoning through these complex medical questions. It provided an adequate justification in the vast majority of answers, suggesting that it offers explanations that are appropriate and potentially valuable in clinical decision-making.

Study limitations
It is important to acknowledge that the study has certain limitations. First, the performance of the AI chatbot was evaluated using a specific dataset and may not be representative of real-world clinical scenarios. Secondly, there is also variability in the difficulty of the questions across the different categories, as they were not written by the same person and were randomly chosen from a set of questions. This may have been partly responsible for the variability in performance across different categories, as discussed above. Thirdly, it should be highlighted that answers were based only on the interpretation of human language and never on images, so no extrapolations can be made about AI chatbot performance in scenarios where images are crucial to interpretation or reasoning. Fourthly, different prompts including more information, such as defining the model's role as a neurologist, could have been used.
Additionally, a methodological limitation comes from the fact that the answers from the AI chatbot were not individually compared to those of a single peer. Instead, using the mean percentage of correct answers across multiple peers makes statistical tests more fragile for inferring definite conclusions about differences in performance between AI chatbots and human experts.
Lastly, the study focused solely on neurology questions, so the AI chatbot's performance in other medical specialties remains to be explored.

The role of AI in clinical practice and future implications
It must be highlighted that analysing clinical vignettes is somewhat artificial compared with a physician's clinical practice. The medical role implies collecting clinically relevant information, filtering out some details and asking the right questions in imperfect, sometimes desperate, scenarios with limited time.
Moreover, clinicians attend patients who are not always cooperative or able to describe their symptoms, when the main symptom is, quite often, just one more piece in a large puzzle that is slowly uncovered throughout the various consultations and follow-ups.
Beyond diagnosing diseases and treating them according to the latest guidelines, the physician's mission also includes managing the patient as a whole, which often encompasses social and emotional support, managing expectations and creating empathy and healthy patient-physician relationships based on trust.
In this process, decisions about disease management are often nonlinear and may not follow strict guidelines or the apparent best clinical practice.
Concerning the above, whilst the AI chatbot shows promise as a valuable tool in the field of neurology, it is important to emphasize that it should not replace human expertise. Rather, the integration of AI technologies should continue to be investigated as a potential complementary approach, supporting clinicians in their decision-making process in order to improve care.
GPT-3.5 was used for this study, although by the study's end a new version (GPT-4) had already been released, and many more are expected in the not too distant future. The use of GPT-3.5 rather than GPT-4 is hence a limitation of the present study: GPT-4 is reported to be a considerably larger model than GPT-3.5 (which uses 175 billion parameters), giving more accurate and coherent responses, and it is multimodal, capable of analysing text, images and voice, with the ability to access the internet so that its information is up to date. GPT-4 would therefore be expected to perform even better, paving the way for future studies comparing these different models.
Although overwhelming, it must be noted that the pace at which technology, namely AI, evolves and advances still appears to be exponential. With the methodology used in this paper, there is confidence that this work can be replicated and refined in future studies to test the utility and reliability of LLMs in clinical practice.
At present, AI-physician interaction is already possible but requires a specialized neurologist with refined critical thinking and accurate reasoning to validate the output. As they are scalable and tend to become cheaper, enhanced LLMs and other AI technologies have the potential to reach a global audience, including developing countries with scarce access to specialized medical care. It is therefore important that value is recognized and credit given to companies such as OpenAI that provide these tools for public use, it is hoped free of charge, despite the high financial effort required to develop and maintain them.

CONCLUSIONS
The results of this study suggest that LLMs (in this case GPT-3.5) show promising capabilities in analysing extensive clinical vignettes, interpreting them and providing a diagnostic hypothesis. Less commonly, but also with remarkable accuracy, they are able to answer two-step questions inquiring about the most effective tests, the best treatment, prognosis and additional considerations reflected in the multiple choice questions. The success rate of AI answers was similar to that of neurologists/neurology residents, with a mean success rate of 71.3% (chatbot) versus 69.2% (peers). This study suggests the potential of integrating AI technologies as a complementary approach to human expertise to support clinicians in their decision-making process and enhance patient care. Although far from completely reliable, these models represent a promising tool. Examining their performance in challenging medical questions can support and encourage further benchmarking in the medical community, in order to learn more about their utility in clinical practice.

F I G U R E 1
Example of a clinical vignette and the corresponding answer. Category: Neuro-ophthalmology/neuro-otological disorders. Published 28 March 2023.

Figures 1-5 provide examples of the questions and the respective AI chatbot answers.
A 5-year-old boy presents with delayed motor milestones, prolonged toe walking, frequent falls, and difficulties performing basic motor tasks. Examination reveals symmetric calf hypertrophy, a mildly lordotic posture, a waddling gait, and poor hip excursion during running. His creatine kinase level is 31,400 U/L. Which of the following therapeutic options should be offered? A. Achilles tendon lengthening B. coenzyme Q supplementation C. mexiletine 150 mg orally three times/day D. prednisone 0.75 mg/kg/d E. pyridostigmine 30 mg every 6 h
Based on the clinical presentation, examination findings, and elevated creatine kinase level, the most likely diagnosis for this patient is Duchenne muscular dystrophy (DMD). DMD is an X-linked recessive disorder characterized by progressive muscle weakness and wasting. Among the given therapeutic options, the most appropriate choice for this patient would be D. Corticosteroids, such as prednisone, are the mainstay of treatment for DMD. They have been shown to slow the progression of muscle weakness and improve muscle strength, function, and quality of life in patients with DMD. Prednisone is typically initiated at a dose of 0.75 mg/kg/d and can be titrated based on the individual patient's response and tolerability. Achilles tendon lengthening (choice A) is not the appropriate therapeutic option in this case. It is a surgical procedure that is sometimes performed in patients with equinus contractures to improve heel-to-toe gait, but it does not address the underlying cause of DMD. Coenzyme Q supplementation (choice B) has not been shown to be effective in the treatment of DMD. Mexiletine (choice C) is a sodium channel blocker that is sometimes used in the treatment of myotonic disorders, such as myotonic dystrophy. However, it is not a recommended therapy for DMD. Pyridostigmine (choice E) is a cholinesterase inhibitor that is used in the treatment of myasthenia gravis, an autoimmune neuromuscular disorder. It is not an appropriate therapy for DMD. Therefore, the most appropriate therapeutic option for this patient with Duchenne muscular dystrophy would be prednisone 0.75 mg/kg/d (choice D).
F I G U R E 4
Example of a clinical vignette and the corresponding answer. Category: Neuromuscular diseases/disorders. Published 24 March 2022.

F I G U R E 5
Example of a clinical vignette and the corresponding answer. Category: Brainstem and spinal cord syndromes. Published 22 August 2022.

Clinical vignette:
A 54-year-old woman presents with an episode of transverse myelitis with severe quadriparesis that developed over 2 days. Spinal cord MRI scan shows an extensive area of increased signal from T2 to T6, which enhances on postcontrast T1-weighted images. Brain MRI is normal. Which of the following laboratory results is most likely to be diagnostic?

A comparative analysis between the 'success rate' of the AI chatbot and that of the human experts was then conducted using Student's paired t test, after employing the Shapiro-Wilk test and visual inspection of histograms to assess data normality. Data were paired by category to compare the success rate per category between the AI chatbot and peers.
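The statistical procedure used here can be sketched in a few lines with SciPy; the paired per-category success rates below are invented for illustration and are not the study's data.

```python
import numpy as np
from scipy import stats

# Illustrative paired per-category success rates (percent), one pair per
# category; these values are hypothetical, not the study's actual data.
ai    = np.array([70.0, 80.0, 60.0, 90.0, 50.0, 75.0, 65.0, 85.0])
peers = np.array([68.0, 72.0, 63.0, 88.0, 55.0, 70.0, 66.0, 80.0])

diff = ai - peers

# Shapiro-Wilk test on the paired differences to assess normality,
# as done in the study before applying the t test
w, p_norm = stats.shapiro(diff)

# Student's paired t test on the category-matched success rates
t, p = stats.ttest_rel(ai, peers)
print(f"t({len(diff) - 1}) = {t:.3f}, p = {p:.3f}")
```

Pairing by category is what justifies `ttest_rel` over an unpaired test: each difference compares the chatbot and the peers on exactly the same set of questions, removing between-category variability from the comparison.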

For the primary end-point analysis, the AI chatbot achieved a mean success rate of 71.3% (standard deviation 19.1). The difference between the AI chatbot and the neurology peers was not statistically significant (71.3% vs. 69.2%; t(21) = 0.634; p = 0.533). Amongst the 164 questions without an explicit diagnosis in the clinical vignette, the AI chatbot identified the correct diagnosis in 139 (85.0%). Failing to reach the correct diagnosis was the main reason for missing the correct answer in the primary end-point, corresponding to 42% of cases (25/59 wrong answers). Other domains of incorrectness were treatment/medication management (25.6%); clinical signs (most common neurological finding, topographical diagnosis based on examination) (8.6%); complementary diagnostic tests (most common findings, possible test confounders) (6.8%); prognosis (5.2%); and other (such as risk factors and epidemiology) (10.3%).
T A B L E 1 AI chatbot results.
T A B L E 2 AI chatbot versus peers' success rate.