Prompt engineering: The next big skill in rheumatology research

Large language models (LLMs) like GPT‐4 and Claude are catalyzing transformation across medical research including rheumatology. This review examines their applications, highlighting the pivotal role of prompt engineering in effectively guiding LLMs. Key aspects explored include literature synthesis, data analysis, manuscript drafting, coding assistance, privacy considerations, and generative artificial intelligence integrations. While LLMs accelerate workflows, reliance without apt prompting jeopardizes accuracy. By methodically constructing prompts and gauging model outputs, researchers can maximize relevance and utility. Locally run open‐source models also offer data privacy protections. As LLMs permeate rheumatology research, developing expertise in strategic prompting and assessing model limitations is critical. With proper oversight, LLMs markedly boost scholarly productivity.


| INTRODUCTION
Originally emerging as simple, rule-based chatbots in the 1960s, LLMs have undergone significant evolution into sophisticated tools capable of handling complex tasks in natural language processing. 2,3 The introduction of Google Brain in 2011 and the development of Transformer models in 2017 were particularly pivotal, culminating in advanced models like OpenAI's GPT-4. 1 However, their integration into academic settings, particularly in higher education and scientific research, has sparked intense debate. Concerns revolve around their impact on traditional learning and assessment methods, their reliability in capturing nuanced scientific texts, and their potential to mislead research outcomes due to limitations in understanding and replicating human scientific discourse. 2,3 Beyond those focused specifically on developing LLMs, medical researchers must also cultivate prompting skills to apply LLMs effectively in their work. For instance, clinicians seeking LLMs' aid in tasks like medical literature reviews or patient communication need to formulate prompts that elicit useful outputs. Acquiring prompt engineering skills is crucial even for guiding simple searches on LLMs. Without careful prompt construction, models often return information lacking nuance or depth, whereas apt prompts can produce focused, comprehensive results. Hence, developing prompt engineering knowledge emerges as a key priority for researchers employing LLMs. Here, we provide an overview of how prompting skills may assist medical researchers.

| THE ART OF PROMPT ENGINEERING IN RHEUMATOLOGY MEDICAL RESEARCH
The progress of LLMs such as OpenAI's GPT-4, Anthropic's Claude 2.1, Google's Gemini, and open-source models like Meta's LLama-2 3 and MistralAI's models has already marked a transformation in academic writing and will drive innovation in routine clinical practice.
Prompt engineering, an emerging discipline, involves the strategic creation and optimization of prompts or commands to effectively guide LLMs in producing relevant outcomes for medical research. 4 By thoughtfully considering factors like the phrasing, wording, sequencing, and context setting of prompts, researchers aim to elicit optimal performance from LLMs on diverse tasks ranging from literature reviews to data analysis.
While prompting refers simply to providing inputs or commands to guide LLMs, prompt engineering involves the systematic crafting and optimization of prompts to improve model performance on specific tasks. Prompting uses natural language to query models, taking advantage of their pretraining on diverse texts. However, suboptimal prompts can confuse models and elicit irrelevant or nonsensical outputs.
In contrast, prompt engineering relies on research-driven strategies to formulate prompts that maximize relevance, accuracy, and utility. It examines factors such as wording, syntax, priming, and context setting through an iterative process to develop specialized prompts tailored to models and use cases. While prompting helps exploit the knowledge encoded in models, prompt engineering unravels how models represent and process knowledge to further enhance their capabilities.
Simply prompting models without strategic engineering often wastes time and resources. However, apt prompt engineering unlocks substantial value, enabling non-experts to utilize complex LLMs effectively.
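To make the contrast concrete, the sketch below shows how an engineered prompt layers role, task, context, constraints, and output format on top of a naive query. The `build_prompt` helper and all prompt text are our own illustrative examples, not a vendor-provided template:

```python
# Illustrative contrast between a naive prompt and an engineered prompt.
# The helper and all prompt wording are hypothetical examples.

naive_prompt = "Tell me about methotrexate."

def build_prompt(role: str, task: str, context: str,
                 constraints: list[str], output_format: str) -> str:
    """Assemble an engineered prompt from explicit, labeled components."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Answer format: {output_format}"
    )

engineered_prompt = build_prompt(
    role="a rheumatology researcher summarizing evidence for clinicians",
    task="Summarize the efficacy and safety of methotrexate in rheumatoid arthritis.",
    context="Audience: practicing rheumatologists familiar with DMARDs.",
    constraints=[
        "Flag any uncertainty explicitly rather than guessing.",
        "Do not give individualized treatment advice.",
    ],
    output_format="Three short bullet points, then one sentence on key limitations.",
)

print(engineered_prompt)
```

The engineered version constrains scope and format up front, which is what makes model outputs reviewable and comparable across runs.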
Recent evidence suggests that the effectiveness of LLMs in specific tasks can be influenced by the nature of the prompts used, indicating that tailored prompting techniques might be necessary for different LLMs. 4 For example, engineered prompts can streamline the review of extensive academic literature. This may also contribute to medical education by producing accessible materials that elucidate intricate medical theories, benefiting both students and professionals in the field. 5 Hence, the practical applications of prompt engineering in medical research are extensive. These range from clinical decision support, where prompts guide LLMs to list possible diagnoses based on symptoms, to literature reviews, assisting researchers in synthesizing the latest findings. In patient education and communication, 6 prompts help generate comprehensible information, enhancing patient and caregiver understanding and engagement. 7 In data analysis, prompt engineering enables the processing of large datasets, providing insights critical for public health planning, whereas in medical training it may assist in developing educational content, equipping future healthcare providers with up-to-date knowledge and practices. 5,8 Gaining expertise in prompt engineering requires dedicated effort from medical researchers. Key skills to hone include systematically evaluating model outputs, exploring various prompting styles, priming models with medical ontology and examples, and incrementally constructing prompts from simple blocks. Testing specialized medical LLMs first and documenting the prompt-crafting process enables progressive refinement. While models can aid certain tasks, researchers must identify aspects needing human discretion.
Through iterative analysis and testing, prompt engineering mastery can be attained over time as models advance.
When applying prompting strategies, medical researchers can choose between few-shot prompting and chain-of-thought prompting approaches. Few-shot prompting primes models by providing two to three examples to establish the desired context before querying outputs. This leverages the model's vast pretrained knowledge for efficiency, albeit with some template standardization. Chain-of-thought prompting articulates logical reasoning step by step to produce custom outputs, though the prompt engineering is more intensive. Assessing trade-offs and downstream impacts on clinical interpretations could inform best practices. Both approaches have situational merits: few-shot for high-volume content needing consistency, and chain-of-thought for elucidating nuance around complex concepts. Breaking complex tasks down into modular prompt blocks focused on distinct sub-problems before combining them helps build chain-of-thought reasoning incrementally. 4,8
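The two strategies can be sketched as simple prompt builders. Everything below is an illustrative construction of the prompt text only (the clinical examples are simplified teaching vignettes, not diagnostic guidance), and the functions are our own hypothetical helpers:

```python
# Sketch of few-shot vs. chain-of-thought prompt construction.
# Example content is simplified and for illustration only.

few_shot_examples = [
    ("Symmetric small-joint pain, morning stiffness >1 h, RF positive",
     "Consider rheumatoid arthritis"),
    ("Acute monoarthritis of the first MTP joint, hyperuricemia",
     "Consider gout"),
]

def few_shot_prompt(examples, query: str) -> str:
    """Prime the model with input/output pairs before the real query."""
    shots = "\n".join(f"Findings: {x}\nAssessment: {y}" for x, y in examples)
    return f"{shots}\nFindings: {query}\nAssessment:"

def chain_of_thought_prompt(query: str) -> str:
    """Ask the model to reason step by step before answering."""
    return (
        f"Findings: {query}\n"
        "Think step by step: 1) list the key clinical features, "
        "2) map each feature to candidate diagnoses, "
        "3) weigh the alternatives, then state the most likely assessment."
    )

query = "Low back pain improving with exercise, HLA-B27 positive"
print(few_shot_prompt(few_shot_examples, query))
print(chain_of_thought_prompt(query))
```

The few-shot builder standardizes the output template across many queries, while the chain-of-thought builder trades that consistency for explicit, inspectable reasoning.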

| DATA ANALYSIS IN ONE PROMPT
Even if LLMs like GPT-4 offer promising advancements in the domain of data organization and analysis, their application in tasks such as data cleaning and analysis should be approached with a critical and objective perspective 9 and relevant prompt engineering skills. Recent LLMs can analyze datasets and present them in understandable formats, including tables and graphical representations. This capability may be valuable in feature engineering for tabular data. Word embeddings, which are vector representations of words capturing semantic relationships and linguistic nuances, might play a crucial role in biomedical Natural Language Processing (NLP) for interpreting and processing medical language. 10 Through the generation of embeddings that capture the latent structure of data, LLMs streamline traditional data analysis methods, often obviating the need for extensive data cleaning and feature development.
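The intuition behind embeddings can be shown with a toy example. The three-dimensional vectors below are fabricated for demonstration (real pipelines would obtain embeddings from a trained model), but the cosine-similarity arithmetic is the standard comparison used in practice:

```python
# Toy illustration of word embeddings and cosine similarity.
# Vectors are made up for demonstration purposes.
import math

embeddings = {
    "arthritis":   [0.90, 0.80, 0.10],
    "synovitis":   [0.85, 0.75, 0.20],
    "spreadsheet": [0.05, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically related terms should score higher than unrelated ones.
related = cosine(embeddings["arthritis"], embeddings["synovitis"])
unrelated = cosine(embeddings["arthritis"], embeddings["spreadsheet"])
print(round(related, 3), round(unrelated, 3))
```

In a biomedical NLP setting, the same comparison over model-derived embeddings is what lets "arthritis" cluster with "synovitis" rather than with unrelated vocabulary.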
The scope of LLMs extends further, enhancing processes such as sentiment analysis. 11,12 They can discern the emotional nuances embedded in text, allowing a more comprehensive understanding of the overall sentiment conveyed by the data. In practical terms, in the context of medical research, sentiment analysis may be applied to better understand the emotional tone of patient narratives, possibly corroborating patient-reported outcome measures.
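A deliberately minimal sketch of the idea is a lexicon-based scorer. The word lists and narratives below are our own invented examples; real studies would use an LLM or a validated sentiment model rather than hand-curated lists:

```python
# Minimal lexicon-based sentiment sketch for patient narratives.
# Word lists and narratives are invented for illustration only.

POSITIVE = {"better", "improved", "relief", "manageable", "hopeful"}
NEGATIVE = {"pain", "worse", "exhausted", "stiff", "frustrated"}

def sentiment_score(text: str) -> int:
    """Positive result => predominantly positive tone; negative => negative."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

narrative_1 = "My joints feel better and the stiffness is manageable, I am hopeful."
narrative_2 = "The pain is worse and I am exhausted and frustrated."
print(sentiment_score(narrative_1), sentiment_score(narrative_2))
```

Even this crude scorer separates the two narratives; an LLM-based approach would additionally handle negation, idiom, and clinical context that word counting misses.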
Therefore, the incorporation of LLMs in data analysis significantly accelerates workflows, reducing the likelihood of human error and saving time. This shift is paving the way for more sophisticated and efficient analytical capabilities. Figures 1 and 2 illustrate the use of prompt engineering in data analysis, highlighting LLMs' ability to generate tables and graphical representations.

| BETTER PROMPT ENGINEERING FOR BETTER CODING
In the realm of academic research, the integration of AI tools like LLMs and no-code platforms is simplifying the process for researchers, particularly those with limited coding skills. 13 Many of these platforms can also assist in writing Python scripts and handling other coding tasks, making them indispensable tools for researchers in computational fields, while third-party LLMs such as GPT-4, Gemini, and Claude can provide Python scripts on request. Moreover, when writing code in an IDE such as Microsoft® VS Code, third-party extensions such as GitHub Copilot serve as intelligent coding aids, providing code suggestions that simplify software development related to research, including retrieving inference from OpenAI GPT models. These tools augment research capabilities and foster an inclusive environment where technical barriers are substantially lowered. 13 Through these platforms, researchers may easily explore the potential of machine learning for tasks in rheumatology such as predictive modeling of disease progression or response to biologic/targeted synthetic disease-modifying antirheumatic drugs (b/tsDMARDs) (Figure 3).
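Prompts of the kind shown in Figure 3 typically return complete analysis scripts. The sketch below gives a flavor of such a script: a tiny logistic regression relating BMI and physical activity to a binary diabetes label. The data are synthetic, the coefficients fabricated, and only the Python standard library is used; a real LLM-drafted script would more likely rely on pandas and scikit-learn:

```python
# Sketch of the kind of analysis an LLM might draft on request:
# a tiny logistic regression on synthetic, fabricated data.
import math
import random

random.seed(0)

# Synthetic records: (bmi, weekly_activity_hours) -> diabetes label (0/1).
data = []
for _ in range(200):
    bmi = random.uniform(18, 40)
    activity = random.uniform(0, 10)
    true_logit = 0.3 * (bmi - 28) - 0.5 * (activity - 4)  # made-up effects
    label = 1 if random.random() < 1 / (1 + math.exp(-true_logit)) else 0
    data.append((bmi, activity, label))

# Fit intercept and two weights by plain gradient descent on centered features.
w0 = w_bmi = w_act = 0.0
lr = 0.01
for _ in range(2000):
    g0 = g1 = g2 = 0.0
    for bmi, act, y in data:
        p = 1 / (1 + math.exp(-(w0 + w_bmi * (bmi - 28) + w_act * (act - 4))))
        err = p - y
        g0 += err
        g1 += err * (bmi - 28)
        g2 += err * (act - 4)
    n = len(data)
    w0 -= lr * g0 / n
    w_bmi -= lr * g1 / n
    w_act -= lr * g2 / n

# Higher BMI should increase risk, more activity should decrease it.
print(round(w_bmi, 2), round(w_act, 2))
```

Even with a sketch like this in hand, researchers must verify the statistical choices (model form, cleaning rules, diagnostics) rather than accept generated code uncritically.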

| ENHANCING ACADEMIC PRODUCTIVITY WITH LLMs: FROM PDF UPLOADS TO MANUSCRIPT REFINEMENT
LLMs can be powerful tools for academic productivity, aiding in document analysis, manuscript drafting, and language refinement, and thus playing a pivotal role in streamlining research processes and enhancing the quality of academic publications.
A key feature of most of them is the capability to process PDF documents, allowing users to directly upload papers for in-depth analysis. This functionality transforms how researchers engage with content; for instance, one can inquire about the methodology of a study directly from the LLM, which interprets and explains the paper's approach in a simplified manner. This results in faster comprehension and time savings. 14 Beyond mere understanding, LLMs excel in content creation and refinement.
They assist in summarizing dense academic texts, ensuring clarity and coherence. The language-check feature is particularly beneficial, aiding in polishing the language for publication readiness. In this regard, LLMs are invaluable for non-native English speakers, enhancing the quality of academic writing. They offer more flexibility than traditional editing tools by allowing custom prompts for grammar and style checks, thereby facilitating a more nuanced and accurate editing process. Furthermore, LLMs significantly contribute to the drafting of manuscripts and protocols.
Researchers can outline their methodology, and the LLM can assist in articulating the remaining sections, ensuring alignment with academic standards and narrative flow.This collaborative process between the researcher and the LLM not only enhances the quality of academic writing but also accelerates the process from ideation to publication. 14
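A custom editing prompt of the kind described above can be packaged as a reusable template. The `editing_prompt` helper and its wording are our own hypothetical illustration, not a feature of any particular LLM product:

```python
# Hypothetical reusable prompt template for language refinement.
# The wording is our own illustration, not a vendor-provided template.

def editing_prompt(text: str, style: str = "concise academic English") -> str:
    """Wrap a draft passage in explicit editing instructions."""
    return (
        "Act as a copy editor for a rheumatology journal.\n"
        f"Rewrite the passage below in {style}, preserving all factual "
        "claims, citations, and numerical values exactly.\n"
        "Return only the revised text.\n\n"
        f"Passage:\n{text}"
    )

draft = "The results was significant and suggests that the drug work good."
print(editing_prompt(draft))
```

Templates like this keep the editing instructions stable across a whole manuscript, so that only the passage changes between calls.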

Generative artificial intelligence (AI) models like DALL-E and Midjourney have shown promising capabilities in various fields, including medical research. These models, which can generate images from textual descriptions, offer a novel approach to visualizing medical data and concepts. However, their effectiveness in the medical domain heavily relies on the precision of the prompts given to them, highlighting the critical role of prompt engineering.

DALL-E, particularly its second iteration, DALL-E 2, has demonstrated potential in generating radiological images from textual descriptions. Research has shown that DALL-E 2 can create x-ray images that are stylistically similar to authentic x-rays, suggesting that it has learned relevant representations for radiographs during its training. However, its capabilities are somewhat limited when it comes to generating images with pathologies, likely due to filters that prevent the generation of harmful content (Figure 4). Midjourney, another AI image-generation tool, has been explored for its ability to create medical art. While it can produce original art quickly, its accuracy in rendering medically precise illustrations, especially for complex medical procedures or anatomy, is currently limited. This limitation is partly due to the intentional restriction of medically accurate training data to avoid generating sensitive or harmful content. An example prompt for DALL-E to generate a medically relevant image: "Create an illustration showing an antibody-producing B cell in action against pathogens (like bacteria or viruses). Depict the B cell as a detailed sphere with surface receptors and antibodies, in shades of blue and green. Contrast this with the pathogens in red and orange. Highlight the interaction between the B cell and pathogens, emphasizing the immune response. Use a soft-colored gradient background with abstract representations of other immune cells."

FIGURE 1 Distribution analysis by ChatGPT-4: a concise representation of data distribution patterns identified by ChatGPT-4. Prompt: "I've conducted a study on imposter syndrome in medical students using the Clance Impostor Phenomenon Scale (CIPS). The data is stored in an Excel file. I need assistance with the following tasks: 1. Load Data: Import the data from the Excel file. 2. Calculate Scores: Compute total CIPS scores for each respondent. 3. Categorize Scores: Classify scores into four categories: 'Few' (≤40), 'Moderate' (41-60), 'Frequent' (61-80), and 'Intense IP' (>80). 4. Year-wise Analysis: Analyze the distribution of scores across different years. 5. Statistical Tests: Conduct an Analysis of Variance (ANOVA) to identify significant differences in scores across years or groups, followed by post hoc tests for detailed insights. The aim is to evaluate the prevalence and severity of imposter syndrome among medical students over time and assess variations across groups or years."

FIGURE 2 ANOVA and post hoc analysis by the GPT-4 model.

FIGURE 3 Python script for analyzing lifestyle factors and type 2 diabetes as retrieved by GPT-4: the script includes data import, cleaning, correlation analysis, and logistic regression. Prompt: "I am conducting a medical research study on the correlation between various lifestyle factors and the incidence of type 2 diabetes. I have a dataset in a CSV format that includes patient demographics (age, gender, body mass index [BMI]), lifestyle factors (diet, physical activity, smoking status), and diabetes diagnosis (yes/no). I need assistance with a Python script to: 1. Import and load the dataset from the CSV file. 2. Clean the data by filtering out any incomplete or erroneous records. 3. Analyze the correlation between each lifestyle factor and diabetes incidence. 4. Create visualizations such as bar graphs to compare the incidence of diabetes across different age groups and BMI categories. 5. Perform a logistic regression to understand the impact of these factors on the likelihood of having diabetes. I have basic knowledge of Python but would appreciate a script that is clear and well-commented to help me understand each step of the analysis."

FIGURE 4 DALL-E generated immune response: a B cell engaging pathogens.

| ETHICAL PROMPTING

The use of patient data in prompts given to LLMs raises serious concerns about privacy violations and lack of regulatory oversight. Under the EU's General Data Protection Regulation (GDPR), stringent safeguards must be in place when processing personal health information. If patient data are provided in prompts to train or fine-tune proprietary LLMs without explicit consent, this would likely constitute a GDPR breach. The opaque and closed-source nature of most commercial LLMs precludes transparency around how patient data are used and protected. Once given in a prompt, there are no reliable safeguards to prevent patient data from being retained and utilized without authorization for purposes like model training or improvement. This opens the door to exploitation of sensitive health information without accountability. In contrast, localized and open-source LLMs offer greater privacy protection when working with patient data. Models that process information within secure, localized environments instead of external servers provide enhanced control over data access. Additionally, open-source LLMs allow transparency into data handling while