Self-directed learning of basic musculoskeletal ultrasound among rheumatologists in the United States




Because musculoskeletal ultrasound (MSUS) is highly user dependent, we aimed to establish whether non-mentored learning of MSUS is sufficient to achieve the same level of diagnostic accuracy and scanning reliability as has been achieved by rheumatologists recognized as international experts in MSUS.


A group of 8 rheumatologists with more experience in MSUS and 8 rheumatologists with less experience in MSUS participated in an MSUS exercise to assess patients with musculoskeletal abnormalities commonly seen in a rheumatology practice. Patients' established diagnoses were obtained from chart review (gout, osteoarthritis, rotator cuff syndrome, rheumatoid arthritis, and seronegative arthritis). Two examining groups were formed, each composed of 4 less experienced and 4 more experienced examiners. Each group scanned 1 predefined body region (hand, wrist, elbow, shoulder, knee, or ankle) in each of 8 patients, blinded to medical history and physical examination. Structural abnormalities were noted with dichotomous answers, and an open-ended answer was used for the final diagnosis.


Less experienced and more experienced examiners achieved the same diagnostic accuracy (US-established diagnosis versus chart review diagnosis). The interrater reliability for tissue pathology was slightly higher for more experienced versus less experienced examiners (κ = 0.43 versus κ = 0.34; P = 0.001).


Non-mentored training in MSUS can lead to the achievement of diagnostic accuracy in MSUS comparable to that achieved by highly experienced international experts. Reliability may increase slightly with additional experience. Further study is needed to determine the minimal training requirement to achieve proficiency in MSUS.


The first clinical use of ultrasound (US) technology for musculoskeletal examination occurred in the early 1970s, primarily for detecting popliteal cysts (1). Rheumatologists first adopted musculoskeletal US (MSUS) in Germany in the 1980s (2), with the rest of Europe following over the next decade. In 1996, a specific program in MSUS was added to rheumatology fellowship training in Italy (3), and around the same time, the Ultrasound School of the Spanish Society of Rheumatology was established (4). These events resulted in the development of rheumatology experts in MSUS imaging in Europe and Latin America (5). The MSUS reliability of some of these experts has been determined previously in Germany and Spain (6, 7) in the “Train the Trainers” (6) and “Teach the Teachers” (7) exercises, finding good interobserver agreement for the wrist/hand and knee, with fair concordance for the shoulder and ankle/foot (7).

US rheumatologists have been slower to adopt MSUS. Introductory courses in MSUS were first offered at the American College of Rheumatology (ACR) Annual Scientific Meeting in 1999, and have been conducted at all successive meetings to date. Interest in MSUS in the US has steadily grown, as evidenced by the increasing number of abstracts at the ACR involving MSUS: from 17 in 2002, to 23 in 2005, to 42 in 2008. There have also been a growing number of MSUS training courses conducted within the US. A recent study of interest in MSUS among rheumatology fellows and program directors found that up to 50% of current fellows have attended a lecture or course on MSUS, with 41% incorporating MSUS into the fellowship program to some degree, and up to 81% of those surveyed believe that MSUS will become a standard clinical tool for the rheumatologist (8).

With few opportunities for formal training and mentorship in MSUS in the US relative to Europe, most rheumatologists performing MSUS in the US have gained their training through short courses, reading, and independent scanning, without the benefit of formal mentorship. Training in MSUS imaging is not currently guided by clear-cut agreement as to the recommended number of cases or scanning hours needed for an examiner to become competent: the American College of Radiology recommends 500 US scans to become competent (9), and the American Institute of Ultrasound in Medicine had recommended 300 scans but is currently revising this recommendation (online at: In Germany, 300 studies are required to sit for the final examination (10), the Italian School requires 200 US examinations (3), and the Spanish Society of Rheumatology recommends 90 examinations for the intermediate and 90 more for the advanced training level (4). Since MSUS is highly operator dependent, it would be important to establish whether non-mentored/minimally-mentored learning of MSUS is sufficient to achieve reliability in US assessment of rheumatic disorders similar to that of rheumatologists recognized as experts in MSUS.

In the current study, we compared less experienced rheumatologists without extensive mentored training in MSUS from the US with more experienced rheumatologist experts in MSUS from outside of the US in their ability to arrive at the correct overall disease diagnosis through focused US imaging of joint regions and structures commonly examined in a rheumatology practice. In addition, we assessed the sensitivity and specificity of less experienced rheumatologists in detecting peripheral musculoskeletal lesions by US.


Physician groups.

Eight rheumatologists more experienced in MSUS and 8 rheumatologists with less experience in MSUS were recruited. Rheumatologists more experienced in MSUS had at least 5 years of scanning and more than 5,000 lifetime US scans performed (trained outside of the US) (Table 1). Rheumatologists with less experience in MSUS had at least 2 years of scanning, 250 lifetime US scans performed, an average of 99 hours of Continuing Medical Education (CME; range 0–360 hours), of which 40% was dedicated to hands-on scanning, and no significant (3 days or less) prior mentoring in MSUS (trained in the US) (Table 1). These 2 groups participated in an MSUS exercise to assess patients with common rheumatic conditions. Sixteen physicians were divided into 2 examining cohorts, each consisting of 4 less experienced and 4 more experienced examiners. Each cohort concurrently scanned 1 predefined body region (hands, wrist, elbow, shoulder, knee, or ankle) in each of 8 patients using a Latin-square design. Rheumatic conditions were evenly distributed between the 2 groups (Table 2). Clinical evaluation was forbidden and examiners were unaware of the diagnoses. Prior to scanning, examiners reviewed standardized definitions for MSUS structural abnormalities (e.g., enthesopathy, synovial proliferation, joint erosion, etc.) (11–15).
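The rotation implied by a Latin-square design can be sketched as follows. This is a minimal illustration only: the cyclic construction, the one-patient-per-time-slot reading, and the labels `E1`–`E8` and `P1`–`P8` are assumptions, not the schedule actually used in the study.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square in which cell (r, c) holds (r + c) % n."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def rotation_schedule(examiners, patients):
    """Return one list of (examiner, patient) pairs per time slot.

    Within a slot every patient is seen by exactly one examiner, and over
    all slots every examiner sees every patient exactly once.
    """
    n = len(examiners)
    if n != len(patients):
        raise ValueError("a Latin square needs equal numbers of examiners and patients")
    square = cyclic_latin_square(n)
    return [
        [(examiners[r], patients[square[r][slot]]) for r in range(n)]
        for slot in range(n)
    ]


# Hypothetical labels for one examining cohort (8 examiners, 8 patients).
examiners = [f"E{i}" for i in range(1, 9)]
patients = [f"P{i}" for i in range(1, 9)]
schedule = rotation_schedule(examiners, patients)
```

With 8 examiners and 8 patients, the schedule has 8 slots, so each cohort completes 64 examiner-patient encounters without two examiners scanning the same patient at the same time.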

Table 1. Ultrasound training characteristics of the more and less experienced groups*

Experience level | Years of experience | Estimated number of lifetime studies | Mentored teaching time | Hours of CME credit (40% hands on)
1 | 16 | 10,000 | 12 months | 0
2 | 6 | 6,000 | 3 months | 0
3 | 12 | 20,000 | 2 months | 0
4 | 11 | 11,000 | 3 months | 0
6 | 14 | 25,000 | 4 days | 60
8 | 10 | 5,000 | 24 months | 120
5 | 4 | 500 | 3 days | 100
6 | 3 | 300 | 3 days | 0

* CME = Continuing Medical Education.

Table 2. Patients' established chart diagnosis and region of ultrasound examination

Patient | Joint area | Diagnosis
Patient group 1
2 | Knee | Undifferentiated SpA*
3 | Shoulder | Rotator cuff tear
4 | Ankle | Rheumatoid arthritis
5 | Wrist | Psoriatic arthritis
6 | Elbow | Undifferentiated SpA*
7 | Wrist | Rheumatoid arthritis
Patient group 2
9 | Shoulder | Rotator cuff tear
10 | Knee | Ankylosing spondylitis
11 | Ankle | Undifferentiated SpA*
14 | Shoulder | Rotator cuff tear
15 | Elbow | Rheumatoid arthritis
16 | Wrist | Rheumatoid arthritis

* Undifferentiated spondylarthritis (SpA) is defined as inflammatory spine pain or asymmetric synovitis with enthesopathy or asymmetric sacroiliitis, without psoriasis, inflammatory bowel disease, or serologies of rheumatoid arthritis.

Patient groups.

Sixteen patients were recruited from the rheumatology clinic at a tertiary care hospital, based on having a firm diagnosis of a rheumatic disease (rheumatoid arthritis [RA], seronegative arthritis, gout, osteoarthritis, and rotator cuff syndrome) that had caused a longstanding structural abnormality such as bony erosion, tendon tear, tendinopathy, osteophyte, or tophus. Patients with lesions that might resolve prior to the date of the study, such as isolated synovitis, were excluded. The rheumatic disease diagnoses were obtained from chart review and were consistent with established diagnostic/classification criteria and/or magnetic resonance imaging studies (Table 2). The study was approved by the Institutional Review Board.

Recruited patients had abnormalities involving at least one of the following 6 regions: hands, wrists, elbows, shoulders, knees, and ankles. Both patient groups included the same rheumatic disease diagnoses and at least one of each of the 6 joint regions studied. Therefore, each rheumatologist assessed the full complement of peripheral joint regions.

US equipment.

In group 1, the US equipment included 4 SonoSite MicroMax with HFL38 13–6 MHz linear array transducers (SonoSite, Bothell, WA) and 3 GE Logiq e and 1 GE Logiq P5 with L12 13–5 MHz linear array transducers (GE Healthcare, Milwaukee, WI). In group 2, the US equipment included 3 Philips iU22 with L9-3 9–3 MHz linear array transducers (Philips, Andover, MA), 4 Biosound Esaote MyLab25 with LA435 18–6 MHz linear array transducers (Biosound Esaote, Indianapolis, IN), and 1 SonoSite Titan with L38 10–5 MHz linear array transducer (SonoSite).

US scanning procedure and documentation.

Each evaluator was permitted a maximum of 10 minutes per patient. Only the joint area to be scanned was exposed to view, and discussion of disease symptoms with the patient was not allowed. The patients did not have any findings that would have made the diagnosis obvious from casual inspection of the affected joint such as typical RA deformities, visible tophaceous nodules, or psoriatic skin lesions. Findings were recorded on a standardized form on which dichotomous answers of present or absent for the following potential findings were indicated: joint effusions, synovial hypertrophy, joint erosion, Doppler signal, cartilage calcification, enthesopathy, tendon calcification, tenosynovitis, tendinosis, osteophyte, bursa effusion, tophus, and ganglion cyst. Tendon tears could be recorded as complete, partial, or absent. A final overall diagnosis question was open ended (not chosen from a list).

Statistical analysis.

Final diagnoses were open ended and analyzed by the generalized estimating equation statistic. The overall disease-specific diagnosis (e.g., RA, gout, etc.) and disease-nonspecific diagnosis (e.g., inflammatory arthritis, noninflammatory arthritis, crystalline arthritis) recorded by the sonographers were compared with the disease diagnosis established in the medical record (Table 3) to determine diagnostic accuracy.

Table 3. Comparison of ultrasound-based diagnostic ability between more and less experienced rheumatologists (each patient was scanned by 4 more experienced and 4 less experienced rheumatologists)

Type of rheumatic condition | Less experienced (250–1,500 studies, 2–4 years), correct/total (%)* | More experienced (5,000–50,000 studies, 6–23 years), correct/total (%)* | Difference, odds ratio (P)
1. nonspecific inflammatory condition | 25/36 (69) | 26/36 (72) | 0.98 (0.81)
1a. rheumatoid arthritis | 7/16 (44) | 6/16 (38) | 1.06 (0.73)
1b. seronegative arthritis | 3/20 (15) | 4/20 (20) | 0.96 (0.67)
2. nonspecific noninflammatory condition | 16/20 (80) | 15/20 (75) | 1.06 (0.68)
2a. osteoarthritis | 4/8 (50) | 7/8 (88) | 0.69 (0.08)
2b. rotator cuff disease | 11/12 (92) | 10/12 (83) | 1.09 (0.64)
3. chronic gout | 6/8 (75) | 6/8 (75) | 1.00 (1.00)
Total (1 + 2 + 3) | 47/64 (73) | 47/64 (73) | 1.00 (1.00)

* Number of case-observer interactions, where the denominator is the number of times a particular diagnosis was evaluated (number of diagnoses × number of examiners for diagnosis).
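The totals in Table 3 can be checked with a few lines of arithmetic. The sketch below uses only the counts printed in the table. The helper names `percent` and `totals` are illustrative, and the round-half-up rule is an assumption about how the percentages were rounded.

```python
# (less experienced, more experienced) correct/total counts copied from the
# three top-level rows of Table 3.
table3 = {
    "nonspecific inflammatory condition": ((25, 36), (26, 36)),
    "nonspecific noninflammatory condition": ((16, 20), (15, 20)),
    "chronic gout": ((6, 8), (6, 8)),
}


def percent(correct, total):
    """Whole-number percentage, rounded half up with integer arithmetic."""
    return (200 * correct + total) // (2 * total)


def totals(group_index):
    """Sum correct and total counts for one group (0 = less, 1 = more experienced)."""
    correct = sum(row[group_index][0] for row in table3.values())
    total = sum(row[group_index][1] for row in table3.values())
    return correct, total


less = totals(0)
more = totals(1)

# 16 patients, each scanned by 4 examiners from each experience level.
interactions = 16 * 4
```

Both groups sum to 47 correct out of 64 case-observer interactions, which matches the 73% reported in the Total row.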

To determine sensitivity and specificity for the less experienced group, normal and abnormal findings with 100% agreement among all of the more experienced examiners were designated as gold standard findings.
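The gold standard construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names, the dictionary layout, and the toy ratings are all hypothetical.

```python
def gold_standard(expert_ratings):
    """Keep only findings on which every expert gave the same answer.

    expert_ratings maps a finding id to a list of booleans, one per
    more experienced examiner (True = present, False = absent).
    """
    return {
        finding: ratings[0]
        for finding, ratings in expert_ratings.items()
        if ratings and all(r == ratings[0] for r in ratings)
    }


def sensitivity_specificity(examiner_answers, gold):
    """Score one examiner against the unanimous expert findings."""
    tp = fn = tn = fp = 0
    for finding, truth in gold.items():
        if finding not in examiner_answers:
            continue  # the examiner did not rate this finding
        answer = examiner_answers[finding]
        if truth and answer:
            tp += 1
        elif truth and not answer:
            fn += 1
        elif not truth and not answer:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical toy ratings from 4 more experienced examiners.
experts = {
    "p1_effusion": [True, True, True, True],
    "p1_erosion": [True, False, True, True],  # not unanimous, so excluded
    "p1_tophus": [False, False, False, False],
    "p2_effusion": [True, True, True, True],
    "p2_osteophyte": [False, False, False, False],
}
# Hypothetical answers from one less experienced examiner.
examiner = {
    "p1_effusion": True,
    "p1_erosion": True,
    "p1_tophus": False,
    "p2_effusion": False,
    "p2_osteophyte": True,
}

gold = gold_standard(experts)
sens, spec = sensitivity_specificity(examiner, gold)
```

Findings without unanimous expert agreement drop out of the comparison entirely, so an examiner's sensitivity and specificity are computed only over the findings that the experts agreed on.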

Structural abnormalities were dichotomized as present/absent and evaluated by the multirater inter-kappa. Interrater reliability was calculated for the less experienced and more experienced groups. The kappa statistic was interpreted as follows: <0.00 = poor agreement, 0.00–0.20 = slight agreement, 0.21–0.40 = fair agreement, 0.41–0.60 = moderate agreement, 0.61–0.80 = substantial agreement, and 0.81–1.00 = almost perfect agreement. Differences were considered statistically significant if P values were less than 0.05.
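A multirater kappa and the interpretation bands listed above can be sketched as follows. Fleiss' kappa is used here as one common multirater statistic. Whether it matches the exact statistic used in the study is an assumption, and the function names and toy counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts has one row per rated item; each row gives how many raters
    chose each category (here: [present, absent]). Every row must sum
    to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    # Proportion of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands given in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy data: 4 raters, [present, absent] counts for 4 findings.
perfect = [[4, 0], [0, 4], [4, 0], [0, 4]]
split = [[2, 2], [2, 2], [2, 2]]
```

Applied to the values reported in this study, `interpret_kappa(0.43)` falls in the moderate band and `interpret_kappa(0.34)` in the fair band.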


Less experienced examiners were similar to more experienced examiners at interpreting US findings. The diagnostic accuracy was not statistically different between less experienced and more experienced examiners for any of the rheumatic disease categories or overall (Table 3). Despite clinical information being limited to a single joint region US, both less experienced and more experienced examiners could distinguish among inflammatory, noninflammatory, and crystal-induced arthritis 73% of the time. Neither group could reliably distinguish rheumatoid from seronegative forms of arthritis based on US characteristics of the affected joint (Table 3). Gouty tophus was correctly identified on 12 of 16 attempts, with 4 of 16 mistakenly identifying the structure as a rheumatoid nodule (less experienced and more experienced groups combined). Rotator cuff tear was correctly identified on 21 of 24 attempts (less experienced and more experienced groups combined). Two of the patients had both subacromial bursa and glenohumeral effusion detected by all of the participants in the more experienced group. Of the less experienced group, 5 of 8 detected the subacromial bursa effusion and 6 of 8 detected the glenohumeral effusion. There was more difficulty with establishing the diagnosis of osteoarthritis in the less experienced group compared with the more experienced group (Figure 1), but the difference was not statistically significant.

Figure 1.

Ultrasound with discrepant results. For the above scan of the second proximal interphalangeal joint, 2 of 4 more experienced and 2 of 4 less experienced practitioners reported effusion (arrows), 1 of 4 more experienced and 1 of 4 less experienced practitioners reported synovial proliferation, 2 of 4 more experienced and 1 of 4 less experienced practitioners reported calcification (arrow heads), and 1 of 4 more experienced and 3 of 4 less experienced practitioners reported joint erosion. 1 = distal end of proximal phalanx; 2 = proximal end of middle phalanx.

The sensitivity and specificity of the less experienced examiners, measured against the gold standard set by the more experienced examiners, did not vary with US experience within the range tested. Specifically, examiner sensitivity and specificity did not correlate with the number of US examinations performed (R² = 0.02 and 0.08, respectively; P = not significant [NS]) (Figure 2 and Table 4) when adjusted for years of experience and number of MSUS CME hours of training. When no adjustments were made, there was also no correlation of sensitivity or specificity with number of studies (R² = 0.11 and 0.14, respectively; P = NS), with number of years of MSUS training (R² = 0.29 and 0.10, respectively; P = NS), or with number of CME hours of training (R² = 0.43 and 0.02, respectively; P = NS).

Figure 2.

Regression model for correlation of examiner specificity and sensitivity with number of musculoskeletal ultrasound (MSUS) studies performed, adjusted for years of MSUS training and number of MSUS Continuing Medical Education hours.

Table 4. Comparison between years of ultrasound experience or number of musculoskeletal ultrasound studies performed by less experienced examiners and the examiner's sensitivity and specificity in detecting ultrasound abnormalities

Less experienced | Years | Studies | Sensitivity, % | Specificity, %

Interrater reliability for tissue pathology was slightly higher for the more experienced examiners compared with the less experienced examiners (κ = 0.43 versus κ = 0.34; P = 0.001), with moderate agreement for the more experienced group and fair agreement for the less experienced group. There was no significant difference in overall reliability between the more experienced examiners in group 1 and the more experienced examiners in group 2 (κ = 0.41 versus κ = 0.41; P = NS), but there was a difference between the less experienced examiners in group 1 and the less experienced examiners in group 2 (κ = 0.39 versus κ = 0.28; P = 0.002).

Complete agreement among the more experienced group about the presence of tissue pathology was achieved for 11 joints with synovial proliferation, 10 with synovial effusion, and 6 with articular erosion. Tendinopathy was agreed on 5 times, calcifications 4 times, cortical irregularity 4 times, and tendon tears 2 times. There were no cases of complete agreement about the presence of osteophyte, tophus, or Doppler signal.


Although MSUS training, like medical training in general, is a lifelong process, our results suggest that rheumatologists in the United States, who are mostly self-trained in MSUS, can use US to diagnose common rheumatic conditions with accuracy similar to that of international MSUS experts. Diagnostic accuracy for basic MSUS pathology, as demonstrated by the less experienced group in this study, does not improve substantially beyond a certain level of experience. The absence of correlation between US experience and the sensitivity and specificity of the less experienced examiners (measured against the gold standard of the more experienced examiners) could be due to a ceiling effect for basic US learning that is reached by 250 US scans. Questions remain about the number of studies needed to reach this ceiling effect of training. Differences in diagnostic accuracy between the 2 groups would likely have emerged if less common pathology had been assessed. However, for basic diagnostic competence, the groups were equal.

The lack of magnetic resonance imaging testing as a gold standard prior to the exercise prevents us from establishing testing characteristics of sensitivity and specificity for US assessment of isolated joint and tendon abnormalities such as joint erosions or tendon thickening by the more experienced group, but we were able to evaluate sensitivity and specificity for the less experienced group using areas of agreement by the more experienced group as a gold standard for comparison. Our study was limited to typical anatomy and pathology encountered in rheumatology practice and did not include imaging of tissues less frequently imaged by rheumatologists such as skin, fascia, and nerves. Furthermore, practical limitations narrowed our investigation: we did not assess any hip pathology, we did not assess US-guided needle placement abilities, and diagnostic evaluation was limited to a subsection of potential pathologic findings at each joint. We did, however, include a variety of common pathology: 10 joints with synovial effusion, 11 with synovial proliferation, 6 with articular erosion, 5 with tendinopathy, 4 with calcification, 4 with cortical irregularity, and 2 with tendon tears.

The slightly higher reliability demonstrated by the more experienced group compared with the less experienced group could be explained by either greater MSUS experience or greater experience with other members of the more experienced group. Unlike the less experienced (the US) group, many of the more experienced (non-American) rheumatologists had extensive previous collaborative US experience in prior studies. This may have increased their reliability values relative to the Americans. However, the reliability difference observed between less experienced groups 1 versus 2, in contrast to more experienced groups 1 and 2, suggests that the observed reliability difference is, at least in part, due to examiner MSUS proficiency.

The overall interrater reliability achieved in this study was lower than that seen in the Teach the Teachers (7) or Train the Trainers (6) exercises. This might be explained by the variability in US equipment used (6 different machines) and variable examiner experience with the US equipment. The interrater reliability values achieved in the Teach the Teachers (7) exercise were lower than those in the Train the Trainers (6) exercise, perhaps for the same reason: only 1 US machine was used in Train the Trainers, but 3 different machines were used in Teach the Teachers. In actual medical practice, US result variability depends both on the examiner and on the US equipment used. Therefore, the interrater reliability achieved in our study is likely a good estimate of the variability that exists in practice.

The examiners' inability to differentiate RA from seronegative arthritis with US is difficult to interpret due to the relatively few patients with these diagnoses. Furthermore, this exercise was designed to isolate the US features of disease from the context of the patients' history and physical examination findings, making it difficult to establish an overall disease diagnosis. Despite these limitations, it is encouraging that both less experienced and more experienced examiners were usually able to distinguish noninflammatory, inflammatory, and crystal-related arthropathies using MSUS as a sole diagnostic tool.

Although formal, direct supervision of MSUS examination by an expert is generally accepted as the best method of training, non-mentored learning can also result in the achievement of MSUS competency. Filippucci et al showed that a fundamental competency in MSUS can be achieved in 6 months through Web-based learning (16). Their results demonstrate that the greatest predictor of achievement of competence is student motivation: among students who passed the final examination, 78% had made the effort to submit US images for review, whereas only 15% of those who failed the examination had submitted images. Similarly, our study shows that mentored training is not a necessary condition for the achievement of MSUS competency, as long as the learner is highly motivated. The current guidelines for MSUS training proposed by the American Institute of Ultrasound in Medicine (online at: specifically require supervised MSUS examinations. With few available and willing MSUS tutors in the US, such requirements could stifle the development of MSUS learning by local rheumatologists. A credentialing process that is outcomes/competency based would circumvent this problem. Such a process could require achieving an MSUS skill set meeting specific, predefined standards (17), rather than time/process-based requirements that would vary greatly based on the talent and motivation of the trainee as well as of the trainer, and the quality of the equipment used. Future studies should aim to establish the minimal MSUS training, with and without formal mentoring, required for most learners to achieve a high degree of diagnostic accuracy.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Kissin had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Kissin, Nishio, Backhaus, Bruyn, Iagnocco, Swen, Wells, Kaeley.

Acquisition of data. Kissin, Nishio, Backhaus, Balint, Bruyn, Craig-Muller, D'Agostino, Feoktistov, Goyal, Iagnocco, Ike, Moller, Naredo, Pineda, Schmidt, Tabechian, Wakefield, Wells, Kaeley.

Analysis and interpretation of data. Kissin, Yang, Backhaus, Balint, Bruyn, Iagnocco, Wells.


We thank the vascular medicine laboratory of Boston University for lending us space, US equipment, and technical expertise. We are also grateful to Biosound Esaote, General Electric, and SonoSite for supplying US equipment and technical expertise for this study.