Supplementary Figure 1. Toy example illustrating how miR expression data are converted to a binary representation. First, the raw expression values for each miR are converted to a vector of Z-scores for that miR (i.e., the number of standard deviations from the mean, across samples). The Z-score vector is then converted to a binary vector, in which a value of “1” means the miR was over-expressed in that sample and a value of “0” means the miR was not over-expressed in that sample.
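
The conversion described in this legend can be sketched in a few lines of Python. This is a minimal illustration only: the function name `binarize_expression`, the toy expression values, the use of the sample standard deviation, and in particular the cutoff `z_threshold=1.0` are assumptions made for the example, since the legend does not state the Z-score cutoff that was used.

```python
from statistics import mean, stdev


def binarize_expression(values, z_threshold=1.0):
    """Convert one miR's raw expression values (one per sample) to 0/1.

    z_threshold is a hypothetical cutoff chosen for illustration; the
    legend does not specify the value actually used.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    if sigma == 0:
        # No variation across samples: nothing can be called over-expressed.
        return [0] * len(values)
    # Z-score: number of standard deviations from the mean, across samples.
    z_scores = [(v - mu) / sigma for v in values]
    # 1 = over-expressed in that sample, 0 = not over-expressed.
    return [1 if z > z_threshold else 0 for z in z_scores]


# Toy expression values for a single miR across six samples (made up).
toy_expression = [2.0, 2.5, 3.0, 2.2, 9.0, 2.8]
toy_binary = binarize_expression(toy_expression)
print(toy_binary)
```

With these made-up values, only the fifth sample lies more than one standard deviation above the mean, so it is the only sample coded as “1”.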

Supplementary Figure 2. Toy example illustrating how the interaction significance is determined for a binary expression vector vs. the phenotype vector. Panel A shows the binary, multi-miR expression feature from Supplementary Table 2 and the phenotype vector defining control (CTRL) and CCA samples. Panel B shows the two-by-two contingency table for the vectors in A and the corresponding two-tailed Fisher's exact test P-value. In B, Expression+ means overexpressed and Expressiono means not overexpressed. After all interaction P-values are calculated, they are corrected for multiple testing using the Benjamini-Hochberg false discovery rate procedure.
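
As a rough sketch of the procedure in this legend, the Python below builds the two-by-two table from a binary feature and a phenotype vector, computes a two-tailed Fisher's exact P-value, and applies the Benjamini-Hochberg adjustment to a list of P-values. All function names and the toy vectors are hypothetical. The two-tailed P-value is computed here by summing the probabilities of all tables that are no more likely than the observed one, which is a common convention and is assumed rather than taken from the legend.

```python
from math import comb


def contingency_table(feature, phenotype, case_label="CCA"):
    """Return (a, b, c, d): Expression+/case, Expression+/control,
    not over-expressed/case, not over-expressed/control."""
    a = b = c = d = 0
    for bit, label in zip(feature, phenotype):
        is_case = label == case_label
        if bit and is_case:
            a += 1
        elif bit:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than observed.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy vectors (made up): one binary feature and one phenotype per sample.
toy_feature = [1, 1, 1, 0, 1, 0, 0, 0]
toy_phenotype = ["CCA", "CCA", "CCA", "CCA", "CTRL", "CTRL", "CTRL", "CTRL"]
toy_table = contingency_table(toy_feature, toy_phenotype)
toy_p = fisher_exact_two_tailed(*toy_table)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(toy_table, round(toy_p, 4), toy_adjusted)
```

In a real analysis, one P-value would be computed per candidate feature and the whole list passed to the adjustment step at once, because the correction depends on how many tests were run.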

Supplementary Figure 3. RNA integrity and qRT-PCR values illustrating RNA degradation in whole bile specimens. Panel A shows gel electrophoresis of RNA extracted under the following conditions: whole bile processed immediately (Lane 1, WB), bile supernatant (centrifugation at 2,000 × g for 5 min, Lane 2, S), whole bile kept at RT for 1 h (Lane 3, WB RT 1h), cell pellet kept at RT for 3 h (Lane 4, CP RT 3h), whole bile kept at RT for 24 h (Lane 5, WB RT 24h), whole bile after 1 freeze-thaw cycle (Lane 6, WB FT 1 cycle) and a cell line RNA (Lane 7, Cell line RNA). These data demonstrate that fresh whole bile displays 18S and 28S bands, while the supernatant does not, suggesting that the longer RNA species are contributed by the free-floating cells in bile. Keeping whole bile at RT for 1 h weakens the 18S and 28S bands, which is suggestive of RNA degradation, probably induced by cell lysis. Storing the cell pellet at RT for 3 h also results in RNA degradation. Storing whole bile at RT for 24 h results in complete disappearance of the 18S and 28S bands, as does 1 freeze-thaw cycle of whole bile. Panel B demonstrates that the measured quantity of miR-21 decreases with a single freeze-thaw cycle and with whole bile storage at room temperature (by 50% after incubation at RT for 1 h and by 300-fold after incubation at RT for 24 h). The x-axis shows the bile handling conditions (freeze-thaw, and incubation of whole bile at RT for 1 h and 24 h, respectively). The y-axis shows the measured quantity of miR-21.

Supplementary Figure 4. PKH67 staining of biliary extracellular vesicles (EVs). EVs were extracted from a bile specimen (Panel A) and from a control specimen (PBS, Panel B). After staining with PKH67, EVs were placed in culture with human cells. PKH67-stained EVs from the bile specimen can be identified by fluorescence microscopy inside the human cells that took them up. Of note, there were no EVs in the control arm (Panel B). In both Panels A and B, from left to right, we show a light microscopy image, a fluorescence image and an overlay.

Supplementary Table 1. Detailed diagnostic information for control patients. We made every effort to include patients with PSC, choledocholithiasis and other conditions causing obstruction of the biliary tree that could mimic CCA.

Supplementary Table 2. Toy example illustrating how miR binary vectors are combined using the union Boolean set operation. Binarized miR expression values can be combined into a single multi-miR feature. Here, the three-miR feature in row 4 is a feature representing the overexpression of miRx or miRy or miRz.
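
The union operation in this table amounts to an element-wise logical OR across the binarized vectors. The short Python sketch below illustrates it; the function name `union_feature` and the toy vectors for miRx, miRy and miRz are hypothetical and are not the values shown in the table.

```python
def union_feature(*binary_vectors):
    """Combine binarized miR vectors into one multi-miR feature.

    A sample is coded 1 if at least one of the miRs is over-expressed
    in that sample (Boolean union), and 0 otherwise.
    """
    lengths = {len(v) for v in binary_vectors}
    if len(lengths) != 1:
        raise ValueError("all vectors must cover the same samples")
    # zip(*...) walks the vectors sample by sample.
    return [1 if any(column) else 0 for column in zip(*binary_vectors)]


# Toy binarized vectors across five samples (made up for illustration).
mir_x = [1, 0, 0, 0, 1]
mir_y = [0, 0, 1, 0, 1]
mir_z = [0, 0, 0, 0, 0]

three_mir_feature = union_feature(mir_x, mir_y, mir_z)
print(three_mir_feature)
```

Here the combined feature is “1” in the first, third and fifth samples, because at least one of the three miRs is over-expressed in each of them.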

Supplementary Table 3. CCA diagnosis established by the 5-miR panel or CA19-9 in PSC patients. Among the patients included in our study, there were 4 patients with a PSC diagnosis who were subsequently diagnosed with CCA. Note that one PSC patient with CCA (CCA 38) was not diagnosed by CA19-9 but was correctly diagnosed by the 5-miR panel.

Supplementary Table 4. List of CCA patients diagnosed exclusively by the 5-miR panel or by CA19-9. Eleven CCA patients were diagnosed by the 5-miR panel but not by CA19-9. Of note, 8 of these 11 (73%) were patients without lymph node or distant metastases. These patients could be evaluated for surgical resection. In contrast, only 2 of the 6 patients diagnosed exclusively by CA19-9 were without lymph node or distant metastases. In addition, the 2 early cancers in our cohort (T1N0M0) were missed by CA19-9 but diagnosed correctly by the 5-miR panel.

