Nipple aspirate fluid—A liquid biopsy for diagnosing breast health

Purpose: Nipple secretions are protein-rich and a potential source of biomarkers for breast cancer screening. Previous studies of specific proteins have shown limited correlation with clinicopathological features. Our aim, in this pilot study, was to investigate the intra- and interpatient protein composition of nipple secretions and the implications for their use as liquid biopsies.

Experimental design: Matched pairs of nipple discharge/nipple aspirate fluid (NAF, n = 15) were characterized for physicochemical properties and by SDS-PAGE. Four pairs were selected for semiquantitative proteomic profiling, and trypsin-digested peptides were analyzed using 2D-LC Orbitrap Fusion MS. The resulting data were subject to bioinformatics analysis and statistical evaluation for functional significance.

Results: A total of 1990 unique proteins were identified, many of which are established cancer-associated markers. Matched pairs shared the greatest similarity (average Pearson correlation coefficient of 0.94), but significant variations between individuals were observed.

Conclusions and clinical relevance: This was the most complete proteomic study of nipple discharge/nipple aspirate fluid to date, providing a valuable source for biomarker discovery. The high level of milk proteins in healthy volunteer samples compared to the cancer patients was associated with galactorrhoea. Using matched pairs increased confidence in patient-specific protein levels, but changes relating to cancer stage require investigation of a larger cohort.


Clinical Relevance
There remains an unmet need to provide high risk premenopausal women with a regular and convenient alternative to mammography for breast cancer screening (mammography has reduced accuracy, primarily due to breast density, and exposes patients to further radiation). Tissue biopsies provide valuable diagnostic and prognostic information to support selection of treatments once tumors have been detected, and genomics has identified high penetrance genes to indicate those women at highest risk, but neither approach helps to detect the earliest manifestations of the disease. Detection of cancer biomarkers in blood receives much attention, but suffers from massive dilution in circulation compared to the diseased tissue of origin. Nipple discharge and nipple aspirate fluid are naturally occurring liquids secreted by the ducts and lobules, and hence have the potential to provide important diagnostic information regarding breast health. In this pilot study, we used proteomics to analyze paired NAF samples, to identify the protein profiles of volunteers and patients.
The results indicate that matched pairs have similar protein composition but there are significant differences between individuals. The data can be diagnostic of breast health; however, a longitudinal patient study is required to establish protein changes that relate to cancer stage.

Introduction
Although the number of women aged 40 or less diagnosed with breast cancer is relatively low, they experience more aggressive forms of the disease with poorer clinical outcome [1,2]. They are often at high risk due to a genetic predisposition toward the disease, of which mutations in the breast cancer type 1 susceptibility protein (BRCA1) and breast cancer type 2 susceptibility protein (BRCA2) tumor suppressor genes are the best characterized [3]. Awareness of these variants, by germ line genetic testing, informs the patient of the lifetime risk of susceptibility to the disease compared to the general population, but it does not tell the patient when the disease will occur [4].
Mammography successfully detects breast cancer in postmenopausal women (98% sensitivity), but is less effective in younger women due to image obfuscation by breast density [5,6]. Mammography also lacks the specificity needed to differentiate between benign and malignant growths, and between microcalcifications associated with low grade ductal carcinoma in situ (DCIS), which may not require surgery, and higher grade DCIS that may progress to an invasive tumor [7], thereby resulting in overdiagnosis and overtreatment [8]. Hence, the discovery of a mutation may result in elective surgery to remove both breasts or prophylactic administration of tamoxifen, with consequential side effects before the disease has occurred [9]. Therefore, new methods for the early detection of breast cancer are required to support high risk younger women. The search for diagnostic biomarkers of breast cancer has been extensive, and proteomics strategies are increasingly employed as part of the discovery process [10]. Plasma is by far the most common biofluid used, but putative markers are massively diluted relative to the site of origin of the cancer, thereby reducing sensitivity [11]. As an alternative, we have chosen to analyze secretions from the cells lining the ducts and lobules of the breast that manifest as a spontaneous nipple discharge or nipple aspirate fluid (herein collectively referred to as NAF), collected by massage or breast pump, thereby differentiating them from liquid biopsies obtained by lavage or needle extraction.
NAF comprises a diverse range of biological materials such as micronutrients (tocopherols, cholesterols, carotenes) [12], hormones (estradiol, estrone, progesterone, and testosterone) [13], carbohydrate antigens (Thomsen Friedenreich and Tn) [14], microRNA [15], and microbes [16] as well as proteins. It has multiple advantages as a liquid biopsy for detection of breast cancer: (i) premenopausal women are more likely to produce NAF than postmenopausal women where ductal atrophy may prevail [17], (ii) NAF expression is noninvasive, causing minimal discomfort compared to breast cancer screening procedures [18], (iii) it enables procurement of matched pairs of samples to provide an intraindividual comparison of the diseased with the healthy breast, (iv) biomarkers remain highly concentrated for analysis compared to blood and urine, and (v) minimal sample preparation is required, compared to tissues, thereby excluding yield-reducing protein extraction steps. NAF collection can be challenging, often using microcapillaries, but recently Guthrie cards were employed, though subsequent proteomic analysis only identified high abundance proteins [19]. NAF volumes are small, but protein concentrations are sufficient to enable analyses with state-of-the-art mass spectrometric techniques. A number of strategies to determine the NAF proteome coverage have been undertaken, many of which have been summarized by Pavlou et al. as part of a comparison with their own dataset of 854 proteins [20]. More recently, similar studies by Brunoro et al. [21] and Kurono et al. [22] identified 557 and 372 proteins, respectively. Most of these studies have focused on optimizing protein and peptide separation using single breast samples from each patient.
Our objective in this paper is to deal with the fundamental definitions of NAF composition in paired samples and determine if the proteins present constitute biologically and physiologically relevant information for diagnosing breast health.

Patients and sample collection
NAF samples were collected from breast cancer-free (defined here as healthy) volunteers and breast cancer patients, who presented to Bradford Teaching Hospitals NHS Trust, in a prospective study between 2013 and 2016. All participants gave written informed consent to undergo bilateral nipple aspiration. The study protocol was approved by University of Bradford's Independent Scientific Advisory Committee (reference: application/13/051). Ethical approval was given by Leeds (East) Research Ethics Committee, reference 07/H1306/98+5. Before aspiration was attempted, the nipple was initially cleansed with an alcohol pad. NAF collection from cancer patients was performed under general anesthetic by the clinical team, prior to surgery, assisted by massaging the breast and the drop of liquid collected from the nipple surface using a sterile pipette. After collection, the samples were transferred to chilled, prelabeled tubes containing a freeze-dried protease inhibitor cocktail mixture (Roche Diagnostics, Burgess Hill, UK), and frozen within 30 min of collection. NAF from healthy volunteers (HVs) was collected in a similar manner but by the individuals themselves. Where possible, NAF samples were collected from both breasts. From a bank of 112 NAF samples (comprising 55 pairs and 57 single samples), 15 pairs were selected for study and characterized for volume and color prior to further analysis (Supporting Information Table 1). Samples were centrifuged to remove particulate matter, the protein concentration measured using the Bradford assay and paired aliquots analyzed by SDS-PAGE.

Proteomic analysis
An aliquot of each NAF sample was reduced, alkylated, and digested overnight using modified sequencing grade trypsin (see Supporting Information Materials and Methods). Digests were desalted, lyophilized, and then resuspended in 10 mM KH2PO4 in 25% v/v acetonitrile, 0.01% w/v sodium azide, pH 3.0. The digests were subject to strong cation exchange chromatography (SCX) with peptides fractionated using stepwise increases in potassium chloride concentration. The SCX fractions were desalted and lyophilized.

Fusion orbitrap analysis
The lyophilized SCX fractions were resuspended in 0.1% FA and analyzed in triplicate on a nano-LC UltiMate 3000 capillary HPLC system coupled to an Orbitrap Fusion™ Tribrid™ Mass Spectrometer (see Supporting Information Materials and Methods). Samples were applied to a C18, 300 µm × 5 mm, 5 µm diameter, 100 Å PepMap precolumn before transfer to a C18, 75 µm × 50 cm, 2 µm diameter, 100 Å PepMap column. A binary solvent system was used for chromatographic separations, composed of 0.1% FA in 2% acetonitrile and 0.1% FA in 100% acetonitrile. Data-dependent acquisition using dynamic scan management was performed, generating full MS spectra in the Orbitrap and MS/MS acquisition in the ion trap.

MS/MS fragment mass lists were searched using Proteome Discoverer version 2.1 and Mascot software version 2.4 (see Supporting Information Materials and Methods). Only Master Proteins (i.e., containing at least one unique peptide) were accepted. Protein quantitation was defined as the sum of the peak areas of the three strongest parent signals. Quantitation was normalized for cross-sample comparison. The Pearson correlation coefficient was calculated to determine gross similarities of paired samples, Student t-tests were used to identify proteins present at significantly different levels (p < 0.05), and FunRich 2.1.2 was used to compare proteomes and identify common proteins. Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.8 was used for functional annotation [23], STRING version 10.0 for protein-protein interaction analysis [24], TMHMM Server v. 2.0 for membrane association, the Plasma Proteome Database for comparison with the NAF proteome, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to identify metabolic pathways, and the Early Detection Research Network (EDRN) to identify breast cancer biomarkers.
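The quantitation rule described above (sum of the three strongest parent signals per protein, followed by normalization) can be illustrated with a minimal sketch. The peak areas and protein names below are hypothetical, and normalization to the sample median is an assumption made for illustration (it mirrors the ratio-to-median definition in the Figure 3 caption); the text does not state which normalization was applied at this step.

```python
from statistics import median

def top3_quant(peak_areas):
    """Sum of the three largest parent-ion peak areas for one protein."""
    return sum(sorted(peak_areas, reverse=True)[:3])

def median_normalize(protein_quant):
    """Divide each protein value by the sample median (assumed normalization)."""
    m = median(protein_quant.values())
    return {name: value / m for name, value in protein_quant.items()}

# Hypothetical peptide peak areas (arbitrary units) for three proteins in one sample.
sample = {
    "protein_A": [9.0e6, 4.0e6, 2.0e6, 1.0e6],  # four peptides: only the top three count
    "protein_B": [3.0e6, 2.0e6],                # fewer than three peptides: sum what exists
    "protein_C": [6.0e5, 3.0e5, 1.0e5],
}

quant = {name: top3_quant(areas) for name, areas in sample.items()}
normalized = median_normalize(quant)
```

Expressing each protein relative to the sample median makes values comparable between samples that differ in total protein load, which matters here because NAF volumes and concentrations vary widely.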

NAF sample characterization
Our preliminary objective was to characterize matched pairs of NAF using basic biochemical procedures (measuring the protein concentration and amount, and visualization by SDS-PAGE), which has not been reported previously. NAF samples from 100 breast cancer patients and HVs were grouped into four clinical stages: invasive carcinoma (IC), DCIS, benign lesions, and healthy. From these, 15 pairs (two noncancer, two benign, one DCIS, and 10 ICs) were selected and characterized for volume (varying from 4 to 500 µL) and protein concentration (3-70 mg/mL) (Supporting Information Table 1). Samples were analyzed by SDS-PAGE, demonstrating that, in the majority of cases, pairs from the same individual had similar profiles (Fig. 1A, Supporting Information Fig. 1 A-C). Some cases exhibited a dominant serum albumin band, suggesting a high plasma content (Fig. 1A, Case 2), while others had a relatively low albumin presence (Fig. 1A, Cases 1, 3, and 4). Based on sample color, there was no indication of blood in the former group and no apparent correlation with disease compared to healthy; hence the albumin content was not indicative of tissue damage or tumor invasiveness. Four matched pairs, a HV (Case 4, HV), a patient with benign phyllodes tumor (Case 9, PB), a patient with DCIS (Case 10, PD), and a patient with IC (Case 12, PI) (Supporting Information Table 1), were selected for proteomic analysis, based on similar protein concentration and protein quantity (to minimize sample preparation variation).

Figure 1. (C, D) Venn diagrams illustrating (C) overlapping identities with the data from the NAF study by Pavlou et al. [20], and (D) overlapping identities with the Human Plasma Proteome database.

NAF proteomic analysis
For NAF to be a useful clinical sample, it was important to establish the quality and complexity of proteomic data that can be achieved using 2D-LC/MS separation. From 2D-LC/MS analysis of all eight samples, a total of 1990 gene products were identified (p < 0.05) (Supporting Information Table 2), with an average of 1265 proteins per sample (SD ± 185) (Fig. 1B). Prior to this study, the most complete proteomics profile of NAF was that of Pavlou et al. [20]. Comparison with our dataset, based on gene identity (691 entries for Pavlou et al. and 1919 for our set, excluding immunoglobulin isoforms), indicated an overlap of 563 proteins (Fig. 1C); however, our current study illustrated substantial progress in NAF characterization, identifying 1374 new proteins not previously seen in NAF.
As plasma is by far the most commonly used and most completely characterized liquid biopsy for diagnosing disease, including breast cancer [10], we wanted to establish if the NAF proteome is likely to provide unique insights. The Plasma Proteome Database comprises the collated quantitative data for 10 546 proteins that have been detected in plasma and serum using immunoassays or mass spectrometric techniques (http://www.plasmaproteomedatabase.org/) [25]. A comparison of the NAF profile with the plasma proteome identified 1578 proteins in common (Fig. 1D); however, 332 proteins (21% of the total NAF profile) were unique to NAF, indicating excellent potential to provide molecular information specific to breast health.
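The Venn-style comparisons with the Pavlou et al. dataset and the Plasma Proteome Database reduce to set operations on gene identifiers. The sketch below is illustrative only: the study used FunRich for this step, the helper name `compare_proteomes` is hypothetical, and the two small gene lists are invented to show the mechanics rather than taken from either dataset.

```python
def compare_proteomes(a, b):
    """Return identifiers shared by both sets and those unique to each."""
    a, b = set(a), set(b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Hypothetical gene-symbol lists; membership here is invented for illustration.
naf = {"ALB", "PIP", "SCGB2A2", "KLK6", "EGFR"}
plasma = {"ALB", "EGFR", "APOA1"}

result = compare_proteomes(naf, plasma)
shared_count = len(result["shared"])
naf_only_count = len(result["only_a"])
```

Matching on gene identity, as described in the text, avoids counting isoforms or alternative accession numbers of the same gene product as separate entries.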
Pairwise comparison of the NAF proteome complements exhibited more than 50% likeness in composition (Supporting Information Fig. 2). Profiles for matched pairs from the same individual showed the greatest similarity, with 1017 of a total of 1282 proteins, 1374/1685, 948/1350, and 1082/1382 common for the HV, PB, PD, and PI pairs, respectively (Fig. 2A). When the quantitative data (sum of the peak areas of the three most abundant unique peptides per protein) were included, the bilateral pairs again showed the greatest positive correlation (Pearson correlation coefficient values of 0.92 to 0.99) (Fig. 2B). The sample pair for the HV, however, showed the lowest correlation with the other three cases, suggesting a unique constitution.
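As a worked illustration of the two similarity measures used here, the sketch below converts the reported shared/total counts into fractions and computes a Pearson correlation coefficient from first principles. The shared and total counts are those given in the text; the left and right abundance vectors are hypothetical, and whether the authors correlated raw or transformed abundances is not stated, so raw values are assumed.

```python
from math import sqrt

# Shared / total protein counts for each matched pair, as reported in the text.
pairs = {"HV": (1017, 1282), "PB": (1374, 1685), "PD": (948, 1350), "PI": (1082, 1382)}
shared_fraction = {case: shared / total for case, (shared, total) in pairs.items()}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical normalized abundances of five proteins in the left and right breast.
left = [1.0, 2.0, 3.0, 4.0, 5.0]
right = [1.1, 1.9, 3.2, 3.8, 5.1]
r = pearson(left, right)
```

With the reported counts, each matched pair shares between roughly 70% and 82% of its proteins, which is consistent with the statement that all pairwise comparisons exceed 50% likeness.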

NAF composition
The 567 proteins common to all eight 2D-LC analyses were manually categorized based on location using UniProt (release 1026_07) into (i) intracellular, (ii) extracellular stroma surrounding the breast cells, or (iii) plasma (Fig. 2C). Intracellular proteins were further subdivided into cytoplasmic, membrane-linked (i.e., containing transmembrane domain, GPI or lipid anchors), or organelle-specific location. Of the common proteins, 25% are normally found in plasma, 14% function within the extracellular space, and 61% are from cell components. Of the 346 cellular proteins, 45% are normally found in the cytoplasm, 36% are membrane associated, 7% lysosome, 8% endoplasmic reticulum/Golgi apparatus, 2% mitochondrion, and 2% in the nucleus. The nuclear and mitochondrial proteins were particularly underrepresented, normally contributing 14 and 6% of the total human proteome, respectively [26,27].
Submission of the total NAF proteome to the TMHMM transmembrane protein search engine identified 415 proteins with transmembrane regions, and a further 147 proteins with GPI-anchor, N-terminal, or cys-modified lipid attachments were identified from UniProt. Hence, 27% of the proteins present in NAF are normally found anchored in membranes and consequently provide a valuable source of information regarding cell integrity within breast tissue. Using STRING analysis, 74 of the membrane proteins were linked to cell adhesion (FDR 2.88 × 10^−24), 50 proteins were involved in glycoprotein metabolism (FDR 2.88 × 10^−24), and 86 proteins have receptor activity (FDR 1.04 × 10^−21).
Although some are case specific, among the receptors detected were those associated with cancer signaling pathways, including epidermal growth factor receptor (EGFR), transforming growth factor beta receptor type II and III, hepatocyte growth factor receptor (HGFR), vascular endothelial growth factor receptor (VEGFR1), mast/stem cell growth factor receptor (KIT), and receptor tyrosine-protein kinase erbB-3 (Fig. 2D). Furthermore, 21 mitogenic growth factors were detected, including pleiotrophin, macrophage colony-stimulating factor 1 (CSF1), transforming growth factor beta, and those produced by cancer-associated fibroblasts: proepidermal growth factor (EGF), insulin-like growth factors I (IGF1) and II, and platelet-derived growth factors C (PDGF-C) and D (Table 1). Downstream processes activated by ligand-receptor binding, including protein kinases and protein phosphatases that play a role in MAPK-, NOTCH-, and Wnt-signaling pathways, were also identified.
In addition, cell adhesion proteins were common in NAF including CEACAMs 1, 5, 6, and 8, NCAM2, BCAM, ALCAM, ECAM, MCAM, and ICAM1, 14 cadherin/protocadherin proteins, and five integrin subunits. Basement membrane proteins, extracellular matrix (ECM) proteins, and proteoglycans, including laminins, mucins, collagens, and fibulins, which play an important role in cell-ECM and cell-cell interactions, were prevalent (see Supporting Information Table 2). Laminins form a complex comprising a heterotrimer of alpha, beta, and gamma subunits linked by disulphide bridges, constitute a key component of basement membranes and have an essential role in the structure and function of ECM [28]. Of the five laminin isoforms detected, alpha5, beta2, and gamma1 were the most abundant in all NAF samples, indicating the expression of the specific heterotrimer laminin-11 (or laminin-521) normally found in the glomerular basement membrane of the kidney, in the neuromuscular synaptic cleft and in placenta [29]. Overall NAF samples comprised of a high proportion of proteins functioning in the tumor biomarkers such as urokinase-dependent plasminogen activator (uPA) and plasminogen activator inhibitor (PAI-I), particularly for their role in ECM turnover associated with cancer invasiveness [30]. Our analysis of NAF identified approximately 100 proteolytic enzymes, including uPA, MMP9, and matrilysin, which function to modulate stromal composition. Within this group were ten members of the kallikrein family, including prostate specific antigen (PSA/KLK3) (see Supporting Information Table 2). Previously, an inverse correlation of PSA levels (measured by immunofluorometric assay) in NAF, with progressive breast cancer (DCIS to metastatic), has been described [31,32].
There were six proteins (kallikrein 6, ATP-binding cassette sub-family C member 11, secretoglobin family 3A member 1, mammaglobin-A, prolactin-inducible protein [PIP], and mucin-like protein 1) that are most strongly expressed in breast tissues (compared to all other tissues), and although not cancer specific, may prove useful indicators of breast health. PIP has previously been explored as a NAF-derived biomarker of breast cancer by proteomics and ELISA methods, and expression found to correlate with pre-/postmenopausal status and cancer stage [33].
The National Cancer Institute has coordinated the research of many institutions to accelerate the identification and validation of early stage cancer testing and detection (https://edrn.nci.nih.gov/). Of the 195 breast cancer proteins and genes under investigation by the NCI Early Detection Research Network, 46 are present in NAF, of which 22 were detected in all eight samples and seven were not found in plasma (Table 2). Among the candidate biomarkers detected in NAF were C-C motif chemokine 28, CSF1, EGFR, VEGFR1, VEGFA, ICAM1, KIT, HGFR, MMP9, metalloproteinase inhibitor 1 (TIMP1), osteopontin, and Toll-like receptor 2.
Cytochrome P450 3A4 (CYP3A4) was detected at low abundance in two NAF samples, the right breast of the benign patient and the left breast of the IC patient, and verified by western blotting (Supporting Information Fig. 3). CYP3A4 plays an important role in converting tamoxifen to N-desmethyl-4-hydroxytamoxifen, which has a 30- to 100-fold higher affinity for estrogen receptor than tamoxifen [34]. CYP3A4, measured by immunohistochemistry in normal and cancer breast tissue biopsies, was found to be prognostic for patient response to docetaxel [35,36] and, by activity assay and western blot, to correlate with ifosfamide activation [37]. The presence of CYP3A4 in NAF provides a unique opportunity to screen for patients who are most likely to respond to prophylactic tamoxifen treatment.

Diagnostic application
One of our objectives was to understand whether analyzing matched pairs would provide a more specific approach to detecting disease compared to normal. SDS-PAGE showed that most matched pairs have similar protein band patterns, which was corroborated by the high correlation of proteomics profiles. While some proteins were significantly different in bilateral samples, a more extensive longitudinal study is required to determine statistically valid differences between the diseased and healthy breast of an individual.
Nevertheless, the proteomic profiling of the HV (Case 1) provided important diagnostic feedback relating to the cause of nipple discharge. As already noted, the proteomic profiles from Case 1 exhibited the least correlation with the three disease cases (Fig. 2B). A two-tailed Student t-test of HV, using the average normalized sum of the three strongest peak areas for each protein, from two breast analyses, compared to the equivalent data for the three cancer samples, identified 331 proteins that were present at significantly different levels (p < 0.05) (Supporting Information Table 3). The proteomic signature for Case 1 indicated high levels of milk proteins. Of the 20 most abundant proteins observed by Beck et al., the most complete proteome study of human milk to date, 17 were also present in the top 50 most abundant detected in the NAF samples of Case 1 [38] (Fig. 3). Case 1 presented at the outpatient clinic with a spontaneous milky pus discharge when either nipple was squeezed. The reproductive history of the volunteer, aged 48, indicated that she had achieved parity three times (with the first birth at age 38). Small amounts of milk or serous fluid expression can persist for months or years after weaning, but Case 1 did not engage in breastfeeding. Further investigation of her medical records, however, indicated that she had been prescribed amitriptyline, for depression and stress management, and omeprazole and lansoprazole for gastroesophageal reflux. In rare cases, these may cause breast tissue enlargement and nipple discharge, which is associated with galactorrhea rather than cancer [39].
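The per-protein comparison of the HV with the three patients can be sketched as follows. The exact test configuration is not specified in the text, so this example assumes a one-sample, two-tailed Student t-test in which the three patient values (each the left/right average) are tested against the single HV value; with three values there are two degrees of freedom, for which the two-tailed critical value at p = 0.05 is 4.303. The abundance values and the function name `one_sample_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical t value for alpha = 0.05 with 2 degrees of freedom (n = 3).
T_CRIT_DF2 = 4.303

def one_sample_t(values, reference):
    """t statistic for testing whether the mean of `values` equals `reference`."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - reference) / standard_error

# Hypothetical normalized abundances of one milk protein.
patients = [1.0, 1.2, 0.8]  # PB, PD, PI (each the average of left and right breast)
hv = 6.0                    # HV (average of left and right breast)

t_stat = one_sample_t(patients, hv)
significant = abs(t_stat) > T_CRIT_DF2  # valid only for exactly three patient values
```

With so few samples the test has little power, and repeating it across about 2000 proteins without multiple-testing correction will produce some false positives, which is one reason a larger cohort is needed before individual proteins are treated as markers.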

Conclusions
In this study, NAF samples were collected by manual massage techniques prior to surgery or when the patients presented with a natural discharge at the outpatient clinic. Of those volunteers who consented, the success rate in NAF expression was approximately 50%. In order to provide a suitable screening approach for breast cancer specific biomarkers, increased success in expression and collection will be required. Improved expression rates can be achieved with the aid of oxytocin nasal spray [18,40] as well as application of manual or mechanical pumps that are normally used for milk expression by mothers with preterm infants [41].
Our aim was to determine whether NAF has the potential to provide diagnostic value in screening for breast cancer. The possibility of using an internal control sample from the healthy breast for comparison with the diseased breast was considered. The complement of proteins in matched pairs showed strong similarity, probably due to transport through cross-lymphatic drainage, which may make symptomatic differentiation challenging. Conversely, single samples from patients, where expression is poor, would be sufficient for clinical diagnosis of disease-related biomarkers. In this respect, we have identified double the number of proteins previously detected in NAF, including 300 not found in plasma and 24% of the markers currently part of the NCI Early Detection Research Network studying breast cancer. The current breast cancer markers, growth factors, and receptors which have been detected in plasma required a number of independent approaches (immunoassays and MS), whereas we have the potential to develop a single quantitative, multiplexed, targeted method by multiple reaction monitoring MS, utilizing valuable NAF samples efficiently. Furthermore, the composition of NAF was dominated by proteins representative of the basement membrane, extracellular milieu, and interstitial fluid surrounding breast cells, with roles in tissue stability, cell adhesion, and cell-cell communication. Future NAF proteomic analysis will aim to investigate whether changes in the proteins correlate with stromal disruption and degradation as cancer cells proliferate and migrate into the surrounding normal tissue environment. Overall, the study has identified many physiologically and oncologically important proteins that warrant a more expansive study of a larger cohort of patients and HVs.

Figure 3. Representative profiles of milk proteins significantly increased in the HV compared to the three cancer patients (PB, PD, and PI). Quantitation is defined as the ratio of area for the specific protein relative to the median of the area for the protein complement. Each value is the average of the values for the left and right breast measurements.
Proteomics research was supported by Yorkshire Cancer Research projects, BPP047 and B381PA. The authors would like to thank the patients and volunteers for providing samples for this research.
The authors have declared no commercial conflicts of interest.