Empirically based comparisons of the reliability and validity of common quantification approaches for eyeblink startle potentiation in humans

Abstract Startle potentiation is a well‐validated translational measure of negative affect. Startle potentiation is widely used in clinical and affective science, and there are multiple approaches for its quantification. The three most commonly used approaches quantify startle potentiation as the increase in startle response from a neutral to threat condition based on (1) raw potentiation, (2) standardized potentiation, or (3) percent‐change potentiation. These three quantification approaches may yield qualitatively different conclusions about effects of independent variables (IVs) on affect when within‐ or between‐group differences exist for startle response in the neutral condition. Accordingly, we directly compared these quantification approaches in a shock‐threat task using four IVs known to influence startle response in the no‐threat condition: probe intensity, time (i.e., habituation), alcohol administration, and individual differences in general startle reactivity measured at baseline. We confirmed the expected effects of time, alcohol, and general startle reactivity on affect using self‐reported fear/anxiety as a criterion. The percent‐change approach displayed apparent artifact across all four IVs, which raises substantial concerns about its validity. Both raw and standardized potentiation approaches were stable across probe intensity and time, which supports their validity. However, only raw potentiation displayed effects that were consistent with a priori specifications and/or the self‐report criterion for the effects of alcohol and general startle reactivity. Supplemental analyses of reliability and validity for each approach provided additional evidence in support of raw potentiation.

Potentiation of the defensive startle reflex in the presence of a threatening stimulus relative to a neutral, nonthreatening stimulus is a well-validated translational measure of negative affect used in affective and clinical science (see Grillon & Baas, 2003; Vaidyanathan, Patrick, & Cuthbert, 2009, for reviews). Widely accepted guidelines exist for recording and measurement of the startle response in humans (Blumenthal et al., 2005; Stern, Ray, & Quigley, 2001). Recommendations also exist for the quantification of startle modification broadly (e.g., Berg & Balaban, 1999) and the quantification of prepulse inhibition specifically (e.g., Blumenthal, Elden, & Flaten, 2004; Hawk & Cook, 2000). However, empirically based guidelines for quantifying startle potentiation in humans have yet to be established despite the frequent use of this measure in clinical and affective science. This omission is nontrivial because commonly used approaches to quantify and analyze startle potentiation may yield qualitatively different and contradictory conclusions about the effects of focal manipulations or group differences on negative affective response (Grillon & Baas, 2002; Walker & Davis, 2002). In particular, these quantification approaches vary substantially in how they adjust for individual and/or group differences in the influence of activity in the primary startle circuit during neutral conditions. In this report, we review these common approaches to the quantification of startle potentiation in humans and empirically compare them in a simple shock-threat task.
Neuroscientists (Davis, 2006; Koch, 1999) describe how acoustic startle response magnitude is affected by two neural circuits during shock-threat tasks in rats: a primary, obligatory circuit and a secondary, modulatory circuit. The obligatory circuit consists of a simple neural pathway from the cochlear root neurons through the nucleus reticularis pontis caudalis to the spinal cord (whole body startle) or facial motor nucleus (pinna reflex). This obligatory circuit is engaged by the startle probe, which is a reflex-eliciting stimulus that is intense and has a rapid rise time. The modulatory circuit involves both direct and indirect projections from the central nucleus of the amygdala to the reticularis pontis caudalis. This modulatory circuit potentiates the startle response when elicited in the presence of a threatening stimulus that predicts an aversive outcome (e.g., threat cue that signals electric shock) relative to a neutral, nonthreatening stimulus (e.g., no-threat cue). However, the functional form (e.g., additive, multiplicative) of this modulatory input to the primary, obligatory circuit has not yet been precisely determined. Therefore, multiple approaches have been proposed to quantify startle potentiation due to the many possible forms of this modulatory input. These approaches differ in their explicit or implicit assumptions about how best to adjust potentiation scores across individuals, groups, and experimental conditions that may differ with respect to the level of activity in the primary, obligatory circuit.
The three most commonly used approaches to quantify startle potentiation are based on (1) raw potentiation, (2) standardized potentiation, or (3) percent-change potentiation. In the first approach, raw potentiation, startle potentiation is quantified as the difference between raw (i.e., untransformed) startle response in the threat versus no-threat conditions. These raw potentiation scores then serve as the dependent measure in an analysis of variance general linear model (ANOVA/GLM) that includes the other focal manipulations or group IVs.¹ In contrast to the next two approaches we describe, no adjustment is applied to the magnitude of these raw difference scores. However, it should be noted that the magnitude of raw potentiation is typically greater for participants with higher startle response in neutral conditions (Bradford, Kaye, & Curtin, 2014). Use of raw potentiation scores has been our laboratory's longstanding, preferred approach (e.g., Baskin-Sommers, Curtin, & Newman, 2011; Bradford, Shapiro, & Curtin, 2013; Curtin, Lang, Patrick, & Stritzke, 1998; Curtin, Patrick, Lang, Cacioppo, & Birbaumer, 2001; Hogle & Curtin, 2006; Moberg & Curtin, 2009).
In the second approach, standardized potentiation, the raw startle response is first standardized at the level of individual trials using a within-subject T score (or statistically equivalent z score) transformation (e.g., Yancey, Vaidyanathan, & Patrick, 2014). This yields transformed trial-level startle responses with the same overall mean and standard deviation for all participants. Startle potentiation is then quantified as the difference between the standardized startle responses in the threat versus no-threat conditions. Subsequent analysis is comparable to the raw potentiation approach described above. Standardization of the startle response is common although numerous subtle differences exist across common standardization methods (e.g., mean response calculated across all trials vs. intertrial interval [ITI]; standard deviation pooled within or across conditions). Recent and classic examples of these standardization methods include Bradley, Codispoti, Cuthbert, and Lang, 2001; Grillon et al., 2015; Levenston, Patrick, Bradley, and Lang, 2000; Nelson et al., 2013; Sege, Bradley, and Lang, 2014; Yancey et al., 2014. Within-subject standardization adjusts the size of each participant's potentiation scores by the variability of their raw startle response. This is accomplished by dividing participants' raw startle responses across trials by the standard deviation of these responses (or a subset of these responses; e.g., from the ITI). This adjustment increases the potentiation scores for participants with low variance while decreasing the potentiation scores of participants with large variance. This adjustment may be useful if the magnitude of participants' modulatory difference between threat and no-threat is artifactually dependent on the variance of their responses across individual trials.
Nonetheless, raw and standardized startle response approaches will produce comparable results when the size of within-subject effects is consistent despite individual differences in response variance. The current experiment was not designed to manipulate the relationship between effect size and response variance to explicitly and sensitively contrast raw and standardized approaches to the quantification of startle potentiation. Despite this, we report analyses of the standardized potentiation approach to be complete.
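The relationship between raw and standardized potentiation can be made explicit with a short derivation. This is a sketch under the T-score transformation used in this report (participant mean M_j and standard deviation SD_j computed across trials); the symbols x_ij for the raw trial-level response and the overbars for condition means are notation introduced here for illustration.

```latex
% T-score transformation of raw response x_{ij} (trial i, participant j)
\begin{align}
T_{ij} &= \frac{x_{ij} - M_j}{SD_j} \times 10 + 50 \\
% Taking the difference of condition means, the constants 50 and M_j cancel
\bar{T}_{j,\mathrm{threat}} - \bar{T}_{j,\mathrm{no\text{-}threat}}
  &= \frac{10}{SD_j}\left(\bar{x}_{j,\mathrm{threat}} - \bar{x}_{j,\mathrm{no\text{-}threat}}\right)
\end{align}
```

Under these assumptions, standardized potentiation is raw potentiation rescaled by 10/SD_j, so the two approaches can rank participants differently only to the extent that SD_j varies across participants.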
In the third approach, percent-change potentiation, startle potentiation is calculated from the raw startle response as a percent change from the no-threat to the threat condition: ([raw startle in threat − raw startle in no-threat]/raw startle in no-threat) × 100. Following this, analysis of percent-change potentiation proceeds as with the analysis of raw and standardized potentiation described earlier (for examples of the percent-change potentiation approach, see Gazendam et al., 2014; Jovanovic et al., 2014; Rich et al., 2005; Vanman, Mejia, Dawson, Schell, & Raine, 2003).
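As a minimal illustration of the percent-change formula, the following Python sketch applies it to two hypothetical participants who show the same raw threat versus no-threat difference but differ in their no-threat response. The function names and all numeric values are invented for illustration and are not data from this study.

```python
def raw_potentiation(threat, no_threat):
    """Raw potentiation: threat minus no-threat startle response."""
    return threat - no_threat


def percent_change(threat, no_threat):
    """Percent-change potentiation: ((threat - no_threat) / no_threat) * 100."""
    if no_threat == 0:
        raise ValueError("no-threat startle response must be nonzero")
    return 100.0 * (threat - no_threat) / no_threat


# Hypothetical mean startle responses (microvolts); values are made up.
participants = {
    "low no-threat response": {"threat": 60.0, "no_threat": 40.0},
    "high no-threat response": {"threat": 120.0, "no_threat": 100.0},
}

for label, resp in participants.items():
    raw = raw_potentiation(resp["threat"], resp["no_threat"])
    pct = percent_change(resp["threat"], resp["no_threat"])
    print(f"{label}: raw = {raw:.1f}, percent change = {pct:.1f}%")
```

Both hypothetical participants have a raw potentiation of 20, yet their percent-change scores are 50% and 20%, because the same difference is divided by different no-threat responses.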
The percent-change approach adjusts the size of each participant's potentiation scores by the magnitude of their response in the no-threat condition. This is accomplished by dividing participants' threat versus no-threat difference scores by their startle response in the no-threat condition. This adjustment increases the potentiation scores for participants with low startle response in the no-threat condition and decreases potentiation scores for participants with high startle response in this condition. This adjustment may be useful if the magnitude of participants' modulatory difference between threat and no-threat is artifactually greater when they have higher response in the no-threat condition. Because of this unique adjustment for magnitude of response in the no-threat condition, analysis of percent-change startle potentiation can yield qualitatively different conclusions from the raw and standardized approaches in experiments where focal manipulations (e.g., drug administration; Grillon et al., 2015; Grillon, Sinha, Ameli, & O'Malley, 2000; Rodríguez-Fornells, Riba, Gironell, Kulisevsky, & Barbanoj, 1999) or groups (e.g., patient groups; see Vaidyanathan et al., 2009, for review) exhibit systematic differences in the magnitude of the startle response in the absence of threat. Grillon and Baas (2002) raised concerns about the potential for divergent conclusions from the raw versus percent-change potentiation approaches. Furthermore, they explicitly called for empirical comparisons between these approaches. Later that year, Walker and Davis (2002) compared the raw and percent-change approaches in rodents across five IVs, which were expected to affect startle response in the no-threat condition but not necessarily startle potentiation. Although they report a preference for percent change, the basis for this conclusion is equivocal.
Two of the IVs in these experiments did not clearly support either approach (i.e., startle probe intensity; corticotropin releasing hormone [CRH] administration). Two of the IVs are difficult to interpret because the rodents' fear response may have been expected to covary with the IV (i.e., participants grouped by general startle reactivity; strychnine administration). For the final IV (i.e., unsignaled footshock), potentiation was more stable when measured by raw potentiation than percent change. Furthermore, it can be argued that guidelines for quantification/analysis of startle potentiation in humans may be more confidently established from experiments with humans given differences in measurement across species (e.g., full body startle vs. eyeblink electromyography) that may affect the measures' respective statistical properties (e.g., floor/ceiling effects).

1. We explicitly calculate and analyze startle potentiation difference scores for the raw and standardized potentiation approaches to allow us to use identical analysis models across these two approaches and the percent-change approach. Of course, the results reported from analyses of these within-subject difference scores are statistically equivalent to the results that would be obtained from analyzing startle response (rather than potentiation difference scores) and including threat as a within-subject IV.

D.E. Bradford et al.
In this report, we compare startle potentiation quantification approaches in a simple shock-threat task in humans with three experimental manipulations (probe intensity, time, and alcohol administration) that have well-established, robust effects on startle response in no-threat conditions. Given their effects on startle response, these manipulations should produce a divergent pattern of results for the percent change relative to the raw and standardized approaches. Critically, we also chose these three manipulations because they afforded clear (and verifiable) assertions for their expected effects on participants' fear of the shock threat (i.e., stable fear across probe intensity and time, reduced fear following alcohol administration). In addition to these three manipulations, we measured general startle reactivity (measured as baseline startle reactivity in this study) to allow us to explore the relationship between individual differences in general startle reactivity and startle potentiation across the three quantification approaches. We collected self-reported fear/anxiety of the shock threat to further substantiate our predictions regarding the expected effects of each of these IVs in the shock-threat task. We propose that the following pattern of IV effects should be observed given a valid quantification of startle potentiation:

1. Intensity (95 vs. 100 vs. 105 dB). Startle response varies systematically with the intensity of the eliciting startle probe (Blumenthal & Berg, 1986; Cuthbert, Bradley, & Lang, 1996). A valid approach to the quantification of startle potentiation should yield stable potentiation scores across probe intensities despite this increase in startle response. This assertion follows from two basic assumptions. First, participants' fear following presentation of a shock-threat cue should not vary based on the independent intensity of the subsequent startle probe. Second, contemporary neuroscience suggests that the startle probe is processed by and impacts the obligatory but not modulatory circuit of the startle response (Bradley, Lang, & Cuthbert, 1993; Davis & File, 1984; Walker & Davis, 2002).

2. Time (first vs. second half of the experiment). Startle response habituates over repeated probe trials across the first and second half of experiments (Bradley et al., 1993). A valid approach to the quantification of startle potentiation should yield stable scores across the experiment despite this reduction in the startle response. This assertion follows from early validation studies that observed habituation in the obligatory but not modulatory startle response circuits (Bradley et al., 1993; Campeau, Liang, & Davis, 1990; Davis & File, 1984). Of course, it remains possible that the inputs to the modulatory startle response circuit could vary over time. Therefore, we confirm that participants' fear/anxiety response to shock threat was stable across this experiment via self-report.

3. Alcohol (no-alcohol vs. alcohol). Startle response is reduced robustly by alcohol administration (Grillon et al., 2000; Stritzke, Patrick, & Lang, 1995). Despite this reduction in startle response, a valid approach to the quantification of startle potentiation should remain sensitive to the well-documented stress response dampening (SRD) properties of this anxiolytic drug (Sher, 1987). Specifically, alcohol has been demonstrated to reduce behavioral, subjective, and physiological indicators of fear/anxiety and to diminish amygdala response to threat using fMRI (Armeli et al., 2003; Bartholow, Henry, Lust, Saults, & Wood, 2011; Levenson, Sher, Grossman, Newman, & Newlin, 1980; Sher, Bartholow, Peuser, Erickson, & Wood, 2007; Sripada, Angstadt, McNamara, King, & Phan, 2011). We also confirm that alcohol reduced participants' fear/anxiety to shock threat in this experiment via self-report.

4. General startle reactivity. Startle response in experimental tasks is strongly positively related to general startle reactivity measured during a baseline procedure (Bradford, Kaye, & Curtin, 2014). We do not offer a precise a priori specification regarding the appropriate relationship between general startle reactivity and startle potentiation during shock threat given valid quantification of startle potentiation. However, recent theory and empirical evidence suggest that general startle reactivity may index individual differences in defensive reactivity to aversive stimuli generally (Bradford, Kaye, & Curtin, 2014; Vaidyanathan et al., 2009). If true, general startle reactivity should be positively related to startle potentiation in the shock-threat task. Alternatively, if individual differences in general startle reactivity index sources of variance that are independent of affect and/or defensive response, general startle reactivity and startle potentiation during shock threat should be unrelated. We explore the relationship between general startle reactivity and fear/anxiety to shock threat via self-report.
In addition to evaluating the stability and sensitivity of the three startle potentiation quantification approaches across these four IVs, we also conduct supplemental analyses of the reliability (split-half internal consistency) and validity (criterion correlations with self-report) for each approach.

Method

Participants
We recruited 96 participants (49 female; mean age = 22.1 years, SD = 2.0 years) from the university community. Participants were at least 21 years old, had experience within the last year with the study dose of alcohol, reported no history of alcohol-related problems, no current psychiatric medication use, no alcohol-contraindicated medical condition, and were not pregnant (verified by urine sample). We paid participants $10/h or class extra-credit points for their participation.

General Startle Reactivity Assessment
Stimulus presentation was controlled by a PC-based MATLAB script using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Prior to beverage assignment, we measured participants' general startle reactivity in a baseline procedure (see Startle Response Measurement below). During this assessment, participants viewed a series of yellow and blue colored squares with a diagonal of approximately 7.5 in. presented in the center of a CRT monitor for 5 s each with a 14-s ITI. No shocks were administered during this assessment.

Beverage Manipulation
We randomly assigned participants initially to one of three beverage conditions from the standard balanced placebo design (Rohsenow & Marlatt, 1981): alcohol (N = 48), no-alcohol/told alcohol (i.e., placebo; N = 24), and no-alcohol/told no-alcohol (N = 24). We informed participants in the alcohol and no-alcohol/told alcohol (placebo) conditions that they would receive a moderately impairing dose of alcohol that should produce a blood alcohol concentration (BAC) of approximately 0.08%. The alcoholic beverage consisted of 100-proof vodka (Smirnoff Blue Label) and a juice mixer, with the juice accounting for three quarters of the drink volume. We calculated the alcohol dose to produce a target BAC of 0.08% approximately 30 min after beverage consumption (see Curtin & Fairchild, 2003, for details regarding the dosing formula). Participants assigned to the no-alcohol/told alcohol (placebo) condition received a beverage consisting of fruit juice mixed with water poured from a vodka bottle in their presence. Outside of participants' view, beverages in the alcohol and no-alcohol/told alcohol (placebo) conditions were misted with alcohol, and 2 ml of alcohol was floated on top of the beverages to provide sensory stimulation to support the placebo manipulation. Participants in the no-alcohol/told no-alcohol condition simply drank juice mixer matched to the total drink volume of the beverages in the other two conditions. We divided beverages in all three conditions into two drinks, each consumed over 15 min, for a total drinking period of 30 min. The experimental session began 15 min after the end of the drinking period. We measured BAC via breathalyzer (Alcosensor IV; Intoximeters Inc., St. Louis, MO) immediately before, at the midpoint of, and immediately after completion of the main shock-threat task.
The use of separate no-alcohol/told alcohol (placebo) and no-alcohol/told no-alcohol conditions from the balanced placebo design is common in alcohol administration research to rule out the possibility of alcohol expectancy effects. If expectancy effects are not observed, these two no-alcohol conditions can be combined to provide equal N alcohol and no-alcohol conditions. Preliminary analyses coded two regressors to test contrasts among the three beverage conditions in this experiment. However, no significant differences were detected between the two no-alcohol conditions for the primary dependent variables. Therefore, we combined these two no-alcohol conditions and proceeded with a single beverage condition regressor that contrasted alcohol (N = 48) versus no-alcohol (N = 48) in the final analyses (see Data Analysis Strategy below).

Shock Tolerance Assessment
Five minutes after the drinking period, we measured participants' subjective shock tolerance to a series of 200-ms electric shocks of increasing intensity (7 mA maximum) using standard procedures (Curtin et al., 2001). We administered electric shocks using a custom shock stimulator (Bradford, Magruder, Korhumel, & Curtin, 2014) via stainless steel electrodes across the distal phalanges of the index and ring fingers of the left hand. The procedure was stopped once participants reached the maximum level of shock that they could tolerate. We set shock intensity during the main task to each participant's subjective maximum tolerance threshold to minimize individual differences in shock tolerance.

Shock-Threat Task
Participants viewed a series of 84 shock-threat and no-threat square cues (equally probable) presented in color on a CRT monitor for 5 s each separated by a variable ITI (10-14 s, mean = 12 s). Cues were either blue or yellow in color (counterbalanced for shock and no-shock across participants). The diagonal of the cues measured approximately 7.5 in. Cues were positioned in the center of the computer monitor. We instructed participants that shocks would be administered during the majority of the threat cue presentations and not during presentation of the no-threat cues or ITIs. Shocks occurred 4.8 s after cue onset. Actual shock contingency for threat cues was 50%. We measured self-reported fear/anxiety of the shock threat (1 = not at all fearful/anxious; 7 = extremely fearful/anxious) at midpoint and task completion.
We measured eyeblink startle response to 50-ms white noise probes with near instantaneous rise time. All noise probes were 100 dB during the baseline procedure. We manipulated noise probe intensity across three levels (95, 100, and 105 dB) during the shock-threat task. We presented six noise probes at 4.5 s postcue onset during the baseline procedure. We presented 48 noise probes (24 each for threat and no-threat) at 4.5 s postcue onset during the shock-threat task. We presented an additional 24 noise probes (eight per probe intensity) during the ITIs in the shock-threat task to decrease probe predictability. We matched serial position of probes across probe intensity and cue types (threat vs. no-threat) within participants in two counterbalanced orders. We also presented three habituation probes at the start of the baseline and shock-threat tasks that were not included in any analyses. A minimum of 14.5 s separated each probe from any previous startle-eliciting event (i.e., another probe or shock).
We conducted offline data processing using the PhysBox plugin (Curtin, 2011) within the EEGLAB toolbox (Delorme & Makeig, 2004) in MATLAB (The MathWorks Inc., Natick, MA). We followed published guidelines for startle response reduction and processing (Blumenthal et al., 2005; Bradford, Magruder et al., 2014). Specifically, we high-pass filtered (4th order 28 Hz Butterworth filter), epoched (−50 to 250 ms surrounding probe), rectified, and smoothed (4th order 30 Hz Butterworth low-pass filter) the data. We rejected trials with greater than ±20 μV deflections in the 50-ms preprobe baseline as artifact (i.e., unstable baseline). We scored peak eyeblink startle response between 20 and 100 ms postprobe onset relative to mean 50-ms preprobe baseline.² We calculated general startle reactivity as the average startle response to the six probes during cues in the baseline assessment. We calculated startle potentiation in the shock-threat task for the raw potentiation approach as the difference between raw startle responses during threat versus no-threat cues. We calculated startle potentiation for the standardized approach as the difference between standardized startle responses during threat versus no-threat cues following within-subject T score standardization³ of the raw startle response. We calculated startle potentiation for the percent-change approach from raw startle responses during threat and no-threat cues as ([threat − no-threat]/no-threat) × 100.
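The three scoring rules described above can be sketched for a single participant as follows. This is an illustrative Python sketch, not the PhysBox/MATLAB or R code used in this study: the function names are hypothetical, the example values are made up, and two details are assumptions (condition scores are taken as means over trials, and the within-subject standard deviation is the sample SD computed across all threat, no-threat, and ITI trials).

```python
from statistics import mean, stdev


def t_score(values, m, sd):
    """Within-subject T score: ((raw - M_j) / SD_j) * 10 + 50."""
    return [((v - m) / sd) * 10 + 50 for v in values]


def potentiation_scores(threat, no_threat, iti=()):
    """Return raw, standardized, and percent-change potentiation
    for one participant from trial-level startle responses."""
    all_trials = list(threat) + list(no_threat) + list(iti)
    m = mean(all_trials)
    sd = stdev(all_trials)  # assumption: sample SD across all task trials

    raw_threat = mean(threat)
    raw_no_threat = mean(no_threat)

    std_threat = mean(t_score(threat, m, sd))
    std_no_threat = mean(t_score(no_threat, m, sd))

    return {
        "raw": raw_threat - raw_no_threat,
        "standardized": std_threat - std_no_threat,
        "percent_change": 100.0 * (raw_threat - raw_no_threat) / raw_no_threat,
    }


# Hypothetical trial-level responses (microvolts) for one participant.
example = potentiation_scores(
    threat=[60.0, 80.0, 100.0],
    no_threat=[40.0, 50.0, 60.0],
    iti=[45.0, 55.0],
)
print(example)
```

Keeping the three scores in one function makes it easy to feed the same trial-level data into identical analysis models, which is the comparison strategy this report follows.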

Open Science Practices
We support emerging open science guidelines (Nosek et al., 2015). Following these guidelines, we have made the data and analysis scripts associated with this report publicly available via Open Science Framework. These materials can be accessed at osf.io/5nfvu

Primary Comparison of Startle Potentiation Quantification Approaches
We analyzed raw potentiation, standardized potentiation, and percent-change potentiation in separate GLMs using R (R Development Core Team, 2014).⁵,⁶ We included additive effects to model repeated measures for probe intensity (95 vs. 100 vs. 105 dB) and time (first vs. second half). We also included additive effects for between-subjects regressors for beverage condition (no-alcohol vs. alcohol) and general startle reactivity (measured quantitatively and mean centered) in all models. In all analyses, we coded the beverage condition regressor such that within-subject effects (i.e., probe intensity, time) were evaluated in the no-alcohol condition. We followed up significant omnibus effects of probe intensity with three planned pairwise contrasts using Fisher's LSD (least significant difference) approach to protect against inflation of familywise error (Kirk, 1995). We report both GLM coefficients (b) and partial eta-squared (ηp²) to describe effect sizes.
Probe intensity. Figure 1 displays the effect of probe intensity for each quantification approach. We proposed that a valid approach for startle potentiation should be stable across probe intensities used to elicit and measure the response.⁷ Consistent with this, the effect of probe intensity was not significant for either raw potentiation, F(2,166) = .79, p = .457, ηp² = .01, or standardized potentiation, F(2,182) = 1.54, p = .217, ηp² = .02. In contrast, percent-change potentiation was not stable across probe intensities, F(2,172) = 3.51, p = .032, ηp² = .04. Percent-change potentiation decreased with increasing probe intensity, and pairwise contrasts indicated a significant difference between 95 and 105 dB, b = −41.3, t(86) = 2.32, p = .023, ηp² = .06.

3. Trial-level raw startle responses (i) during the no-shock and shock cues in the main task were standardized within-subjects using a T-score transformation based on each participant's (j) raw startle response mean (M_j) and standard deviation (SD_j) across their 72 trials in the main task (excluding the three habituation trials) using the following formula: $\mathrm{TStartle}_{ij} = \left(\frac{\mathrm{RawStartle}_{ij} - M_j}{SD_j}\right) \times 10 + 50$

4. We made the decision to include the self-report measure of fear/anxiety after data collection was initiated, which resulted in N = 71 participants available for self-report analysis (36 alcohol and 35 no-alcohol participants). We conducted supplemental analyses to confirm that the effects we report for our three quantification approaches (raw startle potentiation, standardized startle potentiation, and percent-change startle potentiation) for our four primary IVs (probe intensity, time, beverage condition, and general startle reactivity) were comparable across subsamples of participants who did and did not provide self-report of fear/anxiety. Specifically, we added self-report data available (yes vs. no) as a factor to our primary analyses for each of the three quantification approaches. This factor did not significantly moderate the effects of any of the four IVs across any of the three approaches, which confirms that the reported effects are comparable in these two subsamples.

5. We conducted case analyses to identify participants who were GLM outliers (i.e., studentized residual with Bonferroni corrected p < .05; Fox, 1991) for the primary analyses of the three quantification approaches. These case analyses resulted in the exclusion of eight participants (3 alcohol, 5 no-alcohol) from analysis for raw potentiation and five participants (4 alcohol, 1 no-alcohol) from analysis for percent-change potentiation. No GLM model outliers were identified for standardized potentiation. The pattern of significant/nonsignificant results is identical for raw potentiation with and without these GLM outliers included. The pattern of results for percent change is also identical with one exception. The significant increase in percent-change potentiation for high versus low probe intensities that is observed with GLM outliers removed (41.3%, p = .023) is larger in magnitude but only trend level when GLM outliers are included (47.3%, p = .074) due to the larger standard error produced by retaining the model outliers (26.1 vs. 17.8, with and without GLM outliers, respectively).

6. Startle potentiation scores had the following distributional shapes: raw potentiation (skew = 2.2, kurtosis = 6.5), standardized potentiation (skew = 0.2, kurtosis = −0.5), percent-change potentiation (skew = 3.2, kurtosis = 13.7). We chose to present results in this report without transforming these scores to correct their distributional shape (e.g., positively skewed) because such transformations are uncommon in startle research. Furthermore, distributional transformations can hinder interpretability of psychophysiological data (Stern et al., 2001). However, we performed supplemental analyses of transformed raw and percent-change potentiation scores (square root and log, respectively) to confirm that our conclusions were robust to this issue. Standardized potentiation scores were already normally distributed in this particular experiment. The results were essentially unchanged for the raw potentiation approach. Results were also consistent for the percent-change approach except that the effect of time was no longer significant.
Beverage condition. Figure 3 displays the effect of beverage condition (no-alcohol vs. alcohol) for each quantification approach. We expected that alcohol would significantly reduce negative affect based on the large literature documenting its anxiolytic, stress response-dampening properties. In addition, alcohol reduced self-reported fear/anxiety in this experiment as reported earlier. Consistent with our prediction and self-report results, alcohol significantly reduced raw potentiation, b = −15.1, t(83) = 3.91, p < .001, ηp² = .16. In contrast, alcohol did not significantly change either standardized potentiation, b = −1.02, t(91) = 1.15, p = .254, ηp² = .01, or percent-change potentiation, b = 4.9, t(86) = .28, p = .779, ηp² = .00.
General startle reactivity. Figure 4 displays the relationship between general startle reactivity and startle potentiation for each quantification approach. We did not offer strong a priori predictions regarding the expected relationship between general startle reactivity and startle potentiation, although we suggested that either a positive or no relationship could be supported by existing theory and/or empirical evidence. As reported earlier, a significant positive relationship was observed between general startle reactivity and self-reported fear/anxiety in this experiment. Consistent with self-report, general startle reactivity and raw potentiation were significantly positively related, b = 0.1, t(83) = 4.03, p < .001, ηp² = .16. The relationship between general startle reactivity and standardized potentiation was not significant, b = 0.0, t(91) = .22, p = .823, ηp² = .00. General startle reactivity and percent-change potentiation were significantly negatively related, b = −0.2, t(86) = 2.03, p = .045, ηp² = .05.
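The partial eta-squared values reported in this section can be cross-checked against the accompanying test statistics using the standard identity ηp² = (F × df_effect)/(F × df_effect + df_error), with F = t² for single-degree-of-freedom effects. The Python sketch below is only a reader-side arithmetic check; the function names are hypothetical and this is not the analysis code used in this report.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def partial_eta_squared_from_t(t_value, df_error):
    """Partial eta-squared for a 1-df effect, using F = t squared."""
    return partial_eta_squared(t_value ** 2, 1, df_error)


# Test statistics copied from the results reported in this section.
checks = [
    ("probe intensity, percent change", partial_eta_squared(3.51, 2, 172)),
    ("alcohol, raw potentiation", partial_eta_squared_from_t(3.91, 83)),
    ("general startle reactivity, raw", partial_eta_squared_from_t(4.03, 83)),
    ("general startle reactivity, percent change", partial_eta_squared_from_t(2.03, 86)),
]
for label, value in checks:
    print(f"{label}: partial eta-squared = {value:.2f}")
```

Rounded to two decimals, these reproduce the reported values of .04, .16, .16, and .05.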

Supplemental Analyses of Reliability and Validity
We tested correlations between potentiation scores derived separately from odd and even trials to assess the internal consistency reliability of the three approaches. These correlations were significant for raw potentiation (r = .81; df = 92; p < .001), standardized potentiation (r = .53; df = 92; p < .001), and percent-change potentiation (r = .72; df = 92; p < .001). Pairwise tests of differences between these correlations indicated that the correlation for the standardized approach was significantly lower than that observed for both raw potentiation (z = 3.62, p < .001) and percent-change approaches (z = 2.14, p = .032). The correlations yield Spearman-Brown corrected internal consistency reliability estimates of 0.90 for raw potentiation, 0.69 for standardized potentiation, and 0.84 for percent-change potentiation for the full measures using all trials.
7. We used simple additive models for our primary analyses given the focal nature of our study hypotheses. However, we conducted supplemental analyses for each quantification approach that allowed probe intensity to interact with all other IVs. No significant interactions with probe intensity were observed (ps > .331), which confirms that the main effects of alcohol, time, and general startle reactivity were consistent across the three probe intensity levels.
We conducted supplemental validity analyses by testing correlations of self-reported fear/anxiety during the shock-threat task with each startle potentiation quantification approach. The correlations between self-reported fear/anxiety and startle potentiation were significant for both raw (r = .38, df = 67, p = .001) and standardized potentiation (r = .34, df = 67, p = .005). The correlation with percent-change potentiation was nonsignificant (r = .17, df = 67, p = .160). Tests of differences between correlations were not significant for contrasts among these three approaches (ps > .088).
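The reliability figures in this section can be checked with a short calculation. The sketch below applies the Spearman-Brown prophecy formula for a test doubled in length, 2r / (1 + r), to the odd/even correlations, and compares correlations with a Fisher r-to-z test. Two assumptions are ours rather than the report's: n = 94 is inferred from df = 92, and the z test treats the two correlations as independent. The report does not state which test was used, although this version reproduces the reported z values.

```python
import math


def spearman_brown(r_half: float) -> float:
    """Full-length reliability from a split-half (odd/even) correlation."""
    return 2 * r_half / (1 + r_half)


def fisher_z_difference(r1: float, r2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two correlations.

    Assumes the correlations are independent (an assumption of this sketch).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


n = 94  # inferred from df = 92 (n - 2); an assumption of this sketch
split_half = {"raw": 0.81, "standardized": 0.53, "percent change": 0.72}

for name, r in split_half.items():
    print(f"{name}: Spearman-Brown reliability = {spearman_brown(r):.2f}")

for name in ("raw", "percent change"):
    z = fisher_z_difference(split_half[name], split_half["standardized"], n, n)
    print(f"{name} vs. standardized: z = {z:.2f}, p = {two_sided_p(z):.3f}")
```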

Discussion
Our four IVs (probe intensity, time, alcohol administration, and general startle reactivity) all affected startle response in the shock-threat task. These effects provide an experimental context where the three quantification approaches could yield different conclusions regarding the effects of our four IVs on startle potentiation. Analyses of participants' self-report confirmed that fear/anxiety of the shock threat did not change over the course of the experiment, was reduced by alcohol, and was positively related to general startle reactivity at baseline. These analyses substantiated our a priori assertions regarding the appropriate IV effects on startle potentiation given a valid quantification approach for startle potentiation.
With the validity of our test bed established by these manipulation checks, we were well positioned to offer recommendations regarding the quantification of startle potentiation and to identify issues that warrant further examination.

The Percent-Change Approach
Our analyses generate substantial concerns about the validity and sensitivity of the percent-change approach. First, percent-change potentiation significantly decreased with increasing probe intensity. If this effect were valid, we would have to conclude that participants' fear/anxiety of the shock threat phasically decreased when their startle reflex was elicited by more intense probes, even though probe intensity was unpredictable and intermixed within blocks. Unless we conclude that startle response methodology is susceptible to such problematic measurement reactivity based on probe intensity, this instability associated with percent change substantially undermines its measurement validity. Second, percent-change scores significantly increased in the second half of the experiment relative to the first half. This result for percent-change potentiation is unexpected given the repeated administration of a well-controlled aversive stimulus (i.e., electric shock) for which stable or possibly habituated response should be expected over time. More directly, the increase in percent-change potentiation over time was also discordant with participants' self-reported fear/anxiety, which marginally but not significantly decreased across time.
Third, percent-change potentiation failed to detect the expected anxiolytic effect of alcohol. In fact, percent-change scores were descriptively greater in the alcohol relative to no-alcohol condition. This conflicts with evidence that suggests that alcohol has anxiolytic, stress response-dampening properties. Moreover, this result again conflicted with participants' self-report, which confirmed the reduction in fear/anxiety by alcohol in this experiment.
Fourth, general startle reactivity at baseline was negatively related to percent-change potentiation. General startle reactivity may index individual differences in defensive reactivity to aversive stimuli generally (Bradford, Kaye, & Curtin, 2014;Vaidyanathan et al., 2009). Alternatively, individual differences in general startle reactivity may be independent of affect and/or defensive response. Given this, we proposed that a valid approach to the quantification of startle potentiation may yield either a positive or null effect for general startle reactivity. Of note, we observed that general startle reactivity was positively related to self-reported fear/anxiety of the shock threat in this experiment, which provides further support to expect a positive relationship. Regardless, it is difficult to explain the observed significant negative relationship between general startle reactivity and percent-change potentiation, which further undermines confidence in the validity of this quantification approach.
Percent change demonstrated adequate reliability, comparable to the raw potentiation approach. However, no significant correlation was observed between percent-change potentiation and self-reported fear/anxiety of the shock threat. The aggregate of these findings for the percent-change approach substantially undermines its validity.
Use of the percent-change approach for startle potentiation has likely emerged because of longstanding concerns about how the magnitude of responding in neutral conditions influences responses to experimental task stimuli for many psychophysiological measures. This issue has been famously described as the law of initial value (LIV; Wilder, 1967). Wilder's initial formulation of the LIV proposed that "the higher the initial [neutral] value, the smaller the response to function-raising, and the larger the response to function-depressing stimuli" (Wilder, 1967, p. viii). However, others subsequently proposed that, when examined appropriately, higher initial values generally lead to increased psychophysiological reactivity except at the upper limits of the measure (Jin, 1992; Myrtek & Foerster, 1986). In fact, empirical tests of the LIV have spurred considerable debate over if, when, how, and for which measures the LIV manifests in psychophysiology, and it may be the exception rather than the rule for most psychophysiological measures (Berntson, Uchino, & Cacioppo, 1994; Furedy & Scher, 1989; Geenen & Van De Vijver, 1993; Jin, 1992; Stern et al., 2001). The percent-change approach appears to have emerged to adjust for LIV in the form proposed by Jin and others (Jin, 1992; Myrtek & Foerster, 1986). For startle potentiation, this would be appropriate if the input from the modulatory circuit was positive and multiplicative. If such LIV were present, the adjustment provided by percent change would have yielded stable startle potentiation across changing levels of startle response in the no-threat condition due to probe intensity and time. This was not the case. Instead, the observed pattern of results from these IVs suggests a functionally additive input from the modulatory startle circuit consistent with the raw and standardized approaches.
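The contrast between additive and multiplicative modulatory input can be made concrete with a small numeric sketch. It assumes the conventional definitions of raw potentiation (threat minus neutral) and percent-change potentiation (100 × (threat − neutral) / neutral). The neutral magnitudes, the additive boost, and the multiplicative gain are hypothetical values chosen only for illustration, not data from this experiment.

```python
def raw_potentiation(neutral: float, threat: float) -> float:
    """Raw potentiation: threat response minus neutral response."""
    return threat - neutral


def percent_change_potentiation(neutral: float, threat: float) -> float:
    """Percent-change potentiation relative to the neutral response."""
    return 100.0 * (threat - neutral) / neutral


# Hypothetical neutral-condition magnitudes, e.g. rising with probe intensity.
neutral_levels = [20.0, 40.0, 80.0]
additive_boost = 10.0        # hypothetical: threat adds a constant amount
multiplicative_gain = 1.25   # hypothetical: threat scales the response by 25%

for neutral in neutral_levels:
    threat_add = neutral + additive_boost
    threat_mult = neutral * multiplicative_gain
    print(
        f"neutral={neutral:5.1f} | additive input: "
        f"raw={raw_potentiation(neutral, threat_add):5.1f}, "
        f"pct={percent_change_potentiation(neutral, threat_add):5.1f} | "
        f"multiplicative input: "
        f"raw={raw_potentiation(neutral, threat_mult):5.1f}, "
        f"pct={percent_change_potentiation(neutral, threat_mult):5.1f}"
    )
```

Under additive input, raw scores stay constant while percent-change scores shrink as the neutral response grows, which is the pattern observed here for probe intensity. Under multiplicative input, the reverse holds: percent change is constant and raw scores grow with the neutral response.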

The Raw Versus Standardized Startle Potentiation Approaches
As we acknowledged earlier, the IVs in this experiment were selected to affect startle response in the no-threat condition. This provided an opportunity to carefully contrast the percent-change and raw potentiation approaches. Given the nature of the within-subject standardization transformation, a sensitive contrast of raw and standardized approaches would require direct manipulation of the variance of participants' responses individually and within specific experimental conditions. Nonetheless, the current experiment afforded us a preliminary opportunity to compare these two approaches. Both raw and standardized potentiation approaches yielded stable startle potentiation scores across both probe intensity and time manipulations. Thus, confirmation of the stability of both approaches across these two IV manipulations provides support for their validity.
The raw potentiation approach was sensitive to the well-documented anxiolytic effect of alcohol, which we confirmed in this experiment via self-report (Armeli et al., 2003; Bartholow et al., 2011; Levenson et al., 1980; Sher et al., 2007; Sripada et al., 2011). In contrast, although alcohol descriptively reduced startle potentiation quantified by the standardized approach, this effect was not significant. This putative loss of power for the standardized approach may have resulted from substantial reductions in the variance of raw startle responding across trials for participants in the alcohol condition (e.g., see standard deviations by beverage condition in Table 1). The standardized approach may add noise by amplifying the threat effects of a subset of these participants whose trial variance may be near the floor due to robust reductions of startle magnitude by alcohol. It is possible that the standardized approach may have similar problems with power in experiments more generally when "undetected" nonresponders are included because many of their trials that contain only noise artifact are incorrectly classified as true responses (i.e., false positives) because of conservative methods for the identification of these no-response trials and/or removal of nonresponders from the sample.
The exploratory analyses of general startle reactivity also produced divergent results across the raw and standardized approaches. General startle reactivity was positively correlated with raw potentiation but uncorrelated with standardized potentiation. If general startle reactivity is unrelated to the strength of participants' defensive reactivity to threats, then standardized potentiation appears to be superior by removing this artifact. However, general startle reactivity may index real differences across individuals in their propensity to respond to threats. It would not be surprising to us if participants that respond more strongly to the aversive startle probe also respond more strongly to other aversive threats such as shock. If true, the positive correlation between general startle reactivity and raw potentiation may represent valid differences in participants' responding to threats generally that may be informative and best examined by including general startle reactivity in the analytic model (Bradford, Kaye, & Curtin, 2014). The observed significant relationship between general startle reactivity and self-reported fear/anxiety of the shock threat in this experiment offers some support for this latter perspective. Of course, more definitive support would require further empirical evidence regarding the psychobiological construct indexed by general startle reactivity.
Our supplemental analyses of reliability provide some support for the raw versus standardized approach. The internal consistency of the raw potentiation approach was the highest of the three approaches (0.90), though only modestly higher than percent change (0.84). However, the internal consistency for the standardized approach was significantly lower overall (0.69) and possibly low enough to impact on statistical power. In contrast, criterion validity correlations with self-reported fear/anxiety were approximately comparable in magnitude and significant for both raw and standardized potentiation (rs = .38 and .34, respectively).
There are other reasons to be cautious about the use of standard scores. As noted in the introduction, there are currently numerous related but distinct methods used within and across laboratories that standardize startle response. Unfortunately, few laboratories publish the specific standardization formula they use, and the ability to choose between standardization formulas may represent an undesirable "researcher degree of freedom" (Simmons, Nelson, & Simonsohn, 2011). In addition, standardizing the effects of IVs based on within-participant standard deviations, which may vary across samples and/or experimental designs, may degrade comparisons across experiments regarding IV effect sizes. Furthermore, the use of raw but not standardized response allows for the presentation of signal-averaged waveforms if these waveforms are deemed useful to portray effects. Such waveforms are not typical for studies measuring eyeblink startle potentiation but are commonly displayed when the postauricular reflex is measured (e.g., Benning, Patrick, & Lang, 2004). Fridlund & Cacioppo (1986) described how standardizing responding can sometimes produce unanticipated, suboptimal results. Specifically, they demonstrate that, even if raw response levels across all trials in two conditions are perfectly replicated across experiments, the ordering of the two associated condition means can be artifactually reversed simply by including a third condition that produces more extreme responding. They conclude that standard scores may introduce such problems in any experiment that does not elicit the full range of responding for all participants. Given how difficult this may be to do, they caution against routine and exclusive use of standardized scores (see pp. 583-584 and their Table 1).
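The reversal described by Fridlund and Cacioppo (1986) can be illustrated with a toy data set. The numbers below are hypothetical and are not taken from their Table 1, and the sketch assumes one particular standardization (within-participant z-scores computed across all of a participant's trials, using the population standard deviation). Raw responses in conditions A and B are identical in both scenarios; only the presence of a third condition C differs.

```python
from statistics import mean, pstdev


def condition_means_z(trials_by_condition: dict) -> dict:
    """Mean within-participant z-score per condition for one participant."""
    all_trials = [x for trials in trials_by_condition.values() for x in trials]
    m, sd = mean(all_trials), pstdev(all_trials)
    return {
        cond: mean([(x - m) / sd for x in trials])
        for cond, trials in trials_by_condition.items()
    }


def group_means(participants: list, conditions: list) -> dict:
    """Average each condition's standardized mean across participants."""
    scores = [condition_means_z(p) for p in participants]
    return {c: mean([s[c] for s in scores]) for c in conditions}


# Hypothetical raw responses (two trials per condition, two participants).
p1_two = {"A": [8, 12], "B": [12, 16]}
p2_two = {"A": [12, 14], "B": [11, 13]}

# Same A and B trials, plus a third condition C that is extreme for participant 1.
p1_three = {**p1_two, "C": [40, 44]}
p2_three = {**p2_two, "C": [12, 13]}

two = group_means([p1_two, p2_two], ["A", "B"])
three = group_means([p1_three, p2_three], ["A", "B"])

print("Two conditions:   A = %.3f, B = %.3f" % (two["A"], two["B"]))
print("Three conditions: A = %.3f, B = %.3f" % (three["A"], three["B"]))
```

With only A and B, the standardized group mean for B exceeds that for A, which matches the raw means. Adding condition C inflates participant 1's standard deviation and shrinks that participant's contribution to the A versus B difference, so the standardized ordering flips to A above B even though no raw A or B response changed.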
Unfortunately, Fridlund and Cacioppo's (1986) early caveats regarding standardization did not motivate sufficient additional direct empirical comparisons of the raw and standardized approaches to yield definitive recommendations regarding these competing approaches. Thus, we strongly call for additional research on the topic that can lead to clear guidelines for the field to follow. When competing approaches exist, it is often considered conservative to recommend that results from both approaches be reported. This recommendation may be appropriate in a measure's infancy but report of conflicting results across quantification approaches does not help advance understanding of the phenomenon of interest and can undermine subsequent use of that measure.

Other Considerations and Future Directions
We have no reason to believe that our results should not generalize to the quantification of startle potentiation to threats other than shock (e.g., aversive auditory or tactile stimulation; Miller, Curtin, & Patrick, 1999;Schmitz et al., 2011). Our results also may extend to other paradigms that manipulate affective response such as the affective picture-viewing task (Lang, 1995), but this should be confirmed empirically. Of interest, the percent-change approach has not been recommended or frequently used for analysis in the picture-viewing task even when other focal manipulations or groups differ with respect to startle response in the neutral condition. Nonetheless, the empirical evidence presented in this report should further reinforce avoidance of this approach in picture-viewing tasks. Both raw and standardized potentiation approaches are commonly used in these tasks (for raw, see Baskin-Sommers, Curtin, & Newman, 2013;Smith, Bradley, & Lang, 2005;Stritzke et al., 1995; for standardized, see Bradley et al., 2001;Levenston et al., 2000;Sege et al., 2014).
Research directly contrasting the raw and standardized approaches for the analysis of startle response in the picture-viewing task is needed given the central role this task plays in affective science. We believe it is equally important to explicitly acknowledge that our conclusions are unlikely to extend to the use of the startle response to measure processes other than affective response. For example, clear, evidence-based guidelines currently recommend the use of percent-change scores for the quantification of startle prepulse inhibition (PPI; Blumenthal et al., 2004). Of course, startle potentiation and PPI are directionally different, thus the impact of the respective denominators on each measure's scores will vary. Perhaps more importantly, startle potentiation and PPI index different psychological constructs (fear/anxiety vs. sensory attentional gating for startle potentiation and PPI, respectively) that have distinct neural modulatory mechanisms (Davis, Walker, Miles, & Grillon, 2010;Hawk & Cook, 2000;Koch, 1999;Swerdlow, Geyer, & Braff, 2001). As such, each likely requires different quantification approaches. Nonetheless, our field needs further direct empirical comparisons and dialogue about quantification for startle potentiation, PPI, and other well-established psychophysiological measures for which multiple quantification approaches exist.
Alcohol administration provided an attractive pharmacological manipulation to contrast these quantification approaches. Alcohol has both robust effects on startle response in the no-threat condition, and its effects on fear/anxiety are well established in the literature and confirmed in the current study with self-report. Nonetheless, future research should examine alternative pharmacological manipulations. In particular, the use of drugs that change startle response magnitude but do not alter affective response would provide an important and necessary extension of the research we report here.
Results from our exploratory analyses of general startle reactivity dovetail attractively with existing research on this potentially interesting individual difference. Vaidyanathan et al. (2009) have suggested that general startle reactivity, measured independent of an affective foreground, may serve as a neurobiological indicator of dispositional defensive reactivity. Vaidyanathan, Malone, Miller, McGue, and Iacono (2014) have recently observed that individual differences in general startle reactivity are highly heritable, which positions it as a potentially attractive endophenotypic marker of defensive reactivity. Furthermore, we have observed that general startle reactivity measured at baseline can identify individuals who will subsequently display exaggerated responding to affective stimuli or more potent effects of drugs and/or drug deprivation (Bradford, Kaye, & Curtin, 2014;Bradford et al., 2013;Gloria, Hefner, Baker, & Curtin, 2015;Hogle, Kaye, & Curtin, 2010). Consistent with these observations, increased general startle reactivity was associated with greater fear/anxiety response to the shock threat when measured either via self-report or startle potentiation quantified by the raw potentiation approach in this experiment. Given these observations, future research in psychopathology and affective science should more routinely measure general startle reactivity at baseline or otherwise and formally model its effects in subsequent analyses. By anchoring general startle reactivity in a more elaborate nomological network that includes other constructs that we either measure or manipulate, we can refine and clarify this potentially important neurobiological index of fear circuitry consistent with the emerging NIMH RDoC perspective (NIMH-Negative Valence Systems: Workshop Proceedings, 2011). 
Of course, modeling the effects of general startle reactivity in our analyses, when significant, will further increase our statistical power to test the effects of other focal IVs as well (Bradford, Kaye, & Curtin, 2014).