In the following section, we develop a modeling approach that allows us to quantify the sources of information that enter into individual naming decisions, and how these information sources have changed over time. Our primary goal in these analyses was to predict the entire popularity distribution of names for each year of the SSA data set given assumptions about how the social environment leading up to that year influenced the decisions of individual parents. By making a number of simple assumptions about the decision strategies used by individual parents, and the sources of information that enter into these decisions, we are able to infer changes in the “parameters” of naming behavior over the last century and to model the effect that these changes have had on the overall distribution of names. In addition, our modeling analyses allowed us to test the hypothesis developed in the last section that present-day parents are increasingly influenced by the “momentum” of a name in the recent past.

#### 4.1. Using the model to predict the distribution of names

Recall that in the random-drift model, naming choices are made according to the relative frequency of a name in the previous generation (Eq. 1). Thus, we assume that the probability that an individual parent would choose name *i* at time *t* should be a function of the relative prevalence of name *i* among previously named babies and the degree of novelty-seeking/innovation in the culture (i.e., mutation rate in the random-drift model). Building on this basic framework, we included a number of additional assumptions that we felt enriched the ecological validity of the model. For example, in the Hahn et al. model, decision makers sample a name with replacement from the immediately preceding generation (owing to its foundation as a model of *genetic* drift processes). While this assumption is necessary to model genetic transmission, it is somewhat unrealistic in a cultural context. Regardless of how one treats time in the random-drift model (i.e., does one generation equal 1 year or 1 day?), in real life, names can be selected not just from the immediately preceding time period but from any number of previous time periods. Thus, in our extension of the model, we assumed that expectant parents’ decisions reflect the overall probability of encountering an individual with a given name, weighted toward the recent past. Formally, for each year, a recency-weighted estimate of the popularity of each name was computed via the following temporal difference (TD) equation (Sutton & Barto, 1998):

$$V_i(t) = V_i(t-1) + \alpha_g\left[p_i(t) - V_i(t-1)\right] \tag{2}$$

where $V_i(t)$ is a long-running estimate of the value/frequency of name *i* at time *t*, $p_i(t)$ is the probability of encountering name *i* among those born in year *t* as given by the SSA data set, and *α*_{g} is a parameter that controls the degree to which the current estimate depends on the most recent naming information, $p_i(t)$. Note that this equation is equivalent to a simple exponential decay over time, the rate of which is controlled by the parameter *α*_{g}, with larger values reflecting a stronger weighting of recent information. Thus, Eq. 2 holds that the estimated value of a name is an exponentially weighted average of the probabilities of encountering that name, such that one is more likely to hear more recent name tokens, and less likely to come across infrequent or particularly old names.
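The equivalence between the recursive update and an exponentially weighted average can be checked numerically. The sketch below is illustrative only: the function names, the toy proportions, and the value of `alpha` are assumptions and do not come from the SSA data or from the fitted model.

```python
def td_update(estimate, observed, alpha):
    """One recency-weighted step: move the estimate toward the new observation."""
    return estimate + alpha * (observed - estimate)


def exponential_average(history, alpha):
    """Closed form of repeated td_update calls that start from an estimate of zero."""
    n = len(history)
    return sum(alpha * (1 - alpha) ** (n - 1 - k) * p for k, p in enumerate(history))


# Hypothetical yearly proportions for one name, oldest first (not SSA data).
history = [0.010, 0.012, 0.015, 0.020]
alpha = 0.3  # hypothetical memory parameter

estimate = 0.0
for observed in history:
    estimate = td_update(estimate, observed, alpha)

closed_form = exponential_average(history, alpha)
```

In this sketch, the most recent observation receives weight `alpha`, the one before it `alpha * (1 - alpha)`, and so on, which is why larger values of `alpha` shorten the effective memory window.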

When attempting to predict the naming distribution for year *t*, Eq. 2 was iteratively applied for all years up to *t*−1 using a single value of *α*_{g}. For example, when trying to predict the relative name frequencies for 1950, the long-running estimate for each name was initialized to zero, then Eq. 2 was used to update these estimates for each successive year starting at 1880 and ending at 1949. This process was repeated for each year; thus, predicting 1951 means starting over at 1880 with a new setting of *α*_{g} and stopping at 1950. Using this procedure, we found the setting of *α*_{g} that best predicted the entire name distribution for each overlapping 5-year interval of the SSA data set (year-to-year fits yield a similar result but are more influenced by idiosyncratic noise from year to year). We interpret changes in the best-fit value of *α*_{g} from one period to the next as measuring generational differences in the types of information that parents rely on in making their naming decisions.
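The restart-from-1880 procedure can be sketched as a single function that replays the update over every year before the target year. The function name, the dictionary layout, and the toy proportions are hypothetical; treating a name that is absent in a given year as having proportion zero is also an assumption of this sketch.

```python
def estimates_before(proportions_by_year, target_year, alpha, first_year=1880):
    """Replay the recency-weighted update from first_year through target_year - 1.

    proportions_by_year maps year -> {name: proportion of babies given that name}.
    All estimates start at zero, as described in the text.
    """
    estimates = {}
    for year in range(first_year, target_year):
        observed = proportions_by_year[year]
        # Assumption: a name missing from a year's list has proportion 0 that year,
        # so its estimate decays rather than staying frozen.
        for name in set(estimates) | set(observed):
            old = estimates.get(name, 0.0)
            estimates[name] = old + alpha * (observed.get(name, 0.0) - old)
    return estimates


# Toy data (hypothetical, not SSA values).
toy = {
    1880: {"Mary": 0.6, "Anna": 0.4},
    1881: {"Mary": 0.5, "Anna": 0.5},
    1882: {"Mary": 0.5, "Anna": 0.3, "Emma": 0.2},
}
before_1883 = estimates_before(toy, 1883, alpha=0.5)
```

Predicting a different year means calling the function again from scratch, possibly with a different `alpha`, rather than continuing from the previous result.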

The resulting estimates of the “value” or prevalence of each token are assumed to then bias individual decisions. In particular, the final values of the long-running estimates, $V_i(t-1)$, were converted into a predicted choice probability according to:

$$\hat{p}_i(t) = \frac{V_i(t-1)}{\sum_{j=1}^{N} V_j(t-1)} \tag{3}$$

where $\hat{p}_i(t)$ reflects the *predicted* probability of a parent choosing name *i* in year *t* (as opposed to $p_i(t)$, which is the empirical probability) and *N* is the total number of non-zero name tokens. In addition, we assumed that there is some probability that an individual invents a novel name. This tendency was captured by a single parameter, *μ*_{g}, and was implemented by subtracting a small probability, *μ*_{g}/*N*, from each name.^{5} Thus, each existing (predictable) name loses some of its market share in order to accommodate innovation or novelty seeking (cf. Xu et al., 2008). Like the “cultural memory” parameter, *μ*_{g} was assumed to be shared among all members of a generation but could vary from one generation to the next.
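A minimal sketch of this conversion step follows. It assumes the innovation parameter is spread evenly over the names with non-zero estimates; flooring a name at zero when its share is smaller than that deduction is an assumption of this sketch, not something specified in the text. The function name and toy values are hypothetical.

```python
def choice_probabilities(estimates, mu):
    """Normalize positive estimates into probabilities, then deduct mu / N from each.

    estimates maps name -> long-running estimate; mu is the innovation rate.
    The probability mass left over (about mu) is reserved for novel names.
    """
    positive = {name: value for name, value in estimates.items() if value > 0}
    if not positive:
        return {}
    total = sum(positive.values())
    penalty = mu / len(positive)
    # Assumption: a name whose share is below the penalty is floored at zero.
    return {name: max(value / total - penalty, 0.0) for name, value in positive.items()}


# Toy estimates (hypothetical); "Gone" has a zero estimate and is not predicted.
toy_estimates = {"Mary": 0.45, "Anna": 0.325, "Emma": 0.1, "Gone": 0.0}
predicted = choice_probabilities(toy_estimates, mu=0.03)
```

With these toy values no name is floored, so the predicted probabilities sum to `1 - mu`.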

Our model instantiates the basic principles of the random-drift model in a fairly direct way (names are chosen in proportion to their popularity in the recent past, while a small percentage of individuals choose novel names). In addition, this analysis allowed us to assess the predictive utility of the model’s central principles (relative to its already established ability to generate power-law-shaped distributions; Hahn & Bentley, 2003). Most importantly, the fitting of the model allowed estimation of period-to-period changes in the “memory” (*α*_{g}) and novelty-seeking (*μ*_{g}) parameters in the population by comparing the best-fit value of these parameters for each period (parameters were estimated by maximizing the log-likelihood of the observed name distribution; the full details of the fitting procedure are described in the Appendix). In order to verify that the model provides an adequate account of the data (over and above some less interesting alternatives), we compared the fit quality for the random-drift account against a number of baseline models (see Appendix). These analyses confirmed a superior fit for each time period for this model.
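The actual fitting procedure is given in the Appendix and is not reproduced here. As a rough illustration of what maximizing log-likelihood over a parameter means, the sketch below scores a few candidate parameter values by a multinomial log-likelihood and keeps the best one. The grid search, the pooling of unpredicted names into a single novel-name category, and all names and numbers are assumptions made for illustration.

```python
import math


def log_likelihood(counts, predicted, novel_prob):
    """Multinomial log-likelihood of observed counts under predicted probabilities.

    Assumption: a name with no positive predicted probability is scored with
    novel_prob (which must be > 0), standing in for the novel-name category.
    """
    total = 0.0
    for name, count in counts.items():
        p = predicted.get(name, 0.0)
        total += count * math.log(p if p > 0 else novel_prob)
    return total


def best_fit(candidates, counts, novel_prob):
    """Return the candidate parameter value whose predictions score highest."""
    return max(candidates, key=lambda value: log_likelihood(counts, candidates[value], novel_prob))


# Hypothetical predictions for one target year under three memory settings.
candidates = {
    0.1: {"A": 0.576, "B": 0.404},
    0.5: {"A": 0.504, "B": 0.476},
    0.9: {"A": 0.412, "B": 0.568},
}
observed_counts = {"A": 30, "B": 70}  # toy counts, not SSA data
best_alpha = best_fit(candidates, observed_counts, novel_prob=0.02)
```

In practice a continuous optimizer would likely replace the coarse grid, and the same scoring could be summed over the years in a 5-year window.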

Fig. 6 shows the results of these fits. The left panel shows the changes in the best-fit innovation rate parameter (*μ*_{g}) and the right panel shows the best-fit memory parameter (*α*_{g}) for each decade and for both male and female names. Overall, the model recovers our intuition that in the period following 1950, there has been an increase in the probability of parents choosing a name that goes against the current name distribution (i.e., increasing *μ*_{g}, faster for female than for male names). In addition, it appears that the more recent name lists are better fit by considering a smaller window of prior history relative to the early part of the century (i.e., a larger *α*_{g}). For example, the best-fit value of *α*_{g} steadily increases until around 1960. Interestingly, in the last 20 years, the best-fit value of *α*_{g} has trended downward, suggesting that present-day parents may be integrating over a longer window of recent history than were parents in the middle part of the last century (perhaps reflecting the recent ability to search for names through online resources). In addition, early differences between the *α*_{g} parameters for male and female names appear to be dissipating. Importantly, the fact that the *α*_{g} value remains below 1.0 means that one does a better job predicting each year’s list using a recency-weighted estimate of past naming behavior than using last year’s list alone. This challenges the assumption in the standard random-drift model that names are copied from only the immediately preceding time step and confirms the intuition that name choice is best described as a sampling process that aggregates over time. The negative year-to-year correlations for names in the early part of the century are thus described as a consequence of the longer cultural memory over which naming choices were integrated (leading to regression to the mean).
Overall, the basic principles in the random-drift model appear to provide both a predictive (demonstrated here) and generative account (Hahn & Bentley, 2003) of the distribution of names in the culture by positing a process of frequency-dependent sampling and random mutation.

#### 4.2. The MILEY model: Measuring cultural changes in memory, choice momentum, and innovation

In our second set of model-based analyses, we extended the predictive random-drift model above to include a bias toward names that have recently increased in popularity and away from names that have recently fallen. The new model, named MILEY (Momentum Influences Liking Each Year), derives its name from the fastest-growing girl’s name in 2007 (which was not present in 2006, but debuted at no. 278 in 2007). As before, our goal was to fit the entire distribution of names each year using past choice data and to recover parameters that reveal the changes in individual decision strategies over time.

For each year, a recency-weighted estimate of the popularity of each name was computed using Eq. 2. In MILEY, a second equation is used to estimate the more recent popularity of each name:

$$R_i(t) = R_i(t-1) + \gamma_g\left[p_i(t) - R_i(t-1)\right] \tag{4}$$

where *γ*_{g} > *α*_{g}. Thus, both $V_i(t)$, the long-running estimate from Eq. 2, and $R_i(t)$ estimate the popularity of a name in the recent past. However, given the parameter constraint, $R_i(t)$ is an estimate of the more recent popularity of the name, while $V_i(t)$ tracks the longer term popularity. Parents are assumed to compare the recent popularity of a name, $R_i(t)$, with the long-running average, $V_i(t)$, in order to detect the “momentum” associated with the name:

$$M_i(t) = R_i(t) - V_i(t) \tag{5}$$

Names that, in the recent past, have gathered more adherents relative to the long-running average will have a positive momentum score. In contrast, names that very recently have gathered fewer adherents than would be expected given the long-running popularity will have a negative momentum score. Thus, the momentum term indexes the degree of surprise or deviation that an observer would have about the recent popularity of a name relative to its long-term popularity. In other words, names that people detect are outpacing their long-term average popularity are assumed to be positively biased, while names that are underperforming relative to the average are negatively biased. Note that this prediction is also somewhat consistent with Berger and Mens (2009) in that names that grow slowly are expected to have less momentum associated with them because their long- and short-term estimates are always similar (i.e., $M_i(t)$ is closer to zero). In contrast, very fad-like names that in one year strongly outpace their average long-term growth are predicted to continue to rise more quickly. Estimates of long-term popularity provided by Eq. 2 and the estimates of the direction of recent change provided by Eq. 5 were combined to generate a final choice probability:

$$\hat{p}_i(t) = \frac{V_i(t-1) + \beta_g V_i(t-1)\,M_i(t-1)}{\sum_{j=1}^{N}\left[V_j(t-1) + \beta_g V_j(t-1)\,M_j(t-1)\right]} \tag{6}$$

where $\hat{p}_i(t)$ once again reflects the probability of a person choosing name *i* in year *t*. Parameter *β*_{g} controls the influence that momentum has on the current estimate of the value of a name. The momentum term in this equation ($\beta_g V_i M_i$) multiplicatively combines the current estimate of the value of the name and the estimate of the change in time, capturing the finding in Fig. 4 that momentum appears stronger for more common names. Thus, $\beta_g V_i M_i$ is positive if the short-term popularity of a name is higher than average, and it is negative if the short-term popularity of a name is lower than average. The combined sum, $V_i + \beta_g V_i M_i$, was constrained to be positive; thus, relatively unpopular names for which the contribution of momentum would make the estimated prevalence negative were simply predicted to disappear from next year’s list. As before, we assumed that some percentage of the parents choose a novel name according to parameter *μ*_{g} and that this tendency simply reduces the probability of existing names by a small factor. The predictive random-drift model is thus a special case of the more general MILEY account (where *β*_{g} = 0). As in our fits with the simpler random-drift model, the best-fit values for *μ*_{g}, *α*_{g}, *γ*_{g}, and *β*_{g} were found by maximizing the log-likelihood of the actual distribution reported by the SSA in overlapping 5-year windows.
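The full MILEY prediction step can be sketched end to end: two recency-weighted estimates with different rates, a momentum score from their difference, a multiplicative momentum adjustment, and the innovation deduction. All names, the data layout, the toy values, and the flooring of small probabilities at zero are assumptions of this sketch rather than details confirmed by the text.

```python
def miley_probabilities(proportions_by_year, target_year, alpha, gamma, beta, mu,
                        first_year=1880):
    """Predict choice probabilities for target_year from all earlier years.

    alpha: rate of the long-running estimate; gamma: rate of the recent estimate
    (gamma > alpha); beta: weight on momentum; mu: innovation rate.
    """
    long_run, recent = {}, {}
    for year in range(first_year, target_year):
        observed = proportions_by_year[year]
        for name in set(long_run) | set(observed):
            p = observed.get(name, 0.0)  # assumption: absent names count as 0
            v = long_run.get(name, 0.0)
            r = recent.get(name, 0.0)
            long_run[name] = v + alpha * (p - v)  # slow, long-running estimate
            recent[name] = r + gamma * (p - r)    # fast, recent estimate
    combined = {}
    for name, v in long_run.items():
        momentum = recent[name] - v               # recent minus long-running
        value = v + beta * v * momentum           # momentum scales with popularity
        if value > 0:                             # non-positive names drop off the list
            combined[name] = value
    if not combined:
        return {}
    total = sum(combined.values())
    penalty = mu / len(combined)
    # Assumption: shares smaller than the innovation deduction are floored at zero.
    return {name: max(value / total - penalty, 0.0) for name, value in combined.items()}


# Toy history (hypothetical): "A" is falling while "B" is rising.
toy = {
    2000: {"A": 0.8, "B": 0.2},
    2001: {"A": 0.6, "B": 0.4},
    2002: {"A": 0.4, "B": 0.6},
}
without_momentum = miley_probabilities(toy, 2003, alpha=0.1, gamma=0.9, beta=0.0,
                                       mu=0.0, first_year=2000)
with_momentum = miley_probabilities(toy, 2003, alpha=0.1, gamma=0.9, beta=2.0,
                                    mu=0.0, first_year=2000)
```

Setting `beta=0.0` reduces the function to the simpler random-drift prediction, while a positive `beta` shifts probability toward the rising name in this toy example.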

Fig. 7 shows the results of these fits. The top left panel shows the changes in the best-fit innovation rate parameter (*μ*_{g}), the top right panel shows the best-fit memory parameter (*α*_{g}), the bottom left shows the best-fit “recent” memory parameter (*γ*_{g}), and the bottom right shows changes in the weighting of momentum (*β*_{g}) for each year and for both male and female names. For almost every period the inclusion of the momentum term in MILEY provided an improved fit when compared with the random-drift model described in the previous section (see the Appendix for more details on the model comparison). Consistent with our previous fits, MILEY captures the fact that in the period following 1950, there has been a steady increase in the probability of parents choosing a name that goes against the current name distribution (increasing *μ*_{g}).

Most importantly, MILEY captures changes in the way that recent changes in name prevalence influence choice. This is particularly clear in the panel showing the best-fit value of *β*_{g}, which shows a gradual increase in the weight given to the momentum term in Eq. 6 over the entire data set (generally the model adjusted *γ*_{g} so that the more recent name popularity was influenced only by the previous year). Overall, the recovered period-to-period changes in the model parameters are broadly consistent with the idea that recent naming decisions more heavily weight both recent name frequency information *and* recent changes in popularity. In this sense, the model provides additional insights into the data patterns reported above. Our simple model assumes that agents are influenced by the distribution of names in the past and are biased toward names whose recent popularity outstrips their long-running popularity. The shift from anti-correlated year-to-year changes to positively correlated changes thus reflects the combined forces of parents basing decisions on both recent popularity and recent deviations from the norm (i.e., “momentum”). Also notable is that incorporating momentum in the MILEY model significantly improved the fit over the extended random-drift account for each year in our data set (see Appendix).