STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model

Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.

The process of social interaction is complex, and people may have an impression and spread information regarding an innovation without actually adopting it. Bhagat et al. (2012), for instance, aiming at maximising social influence, emphasised the importance of distinguishing between product adoption and product influence. They showed that there may exist so-called tattlers who contribute to product spread but do not adopt the product themselves. These tattlers serve as information bridges in the propagation of influence and may significantly affect product adoption (Bhagat et al., 2012). According to Wang and Street (2018), distinguishing tattlers from adopters is an important step towards more realistic modelling. Consider the following situation as an example: Alma has bought a new smartphone and told her friend Bella that she really likes it. If Bella in turn chats with her friend Christina, Bella will propagate Alma's excitement regarding her new purchase. In this example, Christina does not need to be a close contact of Alma's, and Bella does not even need to have an opinion about the product. She passes on the influence from Alma to Christina indirectly. Similarly to Bella, tattlers usually are messengers passing word-of-mouth influence onto their contacts, thereby implicitly spreading influence (Wang & Street, 2018). Tattlers play an important role and they should be explicitly modelled as part of the internal sources of influence that underpin the adoption spread.
We propose an extension of the inhomogeneous Poisson process model introduced by Parviero et al. (2022) to specifically account for tattling behaviour. We name this extension the STAR (Susceptible-Tattler-Adopter-Removed) model. The implementation of the model still offers a computationally efficient formulation of the log-likelihood and may thus be used with (exceedingly) large social graphs. The extension builds on assuming that each individual connected to an infectious neighbour (but not being infectious themselves) may also spread influence to their susceptible neighbours, a behaviour representing tattling. When an individual turns infectious, they may thus exert influence both directly and indirectly, through their neighbouring tattlers. In this new formulation of the model, the internal sources of influence are modelled both with interactions between adopters and susceptible individuals, that is, viral interactions, and interactions between tattlers and susceptible individuals, that is, tattling interactions. The construction was used, for instance, by Zhou et al. (2015), who allowed individuals to influence others in an intermediate tattle state, in which they were interested in the product but not enough to buy it. In the tattling state, the individuals may comment or communicate with their peers based on product information received from an adopter. The tattling individual may promote (or demote) the product with some probability $p_{\text{tattle}}$, and if the customer adopts the product, he or she will promote the product with some adoption probability, as described in Parviero et al. (2022).
In practical settings it is, however, common that not all social interactions in the customer population are observed. When the modelling of these social interactions is based on a social graph, unobserved edges may deteriorate estimation and prediction performances. This was shown for the model without the tattling component in Parviero et al. (2023). As neighbourhoods in social graphs are often tightly-knit (Ugander et al., 2011), we investigate whether introducing tattling may improve estimation and prediction when edge information from the social graph is missing.
We also study this setup when the data-generating mechanism does not contain any tattling, which implies a model misspecification. The question is whether an over-parameterised model can be beneficial when edges in the social graph are missing.
The paper is organised as follows: Section 2 introduces the tattling extension of the model proposed by Parviero et al. (2022), together with an explanation of how inference and predictions will be carried out in the rest of the paper. Section 3 explores the impact of missing graph information on inference, when the proposed model is correctly specified. Section 4 presents a broad simulation study in two different settings, first exploring inference and prediction performances of the proposed model when it is correctly specified and second when the true data-generating process follows the model of Parviero et al. (2022). Section 5 gives concluding remarks and discusses further research directions.

2 | SUSCEPTIBLE-TATTLER-ADOPTER-REMOVED (STAR) MODEL
Following Parviero et al. (2022), we consider a finite population of $n$ individuals and an adoption process of a given product or service. In this framework, the population is modelled as a graph, where the vertices represent individuals and the edges represent social interactions between them. The fully observed population graph is denoted by $G = \{V, E\}$ and captures all individuals in the vertex set $V$ and all their relevant social interactions in the edge set $E$. Furthermore, the fully observed population graph is assumed to be fixed in time.
We assume that the individuals can be influenced to adopt by both internal factors, which depend on the topology of $G$, and by external factors, affecting all individuals simultaneously. The adoption spreading can be described by an epidemic metaphor in which adopting individuals may be viewed as 'infectious' by influencing non-adopter individuals, who may be viewed as 'susceptible'. The adoption process evolves by each vertex being assigned one of four states: Susceptible, Tattling, Adopting or Removed (STAR). The original methodology introduced by Parviero et al. (2022), in comparison, constitutes the corresponding Susceptible-Adopting-Removed (SAR) model. At $t = 0$, all vertices of $G$ are susceptible. When the adoption spreads, susceptible vertices become adopting and thus also 'infectious'. We assume that a vertex is in the infectious state for a predetermined period of time, during which it may affect all susceptible neighbours, that is, all susceptible vertices it shares an edge with. We refer to this internal source of influence as the viral influence. The susceptible vertices that have an infectious neighbour are in turn also considered infectious for as long as they are exposed to a viral source of influence. This state of infectiousness represents the tattling, and the tattling vertices will revert to the susceptible state if they do not adopt while being exposed. After the time period of the infectious state, the adopting vertices go into the removed state and no longer participate in the infection mechanism.
The adoption process is modelled as a point process with adoption events registered on the vertices. Given a time point $t$, the individuals that have not adopted, that is, the ones in the Susceptible and Tattling states, have an associated adoption rate of

$$\lambda_i(t) = \lambda_{i,\text{ext}}(t) + \lambda_{\text{vir},i}(t) + \lambda_{\text{tat},i}(t). \quad (1)$$

The rate depends on two sources of influence: the external and internal. The external sources of influence affect each susceptible individual, regardless of their position in the population graph, while the internal sources take into account the neighbourhood of the individual through the viral and tattling components.
The first component of the internal influence, $\lambda_{\text{vir},i}(t)$, registers all viral sources of influence directed at individual $i$ at time $t$. Individual $i$ is subject to a viral source of influence if they have some infectious neighbours. We assume an individual to be infectious for a predetermined amount of time, $\Delta$, after adoption. The viral influence source is then modelled by the term

$$\lambda_{\text{vir},i}(t) = \sum_{j \in N'_i(t)} \lambda_{ji}(t), \quad (2)$$

where the set $N'_i(t)$ collects all infectious neighbours of $i$ at time $t$. The second component of the internal influence, $\lambda_{\text{tat},i}(t)$, registers all tattling sources of influence directed to individual $i$ at time $t$. Tattling sources of influence are directed towards $i$ from tattling individuals. Tattling individuals are susceptible individuals that have an infectious neighbour. The tattling source of influence is modelled by

$$\lambda_{\text{tat},i}(t) = \sum_{h \in N''_i(t)} \lambda_{hi}(t), \quad (3)$$

where the set $N''_i(t)$ collects all tattling neighbours of $i$ at time $t$. Note that tattler vertex $h$ acts as a bridge, channelling the influence of all their infectious neighbours in $N'_h(t)$ towards susceptible individual $i$. Also note that infectious vertex $j$ may belong to both $N'_i(t)$ and $N'_h(t)$, for some $t$. This situation arises when vertices $j$, $i$ and $h$ share edges with each other, for example, forming a triangle, or more generally, when they belong to the same clique.
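As a minimal sketch of how the rate of a single non-adopted vertex could be evaluated from the sets $N'_i(t)$ and $N''_i(t)$, the Python snippet below assumes intercept-only rates, an infectious period of the form $[t_j, t_j + \Delta)$, and that each tattling neighbour contributes once, however many infectious neighbours it has. The function names and the dictionary-based graph are illustrative assumptions, not the authors' implementation.

```python
import math

def infectious_neighbours(adj, adoption_time, i, t, delta):
    """Neighbours of i that are infectious at time t (the set N'_i(t))."""
    return {
        j for j in adj[i]
        if j in adoption_time and adoption_time[j] <= t < adoption_time[j] + delta
    }

def tattling_neighbours(adj, adoption_time, i, t, delta):
    """Neighbours of i that have not adopted by t but have an infectious
    neighbour themselves (the set N''_i(t))."""
    return {
        h for h in adj[i]
        if not (h in adoption_time and adoption_time[h] <= t)
        and infectious_neighbours(adj, adoption_time, h, t, delta)
    }

def adoption_rate(adj, adoption_time, i, t, delta, beta0, alpha0, phi0):
    """External + viral + tattling rate of a vertex i that has not adopted by t."""
    n_vir = len(infectious_neighbours(adj, adoption_time, i, t, delta))
    n_tat = len(tattling_neighbours(adj, adoption_time, i, t, delta))
    return math.exp(beta0) + n_vir * math.exp(alpha0) + n_tat * math.exp(phi0)

# Path graph A - B - C: A adopts at time 0 and is infectious for delta = 2.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
adoption_time = {"A": 0.0}
rate_B = adoption_rate(adj, adoption_time, "B", 1.0, 2.0, 0.0, 0.0, 0.0)
rate_C = adoption_rate(adj, adoption_time, "C", 1.0, 2.0, 0.0, 0.0, 0.0)
```

In this toy graph, B is reached virally by A, while C is reached only through the tattler B; once A's infectious period ends, both fall back to the external rate alone.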
Each infectious individual modifies the adoption rates of their neighbours through the internal sources of influence. Viral influence is directly exerted from infectious individuals to their neighbours while tattling influence is propagated by the immediate neighbours of an infectious individual towards their other neighbours. Infectious individuals become removed after $\Delta$ units of time in the infectious state and no longer participate in the adoption spread. Adoption rates are then inherently time-dynamic, as each vertex that is not removed can go out of its state at any given time, and thus alter the rates of its neighbouring vertices. The adoption process then becomes a self-exciting point process (Hawkes, 1971; Reinhart, 2018).
Data regarding the evolution of the adoption process is collected during a predetermined time window $[0, T]$. We assume that the status and the adoption time of all individuals are known. Let $K$ define the set of individuals that have been infected (i.e., the adopters) in $[0, T]$, then the data describing the adoption process are given by the pairs $\{v_k, t_k\}$, with $v_k \in V$ and $t_k \in [0, T]$, $\forall k \in K$.
The framework is related to the SEIR compartmental model in epidemiology (Anderson & May, 1991; Bjørnstad et al., 2020; Brauer, 2008), in which the exposed state corresponds to the tattling state. A key difference between the SEIR model and our framework lies in the probability of transitioning from the exposed state to the infectious state. In epidemiological SEIR modelling, this probability usually equals 1, while in our framework, the probability is always less than 1 as tattling individuals may revert to the non-tattling, susceptible state. Such parallels between point process models and epidemiological compartmental models have been explored by Rizoiu et al. (2018) and Kresin et al. (2022).
We parameterise the baseline levels of the different sources of influence as $\lambda_{i,\text{ext}}(t) = \exp\{\beta_0\}$, $\lambda_{ji}(t) = \exp\{\alpha_0\}$ and $\lambda_{hi}(t) = \exp\{\phi_0\}$. We obtain the maximum likelihood estimate $\hat\theta$ of $\theta = [\beta_0, \alpha_0, \phi_0]^\top$ by numerical optimisation of the log-likelihood function. Given the adoption process data in $[0, T]$, the log-likelihood function is given by

$$\ell(\theta) = \sum_{k \in K} \log \lambda_{v_k}(t_k) - \sum_{i \in V} \left[ \int_{T_i} \lambda_{i,\text{ext}}(t)\,dt + \sum_{j} \int_{T_{ji}} \lambda_{ji}(t)\,dt + \sum_{h} \int_{T_{hi}} \lambda_{hi}(t)\,dt \right], \quad (4)$$

where $T_i$ is the exposure window of vertex $i$ towards external sources of influence, $T_{ji}$ is the exposure window of vertex $i$ towards infectious neighbour $j$ and $T_{hi}$ is the exposure window of vertex $i$ towards tattling neighbour $h$; the inner sums run over the neighbours of $i$ with a non-empty exposure window. We define $T_i = [0, t_i]$ for vertices belonging to $K$, otherwise $T_i = [0, T]$. Furthermore, given a pair formed by susceptible vertex $i$ and its infectious neighbour $j$, $T_{ji}$ is defined as the part of the infectious period $[t_j, t_j + \Delta]$ of $j$ that falls within $T_i$.
Given our assumptions, the tattling exposure window $T_{hi}$ may be the union of disjoint sets on the positive real line. This situation occurs when a given tattler reverts to the susceptible state after tattling and then resumes tattling at a later time, during the observation period. We will refer to this phenomenon as flickering. For a given tattler $h$, let the set $\{t_{(1)}, \dots, t_{(m)}\}$ collect the ordered adoption times of their $m$ neighbours that became adopters for some $t$ in $[0, T]$. This information permits us to understand when the tattler $h$ started tattling and if they flickered during the observation period. Hence, $t_{(1)}$ is the first time the tattler $h$ was activated by an infectious neighbour. If all consecutive adoptions of the neighbours of $h$ happened less than $\Delta$ units of time apart, the tattler $h$ will keep tattling until $t_{(m)} + \Delta$. Otherwise, $h$ must have reverted to the susceptible non-tattling state, for some $t$ in between $t_{(1)}$ and $t_{(m)}$.
The set $\{t_{(1)}, \dots, t_{(m)}\}$ is the order statistic of the adoption times $t_j$ of the neighbours $j$ of $h$ that adopted in $[0, T]$. The tattling periods of $h$ form a union of disjoint intervals, where the number of terms in the union, $C$, equals the number of times the tattler $h$ flickered. The tattling exposure window $T_{hi}$ of vertex $i$ from tattler $h$, with $i \notin K$, is then given by this union. Figure 1 visualises the tattling time windows of two separate tattlers, both connected with, and thus directing their tattling influence to, vertices 1 and 2. The tattler on top exerts their tattling influence in just one instance (bold segment), and thus, there is no need to assess flickering.
Instead, the tattler on the bottom exerts their tattling influence in potentially three separate instances. We see that in this case, we would divide their $\{t_{(1)}, \dots, t_{(m)}\}$ set into three different disjoint subsets, as the three bold segments indicate. Since vertex 1 adopts during the second time segment, $C_1 = 2$, whereas for vertex 2, we have that $C_2 = 3$. Defining the exposure windows in this way permits the log-likelihood in (4) to be computed in a fast and scalable fashion, using sparse matrix representations of the external and internal sources of influence, as in Parviero et al. (2022). A detailed explanation of the computational complexity is found in Supplementary Material A in the supporting information.
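The flickering logic described above can be sketched as an interval-merging routine. The snippet below is a hedged illustration rather than the paper's implementation: it assumes that every neighbour adoption at time $t$ activates the tattler on $[t, t + \Delta]$, that the windows are cut at a hypothetical `end` argument (the adoption time of the exposed vertex, or $T$ if it never adopts), and it ignores the possibility that the tattler itself adopts.

```python
def tattling_windows(neighbour_adoption_times, delta, end):
    """Return the disjoint intervals during which a tattler is active.

    Overlapping or touching activation intervals are merged; a gap between
    two intervals corresponds to one episode of flickering.
    """
    windows = []
    for t in sorted(neighbour_adoption_times):
        if t >= end:
            break
        stop = min(t + delta, end)
        if windows and t <= windows[-1][1]:
            # The tattler is still active: extend the current segment.
            windows[-1][1] = max(windows[-1][1], stop)
        else:
            # The tattler had reverted to susceptible: start a new segment.
            windows.append([t, stop])
    return [tuple(w) for w in windows]

def exposure_time(windows):
    """Total length of a union of disjoint windows."""
    return sum(stop - start for start, stop in windows)

# Neighbours adopt at times 1, 2 and 6 with delta = 2: two separate segments.
segments = tattling_windows([6.0, 1.0, 2.0], delta=2.0, end=10.0)
total = exposure_time(segments)
```

Because the rates are constant in the intercept-only model, only the total length of these segments enters the integral over $T_{hi}$, which is one reason such windows can be precomputed once per tattler.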

3 | IMPACT OF MISSING GRAPH INFORMATION
FIGURE 1 Visualisation of tattling exposure windows stemming from two tattlers in $[0, T]$. Notice the different behaviour in terms of flickering.

When the model likelihood of a point process is correctly specified, the maximum likelihood estimates will be unbiased (Aalen et al., 2008; Ogata, 1998). In our setting, this implies that the shares of the three sources of influence will be estimated correctly. This requires that the social graph was fully observed when used in the log-likelihood. In practice, however, it is common to not observe all social interactions in the population graph $G$, and we introduce the term proxy graph for the partially observed version $G^*$ of $G$. In the case of missing edges, each proxy graph retains the same vertex set as $G$ but has $E^*_p \subseteq E$. The proxy graph is thus defined by $G^*_p = \{V, E^*_p\}$. For the rest of the paper, we assume that, for a given proxy graph, $E^*_p$ is obtained by removing edges uniformly at random from $E$ with probability $p$. This probability may be interpreted as the edge missingness probability. For simplicity, the subscript $p$ will be omitted.
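The construction of a proxy graph can be sketched in a few lines. The snippet assumes an edge-list representation and independent removal of each edge with probability $p$; the function name and the `seed` argument are illustrative choices, not part of the paper.

```python
import random

def make_proxy_edges(edges, p, seed=None):
    """Keep each edge independently with probability 1 - p.

    The vertex set is untouched, so the result is always a subset of `edges`.
    """
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() >= p]

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
proxy = make_proxy_edges(edges, p=0.5, seed=1)
```

Setting `p=0.0` returns the fully observed edge set, and `p=1.0` removes every edge, which matches the interpretation of $p$ as the edge missingness probability.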
The direct consequence of using the limited information from $G^*$ instead of the full $G$ in the estimation of model parameters is complicated.
Consider, for simplicity, the model in Equation (1) with only intercepts for each of the sources of influence. Then the estimate of the individual intensity rate defined by the complete graph $G$ is given by

$$\hat\lambda_i(t) = \exp\{\hat\beta_0\} + \sum_{j \in N'_i(t)} \exp\{\hat\alpha_0\} + \sum_{h \in N''_i(t)} \exp\{\hat\phi_0\}. \quad (5)$$

The same individual intensity rate estimated using a proxy graph $G^*$ is given by

$$\hat\lambda^*_i(t) = \exp\{\hat\beta^*_0\} + \sum_{j \in N^{*\prime}_i(t)} \exp\{\hat\alpha^*_0\} + \sum_{h \in N^{*\prime\prime}_i(t)} \exp\{\hat\phi^*_0\},$$

where the sets $N^{*\prime}_i(t)$ and $N^{*\prime\prime}_i(t)$ are defined as the sets $N'_i(t)$ and $N''_i(t)$ using the topology of $G^*$, instead of $G$. Assume that it is possible to observe the same adoption process on both $G$ and $G^*$, yielding the maximum likelihood estimates $\hat\theta$ and $\hat\theta^*$. Following from results shown by Therneau et al. (1990), we have that despite stemming from two different specifications of the individual rates, the estimated cumulative intensity of both models must equal the number of observed events in $[0, T]$, $N(T)$:

$$\hat\Lambda_T = \hat\Lambda^*_T = N(T). \quad (6)$$

This equality permits a comparison of the estimated parameters from the models fitted on the same adoption process, but using two different graphs. In detail, the estimated cumulative intensity, assuming the graph $G$, is given by

$$\hat\Lambda_T = \sum_{i \in V} \int_{T_i} \hat\lambda_i(t)\,dt,$$

whereas the estimated cumulative intensity assuming the graph $G^*$ is given by

$$\hat\Lambda^*_T = \sum_{i \in V} \int_{T_i} \hat\lambda^*_i(t)\,dt.$$

The relation in Equation (6) highlights the necessity of comparing how the component shares of the cumulative intensity change when estimating using $G^*$ instead of $G$. The share of each component is obtainable by taking the ratio between the estimated component and the number of events, $N(T)$.
Both $\hat\Lambda_T$ and $\hat\Lambda^*_T$ are computed using information coming from the same individuals. However, on the individual level, the contribution of individual $i$ to $\hat\Lambda^*_T$ is

$$\exp\{\hat\beta^*_0\}\,|T_i| + \exp\{\hat\alpha^*_0\} \sum_{j} |T^*_{ji}| + \exp\{\hat\phi^*_0\} \sum_{h} |T^*_{hi}|,$$

in which every source of exposure is weighted with their exposure time. Here $|\cdot|$ denotes the length of an exposure window, and $T^*_{ji}$ and $T^*_{hi}$ are the exposure windows defined on $G^*$. Individual $i$'s exposure to external sources will be correctly specified, whereas the individual's exposure to internal sources (i.e., the viral and tattling influence) is specifically determined by the topology of $G^*$, through the sets $N^{*\prime}_i$ and $N^{*\prime\prime}_i$. One may observe that $\sum_j |T^*_{ji}| \le \sum_j |T_{ji}|$ and $\sum_h |T^*_{hi}| \le \sum_h |T_{hi}|$ by definition, reaching equality only if none of the edges connecting individual $i$ to the individuals in $N'_i$ or $N''_i$ are missing on $G^*$. Observing this event for all individuals is implausible. Hence, when individual rates are specified using information from $G^*$, the sum of the exposure time related to the internal sources, and thus their weight, will be smaller. For this reason, the estimates $\hat\beta^*_0$, $\hat\alpha^*_0$ and $\hat\phi^*_0$ will not be centred in $\hat\beta_0$, $\hat\alpha_0$ and $\hat\phi_0$, to avoid a contradiction with the result stated in Equation (6).
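To make the notion of component shares concrete, the following sketch computes them for an intercept-only model. It assumes that, with constant rates, each component of the estimated cumulative intensity is the exponentiated parameter multiplied by the total exposure time of that source, summed over all vertices; the function and argument names are hypothetical.

```python
import math

def intensity_shares(beta0, alpha0, phi0, ext_time, vir_time, tat_time, n_events):
    """Shares of the cumulative intensity attributed to each source.

    ext_time, vir_time and tat_time are total exposure times summed over all
    vertices; n_events is the number of observed adoptions N(T).
    """
    components = {
        "external": math.exp(beta0) * ext_time,
        "viral": math.exp(alpha0) * vir_time,
        "tattling": math.exp(phi0) * tat_time,
    }
    return {name: value / n_events for name, value in components.items()}

# Toy numbers chosen so that the components add up to the 10 observed events.
shares = intensity_shares(0.0, 0.0, 0.0,
                          ext_time=6.0, vir_time=3.0, tat_time=1.0, n_events=10)
```

If the internal exposure times shrink because edges are missing while the total must still equal $N(T)$, at least one exponentiated parameter has to grow, which is the compensation mechanism discussed in the text.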

3.1 | Decomposition of the cumulative intensity under missing edges
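In this section, we outline the theoretical arguments to explain how the estimates $\hat\beta^*_0$, $\hat\alpha^*_0$ and $\hat\phi^*_0$ must adjust in order to respect the relation in Equation (6) when the proportion of missing edges in $G^*$ increases. The estimated cumulative intensity for the model in which individual intensity rates are specified as in Equation (5) collects the external contributions of all vertices in $V$, the viral contributions of the vertices in $V'$ and the tattling contributions of the vertices in $V''$, where $V'$ and $V''$ denote the sets of vertices exposed to viral and tattling influence on $G$. Specifying the individual rates using the graph topology of $G^*$ yields the same expression with the sets $V^{*\prime}$ and $V^{*\prime\prime}$ in place of $V'$ and $V''$. The different exposure of vertices to viral and tattling factors is highlighted by using the sets $V^{*\prime}$, $V^{*\prime\prime}$, where by definition $V^{*\prime} \subseteq V'$ and $V^{*\prime\prime} \subseteq V''$.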
For any $G^*$, the true viral exposure on $G$ can be split in three parts, according to whether a vertex in $V'$ belongs to $V^{*\prime}$, $W^{*\prime}$ or $U^{*\prime}$. Here, the set $V^{*\prime}$ collects all vertices that are exposed to viral influence on $G^*$, the set $W^{*\prime}$ collects all vertices that are not exposed to viral influence on $G^*$ but retain some exposure to tattling sources and the set $U^{*\prime}$ collects all vertices that lose all exposure to both viral and tattling sources on $G^*$. In detail, the set $W^{*\prime}$ is defined as $W^{*\prime} = \{i : i \in V' \wedge i \notin V^{*\prime} \wedge i \in V^{*\prime\prime}\}$. This set contains all vertices that are in the same position as vertex C in the right panel of Figure 2. Lastly, $U^{*\prime}$ is defined as $U^{*\prime} = \{i : i \in V' \wedge i \notin V^{*\prime} \wedge i \notin V^{*\prime\prime}\}$. We observe that, by definition, $V' = V^{*\prime} \cup W^{*\prime} \cup U^{*\prime}$. For the tattling influence, a parallel argument is made for $V''$. Thus, we have that $V'' = V^{*\prime\prime} \cup W^{*\prime\prime} \cup U^{*\prime\prime}$, with $W^{*\prime\prime}$ defined as $W^{*\prime\prime} = \{i : i \in V'' \wedge i \notin V^{*\prime\prime} \wedge i \in V^{*\prime}\}$ and $U^{*\prime\prime}$ defined as $U^{*\prime\prime} = \{i : i \in V'' \wedge i \notin V^{*\prime\prime} \wedge i \notin V^{*\prime}\}$. The tattling exposure is then split into three parts as well.

FIGURE 2 Schematic illustration of the tattling effect under partially observed graphs. The left-hand side illustrates a fully observed graph with nodes A, B and C all connected (solid lines). A has adopted the product and influences nodes B and C (red arrows). The right-hand side illustrates the situation when the edge between A and C is missing (dashed line). Node A can then only influence B directly (red arrow), while B tattles to C (black arrow), thereby further communicating the influence. The tattling may thus partly recapture the lost influence from A to C due to the missing edge.
Hence, for either type of internal source of influence, we defined a set that comprises all vertices that are correctly observed as exposed to that type of source on $G^*$ ($V^{*\prime}$ and $V^{*\prime\prime}$), a set that comprises all vertices that are not observed as exposed to that type of internal source but are still exposed to the other type on $G^*$ ($W^{*\prime}$ and $W^{*\prime\prime}$) and, lastly, a set that comprises all vertices that are observed to not be exposed to either type of internal source ($U^{*\prime}$ and $U^{*\prime\prime}$).
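The three-way split can be illustrated with plain set operations. The sketch below assumes the vertex sets are available as Python sets and uses the Figure 2 configuration as a toy input; the function name and the labels `kept`, `moved` and `lost` are illustrative and correspond to the $V^*$, $W^*$ and $U^*$ sets, respectively.

```python
def split_exposed(true_set, observed_same, observed_other):
    """Split the vertices truly exposed to one internal source into
    kept  - still observed as exposed to that source on the proxy graph,
    moved - not observed for that source but observed for the other one,
    lost  - observed as exposed to neither internal source."""
    kept = true_set & observed_same
    moved = (true_set - observed_same) & observed_other
    lost = true_set - observed_same - observed_other
    return kept, moved, lost

# Figure 2: A adopts; on G both B and C are virally exposed. On the proxy
# graph the edge A-C is missing, so C is only reached through the tattler B.
V_vir = {"B", "C"}     # exposed to viral influence on G
V_vir_star = {"B"}     # exposed to viral influence on the proxy graph
V_tat_star = {"C"}     # exposed to tattling influence on the proxy graph
kept, moved, lost = split_exposed(V_vir, V_vir_star, V_tat_star)
```

The three returned sets are disjoint and their union recovers the input set, mirroring the partition of $V'$ stated in the text.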
Viral and tattling exposure of the vertices in $U^{*\prime}$ and $U^{*\prime\prime}$ is unobserved on $G^*$, but it is linked to components of the true intensity. Exploiting the results of Equation (6), Definition 1 introduces a reference value $B_{\hat\beta^*_0}$ for the external parameter $\hat\beta^*_0$.
With this unobserved exposure always a positive quantity, $B_{\hat\beta^*_0}$ is expected to increase as the edge missingness probability $p$ increases.
Regarding the viral sources of influence, we have that all vertices in $W^{*\prime\prime}$ lose their exposure to tattling sources but are still exposed to viral sources of influence on $G^*$. Since tattling sources are a particular type of internal source, we expect that $\hat\alpha^*_0$ compensates for this portion of missing exposure. The analogue value for $\hat\alpha^*_0$, $A_{\hat\alpha^*_0}$, is defined in the following.
Definition 2. $A_{\hat\alpha^*_0}$ is defined as the value of $\hat\alpha^*_0$ at which the estimated viral share of $\hat\Lambda^*_T$ is equal to the component of the true viral share associated with vertices that are still observed to be exposed to viral sources of influence, that is, the vertices in $V^{*\prime}$, plus the component of the true tattling share associated with vertices that are not observed to be exposed to tattling sources on $G^*$ but are still observed to be exposed to viral sources, that is, the vertices in $W^{*\prime\prime}$.
Lastly, the definition of the value for $\hat\phi^*_0$ follows the same logic, expecting $\hat\phi^*_0$ to compensate for the missing viral exposure of the vertices in $W^{*\prime}$. When $\hat\phi^*_0$ is equal to $F_{\hat\phi^*_0}$, the estimated tattling share of $\hat\Lambda^*_T$ is equal to the component of the true tattling share associated with vertices that are still observed to be exposed to tattling sources of influence, that is, the vertices in $V^{*\prime\prime}$, plus the component of the true viral share associated with vertices that are not observed to be exposed to viral sources on $G^*$ but are still observed to be exposed to tattling sources, that is, the vertices in $W^{*\prime}$.
Finally, we explore if the STAR model may be beneficial when $G$ is not fully observed, even when no tattling source is present and the model therefore represents a deliberate model misspecification. Given the topology of social graphs, we expect that allowing for tattling interactions may retrieve viral interactions when they are undetected due to missing edges. Assume that an adoption process is observed on both $G$ and $G^*$, with no tattling in the data-generating mechanism. Figure 2 shows a simplified graph as fully observed, on the left, and as a proxy graph with a missing edge, on the right. Since the adoption process only allows for viral interactions, shown in red, vertices B and C are correctly observed as exposed to internal influence only on $G$, on the left. On the right, vertex C is not reached by the internal influence shown in red. However, if the adoption process includes tattling, shown in blue, vertex C will again become exposed to internal sources on $G^*$ on the right, as vertex B will carry over the influence received from A to C. We expect that the introduction of the tattling component may yield improved estimates of the total internal intensity share under missing edges, even if the viral and tattling shares might be incorrectly estimated individually. This hypothesis will be investigated in Section 4.2.

4 | SIMULATION STUDY
We conduct a simulation study to first assess the stability and identifiability of the maximum likelihood estimates of the STAR model and then explore how estimates behave when obtained on proxy graphs $G^*$ with missing information. The latter is done both with and without tattling in the adoption process. As the fully observed population graph $G$ in the simulation study, we consider an open source data set from Jankowski et al. (2017) on the complete history of a Polish social network website operating from 2007 to 2012. These data enable us to simulate adoption processes on a real social graph, formed by all observable interactions between users of this particular social network website. This way, we can study the behaviour of our methodology on a real-life topology, without introducing assumptions on the graph structure. For computational reasons, we selected the graph formed by active users in the third quarter of 2009. The resulting graph has 97,804 vertices and approximately 2.2 million edges. We verified that the selected subgraph has similar graph properties (i.e., vertex degree distribution and clustering coefficient distribution) as the entire social graph for the period.
We organise the simulation study in two distinct setups:
1. Adoption processes are simulated with tattling and estimated by the STAR model on the fully observed graph $G$ and on different proxy graphs $G^*$.
2. Adoption processes are simulated without tattling and estimated by both the SAR and STAR models on $G$ and on different proxy graphs $G^*$.
For both setups, the generative model for the adoption process contains only intercepts ($\alpha_0$, $\beta_0$, $\phi_0$ in the first case, and solely $\alpha_0$, $\beta_0$ in the second case) and 100 processes are simulated. For each adoption process, one proxy graph is obtained for each value of $p$ in $\{0.2, 0.5, 0.8\}$ by removing edges at random from $G$ with probability $p$.

4.1 | Setup 1: With tattling
First, we assess the stability and identifiability of the maximum likelihood parameter estimates when the population graph $G$ is fully observed. Second, we assess the behaviour of the maximum likelihood parameter estimates under missing edges, that is, when the population graph $G$ is not fully observed and some proxy graph $G^*$ is used in the estimation. Figure 4 shows box plots of the parameter estimates based on different $G^*$'s for the three parameters $\alpha_0$, $\beta_0$ and $\phi_0$ of the STAR model when the proportion of missing edges increases. From this point on, we use one size for the training set, $N(T) = 5000$, for each simulated adoption process. The box plots are shown for 100 simulated adoption processes for no missing edges and the proportions of missing edges of 0.2, 0.5 and 0.8. The left panel shows that the parameter estimates of the external influence do not become significantly biased when the edge missingness probability increases. This stands in contrast to the results of Parviero et al. (2023), showing that the external parameter estimates for the SAR model become substantially biased. On the other hand, both the estimated viral and tattling parameters become biased when the proportion of missing edges increases, which is analogous to the behaviour seen for the SAR model (Parviero et al., 2023).
Figure 5 shows the difference between the estimated and true intensity shares for the external and internal components for an increasing proportion of missing edges. The top left panel shows the difference between the estimated and true external share, computed in Equation (10) as $1/N(T)$ times the difference between the cumulative intensities obtained from the estimated and the true external rates. The top right panel shows the difference between the estimated and true internal share, computed by substituting $\hat\lambda^*_{\text{int},i}(t)$ and $\lambda_{\text{int},i}(t)$ in Equation (10). From the two top panels, it is seen that error bars always contain zero, indicating that estimates of the external and internal components … (Ugander et al., 2011). Hence, almost all individuals who lose exposure to viral sources of influence due to missing edges in $G^*$ will retain their exposure to tattling, and they will thus be observed as exposed to some form of internal source of influence. From the equality relation in Equation (6), the average estimate of the viral parameter $\hat\alpha^*_0$ must then be consistently below the average value of $A_{\hat\alpha^*_0}$. On the parameter scale, since the external source of influence is also overestimated, the overestimation of the tattling sources is not large enough to overcome the underestimation of the viral influence. This agrees with the behaviour seen in Figure 5.
To illustrate further the effect of incorporating a tattling component under missing edges, Figure 7 shows the percentage shares attributed to the external, viral and tattling sources of influence for one specific simulated process. The shares are shown for an increasing edge missingness probability $p$. In Figure 7, the area for each time segment is split into three components of intensity: blue for the external source, red for direct viral sources and green for tattling sources. The figure shows that the estimated external share is dominant in the beginning phase of the adoption process.
The other components of the intensity assume greater importance at later stages when a greater number of adoptions has been observed. The figure further demonstrates that the combined estimates are robust against edge missingness. As $p$ increases, the importance of direct viral sources decreases (red areas), while the importance of tattling sources increases (green areas), thus keeping the balance.
The model's predictive ability under missing edges is assessed by selecting 10 realisations of the adoption process as observed processes and predicting forward in time. The prediction performance is then assessed by computing the root mean squared error at a fixed time point, averaging over the 100 members of the simulation ensemble. Starting from the time point at which $N(T) = 5000$, the fixed endpoint is defined by $\tilde{T}$, where $N(\tilde{T}) = 6000$, for each one of the 10 chosen realisations. Figure 8 shows that the prediction performance is remarkably stable as $p$ increases. This agrees with the hypotheses formulated in Section 3 and suggests that the estimation bias introduced by missing edges is necessary to prevent a deterioration in the prediction performance. As expected, variability grows as $p$ increases and more information in the fully observed graph is missing.
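The error measure used for prediction can be sketched as follows. The snippet assumes that each ensemble member provides a predicted cumulative number of adoptions at the fixed endpoint, and that the observed count at that endpoint is 6000 by construction; the function name and the toy ensemble are hypothetical.

```python
import math

def rmse_at_endpoint(predicted_counts, observed_count):
    """Root mean squared error of predicted adoption counts at a fixed time,
    averaged over the members of a simulation ensemble."""
    squared_errors = [(count - observed_count) ** 2 for count in predicted_counts]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy ensemble of four predicted counts around an observed count of 6000.
ensemble = [5990, 6010, 6010, 5990]
error = rmse_at_endpoint(ensemble, observed_count=6000)
```

In the study design described above, such a value would be computed for each of the 10 observed realisations and each value of $p$, which is what allows the spread of the error to be compared across missingness levels.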
We illustrate the impact of missing edges on prediction by selecting one adoption process as the observed process and observing the deviation between the observed process and the ensemble of simulated processes. Figure 9 shows the observed number of adoptions per time unit (black) and, from the dotted line ($N(T) = 5000$), the mean and 90% interval of the predicted processes (red), for one specific process.
The top panel shows the result for no missing edges, with an increasing proportion of missing edges for the lower panels. Predictions seem to be reliable up until $p = 0.5$, as the observed process is contained within the prediction bands. However, this does not hold when the proportion of missing edges increases beyond 50%. We see clearly that as the proportion of missing edges increases, the overestimation of the external sources affects the predictions and pushes the prediction interval upwards. The corresponding cumulative adoption curve for Figure 9 is found in Figure S1.
FIGURE 7 Percentage shares of adoptions attributed to the external, viral and tattling sources of influence, for an increasing number of missing edges (lower panels), for one specific simulated process. Blue shows the external share, red shows the viral share and green shows the tattling share.
FIGURE 9 For one specific process, the observed number of adoptions per time unit in black and the predicted number of adoptions per time unit (with 90% confidence intervals) in red, shown for an increasing number of missing edges. As the edge missingness increases, the number of predicted adoptions is overestimated.

| Setup 2: Without tattling
Finally, we assess how the maximum likelihood parameter estimates of the STAR model behave when there is no tattling present in the generative model, that is, when the adoption process follows the SAR model of Parviero et al. (2022). The top panels of Figure 10 demonstrate that, for the SAR model, both the external and the viral parameters will be increasingly biased when more edges are missing. The bottom left panel demonstrates that the estimates of the external component are biased when including the tattling component, even though no tattling was present. The bottom right panel shows that, for no missing edges, the estimated tattling parameter approaches $\phi_0 \to -\infty$, corresponding to the true value of the influence rate being $e^{\phi_0} \to 0$. When the proportion of missing edges increases, the tattling estimates also increase, consistent with the results for simulation setup 2.
Figure 11 shows the difference between the true and estimated shares of the different model components for an increasing proportion of missing edges, using both the SAR and STAR models. The top panels show the differences for the external and internal sources of influence obtained by the misspecified STAR model. The external and internal shares are estimated with a significantly lower error by the misspecified STAR model than by the correctly specified SAR model. Both the external and internal sources for the STAR model are correctly estimated even for large values of $p$, with the error bars always containing 0. The bottom panels show the differences obtained by the SAR model for an increasing proportion of missing edges. In this case, both the external and internal sources deviate substantially from their true values as more edges are missing.

Regarding prediction performance, Figure 13 shows the RMSE computed at $T$, such that $N(T) = 6000$, for both the SAR and the STAR model, on 10 different simulated adoption processes. For each value of $p$, the prediction results of the two models are comparable, and hence, the misspecified model produces predictions that are not worse than those of the correctly specified model. Furthermore, the results of the STAR model display a larger increase in variability as $p$ increases, which is to be expected as the model is misspecified. Moreover, these results highlight once again that, when social graph information is incomplete, the compensation observed in the parameter estimates is necessary to avoid a deterioration in prediction performance. However, the STAR model is still able to perform better inference on the training set, as foreshadowed by the behaviour shown in Figure 11.
We again illustrate the impact of missing edges on prediction by selecting one adoption process as the observed process and examining the deviation between the observed process and the ensemble of simulated processes. Figure 14 shows, for one specific process, the observed number of adoptions per time unit (black) and, from the dotted line ($N(T) = 5000$) onwards, the mean and 90% interval of the predicted processes. The prediction interval from the STAR model is shown in red and the prediction interval from the SAR model is shown in blue. The top panel shows the result for no missing edges, demonstrating that the two models are then essentially the same. For the lower panels, as the proportion of missing edges increases, the red and blue prediction intervals are seen to deviate more and more but still yield comparable results. When the proportion of missing edges increases beyond 50%, in this case, the correctly specified SAR model (blue) fits the observed adoptions better. The corresponding cumulative adoption curve for Figure 14 is found in Figure S2.

FIGURE 14 For one specific process, the observed number of adoptions per time unit is shown in black. The predicted number of adoptions per time unit (with 90% intervals) for the STAR model is shown in red, while the predictions for the SAR model are shown in blue. All predictions are shown for an increasing number of missing edges. As the edge missingness increases, the numbers of predicted adoptions based on the STAR and SAR models deviate, and the SAR model in blue is seen to better fit the observed adoptions.

| DISCUSSION
In this paper, we have extended the model framework of Parviero et al. (2022) for adoption processes to include tattling. We have shown that with this enhanced model framework, the STAR model, the external, viral and tattling parameters are identifiable and may be accurately estimated via a maximum likelihood procedure. In addition, the introduction of tattling sources of influence still permits the definition of a computationally efficient version of the likelihood function. One of the main limitations of the current model is the assumption of a fixed exposure window $\Delta$. It would be more realistic to have, for instance, an exponential decay of the exposure over time (with a potential cutoff), as is common in the modelling of innovation diffusion. A more realistic temporal function for the exposure would, however, complicate the fast computations required by the model.

Subsequently, we investigated how both inference and predictions of the STAR model are impacted by missing edges in the social graph.
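To make the contrast between a fixed exposure window and a decaying exposure concrete, the sketch below compares two hypothetical weighting functions of the time elapsed since a neighbour's adoption. The names `fixed_window_exposure` and `decaying_exposure`, the inclusive window boundary and the parameter `rate` are assumptions for illustration only and are not part of the STAR model as specified.

```python
import math


def fixed_window_exposure(elapsed, delta):
    """Weight 1 while the elapsed time is within the window delta, else 0.
    The inclusive boundary at delta is an assumption of this sketch."""
    return 1.0 if 0.0 <= elapsed <= delta else 0.0


def decaying_exposure(elapsed, rate, cutoff=None):
    """Exponentially decaying weight exp(-rate * elapsed), set to 0 before
    the exposure starts and, optionally, after a cutoff."""
    if elapsed < 0.0:
        return 0.0
    if cutoff is not None and elapsed > cutoff:
        return 0.0
    return math.exp(-rate * elapsed)


# Hypothetical comparison at a few elapsed times (arbitrary time units).
for elapsed in (0.0, 1.0, 3.0):
    print(elapsed, fixed_window_exposure(elapsed, 2.0), decaying_exposure(elapsed, 0.5, cutoff=4.0))
```

With a fixed window, each exposure contributes a constant amount over an interval, which is what allows simple bookkeeping; a decaying weight changes continuously in time, which hints at why it would complicate fast likelihood computations.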
When the true adoption process comprises tattling sources of influence, we see that the STAR model is particularly robust to missing edges. In particular, we observe that the overall importance of internal sources of influence is remarkably well estimated even when a large portion of edges is missing, since the overestimation of the tattling parameter naturally compensates for the underestimation of the parameter linked to viral interactions. Prediction results confirm this robustness, as the average prediction performance does not deteriorate when the edge missingness probability $p$ increases. Furthermore, we investigated whether the robustness of the STAR model is retained when it is used to perform inference on an adoption process with no tattling interactions and missing edges in the social graph. Here, we find that allowing for tattling sources of influence improves the estimation of the shares of external and internal sources of influence. In this sense, our simulation study has shown that purposefully misspecifying the model may in fact aid in overcoming the biases arising from missing social interactions in $G^*$. This is similar to, for instance, Hellkvist et al. (2023), who showed that a regression model with fake features (i.e. not belonging to the true data-generating mechanism) might achieve better estimation performance than a correctly specified model. We also show that the prediction performance remains stable as the edge missingness probability $p$ increases. As a concluding remark, we find that the misspecified model yields prediction performance of comparable magnitude to that of the correctly specified model, albeit with higher variability.
For future work, it would be interesting to include more complex models for the influence window, for instance, utilising exponential decay, and explore dynamic edge modelling.Another specific direction would be to study the effect of missing vertices, a relevant issue in many applications.
As a specific example, the graph of one telecom company only includes every individual of the target population if the product is exclusively offered to its customers. A scenario with missing vertices could arise if all adopters are assumed to be observed (except, possibly, their covariates), but non-adopter vertices may be missing at random. For many products, the adopters are observable even if they do not belong to the telecom company's graph. An interesting direction would be to investigate whether this assumption leads to an overestimation of the viral-related intensity.
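The missing-vertex scenario outlined above could be simulated along the following lines. This is only a sketch under stated assumptions: every adopter is retained, each non-adopter is dropped independently with a hypothetical probability `q`, and edges touching a dropped vertex are removed. The function name `drop_non_adopters` and the toy graph are illustrative and do not come from the study.

```python
import random


def drop_non_adopters(vertices, edges, adopters, q, seed=None):
    """Return the retained vertices and edges when every adopter is kept and
    each non-adopter is removed independently with probability q."""
    rng = random.Random(seed)
    kept = {v for v in vertices if v in adopters or rng.random() >= q}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Hypothetical toy graph with 5 vertices, of which vertices 0 and 1 adopted.
toy_vertices = [0, 1, 2, 3, 4]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
toy_adopters = {0, 1}

kept_all, edges_all = drop_non_adopters(toy_vertices, toy_edges, toy_adopters, q=0.0)
kept_min, edges_min = drop_non_adopters(toy_vertices, toy_edges, toy_adopters, q=1.0)
kept_half, edges_half = drop_non_adopters(toy_vertices, toy_edges, toy_adopters, q=0.5, seed=1)
```

Fitting the model on the reduced graph for increasing `q` would be one way to examine whether the viral-related intensity is overestimated.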
the estimated external share of $\Lambda^*_T$ is equal to the true external share plus the component of the true viral share and the true tattling share associated with vertices that are not observed to be exposed to either one of the internal sources of influence on $G^*$. $B_{\beta_0^*}$ can be interpreted as the value of $\beta_0^*$ accounting for all viral sources of influence directed towards vertices in $U^{*\prime}$ and $U^{*\prime\prime}$. These vertices are observed to be exposed solely to external sources of influence on $G^*$, and thus, $\beta_0^*$ has to compensate at least for them. Being $e^{\alpha_0} \sum_{i \in U^{*\prime}}$

Figure 3
Figure 3 shows box plots of the maximum likelihood estimates based on $G$ for the three parameters $\alpha_0$, $\beta_0$ and $\phi_0$ of the STAR model. The box […] components are unbiased. The bottom panels show the difference between the estimated and true share of the internal component divided into the viral and tattling parts. The bottom left panel shows the difference between the estimated and true viral share, while the bottom right panel shows the difference between the estimated and true tattling share. It is clearly seen that, as the proportion of missing edges increases, the tattling component is overestimated, while the viral component is underestimated, and the overestimation and underestimation in the two components seem to counterbalance each other.

FIGURE Box plots of the parameter estimates of the STAR model across simulations with tattling present for increasing missing edge probability. True values are shown by black horizontal lines.

Figure 6
Figure 6 shows the average parameter estimates (red solid lines) of the three parameters $\alpha_0$, $\beta_0$ and $\phi_0$ across simulations, for increasing edge missingness probability, compared to the average values of $A_{\alpha_0^*}$, $B_{\beta_0^*}$ and $F_{\phi_0^*}$ (black dashed lines) and the true values (black horizontal lines). The […]

Average point estimates (solid red lines) compared with the average values of $A_{\alpha_0^*}$, […]

Boxplots showing the RMSE computed using results of the STAR model on 10 simulated adoption processes. The average prediction performance is remarkably stable as the edge deletion probability $p$ increases. As expected, variability increases as $p$ increases.

Figure 10
Figure 10 shows box plots of the parameter estimates across simulations in the case when there is no tattling present.The top panels show the parameter estimates for the SAR model, and the bottom panels show the parameter estimates for the STAR model, with the true values shown by black horizontal lines.The top panels demonstrate the main conclusion of Parviero et al. (2023) that both the external and the viral

Figure 12
Figure 12 shows the average point estimates of the SAR and the STAR models, compared with the average $B_{\beta_0^*}$ and $A_{\alpha_0^*}$ values of the external and viral parameters (the non-zero parameters of the model), for an increasing proportion of missing edges. When no edges are missing, both the STAR and SAR models produce unbiased estimates of these two parameters. For the correctly specified SAR model (solid blue lines), the average external parameter estimates are overestimated, while the average viral parameter estimates are slightly underestimated, compared to the values of $B_{\beta_0^*}$ and $A_{\alpha_0^*}$ (dashed lines). This reflects the poor estimation performance of the correctly specified model under missing edges, as concluded by Parviero et al. (2023). On the other hand, for the misspecified STAR model (solid red lines), the average external parameter estimates are significantly closer to the true values than those of the SAR model, and they are even below the $B_{\beta_0^*}$ and $A_{\alpha_0^*}$ values. This is due to the tattling component retrieving the direct interactions that go undetected when edges are missing.
Boxplots showing the RMSE computed using results of both the SAR (on the left panel, in blue) and STAR model (on the right panel, in red) on 10 simulated adoption processes devoid of tattling. At $p = 0$, results from three runs of the STAR model are omitted from the visualisation due to issues encountered in the estimation phase. When $p > 0$, it can be seen that the misspecified model produced comparable results to the correctly specified one. When $p = 0.8$, the STAR model results show a greater increase in variability.
Average parameter estimates for the SAR model (solid blue lines) and the STAR model (solid red lines) compared with the average $B_{\beta_0^*}$ and $A_{\alpha_0^*}$ values of the SAR model (dashed lines) for an increasing proportion of missing edges, in the case of no tattling. True values are shown by black horizontal lines.
For each tattler, each one of these subsets is linked to one single tattling activity, since the tattler is activated at $t^{c}_{(1)}$ and remains activated until $t^{c}_{(l)} + \Delta$. If a tattler flickered, the set $\{t_{(1)}, \dots, t_{(m)}\}$ will be the union of disjoint subsets of this type, since $t^{c+1}_{(1)} - t^{c}_{(l)} > \Delta$. Stemming from these definitions, we have that $\{t_{(1)}, \dots, t_{(m)}\} =$