How long is the Chaos Game?

In the 1988 textbook "Fractals Everywhere", M. Barnsley introduced an algorithm for generating fractals through a random procedure which he called the "chaos game". Using ideas from the classical theory of covering times of Markov chains, we prove an asymptotic formula for the expected time taken by this procedure to generate a $\delta$-dense subset of a given self-similar fractal satisfying the open set condition.


Introduction
An iterated function system or IFS is defined to be a tuple of contracting transformations of a complete metric space, which in this article will be taken to be $\mathbb{R}^d$. It is well known that if $(S_1,\ldots,S_N)$ is such an IFS then there exists a unique non-empty, compact set $F=\bigcup_{i=1}^N S_iF\subseteq\mathbb{R}^d$, which is called the attractor or limit set of $(S_1,\ldots,S_N)$. The properties of attractors of iterated function systems and the natural measures supported on them have been the subject of substantial mathematical inquiry for several decades since their introduction in [3, 8], and remain a highly active topic of contemporary mathematical research (we note for example [1, 4, 7, 10, 13, 14, 15]), as well as being noted for their aesthetic appeal.
In [2] Barnsley introduced an algorithm known as the Chaos Game for the construction of the limit set $F$ of an iterated function system $(S_1,\ldots,S_N)$. Given an arbitrary starting point $x_0\in\mathbb{R}^d$, we define a sequence $(x_n)_{n=0}^\infty$ inductively by choosing for each $n\ge 1$ an index $i_n\in\{1,\ldots,N\}$ independently at random according to some fixed non-degenerate probability vector $(p_1,\ldots,p_N)$, and taking $x_n:=S_{i_n}x_{n-1}$ for every $n\ge 1$. It is not difficult to show that the resulting sequence almost surely has the attractor $F$ as its $\omega$-limit set (that is, we have $\bigcap_{m=1}^\infty\overline{\{x_n:n\ge m\}}=F$), and it is not much more difficult to show that the distribution $\frac{1}{n}\sum_{k=0}^{n-1}\delta_{x_k}$ converges almost surely to the unique Borel probability measure $m$ supported on $F$ which satisfies $m=\sum_{i=1}^Np_i(S_i)_*m$, as was first established in [5]. If the initial point $x_0$ is taken to be in the attractor (for example, by taking $x_0$ to be the fixed point of one of the contractions $S_i$) then one obtains the simpler result that the sequence $(x_n)_{n=0}^\infty$ is almost surely dense in the attractor, and for the rest of this article we will prefer to make this assumption on the starting point $x_0$. Yet surprisingly, we have found no trace in the literature of the following question: how quickly does the randomly-generated sequence $(x_n)$ become dense in the attractor? In this direction we are aware only of the article [6], which informally investigates the problem of choosing probabilities in such a way as to generate fractal images with maximal efficiency using the chaos game procedure. In the present note we attempt to fill this gap in the literature with a rigorous investigation.
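As a concrete illustration of the procedure just described, the following sketch (our own illustrative code, not taken from [2]; the vertex coordinates and probabilities are assumptions chosen for the example) runs the chaos game for the IFS of three similitudes $S_i(x)=\frac12(x+v_i)$ whose attractor is the Sierpinski triangle:

```python
import random

# Vertices of a triangle.  The three maps S_i(x) = (x + v_i) / 2 form an IFS
# of similitudes of ratio 1/2 whose attractor is the Sierpinski triangle
# (the vertex coordinates here are our own choice, for illustration).
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]

def chaos_game(n_steps, probs=(1 / 3, 1 / 3, 1 / 3), x0=(0.0, 0.0)):
    """Run the chaos game: repeatedly apply a randomly chosen map to x."""
    x, y = x0
    points = []
    for _ in range(n_steps):
        # choose the index i_n according to the probability vector
        vx, vy = random.choices(VERTICES, weights=probs)[0]
        x, y = (x + vx) / 2, (y + vy) / 2  # x_n := S_{i_n} x_{n-1}
        points.append((x, y))
    return points

# Starting from a vertex (a fixed point of S_1) keeps the orbit inside F.
pts = chaos_game(10_000)
```

Starting at a fixed point of one of the maps matches the convention adopted above, so every $x_n$ lies in the attractor.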
Let us make our question precise. Given a compact subset $F$ of $\mathbb{R}^d$ we will say that a subset $X$ of $F$ is $\delta$-dense in $F$ if for every $z\in F$ there exists $x\in X$ such that $d(x,z)\le\delta$ in the standard metric on $\mathbb{R}^d$. (Since $X$ is a subset of $F$, this is equivalent to asking that the Hausdorff distance between $X$ and $F$ is at most $\delta$.) Given an IFS $(S_1,\ldots,S_N)$ with attractor $F$, for each $\delta>0$, $\mathbf{i}=i_1i_2\ldots\in\{1,\ldots,N\}^{\mathbb{N}}$ and starting point $v\in F$ we define the $\delta$-waiting time along the sequence $\mathbf{i}$ as
$$W_{\delta,v}(\mathbf{i}):=\min\left\{n\ge 1:\{S_{i_k}\cdots S_{i_1}v:1\le k\le n\}\text{ is }\delta\text{-dense in }F\right\}.$$
If additionally a nondegenerate probability vector $(p_1,\ldots,p_N)$ is understood, then we define the expected $\delta$-waiting time with starting point $v$ to be the expectation $E(W_{\delta,v})$ with respect to the $(p_1,\ldots,p_N)$-Bernoulli measure on $\{1,\ldots,N\}^{\mathbb{N}}$.
We recall that an IFS is said to satisfy the open set condition or OSC if there exists a nonempty open set $U\subset\mathbb{R}^d$ such that $\bigcup_{i=1}^NS_iU\subseteq U$ with the sets $S_iU$ being pairwise disjoint. The set $U$ may without loss of generality be taken to be bounded, and we will always assume that this is the case. We recall that $S_i:\mathbb{R}^d\to\mathbb{R}^d$ is called a similitude or similarity transformation if there exists $r_i\in(0,1)$ such that $d(S_iu,S_iv)=r_i\,d(u,v)$ for all $u,v\in\mathbb{R}^d$; in this case we say that $r_i$ is the contraction ratio of $S_i$. If $(S_1,\ldots,S_N)$ is an IFS of similitudes with respective contraction ratios $r_1,\ldots,r_N$ then the similarity dimension of $(S_1,\ldots,S_N)$ is defined to be the unique real number $s\ge 0$ such that $\sum_{i=1}^Nr_i^s=1$. By a classical theorem of Hutchinson (see [8]), if an IFS of similitudes satisfies the open set condition then the Hausdorff and box dimensions of the attractor $F$ are both equal to the similarity dimension $s$. It was also shown by Hutchinson that there exists a unique Borel probability measure $m$ satisfying $m=\sum_{i=1}^Nr_i^s(S_i)_*m$, and that this measure is supported on $F$ and has Hausdorff dimension equal to that of $F$; moreover, if any other probability vector is chosen then the resulting measure has Hausdorff dimension smaller than that of $F$. At an intuitive level this suggests that the limit distribution of the random sequence $(x_n)_{n=0}^\infty$ generated by the Chaos Game will be most evenly distributed around the attractor when the underlying probability vector is $(r_1^s,\ldots,r_N^s)$, and will be more concentrated in certain subregions of the attractor for other probability vectors. Thus we might expect the probability vector $(r_1^s,\ldots,r_N^s)$ to generate a random sequence which fills up the attractor most efficiently, and other choices of probability vector to result in longer waiting times for the sequence to become $\delta$-dense in the attractor.
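The similarity dimension can be computed numerically from the contraction ratios alone; a minimal sketch (function name and tolerance are our own choices) solves $\sum_{i=1}^Nr_i^s=1$ by bisection, using the fact that the left-hand side is strictly decreasing in $s$:

```python
def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(r_i ** s) == 1 for s >= 0 by bisection.

    f(s) = sum(r_i ** s) is strictly decreasing (each r_i < 1), with
    f(0) = N >= 2 and f(s) -> 0 as s -> infinity, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while sum(r ** hi for r in ratios) > 1.0:  # expand until f(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(r ** mid for r in ratios) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Three maps of ratio 1/2: the similarity dimension is log 3 / log 2.
s = similarity_dimension([0.5, 0.5, 0.5])
```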
This intuition is realised in our main result:

Theorem 1.1. Let $(S_1,\ldots,S_N)$ be an IFS of similitudes $S_i:\mathbb{R}^d\to\mathbb{R}^d$, with contraction ratios $r_i\in(0,1)$, which satisfies the OSC. Let $s$ denote the similarity dimension of $(S_1,\ldots,S_N)$, let $(p_1,\ldots,p_N)$ be a nondegenerate probability vector, and define
$$t:=\max_{1\le i\le N}\frac{\log p_i}{\log r_i},\qquad(1)$$
where we observe that by the arithmetic-geometric mean inequality we have $t\ge s$, with equality if and only if $(p_1,\ldots,p_N)=(r_1^s,\ldots,r_N^s)$. If the maximum in (1) is attained at a unique value of $i\in\{1,\ldots,N\}$ then there exists a constant $C>0$ such that for every starting point $v\in F$ and $0<\delta<\min_{1\le i\le N}r_i$ the estimate (2) holds. If the maximum in (1) is not attained at a unique value of $i\in\{1,\ldots,N\}$ then there exists a constant $C>0$ such that for every starting point $v\in F$ and $0<\delta<\min_{1\le i\le N}r_i$,
$$C^{-1}\delta^{-t}\log\frac1\delta\;\le\;E(W_{\delta,v})\;\le\;C\delta^{-t}\log\frac1\delta.\qquad(3)$$
Thus for the probability vector $(p_1,\ldots,p_N)=(r_1^s,\ldots,r_N^s)$ we have $E(W_{\delta,v})\approx\delta^{-s}\log\frac1\delta$ for every starting point $v\in F$ and all $0<\delta<\min_{1\le i\le N}r_i$, and for every other probability vector $E(W_{\delta,v})$ tends to infinity more rapidly as $\delta\to0$. At an intuitive level the principle underlying this result is that for the "natural" probability vector $(r_1^s,\ldots,r_N^s)$, all regions of the attractor with diameter $\delta$ take an approximately equal time to visit; for other probability vectors, some $\delta$-balls in the attractor are substantially more difficult to access than others. As we will see below, the key determiner of the expected waiting time is the expected time taken to visit the most slowly accessible part of the attractor, and it transpires that this in turn corresponds to a region of the form $S_i^nU$ where $i$ is chosen to maximise the ratio $\log p_i/\log r_i$ and $n\ge 1$ is chosen such that this region has diameter approximately $\delta$.
In the case where $\log p_i/\log r_i$ is maximised at a unique index $i$ it is interesting to ask to what extent the result (2) may be sharpened, but in the present article we have not been able to determine the exact rate of growth of $E(W_{\delta,v})$ in that case. It is also interesting to ask what information may be obtained regarding the pointwise almost sure behaviour of the family of random variables $W_{\delta,v}$ for fixed $v$.
In the case where the starting point $v$ is not taken to be in the attractor, since the sequence $S_{i_n}\cdots S_{i_1}v$ approaches the attractor at a uniform exponential rate, one may obtain the same asymptotics for the expected waiting time as in Theorem 1.1 but with a larger constant $C$ depending on the initial distance between $v$ and the attractor; we leave the details of this adaptation of our result to the reader.
To illustrate the structure of the proof of Theorem 1.1 it is helpful to consider a simpler case in which the contraction ratios $r_i$ are all equal to the same constant $r$ and the diameter of the set $U$ is precisely 1. In this case, if every set of the form $S_{i_m}\cdots S_{i_1}U$ has been visited by the sequence $(x_n)_{n=0}^\infty$ by time $N$ then we certainly have $W_{r^m,v}\le N$. On the other hand one may show that there exists $\kappa>0$ such that for every $m\ge 1$, every set of the form $S_{i_m}\cdots S_{i_1}U$ contains an open ball of radius $r^m\kappa$ which in particular does not intersect any other set of the form $S_{i_m}\cdots S_{i_1}U$. Thus if the sequence of indices $\mathbf{i}$ fails to include a particular string of the form $i_1\cdots i_m$ before time $N$, we expect that $W_{r^m\kappa,v}>N$. (There is some imprecision here in that the initial point $v$ may by chance have belonged to the ball of radius $r^m\kappa$, but it transpires that this imprecision has a negligible effect in practice.) This suggests that the asymptotic behaviour of the expectation of $W_{\delta,v}$ can be reduced to the problem of determining the expected first time for an IID random sequence in $\{1,\ldots,N\}^{\mathbb{N}}$, chosen with respect to the $(p_1,\ldots,p_N)$-Bernoulli measure, to include all of the distinct words of length $m$ over the alphabet $\{1,\ldots,N\}$, where $m$ is chosen so that $r^m$ is approximately the size of $\delta$. But this symbolic problem is precisely the classical coupon collector's problem described in, for example, [9]. In the full generality of Theorem 1.1 this approach must be adapted somewhat: since the sets $S_{i_m}\cdots S_{i_1}U$ will in general have different diameters for different sequences $i_1,\ldots,i_m$ of the same length $m$, it is necessary to partition the set $\{1,\ldots,N\}^{\mathbb{N}}$ into cylinders and estimate the expected time for all of these cylinders to be visited by a random sequence. This results in a Markov chain analogue of the coupon collector's problem which we solve using techniques adapted from [9, §11].
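The symbolic reduction above ends at the classical coupon collector's problem: for $M$ equally likely coupons (here, the $N^m$ words of length $m$ when all the $p_i$ are equal), the expected collection time is $MH_M=M\sum_{k=1}^M\frac1k$. A quick simulation (illustrative code with parameters of our own choosing) confirms the formula:

```python
import random

def coupon_collector_time(m, rng=random):
    """Number of uniform draws needed until all m coupons have appeared."""
    seen = set()
    t = 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        t += 1
    return t

def expected_time(m):
    """Exact expectation m * H_m for the uniform coupon collector."""
    return m * sum(1.0 / k for k in range(1, m + 1))

# 16 "coupons", e.g. the words of length 4 over a 2-letter alphabet.
m = 16
empirical = sum(coupon_collector_time(m) for _ in range(2000)) / 2000
```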
Our proof of Theorem 1.1 will thus be divided into two parts: the reduction of the problem to a covering problem for Markov chains, and the solution of the latter covering problem. Some possible directions for future research are described at the end of this note.

A Markov chain construction
For the remainder of this article we fix an IFS of similarities $(S_1,\ldots,S_N)$ which satisfies the OSC and denote the contraction ratio of each map $S_i$ by $r_i$. We also fix a nondegenerate probability vector $(p_1,\ldots,p_N)$. Let us write $r_{\min}:=\min_{i=1,\ldots,N}r_i$. In this section we will show that for each $0<\delta<r_{\min}$, we can construct a Markov chain whose expected time to visit all of its states is approximately proportional to $EW_{\delta,v_0}$. More precisely, given a Markov chain $(X_n)_{n=0}^\infty$ on a finite state space $\Omega$, we define the covering time by $\tau_{\mathrm{cov}}=\min\{t\ge 0:\forall y\in\Omega,\ \exists s\le t \text{ s.t. } X_s=y\}$, that is, the first time that all of the states in $\Omega$ have been visited by the Markov chain. Given $x\in\Omega$, we denote by $E_x\tau_{\mathrm{cov}}$ the expected covering time given that $X_0=x$. We can now state the main result of this section.
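The covering time $\tau_{\mathrm{cov}}$ is straightforward to estimate empirically for any small chain; the following sketch (a toy three-state chain of our own devising, unrelated to the chain constructed below) averages simulated covering times:

```python
import random

def covering_time(transition, start, rng=random):
    """Simulate the chain from `start` until every state has been visited."""
    state = start
    visited = {state}
    t = 0
    while len(visited) < len(transition):
        targets, weights = zip(*transition[state].items())
        state = rng.choices(targets, weights=weights)[0]
        visited.add(state)
        t += 1
    return t

# A toy three-state chain (a hypothetical example for illustration only).
P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.25, "b": 0.25, "c": 0.5},
    "c": {"a": 1.0},
}
mean_cov = sum(covering_time(P, "a") for _ in range(2000)) / 2000
```

For this particular chain the state "c" is reachable only through "b", so the covering time from "a" coincides with the hitting time of "c", whose expectation a first-step analysis gives as 5.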
Proposition 2.1. For each $0<\delta<r_{\min}$ there exists an irreducible Markov chain $(X_n^\delta)_{n=0}^\infty$ on a finite state space $P_\delta$ such that for each $v_0\in F$, there exists $\mathbf{i}_0\in P_\delta$ for which
$$c\,E_{\mathbf{i}_0}\tau_{\mathrm{cov}}\;\le\;EW_{\delta,v_0}\;\le\;c^{-1}E_{\mathbf{i}_0}\tau_{\mathrm{cov}},\qquad(4)$$
where the constant $c\in(0,1)$ is independent of $\delta$, $v_0$ and $\mathbf{i}_0$.
We will derive Proposition 2.1 from the combination of two results to be proved below, Proposition 2.2 and Proposition 2.3. The significance of (4) is that for each $0<\delta<r_{\min}$ and $\mathbf{i}_0\in P_\delta$, $E_{\mathbf{i}_0}\tau_{\mathrm{cov}}$ can be estimated by employing classical methods for bounding covering times of irreducible Markov chains. In particular, we will show that for any $0<\delta<r_{\min}$ and $\mathbf{i}_0\in P_\delta$, $E_{\mathbf{i}_0}\tau_{\mathrm{cov}}$ satisfies the upper and lower bounds presented in (2) and (3), which will directly imply the bounds for $EW_{\delta,v_0}$ by (4), up to a change in the uniform constant.
We introduce some notation. Define $I=\{1,\ldots,N\}$, which we call the index set. Let $I^n=\{i_1\ldots i_n:i_j\in I\}$ denote the set of all words of length $n$ over the index set, $I^*=\bigcup_{n\in\mathbb{N}}I^n$ the set of all finite words over the index set, and $\Sigma=I^{\mathbb{N}}$ the set of all sequences over the index set. $\Sigma$ is equipped with the infinite product topology, with respect to which it is compact and metrisable. If $\mathbf{i}\in I^*$ and $\mathbf{j}\in I^*\cup\Sigma$ we let $\mathbf{ij}$ denote the concatenation of $\mathbf{i}$ with $\mathbf{j}$. Given $\mathbf{i}\in I^*$, let $[\mathbf{i}]$ denote the cylinder set $[\mathbf{i}]=\{\mathbf{ij}:\mathbf{j}\in\Sigma\}$. Cylinder sets are clopen and generate the topology on $\Sigma$. Given $\mathbf{i}=i_1\ldots i_n\in I^*$ let $|\mathbf{i}|$ denote the length of the word $\mathbf{i}$, so that in this case $|\mathbf{i}|:=n$.
It is not difficult to show that for every $\mathbf{i}\in\Sigma$ the limit $\pi(\mathbf{i}):=\lim_{n\to\infty}S_{i_1}\cdots S_{i_n}v$ exists for every $v\in\mathbb{R}^d$ and that moreover the limit is independent of the starting point $v$. This coding map $\pi:\Sigma\to\mathbb{R}^d$ is continuous and its image $\pi(\Sigma)$ is precisely $F$. (However, in cases where $S_iF\cap S_jF\neq\emptyset$ for some $i\neq j$ the coding map $\pi$ can fail to be injective.) Obviously, for every $\mathbf{i}\in\Sigma$ and $n\ge 1$ we have $\mathbf{i}\in[\mathbf{j}]$ where $\mathbf{j}$ is the word corresponding to the first $n$ symbols of $\mathbf{i}$. Since $\pi:\Sigma\to F$ is surjective, for every $x\in F$ and $n\ge 1$ there exists at least one word $\mathbf{j}\in I^*$ with $|\mathbf{j}|=n$ such that $x\in\pi([\mathbf{j}])=S_{\mathbf{j}}F$. Writing $r_{\mathbf{i}}:=r_{i_1}\cdots r_{i_n}$ for $\mathbf{i}=i_1\ldots i_n\in I^*$ and letting $\mathbf{i}^-$ denote the word obtained from $\mathbf{i}$ by deleting its last symbol, for each $\delta\in(0,r_{\min})$ we define a subset of $I^*$ by
$$P_\delta:=\{\mathbf{i}\in I^*:r_{\mathbf{i}}\le\delta<r_{\mathbf{i}^-}\}.$$
Note that since $\delta<r_{\min}$, if $r_{\mathbf{i}}\le\delta$ then necessarily $|\mathbf{i}|\ge 2$ and hence $\mathbf{i}^-$ is well defined.
It is easy to see that the cylinders $\{[\mathbf{i}]:\mathbf{i}\in P_\delta\}$ are pairwise disjoint and cover $\Sigma$, so that $P_\delta$ induces a partition of $\Sigma$, and that $N_\delta:=|P_\delta|\le r_{\min}^{-s}\delta^{-s}$. We say that a list of words $\mathbf{j}_1,\ldots,\mathbf{j}_k\in I^*$ visits the cylinder set $[\mathbf{i}]\subset\Sigma$ if at least one of the words $\mathbf{j}_l$ satisfies $[\mathbf{j}_l]\subseteq[\mathbf{i}]$. We will show that instead of keeping track of which regions of the attractor are visited by the chaos game algorithm, we can keep track of which cylinder sets in $\{[\mathbf{i}]:\mathbf{i}\in P_\delta\}$ are visited by a symbolic analogue of the algorithm. This is made precise in the following proposition.
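The partition $P_\delta$ can be generated mechanically from the contraction ratios: starting from the empty word, extend each word until the product of its ratios first drops to $\delta$ or below. A sketch (our own illustrative code, with words represented as tuples over $\{0,\ldots,N-1\}$):

```python
def delta_partition(ratios, delta):
    """All words i over {0, ..., N-1} with r_i <= delta < r_{i^-}.

    r_i is the product of the contraction ratios of the symbols of i;
    removing the last symbol (i^-) must push the product back above delta.
    """
    assert 0.0 < delta < min(ratios)
    partition = []
    stack = [((), 1.0)]  # (word, ratio product so far), always > delta
    while stack:
        word, r = stack.pop()
        for i, ri in enumerate(ratios):
            if r * ri <= delta:
                partition.append(word + (i,))  # r_word > delta >= r_word * ri
            else:
                stack.append((word + (i,), r * ri))
    return partition

P_delta = delta_partition([0.5, 0.5, 0.5], delta=0.1)
```

For three ratios equal to $\frac12$ and $\delta=0.1$ this produces exactly the $3^4=81$ words of length 4, since $(1/2)^4\le0.1<(1/2)^3$.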
Proposition 2.2. There exist $\kappa>0$ and $\Delta>0$ depending only on $(S_1,\ldots,S_N)$ having the following property. Let $v_0\in F$ be arbitrary, $\delta\in(0,r_{\min})$ and choose any $\mathbf{i}_0\in P_\delta$ such that $v_0\in S_{\mathbf{i}_0}F$:

Proof. Since the IFS $(S_1,\ldots,S_N)$ satisfies the OSC, by a result of A. Schief ([12]) it also satisfies the strong open set condition (SOSC): that is, there exists a bounded open set $U$ such that $\bigcup_{i=1}^NS_iU\subset U$, where the union is disjoint, and $U\cap F\neq\emptyset$. Let $\Delta:=\operatorname{diam}F$. It follows that there exist $x\in F$ and $0<\varepsilon<\Delta$ with $B(x,\varepsilon)\subset U$. In particular, for $\mathbf{i},\mathbf{j}\in P_\delta$ with $\mathbf{i}\neq\mathbf{j}$, the balls $S_{\mathbf{i}}B(x,\varepsilon)$ and $S_{\mathbf{j}}B(x,\varepsilon)$ are disjoint. If $y\in F$ is arbitrary then we may choose $\mathbf{i}\in P_\delta$ such that $y\in S_{\mathbf{i}}F$. Since $S_{\mathbf{i}}x,y\in S_{\mathbf{i}}F$ it then follows that $d(S_{\mathbf{i}}x,y)\le\operatorname{diam}S_{\mathbf{i}}F=r_{\mathbf{i}}\Delta\le\delta\Delta$. Now, to prove (i), fix $\mathbf{i}_0,i_1\mathbf{i}_0,\ldots,i_n\cdots i_1\mathbf{i}_0$ satisfying the hypothesis of (i) and consider an arbitrary $\mathbf{i}\in P_\delta$. By assumption there exists $k$ satisfying $0\le k\le n$ such that the required inclusion holds. This completes the proof.

Proposition 2.2 is key to the construction of the Markov chain $(X_n^\delta)_{n=0}^\infty$, of which we are now ready to provide the details. Given $\delta\in(0,r_{\min})$, define a matrix $A_\delta=(a_{\mathbf{i},\mathbf{j}})_{\mathbf{i},\mathbf{j}\in P_\delta}$ by setting $a_{\mathbf{i},\mathbf{j}}:=p_i$ if $[i\mathbf{i}]\subseteq[\mathbf{j}]$ for some $i\in I$ and $a_{\mathbf{i},\mathbf{j}}:=0$ otherwise, and define a vector $\pi^\delta\in\mathbb{R}^{N_\delta}$ by $\pi_{\mathbf{i}}:=p_{\mathbf{i}}$. Then:

Proposition 2.3. (a) The matrix $A_\delta$ is row stochastic. (b) The vector $\pi^\delta$ is a stochastic vector and satisfies $\pi^\delta A_\delta=\pi^\delta$. (c) The matrix $A_\delta$ is irreducible.

In order to prove Proposition 2.3 we require two preliminary lemmas concerning $P_\delta$.
We appeal to the fact that if two cylinder sets intersect then one of them contains the other. If for some $k$ the set $[\mathbf{i}']$ is a subset of $[\mathbf{j}_k]$ then either $\mathbf{i}'=\mathbf{j}_k$ or $\mathbf{i}'=\mathbf{j}_k\mathbf{k}$ for some finite word $\mathbf{k}$. In the former case we have $[\mathbf{i}]=[i_1\mathbf{i}']=[i_1\mathbf{j}_k]$ and the proof is complete. In the latter case we have $\mathbf{i}=i_1\mathbf{i}'=i_1\mathbf{j}_k\mathbf{k}$, so that $i_1\mathbf{j}_k$ is a proper prefix of $\mathbf{i}\in P_\delta$. This implies $r_{i_1}r_{\mathbf{j}_k}>\delta$ and in particular $r_{\mathbf{j}_k}>r_{i_1}^{-1}\delta>\delta$.

Proof. Fix such an $i$ and $\mathbf{i}$. To demonstrate the existence of $\mathbf{j}$ we observe that by the partition property there must exist $\mathbf{j}\in P_\delta$ such that $[i\mathbf{i}]\subseteq[\mathbf{j}]$.

Proof of Proposition 2.3. To see that $A_\delta$ is row stochastic we observe that for each $\mathbf{i}\in P_\delta$ and each $i=1,\ldots,N$ there exists a unique $\mathbf{j}\in P_\delta$ such that $[i\mathbf{i}]\subseteq[\mathbf{j}]$ by Lemma 2.5. This implies that there exists a unique $\mathbf{j}\in P_\delta$ such that $a_{\mathbf{i},\mathbf{j}}=p_i$. It follows that each of $p_1,\ldots,p_N$ occurs once in the row of $A_\delta$ corresponding to $\mathbf{i}$ and the remaining entries in that row are zero, so $A_\delta$ is row stochastic as claimed.
That $\pi^\delta$ is a stochastic vector follows directly from the fact that $P_\delta$ induces a partition of $\Sigma$. To verify the equation $\pi^\delta A_\delta=\pi^\delta$, let $\mathbf{j}\in P_\delta$ be arbitrary; using Lemma 2.4 we can write $[\mathbf{j}]=\bigcup_{k=1}^m[\mathbf{i}\mathbf{i}_k]$ where $\mathbf{i}_1,\ldots,\mathbf{i}_m\in P_\delta$ and $i$ is the first symbol of $\mathbf{j}$. We obtain the desired identity. To show that $A_\delta$ is irreducible, it is sufficient to show that there exists $L>0$ such that $A_\delta^L$ is a positive matrix. We will show that this is true for $L=L_\delta:=\max_{\mathbf{i}\in P_\delta}|\mathbf{i}|$. Let $\mathbf{i},\mathbf{j}\in P_\delta$ be arbitrary with $\mathbf{j}=j_1\ldots j_n$, say, and if $n<L$ let $j_{n+1},\ldots,j_L\in\{1,\ldots,N\}$ be arbitrary. Define $\mathbf{k}_{L+1}:=\mathbf{i}$. By a simple inductive application of Lemma 2.5, starting at $t=L$ and descending to $t=1$, we may choose $\mathbf{k}_1,\ldots,\mathbf{k}_L\in P_\delta$ such that for all $t=1,\ldots,L$ we have $[j_t\mathbf{k}_{t+1}]\subseteq[\mathbf{k}_t]$. Define $\ell_t:=\min\{L+1-t,|\mathbf{k}_t|\}$ for each $t=1,\ldots,L+1$. We claim that for each $t=1,\ldots,L+1$ the first $\ell_t$ symbols of $\mathbf{k}_t$ are $j_t,\ldots,j_{t-1+\ell_t}$ in that order. This statement is clearly true for $t=L+1$, so let us assume its truth for some $t\in\{2,\ldots,L+1\}$ and deduce its truth for $t-1$. Since $[j_{t-1}\mathbf{k}_t]\subseteq[\mathbf{k}_{t-1}]$ we have $|\mathbf{k}_{t-1}|\le 1+|\mathbf{k}_t|$, and this also implies that the first $|\mathbf{k}_{t-1}|$ symbols of $j_{t-1}\mathbf{k}_t$ are precisely the word $\mathbf{k}_{t-1}$. Since the first $1+\ell_t$ symbols of $j_{t-1}\mathbf{k}_t$ are $j_{t-1}j_t\cdots j_{t-1+\ell_t}$, the first $\ell_{t-1}\le\min\{|\mathbf{k}_{t-1}|,1+\ell_t\}$ symbols of $\mathbf{k}_{t-1}$ must be $j_{t-1}\cdots j_{t-2+\ell_{t-1}}$. This is precisely what is required for the claim to be true in the case $t-1$. The claim follows by induction.
Applying the claim with $t=1$ it follows that the first $\ell_1=\min\{L,|\mathbf{k}_1|\}=|\mathbf{k}_1|$ symbols of $\mathbf{k}_1$ are $j_1,\ldots,j_{\ell_1}$. If $\ell_1<n$ then $[\mathbf{j}]$ is a proper subset of $[\mathbf{k}_1]$, and if $\ell_1>n$ then $[\mathbf{k}_1]$ is a proper subset of $[\mathbf{j}]$, but both of these contradict the partition property of $P_\delta$, and we conclude that $\ell_1=n$ and therefore $\mathbf{k}_1=\mathbf{j}$. The relation $[j_t\mathbf{k}_{t+1}]\subseteq[\mathbf{k}_t]$ for each $t$ implies that $a_{\mathbf{k}_{t+1},\mathbf{k}_t}>0$ for all $t=1,\ldots,L$. Since $\mathbf{k}_1=\mathbf{j}$ and $\mathbf{k}_{L+1}=\mathbf{i}$ it follows that $a_{\mathbf{i},\mathbf{k}_L}a_{\mathbf{k}_L,\mathbf{k}_{L-1}}\cdots a_{\mathbf{k}_2,\mathbf{j}}>0$. This is precisely what is needed to show that the entry of $A_\delta^L$ in position $(\mathbf{i},\mathbf{j})$ is positive. Since $\mathbf{i}$ and $\mathbf{j}$ were arbitrary the proof of (c) is complete.
We may now prove Proposition 2.1:

Proof of Proposition 2.1. Let $\kappa>0$ be as given by Proposition 2.2 and for each $\delta\in(0,r_{\min})$ let $(X_n^\delta)_{n=0}^\infty$ be the irreducible Markov chain on the state space $P_\delta$ which is induced by the transition matrix $A_\delta$ defined in Proposition 2.3.

Bounds on the covering time
Fix $0<\delta<r_{\min}$. In order to prove Theorem 1.1 it suffices, via Proposition 2.1, to obtain estimates on the expected covering time $E_{\mathbf{i}_0}\tau_{\mathrm{cov}}$ for all $\mathbf{i}_0\in P_\delta$. To this end, we will make extensive use of a family of bounds on the expected covering time of irreducible Markov chains derived from the so-called "Matthews method" [9, 11]. Roughly speaking, this type of bound reduces the task of estimating the expected covering time to that of estimating the expected time for the Markov chain to travel between a pair of states. In particular, for $\mathbf{i}\in P_\delta$ we define the hitting time $\tau_{\mathbf{i}}:=\min\{t\ge 0:X_t^\delta=\mathbf{i}\}$, that is, the first time that the state $\mathbf{i}$ is visited by the Markov chain, and denote by $E_{\mathbf{i}}\tau_{\mathbf{j}}$ the expected hitting time of $\mathbf{j}\in P_\delta$ given that $X_0^\delta=\mathbf{i}$. Then, provided the expected hitting time between a pair of states is fairly homogeneous over a (large part of the) state space, the Matthews method exploits the fact that states can be visited by the Markov chain in many different orders to deduce that the expected covering time is approximately proportional to the logarithm of the cardinality of the state space times the typical expected hitting time between a pair of states. Translated through the symbolic coding introduced in the previous section, the requirement that the expected hitting time be fairly homogeneous over a sufficiently large part of the state space turns out to correspond to the situation in which the probability vector $(p_1,\ldots,p_N)$ is chosen in such a way that the maximum in (1) is not attained at a unique index. In this situation the measures $m=\sum_{i=1}^Np_i(S_i)_*m$ of many of the similarly sized pieces $\{S_{\mathbf{i}}F\}_{\mathbf{i}\in P_\delta}$ are approximately comparable, which implies that expected hitting times between many pairs of states in $P_\delta$ are also roughly uniform.
On the other hand, when the expected hitting time between pairs of states varies substantially according to the pair of states which is chosen (which is the case when the maximum in (1) is attained uniquely, and therefore the measure $m$ is far from being uniformly distributed over any large subcollection of pieces $\{S_{\mathbf{i}}F\}_{\mathbf{i}\in P_\delta}$), Matthews' method yields bounds which are less sharp, which accounts for the gap between the upper and lower estimates for the rate of growth of $EW_{\delta,v_0}$ in (2).
We will provide precise statements for the "Matthews method" bounds which we use in Propositions 3.1, 3.3 and 3.7, each of which will appear directly preceding the proof of the corresponding bound from the part of Theorem 1.1 to which it pertains.
There are a couple more notions related to the covering and hitting times which will be useful in our analysis. Firstly, for $\mathbf{i}\in P_\delta$ we define $\tau_{\mathbf{i}}^+:=\min\{t\ge 1:X_t^\delta=\mathbf{i}\}$, which we call the first return time to the state $\mathbf{i}$.
In order to introduce the second one, it is helpful to visualise the Markov chain $(X_n^\delta)_{n=0}^\infty$ in the following way. At each transition we append a new digit $i\in\{1,\ldots,N\}$, with probability $p_i$, on the left of the current state, say $\mathbf{i}=i_1\ldots i_n$. There is now a unique way that we can delete the tail of $i\mathbf{i}=ii_1\ldots i_n$ to yield a new word $\mathbf{j}=ii_1\ldots i_{n-m}\in P_\delta$. Then $\mathbf{j}$ is the new current state of the chain. In this sense, our Markov chain $(X_n^\delta)_{n=0}^\infty$ is closely related to the Markov chain which describes observing patterns of heads and tails in coin tossing, and we can exploit this connection to adapt techniques which were used in [9, §11.3.3] to compute waiting times for all patterns of a fixed length when tossing a fair coin.
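The transition rule just described (prepend a digit on the left, then delete the tail so as to land back in $P_\delta$) can be sketched as follows; since the truncation point is the first prefix whose ratio product drops to $\delta$ or below, the code (our own illustration, with hypothetical parameter choices) simply scans for that prefix:

```python
import random

def prefix_in_partition(word, ratios, delta):
    """Truncate `word` to its unique prefix lying in P_delta."""
    r = 1.0
    for k, i in enumerate(word):
        r *= ratios[i]
        if r <= delta:  # first prefix whose ratio product drops to <= delta
            return word[: k + 1]
    raise ValueError("word has no prefix in P_delta")

def step(state, probs, ratios, delta, rng=random):
    """One transition: prepend digit i with probability p_i, then truncate."""
    i = rng.choices(range(len(probs)), weights=probs)[0]
    return prefix_in_partition((i,) + state, ratios, delta)

# Toy run: three maps of ratio 1/2 and delta = 0.1, so that P_delta consists
# of all words of length 4 (parameter choices are ours, for illustration).
state = (0, 0, 0, 0)
for _ in range(100):
    state = step(state, probs=[0.2, 0.3, 0.5], ratios=[0.5] * 3, delta=0.1)
```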
Let $w_{\mathbf{i}}$ denote the first time that $\mathbf{i}$ appears using all new digits, that is, with no overlap with the initial state. This random variable is easier to study than the waiting time $\tau_{\mathbf{i}}$ since it does not depend on the initial state. There are two trivial observations which will be useful: (i) $w_{\mathbf{i}}\ge\tau_{\mathbf{i}}$ for all $\mathbf{i}\in P_\delta$, and (ii) since $w_{\mathbf{i}}$ does not depend on the initial state, $Ew_{\mathbf{i}}\ge E_{\mathbf{i}}\tau_{\mathbf{i}}^+$, where we have purposefully suppressed the dependence on the initial state in $Ew_{\mathbf{i}}$.
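Observation (ii) is useful because expected return times are explicitly computable: for an irreducible chain, $E_{\mathbf{i}}\tau_{\mathbf{i}}^+=1/\pi(\mathbf{i})$ (this is [9, Proposition 1.14]). A simulation sketch on a toy two-state chain (our own example, unrelated to the chain above) illustrates the identity:

```python
import random

def mean_return_time(p_stay_a, p_stay_b, trials=20_000, rng=random):
    """Average first return time to state 'a' of a two-state chain."""
    def one_return():
        state, t = "a", 0
        while True:
            if state == "a":
                state = "a" if rng.random() < p_stay_a else "b"
            else:
                state = "b" if rng.random() < p_stay_b else "a"
            t += 1
            if state == "a":
                return t
    return sum(one_return() for _ in range(trials)) / trials

# P(a->b) = 0.5 and P(b->a) = 0.25 give stationary probability pi(a) = 1/3,
# so the expected return time to 'a' should be close to 1 / (1/3) = 3.
est = mean_return_time(p_stay_a=0.5, p_stay_b=0.75)
```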
Finally, we fix some extra notation which will be used throughout the proofs. We will write $A\lesssim B$ if $A\le cB$ for some constant $c$ which depends only on the parameters fixed by the hypothesis of the result being proved, which in our case may be the IFS itself and the probability vector $(p_1,\ldots,p_N)$, but crucially it will never depend on $\delta$, $v_0$ or $\mathbf{i}_0$. Similarly, we write $A\gtrsim B$ if $B\lesssim A$, and $A\approx B$ if both $A\lesssim B$ and $B\lesssim A$. Also recall that in the proof of Proposition 2.3 we introduced the notation $L_\delta=\max\{|\mathbf{i}|:\mathbf{i}\in P_\delta\}\approx\frac{\log\delta}{\log r_{\min}}$.

Proofs of upper bounds.
First, we obtain the upper bound on $EW_{\delta,v_0}$ in the case where the maximum in (1) is not attained uniquely in $\{1,\ldots,N\}$; that is, we settle the upper bound in (3). For this we will appeal to Matthews' original upper bound on the expected covering time of an irreducible Markov chain (see [9, Theorem 11.2], based on [11]).
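For reference, the bound just cited can be stated as follows (our paraphrase of [9, Theorem 11.2], for an irreducible chain on a state space of cardinality $n$):

```latex
% Matthews' upper bound ([9, Theorem 11.2]): for an irreducible Markov chain
% on a state space of cardinality n,
\max_{x} E_x \tau_{\mathrm{cov}}
  \;\le\;
  \left( 1 + \frac{1}{2} + \cdots + \frac{1}{n} \right)
  \max_{x,y} E_x \tau_y
  \;\lesssim\; (\log n)\,\max_{x,y} E_x \tau_y .
```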
We will require the following short lemma which describes which states are most difficult to hit and, given such a state, provides an approximate formula for its stationary probability.
Lemma 3.2. Suppose that the maximum in (1) is attained at the digit $i_0\in\{1,\ldots,N\}$, and let $\mathbf{i}_0\in P_\delta$ denote the unique string consisting only of the digit $i_0$. Then
$$\min_{\mathbf{i}\in P_\delta}p_{\mathbf{i}}\approx p_{\mathbf{i}_0}\approx\delta^t,\qquad(7)$$
where the implied constants are independent of $\delta$.

Proof. Since $\frac{\log p_{i_0}}{\log r_{i_0}}=t$ we have $p_{i_0}=r_{i_0}^t$ and hence
$$p_{\mathbf{i}_0}=r_{\mathbf{i}_0}^t\approx\delta^t,\qquad(8)$$
where the last approximate equality follows because $\delta r_{\min}<r_{\mathbf{i}_0}\le\delta$. Also, since $t=\max_i\frac{\log p_i}{\log r_i}$, for any $\mathbf{i}=i_1\ldots i_n\in P_\delta$ we have
$$p_{\mathbf{i}}=\prod_{k=1}^np_{i_k}\ge\prod_{k=1}^nr_{i_k}^t=r_{\mathbf{i}}^t>(\delta r_{\min})^t,\qquad(9)$$
where the second inequality follows because $r_{\mathbf{i}}\ge r_{i_1}\cdots r_{i_{n-1}}r_{\min}>\delta r_{\min}$ by definition of $P_\delta$. Combining (8) and (9) gives (7).
Proof of upper bound in (3). Let $i_0\in\{1,\ldots,N\}$ satisfy $\frac{\log p_{i_0}}{\log r_{i_0}}=\max_{1\le i\le N}\frac{\log p_i}{\log r_i}$. Fix an arbitrary $0<\varepsilon<r_{\min}$ throughout the proof, and for each $k\in\mathbb{N}$ consider $P_{\varepsilon^k}$. Let $\mathbf{i}\in P_{\varepsilon^k}$. We can write $\mathbf{i}$ uniquely as $\mathbf{i}_k\mathbf{i}_{k-1}\ldots\mathbf{i}_1$ where for each $1\le m\le k$, $\mathbf{i}_m\ldots\mathbf{i}_1\in P_{\varepsilon^m}$. We note that since $\varepsilon<r_{\min}$, $|\mathbf{i}_1|\ge 2$ by definition of $P_\varepsilon$. Moreover, for each $1\le m\le k$,
$$\varepsilon^{m+1}<r_{\mathbf{i}_m\ldots\mathbf{i}_1^-}\,\varepsilon<r_{\mathbf{i}_m\ldots\mathbf{i}_1^-}\,r_{\min}\le r_{\mathbf{i}_m\ldots\mathbf{i}_1}\le\varepsilon^m,$$
where $\mathbf{i}_1^-$ denotes $\mathbf{i}_1$ with its last digit removed; hence $\mathbf{i}_{m+1}$ must necessarily have non-zero length. Let $C_1:=\max_{\mathbf{j}\in P_\varepsilon}Ew_{\mathbf{j}}<\infty$ and let $C_2:=\max_{\mathbf{j}\in P_\varepsilon}|\mathbf{j}|$. Let $C:=\max\{C_1,C_2\}$. Now, for each $2\le m\le k$ we obtain a bound in which $p:=\max_{i\in I}p_i$ appears, where the implied constant depends on the probability vector $(p_1,\ldots,p_N)$ and on the choice of $\varepsilon$ but does not depend on the length $k=|\mathbf{i}|$. Now, fix $\delta>0$, and let $k\in\mathbb{N}$ satisfy $\varepsilon^k\le\delta<\varepsilon^{k-1}$. Let $\mathbf{j}\in P_\delta$. Then we can choose some $\mathbf{i}\in P_{\varepsilon^k}$ such that $\mathbf{i}|_n=\mathbf{j}$ for some $n\in\mathbb{N}$. Then $p_{\mathbf{i}}\ge p_{\mathbf{j}}p_{\min}^{C_2}\approx p_{\mathbf{j}}$, where $p_{\min}:=\min_ip_i$. Let $\mathbf{i}_0\in P_\delta$ be the unique string consisting only of the digit $i_0$; by (7), $p_{\mathbf{i}_0}\approx\delta^t$. By Proposition 3.1,
$$E_{\mathbf{i}_0}\tau_{\mathrm{cov}}\lesssim\delta^{-t}\log\frac1\delta,$$
where we have used the fact that $N_\delta:=|P_\delta|\le r_{\min}^{-s}\delta^{-s}$ as remarked at the beginning of the previous section. The result now follows by Proposition 2.1.
In the above proof we did not refer to the question of whether or not the maximum in (1) is attained only at $i_0$, and indeed the upper bound $EW_{\delta,v_0}\lesssim\delta^{-t}\log\frac1\delta$ for all $v_0\in F$ is valid whether or not this is the case. However, in the case where (1) is maximised uniquely this upper bound is not optimal, owing to the fact that on a large part of the state space $P_\delta$ the expected hitting times are significantly lower than $\delta^{-t}$. Indeed for a typical choice of $\mathbf{i}\in P_\delta$ an expected hitting time $E_{\mathbf{i}}\tau_{\mathbf{j}}$ will be of the order $\delta^{-t}$ only if the word $\mathbf{j}$ contains many instances of the digit $i_0$. Thus in the case where (1) is maximised at a unique index it is useful to separate the less accessible part of the state space from the more accessible part of the state space and estimate the expected covering times of each of these parts separately.
To be precise, for any $B\subset P_\delta$ we define $\tau_{\mathrm{cov}}^B=\min\{t\ge 0:\forall\mathbf{i}\in B,\ \exists s\le t\text{ s.t. }X_s=\mathbf{i}\}$, that is, the first time that all of the states in $B$ have been visited by the chain. The expected covering time of $B$ satisfies an upper bound analogous to that of the full expected covering time (see [9, (11.16)], which can easily be proven by adapting the proof of [9, Theorem 11.2]). In particular, in the case where (1) is uniquely maximised one can improve on the upper bound $EW_{\delta,v_0}\lesssim\delta^{-t}\log\frac1\delta$ by instead estimating how long it would take for the Markov chain to first visit all states in a subset $B$ consisting of states $\mathbf{i}\in P_\delta$ which contain a restricted number of occurrences of the digit $i_0$, followed by visiting all states in $B'=P_\delta\setminus B$. The subset $B$ would constitute most of the state space $P_\delta$, but would benefit from having a reduced upper bound on $\max_{\mathbf{i},\mathbf{j}\in B}E_{\mathbf{i}}\tau_{\mathbf{j}}$. On the other hand, the upper bound on $\max_{\mathbf{i},\mathbf{j}\in B'}E_{\mathbf{i}}\tau_{\mathbf{j}}$ would be of the order $\delta^{-t}$, but $B'$ would only comprise a small proportion of the state space. Of course, there is a lot of flexibility in how the subset $B$ could be defined, so by carefully considering the contribution of each covering time $E_{\mathbf{i}}\tau_{\mathrm{cov}}^B$ and $E_{\mathbf{i}}\tau_{\mathrm{cov}}^{B'}$ one can choose $B$ in such a way that the upper bound on $EW_{\delta,v_0}$ is improved to the degree noted in Theorem 1.1:

Proof of upper bound in (2). Suppose that $i_0\in\{1,\ldots,N\}$ is the unique digit that satisfies $\frac{\log p_{i_0}}{\log r_{i_0}}=\max_{1\le i\le N}\frac{\log p_i}{\log r_i}$. Fix $\delta>0$ and consider $k_\delta$ satisfying $1\le k_\delta\le L_\delta$, to be chosen later. Define $B\subset P_\delta$ to be the set of all strings which contain at most $k_\delta-1$ digits from the set $\{1,\ldots,N\}\setminus\{i_0\}$. Also denote $B'=P_\delta\setminus B$, so that $B'$ is the set of all strings in $P_\delta$ which contain at least $k_\delta$ digits from the set $\{1,\ldots,N\}\setminus\{i_0\}$.
We can bound the covering time above by the time it would take to first cover $B$ and then cover $B'$ (plus the intermediate travel time), yielding a bound on $\max_{\mathbf{i}\in P_\delta}E_{\mathbf{i}}\tau_{\mathrm{cov}}$. We will use Proposition 3.3 to bound $\max_{\mathbf{i}\in P_\delta}E_{\mathbf{i}}\tau_{\mathrm{cov}}^B$ and $\max_{\mathbf{i}\in P_\delta}E_{\mathbf{i}}\tau_{\mathrm{cov}}^{B'}$ in terms of $k_\delta$, before choosing $k_\delta$ in a way that optimises this bound.
To calculate $|B|$, notice that $B=\bigcup_{m=0}^{k_\delta-1}B_m$ where $B_m$ is the set of strings in $P_\delta$ which contain exactly $m$ occurrences of digits from the set $\{1,\ldots,N\}\setminus\{i_0\}$. Since any $\mathbf{i}\in B$ has length $|\mathbf{i}|\le L_\delta$, there are no more than $L_\delta$ positions where the first digit from the set $\{1,\ldots,N\}\setminus\{i_0\}$ can appear, followed by no more than $L_\delta-1$ positions where the second digit from the set $\{1,\ldots,N\}\setminus\{i_0\}$ can appear, and so on. Since we have $N-1$ possible choices of digits for each of these, $|B_m|\le(N-1)^mL_\delta^m$. Since $L_\delta\lesssim\log\frac1\delta$ we have $\log|B|\lesssim k_\delta\log\log\frac1\delta$. Notice that $B$ contains the string $\mathbf{i}_0$ which contains only the digit $i_0$, therefore $\min_{\mathbf{i}\in B}p_{\mathbf{i}}\approx p_{\mathbf{i}_0}\approx\delta^t$ by (7).
Next we consider $B'$. Since $B'=P_\delta\setminus B$ we have $\log|B'|\approx\log\frac1\delta$. Next we calculate $\min_{\mathbf{i}\in B'}p_{\mathbf{i}}$. Let $\mathbf{i}\in B'$. Then for some integer $m$ satisfying $k_\delta\le m\le L_\delta$, $\mathbf{i}$ contains $m$ digits from the set $\{1,\ldots,N\}\setminus\{i_0\}$. In particular there exist $j_1,\ldots,j_m\in I\setminus\{i_0\}$ and some integer $n$ such that $r_{\mathbf{i}}=r_{j_1}\cdots r_{j_m}r_{i_0}^n\approx\delta$. We have $c:=\min_{i\neq i_0}\left(t-\frac{\log p_i}{\log r_i}\right)>0$, where positivity follows from the fact that $i_0$ uniquely achieves the maximum in (1). It follows that for each $i\in\{1,\ldots,N\}\setminus\{i_0\}$, $p_i\ge r_i^{t-c}\ge r_i^t\,r_{\max}^{-c}$, and hence
$$p_{\mathbf{i}}\ge r_{\mathbf{i}}^t\,r_{\max}^{-cm}\gtrsim\delta^t\,r_{\max}^{-ck_\delta}.\qquad(15)$$
Consider the function $f_\delta(x)=x\log\log\frac1\delta-r_{\max}^{cx}\log\frac1\delta$. Observe that $f_\delta(2)<0$ and $f_\delta(L_\delta)>0$, provided $\delta$ is sufficiently small. Therefore there exists $2<x_\delta<L_\delta$ such that $f_\delta(x_\delta)=0$. Notice that $f_\delta(\log\log\frac1\delta)<0$ provided $\delta$ is sufficiently small. Since $f_\delta(x)$ is increasing in $x$, we have $x_\delta>\log\log\frac1\delta$. Therefore, if we fix $k_\delta=\lceil x_\delta\rceil$, we have $r_{\max}^{ck_\delta}\log\frac1\delta\le k_\delta\log\log\frac1\delta$ and $k_\delta\gtrsim\log\log\frac1\delta$. Hence by (16),
$$E_{\mathbf{i}}\tau_{\mathrm{cov}}\lesssim\delta^{-t}\,k_\delta\log\log\frac1\delta.$$
The result now follows from Proposition 2.1.

Proofs of lower bounds.
To estimate the lower bounds on the expected covering time, we begin by obtaining a lower estimate for the expected hitting time $E_{\mathbf{i}}\tau_{\mathbf{j}}$ in terms of $Ew_{\mathbf{j}}$.

Lemma 3.4. Given arbitrary $\mathbf{i},\mathbf{j}\in P_\delta$ with $\mathbf{i}\neq\mathbf{j}$, denote $\theta_{\mathbf{i},\mathbf{j}}:=P_{\mathbf{i}}(\tau_{\mathbf{j}}<L_\delta)$. Then
$$E_{\mathbf{i}}\tau_{\mathbf{j}}\ge(1-\theta_{\mathbf{i},\mathbf{j}})Ew_{\mathbf{j}}-\theta_{\mathbf{i},\mathbf{j}}L_\delta.\qquad(17)$$

Proof. Observe that
$$w_{\mathbf{j}}\le\tau_{\mathbf{j}}\mathbf{1}_{\{\tau_{\mathbf{j}}\ge L_\delta\}}+(L_\delta+w_{\mathbf{j}}^*)\mathbf{1}_{\{\tau_{\mathbf{j}}<L_\delta\}},\qquad(18)$$
where $w_{\mathbf{j}}^*$ is the amount of time required to build $\mathbf{j}$ from new digits after the $L_\delta$th digit has been added. Indeed (18) holds since if $(X_t)$ is such that $\mathbf{j}$ appears for the first time after time $L_\delta$ then $w_{\mathbf{j}}((X_t))=\tau_{\mathbf{j}}((X_t))$, whereas if $(X_t)$ is such that $\mathbf{j}$ appears for the first time before time $L_\delta$ then $w_{\mathbf{j}}((X_t))\le L_\delta+w_{\mathbf{j}}^*((X_t))$. Now, since $w_{\mathbf{j}}^*$ is independent of the event $\{\tau_{\mathbf{j}}<L_\delta\}$ and $w_{\mathbf{j}}$ has the same distribution as $w_{\mathbf{j}}^*$, we can take expectations in (18) to obtain $Ew_{\mathbf{j}}\le E_{\mathbf{i}}\tau_{\mathbf{j}}+\theta_{\mathbf{i},\mathbf{j}}(L_\delta+Ew_{\mathbf{j}})$, which completes the proof of (17).
The usefulness of (17) is that $Ew_{\mathbf{j}}\ge E_{\mathbf{j}}\tau_{\mathbf{j}}^+$, and the expected return time $E_{\mathbf{j}}\tau_{\mathbf{j}}^+$ satisfies the following formula (see [9, Proposition 1.14]).
Proposition 3.5. Fix $0<\delta<r_{\min}$. For all $\mathbf{i}\in P_\delta$, $E_{\mathbf{i}}\tau_{\mathbf{i}}^+=\frac{1}{\pi_{\mathbf{i}}}$.

So, in order to apply (17) we require an estimate on the probability $\theta_{\mathbf{i},\mathbf{j}}$ of a fast hitting of the state $\mathbf{j}$ from the state $\mathbf{i}$, which is provided by the following lemma.
Lemma 3.6. Fix $\mathbf{i},\mathbf{j}\in P_\delta$ with $\mathbf{i}\neq\mathbf{j}$, and suppose that $\tau_{\mathbf{j}}((X_t))\ge\ell_{\mathbf{j}}$ whenever $X_0=\mathbf{i}$. Then
$$\theta_{\mathbf{i},\mathbf{j}}\le\frac{p^{\ell_{\mathbf{j}}}}{1-p}+(L_\delta-|\mathbf{j}|)p^{|\mathbf{j}|}.$$

Proof. Fix such $\mathbf{i}$ and $\mathbf{j}$. We have $\theta_{\mathbf{i},\mathbf{j}}=\sum_{k=\ell_{\mathbf{j}}}^{L_\delta-1}P_{\mathbf{i}}(\tau_{\mathbf{j}}=k)$. For each $\ell_{\mathbf{j}}\le k\le|\mathbf{j}|-1$, $P_{\mathbf{i}}(\tau_{\mathbf{j}}=k)\le p^k$, since $k$ correct transitions are required (which correspond to the $k$ correct digits that need to be appended to the left of the word $\mathbf{i}$). If $|\mathbf{j}|<L_\delta$, then for each $|\mathbf{j}|\le k\le L_\delta-1$, $P_{\mathbf{i}}(\tau_{\mathbf{j}}=k)\le p^{|\mathbf{j}|}$, since $|\mathbf{j}|$ correct transitions are required. Therefore
$$P_{\mathbf{i}}(\tau_{\mathbf{j}}<L_\delta)\le p^{\ell_{\mathbf{j}}}+p^{\ell_{\mathbf{j}}+1}+\cdots+p^{|\mathbf{j}|-1}+(L_\delta-|\mathbf{j}|)p^{|\mathbf{j}|}\le\frac{p^{\ell_{\mathbf{j}}}}{1-p}+(L_\delta-|\mathbf{j}|)p^{|\mathbf{j}|}.$$

We are almost ready to prove the lower bounds on $EW_{\delta,v_0}$ from (2) and (3) via appropriate lower bounds on the expected covering time of $(X_n^\delta)_{n=0}^\infty$. Analogously to Proposition 3.1, the original lower bound of Matthews [11] bounds the minimum expected covering time from below by the minimum expected hitting time between distinct states multiplied by the logarithm of the cardinality of the state space. Clearly, this bound is insufficient for our purposes, since it can yield a lower bound merely of the order $\log\frac1\delta$, owing to the fact that in general some states in $P_\delta$ will be extremely close to one another. Instead, we can again improve on this bound by considering the expected covering time of a subset of the state space, where this time the elements of the subset are chosen in such a way that they are all "uniformly far" from each other, in the sense that the expected hitting times of all pairs of states are uniformly bounded below. For this we will require the following analogue of Proposition 3.3 (see [9, Proposition 11.4]).
Proposition 3.7. Let 0 < δ < r_min and B ⊂ P_δ. Then for all i_0 ∈ B,

E_{i_0} τ_cov(B) ≥ (min_{i,j∈B, i≠j} E_i τ_j) (1 + 1/2 + · · · + 1/(|B| − 1)),

where τ_cov(B) denotes the first time at which every state in B has been visited.

We are now ready to obtain a lower bound on EW_{δ,v_0} in the case that (1) is maximised uniquely at some i_0 ∈ {1, . . . , N}. Since the least accessible part of the state space P_δ comprises states i ∈ P_δ which consist mostly of the digit i_0, the most effective choice of B in Proposition 3.7 is a subset of this type. Then applying the lower estimates from Lemma 3.4 on the expected hitting time will yield the desired result.
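For intuition, a Matthews-type bound of this kind is exactly attained by the random walk on the complete graph K_n, where covering reduces to coupon collecting. The following check (our own illustration, not from the paper; exact rational arithmetic throughout) confirms that the harmonic-sum lower bound coincides with the true expected cover time in that case:

```python
from fractions import Fraction

def harmonic(m):
    # H_m = 1 + 1/2 + ... + 1/m as an exact rational.
    return sum(Fraction(1, k) for k in range(1, m + 1))

def cover_time_complete_graph(n):
    # Random walk on K_n: each step moves to a uniformly chosen *other* vertex.
    # With k vertices already visited, a new vertex is hit with prob (n-k)/(n-1),
    # so the wait is geometric with mean (n-1)/(n-k).  Sum over k = 1..n-1.
    return sum(Fraction(n - 1, n - k) for k in range(1, n))

def matthews_lower_bound(n):
    # Hitting a fixed other vertex takes geometric time with success prob
    # 1/(n-1), so min E_x tau_y = n - 1; multiply by the harmonic sum H_{n-1}.
    return (n - 1) * harmonic(n - 1)

# On K_n the Matthews lower bound is sharp: both sides equal (n-1) H_{n-1}.
for n in range(2, 12):
    assert cover_time_complete_graph(n) == matthews_lower_bound(n)
```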
Proof of lower bound in (2). Fix δ > 0 sufficiently small such that p^{ℓ_δ−1}/(1 − p) + p^{ℓ_δ}(L_δ − ℓ_δ) ≤ 1/2, let i_0 ∈ {1, . . . , N} be the digit at which the maximum in (1) is attained, and fix a digit j ≠ i_0. Define

A = {i ∈ P_δ : i contains one instance of j and |i| − 1 instances of i_0}

and B = {jji : i ∈ A} ⊂ P_δ. For any j ∈ B we have π_j = p_j ≈ δ^t, and therefore for any i, j ∈ B with i ≠ j, by (17) and Proposition 3.5,

E_i τ_j ≥ (1 − θ_{i,j}) / π_j − θ_{i,j} L_δ.

In order to bound θ_{i,j}, notice that by the definition of B at least |j| − 1 transitions are required to hit the state j from i. Thus by Lemma 3.6 and our assumption on δ,

θ_{i,j} ≤ p^{|j|−1}/(1 − p) + (L_δ − |j|) p^{|j|} ≤ p^{ℓ_δ−1}/(1 − p) + (L_δ − ℓ_δ) p^{ℓ_δ} ≤ 1/2.

Finally, to calculate |B|, observe that there are at least ℓ_{δ/r_j^2} ≈ log(1/δ) distinct positions at which the digit j can be placed within a string i ∈ A, and therefore log |B| = log |A| ≈ log log(1/δ).
By Proposition 3.7, the expected time taken by the chain to cover B (and hence to cover P_δ) is therefore bounded below by a quantity of the order δ^{−t} log log(1/δ). The result follows by Proposition 2.1.
All that remains is for us to obtain the lower bound in Theorem 1.1 in the case that (1) is not uniquely maximised in {1, . . . , N}. Since there must be at least two digits which attain this maximum, Proposition 3.7 can be applied for a choice of B ⊂ P_δ whose cardinality grows polynomially in δ^{−1}. This allows us to recover a sharp lower bound for EW_{δ,v_0} in this case.

Proof of lower bound in (3). Let J ⊂ {1, . . . , N} be the set on which the maximum in (1) is attained, where |J| ≥ 2 by assumption. Define P'_δ = P_δ ∩ J^*, where J^* denotes the set of finite words with digits in J. We begin by showing that there exists a > 0 such that |P'_δ| ≈ δ^{−a}. Let a > 0 satisfy Σ_{i∈J} r_i^a = 1 and let P be the Bernoulli measure on Σ with P([i]) = r_i^a if i ∈ J and P([i]) = 0 otherwise. Then

1 = Σ_{i∈P'_δ} P([i]) = Σ_{i∈P'_δ} r_i^a,

and since r_min δ < r_i ≤ δ for every i ∈ P'_δ, this yields r_min^a δ^a |P'_δ| < 1 ≤ δ^a |P'_δ|, implying that |P'_δ| ≈ δ^{−a}. Fix j ∈ N sufficiently large such that p^j/(1 − p) ≤ 1/4. Fix δ > 0 sufficiently small such that ℓ_δ > j and p^{ℓ_δ}(L_δ − ℓ_δ) ≤ 1/4. Fix an arbitrary i_0 ∈ P_δ and denote the first j digits of i_0 by i_1 · · · i_j. Define a new word k = k_1 · · · k_j by setting k_1 = i_1 and k_2 = · · · = k_j = w_1 for some fixed w_1 ∈ J \ {i_1}. Note that it is not necessarily true that i_1 ∈ J, but in either case w_1 ∈ J \ {i_1} can always be chosen since |J| ≥ 2. Define B_{i_0} ⊂ P_δ as the set

B_{i_0} = {i ∈ P_δ : i = kj for some j ∈ J^*}.
We begin by claiming that for all δ > 0 sufficiently small and all i ∈ B_{i_0} we have π_i ≲ δ^t. Writing i = kj as in the definition of B_{i_0}, we have

π_i = p_i = p_k p_j ≤ p_j = r_j^t ≤ δ^t r_min^{−jt} ≲ δ^t,

since p_j = r_j^t for every j ∈ J^* and r_j ≤ δ r_min^{−j}.
Next we estimate θ_{i,j} = P_i(τ_j < L_δ) for i, j ∈ B_{i_0} with i ≠ j. Fix such i and j, and observe that the first j digits of both i and j are equal to k. Since k_1 does not agree with any of k_2, . . . , k_j, at least j transitions are required before the chain can hit the state j when starting from the state i. Therefore by Lemma 3.6 and our assumptions on j and δ,

θ_{i,j} ≤ p^j/(1 − p) + (L_δ − ℓ_δ) p^{ℓ_δ} ≤ 1/2.
Next we bound |B_{i_0}|. Observe that |B_{i_0}| ≈ δ^{−a}, where a satisfies Σ_{i∈J} r_i^a = 1 as before. Therefore by Lemma 3.4 and Proposition 3.5, for any i, j ∈ B_{i_0} with i ≠ j,

E_i τ_j ≥ (1 − θ_{i,j}) / π_j − θ_{i,j} L_δ ≳ δ^{−t}.

By Proposition 3.7, the expected time taken by the chain to cover B_{i_0} is therefore bounded below by a quantity of the order δ^{−t} log(1/δ). The result follows by Proposition 2.1.
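The cut-set cardinality estimate used above rests on the chain of inequalities r_min^a δ^a |P'_δ| < 1 ≤ δ^a |P'_δ|. As an illustration (the contraction ratios below are hypothetical example values, not taken from the paper), the following sketch solves Σ_i r_i^a = 1 by bisection, enumerates the δ-cut-set of words, and checks both inequalities directly:

```python
# Hypothetical contraction ratios for the digits of J (example values only).
ratios = [0.5, 0.3]
delta = 1e-3
r_min = min(ratios)

def pressure(a):
    # sum_i r_i^a, strictly decreasing in a since every r_i < 1.
    return sum(r ** a for r in ratios)

# Bisection for the unique a > 0 with pressure(a) = 1.
lo, hi = 0.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if pressure(mid) > 1:
        lo = mid
    else:
        hi = mid
a = (lo + hi) / 2

def cut_set_size(ratios, delta):
    """Count the words i with r_i <= delta < r_(i minus last digit)."""
    size, stack = 0, [1.0]
    while stack:
        r = stack.pop()
        for ri in ratios:
            if r * ri <= delta:
                size += 1             # word enters the cut-set here
            else:
                stack.append(r * ri)  # keep extending this word
    return size

N = cut_set_size(ratios, delta)
# The two inequalities from the proof of |P'_delta| ~ delta^{-a}:
assert 1 <= (delta ** a) * N
assert (r_min ** a) * (delta ** a) * N < 1
```

The assertions hold for any choice of ratios in (0, 1): the cylinders over the cut-set partition the symbolic space, so the terms r_i^a sum to exactly 1 while each lies in ((r_min δ)^a, δ^a].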

Directions for future research
Besides the problem of obtaining a sharp estimate for the asymptotic behaviour of the expected δ-waiting time in the case where the maximum in (1) is attained uniquely, several further directions of research suggest themselves. On the one hand, while this work helps to shed light on how the sequence (x_n)_{n=0}^∞ approaches the attractor set, we have not investigated the related question of how quickly the measures (1/n) Σ_{k=0}^{n−1} δ_{x_k} approach the self-similar limit measure m = Σ_{i=1}^N p_i (S_i)_* m (with respect to, for example, the Wasserstein distance), and this question may be of interest in future research. It is also interesting to ask how far these results may be extended to the context of iterated function systems defined by maps which are not similarities (such as affine or conformal differentiable transformations) and to cases where the open set condition is not satisfied. Finally, we note that there are analogous questions which make sense for deterministic chaotic dynamical systems. For example, if T : R/Z → R/Z is the doubling map T(x) := 2x mod 1, then for Lebesgue-almost-every x ∈ R/Z the sequence {x, Tx, T^2 x, . . .} is dense in R/Z. One could just as easily ask how the expectation with respect to x of the first integer n such that the sequence {x, Tx, . . . , T^{n−1} x} is δ-dense in R/Z behaves as a function of δ in the limit δ → 0. For the doubling map this question can be reduced via Markov partitions to the coupon collector's problem, but for smooth expanding maps or even Anosov diffeomorphisms the details of such an argument are less clear.
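To make the quantity under discussion concrete, here is a small Monte Carlo sketch (illustrative only: the uniform weights, the three-map Sierpinski-type IFS, and the symbolic density test are choices of ours, not the paper's). Since the level-k cell containing x_n is determined by the last k digits chosen, covering every δ-cell meeting the attractor at resolution δ = r_min^k corresponds, for equal contraction ratios, to every length-k word appearing as a block of the random digit sequence:

```python
import random

def chaos_game_cover_time(probs, k, rng):
    """First n such that every length-k word over the alphabet has occurred
    as a consecutive block of the random digit sequence (a symbolic proxy
    for delta-density at resolution r_min^k, assuming equal ratios)."""
    N = len(probs)
    needed = N ** k          # number of level-k cells to cover
    seen = set()
    word = []
    t = 0
    while len(seen) < needed:
        word.append(rng.choices(range(N), weights=probs)[0])
        t += 1
        if len(word) >= k:
            seen.add(tuple(word[-k:]))  # cell visited at time t
    return t

rng = random.Random(0)
# Uniform weights on a 3-map IFS (e.g. the Sierpinski triangle), k = 3.
t = chaos_game_cover_time([1, 1, 1], 3, rng)
# A trivial lower bound: a sequence of length t contains t - k + 1 blocks,
# so covering all 3^3 words forces t >= 3^3 + 3 - 1.
assert t >= 3 ** 3 + 3 - 1
```

Averaging `t` over many seeds and plotting against δ = (1/2)^k would give a numerical picture of the δ-waiting time asymptotics that the theorems above describe.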