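This is an important issue. More specifically, we are interested in when
$$\lim_{n\to\infty} f(a_1, \ldots, a_n) = 1, \tag{6}$$
in other words, when
$$\lim_{n\to\infty} \prod_{i=1}^{n} (1 - a_i) = 0. \tag{7}$$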
If (6) holds, then by using sufficiently many databases we can reach (almost) complete coverage of the sought documents. Note that this is the case for all searches in Figure 1 of Hood and Wilson (2001). We will see, however, that (6) (or (7)) is not always valid. When (6) (or (7)) fails, we have
$$\lim_{n\to\infty} f(a_1, \ldots, a_n) < 1, \tag{8}$$
and in this case, no matter how many databases we search, we will never come close to complete coverage of the sought documents. In the sequel we give an example of each case: one where (6) holds and one where (8) holds. Note that in the special case (1) we always have (6), which shows that our extension of $f$ to formula (3) has its merits.
First we give some definitions concerning convergent and divergent products. They can be found in Apostol (1974, pp. 206–209). We limit our definitions to the case studied here.
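Definition 1. Denote by $p_n$ the product
$$p_n = \prod_{i=1}^{n} (1 - a_i). \tag{9}$$
Then we say that this product converges if there exists a number $p \neq 0$ such that $\lim_{n\to\infty} p_n = p$. The number $p$ is then denoted
$$p = \prod_{i=1}^{\infty} (1 - a_i). \tag{10}$$
If $\lim_{n\to\infty} p_n = 0$, we say that the product diverges to 0. This is the case (7) or (6), the most interesting case, since we are then able to retrieve most documents on the topic by taking $n$ high enough.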
We can characterize convergent and divergent products of the form (10) by quoting a theorem in Apostol (1974, p. 209).
Theorem 1. Since all $a_i$ satisfy $a_i < 1$, the product $\prod_{i=1}^{\infty} (1 - a_i)$ converges if and only if the series $\sum_{i=1}^{\infty} a_i$ converges.
This represents the case (8), where we are not able to come close to complete coverage of the sought documents, no matter how many databases are used. Complete coverage (as in (6)) is possible under the condition of the next theorem, which follows immediately from Theorem 1.
Theorem 2. The product $\prod_{i=1}^{\infty} (1 - a_i)$ diverges to 0 if and only if the series $\sum_{i=1}^{\infty} a_i$ diverges.
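The link between the product and the series can be made tangible through the elementary inequality $1 - x \le e^{-x}$, which gives $\prod_{i=1}^{n} (1 - a_i) \le \exp(-\sum_{i=1}^{n} a_i)$ whenever $0 \le a_i < 1$: a diverging series therefore forces the product towards 0. The sketch below is only a numerical illustration of this bound, not a proof. The two sequences and the helper names `partial_product` and `exp_bound` are our own illustrative choices and do not come from the text.

```python
import math

def partial_product(a):
    """Partial product prod(1 - a_i) over the given coverage fractions."""
    return math.prod(1 - x for x in a)

def exp_bound(a):
    """Upper bound exp(-sum(a_i)), valid when every a_i lies in [0, 1)."""
    return math.exp(-sum(a))

n = 1000
# Illustrative sequence with a divergent series: a_i = 1/(i+1).
divergent = [1 / (i + 1) for i in range(1, n + 1)]
# Illustrative sequence with a convergent series: a_i = 2**(-i).
convergent = [0.5 ** i for i in range(1, n + 1)]

for name, seq in (("divergent series", divergent), ("convergent series", convergent)):
    print(name, "sum =", sum(seq),
          "product =", partial_product(seq),
          "bound =", exp_bound(seq))
```

For the first sequence the partial product keeps shrinking as $n$ grows, whereas for the second it stabilizes at a strictly positive value, in line with the two theorems.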
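Example 1. Let $a_i = \frac{1}{i+1}$, $i = 1, 2, \ldots$. Hence $\sum_{i=1}^{\infty} a_i$ diverges and, according to Theorem 2, $\prod_{i=1}^{\infty} (1 - a_i) = 0$ (i.e., the product diverges), so (6) and (7) are valid and complete coverage of the sought documents is (in the limit) possible. We can verify this directly. We have, for every $n = 1, 2, \ldots$,
$$\prod_{i=1}^{n} (1 - a_i) = \prod_{i=1}^{n} \frac{i}{i+1} = \frac{1}{n+1}.$$
Since $f(a_1, \ldots, a_n) = 1 - \frac{1}{n+1} = \frac{n}{n+1}$, we can illustrate "how fast" we approach 100% coverage. Take, for example, $n = 10$ databases; then we can cover $\frac{10}{11}$, hence more than 90% of the sought documents.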
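The figures of Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming that the coverage function has the product form $f(a_1, \ldots, a_n) = 1 - \prod_{i=1}^{n} (1 - a_i)$ and that $a_i = 1/(i+1)$, a choice consistent with the "more than 90%" figure for $n = 10$. The function names `coverage` and `example1_coverage` are hypothetical and serve only this illustration.

```python
from fractions import Fraction

def coverage(fractions_covered):
    """Assumed combined coverage 1 - prod(1 - a_i) of the given databases."""
    uncovered = Fraction(1)
    for a in fractions_covered:
        uncovered *= 1 - a  # fraction still missed after adding this database
    return 1 - uncovered

def example1_coverage(n):
    """Coverage of the first n databases, assuming a_i = 1/(i+1)."""
    return coverage(Fraction(1, i + 1) for i in range(1, n + 1))

for n in (1, 5, 10, 100):
    c = example1_coverage(n)
    print(n, c, float(c))
```

Under these assumptions the computed values agree with the closed form $n/(n+1)$, so the coverage creeps towards 1 but only slowly.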
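Example 2. Let $a_i = \frac{1}{(i+1)^2}$, $i = 1, 2, \ldots$. Now $\sum_{i=1}^{\infty} a_i$ is convergent and hence the product $\prod_{i=1}^{\infty} (1 - a_i)$ is convergent (i.e., is $\neq 0$). This means that (8) is valid and that we cannot approach complete coverage of the sought documents. Here we can calculate concretely what fraction of the sought documents can be covered. We have, for every $n = 1, 2, \ldots$,
$$\prod_{i=1}^{n} \left(1 - \frac{1}{(i+1)^2}\right) = \prod_{i=1}^{n} \frac{i(i+2)}{(i+1)^2} = \frac{n+2}{2(n+1)},$$
and hence $\prod_{i=1}^{\infty} (1 - a_i) = \frac{1}{2}$. This also implies that
$$f(a_1, \ldots, a_n) = 1 - \frac{n+2}{2(n+1)} = \frac{n}{2(n+1)} < \frac{1}{2},$$
so that we never reach 50% coverage of the sought documents, no matter how many databases we use. This is due to the small coverage of the sought documents by each database $i = 1, 2, \ldots$. This example shows the advantage of the general model (3) over the limited model (1), where always
$$\lim_{n\to\infty} f(a_1, \ldots, a_n) = 1.$$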
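The 50% ceiling of Example 2 can also be checked exactly. This is again a sketch under assumptions: the coverage function is taken to be $f(a_1, \ldots, a_n) = 1 - \prod_{i=1}^{n} (1 - a_i)$ and the coverage fractions are taken to be $a_i = 1/(i+1)^2$, which is consistent with the 50% bound stated above. The name `example2_coverage` is hypothetical.

```python
from fractions import Fraction

def example2_coverage(n):
    """Coverage 1 - prod(1 - a_i) of the first n databases, assuming a_i = 1/(i+1)**2."""
    uncovered = Fraction(1)
    for i in range(1, n + 1):
        uncovered *= 1 - Fraction(1, (i + 1) ** 2)
    return 1 - uncovered

half = Fraction(1, 2)
for n in (1, 10, 100, 1000):
    c = example2_coverage(n)
    # Gap to the 50% ceiling; it shrinks but stays positive.
    print(n, c, float(c), "gap to 1/2:", float(half - c))
```

The gap to one half shrinks like $1/(2(n+1))$ but never closes, which is the finite-$n$ face of a convergent product.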
Note that in both examples $f(a_1, \ldots, a_n)$ is concavely increasing by Proposition 1, since the sequence $(a_i)_{i=1,2,\ldots}$ decreases.
Remark. Since all $a_i$ satisfy $0 \le a_i < 1$, convergence of $\sum_{i=1}^{\infty} a_i$ also means absolute convergence. This in turn means that the series converges unconditionally, that is, it converges for any order of the databases $i$. More exactly, let $\pi$ denote any permutation of the natural numbers, that is, a bijection from the natural numbers onto themselves. Then convergence of $\sum_{i=1}^{\infty} a_i$ implies convergence of $\sum_{i=1}^{\infty} a_{\pi(i)}$ (see, e.g., Apostol, 1974, Theorem 8.32, p. 196) and hence, by Theorem 1, the product $\prod_{i=1}^{\infty} (1 - a_{\pi(i)})$ converges (and is equal to $\prod_{i=1}^{\infty} (1 - a_i)$). Similarly, if $\sum_{i=1}^{\infty} a_i$ diverges, then $\sum_{i=1}^{\infty} a_{\pi(i)}$ diverges and hence, by Theorem 2, the product $\prod_{i=1}^{\infty} (1 - a_{\pi(i)})$ diverges (i.e., its value equals 0). This means that the coverage of the sought documents, in the limit, is not influenced by the order in which we use the databases. Of course, for every finite $n = 1, 2, \ldots$, the values of $f(a_1, \ldots, a_n)$ depend on the order in which the databases are used.
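The effect of the order can be illustrated on a small pool of databases. The sketch below assumes the product form $f = 1 - \prod (1 - a_i)$ and uses six hypothetical databases with coverage fractions $1/(i+1)^2$, $i = 1, \ldots, 6$; the pool and the name `cumulative_coverage` are our own illustrative choices. It compares the coverage reached after each successive database when the pool is searched in two opposite orders.

```python
from fractions import Fraction

def cumulative_coverage(fractions_covered):
    """Coverage 1 - prod(1 - a_i) after each successive database, in the given order."""
    uncovered = Fraction(1)
    result = []
    for a in fractions_covered:
        uncovered *= 1 - a
        result.append(1 - uncovered)
    return result

# Hypothetical pool: a_i = 1/(i+1)**2 for i = 1..6, listed from largest to smallest.
pool = [Fraction(1, (i + 1) ** 2) for i in range(1, 7)]

largest_first = cumulative_coverage(pool)
smallest_first = cumulative_coverage(reversed(pool))

for k, (x, y) in enumerate(zip(largest_first, smallest_first), start=1):
    print(k, float(x), float(y))
```

The intermediate values differ between the two orders, while the value reached once the whole pool has been used is the same, since a finite product does not depend on the order of its factors.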
Note: Considering an infinite number of databases is, of course, only a theoretical issue. Yet our results on complete/incomplete coverage (Theorems 1 and 2) yield insight into the finite case, where there are $n$ databases ($n$ a large natural number).