#### 3.1. ID

[12] We first give a brief introduction to the ID proposed by *Liberty et al.* [2007]. Suppose **C** is a complex *m* × *n* matrix of rank *k* with *k* ≤ *m* and *k* ≤ *n*. There exist a complex *k* × *n* matrix **P** and a complex *m* × *k* matrix **B**, whose columns consist of a subset of the columns of **C**, such that

1. some subset of the columns of **P** makes up the *k* × *k* identity matrix;

2. no element of **P** has an absolute value greater than 1;

3. ‖**P**‖ ≤ √(*k*(*n* − *k*) + 1), where ‖⋅‖ denotes the spectral norm;

4. the least (that is, the *k*-th greatest) singular value of **P** is at least 1; and

5. when *k* < *m* and *k* < *n*, ‖**BP** − **C**‖ ≤ √(*k*(*n* − *k*) + 1) *σ*_{k+1}, where *σ*_{k+1} is the (*k* + 1)-st greatest singular value of **C**.
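
These properties can be illustrated numerically. The sketch below (assuming NumPy/SciPy, and using a real-valued test matrix for brevity) builds an ID of a nearly rank-*k* matrix with a plain pivoted QR factorization and checks that the approximation error is on the order of *σ*_{k+1}; note that plain column pivoting satisfies properties 2–4 only approximately.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
m, n, k = 60, 50, 8

# Build a test matrix whose (k+1)-st singular value is small.
U = np.linalg.qr(rng.standard_normal((m, k + 1)))[0]
V = np.linalg.qr(rng.standard_normal((n, k + 1)))[0]
s = np.array([10.0 ** (-i) for i in range(k)] + [1e-10])
C = (U * s) @ V.conj().T

# Pivoted QR: C[:, piv] = Q R, with R upper triangular.
Q, R, piv = qr(C, pivoting=True)

# Interpolation matrix: P[:, piv] = [I_k, T] with T = R11^{-1} R12.
# Entries of T can slightly exceed 1 for plain pivoting (property 2
# holds exactly only for a strong rank-revealing factorization).
T = solve_triangular(R[:k, :k], R[:k, k:])
P = np.empty((k, n))
P[:, piv[:k]] = np.eye(k)
P[:, piv[k:]] = T

B = C[:, piv[:k]]                  # B is a subset of the columns of C
err = np.linalg.norm(C - B @ P, 2)
sigma = np.linalg.svd(C, compute_uv=False)
bound = np.sqrt(k * (n - k) + 1) * sigma[k]   # bound of property 5
```

Here `err` comes out comparable to *σ*_{k+1} ≈ 10⁻¹⁰, well within the bound of property 5.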

[13] Based on these statements, an approximation can be derived as

**C**^{m×n} ≈ **B**^{m×k}**P**^{k×n},   (3)

when the exact rank of **C**^{m×n} is greater than *k* but the (*k* + 1)-st greatest singular value *σ*_{k+1} of **C**^{m×n} is small.

[14] The ID employs randomness to reach the decomposition described in equation (3). It begins by generating a random vector *ω* with independent Gaussian entries and forming the product *y* = *ω*^{H}**C**, where the superscript *H* denotes the adjoint (conjugate transpose). The vector *y* can be regarded as a random sample from the row space of **C**. Repeating this sampling process *l* (*l* > *k*) times gives

*y*^{(i)} = (*ω*^{(i)})^{H}**C**, *i* = 1, 2, ⋅⋅⋅, *l*.   (4)

Owing to the randomness, the set {*ω*^{(i)} : *i* = 1, 2, ⋅⋅⋅, *l*} of random vectors is linearly independent with probability 1, and no nontrivial linear combination of the *ω*^{(i)} falls in the null space of **C**^{H}. Therefore, to produce an orthonormal basis of the row space of **C**, we just need to orthonormalize the sample vectors. Stacking the rows *y*^{(i)}, equation (4) can be rewritten in the compact form

**Y**^{l×n} = **Ω**^{l×m}**C**,   (5)

where the *i*-th row of **Ω**^{l×m} is (*ω*^{(i)})^{H}.

Employing a numerically stable method for this orthonormalization, such as the pivoted QR factorization, a *k* × *n* interpolation matrix **P** can be obtained such that

**Y**^{l×n} ≈ **L**^{l×k}**P**^{k×n},   (6)

where the columns of **L**^{l×k} constitute a subset of the columns of **Y**. That is to say, there exists a set of integers *i*_{1}, *i*_{2}, ⋅⋅⋅, *i*_{k} such that, for any *j* = 1, 2, ⋅⋅⋅, *k*, the *j*-th column of **L** is the *i*_{j}-th column of **Y**. Collecting the corresponding columns of **C** into a complex *m* × *k* matrix **B**, so that the *j*-th column of **B** is the *i*_{j}-th column of **C**, then yields the approximation of equation (3).
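
Assuming NumPy/SciPy, the whole procedure — forming **Y** = **ΩC** with a Gaussian **Ω**, a pivoted QR factorization of **Y**, and collecting the corresponding columns of **C** — can be sketched as follows; `randomized_id` is an illustrative name, not a routine from *Liberty et al.* [2007], and a real-valued example is used for brevity.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def randomized_id(C, k, l, seed=1):
    """Sketch of the randomized ID: sample, pivoted QR, column selection."""
    m, n = C.shape
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((l, m))         # rows play the role of omega^H
    Y = Omega @ C                               # l x n sample matrix
    _, R, piv = qr(Y, pivoting=True, mode='economic')
    T = solve_triangular(R[:k, :k], R[:k, k:])  # interpolation coefficients
    P = np.empty((k, n))
    P[:, piv[:k]] = np.eye(k)                   # identity on selected columns
    P[:, piv[k:]] = T
    B = C[:, piv[:k]]                           # the same k columns of C
    return B, P

rng = np.random.default_rng(2)
m, n, k = 200, 150, 10
C = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # exact rank k
B, P = randomized_id(C, k, l=k + 10)
err = np.linalg.norm(C - B @ P, 2) / np.linalg.norm(C, 2)
```

Because the test matrix has exact rank *k*, the relative error `err` is at the level of machine precision.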

[15] The ID algorithm typically requires [*Liberty et al.*, 2007]

*C*_{ID} = *l* *C*_{H} + *O*(*kln*)   (7)

floating-point operations, where *C*_{H} is the cost of applying **C**^{H} to a vector.

[16] As shown by *Liberty et al.* [2007], *l* = *k* + 5 or *l* = *k* + 10 is sufficient. In practice, the rank *k* is rarely known in advance, so the ID is usually implemented in an adaptive fashion, in which the number of samples is increased until the error satisfies the desired threshold *ε*_{ID}, as discussed in Section 5.2. This at most doubles the cost [*Liberty et al.*, 2007]. Because of the randomness used, the ID can fail, but the probability of failure is very small [*Liberty et al.*, 2007]. In short, compared with the classical pivoted QR factorization of **C** itself, the cost is greatly reduced, since only the small matrix **Y** needs to be factorized.
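
A minimal sketch of such an adaptive loop follows (illustrative only: `id_error` is a hypothetical helper, and a practical implementation would estimate the error cheaply rather than compute ‖**C** − **BP**‖ exactly):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def id_error(C, k, l, rng):
    """Relative spectral-norm error of a rank-k randomized ID with l samples."""
    m, n = C.shape
    Y = rng.standard_normal((l, m)) @ C
    _, R, piv = qr(Y, pivoting=True, mode='economic')
    P = np.empty((k, n))
    P[:, piv[:k]] = np.eye(k)
    P[:, piv[k:]] = solve_triangular(R[:k, :k], R[:k, k:])
    return np.linalg.norm(C - C[:, piv[:k]] @ P, 2) / np.linalg.norm(C, 2)

rng = np.random.default_rng(3)
m, n, k = 120, 100, 12
C = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

eps_id = 1e-8          # desired threshold
l = k + 5              # initial number of samples
err = id_error(C, k, l, rng)
while err > eps_id and l < min(m, n):
    l *= 2             # doubling l at most doubles the total cost
    err = id_error(C, k, l, rng)
```

For this exact rank-*k* test matrix the initial *l* = *k* + 5 already meets the threshold, so the loop exits immediately.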

[17] In some cases, it is more efficient to construct the matrix **Ω**^{l×m} so that it consists of uniformly randomly selected rows of the product of the discrete Fourier transform matrix and a random diagonal matrix [*Liberty et al.*, 2007].
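
Such a structured **Ω** (often called a subsampled randomized Fourier transform) can be sketched as follows, assuming NumPy; the last two lines verify that applying **Ω** is equivalent to a subsampled FFT, which is what makes this choice efficient.

```python
import numpy as np

rng = np.random.default_rng(4)
m, l = 128, 16

# Random diagonal matrix of unit-modulus complex phases.
d = np.exp(2j * np.pi * rng.random(m))

# l rows of the m x m DFT matrix, chosen uniformly at random.
rows = rng.choice(m, size=l, replace=False)
F_rows = np.exp(-2j * np.pi * np.outer(rows, np.arange(m)) / m)

# Omega = (selected rows of F) @ diag(d), an l x m matrix.
Omega = F_rows * d

# Applying Omega to C can use the FFT, costing O(m n log m) instead of
# O(l m n); Liberty et al. [2007] describe an O(m n log l) variant.
C = rng.standard_normal((m, 40))
Y_fft = np.fft.fft(d[:, None] * C, axis=0)[rows, :]
Y_direct = Omega @ C
```

`Y_fft` and `Y_direct` agree to floating-point precision, confirming that the explicit matrix product can be replaced by a diagonal scaling, an FFT, and a row selection.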

#### 3.2. Approximating Matrix by ID