Quantum verification and estimation with few copies

As quantum technologies advance, the ability to generate increasingly large quantum states has experienced rapid development. In this context, the verification and estimation of large entangled systems represents one of the main challenges in the employment of such systems for reliable quantum information processing. Though the most complete technique is undoubtedly full tomography, the inherent exponential increase of experimental and post-processing resources with system size makes this approach infeasible even at moderate scales. For this reason, there is currently an urgent need to develop novel methods that surpass these limitations. This review article presents novel techniques that focus on a fixed number of resources (sampling complexity) and thus prove suitable for systems of arbitrary dimension. Specifically, a probabilistic framework requiring at best only a single copy for entanglement detection is reviewed, together with the concept of selective quantum state tomography, which enables the estimation of arbitrary elements of an unknown state with a number of copies that is low and independent of the system's size. These hyper-efficient techniques define a dimensional demarcation for partial tomography and open a path for novel applications.


INTRODUCTION
In the coming decades, thanks to rapid technological advances, the probability of a new information revolution appears quite high. Quantum systems involving photons, atoms, spins, molecules, solid-state and optomechanical devices, even in the absence of perfect control and manipulation, are already promising candidates for building new applications aside from universal quantum computing. As difficult as it is to predict how emerging technologies will be most effectively applied, one can expect to see quantum technologies with a high degree of variability in architecture and capacity (as when classical computers emerged in the 1950s), the so-called noisy, intermediate-scale quantum (NISQ) devices [1]. Here intermediate-scale refers to the size of the quantum processors, in the regime of tens of qubits up to a few hundred in the next decade or so. Remarkable achievements in creating larger quantum states have already been reported [2][3][4][5][6][7][8] using different quantum platforms, from superconducting architectures to trapped ion systems and photonic setups. Moreover, impressive demonstrations (such as those of a computational quantum advantage) have recently been reported by several groups that used 53 [9] and 56 [10] superconducting qubits and up to 113 photons [4,11].
Such rapid development and the demonstration of quantum supremacy indicate that quantum information processing is sufficiently mature that another problem, quite aside from noisy quantum systems, has begun to make its presence known with increasing frequency. While it is all very well to coherently process quantum states that reside in an exponentially large space, it means little if one cannot retrieve and validate the results of such manipulations. So begins consideration for the metrology of quantum systems. The gold standard of quantum measurement is full state tomography [12], wherein complete knowledge about the state is gained via measurement. Though certainly sufficient, the complexity in both measurements and computational processing power grows exponentially fast with the size of the quantum system.
Given that our interest in quantum information processing is this rapid growth, inserting a step that requires exponential resources seems rather counterproductive. Until very recently, however, this exponential cost was largely irrelevant as our ability to rapidly measure or classically compute vastly outstripped our ability to perform meaningful operations with more than a few qubits. Thus, simply performing full state tomography and retrieving a complete quantum state was a viable strategy. This approach was only ever practical at the very small scale of NISQ and pre-NISQ, however. In the long term, fault-tolerant and noise-resistant quantum computers ought to make a complete validation of the system less important, but we are far away from such feats of quantum engineering while still being capable of constructing large quantum systems. Thus a gap has appeared: systems are too large for anything nearing complete tomography but not advanced enough to assume low errors.
The advantages of a complete tomography are obvious. One need make no assumptions on any properties of the target system except that it can be repeatedly produced (reinitialised) and measured. The price of such ignorance is an exponential cost in measuring, reconstructing and storing the state of the target, which is naturally unsustainable as we move into the intermediate regime. But such a problem has hardly taken the quantum estimation community by surprise, and many strategies exist to mitigate such a heinous complexity cost. Often, complete information is not required, and this relaxation, when married with random sampling techniques, can result in powerful verification methods [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29] (see also [30] for a general review on the topic) that probe only some specific quantities one might wish to know about a given state. To name but a few, one might wish to investigate only the presence of entanglement in a certain quantum state [23,24] or directly estimate the state fidelity [31], i.e., the quantification of the overlap between prepared and ideal states. It follows naturally that reducing the amount of obtainable information comes with a lower demand in terms of experimental resources, thus making these methods more viable alternatives when the full density matrix is not needed. For clarity, we will explicitly define here that any interrogation of a quantum system which reveals information about that system is termed a partial tomography.
It appears then that a trade-off of some kind must occur. Complexity costs can be reduced in one regard but increased in another [30], essentially shifting the difficulty to another stage of an experiment, or we can reduce the information extracted. Ultimately, an explicit dimension dependence remains in most tasks and this serves as a problematic complication for large-scale systems. With this in mind, we concern ourselves with strategies that appear to saturate some notion of maximal information extraction, paired with a resource cost (at every stage of the protocol) that has moderate growth in the dimension. This suggests a different mode of thinking may be in order. Rather than asking how large a quantum system we can effectively probe with a given strategy, consider instead the central question of this review: Given a limited number of interactions with a large system, how much classical information can we learn with a high degree of certainty?
This extracted classical information can take many forms and one must be careful of the kinds of questions one asks. Consider the task of entanglement detection, which may be performed indirectly by estimating the mean value $\langle W\rangle$ of an appropriate witness $W$ and comparing it to some threshold value $W_c$, which requires repeated measurements on large ensembles of identically prepared quantum states. An alternative to this is a direct approach by an oracular question "Is $\langle W\rangle < W_c$?", which potentially can be queried with a single copy. For detecting entanglement they of course produce the same answer, but estimating the expectation value is far more resource-intensive than bounding it from above in the first place. The benefit of the direct approach is clear; the question, then, is how to operationally reformulate the former into the latter. This process of reformulation is one of the central topics of this review.
Such thinking engenders a curious divergence from the norm of quantum metrology wherein both the dimension of a system and the number of copies are seen as a given and large. On the other hand, this decision-theory centric approach, that has estimation as comparable to traversing a finite tree of outcomes to arrive at a final conclusion has been shown [22][23][24][25][26][27][28][29] to yield vastly improved complexity bounds for previously challenging measurement tasks.
By rephrasing the problem of verification in this decision-theoretic way we define our starting condition as the resources of an efficient strategy, such as a limited number of state copies, and then list measurement protocols that operate within these constraints. As an illustration of the method, consider testing some property with $N$ copies available, where $N$ is potentially low (e.g., few copies). Each copy may then be considered as a precious resource for the questions we are permitted to ask a quantum system in order to ascertain its properties. For example, we wish to test if the state $\rho \in A$ or $\rho \in \bar{A}$ (with $A \cup \bar{A}$ being the complete set of states), where $A$ denotes the property being tested (as in Figure 1). An efficient strategy is one where the queried system is overwhelmingly unlikely to pass a test condition if it does not contain the queried property $A$.
The strategy is as follows. A set of carefully designed and easy-to-perform measurements $Q = \{Q_1, Q_2, Q_3, \dots, Q_L\}$ that serve as queries to the system is constructed. For the $k$th instance of the $N$ copies of a state, a query $q_k \in Q$ is randomly chosen and applied to that instance, producing a sequence of query outcomes $i = (i_1, \dots, i_N)$ with $i_k \in \{0, 1\}$. This sequence, together with the sequence of chosen queries $q = (q_1, \dots, q_N)$, is then passed to a decision (cost) function $S(q, i)$ which produces a pass/fail result. We define a strategy to be efficient if the following probabilistic expression holds for a dimension-$d$ state $\rho$ with $N$ repetitions (queries):

$$P\left[S(q, i) = \text{pass} \mid \rho \in \bar{A}\right] \le e^{-\alpha(d, N)}.$$

This deceptively simple equation is at the heart of every strategy considered in this review. Conceptually it states that any estimator is only as good as its worst-case performance, which is dictated by its probability of failure, defined as a system passing a test protocol that it should fail. If this false positive probability has a functional dependence $\alpha(d, N)$ that grows in $N$ and does not vanish asymptotically in $d$ (for example, typically $\alpha(d, N) = O(1)N$ is dimension free), then failure is exponentially unlikely for all targets of the protocol and the strategy is deemed efficient. This concept is schematically depicted in Figure 1, where the probability that the target state $\rho$ contains the property $A$ builds exponentially fast with the number of questions $Q_k$ that are asked to repeated copies of $\rho$.
Conventionally, verification problems are distinguished from estimation problems. In recent years, however, there has been an opposing trend attempting to integrate both into a unified, information-theoretic framework [30,32]. In this respect, every partial tomography task (on finite-dimensional systems) may be posed in the decision-theoretic point of view introduced here. To clarify this point, consider the verification of a certain property (e.g. the presence of entanglement): the sampling complexity depends only on the required confidence $1 - \delta$, and typically $O(\log \delta^{-1})$ samples are required. On the other hand, we shall consider shadow-tomography-like tasks, where typically one is interested in estimating the mean values of a certain set of observables $A_1, \dots, A_M$ [28]. To embed this problem into the decision procedure one fixes the confidence $1 - \delta$ and error $\epsilon$ and poses the estimation as a yes-or-no procedure: given a set of observables $A_1, \dots, A_M$, do their mean values lie within an interval $\epsilon$ from some (estimated) value? The set of queries $Q_k$ is adapted to encompass the set of inequalities $|\langle A_n\rangle - A_{n,e}| < \epsilon$, with $\langle A_n\rangle$ being the ground truth and $A_{n,e}$ the estimated value. Assuming a good estimator, if we have preset the error value $\epsilon$ and confidence $1 - \delta$, then the procedure returns a binary outcome together with the set of estimates $\{\dots, A_{n,e}, \dots\}$. The sampling complexity ranges from $O(\log M\, \epsilon^{-2} \log \delta^{-1})$ for protocols such as those engendered by shadow tomography to the $O(d^2 \epsilon^{-2} \log \delta^{-1})$ samples required for full state tomography (see Figure 1 to the right). Thus verification and estimation in this framework can be put on equal footing, with the main difference being the inputs to the protocol (confidence $1 - \delta$ for verification versus confidence $1 - \delta$ and error $\epsilon$ for estimation) and their respective outputs (the estimation procedure returns the set of estimates in addition to the binary yes/no output).

Figure 1. Schematic of the probabilistic procedure. The probability $P_r$ that the quantum system contains the property $A$ is found by asking relevant questions $Q_k$ to the system. A probability close to 1 is indicated by a dark region, in contrast to a probability close to 0, associated to a lighter colour. Asking more and more questions builds up the probability that the system contains $A$. [Panel labels: $O(d^2)$; full state tomography.]
In a similar spirit, we require this demarcation not just in time but in space as well, insisting on simple-to-implement queries on each quantum state. This will almost always mean local queries on the target system alone, rather than, for example, global (entangled) measurements on multiple instances. Finally, the computation of the decision function $S(q, i)$ itself must also be efficient, in that it cannot have a computational complexity that depends on the system dimension in any significant way. To summarise our requirements:
1. Dimension demarcation: $\alpha(d, N)$ is not asymptotically small for large $d$, for example $\alpha(d, N) = O(1)N$.
2. Fast convergence in the number of queries: $\alpha(d, N)$ grows with $N$, typically linearly.
3. Low computational complexity, where the measurement queries $Q_k$ are implemented by local measurements or low-depth quantum circuits.
4. Simple post-processing, e.g. simple evaluation of the decision function $S(q, i)$.
This review will progress through query/answer strategies that satisfy these demanding properties in the following way. Section 2.1 constructs an explicit probabilistic detection scheme in keeping with the above framework. Section 2.2 considers what tasks may be performed using this protocol with the minimum access to a quantum state, converging on an entanglement verification protocol that uses only a single copy of a quantum state. Section 2.3 relaxes the single copy regime to that of dozens, observing the increase in information extraction possible in an experimental setting. Section 2.4 gives a brief summation of related works, accentuating the extension of our method to quantum state verification and certification. Section 3.1 considers the limit of the few-copy regime, considering the maximal amount of information one can extract from any quantum state, of any size, given a fixed number of samples. Finally, Section 4 contains a recapitulation of all important points, addressing works that go beyond techniques mentioned in the review and discusses open questions.

ENTANGLEMENT VERIFICATION
In searching for worthwhile tasks, it is not a contentious statement that entanglement represents a crucial resource in many quantum-information protocols [33]. For this reason, the task of entanglement verification has by necessity spurred the development of a variety of different approaches over the past years [34]. Traditionally, the methods of detection (see [34,35] for a focused review) rely on the estimation of expectation values of observables linked to certain fundamental inequalities, such as is the case of entanglement witnesses [35][36][37], Bell inequalities [38][39][40] or the use of quantum Fisher information [41][42][43], local uncertainty relations [44] and nonlinear witnesses [45].
Typically, strategies will involve testing if (some function of) the expectation value(s) of some observable(s) exceeds a certain threshold, such as testing if $\langle W\rangle < W_c$, and demand, in practice, repeated measurements on large ensembles of identically prepared copies. This can be costly in terms of experimental requirements, scaling to impossibility within just a few steps, as in photonic systems where coincidence rates fall exponentially fast in the system size [46]. An impressive example of this may be found in a recent 12-photon entanglement witness experiment [47], where the detection rate was approximately one copy per hour. The extraction of a mean value of a single local observable, which typically requires one hundred to one thousand copies of a given quantum state, in this case translated to an experiment duration measured in weeks. Such non-viability is a consequence of the indirect approach for testing entanglement. If instead we employ the direct method in which we pose the detection question differently, i.e., we ask "What is the chance for the system to achieve a value $W < W_c$ in a single-shot experiment?", we can gain a vast reduction in the detection complexity. In this respect, we will review several highly efficient methods [23,24,26] based on the information-theoretic framework introduced in the previous section.

Probabilistic detection scheme
Consider a quantum system consisting of $n$ subsystems, each residing in a finite-dimensional Hilbert space of dimension $d$. The first step in any partial tomography is to define the relevant set of queries $Q_m$ that will be used to interrogate the system, as no information may be gleaned without them. Commonly these correspond to certain binary local measurements associated with yes/no questions. For the sake of generality, we include here quantum measurements that go beyond binary logic, that is, positive-operator-valued measures (POVMs) with elements $E^{(k)}_{m,i}$. Here $k$ labels the subsystem, $m \in \{1, \dots, L\}$ the local measurement setting, and $i$ the measurement outcome. For every subsystem, we can generate one random query associated with the setting $m_k$ which, when applied to the $k$th party, results in some outcome $i_k$.
The probabilistic entanglement detection procedure, schematically shown in Figure 2, goes as follows:
1. A sequence of random local measurements $(m_1, \dots, m_n)$ drawn from a prior distribution $\Pi(m_1, \dots, m_n)$ is applied to a copy of the quantum state $\rho$ to generate the sequence of outcomes $(i_1, \dots, i_n)$.
2. A certain binary cost function of settings and outcomes, $S^{[n]} = S(m_1, \dots, m_n; i_1, \dots, i_n) \in \{0, 1\}$, is evaluated.
3. If $S^{[n]} = 1/0$ we associate "success/failure" to the experimental run.
The figure of merit for entanglement detection is the probability of success $P[S^{[n]} = 1]$. In essence, the cost functions are constructed such that this probability vanishes exponentially fast in the size of the system $n$ and/or in the number of repetitions $N$ for all separable states $\rho_{\mathrm{sep}}$:

$$P_{\rho_{\mathrm{sep}}}\left[S^{[n]} = 1\right] \le e^{-\alpha(n)},$$

where $\alpha(n)$ is a function depending on the particular strategy and the system's size, so that $N$ successful runs occur with probability at most $e^{-N\alpha(n)}$. On the other hand, the procedure is tailored to detect entanglement in the vicinity of some target state $\rho_T$, i.e., $P_{\rho_T}[S^{[n]} = 1] \approx 1$. Thus, given the target-state preparations and desired detection confidence $1 - \delta$, we can estimate the average number of copies required to verify entanglement:

$$N \approx \frac{\log \delta^{-1}}{\alpha(n)}.$$

As long as $\alpha(n)$ is not vanishingly small with the size $n$, for example $\alpha(n) = O(1)$, the number of copies grows only logarithmically in $\delta^{-1}$. Considering it in the opposite direction: the confidence for entanglement detection grows exponentially fast in the number of repetitions $N$, which constitutes what we dub the few-copy detection regime [24], where we achieve high-confidence detection by measuring only (thus the name) a few copies of the system (see Section 2.3).
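The copy-count estimate can be made concrete with a few lines of Python. This is only a sketch under a stated assumption: each run is taken to have a separable pass probability of at most exp(-alpha), and entanglement is declared after N successful runs, so the false-positive probability is at most exp(-N * alpha). The function name `copies_needed` and the numerical values are illustrative and not taken from the reviewed works.

```python
import math


def copies_needed(alpha: float, delta: float) -> int:
    """Smallest integer N >= 1 with exp(-N * alpha) <= delta.

    alpha: assumed per-run exponent of the separable bound, P_sep <= exp(-alpha).
    delta: tolerated false-positive probability (the confidence is 1 - delta).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return max(1, math.ceil(math.log(1 / delta) / alpha))


# Illustrative values only: an exponent that is constant in n (few-copy regime) ...
few_copy = copies_needed(alpha=math.log(1.5), delta=0.05)
# ... versus an exponent that grows linearly in n, here with n = 8 (single-copy regime).
single_copy = copies_needed(alpha=8 * math.log(1.5), delta=0.05)
```

With a constant exponent the required number of copies depends only on the target confidence, whereas an exponent that grows with the system size drives the count down to a single copy.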
The reduction of resources can be further traced down in the case where α(n) grows in n. In this case, for a sufficiently large system (large n) this number is reduced to the logical minimum leading to the single-copy detection [23,48]. This possibility is presented in detail in the next section.
An important aspect of these methods is that they bypass the so-called i.i.d. (independent and identically distributed) assumption taken for granted in standard approaches. This assumption means that a source produces identical copies of a quantum state in every experimental run. This is very questionable from a practical point of view, especially given the lack of perfect control and manipulation, as is the case for NISQ systems. In contrast, the methods presented here circumvent the i.i.d. assumption through random sampling of a set of measurement queries. In this case, entanglement is seen as the ability of a system to compute a certain cost function (as quantified by the probability of success) in a single-shot experiment. In such a construction of the problem, the i.i.d. requirement may be relaxed without compromising the protocol.

Single-copy scenario
We review the construction of the single-copy detection procedure for k-producible states [49], which naturally extends to cluster states [50]. Further examples include ground states of local Hamiltonians with an entanglement gap [51], among which we find many important classes of quantum states, such as matrix product states [52] and projected entangled pair states [53]. In all examples provided we explicitly constrain ourselves to a single experimental repetition ($N = 1$) and attempt to optimise the chance of entanglement detection. We put the main emphasis on the construction of the protocol, i.e., the appropriate choice of the settings and cost function.

Example of k-producible quantum state
We start with the example of the k-producible entangled state [49], i.e., $|\phi_1\rangle |\phi_2\rangle \dots |\phi_m\rangle$, where the products $|\phi_s\rangle$ involve at most $k$ parties. Our aim is to show that entanglement can be detected with one copy of an $n$-folded state as long as $n$ is large. To clarify the probabilistic procedure even better, we take the target state to be the product of quantum singlets

$$|\psi^-\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right).$$

The quantum singlet has the property of being the only state that returns perfect anti-correlations (the outcome $-1$) when measured with any one of the operators $X \otimes X$, $Y \otimes Y$, or $Z \otimes Z$. Therefore, the suitable measurements to identify the singlet uniquely are the following projectors:

$$Q_X = \frac{1 - X \otimes X}{2}, \quad Q_Y = \frac{1 - Y \otimes Y}{2}, \quad Q_Z = \frac{1 - Z \otimes Z}{2}.$$

The pertinent fact is that no separable state may reveal $Q_X = Q_Y = Q_Z = 1$ simultaneously; as already emphasised, this is the unique property of the target singlet state. Thus, the maximum probability to obtain the outcome 1 for all separable inputs, if measurement settings are uniformly sampled from the set $\{XX, YY, ZZ\}$, is 2/3:

$$P_{\rho_{\mathrm{sep}}}[1] = \frac{1}{3}\,\mathrm{Tr}\left[(Q_X + Q_Y + Q_Z)\,\rho_{\mathrm{sep}}\right] \le \frac{2}{3} \tag{5}$$

for all separable two-qubit states $\rho_{\mathrm{sep}}$. With this we can construct the detection procedure for $n$ pairs as follows: the set of $2n$ qubits is divided into consecutive pairs and, for each pair, a random measurement from the set $\{XX, YY, ZZ\}$ is applied to get a sequence of results $\dots, (i_k, j_k), \dots$. From these measurement outcomes we construct the following local cost function for every pair, $S_k = \frac{1}{2}\left[1 - (-1)^{i_k + j_k}\right]$, where $k = 1, \dots, n$ labels the qubit pair. Now, given bound (5), the relative frequency of the outcome 1 shall not exceed 2/3 significantly for all separable states.

Figure 2. A single copy of an $n$-partite quantum system $\rho$ is interrogated via random (local) measurements $m_1, m_2, \dots, m_n$. The performance of the system is measured via the evaluation of a cost function $S^{[n]}$. Repeating this procedure $N$ times, the probability of detecting entanglement goes to unity exponentially fast in $N$ for target state preparations, i.e., the (lower bound on the) detection confidence grows exponentially fast in $N$.

Therefore, we define the overall test to be

$$S^{[n]} = \begin{cases} 1, & \frac{1}{n}\sum_{k=1}^{n} S_k \ge \frac{2}{3} + \epsilon, \\ 0, & \text{otherwise}, \end{cases}$$

where $\epsilon > 0$ is a free parameter. The overall probability of success reads

$$P_{\rho_{\mathrm{sep}}}\left[S^{[n]} = 1\right] = P\left[\sum_{k=1}^{n} S_k \ge \left(\tfrac{2}{3} + \epsilon\right) n\right].$$

Using the standard Chernoff bound [54] we obtain

$$P_{\rho_{\mathrm{sep}}}\left[S^{[n]} = 1\right] \le e^{-D\left(\frac{2}{3} + \epsilon \,\|\, \frac{2}{3}\right) n}, \tag{8}$$

where $D(x\|y) = x \log\frac{x}{y} + (1 - x)\log\frac{1 - x}{1 - y}$ is the Kullback-Leibler divergence. The probability of success vanishes exponentially fast in $n$ for all $\epsilon > 0$. The procedure is convenient as we do not have to set $\epsilon$ in advance, i.e., we calculate $\epsilon$ as the experimental deviation of the measured sum $\frac{1}{n}\sum_{k=1}^{n} S_k$ from the separable bound 2/3.

In the perfect case of $n$ singlets, $|\psi_0\rangle = |\psi^-\rangle^{\otimes n}$, we shall measure $S_k = 1$ deterministically, thus we find that $\epsilon = 1/3$. The bound (8) becomes

$$P_{\rho_{\mathrm{sep}}}\left[S^{[n]} = 1\right] \le e^{-D\left(1 \,\|\, \frac{2}{3}\right) n} = \left(\frac{2}{3}\right)^{n}.$$

Therefore, if $n$ is large enough, a single copy of $|\psi_0\rangle$ is sufficient to certify entanglement with high probability. For example, already for $n = 8$, the confidence level for entanglement detection is at least 96%.

Before we proceed further, let us illustrate the i.i.d. issue in the following situation. Suppose that we have only $n = 8$ qubit pairs at our disposal and we want to inspect the presence of entanglement. Given the prescription above, we may try to measure the witness operator $W = \frac{1}{3}(Q_X + Q_Y + Q_Z)$. However, it is not clear how to divide 8 pairs into three groups to measure the three local observables $Q_X$, $Q_Y$ and $Q_Z$. Also, there is no guarantee for these pairs to be in an i.i.d. state $\rho_{12}^{\otimes 8}$, which seems to be needed for separate estimation of $\langle Q_X\rangle$, $\langle Q_Y\rangle$ and $\langle Q_Z\rangle$. In this case, it is not clear how to proceed. For example, we may use the first three copies to measure $Q_X$, the second three to measure $Q_Y$, and the last two for the measurement of $Q_Z$. However, if the order of measurements is known in advance we may arrive at a false entanglement verification: a product state of the form

$$|\phi_p\rangle = |x_+ x_-\rangle^{\otimes 3} \otimes |y_+ y_-\rangle^{\otimes 3} \otimes |z_+ z_-\rangle^{\otimes 2},$$

where $|x_\pm\rangle$, $|y_\pm\rangle$ and $|z_\pm\rangle$ denote the $\pm 1$ eigenstates of $X$, $Y$ and $Z$, gives exactly the same result as the i.i.d. singlet state $|\psi^-\rangle^{\otimes 8}$ for these fixed measurements. The key to surpassing the i.i.d. assumption is random sampling and the probabilistic detection described above. It provides a clear separation between the state $|\psi^-\rangle^{\otimes 8}$ and the product state $|\phi_p\rangle$, as the latter has only a chance of $(2/3)^8 \approx 0.039$ in the best case to reveal the result $S_1 + \dots + S_8 = 8$. In contrast, the experiment with the singlet-state preparation $|\psi_0\rangle$ always reveals "success", thus we verify entanglement with at least $C_{\min} = 1 - 0.039 \approx 0.96$ confidence.
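The numbers quoted for the singlet test can be checked numerically. The sketch below uses NumPy to build the projectors onto anti-correlated outcomes, evaluates the per-pair pass probability under uniformly random settings for the singlet and for one product state, and spot-checks the 2/3 separable bound on randomly drawn product states. The helper names (`anticorrelation_projector`, `pass_probability`, `random_qubit`) and the random search are illustrative assumptions, not part of the reviewed protocol.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def anticorrelation_projector(pauli):
    """Projector onto the -1 eigenspace of pauli (x) pauli."""
    return (np.eye(4, dtype=complex) - np.kron(pauli, pauli)) / 2


# Average of the three projectors: pass probability for a uniformly random setting.
W = sum(anticorrelation_projector(p) for p in (X, Y, Z)) / 3


def pass_probability(state):
    """Probability that one pair in the pure state `state` yields S_k = 1."""
    return float(np.real(np.vdot(state, W @ state)))


singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
product_01 = np.array([0, 1, 0, 0], dtype=complex)  # |01>, anti-correlated in Z only

p_singlet = pass_probability(singlet)
p_product = pass_probability(product_01)

# Numerical spot check of the 2/3 bound on randomly drawn product states.
rng = np.random.default_rng(0)


def random_qubit():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)


best_product = max(
    pass_probability(np.kron(random_qubit(), random_qubit())) for _ in range(2000)
)

# Single-copy numbers for n = 8 pairs.
n = 8
false_positive = (2 / 3) ** n
confidence = 1 - false_positive
```

The product state |01> already saturates the separable bound, which is why the false-positive probability for n pairs cannot be pushed below (2/3)^n by this argument.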

Single-copy detection of cluster states
Another example we present here is that of cluster states [50] as a natural generalisation of the previous example of k-producible state. In contrast however, cluster states contain genuine multiparty entanglement [55] and represent a universal resource for measurement-based quantum computation [56]. For simplicity, we work out in detail an example of a linear cluster state (LCS); generalisations of the scheme to higher dimensions are straightforward and briefly discussed at the end of the section.
The $n$-qubit LCS is uniquely defined by the set of $2^n$ stabilizers

$$G_1^{q_1} G_2^{q_2} \cdots G_n^{q_n}\,|\mathrm{LCS}\rangle = |\mathrm{LCS}\rangle,$$

where $G_k = Z_{k-1} X_k Z_{k+1}$ and $q_k = 0, 1$. Here $\{X_k, Y_k, Z_k\}$ is the set of standard Pauli matrices acting on the $k$th qubit and, without loss of generality, we have chosen the cluster state with periodic boundaries, i.e., $Z_{n+1} \equiv Z_1$ and $X_{n+1} \equiv X_1$. Let us analyse a small sub-cluster of four qubits (e.g. qubits $\{1, 2, 3, 4\}$) with the corresponding stabilizers acting exclusively on it:

$$G_2 = Z_1 X_2 Z_3, \quad G_3 = Z_2 X_3 Z_4, \quad G_2 G_3 = Z_1 Y_2 Y_3 Z_4.$$

Even though these three stabilizers commute, they are not locally compatible, which means one cannot measure all three simultaneously with local measurements. Therefore, there is no separable state for which $G_2 = G_3 = G_2 G_3 = +1$ simultaneously. Consequently, if we randomly choose to measure one of the stabilizers (each with probability 1/3), there is only a chance of 2/3 to get the result $+1$, for all separable inputs. This observation is what makes our detection method work. The strategy is to pick a random partition of the set of $n$ qubits into 4-qubit clusters and then measure one of the corresponding stabilizers randomly on each of them.
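The algebra behind the three sub-cluster stabilizers is easy to verify with a small Pauli-string multiplier. The following pure-Python sketch tracks phases explicitly; the string encoding (one letter per qubit, "I" for identity) and the function names are illustrative choices made here, not notation from the reviewed works.

```python
def _single(a: str, b: str):
    """Product of two single-qubit Paulis: returns (phase, pauli) with a*b = phase * pauli."""
    if a == "I":
        return 1, b
    if b == "I":
        return 1, a
    if a == b:
        return 1, "I"
    cyclic = {("X", "Y"): "Z", ("Y", "Z"): "X", ("Z", "X"): "Y"}
    if (a, b) in cyclic:
        return 1j, cyclic[(a, b)]
    return -1j, cyclic[(b, a)]


def multiply(p: str, q: str):
    """Multiply two equal-length Pauli strings qubit by qubit, tracking the phase."""
    phase = 1
    letters = []
    for a, b in zip(p, q):
        ph, c = _single(a, b)
        phase *= ph
        letters.append(c)
    return phase, "".join(letters)


# Stabilizers of the four-qubit sub-cluster {1, 2, 3, 4}.
G2 = "ZXZI"  # Z_1 X_2 Z_3
G3 = "IZXZ"  # Z_2 X_3 Z_4

phase, G2G3 = multiply(G2, G3)
commute = multiply(G2, G3) == multiply(G3, G2)

# Local bases required on qubit 2 by the three stabilizers: all different,
# so a single local setting cannot measure all three at once.
settings_qubit_2 = {G2[1], G3[1], G2G3[1]}
```

The product comes out as ZYYZ with phase +1, and qubit 2 would have to be measured in the X, Z and Y bases respectively, which is the local incompatibility used in the argument.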
Given our previous analysis, the relative frequency of the outcome $+1$ cannot substantially surpass the value of 2/3. It is convenient to introduce regular partitions (i.e., neighbouring clusters overlap on at most one qubit) of the $n$-qubit cluster state into $L$ 4-qubit clusters $\{c_{t_1}, c_{t_2}, \dots, c_{t_L}\}$, where $c_{t_k}$ is the cluster consisting of the sequence of four neighbouring qubits:

$$c_{t_k} = \{t_k, t_k + 1, t_k + 2, t_k + 3\}.$$

The set of all regular partitions of size $L$ is denoted by $C_L$.
For every cluster $c_{t_k}$ in the partition we associate three stabilizers:

$$G_{t_k+1}, \quad G_{t_k+2}, \quad G_{t_k+1} G_{t_k+2}.$$

To each of them we associate a projector onto the $+1$ outcome:

$$Q_{t_k} = \frac{1 + G_{t_k+1}}{2}, \quad W_{t_k} = \frac{1 + G_{t_k+2}}{2}, \quad R_{t_k} = \frac{1 + G_{t_k+1} G_{t_k+2}}{2}.$$

To these we associate the measurement settings $\{ZXZZ, ZZXZ, ZYYZ\}$, and we assign "success" to the cluster measurement only if the outcome $+1$ (for the value of the measured stabilizer) occurs. Formally speaking, for every cluster we define the following local cost function:

$$S_k = \begin{cases} 1, & \text{if the measured stabilizer yields } +1, \\ 0, & \text{otherwise}. \end{cases}$$

Finally, for a given partition $\{c_{t_1}, c_{t_2}, \dots, c_{t_L}\}$ the overall cost function is represented in the following way:

$$S^{[n]} = \begin{cases} 1, & \sum_{k=1}^{L} S_k \ge \left(\tfrac{2}{3} + \epsilon\right) L, \\ 0, & \text{otherwise}, \end{cases}$$

where $\epsilon > 0$ is a free parameter. We associate "success" to the experimental run if the number of local successes reaches the threshold of $(\frac{2}{3} + \epsilon)L$. The detection procedure goes as follows:
1. Randomly generate a partition $\{c_{t_1}, c_{t_2}, \dots, c_{t_L}\}$ of the $n$-qubit cluster state from the set $C_L$.
2. Draw one measurement setting for each cluster in the partition, each with probability 1/3.
3. Perform local measurements and collect the sequence of results $S_1, S_2, \dots, S_L$.
We shall analyse the probability to pass the test for separable states. Firstly, for all product states the local cost functions $S_k$ are independent binary random variables with $\langle S_k\rangle \le 2/3$ for all $k = 1, \dots, L$. The overall probability of success reads

$$P_{\rho_{\mathrm{prod}}}\left[S^{[n]} = 1\right] = P\left[\sum_{k=1}^{L} S_k \ge \left(\tfrac{2}{3} + \epsilon\right) L\right], \tag{17}$$

which is the probability that the sum of independent random variables $S_1 + \dots + S_L$ is at least $(\frac{2}{3} + \epsilon)L$. As $\langle S_k\rangle \le 2/3$, the sum $S_1 + \dots + S_L$ cannot exceed $\frac{2}{3}L$ significantly. Indeed, as before, the Chernoff bound holds (for a detailed proof see the Supplementary Information of [23]), i.e.,

$$P_{\rho_{\mathrm{prod}}}\left[S^{[n]} = 1\right] \le e^{-D\left(\frac{2}{3} + \epsilon \,\|\, \frac{2}{3}\right) L}, \tag{18}$$

where $D(x\|y)$ is the Kullback-Leibler divergence. As the bound holds for all product states, it also holds for their mixtures, i.e., for all separable states. On the other hand, for the case of the cluster state preparation $|\mathrm{LCS}\rangle$, each local cost function takes the value $S_k = 1$, thus we have $\epsilon = 1/3$. The bound (18) reduces to

$$P_{\rho_{\mathrm{sep}}}\left[S^{[n]} = 1\right] \le \left(\frac{2}{3}\right)^{L}.$$

For a sufficiently large number of qubits, even a single copy of the LCS suffices to certify the presence of entanglement with high probability. For example, already for $n = 24$, we have $L = 8$, which gives a confidence level greater than 95%.
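The Chernoff exponent used in the separable bounds can be evaluated directly. The sketch below implements the Kullback-Leibler divergence between two Bernoulli distributions (natural logarithm, with the convention 0 log 0 = 0) and the resulting upper bound on the separable pass probability. The function names and the argument `threshold` (the required success fraction, i.e. 2/3 plus the free parameter) are illustrative assumptions.

```python
import math


def kl_bernoulli(x: float, y: float) -> float:
    """D(x || y) for Bernoulli parameters 0 <= x <= 1 and 0 < y < 1."""
    def term(a: float, b: float) -> float:
        # Convention: 0 * log(0 / b) = 0.
        return 0.0 if a == 0 else a * math.log(a / b)

    return term(x, y) + term(1 - x, 1 - y)


def separable_pass_bound(threshold: float, num_clusters: int, p_sep: float = 2 / 3) -> float:
    """Chernoff upper bound exp(-D(threshold || p_sep) * L) on the separable pass probability."""
    return math.exp(-kl_bernoulli(threshold, p_sep) * num_clusters)


# Ideal cluster-state preparation: every local test succeeds, so the threshold is 1.
bound_lcs = separable_pass_bound(threshold=1.0, num_clusters=8)
confidence_lcs = 1 - bound_lcs

# A noisier run that only reaches a success fraction of 0.9 gives a weaker bound.
bound_noisy = separable_pass_bound(threshold=0.9, num_clusters=8)
```

For a threshold of 1 the divergence reduces to log(3/2), which reproduces the (2/3)^L form of the bound; lower observed success fractions yield smaller exponents and hence lower confidence for the same number of clusters.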
Finally, let us comment briefly on the generalization to the higher-dimensional case. In the case of a 2D cluster state, one can introduce partitions into 4×4 qubit clusters with the corresponding stabilizer projectors (in complete analogy to $Q_{t_k}$, $W_{t_k}$ and $R_{t_k}$ for the LCS) and define the local cost functions. The 2D detection scheme also consists of drawing a random partition followed by a random measurement of local projectors on individual clusters. A separable bound similar to (18) can be derived. On the other hand, if the 2D cluster state is the input state, the probability of success is 1.

Single-copy detection of ground-states of local Hamiltonians
One of the strong reasons why the single-copy entanglement scheme works for the cluster states is the robustness of entanglement to local perturbations, meaning that local measurements on qubits do not destroy the entanglement between the remaining qubits completely. Thus one can expect other classes of states sharing this property to admit single-copy entanglement detection. The ground states of local Hamiltonians share this property [57]; therefore they are good candidates. Let us consider an $L$-local Hamiltonian on some graph of $n$ particles, $H = \sum_{k=1}^{n} H^{(k)}$, where $H^{(k)}$ acts on at most $L$ subsystems ($L$ is fixed and independent of $n$). Now, let $|\psi_0\rangle$ be the ground state of the Hamiltonian, $H|\psi_0\rangle = n\epsilon_0|\psi_0\rangle$, where $E_0 = n\epsilon_0$ is the ground-state energy. We are working with Hamiltonians that exhibit the so-called entanglement gap $g_E = \epsilon_{\mathrm{sep}} - \epsilon_0 > 0$, where $\epsilon_{\mathrm{sep}} = \frac{1}{n}\min_{\rho_{\mathrm{sep}}} \mathrm{Tr}(H\rho_{\mathrm{sep}})$ is the minimal obtainable energy per particle by a separable state [51]. The main idea of the procedure is to use the mean energy $\langle H\rangle$ as an entanglement witness: $\langle H\rangle \ge n\epsilon_{\mathrm{sep}}$ holds for all separable states, while at least the ground state violates this bound. This fact can be exploited to develop an efficient probabilistic procedure by employing a tomographically complete set of measurements. In this case, the operator $H$ translates into a classical random variable $H^{[n]}$ which serves to witness entanglement in practice (the general procedure is explained in detail in Section 2.3). The central object for our detection protocol is then the following overall cost function: where $0 < \delta < \epsilon_{\mathrm{sep}} - \epsilon_0 = g_E$ is a free parameter. Since $\langle H\rangle \ge n\epsilon_{\mathrm{sep}}$ holds for all separable states, for the case of $n$ being large, $H^{[n]}$ is unlikely to fall below the separable bound $n\epsilon_{\mathrm{sep}}$ in a single-shot experiment. Indeed, analogously to the previous two examples, one can derive the Chernoff bound for all separable states: where $\kappa > 0$ is constant.
Thus, for all separable inputs, the probability of success vanishes exponentially fast with n. In contrast, for the ground-state preparation $|\psi_0\rangle$, the probability of success approaches 1 in the thermodynamic limit, as follows from the following bound: where $\beta > 0$ is constant. The first inequality (21) is a consequence of McDiarmid's inequality, while the second (22) is derived by using Chebyshev's inequality. Both bounds are rigorously derived in the Supplementary Information of Ref. [23].

Tolerance to noise
In the end, we briefly comment on the effects of noise on single-copy entanglement detection. Consider an n-partite target state $\rho_0$ which passes the single-copy test with probability $p_0$. In practice, one needs on average $1/p_0$ copies of $\rho_0$ to detect entanglement. On the other hand, let the separable bound hold, meaning that the probability of success for all separable inputs is exponentially small in n. We consider a mixture $\rho = \lambda\rho_{sep} + (1-\lambda)\rho_0$, where $\rho_{sep}$ is an arbitrary separable state and the parameter $0 < \lambda < 1$ quantifies the amount of noise. The overall probability of success is a mixture of probabilities $P_\rho = \lambda P_{\rho_{sep}} + (1-\lambda)P_{\rho_0} \approx (1-\lambda)p_0$, as long as $(1-\lambda)p_0$ is significantly larger than $P_{\rho_{sep}} = O(\exp[-nc])$. This implies that noise impacts detection by suppressing the probability of success by a factor $1-\lambda$, for any kind of noise representable by a separable state. Therefore, one requires on average $1/[(1-\lambda)p_0]$ experimental runs to confirm the presence of entanglement. This represents a strong resistance to noise as long as $(1-\lambda)p_0$ is not exponentially small in n. For example, if $(1-\lambda)p_0 > 0$ is constant and independent of n, then we verify entanglement with a fixed cost in terms of the number of samples. This scenario is very different from conventional detection techniques. Generally, a witnessing method tolerates noise only below a certain critical point, i.e., $\lambda < \lambda_c$, meaning that if the noise passes the threshold, the scheme fails to detect entanglement.
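The noise argument above reduces to simple arithmetic, which can be sketched as follows. This is only an illustration under the assumptions of the paragraph: the function name `expected_runs` and its parameters are hypothetical, and the separable contribution `p_sep` is treated as a given (exponentially small) number rather than derived.

```python
def expected_runs(p0: float, lam: float, p_sep: float = 0.0) -> float:
    """Average number of single-copy runs until the first success.

    Models the noisy state rho = lam * rho_sep + (1 - lam) * rho_0, where
    p0 is the success probability of the target state and p_sep the
    (assumed negligible) success probability of the separable part.
    """
    if not (0.0 <= lam < 1.0):
        raise ValueError("noise parameter lam must lie in [0, 1)")
    if not (0.0 < p0 <= 1.0):
        raise ValueError("target success probability p0 must lie in (0, 1]")
    # Success probability of the mixture: lam * P_sep + (1 - lam) * p0
    p_success = lam * p_sep + (1.0 - lam) * p0
    # Mean of a geometric distribution with success probability p_success
    return 1.0 / p_success
```

For instance, `expected_runs(1.0, 0.5)` evaluates to 2: with half of the preparations replaced by separable noise, twice as many runs are needed on average, independently of n.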

Entanglement detection with a few copies
In this section we review an entanglement detection method where the required number of copies grows only logarithmically with the required confidence, as shown in equation (3). The main goal of this section is to translate one of the most common methods for entanglement detection, namely the one based on entanglement witnesses [36,37] (see [35] for a concise review), into an efficient framework that requires only a few experimental repetitions.
What makes the witness-based technique practical is the simplicity of its detection criterion, based on a simple mean value estimation of a single (witness) observable. Specifically, an observable W is designated a witness if $\langle W\rangle = \mathrm{Tr}(W\rho_{sep}) \ge 0$ for all separable states $\rho_{sep}$, while $\langle W\rangle < 0$ holds for at least one entangled state. In principle, we can construct an entanglement witness for every entangled state ρ (theorem of completeness of witnesses [58]), which is then used to detect entanglement in a target state. While straightforward, a drawback of the method is that the witness W cannot be accessed locally; instead it must be decomposed into a sum of local observables $W = \sum_{i=1}^{L} W_i$ that must be individually estimated. This means that the mean value $\langle W\rangle$ is obtained from the $\langle W_i\rangle$'s, each of which is measured in an independent experiment. The sampling complexity of the procedure is therefore dependent on the number of local terms L, which becomes a significant factor for generic witnesses on a large system. To overcome this problem, remarkable effort has been put into constructing entanglement witnesses whose measurement requires a smaller number of measurement settings, thus reducing the experimental requirements [59][60][61][62] (for more references, see the recent review [34]). For example, refs. [63,64] find optimal decompositions of entanglement witnesses into a few local operators, even reducing in some cases the witness decomposition to only two local operators [65]. However, even with a minimal number of measurement settings, this method may become inconvenient or even infeasible simply due to the lack of a sufficient number of copies of the resource state needed to extract the witness expectation value. In such cases, alternative methods going beyond mean-value extraction are required. We review here the general method developed in Ref. [24] that translates the witness method into the resource-efficient probabilistic framework described in Section 2.1.
In this scenario, the typical procedure achieves very high confidence in entanglement detection with just a few experimental repetitions (copies of the target state). As we shall see, the number of measurement settings involved in the local decomposition is not the crucial parameter determining the sampling complexity, in contrast to the standard belief [65]. We also review an experiment performed with a photonic system to test the practicality of the method [66].

Embedding entanglement witnesses in a probabilistic detection framework
The aim of this section is to review the translation of any entanglement witness into the probabilistic framework. As previously discussed, an entanglement witness W is normalised such that $\langle W\rangle_s = \mathrm{Tr}(W\rho_s) \ge 0$ (23) for all separable states $\rho_s$. On the other hand, there exists at least one entangled state ρ for which $\langle W\rangle = \mathrm{Tr}(W\rho) < 0$. The witness operator is normally tailored to detect entanglement in the vicinity of some target state for which $\langle W\rangle$ reaches the lowest possible value. We shall slightly change the general form of W and introduce the witness operator O in the following way: $W = \gamma_s\mathbb{1} - O$ (24), thus equation (23) becomes $\langle O\rangle_s \le \gamma_s$ (25). We then decompose O into L local observables, $O = \sum_{i=1}^{L} O_i$, and shift each term by a constant α chosen such that $O'_i = O_i + \alpha\mathbb{1}$ is non-negative, so that $O' = \sum_{i=1}^{L} O'_i = O + \alpha L\mathbb{1}$, which is a positive semi-definite operator. Inequality (25) translates to the new condition: $\langle O'\rangle_s = \langle O\rangle_s + \alpha L \le \gamma_s + \alpha L$ (26) for all separable states $\rho_s$. We now write the spectral decomposition of $O'_i$ in terms of eigenprojectors (i.e., binary observables) $M_{ik}$ as $O'_i = \sum_{k=1}^{J_i} \lambda_{ik} M_{ik}$, where $\lambda_{ik} \ge 0$ because the $O'_i$'s are non-negative. Here $J_i$ counts the non-zero eigenvalues of $O'_i$. Since the $O'_i$ are local, the $M_{ik}$ can also be chosen to be local operators. To simplify the notation, we define the constant $\tau = \sum_{ik}\lambda_{ik}$ and the weights $\mu_{ik} = \lambda_{ik}/\tau$, so that (26) becomes $\sum_{ik}\mu_{ik}\langle M_{ik}\rangle_s \le (\gamma_s + \alpha L)/\tau \equiv p_s$ (27) for all separable states $\rho_s$. The last formula completely determines a probabilistic procedure for detection. Namely, since $\sum_{ik}\mu_{ik} = 1$ and $\mu_{ik} \ge 0$, these numbers are sampling probabilities for the local binary observables $M_{ik}$. The LHS of the equation is just the probability of success to get $M_{ik} = 1$, while the RHS is the corresponding separable bound $p_s$. On the other hand, for target state preparations we have a violation of the separable bound (26), which directly translates to a different probability of success (the entanglement value) $p_e = (\gamma_e + \alpha L)/\tau$, with $\gamma_e > \gamma_s$ or equivalently the deviation $p_e - p_s > 0$.
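To summarise, the procedure consists of the following steps: (i) a local binary observable $M_{ik}$ is drawn at random with probability $\mu_{ik}$ and measured on a single copy of the state, and the run counts as a success if the outcome is $M_{ik} = 1$; (ii) this is repeated on N copies; (iii) the observed frequency of successes $S[N]$ is compared with the separable bound $p_s$. As before, we do not expect $S[N]$ to significantly exceed the separable bound $p_s$ for all separable states, which is encapsulated in the following Chernoff bound: $P_s(S[N] \ge p_s + \epsilon) \le e^{-D(p_s+\epsilon\|p_s)N}$ (28), where $D(x\|y) = x\log\frac{x}{y} + (1-x)\log\frac{1-x}{1-y}$ is the Kullback-Leibler divergence. On the other hand, for target state preparation we expect $S[N] \approx p_e$ and thus the average number of target-state copies needed to achieve some fixed confidence $C = 1-\delta$ is estimated as $N \approx \log\delta^{-1}/D(p_e\|p_s)$ (29). This number grows in a logarithmic fashion with the required confidence and, as we shall see from the examples below, only a few copies are needed to detect entanglement with a very high confidence.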
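As a numerical illustration of the copy estimate, the sketch below evaluates $N \approx \log\delta^{-1}/D(p_e\|p_s)$, the form of equation (29) that is used in Example I of the text. The function names `kl_bernoulli` and `copies_needed` are hypothetical, and natural logarithms are assumed.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """Kullback-Leibler divergence D(p||q) between two Bernoulli
    distributions, in nats, with the convention 0 * log(0) = 0."""
    if not (0.0 < q < 1.0):
        raise ValueError("q must lie strictly between 0 and 1")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must lie in [0, 1]")
    d = 0.0
    if p > 0.0:
        d += p * math.log(p / q)
    if p < 1.0:
        d += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return d


def copies_needed(p_e: float, p_s: float, delta: float) -> float:
    """Estimated number of copies to reach confidence 1 - delta when the
    target state succeeds with probability p_e and separable states
    succeed with probability at most p_s (assumes p_e > p_s)."""
    if p_e <= p_s:
        raise ValueError("the entanglement value must exceed the separable bound")
    return math.log(1.0 / delta) / kl_bernoulli(p_e, p_s)
```

With `p_e = 1`, `p_s = 3/4` and `delta = 0.01` this gives approximately 16 copies, and the estimate does not involve the system size at all.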

Example I: Projective witness for graph states
Consider the standard projective witness for a graph state $|G\rangle$ [35]: $W = \frac{1}{2}\mathbb{1} - |G\rangle\langle G|$ (30), tailored for detection of genuine multipartite entanglement. This witness comes already in the form of (24), and it is therefore straightforward to identify the parameter $\gamma_s = 1/2$ and the observable $O = |G\rangle\langle G|$. We also have the local decomposition $O = \sum_{i=1}^{2^n} S_i/2^n$, where the $S_i$ are stabilizers of the state $|G\rangle$ and are in general tensor products of the Pauli operators [67]. One can therefore easily identify $L = 2^n$ and $O_i = S_i/2^n$. The operators $O_i$ have to be shifted by $\alpha_i = 1/2^n$ to get non-negative observables, $O' = \sum_{i=1}^{2^n}(S_i/2^n + \mathbb{1}/2^n)$. These are already in eigenform, thus we have $J_i = 1$, $\tau = 2$, $\lambda_i = 2/2^n$ and $M_i = (S_i + \mathbb{1})/2$. The sampling probabilities are $1/2^n$ and the separable bound is calculated from $\frac{1}{2^n}\sum_{i=1}^{2^n}\langle M_i\rangle_s \le (\gamma_s + \alpha L)/\tau = 3/4$. On the other hand, for the target state preparation $\rho_T = |G\rangle\langle G|$ we have $\frac{1}{2^n}\sum_{i=1}^{2^n}\langle M_i\rangle = 1$, thus the entanglement value reads $p_e = 1$. To estimate the number of copies, we can choose, for example, a confidence of $1 - \delta = 0.99$. Equation (29) gives us $N \approx \log[(1-0.99)^{-1}]/D(1\|3/4) \approx 16$, which is a notably small number. A naive approach of measuring all $2^n$ observables $M_i$ independently will quickly become infeasible, while with the probabilistic detection we achieve the same confidence with a constant number of copies, regardless of the system size.

Example II: witness requiring two local measurements
The second example we consider is the witness tailored to detect entanglement in the n-qubit cluster state $|C\rangle$ presented in Ref. [65] (an equivalent example is also presented there for the GHZ state, which can be adapted here in full analogy). An optimal witness decomposition for detecting genuine multipartite entanglement requiring only two measurement settings is found: $W = 3\,\mathbb{1} - 2\left[\prod_{\mathrm{even}\ i}\frac{\mathbb{1}+G_i}{2} + \prod_{\mathrm{odd}\ i}\frac{\mathbb{1}+G_i}{2}\right]$ (33), with $i = 1, \dots, n$. The observables $G_i$ are called generators of the state (in this case the cluster state $|C\rangle$), and constitute a subset of the stabilizing operators $S_i$. To translate this witness, we can apply the procedure described in Subsection 2.3.1. Firstly, we easily identify $\gamma_s = 3$ and $O = 2\left[\prod_{\mathrm{even}\ i}\frac{\mathbb{1}+G_i}{2} + \prod_{\mathrm{odd}\ i}\frac{\mathbb{1}+G_i}{2}\right]$. We notice that O is already decomposed into two non-negative binary observables $M_1 = \prod_{\mathrm{even}\ i}\frac{\mathbb{1}+G_i}{2}$ and $M_2 = \prod_{\mathrm{odd}\ i}\frac{\mathbb{1}+G_i}{2}$, and the sampling probabilities are 1/2. The separable bound is given by $\frac{1}{2}\left(\langle M_1\rangle_s + \langle M_2\rangle_s\right) \le 3/4$. On the other hand, the target state preparation returns $p_e = 1$ and the estimated number of copies entirely matches the analysis provided in the previous example. From here we see that although the projective witness (30) involves exponentially many terms in the local decomposition, it performs equally well as the witness with only two settings.

Generic witness
In the last two examples, the sampling complexity was completely independent of the system size: the average number of required copies solely depends on the required confidence for entanglement detection. However, we cannot expect such size-free behaviour in the general case. The key parameter dictating the scaling behaviour is the deviation between the entanglement value and the separable bound, $p_e - p_s$, which can become asymptotically small with the size of the system. To illustrate this, we consider the example of the following witness constructed to detect entanglement in the vicinity of the state stabilized by the set $S_1, \dots, S_n$ [68]. The translation procedure is very straightforward in this case resulting in the following separable bound where $M_i = (1 + S_i)/2$, while for the target state preparation we have $p_e = 1$. In this case, the estimated number of copies follows from formula (29). For large n this can be approximated with $N \approx n\log\delta^{-1}$, which defines a linear growth in the system size. In the general case, supposing that $\epsilon_0 = p_e - p_s$ is asymptotically small in n, we have two regimes: if $p_e = 1$, formula (29) gives $N \approx \log\delta^{-1}/\epsilon_0$, while for $p_e < 1$ this scaling becomes quadratically worse, $N \approx 2p_e(1-p_e)\log\delta^{-1}/\epsilon_0^2$. For a generic witness, as long as $\epsilon_0^{-1} = \mathrm{poly}[n]$, the procedure remains efficient.
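The two regimes quoted above can be checked with a short expansion. The derivation below is a sketch that assumes formula (29) has the form $N \approx \log\delta^{-1}/D(p_e\|p_s)$ with $D$ the binary Kullback-Leibler divergence (the form used in Example I), and writes $\epsilon_0 = p_e - p_s$.

```latex
% Binary Kullback-Leibler divergence
D(p\|q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}

% Regime p_e = 1, p_s = 1 - \epsilon_0: only the first term survives
D(1\|1-\epsilon_0) = -\ln(1-\epsilon_0) = \epsilon_0 + O(\epsilon_0^2)
\quad\Longrightarrow\quad
N \approx \frac{\ln\delta^{-1}}{\epsilon_0}

% Regime p_e < 1, p_s = p_e - \epsilon_0: D and its first derivative in q
% vanish at q = p, and \partial_q^2 D(p\|q)\big|_{q=p} = \frac{1}{p(1-p)}
D(p_e\|p_e-\epsilon_0) = \frac{\epsilon_0^2}{2p_e(1-p_e)} + O(\epsilon_0^3)
\quad\Longrightarrow\quad
N \approx \frac{2p_e(1-p_e)\ln\delta^{-1}}{\epsilon_0^2}
```

The linear versus quadratic dependence on $\epsilon_0^{-1}$ therefore comes entirely from whether the divergence has a first-order term, which happens only when the target state passes with certainty.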

Experimental scenario
The theoretical framework presented above was tested in the experiment presented in Ref. [24]. The setup was designed to produce the following six-photon cluster state  (30) and (33). The binary observables $M_i$ defining the witness $W_1$ were sampled N = 160 times, while the $M_i$ constituting $W_2$ were drawn N = 150 times. The observed deviation $\epsilon = S[6] - 3/4$ from the separable bound was plugged into equation (28) to put a lower bound on the confidence for entanglement detection. Figures 3b, c provide the experimental plots for the two witnesses. In the case of witness $W_1$, the plot in Figure 3b shows that only 50 copies of the experimental state are needed to verify genuine multipartite entanglement with at least 0.97 confidence, and that 112 suffice to reach at least 0.99. In the same way, using the witness $W_2$, it is visible from Figure 3c that 126 copies are enough to reach a confidence of at least 0.97. The deviations from the expected theoretical values are due to experimental imperfections that lead to a limited fidelity of F ≈ 0.75.

Related work
Probabilistic detection techniques similar to those presented here can be found in several other works. In the context of Bell's inequalities, similar kinds of probabilistic protocols are constructed for the single-shot non-locality detection [69] and entanglement detection via preparation games [70]. In the context of quantum state verification [18,[71][72][73], a single-shot entanglement verification naturally arises in bipartite states as long as the dimension of marginal systems becomes sufficiently large [74,75]. The generalisation to the GHZ states can be found in [76]. These results show a more intimate relation between our probabilistic detection and quantum state verification protocols. This is supported by the fact that our probability of success (to calculate the cost function) is usually maximised to 1 for the target state, thus the correct set of outputs does not witness only the presence of entanglement, it also indicates that the preparation state is close to the target state. Therefore, it seems that our protocols naturally extend from entanglement detection to more informative quantum state verification without significantly increasing the cost in terms of resources. Given this relation, we will review in what follows the basics of quantum state verification and its recent extension to the device-independent scenario and quantum state certification [27].

Quantum state verification and certification
Quantum state verification (QSV) is a protocol that verifies whether an unknown input state is close (in fidelity) to some target state. Due to its simplicity and low complexity, it has recently attracted a lot of attention in the community, and several verification protocols have been constructed for various classes of states [20,26,73,[77][78][79] together with experimental demonstrations [71,80,81]. From the theoretical point of view, QSV plays an important role in protocols such as blind quantum computation and quantum networks [82][83][84][85][86][87][88][89].
In this section, we recall the framework for QSV as defined by [18]. The main goal is to verify if a sequence of states $S_N = \{\sigma_1, \dots, \sigma_N\}$ is close to the target state $\sigma = |\psi\rangle\langle\psi|$ by using only local measurements. The measurement strategy labelled by Ω thus consists of L different local measurements $\{M_{i|m}\}$, where $m \in \{1, \dots, L\}$ labels the setting and $i \in \{0, 1\}$ the binary outcome. In the k-th round a measurement from Ω is randomly sampled (setting m with probability $p_m$) and applied to the state $\sigma_k$. We say that the state $\sigma_k$ passed the round if it returned the output i = 1. Otherwise, we say it failed. The first time a round is failed the process is aborted. The measurements are chosen such that the strategy operator $\Omega = \sum_m p_m M_{1|m}$ is uniquely optimised for the target state: $\Omega|\psi\rangle = +1|\psi\rangle$, meaning that only the target state passes all verification rounds with probability 1. Under the premise that either all emitted states are at least ε away from the target state, $\langle\psi|\sigma_k|\psi\rangle \le 1 - \epsilon$, or all of them are actually target states, $\sigma_k = |\psi\rangle\langle\psi|$, one can derive the average number of tests $N = \frac{\log\delta^{-1}}{\nu(\Omega)\,\epsilon}$ needed to achieve the confidence of $1-\delta$. The value $\nu(\Omega)$ is the so-called spectral gap, i.e., the difference between the largest and the second-largest eigenvalue of Ω.
The sampling complexity of QSV is optimal in the error ε up to a constant factor, as the best strategy is achieved by the projective measurement onto the target state, $\{|\psi\rangle\langle\psi|, \mathbb{1} - |\psi\rangle\langle\psi|\}$, resulting in $\sim \epsilon^{-1}\log\delta^{-1}$ scaling. While this is a remarkable result, the downside of the QSV scheme as proposed by [18] is its impracticality, i.e., the verification condition that all states are either at least ε away from the target or are all target states. Such an assumption is very hard to justify operationally and extremely hard to achieve in the laboratory [80]. In our recent work, we relax this assumption and fully adapt the protocol to device-independent (DI) quantum state verification [27]. In this case, no devices are characterised or trusted and all operations are treated as black boxes [90][91][92][93]. Remarkably, we have shown that the optimal scaling of N translates to the DI scenario. The scheme is more practical as it tolerates $O(\epsilon)$ failure events during the verification process without losing the optimal scaling. A general drawback of QSV is that the verification process destroys the quantum resource, and the conclusion is made about a resource which is fully consumed. This prevents the possibility of using it for other protocols and further processing. The solution to this problem is found in quantum state certification: a protocol in which a fraction of the resource copies is measured to authorise the rest of the copies. The pioneering quantum state certification protocols were developed in [74,75]. In these works, one exploits permutation symmetry and measures all but one copy, which then serves as a certificate. The protocol is very powerful as it applies to a generic adversarial scenario, but it unfortunately consumes O(N) resources to certify only a single copy. Our new approach to DI QSV developed in Ref. [27] fully extends to quantum state certification.
There, a reliable certification scheme is provided for the case of independent copies, yielding large certificates, e.g. consisting of O(N) copies. Unfortunately, the full adversarial scenario is still unresolved and remains for future investigations.

UNIVERSAL DATA RECORDS AND PARTIAL TOMOGRAPHY
We have seen that in the limit of tens of copies, one can still construct powerful techniques that extract surprisingly sophisticated conclusions on unknown quantum data. We end this review by considering the limits of such extraction, that is, what is the limit of information one can know about an arbitrarily sized state given a constant number of copies of that state? Many things we might wish to know may be formulated as some kind of partial tomographic task, for example "Is the state entangled?" or "Is the state ε-close to some target quantum state?". With so many possible questions for the same target, we might ask ourselves how many of them can be determined in parallel. Can it be done efficiently or rather, can we accurately extract multiple classical features from a moderate number of data samples? Suppose now we introduce another "resource" to manage in our pursuit of efficient query protocols: our own indecision. If we do not a priori know what classical information we wish to extract from our target quantum system, what choice of measurements maximises our knowledge at a later point? Since our choice is made a posteriori, all possible questions we could ask of a state are equally possible and so we must take some kind of universal data sample that best approximates the space of all possible future queries. This is certainly achievable via full state tomography. Tomography schemes abound that aim to attack the difficulty of this task through this tactic. However, a seemingly unavoidable fact of estimating properties of an arbitrary density operator is the required polynomial number of measurements in the dimension d. More precisely, achieving an absolute error ε in the estimation of an unknown density matrix requires at least $O(d^2\epsilon^{-2})$ [28] copies of a quantum state. This has to be combined with post-processing which requires storing and manipulating exponentially large matrices. Such a task is certainly out of reach for large quantum systems.
However, full state tomography may provide more information than actually needed. Our task may not require the computation of an arbitrary feature of a quantum state but only of some more restricted class. Clearly, there is a resource-gain trade-off, as further knowledge requires further resources, but one can get surprisingly far in extracting interesting properties of the system while needing very few resources. The most significant development towards addressing this problem in recent times is due to Aaronson's breakthrough [28]. He describes a protocol dubbed "shadow tomography", wherein an exponentially sized list (of mean values) of binary observables on a quantum state of dimension d (mixed or otherwise) may be estimated to high precision using a measurement sample of O(log d) size.
The name is derived from the idea that one is not especially interested in the entire quantum state but rather its projections onto a fixed set of observables, the lower dimensional "shadows" of a quantum state. With this in mind, suppose we wish to estimate a set of M linear features $\{\mathrm{tr}[\rho E_1], \dots, \mathrm{tr}[\rho E_M]\}$ with as few copies of ρ as possible. Rather surprisingly, shadow tomography shows that M can be exponentially large with only a polynomial resource overhead. This statement is certainly worthy of consideration given our original problem. The main result of Aaronson's paper is the following theorem: (Aaronson [28]) Shadow tomography is solvable using only $\tilde{O}(\epsilon^{-4}\log^4 M\,\log d)$ copies of the target state ρ, where the $\tilde{O}$ hides a polylog factor. The procedure is fully explicit.
The consequences of this should be readily apparent given the preceding review. A set of binary observables $\{E_1, \dots, E_M\}$ on an arbitrary quantum state can be estimated to within an absolute error ε with probability $1 - \delta$ using a number of samples that grows (poly)logarithmically with the dimension and the size of the estimated set. We direct interested readers to the original paper for the proof of the above theorem and content ourselves here with answering why this does not immediately solve the problem of partial tomography. Though shadow tomography is theoretically efficient in most of the required categories, namely in terms of sample number, computational complexity and memory complexity, it unfortunately fails when considering the sophistication of the required measurements. The protocol requires joint measurements to be made on tensor products of the target state of size $\epsilon^{-2}\log d$, which are repeatedly measured using carefully performed non-demolition measurements [94], themselves a difficult procedure to perform in experiments. It is worth noting that these resource demands have not been shown to be strictly required for the protocol, and indeed minimising them was not a stated goal of the work.
We review here two protocols that go beyond these limitations: selective quantum state tomography [29] and classical shadows [22,25]. The main emphasis here is on low-cost implementation and a universality property: we ask for the possibility of extracting on demand (a posteriori) arbitrary features (from a given class) of a quantum state from some kind of universal data record of moderate size. To illustrate this, suppose we wish for a protocol that allows for efficient estimation of a (finite) selection of observables from a continuous class, after our experiment is complete. On the surface this seems a monstrous request to make and one that can only begin to be fulfilled by full state tomography. Ultimately, we shall see how this is done (for a class of bounded observables) with a cost that is completely dimension independent and requires a resource complexity that is log M for M different features (a linear cost in exponentially many). The general protocol is illustrated in Figure 4 and it resembles the one defined in the introductory section, with the difference being the possibility of re-using the same data (a universal record) to estimate on demand (a posteriori) a feature from some predefined (continuous) class of features. The protocol is described concretely in subsequent sections.

Selective quantum state tomography (SQST)
Our task now is to weaken the stringent requirements on the measurements required for shadow tomography while still being able to estimate many operators simultaneously. To begin, let us settle for simultaneous estimation of the unit operators $A_{ij} = |j\rangle\langle i|$, where $i, j = 1 \dots d$ and d is arbitrarily large. The expectation values of these operators correspond to the density operator elements $\rho_{ij}$. A naive one-by-one measurement strategy is obviously inefficient here, as estimation of another unit operator may then require an entirely new set of measurements, resulting in a general cost growing with the dimension of the system d. On the other hand, if one estimates various functions from the same data sample, wherein each individual estimation is efficient in the sense of a Chernoff-like bound (1), then we can ensure the accuracy of multiple estimations within a fixed overall error only at the logarithmic cost log M for M parallel estimations (this follows from a simple union bound [95] for multiple random variables). This point is the crux of the protocol: once a sufficient set of measurements has been generated for a universal data record (see Figure 4), any density matrix element $\rho_{ij}$ can be estimated on demand at guaranteed precision from identical data. To do this without the complexity of measurements demanded by shadow tomography requires the introduction of a special POVM based on mutually unbiased bases [96].
To construct the protocol, we shall first pick an adequate set of measurements. The set of all matrix units $A_{ij}$ forms a basis in the operator (Hilbert-Schmidt) space, thus the universal data record has to be constructed from an informationally complete POVM. The simplest and most practical choice is local measurements, which are sufficient for informational completeness in general but are of limited applicability in the context of partial tomography [22]. Thus one needs entangled measurements in general, keeping in mind that these should be of low computational complexity (i.e., implementable via low-depth quantum circuits).
The first such choice that springs to mind is one built from mutually unbiased bases (MUBs). MUB sets are collections of orthonormal bases defined on a finite-dimensional Hilbert space (of dimension d). They hold the special property whereby any two basis elements $|i, m\rangle$ and $|j, n\rangle$ drawn from different bases, indexed by m and n, have a constant overlap $|\langle i, m|j, n\rangle|^2 = 1/d$, $\forall m \neq n$. Here $i, j = 1 \dots d$ index the basis elements, while $n, m = 1 \dots d + 1$ label the basis. While there are infinitely many complete MUBs for a given dimension, we are always free to apply a global unitary to each element of the set, transforming them into another while maintaining the inner product between elements. Due to this, we will always choose the m = 1 basis to be the computational basis and define the remaining bases in terms of this set, $|k, m\rangle = \frac{1}{\sqrt{d}}\sum_{l=1}^{d}\alpha^{km}_{l}|l, 1\rangle$, with $|\alpha^{km}_{l}| = 1$. The specific form of $\alpha^{km}_{l}$ is dependent on the dimension of the underlying Hilbert space, with different expressions for prime [97] and prime power [98] dimensions. To proceed we use a useful fact [98] about arbitrary operators A acting on the same space our MUB is defined upon, namely that A can be decomposed as in Eq. (40), where the $O^{(m)}$ are constructed from the basis elements of the MUB such that $\Pi^{(m)}_{k} = |k, m\rangle\langle k, m|$. The presented decomposition proves the informational completeness of MUBs and we can define the corresponding POVM as $\{R^{(m)}_{k} = \Pi^{(m)}_{k}/d\}$ with k, m indexed as before.
A particularly critical example may be found in the matrix unit operators. Let $A_{ij} = |j\rangle\langle i|$ with $|i\rangle$ defined in the computational basis and $i \neq j$. Their decomposition (40), adapted to the POVM elements, is governed by the coefficients $\eta^{km}_{ij} = \alpha^{km*}_{i}\alpha^{km}_{j}$, for which $|\eta^{km}_{ij}| = 1$; this is the crucial property. Since $\langle A_{ij}\rangle = \mathrm{tr}(\rho A_{ij}) = \rho_{ij}$, measuring a particular operator element $\rho_{ij}$ amounts to estimating the expectation value of $A_{ij}$. Given the decomposition above, the mean values $\langle A_{ij}\rangle$ are equivalent to the expectation value of the random variable $\eta^{(s)}_{ij}$. Practical implementation of this POVM amounts to randomly choosing one of d orthonormal basis sets (not including m = 1) in which to measure a copy of ρ, each with probability 1/d of being selected. A tomography to estimate $\rho_{ij}$ would then proceed by the generation of N copies of ρ, each measured using this POVM. For each measurement outcome, indexed by s, we update an approximation to the above sum as the following estimator: $\hat{\rho}_{ij} = \frac{1}{N}\sum_{s=1}^{N}\eta^{(s)}_{ij}$. To be completely explicit, a selective quantum state tomography would proceed in experiment as follows: 1. Measure a copy of the quantum state ρ using the POVM defined by $\{R^{(m)}_{k}\}$, to get the measurement result (k, m). We now calculate the number of state copies N of ρ required for the estimator $\hat{\rho}_{ij}$ to converge to $\rho_{ij}$ within some error ε and failure probability δ. Though $\eta^{(s)}_{ij}$ is complex, we may still apply the usual concentration inequalities by considering $\eta^{(s)}_{ij}$ as two bounded random variables such that $|\mathrm{Re}[\eta^{(s)}_{ij}]| \le 1$ and $|\mathrm{Im}[\eta^{(s)}_{ij}]| \le 1$, and note that $E[\hat{\rho}_{ij}] = \rho_{ij}$. Following a concentration inequality approach we wish to compute the bound $\Pr\left(|\hat{\rho}_{ij} - E[\hat{\rho}_{ij}]| \ge \epsilon\right)$. First, we will isolate the real and imaginary components of the random variable $\eta^{(s)}_{ij}$.
By the triangle inequality, the deviation of $\hat{\rho}_{ij}$ is bounded by the sum of the deviations of its real and imaginary parts. From here, we may apply a standard Hoeffding's inequality for bounded random variables to each term individually. We may then deduce the number of copies $N = O(\epsilon^{-2}\log\delta^{-1})$ required to estimate $\rho_{ij}$ with an error bound $|\hat{\rho}_{ij} - \rho_{ij}| < \epsilon$ that holds with probability greater than $1 - \delta$. This comes in tandem with an O(N) complexity overhead in both the required memory and computation, given that we need only to store the outcomes of each measurement and the summation may be computed piece-wise. For estimation of any $\rho_{ij}$, we need also to account for the diagonal case i = j, something we neglected in the above formulation of SQST. Fortunately the estimation of the diagonal elements $\rho_{ii}$ is straightforward, as it is achievable with measurements in the computational basis. For truly arbitrary estimation of the elements of a density operator we thus need to maintain two measurement records: one for the diagonal elements, which gives the $\rho_{ii}$ directly, and another for the off-diagonal elements $\rho_{ij}$, both requiring $N = O(\epsilon^{-2}\log\delta^{-1})$ copies of the state. Finally, an additional factor must be included if multiple elements $\rho_{ij}$ are to be estimated, corresponding to M repetitions of step 4 in the experiment. This amounts to a log M overhead which comes from the union bound, resulting in $N = O(\epsilon^{-2}\log\delta^{-1}\log M)$ repetitions. Remarkably, this scaling is free of the dimension d.
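The off-diagonal estimation can be simulated classically for a small system. The sketch below is an illustration under explicit assumptions that are not taken from the text: the dimension d is an odd prime, the d non-computational bases use the standard quadratic-phase construction $\alpha^{km}_l = \omega^{m l^2 + k l}$ with $\omega = e^{2\pi i/d}$ and $m = 0, \dots, d-1$, the state is pure, and the convention $\rho_{ij} = \langle i|\rho|j\rangle$ is used, for which the single-shot variable is $\alpha^{km}_i(\alpha^{km}_j)^*$ (other phase conventions swap the conjugate). All function names are hypothetical.

```python
import cmath
import math
import random


def alpha(l: int, k: int, m: int, d: int) -> complex:
    """Unit-modulus phase omega^(m*l^2 + k*l) with omega = exp(2*pi*i/d);
    assumed MUB construction for an odd prime dimension d."""
    return cmath.exp(2j * math.pi * ((m * l * l + k * l) % d) / d)


def outcome_probs(psi, m: int, d: int):
    """Born probabilities p(k|m) = |<k,m|psi>|^2 in basis m, where
    |k,m> = d^(-1/2) * sum_l alpha(l,k,m) |l>."""
    probs = []
    for k in range(d):
        amp = sum(alpha(l, k, m, d).conjugate() * psi[l] for l in range(d))
        probs.append(abs(amp / math.sqrt(d)) ** 2)
    return probs


def build_record(psi, d: int, n_copies: int, rng: random.Random):
    """Universal data record: for each copy pick one of the d bases
    uniformly at random and store the outcome pair (k, m)."""
    tables = [outcome_probs(psi, m, d) for m in range(d)]
    record = []
    for _ in range(n_copies):
        m = rng.randrange(d)
        k = rng.choices(range(d), weights=tables[m])[0]
        record.append((k, m))
    return record


def estimate_element(record, i: int, j: int, d: int) -> complex:
    """Sample mean of the unit-modulus variable eta for an off-diagonal
    element (i != j); the same record serves every pair (i, j)."""
    total = sum(alpha(i, k, m, d) * alpha(j, k, m, d).conjugate()
                for k, m in record)
    return total / len(record)
```

A single call to `build_record` produces the record, after which `estimate_element` can be evaluated for any off-diagonal pair chosen later. Nothing in the estimator grows with d apart from the bookkeeping of the phases, which mirrors the dimension-free sample cost stated above.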

Relation to full tomography and arbitrary observables
It is tempting to conclude that if one can efficiently estimate all individual elements of a density operator then one can estimate the density operator itself efficiently. This is true but only in a technical sense: while SQST will give a bounded error on individual elements with high probability, the overall error of the estimated quantum state in the usual metrics, namely the trace distance, may be exponentially large. This comes from the SQST estimation error being bounded in the max norm, $\|E\|_{max} := \max_{ij}|E_{ij}| \le \epsilon$, which is related to the trace norm only through a dimension-dependent factor. This is rather unsurprising as anything else would imply a protocol that outperforms provably optimal full state tomography [28]. Of course, it is still possible to perform state tomography in the supremum norm. In a similar manner to maximum likelihood estimation, a semidefinite program may be constructed that yields positive semi-definite solutions from the data record generated by SQST [29]. Though running such an optimisation program would not be computationally efficient, the required sample complexity for all $d^2$ elements remains efficient at $\log d^2 = 2\log d$.
Another interesting point to investigate is the application of SQST to estimate mean values of observables beyond the matrix units $|j\rangle\langle i|$.
Consider the general decomposition given in Eq. (40) of an operator $A = A_0 + \tilde{A}$, where we intentionally separate the decomposition into the computational-basis part, which gives the diagonal matrix $A_0$, and the remainder $\tilde{A}$, which has all zeros on the main diagonal. Furthermore, we restrict our attention to operators bounded in the entrywise 1-norm $\|A\|_1 = \sum_{ij} |a_{ij}|$, where $a_{ij}$ are the matrix elements of $A$ in the computational basis. Given that $\|A\|_1$ is bounded, all elements $|a_{ij}| \le \|A\|_1$ are also bounded. As before, the estimation is broken into two stages: estimation of $A_0$, which is done efficiently in the computational basis (since the $a_{ii}$ are bounded), and estimation of $\tilde{A}$, which is performed by random sampling of MUBs (see previous section). The corresponding random variable $a_{km}$ is bounded, i.e., $|a_{km}| = |d \, \mathrm{tr}(A \Pi^{(m)}_k)| \le d \sum_{ij} |a_{ij}| \, |\langle i, 1|k, m\rangle \langle k, m|j, 1\rangle| \le \|A\|_1$, thus the efficiency of the estimation follows from the Hoeffding bound used above.
The previous analysis shows that operators bounded in the entrywise $\ell_1$-norm can be efficiently estimated by the SQST procedure. However, these bounds are not optimal. To see this, suppose we simultaneously estimate the mean values of the $4^n - 1$ Pauli operators (excluding the identity) $A = \sigma_1 \otimes \dots \otimes \sigma_n$, where $\sigma_k$ is one of the standard Pauli matrices. We have $\|A\|_1 = d = 2^n$, thus our previous analysis predicts a sample cost of $N = O(4^n n)$, where the factor $n \sim \log 4^n$ comes from the union bound. However, it is well known [99] that the set of $4^n - 1$ Pauli operators can be partitioned into $2^n + 1$ groups, each composed of $2^n - 1$ commuting operators, with their common eigenbases being MUBs. This means that a single MUB measurement can return all $2^n$ mean values (of commuting Paulis) at the cost of $O(n)$, thus the estimation of all $4^n$ requires $O(2^n n)$ copies (to measure in all MUBs). This scaling is known to be optimal [100]. Consequently, this is quadratically better than the estimate given by the $\|\cdot\|_1$ norm analysis, meaning that the derived bounds can be further improved. One way of doing this is to employ Bernstein's inequality [101,102], which also controls the variance of the random variable, thus leading to potentially better bounds. Another possibility to generically improve the scaling is to change the POVMs and the type of estimator, e.g. instead of a simple linear estimator, one may use the median-of-means estimator [103]. This coincides with the next and final scheme in terms of sample complexity and is superior in terms of measurement complexity and efficiency for estimation of a general observable bounded in Frobenius norm. Along with SQST, the next scheme, called classical shadows [22,25], represents an entirely new regime of partial tomography not previously possible.
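The counting in the Pauli example can be checked with a few lines of Python. This is only a bookkeeping sketch: the function names are hypothetical, and the cost functions keep the leading-order scalings quoted above while dropping all constants and the dependence on $\epsilon$ and $\delta$.

```python
def pauli_grouping(n_qubits):
    """Counting behind the MUB grouping of n-qubit Pauli operators."""
    d = 2**n_qubits
    return {
        "non_identity_paulis": d**2 - 1,  # 4^n - 1
        "commuting_groups": d + 1,        # 2^n + 1, one group per MUB
        "paulis_per_group": d - 1,        # 2^n - 1
    }

def cost_scalings(n_qubits):
    """Leading-order copy counts with constants and eps dropped."""
    d = 2**n_qubits
    entrywise_bound = d**2 * n_qubits  # O(4^n n) from the entrywise 1-norm analysis
    mub_grouping = d * n_qubits        # O(2^n n) from measuring in all MUBs
    return entrywise_bound, mub_grouping

for n in (1, 2, 5, 10):
    g = pauli_grouping(n)
    # The groups partition the non-identity Paulis: (2^n + 1)(2^n - 1) = 4^n - 1.
    assert g["commuting_groups"] * g["paulis_per_group"] == g["non_identity_paulis"]
    loose, tight = cost_scalings(n)
    print(n, g, loose // tight)
```

The printed ratio is $2^n$, which is the quadratic gap between the two scalings.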

Classical shadows
With shadow tomography suggesting the possibility of a sample-efficient universal algorithm and SQST demonstrating that a degree of generality can still be achieved with vastly simpler measurements, we close this review with the current state of the art in efficient quantum tomography. In the protocol above, we defined an alternative scheme using a generalised measurement basis, the mutually unbiased bases, producing a partial tomography protocol that can estimate many independent linear functions of a target state while remaining resource-efficient.
One now wonders why this was the case. A choice of unbiased bases as a first target for universal measurements is intuitive, given that they form an informationally complete POVM and by their very nature contain minimal measurement bias, but they work unexpectedly well for an educated guess. A possible reason for this lies in a so-far unmentioned MUB property, namely that they form a t-design of degree two [104]. While a full description of t-designs is unnecessary here (see Ref. [105] for a complete treatment in the context of quantum mechanics), it is sufficient to understand that a quantum t-design is a probability distribution that reproduces the averages of polynomial functions of order t over the complete distribution for some set. A simple (classical) example is the average of some polynomial function over the real sphere.
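The classical example can be made explicit with a short Python sketch. It uses the six vertices of an octahedron, a standard spherical design on the unit sphere in three dimensions (these points are also the Bloch vectors of the three Pauli eigenbases of a qubit). This is an illustration of the classical notion only, not of the quantum MUB design itself, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Octahedron vertices (+-1, 0, 0), (0, +-1, 0), (0, 0, +-1).
POINTS = ([(s, 0, 0) for s in (1, -1)]
          + [(0, s, 0) for s in (1, -1)]
          + [(0, 0, s) for s in (1, -1)])

def design_average(a, b, c):
    """Average of the monomial x^a y^b z^c over the six design points."""
    total = sum(Fraction(x**a * y**b * z**c) for x, y, z in POINTS)
    return total / len(POINTS)

def odd_double_factorial(k):
    """k!! for odd k >= -1, with (-1)!! = 1!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def sphere_average(a, b, c):
    """Exact average of x^a y^b z^c over the uniform unit sphere in 3D."""
    if a % 2 or b % 2 or c % 2:
        return Fraction(0)  # odd powers average to zero by symmetry
    numerator = (odd_double_factorial(a - 1) * odd_double_factorial(b - 1)
                 * odd_double_factorial(c - 1))
    return Fraction(numerator, odd_double_factorial(a + b + c + 1))

def matches_up_to(degree):
    """True if the six points reproduce every monomial average up to `degree`."""
    return all(
        design_average(a, b, c) == sphere_average(a, b, c)
        for a, b, c in product(range(degree + 1), repeat=3)
        if a + b + c <= degree
    )

print(matches_up_to(3), matches_up_to(4))
```

The six points reproduce all polynomial averages up to degree three but fail at degree four (for instance, the average of $x^4$ is $1/3$ on the design and $1/5$ on the sphere), which is what it means for a finite set to be a design of a given order and no higher.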
The relevance of this here is that such designs can be used to approximate the probability distributions of a generalised measurement basis. Higher-order designs better reproduce the key properties of a distribution, with a two-design correctly reproducing the expectation value and a three-design additionally reproducing the sample variance. The natural and immediate question is: what do higher-order t-designs yield? We clearly see from Bernstein's inequality that the variance of an observable plays a heavy role in the efficiency of an estimator, so one may presume that a t-design that reproduces both the correct expectation value and variance of the approximated distribution will have improved performance again.
Coupled with a statistical trick known as the median-of-means [103], this is the strategy of Keung et al. [22], who show that through randomised Clifford measurements (a three-design) they are able to estimate $M$ observables with a number of samples that grows as $N = O(\epsilon^{-2} \log(M) \, \|A\|_{\max}^2)$, with $\|A\|_{\max} = \max(\|A_1\|_2, \dots, \|A_M\|_2)$ being the maximum two-norm (Frobenius norm) of the $M$ observables to be estimated. Included within this bound are entanglement witnesses and fidelity estimation, both of which can be performed efficiently regardless of the system size. Compared with a two-design, a three-design (when coupled with suitable statistical methods) is slightly more expensive in terms of gate complexity, requiring a cubic number of Clifford gates to achieve sufficient randomness over the Haar measure, as compared to the linear cost of generating MUB measurements. Both may be considered computationally efficient, however, and one gains a powerful advantage when the use of a three-design Clifford measurement is allowed.
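The median-of-means trick itself is simple enough to sketch in a few lines of Python. What follows is the generic statistical estimator only: it does not construct classical shadows from Clifford snapshots, and the function name and the toy data are illustrative assumptions.

```python
import statistics

def median_of_means(samples, num_batches):
    """Split samples into equal batches, average each, return the median.

    Samples left over after forming `num_batches` equal batches are dropped.
    """
    if num_batches < 1 or num_batches > len(samples):
        raise ValueError("num_batches must be between 1 and len(samples)")
    batch_size = len(samples) // num_batches
    batch_means = [
        statistics.fmean(samples[i * batch_size:(i + 1) * batch_size])
        for i in range(num_batches)
    ]
    return statistics.median(batch_means)

# Toy data: nine ordinary single-shot estimates and one large outlier.
data = [1.0] * 9 + [1000.0]
plain_mean = statistics.fmean(data)
robust = median_of_means(data, num_batches=5)
print(plain_mean, robust)
```

A single outlier can spoil at most one batch mean, so the median over batches is unaffected, whereas the plain mean is pulled far from the typical value. This robustness is what lets the failure probability decay exponentially in the number of batches.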

CONCLUDING REMARKS
In this work, we have reviewed recent approaches to answering queries of quantum states of increasing size, while avoiding an unacceptable overhead in resources. By first considering efficient tomography to be a series of queries that become exponentially unlikely to pass for all states except those that answer positively, we showed how this leads to hyper-efficient protocols. We demonstrated this through high-performance entanglement detection using a single copy of a quantum state, a counter-intuitive result for an estimation protocol. This was then extended by showing how the same protocol can be used for cluster states, a specific class of quantum states, and for the ground states of local Hamiltonians.
We then proceeded to the case where a limited number of state copies is available. Here one can work in the few-copy regime and observe the presence of entanglement in the state, using a protocol that translates any entanglement witness into a probabilistic framework. We showed that this scenario is well-suited for experimental implementations by reviewing an application to a photonic six-qubit cluster state. By demonstrating that the method can detect quantum entanglement with very high confidence using only hundreds of state copies, the extremely low requirements in terms of time and experimental resources were confirmed.
With experimental viability in mind, we gave a description of shadow tomography which set the stage for Selective Quantum State Tomography, showing how a special choice of POVM leads to the efficient estimation of a wide class of linear quantum functionals. This in turn leads to the current state of the art for partial tomography, a t-design based protocol using the classical shadows of a quantum state which leads to efficient estimation of an exceptionally large class of observables.
This high performance is most clearly seen in the context of possible partial tomographies performed; namely fidelity estimation (where the observable is another density operator), entanglement witnesses and entropies, correlation functions up to order two and the energies of many-body local Hamiltonians.
Beyond the methods presented in this review, it is also worth mentioning novel techniques that instead employ machine learning to reduce the verification requirements. In fact, the use of machine learning for quantum applications is in general experiencing rapid progress and proving useful in tasks like entanglement detection using neural networks [106,107] or unsupervised learning [108], and quantum state tomography using neural networks [19]. It is also relevant that a method comparable to SQST for estimating elements of a density matrix exists in the continuous variable (CV) regime. Here, it is known that the estimation error depends directly on the energies [109,110], i.e., the estimation error for a matrix element $\rho_{nm}$ increases with $n$ and $m$ (where $n$, $m$ index the energy eigenstates). Notably, the same behaviour is not observed in SQST of discrete systems, which forms a point of interest for developing tomographic strategies targeting CV systems.
Our main focus in this review was on sampling (in terms of measurement complexity), where the presented techniques exhibit dimensional independence, a property that is crucial for real applications. There are, however, a number of open questions that remain to be addressed in future work. In the context of entanglement detection, one immediately realises that verification models tend to be tailored to detect entanglement in the vicinity of a target state, which requires some prior knowledge of the state preparation. Which witnesses and corresponding verification procedure should one use, then, if there is no such prior knowledge? This is an open research topic and not many results may be found in the literature, owing to the difficult nature of this restriction. In such cases, one promising direction may be to use the method of so-called random correlations [111][112][113], which was developed for entanglement detection, and to try to incorporate it into the decision-theoretic framework presented here.
Another pressing issue is the assumption of "IIDness" (independent and identically distributed samples), which is highly questionable in the context of near-term quantum devices given high error rates, source drifts and lack of control and manipulation. Our entanglement detection schemes overcome the IID limitation by employing random sampling techniques, but difficulties arise immediately at the next level of sophistication, à la quantum state verification. One can mitigate this issue via conditional fidelities [27,114], but it remains an open question whether some nontrivial statements can be made about the full state produced by the source. A possible way out may be found in the de Finetti reduction theorems [115], or with the help of entropy accumulation theorems [116,117] where resorting to permutational invariance is not allowed. Another option that may follow from our single-copy framework is to fold all accessible (non-IID) copies into one large single copy and perform verification in a single-copy scenario. While this seems to be a reasonable option, what remains to be clarified is: what is the class of states and properties that admit reliable single-copy verification/estimation? The protocols reviewed here are the first steps towards answering this question. In this way, there is another conceptual issue to be addressed that concerns the operational meaning of physical quantities in a single-shot scenario.
A particularly pertinent open question, especially in the context of near-term quantum devices, is the trade-off between measurement complexity and the corresponding increase or decrease of efficiently estimable quantities. As noted in Refs. [22,29], the power of these techniques appears to be uniquely sourced from the choice of measurements performed. Specifically, the measurements are two-designs (in SQST) and three-designs (in classical shadow tomography) in t-design parlance [118]. In particular, when estimators in classical shadow tomography are constructed from local measurements only, i.e., a one-design, the performance of the scheme drops significantly. This question was considered in the original work on classical shadows [22] in the context of Pauli measurements, where the complexity was found, unsurprisingly, to scale with the non-locality of the target observable. It is also something of a worst-case scenario, in that one is restricted to a fixed set of weak measurements. Instead, one may introduce adaptability into the POVM implemented in the measurement phase of a scheme, as was done by García-Pérez et al. [119]. Despite the optimisation introducing increased classical post-processing into the protocol, it does not compromise the circuit complexity of the POVM. It remains less powerful than a complete shadow tomography but demonstrates high performance on the limited but highly relevant class of variational quantum eigensolver (VQE) problems [120]. This is a well-chosen compromise. Since Clifford and MUB measurements are not trivial to implement, owing to the inclusion of control operations between arbitrary subsystems, it is highly desirable to find similar reductions, with perhaps different compromises being found for different problem instances. While certainly worth pursuing, this can be seen as equivalent to constructing POVMs that approximate a t-design of some order using a simpler set of generators than the Clifford group. Given that finding t-designs in the first place is already difficult, this is a challenging task.
With all these questions in mind, it appears that the time is nigh for an exciting new class of tomographic protocols, ones without the apparent drawbacks that have plagued state tomography since its inception, allowing for direct probing of quantum systems in the NISQ technology regime and beyond.