- Metadata::
# Overview of Pure Exploration

# Overview of Structured Normal Means Inference

# High Level Motivation

## Interactive Structured Normal Means

## Fixed Confidence Pure Exploration

- Currently the existing lower bounds in the bandit literature use so-called [oracle lower bounds]([[Oracle Lower Bound]]) in order to give lower bounds for various pure exploration problems.
- These lower bounds are called oracle lower bounds because they describe the sampling behavior of an oracle which has knowledge of the underlying instance, $$\nu$$ and as such can sample so as to differentiate between the instance $$\nu$$ and every instance $$\tilde\nu \in \text{Alt}(\nu) = {\tilde \nu \in \mathcal S: a^*(\nu) \neq a^*(\tilde \nu)}$$.
- These idea behind these lower bounds were initially proposed in On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
- The underlying idea is the following "change-of-measure" argument
- Let $$\nu, \tilde \nu$$ be two bandit models with $$K$$ arms such that for all $$i$$, the distributions $$\nu_i$$ and $$\tilde \nu_i$$ are mutually absolutely continuous. For any almost-surely finite stopping time $$\sigma$$ with respect to $$(\mathcal F_t)$$,
- $$\sum_{i = 1}^{K}\mathbb E_{\nu}[N_a(\sigma)] KL(\nu_i, \tilde\nu_i) \geq \sup_{\mathcal E \in \mathcal F_\sigma} d(\mathbb P_{\nu}(\mathcal E), \mathbb P_{\tilde \nu}(\mathcal E))$$

- where $$d(x, y)$$ is the binary relative entropy with the convention that $$d(0, 0) = d(1, 1) = 0$$.

- Let $$\nu, \tilde \nu$$ be two bandit models with $$K$$ arms such that for all $$i$$, the distributions $$\nu_i$$ and $$\tilde \nu_i$$ are mutually absolutely continuous. For any almost-surely finite stopping time $$\sigma$$ with respect to $$(\mathcal F_t)$$,
- Using this lemma, we can prove, for example, a lower bound for the fixed-confidence Best Arm Identification problem:
- Min-Max Form:
- $$\mathbb E_{\nu, \text{Alg}}[\tau] \geq T^*(\nu)d(\delta, 1 - \delta)$$
- $$\left(T^*(\nu)\right)^{-1} = \sup_{w \in \Sigma_K}\inf_{\tilde \nu \in \text{\text{Alt}}(\nu)} \left(\sum_{i = 1}^{K} w_i \text{KL}(\nu, \tilde \nu)\right)$$

- $$\mathbb E_{\nu, \text{Alg}}[\tau] \geq T^*(\nu)d(\delta, 1 - \delta)$$
- Optimization Form:
- $$\min_{\tau \in \mathbb R^{n}} \sum_{i = 1}^{K} \tau_a$$
- $$\text{subject to } \inf_{\tilde \nu \in \text{Alt}(\nu)} \tau_a \text{KL}(\nu_i, \tilde \nu_i) \geq d(\delta, 1 - \delta)$$

- "The above proposition says that the expected sample complexity $$\mathbb E_{\nu, \text{Alg}}[\tau]$$ is lower bounded by the following, non-adaptive experiment design problem: minimize the total number of samples $$\sum_{\tau_a}$$ subject to the constraint that these samples can distinguish between a null hypothesis $$H_0 = \nu$$ versus any alternative hypothesis $$H_1 = \tilde \nu$$ for every $$\tilde \nu \in \text{Alt}(\nu)$$ [sic.]"

- Min-Max Form:

- This style of lower bound is very powerful in that it can be applied to a wide range of pure exploration problems to obtain reasonable lower bounds which are asymptotically valid -- there exist algorithms whose asymptotic sample complexity matches the sample complexity of the oracle lower bound.
- The solution to the oracle lower bound problem (see here) is extensively analyzed in Sample Complexity of Partition Identification Using Multi-Armed Bandits for more general pure exploration problems beyond Best Arm Identification

- However, as a general rule of thumb, these lower bounds can be misleading in a non-asymptotic regime when there is sufficient structure in the problem. This is because the oracle can take advantage of this structure to implement unrealistic sampling strategies. We provide some examples where this phenomena might occur
**Example 1:**Constrained Best Arm Identification**Example 2:**Threshold Crossing

## Fixed Budget Pure Exploration

### Overview of Existing Results