
Bootstrap

The bootstrap is the most recently developed of these methods for estimating errors and other statistics. It relies on the much greater computing power that modern computers provide.

The term ``bootstrap'' derives from the phrase ``to pull oneself up by one's bootstraps'' (The Adventures of Baron Munchausen, by Rudolf Erich Raspe).

Example. Consider a sample $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ and bootstrap samples formed by drawing each element from the empirical distribution $\hat{F}$, i.e., from $\mathbf{x}$ with replacement. There are $N^N$ possible samples, called the ideal bootstrap samples.

Consider the simple case N = 2. The original sample $\mathbf{x} = (x_1, x_2)$ yields $2^2 = 4$ ideal bootstrap samples: $\mathbf{x}^{*1} = (x_1, x_1), \mathbf{x}^{*2} = (x_1, x_2), \mathbf{x}^{*3} = (x_2, x_1), \mathbf{x}^{*4} = (x_2, x_2)$.
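
For so small a sample the ideal bootstrap samples can be enumerated directly. A minimal sketch in Python (the sample values 1.0 and 2.0 are illustrative placeholders, not data from the text):

  from itertools import product

  x = (1.0, 2.0)                           # original sample, N = 2
  ideal = list(product(x, repeat=len(x)))  # all N**N ideal bootstrap samples
  print(ideal)  # [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]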

However, generating all ideal bootstrap samples becomes unrealistic as N grows, because the computational burden is prohibitive. Therefore we normally use the Monte Carlo approach: draw B samples at random with replacement from $\mathbf{x}$.
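
A minimal sketch of this Monte Carlo step, assuming a hypothetical data vector (the values and the seed are arbitrary):

  import numpy as np

  rng = np.random.default_rng(0)            # seeded only for reproducibility
  x = np.array([4.2, 3.7, 5.1, 4.8, 4.0])   # illustrative sample, N = 5
  B = 3                                     # number of bootstrap samples
  # Each row is one bootstrap sample of size N, drawn with replacement.
  boot_samples = rng.choice(x, size=(B, x.size), replace=True)
  print(boot_samples)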

The bootstrap estimate of standard error is the standard deviation of the bootstrap replications:

\begin{displaymath}
\widehat{se}_{\text{boot}} = \left\{ \sum_{b = 1}^{B} [s(\mathbf{x}^{*b}) - s(\cdot)]^2 / (B - 1) \right\}^{1/2}
\end{displaymath} (5)

where $s(\cdot) = \sum_{b=1}^B s(\mathbf{x}^{*b}) / B$.

Comparing (3) with (5), one finds that the factor in the jackknife's standard-error formula is roughly N times larger. This is called the inflation factor. The reason is that, unlike bootstrap samples, jackknife samples are very similar to the original sample, so the differences between jackknife replications are small and must be inflated. One can verify (3) in the special case $\hat{\theta} = \bar{x}$, as sketched below.
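
A sketch of that verification, assuming the delete-1 jackknife formula of the earlier section has the standard form with inflation factor $(N-1)/N$. For $\hat{\theta} = \bar{x}$ the jackknife replication is $\hat{\theta}_{(i)} = (N\bar{x} - x_i)/(N - 1)$, so $\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)} = (\bar{x} - x_i)/(N - 1)$ and

\begin{displaymath}
\widehat{se}_{\text{jack}} = \left\{ \frac{N-1}{N} \sum_{i=1}^{N} [\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}]^2 \right\}^{1/2}
= \left\{ \sum_{i=1}^{N} (x_i - \bar{x})^2 / [N(N-1)] \right\}^{1/2},
\end{displaymath}

the familiar standard error of the mean. Note that the factor $(N-1)/N$ is roughly $N$ times the factor $1/(B-1)$ appearing in (5) when $B = N$.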

Suppose $s(\mathbf{x})$ is the mean $\bar{x}$. In this case, standard probability theory tells us that as B gets very large, formula (5) approaches

\begin{displaymath}
\left\{ \sum_{i=1}^{N} (x_i - \bar{x})^2 / N^2 \right\}^{1/2}
\end{displaymath}
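
A quick numeric check of this limit in Python, using hypothetical normally distributed data (the seed and sample size are arbitrary):

  import numpy as np

  rng = np.random.default_rng(1)
  x = rng.normal(size=20)                  # hypothetical sample, N = 20
  N, B = x.size, 200_000                   # very large B

  # Bootstrap replications of the mean; their standard deviation is (5).
  reps = rng.choice(x, size=(B, N), replace=True).mean(axis=1)
  se_boot = reps.std(ddof=1)

  # The B -> infinity limit: { sum (x_i - xbar)^2 / N^2 }^{1/2}.
  se_limit = np.sqrt(((x - x.mean())**2).sum()) / N

  print(se_boot, se_limit)                 # agree to a few decimal places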

The bootstrap algorithm for estimating standard errors proceeds in three steps (see Figure 2; a code sketch follows the figure):

1. Select B independent bootstrap samples $\mathbf{x}^{*1}, \mathbf{x}^{*2}, \cdots, \mathbf{x}^{*B}$, each consisting of N data values drawn with replacement from $\mathbf{x}$.
2. Evaluate the bootstrap replication corresponding to each bootstrap sample:

\begin{displaymath}
\hat{\theta}^{*} (b) = s(\mathbf{x}^{*b}), \;\;\; b = 1, 2, \cdots, B.
 \end{displaymath}

3. Estimate the standard error $se_F (\hat{\theta})$ by the sample standard deviation of the B replications:

\begin{displaymath}
\widehat{se}_B = \left\{ \sum_{b=1}^B [ \hat{\theta}^*(b) - \hat{\theta}^*(\cdot)]^2 / (B - 1) \right\}^{1/2}
\end{displaymath}

where $\hat{\theta}^*(\cdot) = \sum_{b=1}^B \hat{\theta}^*(b) / B$.


  
Figure 2. The bootstrap algorithm.
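
A minimal Python sketch of these three steps, assuming the statistic $s$ is any function mapping a sample to a scalar (the median and the data values below are illustrative choices, not from the text):

  import numpy as np

  def bootstrap_se(x, s, B=1000, rng=None):
      rng = np.random.default_rng() if rng is None else rng
      N = len(x)
      # Step 1: B independent bootstrap samples, drawn with replacement.
      samples = rng.choice(x, size=(B, N), replace=True)
      # Step 2: the bootstrap replication for each sample.
      reps = np.apply_along_axis(s, 1, samples)
      # Step 3: sample standard deviation of the B replications (divisor B - 1).
      return reps.std(ddof=1)

  x = np.array([2.9, 3.1, 3.4, 2.7, 3.8, 3.0, 3.3])  # illustrative data
  print(bootstrap_se(x, np.median, B=2000, rng=np.random.default_rng(2)))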

Among the other properties of the bootstrap, the payoff for its heavy computation is illustrated by the following examples.

Example 1: Diurnal variation of TIPPs. Figure 3 shows the diurnal variation of trans-ionospheric pulse pairs (TIPPs) detected by the Blackbeard instrument aboard the ALEXIS spacecraft. The data are six-hour running averages centered on each hour. The standard error of the mean, calculated by the bootstrap method, shows that the diurnal variation is statistically significant.

  
Figure 3. Diurnal variation of TIPP detection. Data are shown for (a) central Africa, (b) Indonesia, and (c) North America (From Zuelsdorf et al. [1998]).

Example 2: Error estimation in the minimum variance analysis. Table 1 presents the errors of $n_x$, $n_y$ and $B_n$ in the minimum variance analysis problem, where $\mathbf{n}$ is the minimum variance unit vector. The magnetic field data set is artificially generated, so the error that follows from the construction is known and denoted the ``true'' error. Next to the true errors are the error estimates determined by the bootstrap method. The two columns on the right are the error estimates of Sonnerup [1971] and the modified version given by Kawano and Higuchi [1995]. The bootstrap error estimates are in the best agreement with the ``true'' errors.

 
 
Table 1: Error estimation in the minimum variance analysis (from Kawano and Higuchi [1995]).

           ``true''   bootstrap         Sonnerup [1971]   modified Sonnerup
           errors     error estimates   error estimates   [Kawano and Higuchi, 1995]
  $n_x$    0.013      0.014             0.030             0.035
  $n_y$    0.047      0.047             0.101             0.119
  $B_n$    1.9        2.0               4.1               4.8
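
A hedged sketch of how such bootstrap error estimates can be computed, using a synthetic field that is purely illustrative (not the data set of Table 1). The minimum variance direction is taken as the eigenvector of the magnetic variance matrix with the smallest eigenvalue:

  import numpy as np

  def mva(Bfield):
      # Minimum variance direction n and mean normal field B_n.
      M = np.cov(Bfield, rowvar=False)   # 3x3 magnetic variance matrix
      w, V = np.linalg.eigh(M)           # eigenvalues in ascending order
      n = V[:, 0]                        # smallest-variance eigenvector
      return n, Bfield.mean(axis=0) @ n

  rng = np.random.default_rng(3)
  # Synthetic field: large fluctuations in x and y, small along z.
  Bfield = rng.normal(size=(200, 3)) * np.array([5.0, 3.0, 0.5])

  n0, _ = mva(Bfield)
  reps = []
  for _ in range(1000):
      idx = rng.integers(0, len(Bfield), len(Bfield))  # resample with replacement
      n, Bn = mva(Bfield[idx])
      n = n if n @ n0 > 0 else -n        # resolve the eigenvector sign ambiguity
      reps.append((n[0], n[1], Bn))

  # Bootstrap standard errors of n_x, n_y and B_n.
  print(np.array(reps).std(axis=0, ddof=1))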

Bootstrap software for S or S-PLUS is available via FTP from lib.stat.cmu.edu (log in with username statlib).

