The mean is probably the most widely used point estimate, but by itself it only tells part of the story. We would also like to know how good/accurate the estimate is, and the standard error and the confidence interval (CI) are the tools for answering that question.

§Multiple sampling

Given a population, one can draw M samples of size N and calculate the mean of each sample (over its N sample points). With these M means, one can compute their mean and standard deviation: the mean is the estimate of the population mean, and the standard deviation is the standard error. (Yes, the standard error is a form of standard deviation.) From the same M means, one can also build a confidence interval with the percentile method, which essentially identifies the interval containing the middle 95% of the data points.
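
As a minimal sketch of this procedure (the values M = 1000 and N = 100 and the standard-normal "population" are only for illustration; the full script is given further below):

% sketch of multiple sampling: M samples of size N from a standard normal population
M = 1000; N = 100;
sample_means = zeros(M, 1);
for i = 1:M
  sample_means(i) = mean(randn(N, 1));    % mean of one sample of size N
end
mean_estimate  = mean(sample_means);                  % estimate of the population mean
standard_error = std(sample_means);                   % standard error of the mean
ci_95          = prctile(sample_means, [2.5, 97.5]);  % percentile-method 95% CI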

§Bootstrapping single sample

Creating M samples can be too expensive to be practical, though, so an alternative is to use bootstrapping to simulate sampling M times from a single sample. Once we have the M resamples, we follow the same procedure as above.
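
A corresponding sketch for the single-sample case, again with illustrative values (empirical_rnd is the resampling function from Octave's statistics package, also used in the full script below):

% sketch of bootstrapping: M resamples (with replacement) from one sample x of size N
N = 100; M = 1000;
x = randn(N, 1);                               % the single sample we actually have
boot_means = mean(empirical_rnd(x, N, M))';    % N-by-M resamples -> M bootstrap means
mean_estimate  = mean(boot_means);
standard_error = std(boot_means);
ci_95          = prctile(boot_means, [2.5, 97.5]);  % percentile-method 95% CI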

§Matlab/Octave Code

Code for the two approaches is shown below. The population follows a normal distribution N(0, 1), so std_1, the standard deviation of the first sample, is ~1. For samples of size 100 the standard error is ~0.1; for samples of size 400 it is ~0.05. This relationship is not accidental: the standard error decreases as 1/sqrt(N). Since both the standard error and the confidence interval reflect the accuracy of the estimate, they share the same relation: the CI length shrinks as 1/sqrt(N) as well. The intuition is that the standard error/CI tracks how far the estimate is from the true value, and a larger sample size N gives us a better estimate, i.e. a smaller standard error/CI.

% a leading statement so Octave treats this file as a script and ci below as a local function
1;

% returns the 2.5th and 97.5th percentiles of 10000 bootstrap means of x
% (empirical_rnd is from Octave's statistics package)
function ret = ci(x)
  bootstrap_x = mean(empirical_rnd(x, max(size(x)), 10000))';
  ret = prctile(bootstrap_x, [2.5, 97.5]);
end

disp("multiple sampling")
for sample_size = [100, 400]
  sample_means = [];
  for i = 1:10000                      % M = 10000 samples
    x = randn(sample_size, 1);         % one sample of size N from N(0, 1)
    if i == 1
      std_1 = std(x)
      mean_1 = mean(x)
    end
    sample_means(i) = mean(x);
  end
  mean_estimate = mean(sample_means)
  standard_error = std(sample_means)
  my_ci = ci(sample_means)
  ci_len = my_ci(2) - my_ci(1)
end

disp("bootstrapping from a single sample")
for sample_size = [100, 400]
  x = randn(sample_size, 1);           % the single sample we draw
  std_1 = std(x)
  mean_1 = mean(x)
  bootstrap_x = mean(empirical_rnd(x, sample_size, 10000))';  % M = 10000 bootstrap means
  mean_estimate = mean(bootstrap_x)
  standard_error = std(bootstrap_x)
  my_ci = ci(bootstrap_x)
  ci_len = my_ci(2) - my_ci(1)
end

Output is:

multiple sampling
std_1 = 0.94194
mean_1 = -0.026550
mean_estimate = 0.0019642
standard_error = 0.10011
my_ci =

-0.000054469 0.003909210

ci_len = 0.0039637
std_1 = 0.96719
mean_1 = 0.070110
mean_estimate = -0.00021029
standard_error = 0.049619
my_ci =

-0.00118529 0.00073309

ci_len = 0.0019184
bootstrapping from a single sample
std_1 = 0.98712
mean_1 = -0.16471
mean_estimate = -0.16467
standard_error = 0.097553
my_ci =

-0.16661 -0.16278

ci_len = 0.0038315
std_1 = 0.93220
mean_1 = 0.050283
mean_estimate = 0.050874
standard_error = 0.046385
my_ci =

0.049971 0.051774

ci_len = 0.0018032
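
As a sanity check on the 1/sqrt(N) relation mentioned above, the observed standard errors line up with the textbook formula sigma/sqrt(N), with sigma = 1 for our N(0, 1) population:

% standard error of the mean = sigma / sqrt(N); sigma = 1 here
sigma = 1;
for N = [100, 400]
  theoretical_se = sigma / sqrt(N)   % prints 0.1 for N = 100, 0.05 for N = 400
end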

§Quiz: How about M? Does increasing M affect the standard error/CI?

Yes and no. Increasing M means we use more samples, so we should get a better estimate, but an estimate of what? A larger M gives us a better estimate of the estimate of the true value, not a better estimate of the true value itself. The latter is what we are after, and it is tracked by the standard error.

In the script above, we simply used M = 10000, much larger than N, to ensure that the estimate of the estimate is good enough.
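
To see this concretely, here is a small sketch (not part of the original script) that fixes N = 100 and varies M; the standard error stays around 1/sqrt(100) = 0.1 rather than shrinking with M:

% with N fixed, increasing M does not shrink the standard error;
% it only makes the estimate of that standard error more stable
N = 100;
for M = [100, 1000, 10000]
  sample_means = zeros(M, 1);
  for i = 1:M
    sample_means(i) = mean(randn(N, 1));
  end
  standard_error = std(sample_means)   % stays around 0.1 for every M
end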

§Reference