# Standard Error & Confidence Interval

The mean is probably the most widely used point estimate, but on its own it tells only part of the story. We also want to know how accurate the estimate is. The standard error and the confidence interval (CI) answer this question.

## Multiple sampling

Given a population, one draws M samples of size N and calculates the mean of each sample (over its N sample points). From these M means, one can calculate a mean and a standard deviation: the mean of the means is the estimate of the population mean, and their standard deviation is the standard error. (Yes, the standard error is a form of standard deviation.) From the same M means, one can also calculate the confidence interval using the percentile method: take the 2.5th and 97.5th percentiles, i.e. the interval that contains the middle 95% of the means.
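The procedure above can be sketched in a few lines of Octave/MATLAB. This is a minimal illustration, not the author's script: the values of `M` and `N`, the standard normal population, and all variable names are assumptions, and the percentiles are taken by sorting so that no statistics package is needed.

```octave
% Sketch: multiple sampling from an assumed N(0, 1) population.
M = 10000;                        % number of samples (assumed)
N = 100;                          % size of each sample (assumed)

samples = randn(N, M);            % each column is one sample of size N
means = mean(samples);            % 1-by-M vector: one mean per sample

mean_estimate = mean(means);      % estimate of the population mean
standard_error = std(means);      % standard deviation of the M means

% Percentile method: bounds of the middle 95% of the M means.
sorted_means = sort(means);
ci_low = sorted_means(round(0.025 * M));
ci_high = sorted_means(round(0.975 * M));

fprintf('mean = %.4f, standard error = %.4f\n', mean_estimate, standard_error);
fprintf('95%% CI = [%.4f, %.4f]\n', ci_low, ci_high);
```

With these settings the standard error should come out close to 0.1, with small run-to-run variation because no random seed is fixed.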

## Bootstrapping single sample

Drawing M samples from the population is often too expensive to be practical, so one alternative is bootstrapping: simulate sampling M times by resampling, with replacement, N points from a single sample of size N. Once we have these M resamples, we follow the same procedure as above.
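As a rough sketch of the bootstrap variant, assuming the single sample is drawn from a standard normal population (the values of `M` and `N` and all variable names are illustrative, not taken from the author's script):

```octave
% Sketch: bootstrapping M resamples from one observed sample.
M = 10000;                        % number of bootstrap resamples (assumed)
N = 100;                          % size of the single sample (assumed)

sample = randn(N, 1);             % the one sample we actually observed

% Draw indices with replacement; each column indexes one resample.
idx = randi(N, N, M);             % N-by-M integers in 1..N
boot_samples = sample(idx);       % N-by-M, same shape as idx
boot_means = mean(boot_samples);  % 1-by-M vector of resample means

mean_estimate = mean(boot_means);
standard_error = std(boot_means);

% Percentile method on the bootstrap means.
sorted_means = sort(boot_means);
ci_low = sorted_means(round(0.025 * M));
ci_high = sorted_means(round(0.975 * M));

fprintf('mean = %.4f, standard error = %.4f\n', mean_estimate, standard_error);
fprintf('95%% CI = [%.4f, %.4f]\n', ci_low, ci_high);
```

Note that the bootstrap interval is centred near the mean of the observed sample rather than near the true population mean, since the observed sample is all the resampling has access to.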

## Matlab/Octave Code

Code for the two approaches is shown below. The population follows the standard normal distribution N(0, 1), so `std_1`, the standard deviation of the first sample, is ~1. For samples of size 100, the standard error is ~0.1; for samples of size 400, it is ~0.05. This relation is not accidental: the standard error scales as `1/sqrt(N)`. Since both the standard error and the confidence interval reflect the accuracy of the estimate, they share the same relation: the CI length scales as `1/sqrt(N)` as well. The intuition is that the standard error and the CI track how far the estimate is likely to be from the true value, and a larger sample size (N) gives us a better estimate, i.e. a smaller standard error and a narrower CI.
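The `1/sqrt(N)` relation follows from the variance of an average. As a short derivation, assuming the N sample points $x_i$ are independent and share a population standard deviation $\sigma$ (a symbol introduced here for illustration):

$$
\operatorname{Var}(\bar{x})
= \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)
= \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(x_i)
= \frac{\sigma^2}{N},
\qquad
\mathrm{SE} = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{N}}
$$

With $\sigma = 1$ this gives $1/\sqrt{100} = 0.1$ and $1/\sqrt{400} = 0.05$, matching the values quoted above. When the sample means are approximately normally distributed, the 95% CI has length of roughly $2 \times 1.96 \times \mathrm{SE}$, so it shrinks at the same rate.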

```octave
% just to convince octave that this is a local function
```

Output is:

```
multiple sampling
```

## Quiz: How about M? Does increasing M affect the standard error/CI?

Yes and no. Increasing M means we use more samples, so we should get a better estimate, but an estimate of what? A larger M gives us a *better estimate of the estimate of the true value*, that is, more stable values for the standard error and the CI themselves. It does not give us a *better estimate of the true value*. The latter is what we are after, and it is tracked by the standard error, which depends on N rather than on M.

In the script above, we just used `M=10000`, much larger than `N`, to ensure the estimate of the estimate is good enough.
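One way to see this effect is to repeat the multiple-sampling experiment for several values of M and look at how much the computed standard error varies between repeats. The following is a minimal sketch under assumptions, not the script referred to above: the values in `Ms`, the repeat count `R`, and the standard normal population are all chosen for illustration.

```octave
% Sketch: how M affects the stability of the computed standard error.
N = 100;                          % sample size, fixed (assumed)
R = 20;                           % repeats per value of M (assumed)
Ms = [100 1000 10000];            % values of M to compare (assumed)

mean_se = zeros(size(Ms));        % average standard error for each M
spread_se = zeros(size(Ms));      % repeat-to-repeat spread for each M

for k = 1:numel(Ms)
  M = Ms(k);
  se = zeros(1, R);
  for r = 1:R
    % Standard error from M fresh samples of size N.
    se(r) = std(mean(randn(N, M)));
  end
  mean_se(k) = mean(se);
  spread_se(k) = std(se);
  fprintf('M = %5d: mean SE = %.4f, spread of SE = %.4f\n', ...
          M, mean_se(k), spread_se(k));
end
```

The average standard error should stay near `1/sqrt(N) = 0.1` for every M, while its spread across repeats should shrink as M grows: M sharpens our knowledge of the standard error without making the standard error itself smaller.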