Standard Error & Confidence Interval
The mean is probably the most widely used point estimate, but on its own it tells only part of the story. We also want to know how accurate the estimate is. The standard error and the confidence interval (CI) answer this question.
¶Multiple sampling
Given a population, draw M samples of size N and compute the mean of each sample (over its N points). From these M means we can compute a mean and a standard deviation: the mean of the means is the estimate of the population mean, and the standard deviation of the means is the standard error. (Yes, the standard error is a form of standard deviation.) The same M means also give a confidence interval via the percentile method: take the interval that contains the middle 95% of the means, i.e. from the 2.5th to the 97.5th percentile.
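The procedure can be sketched in a few lines of Octave. This is only a minimal illustration, not the post's own code: the values of M and N, the seed, and all variable names are assumptions, and the percentile step uses a simple sort-and-index approximation so that no extra package is needed.

```octave
% Multiple sampling from a N(0, 1) population.
% Assumed settings for illustration: M, N, and the seed.
randn('state', 42);           % fix the seed so the run is repeatable
M = 1000;                     % number of samples
N = 100;                      % points per sample

samples = randn(N, M);        % each column is one sample of size N
means = mean(samples, 1);     % 1-by-M vector of sample means

mean_estimate = mean(means);  % estimate of the population mean
std_error = std(means);       % standard error = std of the sample means

% Percentile method: keep the middle 95% of the sorted means.
sorted_means = sort(means);
ci_low = sorted_means(round(0.025 * M));
ci_high = sorted_means(round(0.975 * M));

fprintf('mean = %.3f, SE = %.3f, 95%% CI = [%.3f, %.3f]\n', ...
        mean_estimate, std_error, ci_low, ci_high);
```

With N = 100 the standard error should come out close to 0.1, and the interval should be roughly symmetric around 0.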
¶Bootstrapping single sample
Drawing M separate samples from the population is often too expensive to be practical. An alternative is bootstrapping: resample the single available sample with replacement M times, each resample having the same size N, to simulate sampling M times. With these M resamples, we follow the same procedure as above.
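A minimal bootstrap sketch in Octave follows. As before, M, N, the seed, and the variable names are assumptions made for illustration, and the single sample is simulated from N(0, 1) here rather than taken from real data.

```octave
% Bootstrapping a single sample of size N.
% Assumed settings for illustration: M, N, and the seeds.
rand('state', 42);             % seeds the generator used by randi
randn('state', 42);
N = 100;                       % size of the single observed sample
M = 1000;                      % number of bootstrap resamples

sample = randn(N, 1);          % the one sample we actually have

% Each column of idx holds N indices drawn with replacement from 1..N.
idx = randi(N, N, M);
boot_means = mean(sample(idx), 1);   % mean of each resample, 1-by-M

boot_se = std(boot_means);     % bootstrap estimate of the standard error

% Percentile method on the bootstrap means.
sorted_means = sort(boot_means);
ci_low = sorted_means(round(0.025 * M));
ci_high = sorted_means(round(0.975 * M));

fprintf('sample mean = %.3f, bootstrap SE = %.3f, 95%% CI = [%.3f, %.3f]\n', ...
        mean(sample), boot_se, ci_low, ci_high);
```

The bootstrap standard error should land near std(sample) / sqrt(N), which is the usual single-sample estimate of the standard error.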
¶Matlab/Octave Code
The code for both approaches is shown below. The population follows a standard normal distribution N(0, 1), so std_1, the standard deviation of the first sample, is ~1. For samples of size 100 the standard error is ~0.1; for samples of size 400 it is ~0.05. The relation is not accidental: the standard error is proportional to 1/sqrt(N). Since the standard error and the confidence interval both reflect the accuracy of the estimate, they share the same relation: the CI length is also proportional to 1/sqrt(N).
% just to convince octave that this is a local function
Output is:
multiple sampling
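The 1/sqrt(N) relation mentioned earlier can also be checked with plain arithmetic, separately from the simulation. The sketch below assumes a population standard deviation of 1 and uses the normal approximation (mean ± 1.96 standard errors) for the 95% interval; the 1.96 factor and the variable names are assumptions added for this example.

```octave
% Standard error and 95% CI length as a function of sample size N.
sigma = 1;                      % population standard deviation of N(0, 1)
N = [100 400];                  % the two sample sizes discussed in the text

se = sigma ./ sqrt(N);          % standard error for each N
ci_length = 2 * 1.96 * se;      % normal approximation of the 95% CI length

for k = 1:numel(N)
  fprintf('N = %d: SE = %.3f, CI length = %.3f\n', N(k), se(k), ci_length(k));
end
% N = 100: SE = 0.100, CI length = 0.392
% N = 400: SE = 0.050, CI length = 0.196
```

Quadrupling N halves both the standard error and the CI length, which matches the ~0.1 and ~0.05 values quoted for samples of size 100 and 400.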