Good evening everybody,

First of all, I would like to thank you all for the feedback you gave me on Facebook and in person. I understand that these reviews have been appreciated, even by people who are only considering taking the CFA in the future.

I decided to skip the part of Quantitative Finance dedicated to probability concepts and well-known probability distributions, as I reckon that it would be too basic to be of much interest here.

However, I thought that the “Sampling & Estimation” and “Hypothesis Testing” chapters were good refreshers and could be of value to some of you. I’ll start with the first one and carry on with the second another day.

# Sampling & Estimation

## The Central Limit Theorem

The key concept to understand for these topics is the Central Limit Theorem (CLT), which should be familiar to anybody who has taken an introductory statistics course. Let’s restate it for the sake of completeness:

For simple random samples of size \(n\) from a population with mean \(\mu\) and variance \(\sigma^2\), the sampling distribution of the sample mean \(\bar{x}\) approaches a normal probability distribution with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\) as the sample size becomes large.

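Or, more formally, let \(S_n = \frac{1}{n} \sum_{i=1}^n X_i\), where the \(X_i\) are independent draws from the population, then

$$\sqrt{n}\,(S_n - \mu) \quad \overset{n \to \infty}{\longrightarrow_d} \quad \mathcal{N}(0,\sigma^2)$$

In other words, for large \(n\), \(S_n\) is approximately distributed as \(\mathcal{N}(\mu,\frac{\sigma^2}{n})\).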

In the CFA, they assume that \(n\) is large enough when \(n \geq 30 \).

Note that the CLT holds whatever the population probability distribution, as long as its variance is finite.
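As a quick sanity check of the statement above, here is a minimal simulation sketch. The exponential population, the seed, the number of trials and the function name `simulate_sample_means` are my own choices for illustration and are not part of the CFA material.

```python
import random
import statistics

def simulate_sample_means(n, trials, rate=1.0, seed=42):
    """Draw `trials` samples of size `n` from an exponential population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(trials)
    ]

# Exponential(rate=1) has mean 1 and variance 1, and is strongly skewed,
# so it is a reasonable stand-in for a "non-normal" population.
n = 30
means = simulate_sample_means(n=n, trials=20_000)

mean_of_means = statistics.fmean(means)
variance_of_means = statistics.pvariance(means)

print("mean of sample means:    ", round(mean_of_means, 3))      # should be close to mu = 1
print("variance of sample means:", round(variance_of_means, 4))  # should be close to sigma^2 / n = 1/30
```

Even though the population is far from normal, the sample means should cluster around \(\mu\) with a variance close to \(\frac{\sigma^2}{n}\).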

## Estimating the mean of a population with a sample

In the exam, you will be using the CLT to compute confidence intervals of population parameter estimates.

The main parameter you will have to estimate is the mean of a population using a given sample. The way to compute an estimate of the mean is to compute the sample mean as follows:

$$ \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i $$

If you were to have several samples from a population, and you computed an estimate of the mean from each of them, the distribution of the different sample means would be approximately \(\bar{x} \sim \mathcal{N}(\mu,{\sigma_{\bar{x}}}^2)\) where \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\). This assumes that you know the population variance \(\sigma^2\). If you do not know the variance of the population, you can compute the standard error of the sample mean \(s_{\bar{x}} = \frac{s}{\sqrt{n}}\) where \(s=\sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}\) is the sample standard deviation.
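The quantities above can be sketched in a few lines of Python using only the standard library. The data are made-up monthly returns and the helper name `standard_error` is mine, so take this as an illustration rather than a reference implementation.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean, s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)

# Hypothetical monthly returns in percent, invented for this example.
returns = [1.2, -0.5, 0.8, 2.1, -1.0, 0.3, 1.7, 0.4]

x_bar = statistics.mean(returns)
se = standard_error(returns)

print(f"sample mean    = {x_bar:.4f}")
print(f"standard error = {se:.4f}")
```

Note that `statistics.stdev` divides by \(n-1\), which matches the definition of \(s\) above, whereas `statistics.pstdev` divides by \(n\).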

To compute the confidence interval of a mean estimator you use the following formula:

$$\text{point estimate} \pm \text{reliability factor} \times \text{standard error} \tag{1}$$

The point estimate is your estimate of the mean \(\bar{x}\). The standard error is computed as explained above. The reliability factor depends on the confidence level \(1-\alpha\) of the interval you wish to compute, where \(\alpha\) is the significance level (for a 95% confidence interval, \(\alpha = 0.05\)).

When the variance of the population is known, the reliability factor is the point \(z_{\alpha/2}\) of the standard normal distribution \(z \sim \mathcal{N}(0,1)\) that leaves an area of \(\alpha/2\) under the curve in the upper tail: \(z_{\alpha/2}=\Phi^{-1}(1-\alpha/2)\). The significance level is divided by 2 because the probability left outside the interval is split equally between the two tails. Looking back at the formula for the confidence interval, you will see that it is easy to interpret. You take your estimate as the central point of the confidence interval, then you use the standard normal distribution to determine how far to go on each side so that the area under the curve between the two bounds is \(1-\alpha\), and you scale that distance by the standard deviation of the mean estimates.
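Here is a small sketch of the known-variance case, writing the confidence level as \(1-\alpha\) so that the reliability factor is the \(1-\alpha/2\) quantile of the standard normal distribution. The function name `z_confidence_interval` and the numbers in the example are assumptions of mine.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population
    standard deviation `sigma` is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # reliability factor z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)        # reliability factor x standard error
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs: sample mean 5.0, known sigma 2.0, 100 observations.
low, high = z_confidence_interval(x_bar=5.0, sigma=2.0, n=100, confidence=0.95)
print(f"95% CI: [{low:.3f}, {high:.3f}]")  # 95% CI: [4.608, 5.392]
```

For the usual 90%, 95% and 99% intervals this gives the familiar reliability factors of roughly 1.645, 1.96 and 2.576.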

When the population variance is unknown, instead of using the standard normal distribution, you would use the Student-t distribution. This distribution has a single parameter, the degrees of freedom \(\text{df}\). Basically, the Student-t distribution looks like a standard normal distribution but with fatter tails. As \(\text{df} \longrightarrow \infty\), the Student-t distribution converges to the standard normal distribution. So, when you do not know the variance of the population, the reliability factor is the corresponding point of the Student-t distribution with \(n-1\) degrees of freedom: \(t_{\alpha/2}\).
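To see the effect of the fatter tails, here is a sketch of the unknown-variance case. Python's standard library has no Student-t quantile function, so I assume the reliability factor is read from a t-table and passed in by hand, as you would do in the exam. The sample values and the name `t_confidence_interval` are made up for illustration.

```python
import math
import statistics

def t_confidence_interval(sample, t_critical):
    """Confidence interval for the mean with unknown population variance.

    `t_critical` is the reliability factor t_{alpha/2} for
    len(sample) - 1 degrees of freedom, looked up in a t-table by the caller.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s_x_bar = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return x_bar - t_critical * s_x_bar, x_bar + t_critical * s_x_bar

# Hypothetical sample of 10 observations, so df = 9.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2, 4.7, 5.0]
T_9_95 = 2.262  # two-sided 95% t-table value for df = 9
Z_95 = 1.960    # standard normal counterpart, for comparison only

t_low, t_high = t_confidence_interval(sample, T_9_95)
z_low, z_high = t_confidence_interval(sample, Z_95)

print(f"t-based 95% CI: [{t_low:.3f}, {t_high:.3f}]")
print(f"z-based 95% CI: [{z_low:.3f}, {z_high:.3f}]")
```

With the same data, the t-based interval is wider than the z-based one, which is the price paid for having to estimate the standard deviation from the sample.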

So the only real difficulty here is knowing which reliability factor to use. This depends on whether the population variance is known, whether the population is normally distributed and whether the number of observations is large enough. The right strategy is summarized below, where \(\emptyset\) means that neither statistic is appropriate:

| Population distribution | \(n < 30\) | \(n \geq 30\) |
| --- | --- | --- |
| Normal distribution with known variance | z-statistic | z-statistic |
| Normal distribution with unknown variance | t-statistic | t-statistic |
| Unknown distribution with known variance | \(\emptyset\) | z-statistic |
| Unknown distribution with unknown variance | \(\emptyset\) | t-statistic |

Finally, it’s important to note that when \(n \geq 30\), the z-statistic is an acceptable approximation even when the variance is unknown, but the t-statistic is the safer choice in that case because it gives a slightly wider, more conservative interval.
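The decision table above can be written as a tiny helper, which I find handy for memorizing it. The function name `reliability_factor_type` and its return values are my own convention, with `None` standing for the \(\emptyset\) cells.

```python
def reliability_factor_type(normal_population, variance_known, n):
    """Return "z", "t", or None according to the table:
    None means no reliability factor is available."""
    if not normal_population and n < 30:
        # Unknown distribution and small sample: the CLT cannot be relied upon.
        return None
    # Otherwise the choice only depends on whether the variance is known.
    return "z" if variance_known else "t"

# Walk through every cell of the table.
for normal in (True, False):
    for known in (True, False):
        for n in (20, 50):
            choice = reliability_factor_type(normal, known, n)
            print(f"normal={normal!s:5} variance_known={known!s:5} n={n}: {choice}")
```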

I’ll be back with the review of “Hypothesis Testing” soon.

Thanks for reading!