Sampling and Central Limit Theorem

Sampling & Central Limit Theorem | CFA Level I Quantitative Methods

In this lesson, we’ll discuss sampling methods and the central limit theorem, essential concepts for making inferences about population parameters using sample statistics.

Importance of Sampling and Estimation

While descriptive statistics are often used to describe an entire population, studying the whole population is not always practical or possible. In such cases, we select a sample from the population and compute sample statistics, which serve as estimates of the true population parameters.

Simple Random Sampling

One way to fairly sample from a population is through simple random sampling. The goal is to ensure that each element has an equal probability of being selected. For example, an analyst could use a random number generator to select accounts from a large wealth management firm’s database.
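As a minimal sketch of the random-number-generator approach, the Python snippet below draws 50 accounts without replacement. The account IDs, the population size of 1,000, the sample size, and the seed are all illustrative assumptions, not figures from the lesson.

```python
import random

# Hypothetical account IDs standing in for the firm's database.
account_ids = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Sampling without replacement: every account is equally likely to be chosen.
simple_sample = rng.sample(account_ids, k=50)

print(len(simple_sample), "accounts selected")
```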

Systematic Sampling

In some cases, a more systematic approach may be desired. One method is systematic sampling, which involves selecting every nth element from the population. For instance, an analyst could select roughly 10% of the accounts by choosing every account whose account number ends in the same digit.
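The every-nth-element idea can be sketched in Python with a slice. The consecutive account numbers, the step of 10, and the random starting position are assumptions made for illustration.

```python
import random

# Hypothetical consecutive account numbers.
account_ids = list(range(1, 1001))
step = 10  # take every 10th account, i.e. a 10% sample

rng = random.Random(7)
start = rng.randrange(step)  # random starting position in [0, step)

# Slice from the start position, stepping through the list.
systematic_sample = account_ids[start::step]

print(len(systematic_sample), "accounts selected")
```

Because the hypothetical account numbers are consecutive, every selected account ends in the same digit, which matches the last-digit rule described above.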

Stratified Random Sampling

When simple random sampling isn’t ideal, stratified random sampling might be a better option. The population is divided into subgroups (strata) based on one or more distinguishing characteristics, and a random sample is taken from each subgroup, sized in proportion to that subgroup’s share of the population.

EXAMPLE

A firm has 100 accounts across 3 investment styles: 20% low risk, 50% moderate risk, and 30% high risk. The analyst wants the sample to reflect the distribution of the investment styles. How can they do this?

Using stratified random sampling with a sample size of 10, the analyst would randomly select 2 accounts from the low-risk group, 5 accounts from the moderate-risk group, and 3 accounts from the high-risk group. This ensures the sample reflects the distribution of investment styles in the entire population.
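The proportional allocation in this example can be sketched as follows. The account labels and the total sample size of 10 are assumptions for illustration; the 20/50/30 split comes from the example.

```python
import random

rng = random.Random(0)

# Hypothetical population: 100 accounts tagged by investment style.
population = (
    [("low", i) for i in range(20)]
    + [("moderate", i) for i in range(50)]
    + [("high", i) for i in range(30)]
)
sample_size = 10

# Group accounts into strata by investment style.
strata = {}
for style, account in population:
    strata.setdefault(style, []).append((style, account))

stratified_sample = []
for style, members in strata.items():
    # Proportional allocation: stratum share of the population times sample size.
    k = round(sample_size * len(members) / len(population))
    stratified_sample.extend(rng.sample(members, k))

counts = {s: sum(1 for t, _ in stratified_sample if t == s) for s in strata}
print(counts)  # {'low': 2, 'moderate': 5, 'high': 3}
```

The allocations here are whole numbers. With other proportions, rounding can make the stratum counts sum to slightly more or less than the intended sample size, so a real implementation would need a rule for that case.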

Cluster Sampling

Cluster sampling divides the population into subsets (clusters), assuming each cluster is representative of the overall population. Clusters could be accounts managed by sub-branches or grouped by geographic location. There are two types of cluster sampling:

  • One-stage cluster sampling: Randomly selecting a number of clusters and using all observations in those clusters as the sample.
  • Two-stage cluster sampling: Randomly selecting a number of clusters, then randomly selecting a subset of observations from each chosen cluster.

While cluster sampling can be more time- and cost-efficient, it may yield lower accuracy compared to other sampling methods.
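The difference between the one-stage and two-stage designs can be sketched in Python. The branch names, cluster sizes, and the numbers of clusters and accounts drawn are hypothetical.

```python
import random

rng = random.Random(1)

# Hypothetical clusters: 8 sub-branches with 25 accounts each.
clusters = {
    f"branch_{b}": [f"branch_{b}_acct_{i}" for i in range(25)]
    for b in range(8)
}

# Stage 1 (both designs): randomly choose which clusters to study.
chosen = rng.sample(sorted(clusters), k=3)

# One-stage: keep every account in the chosen clusters.
one_stage = [acct for name in chosen for acct in clusters[name]]

# Two-stage: randomly subsample accounts within each chosen cluster.
two_stage = [
    acct for name in chosen for acct in rng.sample(clusters[name], k=5)
]

print(len(one_stage), len(two_stage))  # 75 15
```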

Probability vs. Non-Probability Sampling

Probability sampling gives each population member an equal chance of being selected, creating a representative sample. In contrast, non-probability sampling relies on factors other than probability, such as cost and ease of access, or the researcher’s subjective judgment. Non-probability sampling methods include convenience sampling and judgmental sampling.

Convenience Sampling

Convenience sampling involves selecting samples based on the data’s accessibility. While data can be collected quickly at a low cost, the sample may not be representative, limiting sampling accuracy.

Judgmental Sampling

Judgmental sampling involves handpicking samples based on a researcher’s knowledge and professional judgment. This method allows researchers to target specific populations but may be affected by researcher bias, leading to skewed results.

EXAMPLE

When auditing financial statements, seasoned auditors use judgmental sampling to select accounts or transactions that provide sufficient audit coverage. Why might they choose this method?

Judgmental sampling is suitable in time-sensitive situations or when the researcher’s expertise is crucial. In this case, auditors can apply their judgment to efficiently examine accounts and transactions that are most relevant to the audit.

Remember: Although non-probability sampling methods can be more cost-effective and efficient, probability sampling typically yields more accurate and reliable results. Understanding the trade-offs between these methods is essential for effective statistical analysis in the world of finance.

Sampling Error and Sampling Distribution

When we sample from a population, we attempt to estimate the true population mean (μ) and standard deviation (σ) by calculating the sample mean (x̄) and standard deviation (s). The difference between the sample statistic and population parameter is the sampling error.

When we take multiple random samples from a population, we find that the sample statistic is a random variable with a probability distribution called the sampling distribution.
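A small simulation can make both ideas concrete. The sketch below builds a hypothetical population of daily returns (the normal shape, the parameters, and the sizes are assumptions chosen for illustration), measures the sampling error of one sample, and then collects many sample means to approximate the sampling distribution.

```python
import random
import statistics

rng = random.Random(123)

# Hypothetical population of 10,000 daily returns, in percent.
population = [rng.gauss(0.18, 0.95) for _ in range(10_000)]
mu = statistics.fmean(population)

# One sample: its mean differs from mu by the sampling error.
sample = rng.sample(population, k=30)
sampling_error = statistics.fmean(sample) - mu

# Many samples: the collection of sample means approximates
# the sampling distribution of the sample mean.
sample_means = [
    statistics.fmean(rng.sample(population, k=30)) for _ in range(2_000)
]

print(f"population mean:      {mu:.4f}")
print(f"sampling error:       {sampling_error:+.4f}")
print(f"mean of sample means: {statistics.fmean(sample_means):.4f}")
```

Each individual sample mean misses μ by some amount, but the sample means as a group cluster around μ.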

Central Limit Theorem

The Central Limit Theorem states that for simple random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal distribution with mean μ and variance σ²/n as n becomes large. As we increase n, the variance of the sampling distribution gets smaller, making the distribution narrower and the sample mean a more precise estimate of μ.

This theorem is useful for hypothesis testing and confidence intervals, and it holds even if the population distribution is not normal. As a rule of thumb, a sample size of at least 30 is generally considered large enough for the normal approximation to be reasonable.
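To illustrate the claim about non-normal populations, the following sketch draws samples of size 30 from a strongly right-skewed exponential distribution. The choice of distribution, the seed, and the number of samples are assumptions. An exponential distribution with rate 1 has mean 1 and variance 1, so the theorem suggests the sample means should have mean near 1 and standard deviation near 1/√30.

```python
import random
import statistics

rng = random.Random(2024)
n = 30               # observations per sample
num_samples = 5_000  # number of samples drawn

# Exponential population with rate 1: mean 1, variance 1, right-skewed.
clt_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(clt_means)
sd_of_means = statistics.stdev(clt_means)
theoretical_sd = (1.0 / n) ** 0.5  # sigma / sqrt(n)

# Under a normal approximation, about 95% of sample means should fall
# within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_sd for m in clt_means) / num_samples

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {theoretical_sd:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```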

Applying the Central Limit Theorem

EXAMPLE

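A stock has an average daily return of 0.18% and a standard deviation of returns of 0.95%. An analyst takes a sample of 30 random observations. What are the mean and standard deviation of the sampling distribution of the sample mean?

Using the Central Limit Theorem:

Mean of the sampling distribution: 0.18%

Standard deviation of the sampling distribution: 0.95% / √30 ≈ 0.17%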

EXAMPLE

An analyst takes a random sample of 100 daily returns for another stock, finding a sample mean of 0.23% and a sample standard deviation of 1.19%. Calculate and interpret the mean and standard deviation of the sampling distribution of the sample mean.

Mean of the sampling distribution: 0.23%

Standard deviation of the sampling distribution: 1.19% / √100 ≈ 0.12%

This means that if we repeatedly drew samples of size 100, the sample means would be centered at an estimated 0.23%, with a standard deviation of about 0.12%. Because the population parameters are unknown here, the sample mean and sample standard deviation serve as their estimates.
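The arithmetic in the two examples above can be checked with a short Python sketch. The function name `standard_error` is a label chosen here for illustration; the inputs are the figures given in the examples.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard deviation of the sample mean: std_dev / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return std_dev / math.sqrt(n)


# First example: population standard deviation 0.95%, n = 30.
se_known_sigma = standard_error(0.95, 30)

# Second example: sample standard deviation 1.19% used as an estimate, n = 100.
se_estimated = standard_error(1.19, 100)

print(f"{se_known_sigma:.4f}% {se_estimated:.4f}%")  # 0.1734% 0.1190%
```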

Remember these three key points regarding CLT:

  1. The sampling distribution of the sample mean will be approximately normal when the sample size is large (at least 30, as a rule of thumb).
  2. The mean of the sampling distribution is equal to the mean of the population.
  3. The variance of the sampling distribution is equal to the population variance divided by the sample size.

That’s it for this lesson! In the next lesson, we’ll tackle the estimation problem. See you there!
