| Concept | Population | Sample |
|---|---|---|
| Description | Entire group of interest | Subset of the population |
| Mean | \(\mu\) | \(\bar{x}\) |
| Variance | \(\sigma^2\) | \(s^2\) |
| Proportion | \(p\) | \(\hat{p}\) |
Let \(X_1, X_2, \ldots, X_n\) come from distribution function \(f_X(x; \theta)\). If the following conditions are met:
1. the \(X_i\) are mutually independent, and
2. each \(X_i\) has the same marginal distribution \(f_X(x; \theta)\),
then the random variables \(X_1, X_2, \ldots, X_n\) are said to be independent and identically distributed (iid) and form a random sample.
If \(X_1, X_2, \ldots, X_n\) are iid with density \(f_{X}(x; \theta)\), then
\[ f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod^n_{i=1}f_{X_i}(x_i) \]
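As a concrete illustration of this factorization (using a hypothetical Exponential(\(\theta\)) density, chosen here for illustration and not taken from the notes), the joint density of an iid sample is just the product of the marginal densities:

```python
import math

# Hypothetical choice of marginal density: Exponential(theta),
# f_X(x; theta) = theta * exp(-theta * x) for x >= 0.
def exp_pdf(x, theta):
    return theta * math.exp(-theta * x)

def joint_pdf(xs, theta):
    # Joint density of an iid sample factors into a product of marginals.
    prod = 1.0
    for x in xs:
        prod *= exp_pdf(x, theta)
    return prod

sample = [0.5, 1.2, 0.3]  # made-up data for illustration
theta = 2.0
print(joint_pdf(sample, theta))
```

This product form is what makes likelihood-based methods tractable for iid samples.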
A statistic is a numerical value, computed from a sample, that describes a characteristic of that sample; it is typically used to estimate a population parameter \(\theta\).
For MATH 352:
> A statistic is any function of the sample data that does not depend on unknown population parameters.
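A minimal sketch of this distinction (with made-up data values): the sample mean below is a statistic because it uses only the observed data, whereas a quantity like \((\bar{x} - \mu)/\sigma\) is not a statistic when \(\mu\) and \(\sigma\) are unknown population parameters:

```python
from statistics import mean

data = [4.1, 5.0, 3.8, 4.6]  # hypothetical sample data

# A statistic: a function of the sample data only.
xbar = mean(data)
print(xbar)

# NOT a statistic when mu and sigma are unknown:
# z = (xbar - mu) / sigma   <- depends on unknown population parameters
```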
Examples of statistics include the sample mean \(\bar{X}\), the sample variance \(S^2\), and the sample proportion \(\hat{p}\). Desirable properties of an estimator \(\hat{\theta}\) of \(\theta\):
| Property | Meaning |
|---|---|
| Unbiasedness | \(E[\hat{\theta}] = \theta\) |
| Consistency | \(\hat{\theta}_n \to \theta\) in probability as \(n \to \infty\) |
| Efficiency | Smallest variance among unbiased estimators |
| Sufficiency | Uses all information in the data about \(\theta\) |
Example: \(\bar{X}\) is an unbiased and consistent estimator of \(\mu\).
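A quick simulation sketch of both properties, assuming a hypothetical normal population with \(\mu = 10\) and \(\sigma = 2\) (values chosen for illustration): averaging many sample means recovers \(\mu\) (unbiasedness), and a single large-sample mean lands near \(\mu\) (consistency):

```python
import random
from statistics import mean

random.seed(0)
mu = 10.0  # hypothetical population mean

# Unbiasedness: the average of many sample means is close to mu.
sample_means = [mean(random.gauss(mu, 2.0) for _ in range(30))
                for _ in range(2000)]
print(mean(sample_means))

# Consistency: one sample mean from a large sample is close to mu.
big_mean = mean(random.gauss(mu, 2.0) for _ in range(100_000))
print(big_mean)
```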
A sampling distribution is the probability distribution of a sample statistic (like \(\bar{X}\)) based on all possible random samples of a given size \(n\).
If:
- Population mean = \(\mu\)
- Population standard deviation = \(\sigma\)
- Sample size = \(n\)

Then:
- Mean of \(\bar{X}\): \(\mu_{\bar{X}} = \mu\)
- Standard deviation (standard error): \(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\)
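For example, with the made-up values \(\sigma = 12\) and \(n = 36\), the standard error works out to \(12/\sqrt{36} = 2\):

```python
import math

sigma = 12.0  # hypothetical population standard deviation
n = 36        # hypothetical sample size

# Standard error of the sample mean: sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(se)  # 2.0
```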
The idea of convergence concepts is to understand how statistics behave as \(n \rightarrow \infty\). As the sample size gets large, statistics generally behave like known distributions.
There are three types of convergence:
A sequence of random variables \(X_1, X_2, \ldots\) converges almost surely to a random variable \(X\) if, for every \(\epsilon > 0\),
\[ P(\lim_{n\rightarrow \infty} |X_n - X| < \epsilon ) = 1 \]
A sequence of random variables \(X_1, X_2, \ldots\) converges in probability to a random variable \(X\) if, for every \(\epsilon > 0\),
\[ \lim_{n\rightarrow \infty} P(|X_n - X| < \epsilon ) = 1 \]
A sequence of random variables \(X_1, X_2, \ldots\) converges in distribution to a random variable \(X\) if
\[ \lim_{n\rightarrow \infty} F_{X_n}(x) = F_{X}(x) \]
at every point \(x\) where \(F_X\) is continuous.
As the sample size \(n\) increases, the sample mean \(\bar{X}\) tends to get closer to the population mean \(\mu\).
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \to \mu \text{ as } n \to \infty \]
| Version | Type of Convergence | Description |
|---|---|---|
| Weak Law | In probability | \(\bar{X}_n\) is close to \(\mu\) with high probability for large \(n\) |
| Strong Law | Almost surely | \(\bar{X}_n\) converges to \(\mu\) with probability 1 |
The Law of Large Numbers says that with enough data, sample proportions and means approximate population values.
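A small simulation sketch of the LLN, using Uniform(0, 1) draws (so \(\mu = 0.5\)) as an assumed example distribution; the error \(|\bar{X}_n - \mu|\) shrinks as \(n\) grows:

```python
import random

random.seed(1)
mu = 0.5  # mean of Uniform(0, 1)

# Track |sample mean - mu| for increasing sample sizes.
errors = {}
for n in (100, 10_000, 1_000_000):
    xbar = sum(random.random() for _ in range(n)) / n
    errors[n] = abs(xbar - mu)
    print(n, errors[n])
```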
Let \(X_1, X_2, \ldots, X_n\) be a sequence of iid random variables with \(E(X_i) = \mu\) and \(Var(X_i) = \sigma^2 < \infty\). For \(\bar X_n = \frac{1}{n} \sum^n_{i=1} X_i\), the standardized sample mean converges in distribution to a standard normal:
\[ \frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \xrightarrow{d} N(0,1) \]
as \(n\rightarrow \infty\).
Equivalently, for large \(n\), the sample mean is approximately normal:
\[ \bar{X} \overset{\cdot}{\sim} \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \]
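A simulation sketch of the CLT with an assumed Exponential(1) population (\(\mu = \sigma = 1\), a skewed distribution, chosen for illustration): the standardized sample means should have mean near 0 and standard deviation near 1:

```python
import math
import random
from statistics import mean, stdev

random.seed(2)
mu, sigma, n = 1.0, 1.0, 50  # Exponential(1): mean 1, sd 1

# Standardize many sample means; by the CLT they are approximately N(0, 1).
z = [
    math.sqrt(n) * (mean(random.expovariate(1.0) for _ in range(n)) - mu) / sigma
    for _ in range(5000)
]
print(mean(z), stdev(z))
```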
For a sample proportion \(\hat p\) and sample size \(n\):
Normal approximation holds if both \(np \ge 10\) and \(n(1-p) \ge 10\) (a common rule of thumb). Then
\[ \hat p \overset{\cdot}{\sim} N\!\left(p, \frac{p(1-p)}{n}\right) \]
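One common rule of thumb checks that both \(np\) and \(n(1-p)\) are at least 10. A short sketch applying that check and the standard error formula, with made-up values \(p = 0.3\) and \(n = 200\):

```python
import math

p, n = 0.3, 200  # hypothetical proportion and sample size

# Rule-of-thumb check: both expected counts at least 10.
ok = n * p >= 10 and n * (1 - p) >= 10

# Standard error of the sample proportion.
se = math.sqrt(p * (1 - p) / n)
print(ok, se)
```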