Hypothesis Testing

Linear and Generalized Linear Models

Statistical Inference

What is Statistical Inference?

  • Drawing conclusions about a population based on a sample
  • Population = entire group
  • Sample = subset

Two Main Types of Inference

  1. Estimation
  2. Hypothesis Testing

Estimation

  • Point Estimate: Single best guess (e.g., \(\hat \beta_1\))
  • Interval Estimate: Range likely to contain the true value

Hypothesis Testing

  • \(H_0\): No effect or difference
  • \(H_1\): Some effect or difference
  • We use sample data to decide whether or not to reject \(H_0\)

Key Concepts and Tools

  • Sampling Distribution
  • Central Limit Theorem
  • Standard Error

p-values

  • Probability of observing data as extreme as, or more extreme than, what was observed, assuming \(H_0\) is true

  • Misinterpretation of p-values is common.

  • Emphasize: a low p-value means the data are unusual under \(H_0\).

Confidence Intervals

  • A range where we expect the true value to fall

Hypothesis Testing

Hypothesis Tests

Hypothesis tests are used to assess whether a claim is supported by data. A test is conducted by stating the null and alternative hypotheses, collecting data, and evaluating the evidence against the null hypothesis.

Null Hypothesis \(H_0\)

The null hypothesis is the claim that is initially assumed to be true. It typically states that the parameter is equal to a hypothesized value.

Alternative Hypothesis \(H_1\)

The alternative hypothesis contradicts the null hypothesis.

Example of Null and Alternative Hypothesis

We want to compare \(\beta\) to a hypothesized value \(\beta^*\).

Null Hypothesis Alternative Hypothesis
\(H_0: \beta=\beta^*\) \(H_1: \beta\ne\beta^*\)
\(H_0: \beta\le\beta^*\) \(H_1: \beta>\beta^*\)
\(H_0: \beta\ge\beta^*\) \(H_1: \beta<\beta^*\)

One-Sided vs Two-Sided Hypothesis Tests

Notice that there are three pairs of null and alternative hypotheses. The first pair (\(H_1:\beta\ne\beta^*\)) is considered a two-sided hypothesis because the rejection region is split across two regions (both tails of the distribution). The remaining two pairs are considered one-sided because the rejection region is located on one side of the distribution.

Null Hypothesis Alternative Hypothesis Side
\(H_0: \beta=\beta^*\) \(H_1: \beta\ne\beta^*\) 2-Sided
\(H_0: \beta\le\beta^*\) \(H_1: \beta>\beta^*\) 1-Sided
\(H_0: \beta\ge\beta^*\) \(H_1: \beta<\beta^*\) 1-Sided

Hypothesis Testing Steps

  1. State \(H_0\) and \(H_1\)
  2. Choose \(\alpha\)
  3. Compute confidence interval/p-value
  4. Make a decision
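
As a quick illustration of these steps, here is a minimal sketch in Python (the language and the scipy library are assumptions; the course may use a different tool) testing whether a mean differs from a hypothesized value.

    import numpy as np
    from scipy import stats

    # Simulated sample standing in for collected data
    rng = np.random.default_rng(42)
    sample = rng.normal(loc=5.3, scale=1.0, size=30)

    # 1. State H0: mu = 5 versus H1: mu != 5
    mu0 = 5.0
    # 2. Choose alpha
    alpha = 0.05
    # 3. Compute the test statistic and p-value
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
    # 4. Make a decision
    print("reject H0" if p_value < alpha else "fail to reject H0")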

Rejection Region

  • The rejection region is the set of all test statistic values that lead to rejecting \(H_0\).

  • It is defined by the significance level (\(\alpha\)): the probability of rejecting \(H_0\) when it is actually true.

Rejection Region

A normal distribution demonstrating the rejection regions.

Decision Making

Decision Making

Hypothesis Testing will force you to make a decision: Reject \(H_0\) OR Fail to Reject \(H_0\)

Reject \(H_0\): the observed effect is unlikely to be due to random chance alone; there is evidence of an underlying process contributing to the effect.

Fail to Reject \(H_0\): the observed effect is consistent with random chance. Random sampling alone could explain the effect, so there is not enough evidence for an underlying process.

Decision Making: Test Statistic

\(\phi\) known

\[ ts = \frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]

\(\phi\) unknown

\[ ts = \frac{\hat\beta_j-\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p} \]
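
A sketch of the test statistic computation for a single coefficient, using assumed values for the estimate, its standard error, and the model dimensions:

    # Assumed values for illustration only
    beta_hat = 0.42    # estimated coefficient
    beta_null = 0.0    # value of beta_j under H0
    se = 0.15          # standard error of the estimate
    n, p = 100, 3      # sample size and number of model parameters

    ts = (beta_hat - beta_null) / se
    # If phi is known, compare ts to N(0, 1);
    # if phi is unknown, compare ts to a t distribution with n - p df.
    print(ts)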

Decision Making: P-Value

Two-Sided Test

\[ P(|T| > |ts|) = \int^{\infty}_{|ts|} f(t)\, dt + \int_{-\infty}^{-|ts|} f(t)\, dt \]

One-Sided Test \[ P(T > ts) = \int^\infty_{ts} f(t) dt \]

OR \[ P(T < ts) = \int_{-\infty}^{ts} f(t) dt \]

Rejection Region

A normal distribution demonstrating the rejection regions.

Decision Making: P-Value

The p-value approach is one of the most common methods for reporting significant results. The p-value is easy to interpret because it gives the probability of observing our test statistic, or something more extreme, given that the null hypothesis is true.

If \(p < \alpha\), then you reject \(H_0\); otherwise, you will fail to reject \(H_0\).
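
Continuing the sketch above, a two-sided p-value can be computed from the t distribution (scipy is an assumption) and compared with \(\alpha\):

    from scipy import stats

    ts, df, alpha = 2.8, 97, 0.05          # carried over from the sketch above
    p_value = 2 * stats.t.sf(abs(ts), df)  # P(|T| > |ts|) under H0
    print(p_value, "reject H0" if p_value < alpha else "fail to reject H0")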

Significance Level \(\alpha\)

The significance level \(\alpha\) is the probability of rejecting the null hypothesis given that it is true.

In other words, \(\alpha\) is the error rate that a researcher controls.

Typically, we want this error rate to be small; a common choice is \(\alpha = 0.05\).

Model Inference

Model Inference

Model inference is the act of conducting a hypothesis test on an entire model (line) rather than a single coefficient. We do this to determine whether the fully parameterized model is significantly better than a smaller (reduced) model or the overall average.

Model inference determines if more variation is explained by including more predictors.

Model inference

  • We conduct model inference to determine whether one model explains significantly more variation than another. Both linear and logistic regression have techniques for comparing nested models (the F-test and the likelihood-ratio test, respectively).

Full and Reduced Model

Full Model

\[ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \]

Reduced Model

\[ Y = \beta_0 + \beta_1 X_1 \]

Null and Alternative Hypotheses

\(H_0\): The fully-parameterized model does not explain more variation than the reduced model.

\(H_a\): The fully-parameterized model does explain more variation than the reduced model.
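
As a sketch of how a full and a reduced linear model might be compared in Python (statsmodels and the simulated data below are assumptions, not the course's actual example), the F-test reports whether the additional predictors explain significantly more variation; for logistic regression, a likelihood-ratio test on the deviances plays the same role.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Simulated data standing in for a real data set
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({"x1": rng.normal(size=n),
                       "x2": rng.normal(size=n),
                       "x3": rng.normal(size=n)})
    df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=n)

    reduced = smf.ols("y ~ x1", data=df).fit()
    full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

    # F-test comparing the nested models; a small p-value favors the full model
    print(anova_lm(reduced, full))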

Confidence Intervals

Confidence Intervals

  • A confidence interval gives a range of plausible values intended to capture the population parameter.
  • It reflects uncertainty in point estimates from sample data.

CI: Formula

\[ PE \pm CV \times SE \]

  • \(PE\): Point estimate (\(\hat \beta\))
  • \(CV\): Critical value satisfying \(P(X > CV) + P(X < -CV) = \alpha\)
  • \(SE\): Standard Error of \(\hat \beta\)

\[ (LB,\ UB) = (PE - CV \times SE,\ PE + CV \times SE) \]
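
A small sketch of this formula, using assumed values for the point estimate, standard error, and degrees of freedom (scipy is an assumption):

    from scipy import stats

    pe, se = 0.42, 0.15      # point estimate and standard error (assumed values)
    alpha, df = 0.05, 97     # significance level and residual degrees of freedom
    cv = stats.t.ppf(1 - alpha / 2, df)   # critical value with alpha/2 in each tail
    lb, ub = pe - cv * se, pe + cv * se
    print(lb, ub)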

Interpretation

“We are 95% confident that the true mean lies between A and B.”

  • This does not mean there’s a 95% chance the mean is in that interval.
  • It means: if we repeated the sampling process many times, 95% of the intervals would contain the true value.

CI Plot

Factors Affecting CI Width

  • Sample size (\(n\)): larger \(n\) → narrower CI
  • Standard deviation (\(s\) or \(\sigma\)): more variability → wider CI
  • Confidence level: higher confidence → wider CI

Decision Making: Confidence Interval Approach

The confidence interval approach can evaluate a hypothesis test whose alternative hypothesis is \(\beta\ne\beta^*\). It results in a lower and upper bound, denoted \((LB, UB)\).

If \(\beta^*\) is in \((LB, UB)\), then you fail to reject \(H_0\). If \(\beta^*\) is not in \((LB,UB)\), then you reject \(H_0\).
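
Using the bounds from the confidence interval sketch earlier (approximate values), the decision rule looks like this:

    lb, ub = 0.12, 0.72      # approximate bounds from the CI sketch above
    beta_star = 0.0          # hypothesized value (assumed)
    print("fail to reject H0" if lb <= beta_star <= ub else "reject H0")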

Power Analysis

What is Statistical Power

  • Statistical Power is the probability of correctly rejecting a false null hypothesis.
  • In other words, it’s the chance of detecting a real effect when it exists.

Why Power Matters

  • Low power → high risk of Type II Error (false negatives)
  • High power → better chance of finding true effects
  • Common threshold: 80% power

Errors in Inference

Type I    Reject \(H_0\) when it is true            False positive
Type II   Fail to reject \(H_0\) when it is false   False negative
Power     \(1 - P(\text{Type II})\)                 Detecting a true effect

Type I Error (False Positive)

  • Rejecting \(H_0\) when it is actually true
  • Probability = \(\alpha\) (significance level)

Type II Error (False Negative)

  • Failing to reject \(H_0\) when it is actually false
  • Probability = \(\beta\)
  • Power = \(1 - \beta\)

Balancing Errors

  • Lowering \(\alpha\) reduces Type I errors, but increases risk of Type II errors.
  • To reduce both:
    • Increase sample size
    • Use more appropriate statistical tests

What Affects Power?

  1. Effect Size
    • Bigger effects are easier to detect
  2. Sample Size (\(n\))
    • Larger samples reduce standard error
  3. Significance Level (\(\alpha\))
    • Higher \(\alpha\) increases power (but riskier!)
  4. Variability
    • Less noise in data = better power

Boosting Power

  • Power = Probability of rejecting \(H_0\) when it’s false
  • Helps avoid Type II Errors
  • Driven by:
    • Sample size
    • Effect size
    • \(\alpha\)
    • Variability
  • Aim for 80% or higher
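
A sketch of a power calculation for a two-sample t-test using statsmodels (the test, effect size, and sample size below are assumptions for illustration):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    # Power achieved with effect size 0.5, 64 observations per group, alpha = 0.05
    achieved = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05)
    # Sample size per group needed to reach 80% power
    n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(achieved, n_needed)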

Model Conditions

Model Conditions

When we are conducting inference with regression models, we will have to check the following conditions:

  • Linearity
  • Independence
  • Probability Assumption
  • Equal Variances
  • Multicollinearity (for multiple regression)

Linearity

There must be a linear relationship between the outcome variable (\(y\)) and the set of predictors (\(x_1\), \(x_2\), …).

Independence

The data points must not influence each other.

Probability Assumption

The model errors (also known as residuals) must follow a specified distribution.

  • Linear Regression: Normal Distribution

  • Logistic Regression: Binomial Distribution

Equal Variances

The variability of the data points must be the same for all predictor values.

Residuals

Residuals are the differences between the observed values and the values predicted by the model. Common types of residuals include

  • Raw Residual

  • Standardized Residuals

  • Jackknife (studentized) Residuals

  • Deviance Residuals

  • Quantile Residuals

Influence Measures

Influence measures are statistics that quantify how much an individual data point affects the fitted model. Common influence measures are

  • Leverages

  • Cook’s Distance
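
A sketch of how leverages and Cook's distance can be obtained from a fitted linear model (simulated data and statsmodels are assumptions):

    import numpy as np
    import statsmodels.api as sm

    # Simulated data and a simple linear fit
    rng = np.random.default_rng(1)
    x = rng.normal(size=50)
    y = 2 + 3 * x + rng.normal(size=50)
    model = sm.OLS(y, sm.add_constant(x)).fit()

    influence = model.get_influence()
    leverages = influence.hat_matrix_diag     # diagonal of the hat matrix
    cooks_d, _ = influence.cooks_distance     # Cook's distance per observation
    print(leverages.max(), cooks_d.max())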

Raw Residuals

\[ \hat r_i = y_i - \hat y_i \]

Residual Analysis

A residual analysis is used to test the assumptions of linear regression.

QQ Plot

A QQ (quantile-quantile) plot plots the estimated quantiles of the residuals against the theoretical quantiles of a normal distribution. If the points lie close to the \(y=x\) line, the residuals are consistent with a normal distribution.

Residual vs Fitted Plot

This plot allows you to assess linearity and constant variance, and to identify potential outliers. Create a scatter plot of the fitted values (x-axis) against the residuals (y-axis); the points should scatter randomly around zero, with no pattern or funnel shape.
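
A sketch of both diagnostic plots described above (matplotlib and statsmodels are assumptions; the model is refit on simulated data so the code stands alone):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    # Simulated data and a simple linear fit
    rng = np.random.default_rng(1)
    x = rng.normal(size=50)
    y = 2 + 3 * x + rng.normal(size=50)
    model = sm.OLS(y, sm.add_constant(x)).fit()

    # QQ plot of the residuals against a normal distribution
    sm.qqplot(model.resid, line="45", fit=True)

    # Residual vs fitted plot: look for random scatter around zero
    plt.figure()
    plt.scatter(model.fittedvalues, model.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()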

Penguins: Example
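
A minimal sketch of what this example might look like, assuming seaborn's built-in penguins data set (the original slides may use a different tool or data source):

    import seaborn as sns
    import statsmodels.formula.api as smf

    penguins = sns.load_dataset("penguins").dropna()
    # Simple linear regression: body mass explained by flipper length
    fit = smf.ols("body_mass_g ~ flipper_length_mm", data=penguins).fit()
    print(fit.summary())        # coefficient t-tests and confidence intervals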

Heart: Example
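
A minimal sketch of what this example might look like: a logistic regression on a heart-disease data set. The file name and column names below are hypothetical placeholders, not the actual data used in the slides.

    import pandas as pd
    import statsmodels.formula.api as smf

    heart = pd.read_csv("heart.csv")    # hypothetical file; outcome assumed coded 0/1
    fit = smf.logit("disease ~ age + chol", data=heart).fit()
    print(fit.summary())                # Wald z-tests and confidence intervals per coefficient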