Linear Regression

Learning Objectives

Matrix Formulation
Multiple Linear Regression
Model Assumptions

Matrix Formulation

Matrix Version of Model

\[ Y_i = \boldsymbol X_i^\mathrm T \boldsymbol \beta + \epsilon_i \]

\(Y_i\): Outcome Variable
\(\boldsymbol X_i=(1, X_i)^\mathrm T\): Predictors
\(\boldsymbol \beta = (\beta_0, \beta_1)^\mathrm T\): Coefficients
\(\epsilon_i\): Error term

Data Matrix Formulation

For \(n\) data points

\[ \boldsymbol Y = \boldsymbol X\boldsymbol \beta + \boldsymbol \epsilon \]

\(\boldsymbol Y = (Y_1, \cdots, Y_n)^\mathrm T\): Outcome Variable
\(\boldsymbol X=(\boldsymbol X_1, \cdots, \boldsymbol X_n)^\mathrm T\): Predictors (an \(n \times 2\) matrix whose \(i\)th row is \(\boldsymbol X_i^\mathrm T\))
\(\boldsymbol \beta = (\beta_0, \beta_1)^\mathrm T\): Coefficients
\(\boldsymbol \epsilon = (\epsilon_1, \cdots, \epsilon_n)^\mathrm T\): Error terms

Least Squares Formula

Choose \(\boldsymbol \beta\) to minimize the sum of squared errors:

\[ (\boldsymbol Y - \boldsymbol X\boldsymbol \beta)^\mathrm T(\boldsymbol Y - \boldsymbol X\boldsymbol \beta) \]

Estimates

\[ \hat{\boldsymbol \beta} = (\boldsymbol X ^\mathrm T\boldsymbol X)^{-1}\boldsymbol X ^\mathrm T\boldsymbol Y \]
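
As a minimal sketch of the estimate above, the following NumPy code builds the \(n \times 2\) matrix \(\boldsymbol X\) and computes \(\hat{\boldsymbol \beta}\). The toy data (`x`, `y`) are made up for illustration and lie exactly on the line \(Y = 1 + 2X\), so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical toy data (not from the notes): y = 1 + 2x with no noise
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Data matrix X: row i is (1, x_i), so X has shape (n, 2)
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (X^T X) beta = X^T y.
# This is algebraically the same as (X^T X)^{-1} X^T y,
# but avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares criterion evaluated at the estimate
residuals = y - X @ beta_hat
rss = residuals @ residuals

print(beta_hat, rss)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a numerical choice, not a change to the formula.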

Multiple Linear Regression

MLR

Multiple linear regression (MLR) models are used when more than one explanatory variable is needed to explain the outcome of interest.

Continuous Variable

To include an additional continuous predictor, add it to the model as another term with its own coefficient:

\[ Y = \beta_0 +\beta_1 X_1 + \beta_2 X_2 + \epsilon \]
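
In matrix terms, adding a predictor just adds a column to the data matrix. The sketch below illustrates this with made-up values for `x1` and `x2`, with an outcome generated exactly as \(Y = 1 + 2X_1 - X_2\). It uses `np.linalg.lstsq`, which returns the least squares solution.

```python
import numpy as np

# Hypothetical predictors and noise-free outcome y = 1 + 2*x1 - 1*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 1.0 * x2

# Data matrix with an intercept column and one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares fit; the first returned element holds the coefficients
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # order: (beta_0, beta_1, beta_2)
```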

Categorical Variable

A categorical variable can be included in a model, but a reference category must be specified.

Fitting a model with categorical variables

To fit a model with categorical variables, we use dummy (binary) variables that indicate which category an observation belongs to. We use \(C-1\) dummy variables, where \(C\) is the number of categories. When coded correctly, the reference category corresponds to all dummy variables equal to 0, and each remaining category is identified by exactly one dummy variable equal to 1.
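
The dummy coding can be sketched by hand in NumPy. The group labels `"A"`, `"B"`, `"C"` and the outcome values are hypothetical, and `"A"` is arbitrarily chosen as the reference category. With \(C = 3\) categories, the data matrix gets \(C - 1 = 2\) dummy columns plus the intercept.

```python
import numpy as np

# Hypothetical categorical predictor with C = 3 categories
groups = np.array(["A", "B", "C", "A", "B", "C"])
levels = ["A", "B", "C"]   # first level is treated as the reference
y = np.array([0.5, 2.5, 5.5, 1.5, 3.5, 6.5])  # group means: A=1, B=3, C=6

# One dummy column per non-reference category (C - 1 columns)
dummies = np.column_stack(
    [(groups == level).astype(float) for level in levels[1:]]
)

# Data matrix: intercept plus dummies; rows for "A" are (1, 0, 0)
X = np.column_stack([np.ones(len(groups)), dummies])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Intercept = mean of the reference category;
# each dummy coefficient = that category's mean minus the reference mean
print(beta_hat)
```

Under this coding, the intercept estimates the mean outcome in the reference category, and each dummy coefficient estimates the difference between its category and the reference.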