0.1 General Expression

$$y_{i}=\beta_{0}+\beta_{1}\times x_{i1}+\cdots+\beta_{p}\times x_{ip}+\epsilon_{i},\quad i=1,2,\cdots,n$$ $$ \begin{align*} \mathbf{y}&=(y_{1},y_{2},\cdots,y_{n})^{T} \cr \mathbf{X}&=\begin{bmatrix}1 & x_{11} & x_{12} & \cdots & x_{1p} \cr 1 & x_{21} & x_{22} & \cdots & x_{2p} \cr \vdots & \vdots & \vdots & \vdots & \vdots \cr 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \cr \mathbf{\beta}&=(\beta_{0},\beta_{1},\cdots,\beta_{p})^{T} \cr \mathbf{\epsilon}&=(\epsilon_{1}, \epsilon_{2},\cdots,\epsilon_{n})^{T} \end{align*} $$

0.2 OLS Assumptions

  • The regression model is linear in the parameters.
  • $x_{i1},x_{i2},\cdots,x_{ip}$ are nonstochastic variables.
  • $E(\epsilon_{i})=0$.
  • $Var(\epsilon_{i})=\sigma^{2}$ (homoscedasticity).
  • $\epsilon_{i}$ are independent random variables, i.e. there is no autocorrelation: $cov(\epsilon_{i},\epsilon_{j})=0$ for $i\neq j$.
  • The regression model is correctly specified, with no specification bias.

0.3 OLS Estimators

0.3.1 Estimator of $\mathbf{\beta}$

Formally, the OLS estimator of $\mathbf{\beta}$ is defined as the minimizer of the residual sum of squares (RSS): $$\hat{\mathbf{\beta}}=\arg\min\limits_{\mathbf{\beta}}\ S(\mathbf{\beta})$$ $$S(\mathbf{\beta})=(\mathbf{y}-\mathbf{X\beta})^{T}(\mathbf{y}-\mathbf{X\beta})=\sum\limits_{i=1}^{n}(y_{i}-\beta_{0}-\beta_{1}\times x_{i1}-\cdots-\beta_{p}\times x_{ip})^{2}$$ Setting the derivative with respect to $\mathbf{\beta}$ to zero and solving the normal equations gives: $$\hat{\mathbf{\beta}}=(\mathbf{X^{T}X})^{-1}\mathbf{X^{T}y}$$
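As a minimal numerical sketch of this closed form (simulated data; all names and values here are hypothetical, for illustration only):

```python
import numpy as np

# Simulate a small dataset: n observations, p regressors (hypothetical values).
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1) design matrix
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)           # y = X beta + eps

# OLS: beta_hat = (X'X)^{-1} X'y; solving the normal equations
# avoids forming an explicit matrix inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```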

0.3.2 Properties of OLS Estimators

  • Linearity: the OLS estimators are linear estimators (linear functions of $\mathbf{y}$);
  • Unbiasedness: $E(\hat{\mathbf{\beta}})=\mathbf{\beta}$;
  • Consistency: $\hat{\mathbf{\beta}}\mathop{\rightarrow}\limits^{P}\mathbf{\beta}$ as $n\rightarrow \infty$;
  • The OLS estimators are Best Linear Unbiased Estimators (BLUEs): they have smallest variance among all linear unbiased estimators (Gauss-Markov Theorem).

0.3.3 Estimator of $\sigma^{2}$

An unbiased estimator of $\sigma^{2}$ is the residual mean squared error (MSE), which is defined as: $$ \begin{align*} s^{2} &= \frac{1}{n-(p+1)}\sum\limits_{i=1}^{n}e_{i}^{2} \cr &= \frac{1}{n-(p+1)}\sum\limits_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2} \cr &= \frac{1}{n-(p+1)}(\mathbf{y-\hat{y}})^{T}(\mathbf{y-\hat{y}}) \cr &= \frac{1}{n-(p+1)}(\mathbf{y-X\hat{\beta}})^{T}(\mathbf{y-X\hat{\beta}}) \end{align*} $$

0.3.4 Standard Errors

The variance-covariance matrix of $\hat{\mathbf{\beta}}$ is: $$ Var(\hat{\mathbf{\beta}})=\sigma^{2}(\mathbf{X^{T}X})^{-1} $$ Since $\sigma^{2}$ is unknown, we replace it by $s^{2}$ to obtain a computable estimate: $$ \hat{Var}(\hat{\mathbf{\beta}})=s^{2}(\mathbf{X^{T}X})^{-1} $$ The standard error of $\hat{\beta}_{i},i=1,2,\cdots,p$, is the square root of the $(i+1)$-th diagonal element of $\hat{Var}(\hat{\mathbf{\beta}})$.
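Continuing the simulated setup from the sketch above (still hypothetical data), $s^{2}$ and the coefficient standard errors can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ beta_hat                      # residuals
s2 = e @ e / (n - (p + 1))                # unbiased estimate of sigma^2
cov_beta = s2 * np.linalg.inv(X.T @ X)    # estimated Var(beta_hat)
se = np.sqrt(np.diag(cov_beta))           # standard errors of beta_0, ..., beta_p
print(s2, se)
```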

0.3.5 Confidence Intervals

$100(1-\alpha)\%$ confidence intervals for $\beta_{i},i=1,2,\cdots,p$: $$ \hat{\beta}_{i}\ \pm\ t_{n-p-1,1-\frac{\alpha}{2}}\times s.e.(\hat{\beta}_{i}) $$
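A sketch of the corresponding interval computation (same hypothetical simulated data as above; `scipy.stats.t.ppf` supplies the t quantile):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
se = np.sqrt(np.diag(e @ e / (n - p - 1) * np.linalg.inv(X.T @ X)))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p - 1)  # t_{n-p-1, 1-alpha/2}
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(ci)  # one (lower, upper) row per coefficient
```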

0.3.6 Hypothesis Testing

Question: Is $x_{i}$ important for explaining / predicting $y$?

Form a hypothesis $H_{0}:\ \beta_{i}=0$ vs. $H_{1}:\ \beta_{i}\neq0$.

T-test: $$ t\text{-ratio}=\frac{\hat{\beta}_{i}-0}{s.e.(\hat{\beta}_{i})}\mathop{\sim}\limits^{H_{0}}t(n-p-1) $$ Reject $H_{0}$ if the p-value of the test is small (e.g. $<0.05$).
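A sketch of the t-ratios and their two-sided p-values under the same hypothetical setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
se = np.sqrt(np.diag(e @ e / (n - p - 1) * np.linalg.inv(X.T @ X)))

t_ratio = beta_hat / se                                   # (beta_hat - 0) / s.e.
p_values = 2 * stats.t.sf(np.abs(t_ratio), df=n - p - 1)  # two-sided p-values
print(t_ratio, p_values)
```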

0.3.7 F Test

Question: $$ H_{0}:\ \beta_{1}=\beta_{2}=\cdots=\beta_{p}=0\quad vs.\quad H_{1}:\ \text{at least one }\beta_{i}\text{ is non-zero} $$ $$ F=\frac{\left(\sum\limits_{i=1}^{n}(y_{i}-\overline{y})^{2} - \sum\limits_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}\right)/p}{\left(\sum\limits_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}\right)/(n-p-1)}\mathop{\sim}\limits^{H_{0}}F(p,n-p-1) $$ This is called the analysis of variance (ANOVA).
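The F statistic and its p-value, computed directly from the two sums of squares (same hypothetical simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
rss = e @ e                              # error (residual) sum of squares
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)    # P(F(p, n-p-1) > F)
print(F, p_value)
```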

0.3.8 Variability Partition

$$ Total\ SS=Error\ SS+Regression\ SS $$ $$ \sum\limits_{i=1}^{n}(y_{i}-\overline{y})^{2}=\sum\limits_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}+\sum\limits_{i=1}^{n}(\hat{y}_{i}-\overline{y})^{2} $$ $$ R^{2}=\frac{Regression\ SS}{Total\ SS} $$ $0\leq R^{2}\leq1$; larger values indicate a better in-sample fit.

To account for model complexity, take the degrees of freedom into consideration via the adjusted $R^{2}$: $$ \overline{R}^{2}=1-(1-R^{2})\frac{n-1}{n-k} $$ where $k=p+1$ is the number of estimated parameters.

Now we have: $$ F=\frac{(\text{Total SS }-\text{ Error SS})/p}{(\text{Error SS})/(n-p-1)}=\frac{R^{2}/p}{(1-R^{2})/(n-p-1)} $$ Degrees of freedom:

| SS | Degrees of freedom |
| --- | --- |
| ESS (regression) | $p$ |
| RSS (error) | $n-p-1$ |
| TSS (total) | $n-1$ |
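$R^{2}$, adjusted $R^{2}$, and the $R^{2}$-based form of the F statistic, as a short sketch on the same hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

r2 = 1 - e @ e / np.sum((y - y.mean()) ** 2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # k = p + 1 parameters
F = (r2 / p) / ((1 - r2) / (n - p - 1))         # F computed from R^2 alone
print(r2, adj_r2, F)
```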

0.3.9 Prediction

Given $\boldsymbol{x} = \boldsymbol{x}^{*} \mathop{=}\limits^{def} (x_{1}^{*}, \cdots, x_{p}^{*})^\top$, what value would $y$ take?

Point prediction: $$ \hat{y}^{*} = \hat{\beta}_0 + \hat{\beta}_1\times x^{*}_1 + \cdots + \hat{\beta}_p\times x^{*}_p $$

$100(1-\alpha)\%$ prediction interval: $$ \hat{y}^{*}\pm t_{n-p-1,1- \frac{\alpha}{2}}\times s.e.(pred) $$ $$ s.e.(pred)=s\sqrt{1+(\mathbf{x}^{*})^{T}(\mathbf{X^{T}X})^{-1}\mathbf{x}^{*}} $$ where $\mathbf{x}^{*}=(1,x_{1}^{*},\cdots,x_{p}^{*})^{T}$ includes a leading 1 for the intercept.
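A sketch of the point prediction and 95% prediction interval for a hypothetical new observation $\mathbf{x}^{*}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s = np.sqrt(e @ e / (n - p - 1))

x_star = np.array([1.0, 0.2, -0.3, 0.1])    # leading 1 for the intercept
y_star = x_star @ beta_hat                  # point prediction
se_pred = s * np.sqrt(1 + x_star @ np.linalg.inv(X.T @ X) @ x_star)
t_crit = stats.t.ppf(0.975, df=n - p - 1)   # 95% interval
print(y_star - t_crit * se_pred, y_star + t_crit * se_pred)
```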

0.4 Collinearity

0.4.1 Adverse Effects

  • Larger variances and standard errors of the OLS estimators.
  • Wider confidence intervals.
  • Insignificant t-values.
  • High $R^{2}$ even though not all t-values are significant.
  • Lack of robustness: the estimates are sensitive to small changes in the data.

0.4.2 Diagnoses

  • High $R^{2}$ but only a small number of significant t-values.
  • High pairwise correlation between explanatory variables.
  • Partial correlation coefficients.
  • Subsidiary (auxiliary) regressions: regress each $x_{i}$ on the remaining regressors and test $R^{2}_{i}$.
  • Variance inflation factor: $VIF_{i}=\frac{1}{1-R^{2}_{i}}$, where $R^{2}_{i}$ comes from the auxiliary regression above; see the sketch after this list.
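A sketch of the auxiliary-regression VIF computation (the `vif` function and the data are hypothetical; `j` indexes a regressor column of the design matrix):

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X (X includes the intercept column; j >= 1)."""
    others = np.delete(X, j, axis=1)                  # all other columns
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2_j = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2_j)

# Two deliberately collinear regressors (hypothetical data):
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)               # nearly a copy of x1
X = np.column_stack([np.ones(n), x1, x2])
print(vif(X, 1), vif(X, 2))                           # both far above 10
```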

0.4.3 Solutions

  • Delete some of the collinear variables.
  • Collect more data.
  • Respecify the model.
  • Transform the variables.
  • Factor analysis / principal component analysis / ridge regression / LASSO.

0.5 Heteroscedasticity / unequal variance

More frequent in cross-sectional data (due to the existence of scale effects).

0.5.1 Adverse Effects

  • OLS estimators are still linear.
  • OLS estimators are still unbiased.
  • OLS estimators no longer have the smallest variance, i.e. they are no longer efficient.
  • The estimated variances of the OLS estimators are biased, a consequence of the biased estimation of $\hat{\sigma}^{2}$.
  • Hypothesis tests based on the t-test or F-test are no longer reliable.

0.5.2 Diagnoses

  • Graph of residuals.
  • Park test
    • Run the OLS regression to obtain the residuals.
    • Square the residuals and take their logarithm.
    • Regression: $\ln e_{i}^{2}=B_{1}+B_{2}\ln X_{i}+v_{i}$ for every explanatory variable, or for $\hat{Y}_{i}$.
    • Test the null hypothesis $B_{2}=0$, which corresponds to no heteroscedasticity.
    • If the null cannot be rejected, $B_{1}$ can be taken as the value of the constant (equal) variance.
  • Glejser test
    • Similar to the Park test, but uses three regressions:
    • $|e_{i}|=B_{1}+B_{2}X_{i}+v_{i}$.
    • $|e_{i}|=B_{1}+B_{2}\sqrt{X_{i}}+v_{i}$.
    • $|e_{i}|=B_{1}+B_{2}\frac{1}{X_{i}}+v_{i}$.
    • If $B_{2}=0$ in all three, accept the hypothesis of equal variance.
  • White’s general test of heteroscedasticity (see the sketch after this list)
    • For $Y_{i}=B_{1}+B_{2}X_{2i}+B_{3}X_{3i}+u_{i}$.
    • Run the OLS regression to obtain the residuals $e_{i}$.
    • Run the auxiliary regression $e_{i}^{2}=A_{1}+A_{2}X_{2i}+A_{3}X_{3i}+A_{4}X_{2i}^{2}+A_{5}X_{3i}^{2}+A_{6}X_{2i}X_{3i}+v_{i}$, i.e. regress $e_{i}^{2}$ on all original variables, their squares, and their cross terms.
    • Compute this regression’s $R^{2}$. Under the null, $n\cdot R^{2}\sim\chi^{2}_{k-1}$, where $k$ is the number of parameters in the auxiliary regression ($k=6$ in this case).
    • Null hypothesis: no unequal variance.
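A sketch of White’s test on simulated heteroscedastic data (all variable names hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 300
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(scale=1 + np.abs(x2), size=n)     # error variance depends on x2
y = 1 + 2 * x2 - x3 + u

X = np.column_stack([np.ones(n), x2, x3])
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # OLS residuals

# Auxiliary regression of e^2 on levels, squares and the cross term (k = 6).
Z = np.column_stack([np.ones(n), x2, x3, x2**2, x3**2, x2 * x3])
a = np.linalg.solve(Z.T @ Z, Z.T @ e**2)
resid = e**2 - Z @ a
r2_aux = 1 - resid @ resid / np.sum((e**2 - np.mean(e**2)) ** 2)

lm = n * r2_aux                                  # n * R^2 ~ chi2(k-1) under H0
print(lm, stats.chi2.sf(lm, df=Z.shape[1] - 1))  # small p-value: evidence of heteroscedasticity
```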

0.5.3 Solutions

  • $\sigma^{2}_{i}$ is known
    • Weighted Least Squares (WLS): divide the original regression equation by $\sigma_{i}$.
  • $\sigma_{i}^{2}$ is unknown
    • When $E(u_{i}^{2})=\sigma^{2}X_{i}$ or $E(u_{i}^{2})=\sigma^{2}X_{i}^{2}$, divide the original regression equation by $\sqrt{X_{i}}$ or $X_{i}$, respectively (still WLS; $u_{i}$ is the error of the original regression). A sketch follows this list.
    • These methods are also called variance-stabilizing transformations.
    • Use White heteroscedasticity-adjusted standard errors and t-values.
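A sketch of the WLS transformation for the case $E(u_{i}^{2})=\sigma^{2}X_{i}^{2}$ (hypothetical data): dividing through by $X_{i}$ gives $Y_{i}/X_{i}=B_{1}(1/X_{i})+B_{2}+u_{i}/X_{i}$, whose error now has constant variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1 + 2 * x + x * rng.normal(size=n)         # error s.d. proportional to x

# Transformed regression: y/x on [1/x, 1]; the 1/x column estimates B1
# (the original intercept), the column of ones estimates B2 (the slope).
Xw = np.column_stack([1 / x, np.ones(n)])
b1_hat, b2_hat = np.linalg.solve(Xw.T @ Xw, Xw.T @ (y / x))
print(b1_hat, b2_hat)                          # close to (1, 2)
```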

0.6 Autocorrelation

0.6.1 Adverse Effects

  • OLS estimators are still linear and unbiased.
  • OLS estimators are no longer efficient.
  • The estimated variances of the OLS estimators are biased.
  • Hypothesis tests based on the t-test or F-test are no longer reliable.
  • The error variance estimator $\hat{\sigma}^{2}$ is biased (usually downward biased).
  • $R^{2}$ is not reliable.
  • The prediction variance and standard errors are not efficient.

0.6.2 Diagnoses

  • Graph.
  • Durbin-Watson d test: $$d=\frac{\sum\limits_{t=2}^{n}(e_{t}-e_{t-1})^{2}}{\sum\limits_{t=1}^{n}e_{t}^{2}}$$
    • Requirements:
      • The regression model includes an intercept.
      • $X$ are nonstochastic.
      • The error term $u_{t}$ follows: $u_{t}=\rho u_{t-1}+v_{t},\quad -1\leq\rho\leq1$.
      • $\rho$ is called the coefficient of autocorrelation. This equation is called a Markov first-order autoregressive scheme, denoted $AR(1)$.
      • The regressors do not include lagged values of $Y$ (i.e. not an autoregressive model).
    • Large sample – $d\approx2(1-\hat{\rho}),\hat{\rho}=\frac{\sum\limits_{t=2}^{n}e_{t}e_{t-1}}{\sum\limits_{t=1}^{n}e_{t}^{2}}$. So $0\leq d\leq4$.
    • $\hat{\rho}\rightarrow -1\text{ (negative correlation)},d\rightarrow 4$.
    • $\hat{\rho}\rightarrow 0\text{ (no correlation)},d\rightarrow2$.
    • $\hat{\rho}\rightarrow 1\text{ (positive correlation)},d\rightarrow 0$.
    • There are two critical values, $d_{L}$ and $d_{U}$; a numerical sketch follows the figure below.
(Figure: decision regions of the Durbin-Watson $d$ test.)
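A sketch of the $d$ statistic on simulated AR(1) errors (hypothetical data), also checking the large-sample approximation $d\approx2(1-\hat{\rho})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                  # AR(1) errors with rho = 0.7
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals

d = np.sum(np.diff(e) ** 2) / np.sum(e**2)      # Durbin-Watson d
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e**2)
print(d, 2 * (1 - rho_hat))            # d well below 2: positive autocorrelation
```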

0.6.3 Solutions

  • GLS
    • Suppose the error term follows $AR(1)$: $u_{t}=\rho u_{t-1}+v_{t}$.
    • Run OLS on the transformed equation $Y_{t}^{*}=B_{1}^{*}+B_{2}^{*}X_{t}^{*}+v_{t}$,
    • where $Y_{t}^{*}=Y_{t}-\rho Y_{t-1}$, and similarly for the other variables.
    • This method is called Generalized Least Squares (GLS), and the equation is called the generalized difference equation (see the sketch after this list).
    • The first observation is lost in the difference equation; it can be recovered using the following formula (Prais-Winsten transformation):
    • $Y_{1}^{*}=\sqrt{1-\rho^{2}}\,Y_{1},\quad X_{1}^{*}=\sqrt{1-\rho^{2}}\,X_{1}$.
    • The estimation of $\rho$:
      • Set $\rho=1$: the first-order difference method, which assumes the errors are positively correlated.
      • Estimate $\rho$ from the Durbin-Watson d statistic: $d\approx2(1-\hat{\rho})\Rightarrow\hat{\rho}\approx1-\frac{d}{2}$.
      • Estimate $\rho$ from the OLS residuals $e_{t}$ by regressing $e_{t}=\hat{\rho}e_{t-1}+v_{t}$.
  • Large sample method: Newey-West method
    • Also known as HAC (heteroscedasticity- and autocorrelation-consistent) standard errors. It does not revise the values of the OLS estimators, only their standard errors.
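A sketch of the GLS procedure above (hypothetical data): estimate $\rho$ from the OLS residuals, apply the generalized difference, and recover the first observation with the Prais-Winsten transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                            # AR(1) errors with rho = 0.7
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
rho = np.sum(e[1:] * e[:-1]) / np.sum(e**2)      # rho_hat from OLS residuals

# Generalized differences; the Prais-Winsten rows keep observation 1.
w = np.sqrt(1 - rho**2)
y_star = np.r_[w * y[0], y[1:] - rho * y[:-1]]
x_star = np.r_[w * x[0], x[1:] - rho * x[:-1]]
c_star = np.r_[w, np.full(n - 1, 1 - rho)]       # transformed intercept column

Xs = np.column_stack([c_star, x_star])
b_gls = np.linalg.solve(Xs.T @ Xs, Xs.T @ y_star)
print(rho, b_gls)                                # b_gls close to (1, 2)
```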