From Ordinary Least Squares (OLS) to Generalized Least Squares (GLS)

Published: December 17, 2024

Ordinary Least Squares (OLS) is one of the most widely used methods for linear regression. Under the classical assumptions, in particular that the error terms are uncorrelated and have constant variance, it provides unbiased and efficient estimates of the model parameters. However, real-world data often violate these assumptions. When the errors exhibit heteroskedasticity (non-constant variance) or correlation, OLS estimates remain unbiased but lose their efficiency, and the usual standard errors and confidence intervals become incorrect.

Generalized Least Squares (GLS) is a more flexible estimation technique designed to handle cases where error terms have non-constant variance and/or correlation (see references for examples). By modeling the structure of the error covariance, GLS yields parameter estimates with better efficiency and correct inference.

Ordinary Least Squares (OLS)

Model Setup

Consider the classic linear regression model: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},\)

where:

\(\mathbf{y}\) is an \(n \times 1\) vector of observations.
\(\mathbf{X}\) is an \(n \times p\) design matrix of explanatory variables (including an intercept).
\(\boldsymbol{\beta}\) is a \(p \times 1\) vector of unknown parameters.
\(\boldsymbol{\varepsilon}\) is an \(n \times 1\) vector of error terms.

OLS assumptions:

Linearity: The relationship between predictors and outcome is linear.
Full rank: \(\mathbf{X}\) has full column rank (no perfect multicollinearity).
Exogeneity: \(\mathbb{E}[\boldsymbol{\varepsilon}|\mathbf{X}] = \mathbf{0}\).
Spherical errors: \(\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \sigma^2 \mathbf{I}\).

Under these assumptions, the OLS estimator is: \(\hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}.\)
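As a minimal sketch of this formula, the snippet below computes the OLS estimate with NumPy by solving the normal equations. The helper name `ols` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients (X'X)^{-1} X'y."""
    # Solving the normal equations avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: an intercept column plus one predictor,
# with y = 1 + 2x exactly (no noise), so the fit should recover (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

beta_hat = ols(X, y)
```

For ill-conditioned design matrices, `np.linalg.lstsq` is usually a more stable choice than solving the normal equations directly.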

When the error terms are homoskedastic and uncorrelated, the Gauss-Markov theorem states that OLS is BLUE (Best Linear Unbiased Estimator): it is unbiased and has the smallest variance among all linear unbiased estimators. However, if the error variance is not constant (heteroskedasticity) or the errors are correlated (autocorrelation), the OLS estimator is still unbiased but no longer BLUE.
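The claim "still unbiased but no longer BLUE" can be checked with a small simulation. The sketch below is illustrative only: the sample size, the true coefficients, and the variance pattern (error standard deviation proportional to the predictor) are all assumptions. It compares OLS with a weighted estimator that uses the true error variances, which is a special case of the GLS estimator discussed later in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
sd = 0.5 * x                      # assumed: error sd grows with x

# Linear maps from y to each estimate (both are p x n matrices).
ols_map = np.linalg.solve(X.T @ X, X.T)
w = 1.0 / sd**2                   # weights = inverse of the true variances
Xw = X * w[:, None]
wls_map = np.linalg.solve(Xw.T @ X, Xw.T)

# Simulate `reps` data sets at once; each row of Y is one response vector.
eps = rng.normal(size=(reps, n)) * sd
Y = X @ beta_true + eps
ols_draws = Y @ ols_map.T         # reps x p matrix of OLS estimates
wls_draws = Y @ wls_map.T         # reps x p matrix of weighted estimates

print("OLS mean:", ols_draws.mean(axis=0), "variance:", ols_draws.var(axis=0))
print("WLS mean:", wls_draws.mean(axis=0), "variance:", wls_draws.var(axis=0))
```

Both sets of estimates should average close to the true coefficients, while the weighted estimator should show a visibly smaller sampling variance.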

The Need for GLS

When the assumption \(\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \sigma^2 \mathbf{I}\) is violated, the covariance matrix of the errors is more general:

\[\text{Var}(\boldsymbol{\varepsilon}|\mathbf{X}) = \mathbf{\Sigma} \neq \sigma^2 \mathbf{I},\]

where \(\mathbf{\Sigma}\) is an \(n \times n\) positive-definite matrix capturing both the variance and covariance structure of the error terms.

Heteroskedasticity implies that the diagonal elements of \(\mathbf{\Sigma}\) are not all equal. Correlated errors imply that off-diagonal elements of \(\mathbf{\Sigma}\) are non-zero. In these situations, OLS is no longer efficient, and inference (standard errors, confidence intervals) based on the usual OLS variance formula \(\sigma^2(\mathbf{X}^\top\mathbf{X})^{-1}\) is invalid.
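A short derivation makes the inference problem concrete. Substituting the model into the OLS formula gives \(\hat{\boldsymbol{\beta}}_{\text{OLS}} - \boldsymbol{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \boldsymbol{\varepsilon}\), so under a general error covariance:

\[\text{Var}(\hat{\boldsymbol{\beta}}_{\text{OLS}}|\mathbf{X}) = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{\Sigma}\,\mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}.\]

This collapses to the textbook expression \(\sigma^2(\mathbf{X}^\top \mathbf{X})^{-1}\) only when \(\mathbf{\Sigma} = \sigma^2 \mathbf{I}\), which is why standard errors computed from that expression are wrong otherwise.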

Generalized Least Squares (GLS)

GLS addresses these issues by incorporating the error covariance structure into the estimation. The GLS estimator is derived by a transformation that accounts for \(\mathbf{\Sigma}\):

\[\hat{\boldsymbol{\beta}}_{\text{GLS}} = (\mathbf{X}^\top \mathbf{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top \mathbf{\Sigma}^{-1}\mathbf{y}.\]
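One way to read the transformation: if \(\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top\) (a Cholesky factorization), then premultiplying the model by \(\mathbf{L}^{-1}\) gives errors with identity covariance, and OLS on the transformed data reproduces the GLS formula. The sketch below checks this numerically; the function name `gls` and the randomly generated positive-definite \(\mathbf{\Sigma}\) are assumptions made for illustration.

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS via whitening: with Sigma = L L^T, run OLS on L^{-1}X and L^{-1}y."""
    L = np.linalg.cholesky(Sigma)          # lower-triangular factor
    Xw = np.linalg.solve(L, X)             # whitened design matrix
    yw = np.linalg.solve(L, y)             # whitened response
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)            # arbitrary positive-definite matrix
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

beta_whitened = gls(X, y, Sigma)

# Direct evaluation of (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y for comparison.
Sigma_inv = np.linalg.inv(Sigma)
beta_closed = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
```

The two results should agree up to floating-point error. The whitening route avoids inverting \(\mathbf{\Sigma}\) explicitly, which tends to be better behaved numerically.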

Key Points About GLS

Weighting by \(\mathbf{\Sigma}^{-1}\):
GLS reweights both the design matrix \(\mathbf{X}\) and the response vector \(\mathbf{y}\) using \( \mathbf{\Sigma}^{-1}\). This is intuitive: observations with higher variance receive less weight, and correlated observations are adjusted to remove the effect of that correlation.
Efficiency Gains:
If \(\mathbf{\Sigma}\) is correctly specified, GLS is unbiased and its variance is never larger than that of OLS, and is typically smaller. In this sense it makes better use of the available information.
Feasible GLS (FGLS):
In practice, \(\mathbf{\Sigma}\) is rarely known. Instead, it must be estimated, often in two steps:
- First, fit an OLS model and obtain residuals.
- Second, use these residuals to estimate \(\mathbf{\Sigma}\).
Using the estimated \(\hat{\mathbf{\Sigma}}\) in place of \(\mathbf{\Sigma}\) in the GLS formula yields the Feasible GLS (FGLS) estimator:
\[\hat{\boldsymbol{\beta}}_{\text{FGLS}} = (\mathbf{X}^\top \hat{\mathbf{\Sigma}}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top \hat{\mathbf{\Sigma}}^{-1}\mathbf{y}.\]
Although FGLS is slightly more complex and can involve iterative procedures, it is commonly used and often yields better inference than OLS when heteroskedasticity or serial correlation is present.
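The two-step recipe above can be sketched for the heteroskedastic case, where \(\hat{\mathbf{\Sigma}}\) is diagonal. The variance model used here, with the log error variance linear in the predictors, is an assumption made for this example and is only one of many possible choices; the function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def fgls_heteroskedastic(X, y):
    """Two-step FGLS with a diagonal Sigma-hat (assumed log-linear variance)."""
    # Step 1: fit OLS and obtain residuals.
    resid = y - X @ ols(X, y)
    # Step 2: estimate the variances by regressing log squared residuals on X.
    gamma = ols(X, np.log(resid**2))
    var_hat = np.exp(X @ gamma)            # diagonal of Sigma-hat, up to scale
    # Plug Sigma-hat into the GLS formula; for a diagonal matrix this is
    # weighted least squares with weights 1 / var_hat.
    Xw = X / var_hat[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return beta, var_hat

# Simulated data where the error sd is exp(0.5 * x), i.e. log-variance = x.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.exp(0.5 * x) * rng.normal(size=n)

beta_fgls, var_hat = fgls_heteroskedastic(X, y)
print("FGLS estimate:", beta_fgls)
```

Only the relative sizes of the estimated variances matter, because any constant scale factor in \(\hat{\mathbf{\Sigma}}\) cancels in the GLS formula.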

When to Use GLS

Note that some of our research work is based on this observation when modeling driving behaviors, i.e., time series data.

Heteroskedasticity: If error variances differ across observations (e.g., in cross-sectional data where the variance depends on certain factors), GLS yields more efficient estimates and valid standard errors.
Autocorrelation: In time-series or panel data where observations are dependent over time, GLS adjusts for correlation in the error terms and provides more accurate inference.
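For the autocorrelation case, a common working assumption is that the errors follow a stationary AR(1) process, which gives \(\mathbf{\Sigma}\) a simple closed form with entries proportional to \(\rho^{|i-j|}\). The sketch below builds that matrix and plugs an estimate of \(\rho\), obtained from the OLS residuals, into the GLS formula. The AR(1) structure, the function names, and the simulated data are all assumptions for illustration.

```python
import numpy as np

def ar1_covariance(n, rho, sigma2=1.0):
    """Covariance of a stationary AR(1) process: sigma2 * rho^|i-j| / (1 - rho^2)."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * rho**lags / (1.0 - rho**2)

def fgls_ar1(X, y):
    """Feasible GLS assuming AR(1) errors; requires |rho_hat| < 1."""
    # Step 1: OLS residuals.
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    # Step 2: estimate rho by regressing e_t on e_{t-1} (no intercept).
    rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    # Step 3: GLS with the implied covariance (the scale sigma2 cancels).
    Sigma_inv = np.linalg.inv(ar1_covariance(len(y), rho_hat))
    beta = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    return beta, rho_hat

# Simulated regression with AR(1) errors, rho = 0.7.
rng = np.random.default_rng(3)
n, rho_true = 300, 0.7
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = np.empty(n)
eps[0] = rng.normal() / np.sqrt(1.0 - rho_true**2)   # stationary start
for t in range(1, n):
    eps[t] = rho_true * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

beta_ar1, rho_hat = fgls_ar1(X, y)
print("rho_hat:", rho_hat, "FGLS estimate:", beta_ar1)
```

Forming and inverting the full \(n \times n\) matrix keeps the example close to the formula but does not scale to long series; in practice the AR(1) structure allows an equivalent quasi-differencing transformation instead.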

Summary

OLS is the foundational regression method but assumes homoskedastic and uncorrelated errors.
GLS generalizes OLS by incorporating a known or estimated covariance structure of the errors, improving efficiency and inference when OLS assumptions are violated.
FGLS is a practical approach to implementing GLS when the exact error covariance structure is not known but can be estimated from the data.

In essence, GLS expands the classical linear modeling toolbox, enabling analysts and researchers to handle more realistic scenarios where errors are not independent and identically distributed.

My publications related to GLS

Chengyuan Zhang, Wenshuo Wang, and Lijun Sun* (2024). Calibrating Car-Following Models via Bayesian Dynamic Regression. Transportation Research Part C: Emerging Technologies. (Accepted to ISTTT25 Special Issue)
Chengyuan Zhang and Lijun Sun* (2023). Bayesian Calibration of the Intelligent Driver Model. IEEE Transactions on Intelligent Transportation Systems.

Chengyuan Zhang
