A Hitchhiker's Guide to Linear Models

Monday, September 18th, 2017

I've been fascinated with linear mixed effects modeling since first hearing of it in Baseball Prospectus's article Moving Beyond WOWY, which introduced their "called strikes above average" metric for evaluating catcher pitch framing. As an engineering physics student and electric vehicle engineer, I had never run into this technique, which is more common in the social sciences. Unfortunately, most resources on the internet were full of terrible econometrics jargon and impenetrable notation. Even my two favorite stats textbooks, Jim Pitman's Probability and Larry Wasserman's ironically titled All of Statistics, made no mention of mixed models (or, for that matter, ANOVA).

Only after some serious internet sleuthing did I feel like I had more than a cursory understanding, piecing together information from a number of disparate sources. So I set out to write down what I had learned in one place in LaTeX (with one consistent notation), and in the process found myself expanding the scope toward something closer to a textbook chapter. I finally reached a good stopping point today, deriving the linear mixed effects estimator I had originally set out to explain, so I figured I'd publish what I have so far as the "first edition."

The result is a work-in-progress resource I've titled A Hitchhiker's Guide to Linear Modeling, which you can find in PDF form here. My aim is to cover the key techniques in use today precisely and concisely, cutting through as much discipline-specific jargon as possible in the process. Here's what I've covered so far:

  • Least squares
  • Computing least squares
  • Least norm
  • Best Linear Unbiased Estimator (BLUE)
  • Confidence Intervals
  • Generalized Least Squares
  • Least Squares Regression
  • Studentized Residuals
  • Influential Observations
  • Ridge Regression
  • Unobserved Variables
  • Fixed Effects
  • Random Effects
  • Mixed Effects

I'm planning to next cover:

  • The Bayesian Approach
  • Generalized Linear Models
  • Maximum Likelihood
  • Estimating Unknown Variances in Linear Mixed Models
  • Markov Chain Monte Carlo Methods for Linear Mixed Models
  • Generalized Linear Mixed Models

Other potential future topics:

  • Errors in variables methods
  • Robust regression
  • Instrumental variables
  • Feasible Generalized Least Squares or other methods robust to serial correlation and heteroscedasticity
  • Scoring (AIC, BIC, cross-validation)

That said, I'd also love your feedback. Is this useful? What topics would you also like to see covered? How many errors have I made at this point? Would it be more useful to mention tools and include code?

If you have any interest in a reference for linear modeling, enjoy!


Questions | Comments | Suggestions

If you have any feedback and want to continue the conversation, please get in touch; I'd be happy to hear from you! Feel free to use the form, or just email me directly at matt.e.fay@gmail.com.
