Tuesday, October 20, 2009

Pseudo R^2

So you built your fantastic logistic regression models, presented your ORs and coefficients, analyzed the residuals, etc., and the review panel comes back asking "Where are your R-squared values?" You could argue that R-squared isn't a meaningful statistic for logistic regression, but arguing with review boards is difficult and frustrating, so the easiest course may be to just give them what they ask for.

There are a number of "pseudo R-squared" measures that use the log-likelihood to calculate a fit statistic. One of them, Nagelkerke's, is reported by the lrm function in Frank Harrell's Design package.
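For reference, Nagelkerke's measure is a rescaling of the Cox & Snell R-squared, and both can be computed directly from the fitted and intercept-only log-likelihoods. Here's a rough sketch of that calculation for a fitted binomial glm (pseudo_r2 is just a made-up helper name, used again on the model fit below):

pseudo_r2 = function(fit) {
    n = length(fit$y)                              # number of observations
    ll1 = as.numeric(logLik(fit))                  # log-likelihood of the fitted model
    ll0 = as.numeric(logLik(update(fit, . ~ 1)))   # log-likelihood of the intercept-only model
    cs = 1 - exp(2 * (ll0 - ll1) / n)              # Cox & Snell R-squared
    c(CoxSnell = cs, Nagelkerke = cs / (1 - exp(2 * ll0 / n)))
}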

You can use lrm in pretty much the same manner as glm, but if you already have a number of glm models built and don't want to refit them, you can run lrm on the fitted models to get the statistics:

library(Design)

set.seed(11)
out = rbinom(1000, 1, 0.5)        # binary outcome
pred1 = rnorm(1000, 0, 1)         # noise predictor, unrelated to the outcome
pred2 = rnorm(1000, out, 1)       # predictor shifted by the outcome
dat = data.frame(o = out, p1 = pred1, p2 = pred2)

# fit the usual glm, then hand the fitted model to lrm for its fit statistics
model = glm(o ~ p1 + p2, family = binomial(link = logit), data = dat)
m2 = lrm(model)
m2$stats
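The "R2" element of that stats vector is Nagelkerke's value, and it should agree (up to rounding) with the hand calculation sketched above:

m2$stats["R2"]      # Nagelkerke's pseudo R-squared, as reported by lrm
pseudo_r2(model)    # same quantity computed by hand from the glm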

The best plan is probably to use lrm from the start, but if you're a die-hard glm fan, this is a quick way to satisfy the R-squared crowd.
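If you do go the lrm-from-the-start route, the call looks much like the glm one, and the fit statistics come along for free (same simulated data as above):

m3 = lrm(o ~ p1 + p2, data = dat)
m3$stats["R2"]      # Nagelkerke's pseudo R-squared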