Probit and Complementary Log-Log Models for Binary Regression
Introduction to Alternatives to Logit Models:
The logit model is only one of many methods for fitting a regression model with a binary dependent variable. Two alternatives are worth discussing: the probit model and the complementary log-log model. The goal of this short blog post is to compare them with the logit model, which I discussed in an earlier post on Binary Logistic Regression.
Differences in Distribution:
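The observed variable y is coded 1 or 0 depending on whether an underlying, unobserved (latent) continuous variable z is above or below a threshold value. The three models differ in the distribution they assume for the error term of z: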
Logit: The errors have a standard logistic distribution
Probit: The errors have a standard normal distribution
Complementary Log-Log: The errors have a standard extreme-value distribution, also known as the Gumbel or double-exponential distribution
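To make the list concrete, the latent-variable setup can be written as follows. This is a sketch under one simplifying assumption: the threshold is set to 0, since any nonzero threshold is absorbed into the intercept. F is each model's response function, that is, the inverse of its link function g. For logit and probit, F is simply the error CDF. For complementary log-log, the response curve shown is what a standard extreme-value (maximum) error distribution produces under this "z above the threshold" convention.

```latex
z_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \qquad
y_i = \begin{cases} 1 & \text{if } z_i > 0 \\ 0 & \text{if } z_i \le 0 \end{cases}
\qquad\Longrightarrow\qquad
P(y_i = 1 \mid \mathbf{x}_i) = F(\mathbf{x}_i^\top \boldsymbol{\beta})

\text{Logit:}\quad F(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}, \qquad g(p) = \log\frac{p}{1 - p}

\text{Probit:}\quad F(\eta) = \Phi(\eta), \qquad g(p) = \Phi^{-1}(p)

\text{Complementary log-log:}\quad F(\eta) = 1 - \exp\!\left(-e^{\eta}\right), \qquad g(p) = \log\bigl(-\log(1 - p)\bigr)
```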
Probit Function:
A standard normal distribution has a mean of 0 and a standard deviation of 1. Its cumulative distribution function (CDF), often printed as a z-table, gives for every value z the probability that a standard normal variable is less than or equal to z. The probit transformation is the inverse of this CDF: while probabilities range between 0 and 1, the probit function ranges from negative infinity to positive infinity.
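The relationship between the normal CDF and the probit can be checked numerically. The sketch below uses only Python's standard library (`statistics.NormalDist`). The helper name `probit` is hypothetical and chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

def probit(p):
    """Hypothetical helper: inverse of the standard normal CDF, mapping (0, 1) onto the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return std_normal.inv_cdf(p)

print(std_normal.cdf(1.96))  # about 0.975: P(Z <= 1.96)
print(probit(0.975))         # about 1.96: the z value with 97.5% of the mass below it
print(probit(0.5))           # 0.0: the median of the standard normal
```

Probabilities near 0 or 1 map to large negative or positive probit values. This is what allows a linear predictor on the whole real line to be matched to a probability.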
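When fitting a binary regression model, the probit and logit models usually produce very similar fitted probabilities and lead to similar conclusions. Their coefficients are on different scales, however. The logit model's exponentiated coefficients can be interpreted as odds ratios, which the probit model lacks. The probit model may have an advantage when several binary dependent variables are modeled jointly, because the normal distribution extends naturally to the multivariate case.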
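As a rough illustration of why the two models agree, the sketch below compares the logistic CDF with the standard normal CDF using only the standard library. The scale factor 1.702 is an assumption taken from a well-known approximation and does not come from this post. With this factor, the two curves differ by less than 0.01 everywhere. The standard deviation of the standard logistic distribution is π/√3 ≈ 1.81, compared with 1 for the standard normal. This difference is why logit coefficients typically come out roughly 1.6 to 1.8 times larger than probit coefficients fitted to the same data.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def logistic_cdf(t):
    """CDF of the standard logistic distribution (the logit model's response function)."""
    return 1.0 / (1.0 + math.exp(-t))

SCALE = 1.702  # assumed rescaling constant from a classic logistic-normal approximation

# Largest gap between the rescaled logistic CDF and the normal CDF on a grid from -6 to 6.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(logistic_cdf(SCALE * t) - std_normal.cdf(t)) for t in grid)
print(round(max_gap, 4))

# Standard deviation of the standard logistic distribution (vs. 1 for the standard normal).
logistic_sd = math.pi / math.sqrt(3)
print(round(logistic_sd, 3))  # 1.814

# Odds-ratio reading of a logit coefficient: a one-unit increase in x multiplies the odds by exp(b).
b = 0.5  # hypothetical logit coefficient
print(round(math.exp(b), 3))  # 1.649
```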
Complementary Log-Log Function:
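The complementary log-log (cloglog) function, log(−log(1 − p)), is widely used in survival analysis. A major difference between the cloglog model and the logit or probit models is that the cloglog model is asymmetric. The logit and probit curves are symmetric around p = 0.5. The cloglog curve approaches 0 slowly and 1 sharply, and it passes through p ≈ 0.632 rather than 0.5 when the linear predictor is 0. The cloglog link is also closely tied to proportional hazards. If event times follow a continuous-time proportional hazards model, which is the assumption behind Cox regression, then the probability of an event within a discrete time interval follows a cloglog model. Its coefficients can therefore be read on the same log-hazard-ratio scale.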
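Both points can be checked with a few lines of standard-library Python. The sketch below shows the asymmetry of the cloglog curve next to the symmetric logistic curve. It also checks the proportional-hazards connection under a simple assumed setup. In that setup, H0 is a hypothetical cumulative baseline hazard over one interval, and the covariate multiplies the hazard by exp(βx). The values of H0, beta, and x are made up for illustration.

```python
import math

def inv_cloglog(eta):
    """Response function of the complementary log-log model: P(y = 1) = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p)) for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))

def inv_logit(eta):
    """Logistic response function, symmetric: F(-eta) = 1 - F(eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Asymmetry: at eta = 0 the logistic curve gives 0.5, the cloglog curve gives 1 - 1/e.
print(inv_logit(0.0))              # 0.5
print(round(inv_cloglog(0.0), 3))  # 0.632
# Distance from 0 at eta = -2 vs. distance from 1 at eta = +2 (equal for a symmetric curve).
print(round(inv_cloglog(-2.0), 3), round(1.0 - inv_cloglog(2.0), 3))  # 0.127 0.001

# Proportional hazards in one discrete interval (hypothetical values):
# P(event) = 1 - exp(-H0 * exp(beta * x)), so cloglog(P) = log(H0) + beta * x.
H0, beta, x = 0.2, 0.7, 1.5
p_event = 1.0 - math.exp(-H0 * math.exp(beta * x))
print(cloglog(p_event), math.log(H0) + beta * x)  # both about -0.559
```

The last line shows that under this assumed hazard model, the cloglog of the event probability is linear in x. The slope equals the log hazard ratio β.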
A more in-depth discussion of the complementary log-log function is available from the University of Alberta. I also recommend Paul Allison's Logistic Regression Using SAS for its explanations of both the probit and complementary log-log functions.
Comparison of Outputs:
While in prior work I used scikit-learn extensively, here I used the GLM functionality of the statsmodels package because it allows the link function to be specified, which is needed for the complementary log-log model. Logit and probit are also available in statsmodels as standalone models (Logit and Probit). I still used scikit-learn to partition the data.
The Logit Model:
The Probit Model:
The Complementary Log-Log Model: