Interaction Effects and Polynomials in Multiple Linear Regression
The goals of this page include:
Explain what polynomial and interaction effects are in OLS regression
Demonstrate how to automatically create polynomial and interaction terms with Python
Examine whether interaction effects need to be added to a multiple OLS model
Gauge the effect of adding interaction and polynomial terms to OLS regression
Adding interaction terms to an OLS regression model may help with fit and accuracy because such terms can capture relationships among regressors that the main effects alone miss. For example, the sale price of a house may be higher if the property has more rooms. Higher property taxes may also suggest higher housing prices. However, the price of housing may increase more dramatically when the property has more rooms in areas where taxes are also higher, suggesting more affluent neighborhoods (perhaps!).
When polynomial and interaction terms are added to an OLS model, the resulting model is still linear. How is that possible? Take a look below. The equation has an interaction term following the intercept and the two main effects.
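With two regressors $x_1$ and $x_2$ (generic symbols chosen here for illustration), such an equation takes the form

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$$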
If we re-express the interaction as a new variable and its coefficient as a new coefficient, the original equation can be rewritten as below. The resulting model is clearly linear:
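$$x_3 = x_1 x_2, \qquad y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$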
Similarly, polynomial terms can be used in OLS regression, and the resulting model will still be linear. For example, the equation below has second- and third-degree polynomial terms.
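With a single regressor $x$ (again, a generic symbol chosen for illustration):

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$$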
We can substitute the polynomial terms with the following:
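$$x_1 = x, \qquad x_2 = x^2, \qquad x_3 = x^3$$

so that the equation becomes

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$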
The resulting regression equation is clearly a linear model once again.
We have now established that multiple linear regression models can accommodate the inclusion of interaction effects and polynomials. We can demonstrate this with a three-dimensional plot and its two-dimensional contour plot.
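As an illustration (not the exact figures from this post), a surface and contour plot for a model with an interaction term can be generated along these lines; the coefficient values below are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative coefficients for y = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0, b1, b2, b3 = 1.0, 2.0, -1.5, 4.0

x1, x2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

fig = plt.figure(figsize=(10, 4))

# Three-dimensional regression surface
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(x1, x2, y, cmap="viridis")
ax1.set_xlabel("x1"); ax1.set_ylabel("x2"); ax1.set_zlabel("y")

# Two-dimensional contour plot of the same surface
ax2 = fig.add_subplot(1, 2, 2)
cs = ax2.contourf(x1, x2, y, levels=20, cmap="viridis")
fig.colorbar(cs, ax=ax2)
ax2.set_xlabel("x1"); ax2.set_ylabel("x2")

plt.tight_layout()
plt.show()
```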
The surface of the regression is clearly not linear. Still, a regression model that is linear in its parameters will always be a linear model, even if the surface it generates is not. The interaction means that the effect produced by one variable depends on the level of the other variable; the plot shows that the impact on the outcome is a function of both x1 and x2. Further, a quadratic model could take a dome shape, and the values of the regression coefficients can produce a wide array of shapes: but it is still a linear model!
Let’s see how this works with the Boston Housing data. First, the data need to be loaded and partitioned. Most of the work here was completed with the training data.
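A sketch of this step, assuming the Boston Housing data are available as a local boston.csv with MEDV as the outcome column (the exact file and column names may differ from what was used originally):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Boston Housing data (file name and outcome column are assumptions)
boston = pd.read_csv("boston.csv")
X = boston.drop(columns=["MEDV"])   # regressors
Y = boston["MEDV"]                  # outcome: median home value

# Partition into training and test sets; most of the work uses the training data
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```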
The following section automatically creates polynomial features and interactions. In fact, all combinations were created! Notice that it is possible to create only interactions and not polynomials, but I wanted to do both. This needs to be completed for both the training and test regressors. In this section, I kept all original regressors and the first interaction/polynomial term, which happens to be CRIM^2. Finally, it was necessary to re-index the outcome variable in the training data, because the manipulation of the regressors automatically resets the index of those data frames. Omitting the resetting of the Y_train index will result in an error. Later, the whole step will be done in a loop; here I simply wanted to see whether the code works.
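A sketch of this step with scikit-learn's PolynomialFeatures (degree 2, producing both squared terms and interactions); the variable names are illustrative and continue from the partitioning step above:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# degree=2 creates every squared term and pairwise interaction;
# interaction_only=True would create the interactions without the polynomials
poly = PolynomialFeatures(degree=2, include_bias=False)

# Fit on the training regressors, then apply the same transformation to the test set
X_train_poly = pd.DataFrame(
    poly.fit_transform(X_train), columns=poly.get_feature_names_out()
)
X_test_poly = pd.DataFrame(
    poly.transform(X_test), columns=poly.get_feature_names_out()
)

# Keep the original regressors plus the first new term, which is CRIM^2
n_orig = X_train.shape[1]
X_train_crim2 = X_train_poly.iloc[:, : n_orig + 1]

# The new data frames get a fresh 0..n index, so the training outcome must be
# re-indexed to match; omitting this step leads to an error when fitting
Y_train = Y_train.reset_index(drop=True)
```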
There are 91 combinations of interaction and second-degree polynomial terms in this data. The idea is to place each one of the 91 terms together with the individual regressors (there are 12 of them) one by one, fit a model, and record statistics about the model. I chose adjusted R squared, which increases only when a newly added regressor improves the model more than would be expected by chance, and decreases when the added regressor improves the model less than expected by chance.
The 91 resulting models can be sorted based on their adjusted R squared score. We can then observe which interaction effects enhance the model and which are not helping.
The following code takes each of the 91 potential interaction and polynomial terms in a loop, uses statsmodels to fit a model, and calculates and prints all adjusted R squared values. Unfortunately, the printed output does not contain any labels, so I first collected the values in an array and transformed it into a data frame.
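A sketch of that loop, assuming the expanded regressors from the previous step are in X_train_poly and the re-indexed outcome is in Y_train (names are illustrative, not necessarily those used originally):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

n_orig = X_train.shape[1]
base = X_train_poly.iloc[:, :n_orig]          # the original regressors
candidates = X_train_poly.columns[n_orig:]    # the 91 interaction/polynomial terms

adj_r2 = []
for term in candidates:
    # Add one candidate term to the original regressors and fit an OLS model
    X_one = sm.add_constant(pd.concat([base, X_train_poly[term]], axis=1))
    fit = sm.OLS(Y_train, X_one).fit()
    adj_r2.append(fit.rsquared_adj)
    print(term, round(fit.rsquared_adj, 4))

# The printed values carry no labels, so collect them into a data frame
adj_r2_df = pd.DataFrame(np.array(adj_r2), columns=["adj_r2"])
```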
Since the names of the individual interaction/polynomial terms were available in the expanded data frame after creating them, I saved them into a data frame and concatenated it with the adjusted R squared values, which can now be sorted and plotted.
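One way to attach the term names and visualize the sorted values, continuing the objects from the sketch above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Pair each term name with its adjusted R squared and sort from best to worst
names_df = pd.DataFrame({"term": candidates})
results = (
    pd.concat([names_df, adj_r2_df], axis=1)
    .sort_values("adj_r2", ascending=False)
    .reset_index(drop=True)
)

# Plot the sorted values to see where the adjusted R squared levels off
results["adj_r2"].plot(marker=".")
plt.xlabel("candidate term (sorted)")
plt.ylabel("adjusted R squared")
plt.show()
```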
This graph shows that the adjusted R squared drops quickly to about 0.74 and then levels off at lower values. Note that the adjusted R squared of the original model without interaction terms was 0.730. Therefore, it is worth considering only the terms that result in an adjusted R squared above 0.74. There aren’t a lot of those!
Now that we have identified which terms to consider for fitting a model, we can see how these terms affect the OLS model collectively.
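A sketch of that combined fit, continuing the objects above and using the 0.74 cutoff from the plot as the (illustrative) selection rule:

```python
import pandas as pd
import statsmodels.api as sm

# Terms whose individual adjusted R squared exceeded 0.74
selected = results.loc[results["adj_r2"] > 0.74, "term"].tolist()

# Fit one model with the original regressors plus all selected terms together
X_combined = sm.add_constant(pd.concat([base, X_train_poly[selected]], axis=1))
combined_fit = sm.OLS(Y_train, X_combined).fit()

print(combined_fit.summary())   # reports adjusted R squared, AIC and BIC
```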
The overall adjusted R squared increased to 0.833, and the AIC and BIC both decreased compared to the model without interactions; the original model’s AIC and BIC were 2432 and 2488, respectively. So now the real work can start: sorting out multicollinearity and playing with the model for a better fit and improved accuracy. Still, it seems that introducing interaction and polynomial terms was really worth the effort. As they say, if you torture data, it will confess!