ECON 320 Lab Exercise, Week 6: Bigger $R^2$… and the Collinearity Trap 🧩

  • Name : [Your Name]

  • Lab Section: [Your Lab Section Here]

  • Code for week 6's attendance: [Attendance code here]

Please submit the exercise on Canvas as an HTML or PDF file.


Background

Ka Yan believes that including more variables always makes the model better because $R^2$ goes up. Today, you’ll use a real Wooldridge housing dataset to check that idea and see what can go wrong.
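The mechanical part of Ka Yan's claim can be sketched on made-up data before touching the housing set. The example below is only an illustration under stated assumptions: the data are synthetic, and the helper `r_squared`, the sample size, and the variable names are hypothetical and not part of the lab. It fits OLS with and without a regressor that is pure noise and compares the two values of $R^2$.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / sst

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible
n = 100                             # hypothetical sample size
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)          # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise])   # same model plus the noise column

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(f"R^2 without the noise regressor: {r2_small:.4f}")
print(f"R^2 with the noise regressor:    {r2_big:.4f}")
```

Because the smaller model is nested in the larger one, the second $R^2$ cannot be lower than the first, even though the extra column carries no information about `y`. A higher $R^2$ on its own is therefore weak evidence that a model improved.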

Instructions

  • Run the setup cell below to load the data (wooldridge.hprice1).
  • Keep answers short and clear. Aim to finish in 10–15 minutes.
In [4]:
!pip install wooldridge statsmodels pandas numpy --quiet
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import wooldridge as woo

# Load Wooldridge housing data
df = woo.data('hprice1').dropna().copy()
df.head()
Out[4]:
   price      assess  bdrms  lotsize  sqrft  colonial    lprice   lassess  llotsize    lsqrft
0  300.0  349.100006      4   6126.0   2438         1  5.703783  5.855359  8.720297  7.798934
1  370.0  351.500000      3   9903.0   2076         1  5.913503  5.862210  9.200593  7.638198
2  191.0  217.699997      3   5200.0   1374         0  5.252274  5.383118  8.556414  7.225482
3  195.0  231.800003      3   4600.0   1448         1  5.273000  5.445875  8.433811  7.277938
4  373.0  319.100006      4   6095.0   2514         1  5.921578  5.765504  8.715224  7.829630

Variable descriptions (from Wooldridge hprice1):

  • price: house price (thousands of USD).
  • lotsize: lot size (square feet).
  • sqrft: house size (square feet of finished area).
  • bdrms: number of bedrooms.
  • assess: assessed value of the house (thousands of USD).
  • colonial: indicator for colonial‑style house (1 = colonial).
  • lprice, lassess, llotsize, lsqrft: natural logs of price, assess, lotsize, sqrft.
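As a quick sanity check on the log variables, the values in the first row of the `df.head()` output above can be compared with natural logs computed directly. This is a small sketch using only the numbers displayed in that output; the dictionary `first_row` and the tolerance are illustrative choices, not part of the lab.

```python
import math

# First row of df.head() as displayed above: (level, reported log)
first_row = {
    "price":   (300.0,      5.703783),
    "assess":  (349.100006, 5.855359),
    "lotsize": (6126.0,     8.720297),
    "sqrft":   (2438,       7.798934),
}

max_gap = 0.0
for name, (level, reported_log) in first_row.items():
    computed = math.log(level)              # natural log of the level variable
    max_gap = max(max_gap, abs(computed - reported_log))
    print(f"{name:8s} ln(level) = {computed:.6f}   reported = {reported_log:.6f}")

print(f"largest absolute gap: {max_gap:.1e}")
```

The gaps are at the level of display rounding, which is consistent with the `l`-prefixed columns being natural logs of the corresponding level columns.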

Task (10–15 minutes)

Q1. Fit Ka Yan’s “everything” model and report fit

Use price as the outcome and include at least these regressors: lotsize, sqrft, bdrms.

  • Report R².
In [ ]:
# Put your answer here

Q2. Quick diagnostic: correlations & VIF (exclude const)

  • Print the pairwise correlation matrix for the regressors.
  • Compute VIF for each regressor.
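For reference, the VIF of regressor $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing regressor $j$ on all the other regressors. The sketch below computes this by hand on synthetic data so the definition is visible. The function `vif` and the columns `a`, `b`, `c` are hypothetical names chosen for illustration, and each auxiliary regression here is assumed to include an intercept. It is not the required solution for the housing data.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n x k array with no constant column).

    Column j is regressed on an intercept plus the remaining columns,
    and VIF_j = 1 / (1 - R_j^2).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(1)          # fixed seed for reproducibility
n = 200
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)        # nearly a copy of a
c = rng.normal(size=n)                  # unrelated to a and b
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)     # pairwise correlation matrix
vifs = vif(X)
print(np.round(corr, 3))
print(np.round(vifs, 2))
```

In this synthetic setup the two nearly duplicate columns get large VIFs while the unrelated column stays close to 1, which is the pattern to look for when reading the VIFs for the housing regressors.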
In [5]:
# Put your answer here: Correlations among regressors


# Put your answer here: VIFs

Q3. Short write‑up

Answer briefly (2-3 sentences):

  1. Which pairs of regressors have the highest correlations, and which regressors have the highest VIFs?
  2. The single change you would make and why.
  3. What would you expect to happen to R² if you made that change? (Feel free to check in a code cell but it is not required!)
  4. What would you tell Ka Yan about her idea that “including more variables always makes the model better”?

Put your answer here:


End of Lab Exercise.