📘 ECON 320 Lab Problem Set 2
Name : [Your Name]
Lab Section: [Your Lab Section Here]
Please submit this exercise on Canvas as an HTML or PDF file.
This assignment builds on:
- Week 3: OLS Estimator for Simple Linear Regression
- Week 4: OLS Estimator for Multiple Linear Regression
- Week 5: Incorporating Qualitative Data
- Data: J.M. Wooldridge (2019), Introductory Econometrics: A Modern Approach, 7th edition, Cengage Learning.
🎯 Learning Objectives
By the end of this assignment, you should be able to:
- Practice OLS in simple and multiple regression.
- Interpret estimated slopes (marginal effects).
- Incorporate qualitative data (dummy variables) in regression.
📝 Grading (Total = 10 points)
- Q1: Data prep & quick summary — 3 pts
- Q2: SLR — 2 pts
- Q3: MLR & dummy variables — 5 pts
📦 Download and import required libraries (feel free to add any others you may need)
In [ ]:
# Install quietly (run once if needed)
%pip install -q wooldridge pandas numpy matplotlib seaborn statsmodels nbconvert
In [ ]:
# Imports
import wooldridge as wr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
plt.rcParams['figure.figsize'] = (6,4)
print("✅ Libraries ready.")
📥 Load the dataset
In this problem set we will use the `econmath` dataset from the `wooldridge` package (the same dataset as in PS1). It contains information on students from a large college course in introductory microeconomics.
In [ ]:
# Show dataset description (variables, source, sample)
wr.data("econmath", description=True)
# Load DataFrame for analysis
df_raw = wr.data("econmath").copy()
df_raw.head()
❓ Q1 — Data prep & quick summary (3 pts, 3 sub-questions)
- Create a working copy `df` (e.g., starting from `df_raw.copy()`) that includes only the variables used in this exercise (`score`, `hsgpa`, `calculus`), and report the number of rows and columns.
In [ ]:
# Put your answer here
- Report `.describe()` for the working copy `df`.
In [ ]:
# Put your answer here
- Report the count of missing values in each column of `df`.
In [ ]:
# Put your answer here
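The Q1 steps (subset the columns, report shape, describe, count missing values) can be sketched on a toy DataFrame; all numbers below are made up for illustration and do not come from `econmath`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw data (hypothetical values, for illustration only)
toy_raw = pd.DataFrame({
    "score":    [72.5, 68.0, 81.2, np.nan, 90.1],
    "hsgpa":    [3.2, 2.9, 3.8, 3.5, 4.0],
    "calculus": [0, 0, 1, 1, 1],
    "age":      [19, 20, 19, 21, 18],   # extra column we deliberately drop
})

# Keep only the variables used in this exercise
toy = toy_raw[["score", "hsgpa", "calculus"]].copy()

print(toy.shape)         # (rows, columns)
print(toy.describe())    # summary statistics for each column
print(toy.isna().sum())  # missing values per column
```

The same pattern applies with `df_raw` in place of `toy_raw`; only the column list (`score`, `hsgpa`, `calculus`) comes from the question itself.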
❓ Q2 — Simple Linear Regression (2 pts, 2 sub-questions)
Estimate the simple regression of score on hsgpa using OLS:
$$ \text{score}_i = \beta_0 + \beta_1\,\text{hsgpa}_i + u_i. $$
- Report the estimated intercept and slope (from `smf.ols("score ~ hsgpa", data=df).fit()`).
In [ ]:
# Put your answer here
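For intuition, the OLS slope in the simple regression above is just Cov(x, y)/Var(x), and the intercept makes the line pass through the sample means. A minimal sketch on made-up data (the `smf.ols` call in the question returns the same estimates; the numbers here are hypothetical):

```python
import numpy as np

# Hypothetical data, for illustration only (not econmath values)
hsgpa = np.array([2.8, 3.0, 3.2, 3.5, 3.8, 4.0])
score = np.array([65.0, 70.0, 69.0, 78.0, 82.0, 88.0])

# Closed-form SLR estimates: slope = Cov(x, y) / Var(x)
b1 = np.cov(hsgpa, score, ddof=1)[0, 1] / np.var(hsgpa, ddof=1)
# Intercept: the fitted line passes through (mean x, mean y)
b0 = score.mean() - b1 * hsgpa.mean()

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With the real data, `smf.ols("score ~ hsgpa", data=df).fit().params` reports these two numbers directly.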
- Interpret the slope in plain English (units of `score` per one-point increase in `hsgpa`).
✍️ Interpret the slope here (two sentences max):
In [ ]:
# Plot scatter + fitted line (provided)
sns.regplot(x='hsgpa', y='score', data=df, scatter_kws={'alpha':0.5})
plt.title("SLR: score ~ hsgpa")
plt.show()
❓ Q3 — Multiple Linear Regression & dummy variables (5 pts, 4 sub-questions)
Estimate the multiple regression: $$ \text{score}_i = \beta_0 + \beta_1\,\text{hsgpa}_i + \beta_2\,\text{calculus}_i + u_i. $$
- Report the estimated coefficients (from `smf.ols("score ~ hsgpa + calculus", data=df).fit()`).
In [ ]:
# Put your answer here
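Under the hood, the multiple regression above solves a least-squares problem on a design matrix with an intercept column, `hsgpa`, and the `calculus` dummy. A sketch on made-up data (`smf.ols` with the real `df` produces the analogous coefficient vector):

```python
import numpy as np

# Hypothetical data, for illustration only (not econmath values)
hsgpa    = np.array([2.8, 3.0, 3.2, 3.5, 3.8, 4.0])
calculus = np.array([0, 0, 1, 0, 1, 1])      # dummy: 1 if the student took calculus
score    = np.array([65.0, 70.0, 74.0, 78.0, 85.0, 90.0])

# Design matrix: intercept column, hsgpa, calculus dummy
X = np.column_stack([np.ones_like(hsgpa), hsgpa, calculus])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
b0, b1, b2 = beta

# b2 shifts the intercept for calculus takers; b1 is the partial effect of hsgpa
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Note how the dummy enters just like any other regressor: its coefficient is the predicted score gap between calculus takers and non-takers at the same `hsgpa`.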
- Interpret $\beta_1$: the partial effect of `hsgpa` holding `calculus` fixed.
✍️ Interpret the partial effect here (two sentences max):
- Produce a plot with: [2 pts]
  - fix `calculus = 1` and plot predicted `score` vs `hsgpa`;
  - fix `calculus = 0` and plot predicted `score` vs `hsgpa` on the same axes.
  (Each line should be straight, since the slope on `hsgpa` does not vary with `calculus` here.)
In [ ]:
# Put your answer here
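One way to build such a plot is to evaluate the fitted equation over a grid of `hsgpa` values, once with `calculus = 0` and once with `calculus = 1`. The coefficients below are hypothetical placeholders, not the estimates you should report; substitute your own fitted values from Q3:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical coefficients (stand-ins for your smf.ols estimates)
b0, b1, b2 = 20.0, 15.0, 5.0

grid = np.linspace(2.0, 4.0, 50)          # hsgpa values to predict over
pred_calc0 = b0 + b1 * grid               # calculus fixed at 0
pred_calc1 = b0 + b1 * grid + b2          # calculus fixed at 1: parallel line shifted by b2

plt.plot(grid, pred_calc0, label="calculus = 0")
plt.plot(grid, pred_calc1, label="calculus = 1")
plt.xlabel("hsgpa")
plt.ylabel("predicted score")
plt.legend()
plt.savefig("mlr_lines.png")
plt.close()
```

Because the model has no interaction term, the two lines share the slope `b1` and differ only by the vertical shift `b2`.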
- Compare the slope on `hsgpa` from Q2 and Q3. Which one is larger? In 1–2 sentences, explain why they might differ.
✍️ Compare the slopes here (two sentences max):
End of Problem Set.