📘 ECON 320 Lab Problem Set 1
Name: [Your Name]
Lab Section: [Your Lab Section Here]
Please submit the exercise on Canvas as an HTML or PDF file.
This assignment builds on:
- Week 1: Descriptive Statistics & Basic Python Coding
- Week 2: Understanding & Presenting Data
- Data: J. M. Wooldridge (2019), Introductory Econometrics: A Modern Approach, 7th edition, Cengage Learning.
You will practice summary statistics, basic data cleaning, and choosing appropriate visualizations.
🎯 Learning Objectives
By the end of this assignment, you should be able to:
- Compute and interpret summary statistics.
- Perform basic data cleaning.
- Create and choose appropriate data visualizations.
- Reflect on the difference between correlation and causation.
📝 Grading (Total = 10 points)
- Q1: Summary statistics (2 pts)
- Q2: Data cleaning (2 pts)
- Q3: Visualizations (4 pts)
- Q4: Critical thinking (2 pts)
📦 Install and import the required libraries
In [ ]:
%pip install wooldridge matplotlib seaborn
import wooldridge as wr
import matplotlib.pyplot as plt
import seaborn as sns
print("✅ Libraries ready.")
📥 Load the dataset
In this problem set, we will use the econmath dataset from the wooldridge package. It contains information on students from a large college course in introductory microeconomics.
In [ ]:
# Show dataset description (variables, source, sample) for the "econmath" dataset
wr.data("econmath", description=True)
In [ ]:
# Load DataFrame for analysis
df_raw = wr.data("econmath")
df_raw.head()
In [ ]:
# Put your answer here
2. Report the mean, standard deviation, minimum, and maximum of score, actmth, and acteng.
In [ ]:
# Put your answer here
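One way to get these four statistics with pandas is DataFrame.agg with a list of function names. The sketch below runs on a small made-up DataFrame (illustrative numbers, not econmath data) that only reuses the column names score, actmth, and acteng; in the notebook you would apply the same call to the loaded DataFrame instead of the hypothetical toy.

```python
import pandas as pd

# Hypothetical toy data that mimics the econmath column names (not real values)
toy = pd.DataFrame({
    "score":  [60.0, 75.0, 90.0, 85.0],
    "actmth": [20.0, 24.0, 30.0, 28.0],
    "acteng": [18.0, 22.0, 26.0, 24.0],
})

# One call computes all four statistics for each column;
# rows of the result are the statistics, columns are the variables.
summary = toy[["score", "actmth", "acteng"]].agg(["mean", "std", "min", "max"])
print(summary.round(2))
```

Note that pandas reports the sample standard deviation (ddof=1) by default, which is also what describe() shows.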
In [ ]:
# Put your answer here
2. For this assignment, assume the missing ACT scores are missing at random.
Drop all rows with missing ACT scores (actmth or acteng).
Report how many rows remain after dropping missing values.
In [ ]:
# Solution is provided for this step
df_clean = df_raw.dropna(subset=['actmth', 'acteng'])
print(f"Rows remaining after dropping missing ACT scores: {df_clean.shape[0]}")
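To see what dropna(subset=...) does row by row, here is a minimal sketch on a hypothetical five-row DataFrame with some ACT scores set to NaN (made-up values, not econmath data). It also shows isna().sum(), a quick way to count missing values per column before cleaning.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing ACT scores (NaN), not real values
toy = pd.DataFrame({
    "score":  [60.0, 75.0, 90.0, 85.0, 70.0],
    "actmth": [20.0, np.nan, 30.0, 28.0, np.nan],
    "acteng": [18.0, 22.0, np.nan, 24.0, np.nan],
})

# Count missing values per column before cleaning
print(toy.isna().sum())

# dropna(subset=...) drops a row if ANY listed column is missing
# (how="any" is the default); columns outside the subset are ignored.
toy_clean = toy.dropna(subset=["actmth", "acteng"])
print(f"Rows before: {len(toy)}, rows after: {len(toy_clean)}")
```

Rows 1, 2, and 4 are dropped because each is missing at least one of the two ACT scores, so only rows 0 and 3 remain.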
3. Create a new DataFrame named df_analysis from df_clean that keeps only the following variables, which you will use for the rest of the assignment:
- score (test score in the introductory microeconomics course)
- actmth (ACT math score)
- acteng (ACT English/verbal score)
In [ ]:
# Put your answer here
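Keeping a subset of columns usually means indexing with a list of names (double brackets). The sketch below assumes a hypothetical cleaned DataFrame with one extra column, other_var, that the analysis does not need; the values are made up.

```python
import pandas as pd

# Hypothetical cleaned data with an extra column we do not need (made-up values)
toy_clean = pd.DataFrame({
    "score":     [60.0, 85.0],
    "actmth":    [20.0, 28.0],
    "acteng":    [18.0, 24.0],
    "other_var": [2.8, 3.4],
})

# Double brackets select a list of columns and return a DataFrame;
# .copy() makes the result an independent object, so later changes to it
# cannot affect toy_clean.
toy_analysis = toy_clean[["score", "actmth", "acteng"]].copy()
print(toy_analysis.columns.tolist())
```

Single brackets with one name (toy_clean["score"]) would instead return a Series, which is why the list form is used here.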
1. Histogram of score. (1 pt)
In [ ]:
# Put your answer here
# 1) Histogram of score
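A histogram can be drawn with matplotlib's Axes.hist; seaborn's histplot is an equivalent alternative. This sketch uses simulated, hypothetical scores rather than econmath data, and the bin count of 20 is an arbitrary choice worth experimenting with. The matplotlib.use("Agg") line only lets the snippet run as a script without a display; leave it out in Jupyter, where the figure renders inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside a notebook; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores (simulated for illustration, not econmath data), kept in [0, 100]
rng = np.random.default_rng(0)
scores = np.clip(rng.normal(loc=72, scale=13, size=200), 0, 100)

fig, ax = plt.subplots(figsize=(6, 4))
# ax.hist returns the bin counts, the bin edges, and the drawn bars
counts, edges, _ = ax.hist(scores, bins=20, edgecolor="black")
ax.set_xlabel("Course score")
ax.set_ylabel("Number of students")
ax.set_title("Distribution of score")
fig.tight_layout()
```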
2. Boxplot of score. (1 pt)
In [ ]:
# Put your answer here
# 2) Boxplot of score
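When reading a boxplot, it may help to know what it summarizes. The sketch below computes the pieces by hand for a hypothetical set of scores: the box spans the first to third quartile, the inner line is the median, and with matplotlib's default whis=1.5 (seaborn's boxplot uses the same default) points beyond 1.5 × IQR from the box are drawn individually as outliers. The values, including the deliberately low 15, are made up.

```python
import numpy as np

# Hypothetical scores (illustrative only); 15 is a deliberately low value
scores = np.array([15, 55, 60, 62, 65, 68, 70, 72, 75, 78, 80, 85, 90])

# Box edges are the 1st and 3rd quartiles; the line inside is the median
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# With whis=1.5, whiskers reach the most extreme points inside these fences;
# anything beyond them is plotted as an individual outlier marker
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(q1, median, q3, outliers)
```

Here Q1 = 62, the median is 70, Q3 = 78, and the IQR is 16, so the lower fence is 38 and only the score of 15 would appear as an outlier.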
3. Scatter plot of score vs actmth (with best-fit line). (1 pt)
In [ ]:
# Put your answer here
# 3) Scatter: score vs actmth
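A best-fit line is typically the ordinary least squares line. One sketch, assuming hypothetical (actmth, score) pairs rather than the real data, fits it with np.polyfit(deg=1) and overlays it on a matplotlib scatter; seaborn's regplot draws a scatter with a fitted regression line in one call and is a reasonable alternative. As before, drop the matplotlib.use("Agg") line in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside a notebook; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical (actmth, score) pairs, illustrative only
actmth = np.array([18, 20, 22, 24, 26, 28, 30, 32])
score = np.array([58, 63, 61, 70, 72, 75, 80, 84])

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(actmth, score, deg=1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(actmth, score, alpha=0.7, label="Students")
x_line = np.linspace(actmth.min(), actmth.max(), 100)
ax.plot(x_line, intercept + slope * x_line, color="red",
        label=f"Best fit: slope = {slope:.2f}")
ax.set_xlabel("ACT math score (actmth)")
ax.set_ylabel("Course score (score)")
ax.legend()
fig.tight_layout()
```

The same approach works for score vs acteng by swapping the x variable.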
4. Scatter plot of score vs acteng (with best-fit line). (1 pt)
In [ ]:
# Put your answer here
# 4) Scatter: score vs acteng
- ✍️ Brief interpretation (2–4 sentences; 2 pts, open-ended):
  - What do you notice about the distribution of score?
  - Are score and the ACT scores positively related?
  - Any caveats about correlation ≠ causation?
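The correlation coefficient quantifies "positively related", and a small simulation can illustrate the causation caveat. In this hypothetical sketch (simulated numbers, not econmath data), an unobserved "ability" variable drives both actmth and score, while actmth has no direct effect on score; the two are still clearly positively correlated. On the real data, df_analysis.corr() would give the Pearson correlation matrix for all three variables.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical simulation: unobserved "ability" drives BOTH the ACT math score
# and the course score; actmth has no direct effect on score in this setup.
ability = rng.normal(size=n)
actmth = 24 + 4 * ability + rng.normal(scale=2, size=n)
score = 70 + 8 * ability + rng.normal(scale=6, size=n)

# Pearson correlation: +1 or -1 is a perfect linear relation, 0 means no linear relation
r = np.corrcoef(actmth, score)[0, 1]
print(f"corr(actmth, score) = {r:.2f}")
```

By construction the population correlation here is 32 / sqrt(20 × 100) ≈ 0.72, even though changing actmth alone would not change score, which is the sense in which correlation ≠ causation.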
✍️ Put your answer here.
End of Problem Set.