Final Project: Suggested Datasets
Curated options with quick notes on access, scope, and typical use cases.
You may also use a dataset that is NOT on this list: propose one that fits your question and note its access details.
Expected workflow:
Start with summary statistics → do EDA (Exploratory Data Analysis: quick plots, patterns, missingness) → choose an appropriate visualization for your story.
Keep it reproducible: one notebook, clear cleaning steps, and labeled figures.
Later, you'll conduct an econometric analysis (e.g., a simple regression) to formalize a relationship you explored in EDA.
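The workflow above can be sketched on a tiny made-up dataset. This is only a minimal illustration using the Python standard library: the records, the variable names (`spending`, `score`), and the helpers `summarize` and `ols_slope_intercept` are hypothetical, and a real project would more likely rely on a data-analysis library.

```python
import statistics

# Hypothetical toy data: per-pupil spending (thousands of dollars) and test scores.
# None marks a missing value.
rows = [
    {"district": "A", "spending": 8.0, "score": 70.0},
    {"district": "B", "spending": 9.5, "score": 74.0},
    {"district": "C", "spending": 11.0, "score": None},
    {"district": "D", "spending": 12.0, "score": 80.0},
    {"district": "E", "spending": 14.5, "score": 85.0},
]


def summarize(rows, column):
    """Summary statistics plus a missingness count for one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def ols_slope_intercept(x, y):
    """One-regressor OLS: slope = sum of cross-deviations / sum of squared x-deviations."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Step 1: summary statistics and missingness for each variable.
for col in ("spending", "score"):
    print(col, summarize(rows, col))

# Step 2: a documented cleaning step (drop rows with a missing outcome).
complete = [r for r in rows if r["score"] is not None]

# Step 3 (later in the project): a simple regression of score on spending.
slope, intercept = ols_slope_intercept(
    [r["spending"] for r in complete],
    [r["score"] for r in complete],
)
print(f"score = {intercept:.2f} + {slope:.2f} * spending")
```

Dropping incomplete rows is just one possible cleaning choice. Whatever you choose, state it in the notebook so the result can be reproduced.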
Proposal due: October 8
Submit a one-page proposal summarizing:
- Research question
- Data source(s) (and access plan)
- Proposed econometric methods (e.g., descriptive + regression idea)
- Expected challenges (data cleaning, measurement, access, etc.)
Your topic must be approved by the instructor before you begin analysis.
Public & Free (Good for everyone)
IPUMS
Microdata on people & households across time and countries. Rich demographics, labor, education, housing.
Access: registration required
World Bank Data
Hundreds of economic & development indicators (GDP, inflation, enrollment) by country and year.
FRED (Federal Reserve Economic Data)
US macro time series (CPI, unemployment, rates) with strong metadata and consistent updates.
Open Sports Data
Event- and season-level data for soccer & baseball. Great for cross-sectional comparisons and time series performance.
Education / Labor
IPEDS (US Higher Ed)
Institution-level data: enrollment, completion, costs, faculty, finances. Useful for campus-level questions.
BLS (Labor Statistics)
Employment, wages, CPI, productivity. National, state, metro breakdowns.
Finance / Corporate (via WRDS)
Access at Emory: Use the Emory Libraries website to log in to WRDS and request access if needed. Then use WRDS to retrieve the datasets below.
Data vendors available through WRDS
CRSP (Center for Research in Security Prices)
Access: WRDS via Emory Libraries
US equity prices, returns, and events. Ideal for time-series & panel analyses at the firm or market level.
WRDS
Compustat
Access: WRDS via Emory Libraries
Firm-level fundamentals (income statement, balance sheet). Great for corporate finance & accounting questions.
licensed, panel
Revelio Labs
Access: WRDS via Emory Libraries
Human resources and workforce database (professional profiles, job postings, employee sentiment reviews, etc.).
WRDS
Ethics & Terms of Use: Prefer official downloads and APIs. Avoid scraping sites that prohibit it. Anonymize sensitive info, and document your data cleaning decisions.
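One way to handle sensitive identifiers during cleaning is to replace them with keyed hashes before analysis. The sketch below is an assumption-laden illustration, not a course requirement: the records, the field names, and the helper `make_pseudonymizer` are hypothetical. Strictly speaking this is pseudonymization, which is weaker than full anonymization, so check your dataset's terms of use to see whether it is sufficient.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(key: bytes):
    """Return a function that maps an identifier to a stable, keyed pseudonym."""
    def pseudonymize(identifier: str) -> str:
        digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:12]  # shortened for readability
    return pseudonymize


# The key must stay private (do not commit it with your notebook).
key = secrets.token_bytes(16)
pseudonymize = make_pseudonymizer(key)

# Hypothetical records containing a direct identifier.
records = [
    {"name": "Jane Doe", "wage": 21.5},
    {"name": "John Roe", "wage": 19.0},
    {"name": "Jane Doe", "wage": 22.0},
]

# Replace the name with a pseudonymous ID and keep only the analysis variables.
cleaned = [
    {"person_id": pseudonymize(r["name"]), "wage": r["wage"]}
    for r in records
]
print(cleaned)
```

The same person receives the same `person_id` within a run, so panel structure is preserved, while the raw name never appears in the cleaned data.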
Great Applied Econ Papers (Optional Inspiration)
Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania
American Economic Review, 84(4), 772–793.
Links: Open (author PDF) · NBER WP
Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings?
Quarterly Journal of Economics, 106(4), 979–1014.
Links: Publisher · NBER WP (PDF)
Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
American Economic Review, 94(4), 991–1013.
Links: Publisher · NBER WP (PDF)
Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment
American Economic Review, 91(4), 795–813.
Links: Publisher · Author PDF
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood
American Economic Review, 104(9), 2633–2679.
Links: Publisher · Working Paper (PDF)
Jensen, R. (2010). The (Perceived) Returns to Education and the Demand for Schooling
Quarterly Journal of Economics, 125(2), 515–548.
Links: Publisher · Open (PDF)
Tips for a Strong Project
- Inspect the data first: browse variables, summary stats, and quick plots to get inspiration.
- Then start with a question: e.g., "Does higher school spending relate to test performance?"
- Pick the right structure: cross-section vs time series vs panel → choose appropriate visuals.
- Reproduce easily: keep a single notebook and document cleaning steps.
- Scope smartly: prefer one focused question over many unfocused plots.
- Robustness check(s): test whether your main finding holds under small changes in specification, variable definition, or sample. It strengthens credibility.
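The robustness tip can be illustrated with a leave-one-out check: re-estimate the slope after dropping each observation in turn and see whether the sign of the main finding survives. This is a minimal sketch on made-up numbers. The data and the helpers `ols_slope` and `leave_one_out_slopes` are hypothetical, and changing the sample is only one of the checks mentioned above (specification and variable definition are others).

```python
def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Re-estimate the slope once per observation, each time dropping that observation."""
    return [
        ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Hypothetical data: spending (thousands of dollars) and test scores.
# The last score is unusually high, so it is a natural candidate outlier.
spending = [8.0, 9.5, 12.0, 14.5, 10.0, 13.0]
score = [70.0, 74.0, 80.0, 85.0, 73.0, 95.0]

baseline = ols_slope(spending, score)
loo = leave_one_out_slopes(spending, score)

# The finding is "sign-stable" if every re-estimate has the baseline's sign.
sign_stable = all((s > 0) == (baseline > 0) for s in loo)

print(f"baseline slope: {baseline:.2f}")
print(f"leave-one-out range: {min(loo):.2f} to {max(loo):.2f}")
print(f"sign stable: {sign_stable}")
```

If the slope changes sign or shrinks sharply when a single observation is dropped, report that openly instead of hiding it.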
Mini-rubric (what we'll grade for)
- Clarity: Your question clearly states who (population or unit), where (geography or setting), when (time period), what outcome you will measure, and what main factor/exposure you will study.
- Motivation: why the question matters. (Important! Show the reader why they should care!)
- Feasibility: data and measures are concrete; code runs.
- Design alignment: Your model/econometrics tool should match the question.
- Integrity: Be honest about what the data can and cannot show. Point out possible limitations or challenges in your results.
- Communication: clean figure/table, labels, units, readable captions.
Philosophy: "Know what you're doing; do what you know."