📊 Final Project – Suggested Datasets

Curated options with quick notes on access, scope, and typical use cases.

You may use datasets NOT on this list. Propose a dataset that fits your question and note access details.
Expected workflow:
Start with summary statistics ➜ do EDA (Exploratory Data Analysis: quick plots, patterns, missingness) ➜ choose an appropriate visualization for your story.
Keep it reproducible: one notebook, clear cleaning steps, and labeled figures.
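As a starting point for the summary-statistics step, here is a minimal Python sketch that uses only the standard library. The `summarize` helper and the toy rows are hypothetical, made up for illustration. With pandas, `df.describe()` and `df.isna().sum()` cover the same ground on a real DataFrame.

```python
from statistics import mean, median, stdev


def summarize(rows, column):
    """Return count, missing count, mean, median, and std. dev. for one column.

    `rows` is a list of dicts (one dict per observation). A value of None,
    or a missing key, counts as a missing observation.
    """
    values = [r.get(column) for r in rows]
    observed = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(observed),
        "mean": mean(observed),
        "median": median(observed),
        # Sample standard deviation needs at least two observed values.
        "sd": stdev(observed) if len(observed) > 1 else float("nan"),
    }


# Hypothetical toy rows, for illustration only (not from any real dataset).
rows = [
    {"state": "GA", "wage": 15.0},
    {"state": "GA", "wage": 18.0},
    {"state": "NJ", "wage": None},
    {"state": "NJ", "wage": 21.0},
]
print(summarize(rows, "wage"))
```

Reporting the missing count next to the mean keeps the cleaning decision visible: here one of four wages is missing, and the statistics describe only the three observed values.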

Later in the project, you'll conduct an econometric analysis (e.g., a simple regression) to formalize a relationship you explored in EDA.
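For a simple regression with one regressor, the OLS estimates have a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The sketch below computes exactly that. The `ols_simple` helper and the toy data are hypothetical. For the actual project you will also want standard errors, which a package such as statsmodels (`statsmodels.formula.api.ols`) reports.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is not identified")
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data: y is exactly 1 + 2x, so OLS recovers a = 1, b = 2.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(ols_simple(x, y))  # (1.0, 2.0)
```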
πŸ—“οΈ Proposal due: October 8
Submit a one-page proposal summarizing: Your topic must be approved by the instructor before you begin analysis.

🌍 Public & Free (Good for everyone)

IPUMS

Microdata on people and households across time and countries, with rich detail on demographics, labor, education, and housing.

Access: free; registration required.

World Bank Data

Hundreds of economic & development indicators (GDP, inflation, enrollment) by country and year.

FRED (Federal Reserve Economic Data)

US macro time series (CPI, unemployment, rates) with strong metadata and consistent updates.
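One way to pull a FRED series without an account is the CSV download link behind each series page. The sketch below assumes the `fredgraph.csv?id=SERIES_ID` URL pattern and that FRED writes missing observations as "."; both are worth double-checking against a file you actually download. The helper names are hypothetical, and the parser is shown on a made-up two-line sample, not real data. FRED also offers an official API, which requires a free API key.

```python
import csv
import io


def fred_csv_url(series_id):
    """Build a FRED CSV download URL for one series (assumed URL pattern)."""
    return f"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"


def parse_fred_csv(text):
    """Parse FRED-style CSV text into a list of (date, value) pairs.

    Assumes the first column is the date and the second is the value.
    Missing observations (assumed to be written as ".") become None.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row; its column names have varied over time
    out = []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        date, raw = row[0], row[1].strip()
        out.append((date, None if raw in ("", ".") else float(raw)))
    return out


# Made-up sample with a hypothetical series name, for illustration only.
sample = "DATE,EXAMPLE\n2020-01-01,1.5\n2020-02-01,.\n"
print(parse_fred_csv(sample))
```

To fetch a real series, you could pass `urllib.request.urlopen(fred_csv_url("UNRATE")).read().decode("utf-8")` to `parse_fred_csv`, then save the raw file in your project folder so the notebook stays reproducible.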

Open Sports Data

Event- and season-level data for soccer and baseball. Great for cross-sectional comparisons and for tracking performance over time.

🏫 Education / Labor

IPEDS (US Higher Ed)

Institution-level data: enrollment, completion, costs, faculty, finances. Useful for campus-level questions.

BLS (Labor Statistics)

Employment, wages, CPI, and productivity, with national, state, and metro-area breakdowns.

💼 Finance / Corporate (via WRDS)

Access at Emory: Use the Emory Libraries website to log in to WRDS and request access if needed. Then use WRDS to retrieve the datasets below.
WRDS also hosts data from many other vendors; browse its vendor list if the datasets below don't fit your question.

CRSP – Center for Research in Security Prices

Access: WRDS via Emory Libraries

US equity prices, returns, and events. Ideal for time-series & panel analyses at the firm or market level.

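Returns compound rather than add, which matters for any analysis built on CRSP. A minimal sketch, assuming you have a list of simple period returns for one stock (for example, CRSP's `RET` variable); the helper name and the numbers are hypothetical. Check how missing returns are coded in your extract before relying on the skip-`None` rule used here.

```python
def cumulative_return(returns):
    """Compound simple period returns: (1 + r1)(1 + r2)...(1 + rT) - 1.

    Missing returns (None) are skipped for simplicity; decide deliberately
    how to treat them in your own project.
    """
    growth = 1.0
    for r in returns:
        if r is None:
            continue
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical example: +10% then -10% sums to 0, but compounds to a loss.
daily = [0.10, -0.10]
print(round(cumulative_return(daily), 4))  # about -0.01, not 0
```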

Compustat

Access: WRDS via Emory Libraries

Firm-level fundamentals (income statement, balance sheet), available as licensed panel data. Great for corporate finance & accounting questions.
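Compustat's annual data form a firm-year panel, so growth rates should be computed within each firm and only across consecutive fiscal years. A minimal sketch, assuming rows keyed by `gvkey` (firm identifier), `fyear` (fiscal year), and `sale` (net sales), which are the usual Compustat Fundamentals Annual names but worth confirming in your own extract. The helper and the toy panel are hypothetical. Drop duplicate firm-years before running something like this, since the dictionary silently keeps only the last one.

```python
def sales_growth(panel):
    """Year-over-year sales growth by firm in a firm-year panel.

    `panel` is a list of dicts with keys gvkey, fyear, and sale. Growth is
    computed only when the same firm's previous fiscal year is present, so
    gaps in the panel are never bridged.
    """
    by_key = {(r["gvkey"], r["fyear"]): r["sale"] for r in panel}
    out = {}
    for (firm, year), sale in by_key.items():
        prev = by_key.get((firm, year - 1))
        # Skip missing, zero, or absent prior-year sales.
        if prev is not None and prev != 0 and sale is not None:
            out[(firm, year)] = sale / prev - 1.0
    return out


# Hypothetical toy panel: firm "A" has no 2021 row, so 2022 growth is skipped.
panel = [
    {"gvkey": "A", "fyear": 2019, "sale": 100.0},
    {"gvkey": "A", "fyear": 2020, "sale": 110.0},
    {"gvkey": "A", "fyear": 2022, "sale": 121.0},
    {"gvkey": "B", "fyear": 2020, "sale": 50.0},
    {"gvkey": "B", "fyear": 2021, "sale": 40.0},
]
print(sales_growth(panel))
```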

Revelio Labs

Access: WRDS via Emory Libraries

Workforce and human-resources data: professional profiles, job postings, employee sentiment reviews, and more.

⚠️ Ethics & Terms of Use: Prefer official downloads and APIs. Avoid scraping sites that prohibit it. Anonymize sensitive info, and document your data cleaning decisions.
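For the anonymization step, one common approach is to replace direct identifiers (names, emails, employee IDs) with keyed hashes: the same person always maps to the same code, but the code cannot be reversed without the secret key. This is pseudonymization, not full anonymization. Combinations of other fields, such as age and ZIP code, can still identify people, so drop or coarsen those too. The sketch uses Python's standard `hmac`, `hashlib`, and `secrets` modules; the helper name and the example IDs are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(key: bytes):
    """Return a function that maps an identifier to a stable pseudonym."""
    def pseudonym(identifier: str) -> str:
        # HMAC-SHA256 keyed with a secret: stable per ID, not reversible
        # without the key. Truncated to 16 hex characters for readability.
        digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]
    return pseudonym


# Generate the key once and keep it private; never commit it to your repo.
key = secrets.token_bytes(32)
pseudo = make_pseudonymizer(key)

# Hypothetical records, for illustration only.
records = [{"employee_id": "E123", "rating": 4}, {"employee_id": "E456", "rating": 2}]
for r in records:
    r["employee_id"] = pseudo(r["employee_id"])
print(records)
```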

📚 Great Applied Econ Papers (Optional Inspiration)

Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania

American Economic Review, 84(4), 772–793.

Links: Open (author PDF) · NBER WP

Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings?

Quarterly Journal of Economics, 106(4), 979–1014.

Links: Publisher · NBER WP (PDF)

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination

American Economic Review, 94(4), 991–1013.

Links: Publisher · NBER WP (PDF)

Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment

American Economic Review, 91(4), 795–813.

Links: Publisher · Author PDF

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood

American Economic Review, 104(9), 2633–2679.

Links: Publisher · Working Paper (PDF)

Jensen, R. (2010). The (Perceived) Returns to Education and the Demand for Schooling

Quarterly Journal of Economics, 125(2), 515–548.

Links: Publisher · Open (PDF)

🧭 Tips for a Strong Project

Mini-rubric (what we’ll grade for)

  1. Clarity: Your question clearly states who (population or unit), where (geography or setting), when (time period), what outcome you will measure, and what main factor/exposure you will study.
  2. Motivation: Explain why the question matters. (Important! Show the reader why they should care!)
  3. Feasibility: Data and measures are concrete, and your code runs.
  4. Design alignment: Your model or econometric tool should match the question.
  5. Integrity: Be honest about what the data can and cannot show. Point out possible limitations or challenges in your results.
  6. Communication: Clean figures and tables with clear labels, units, and readable captions.
Philosophy: “Know what you’re doing – do what you know.”