Data Ethics, Bias, and Fairness

Topics of data ethics, privacy, and fairness should be at the beginning of a data science textbook, not at the end. Indeed, future drafts should integrate ethical challenges earlier in the textbook. Data ethics is the last topic in this version for two reasons. First, an informed discussion of data ethics requires first building some intuition about the data science lifecycle and what can go wrong in it, including the different sources of bias; learning how to evaluate models in general is a precondition for evaluating them for bias and fairness. Second, moving beyond the issues themselves to possible mitigation strategies requires some exposure to approaches covered earlier in the course, for example causal inference. Finally, although our focused discussion of data ethics comes last, we discuss data design and biases (e.g., algorithmic confounding), transparency and reproducibility, and other topics of ethical data science throughout the textbook.

Key themes

Ethical challenges, principles, and frameworks.
Privacy and consent.
Detecting and dealing with bias and fairness in data science models.
Implications of data science biases: discrimination, profiling, social inequalities.

Learning resources

Jeremy Howard, Sylvain Gugger, and Rachel Thomas. Chapter 3: Data Ethics. In Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. Online version freely available.
Associated video lecture Getting Specific About Algorithmic Bias by Rachel Thomas. PyBay 2019. Slides are available here.

Matthew Salganik. Chapter 6: Ethics. In Bit by Bit: Social Research in the Digital Age. Online version freely available.
Associated video lectures Ethics and Computational Social Science by Matthew Salganik, Part 1 and Part 2.

Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane. Big Data and Social Science (2nd edition). Chapter 11: Bias and Fairness. Online version freely available.

Sendhil Mullainathan. 2019. Biased Algorithms Are Easier to Fix Than Biased People. New York Times.

Kelleher and Tierney. Chapter 6: Privacy and Ethics. In Data Science. MIT Press.

Mason A. Porter. Data Ethics. Video lecture and slides available here.

Pedro Saleiro, Kit T. Rodolfa, Rayid Ghani. Dealing with Bias and Fairness in Data Science Systems: A Practical Hands-on Tutorial.
Corresponding video tutorial from KDD 2020.
