Data ethics refers to the moral principles and behaviors surrounding the collection, use, management, and distribution of data.
Whether you are working with data as part of your research, building your own data tools, or studying data ethics itself, this guide will direct you to useful resources to support your work.
For a short introduction to major issues in data ethics, try this video from Google's Data Analytics course:
Another good place to start is "Responsible Data Science" from Laura Igual and Santi Seguí's Introduction to Data Science (2024).
Data: a collection of facts which may be recorded in text, graphics, sound, or other media. A single fact is called a datum.
Accountability: a structure of consequences in cases of unethical usage, collection, and/or distribution of data. This may involve legal consequences, loss of employment, or other punishments imposed by a governing body or even within an organization.
Anonymity: the absence of personally identifying characteristics in data. In research involving human subjects, data anonymization is often required before the research data may be published.
Bias: human prejudice that may distort the accuracy of data collection and usage. Biased algorithms, unfair data procedures, and slanted data analyses and visualizations may result in increased discrimination and inequity.
Consent: permission given to others to collect, use, or distribute data about oneself. Informed consent requires that the data subject understand how their personal data may be collected, used, and distributed before authorizing access to it.
Ownership: the ability to permit or deny usage of data. Ownership only exists when regulations and accountability are in place to enforce it.
Privacy: protection of personal data against unpermitted access or exposure. Subjects of personal data collection should be informed about the uses of their data and the security measures put in place to protect it. Some data privacy is protected by law.
Security: the measures put in place to ensure that data are not accessed, disclosed, altered, or destroyed in unauthorized ways. Data security may include encryption, protocols, and regular audits. Data managers should be trained in these security methods.
Transparency: clear and open communication about the ways data are collected, used, managed, and distributed.
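To make the Anonymity and Security entries above more concrete, here is a minimal Python sketch of one common de-identification step: dropping direct identifiers and replacing a participant ID with a keyed hash. The field names (`participant_id`, `name`, `email`), the sample records, and the `deidentify` helper are all hypothetical and chosen only for illustration. Note that a keyed hash is pseudonymization rather than full anonymization; remaining fields such as age may still allow re-identification, so treat this as a starting point, not a complete procedure.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(records, secret_key, id_field="participant_id"):
    """Return new records without direct identifiers and with the ID pseudonymized."""
    cleaned = []
    for record in records:
        # Copy every field except the direct identifiers; the input is not modified.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        # Replace the ID with a keyed hash (HMAC-SHA256) so it cannot be
        # recomputed by someone who does not hold the secret key.
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        row[id_field] = digest[:16]  # shortened for readability
        cleaned.append(row)
    return cleaned


# The key should be stored securely and separately from the published data.
secret_key = secrets.token_bytes(32)

# Made-up sample records, for illustration only.
records = [
    {"participant_id": 101, "name": "Ada Example", "email": "ada@example.org", "age": 34},
    {"participant_id": 102, "name": "Ben Example", "email": "ben@example.org", "age": 29},
]

cleaned = deidentify(records, secret_key)
for row in cleaned:
    print(row)
```

Because the same key always produces the same pseudonym, records belonging to one participant can still be linked across files, while the original ID stays hidden from anyone without the key.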