Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy

Yang, Kaiyu; Qinami, Klint; Li, Fei-Fei; Deng, Jia; Russakovsky, Olga

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1df9h

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yang, Kaiyu	-
dc.contributor.author	Qinami, Klint	-
dc.contributor.author	Li, Fei-Fei	-
dc.contributor.author	Deng, Jia	-
dc.contributor.author	Russakovsky, Olga	-
dc.date.accessioned	2021-10-08T19:45:52Z	-
dc.date.available	2021-10-08T19:45:52Z	-
dc.date.issued	2020-01	en_US
dc.identifier.citation	Yang, Kaiyu, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. "Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy." Proceedings of the Conference on Fairness, Accountability, and Transparency (2020): pp. 547-558. doi:10.1145/3351095.3375709	en_US
dc.identifier.uri	https://arxiv.org/pdf/1912.07726.pdf	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1df9h	-
dc.description.abstract	Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.	en_US
dc.format.extent	547 - 558	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	Proceedings of the Conference on Fairness, Accountability, and Transparency	en_US
dc.rights	Author's manuscript	en_US
dc.title	Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy	en_US
dc.type	Conference Article	en_US
dc.identifier.doi	10.1145/3351095.3375709	-
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US

Files in This Item:

File	Description	Size	Format
FairerDatasetsImageNet.pdf		4.43 MB	Adobe PDF
