Algorithm-focused technologies disproportionately make life easier for wealthy people. The people who build machine learning algorithms typically work at companies where they profit from deployed machine learning models for years. Training data-focused technologies, by contrast, disproportionately squeeze value out of less wealthy people, who are paid only once and don’t share in the success of the models trained on their data.
This talk will focus on methods that ensure training data for machine learning is created fairly, whether the annotators are in-house employees, contractors, or crowdsourced workers online. It will show that, contrary to the widely held belief that training data creation is a race to the bottom on price, it is possible to maximize quality and fairness at the same time for almost any machine learning task.