Sensitivity strikes again: RoboCop Kaggle competition edition

It’s a sad day for those who actually wanted to learn about ML’s implications for society

28 Mar 2016

RoboCop 2014 movie poster: A robot police officer holds a gun and stands in front of his motorcycle

I used to think that complaints about excessive sensitivity described at other colleges and lamented in New York Times op-eds were exaggerations. I read the articles and basically just lumped them together with all the other complaints about millennials AKA “kids these days.”

This week, I experienced firsthand how harmful these anti-disagreement attitudes can be.

It all started when Prof. Satyen Kale assigned us a machine learning project: to train classifiers on NYPD’s open stop-and-frisk datasets.

Predictably, because this is Columbia, there’s controversy about it. ColorCode, a recently formed student group for increasing diversity in tech, accused Prof. Kale of practicing a “racist, ahistorical, and irresponsible pedagogy” by not explicitly mentioning the racism behind stop-and-frisk. They demanded “that this Machine Learning assignment be revoked, and that the professor issue an apology addressing the concerns.”

Prof. Kale is a nice guy. I told a friend that I expected him to pull the competition because he likes to keep everyone happy — and earlier tonight, he did exactly that:

Two original motivations for using this data set were (i) to illustrate the difficulties in developing any kind of “predictive policing” tool (which already exist today), and (ii) to assess how predictive modeling could help shed light on this past decision-making.

We originally thought that these challenging aspects of the data set would be of interest to the class. However, our formulation of the task was in poor taste and failed to provide adequate context.


This whole issue appears to be a drive-by misunderstanding, exacerbated by a new student group’s desire to make itself known. Regardless of whether you agree with the legality or ethics of stop-and-frisk policing (and I don’t), there is nothing wrong with asking students to work on a dataset with public policy relevance — especially at an NYC-based school that purports to give its engineering students a well-rounded liberal arts education.

When I heard about the competition in class, my first reaction was that we’d probably find the greatest reduction in uncertainty comes from looking at race, neighborhood, and then gender. And that discovery would spark a highly memorable discussion about the NYPD’s policing tactics.
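“Reduction in uncertainty” has a precise formulation: mutual information between a feature and the outcome. Here’s a minimal, self-contained sketch on fabricated toy records (the feature names and data are invented for illustration, not drawn from the NYPD dataset) showing how one feature can carry far more information about the outcome than another:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two
    discrete sequences: the reduction in uncertainty about Y
    gained by observing X."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * log2( p_joint / (p(x) * p(y)) )
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Fabricated toy records (group, neighborhood, stopped?) --
# purely illustrative, NOT the real data.
records = [
    ("A", "north", 1), ("A", "north", 1), ("A", "south", 1),
    ("B", "north", 0), ("B", "south", 0), ("B", "south", 0),
    ("A", "south", 1), ("B", "north", 0),
]
group = [r[0] for r in records]
hood = [r[1] for r in records]
outcome = [r[2] for r in records]

print(mutual_information(group, outcome))  # 1.0 bit: fully predictive here
print(mutual_information(hood, outcome))   # 0.0 bits: uninformative here
```

If the real dataset behaved anything like this toy one, the "discovery" would fall out of a few lines of analysis, which is exactly why the discussion would have been worth having.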

So why’s this important in a computer science course? It’s because someday, somebody in this course will train a classifier that has huge real-world consequences. Maybe it decides which young couples can get mortgages (because we send tons of students to Wall Street) or whether a job applicant will be hired. They need to understand the risks: how careless application of machine learning can amplify privilege, perpetuate stereotypes, and reinforce the status quo.
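The amplification mechanism is easy to demonstrate. A deliberately naive sketch (the data and function names are invented for illustration): a classifier trained on biased historical decisions does not correct the disparity, it encodes it as policy.

```python
from collections import Counter, defaultdict

def fit_majority_by_group(records):
    """Naive classifier: for each group, predict the majority
    historical label. Training on biased decisions replays the bias."""
    by_group = defaultdict(Counter)
    for grp, label in records:
        by_group[grp][label] += 1
    return {g: counts.most_common(1)[0][0] for g, counts in by_group.items()}

# Fabricated 'historical' hiring outcomes (group, hired?) in which
# group B was hired far less often -- illustrative only.
history = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 2 + [("B", 0)] * 8
model = fit_majority_by_group(history)
print(model)  # {'A': 1, 'B': 0}: the past disparity becomes the rule
```

Real models are subtler, but the failure mode is the same: a classifier optimized to reproduce past decisions will faithfully reproduce past discrimination, even through proxy features when the protected attribute is removed.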

Revoking the assignment made some people feel better. But at what cost? The students in our class have lost a valuable opportunity to gain firsthand experience with the social implications of machine learning. That’s very troubling in a course designed for practitioners.


Arguing with people on stuff like this is tiring, and I’m probably going to lose a few friends the moment I hit publish. But it is the right thing to do. If students are reluctant to express unpopular opinions every time a disagreement like this occurs, we’ll end up in a society where programmers are infantilized when it comes to controversial political debates, blind to the social implications of their code. That’s not a world I want to live in.