Saturday, 20 February 2021

Doing statistics by computer

Prompted by a piece in the FT (reference 1) about a diversity problem at Google, an HR rather than a technical problem, I looked up Timnit Gebru on Wikipedia, from where I got to the open access paper (plus supporting information) at reference 2.

The paper is all about estimating demographic variables, including income, education, race and voting, from car ownership data.

The car ownership data is obtained by deep analysis of 50 million Google Street View images taken from across the US. Demographic variables are taken from the American Community Survey (ACS), for which see reference 3. You train the computer on a sample of areas using both car and demographic data, and then use the trained computer to predict the demographic variables from the car data for the other areas.
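The train-then-predict step can be sketched in a few lines of Python. This is only a toy illustration and not the method used in the paper: it assumes a single car-derived predictor (average car price per area) and a single demographic variable (median household income), fitted by ordinary least squares. The function names and all the numbers are made up for the purpose.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new predictor values."""
    a, b = model
    return [a + b * x for x in xs]


# Training areas: both car data and survey data are available.
# Hypothetical values: average car price and median income, in dollars.
train_price = [12000.0, 18000.0, 25000.0, 31000.0]
train_income = [38000.0, 50000.0, 64000.0, 76000.0]

# Other areas: only the car data is available.
other_price = [15000.0, 28000.0]

model = fit_line(train_price, train_income)
estimates = predict(model, other_price)
print(estimates)
```

The real thing presumably uses many car features per area and something more elaborate than a straight line, but the shape of the exercise is the same: fit on the areas where you have both sorts of data, then apply the fit where you only have the cars.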

Note that the numbers involved mean that efficiency is an issue, even for Google.

Along the way, they whittle the 15,000 or so different types of car on the road in the US down to around 2,500. At that level of detail, the computer gets the make right from the image about a third of the time. To help the computer along with the demography that follows, they add in some car meta-data, not least price.
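One way to picture how classified cars plus meta-data become something the demography can use is to roll the detections up by area. The sketch below is an assumption about the general shape of such a step, not a description of the paper's actual pipeline; the car types, prices, area names and the function `area_features` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical meta-data table: car type -> list price in dollars.
CAR_PRICE = {"sedan_a": 20000, "pickup_b": 35000, "hatch_c": 15000}

# Hypothetical classifier output: one (area, predicted car type) pair per car seen.
detections = [
    ("area_1", "sedan_a"),
    ("area_1", "hatch_c"),
    ("area_2", "pickup_b"),
    ("area_2", "pickup_b"),
    ("area_2", "sedan_a"),
]


def area_features(detections, car_price):
    """Summarise detections per area: number of cars and mean list price."""
    prices = defaultdict(list)
    for area, car_type in detections:
        prices[area].append(car_price[car_type])
    return {
        area: {"n_cars": len(p), "mean_price": sum(p) / len(p)}
        for area, p in prices.items()
    }


features = area_features(detections, CAR_PRICE)
print(features)
```

Per-area summaries of this sort are the kind of thing that could then be fed into the prediction of the demographic variables.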

They also deploy a program which can ‘unwarp’ the Street View images, which I think amounts to flattening the spherical projection used by the wide-angle Street View cameras, thus removing one source of classification confusion.

It turns out that the computer does quite a good job on some of the demographic variables, for example voting, and can produce more timely predictions for smaller areas than the ACS can manage. So deep mining of the Street View archive is perhaps a cost-effective supplement to the ACS.

An example of the sort of thing that Google can afford to do in the margins which an academic institution would struggle with, not least because of difficulties with access to both the Street View data and suitable programs for handling same.

Reference 1: Google fires top AI ethicist: Removal of Margaret Mitchell comes after departure of Timnit Gebru amid debate over diversity at tech group - Richard Waters/FT - 20th February 2021.

Reference 2: Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States - Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei - 2017.

Reference 3: https://www.census.gov/content/dam/Census/programs-surveys/acs/about/ACS_Information_Guide.pdf. The American Community Survey (ACS) is a large, annual household survey filling the gap left by the decennial census retreating to a short form.
