Where are all the baseball fields in the USA? The question may seem silly, but the answer is not easy. There is no official data on it. Satellite images might show the location of the fields. However, there are thousands of them, and analysing the pictures by hand would take long, tedious hours. This is the kind of task that could be done using artificial intelligence. It is cheaper, faster and more feasible, as other journalists and I found out in a hands-on workshop at WCSJ2019.
In the workshop, organised by FUSE and presented by machine learning journalist Jeremy Merrill from Quartz, we used the Image Classification AI system from Google Console to identify the baseball fields in satellite images. The process is simple. The first step is to classify an initial sample of images as containing baseball fields or not, and then upload them to the system. This trains the AI. The software works out for itself the patterns that can be used to identify what it’s looking for.
Once it finishes processing, it shows the average precision of the model just created. The second step is to test the model by providing it with images that were not in the initial sample. Even with a high precision rate, it may be surprising how many false positives or false negatives the model presents. The system is far from perfect, but it works. Now, why is this relevant to journalism?
Instead of asking about baseball fields, we could ask many other questions. Journalists from the Texty project, for example, used the same strategy, applying AI to satellite images to detect illegal amber mining in northern Ukraine.
The whole process is not as simple as it may sound. The workshop gave us an idea of how to do it. However, some coding knowledge is necessary. “These technologies became a lot easier. In the 90’s it would require a lot of money, years of studies in a PhD level. Now we can do it in a couple of weeks,” remarked Merrill.
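For readers curious what a little of that coding might look like, here is a minimal Python sketch of the testing step described above: comparing a model's answers with hand-checked labels and counting the false positives and false negatives. It is not the code used in the workshop. The names `Confusion` and `score` and the eight example labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of the four possible outcomes for a yes/no classifier."""
    true_pos: int = 0   # model said "field", and there is one
    false_pos: int = 0  # model said "field", but there is none
    true_neg: int = 0   # model said "no field", and there is none
    false_neg: int = 0  # model said "no field", but there is one

    @property
    def precision(self) -> float:
        # Of the images the model flagged, how many really had a field?
        flagged = self.true_pos + self.false_pos
        return self.true_pos / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Of the images that really had a field, how many did the model find?
        actual = self.true_pos + self.false_neg
        return self.true_pos / actual if actual else 0.0


def score(labels, predictions) -> Confusion:
    """Compare hand-checked labels with model predictions, image by image."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    result = Confusion()
    for truth, guess in zip(labels, predictions):
        if guess and truth:
            result.true_pos += 1
        elif guess and not truth:
            result.false_pos += 1
        elif not guess and truth:
            result.false_neg += 1
        else:
            result.true_neg += 1
    return result


# Made-up test set: True means "this image contains a baseball field".
labels = [True, True, True, False, False, False, False, True]
predictions = [True, True, False, True, False, False, False, True]

result = score(labels, predictions)
print(f"false positives: {result.false_pos}, false negatives: {result.false_neg}")
print(f"precision: {result.precision:.2f}, recall: {result.recall:.2f}")
```

Precision only measures how trustworthy the model's "yes" answers are, so a model can score well on it and still miss many fields. That is why the sketch reports recall and the raw error counts alongside it.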
Concerns and room for bias
Trying out AI on my own laptop in such a short time was really exciting. It gave me a reassuring feeling that there are many opportunities to use AI in science journalism. However, as is the case with any technology, it can bring benefit or harm depending on how it is used. According to the preliminary results of a survey presented in the workshop by Charlie Beckett from the Polis project, most journalists worry that AI can reinforce the inequality between bigger, better-resourced news organisations and smaller ones. The fear of being replaced by bots that write stories by themselves was also mentioned by respondents to the survey, and by people in the workshop. This may happen one day. However, I believe there are ethical questions that deserve more of our attention.
AI, despite the name, is not intelligent at all. The outcomes from these kinds of systems depend on the data used, the size of the sample and on how the model is trained. There is a lot of room for bias. In a recent case, Amazon used AI to rate job candidates and improve the hiring process. The problem was that the sample given to the machine to represent excellence was mostly composed of men’s resumes. So the outcome was a model that taught itself that male candidates were preferable, penalising women’s resumes.
Bias seems to me to be one of the most urgent issues on which to shine a spotlight in AI. This matter needs more discussion, in particular about the potential AI has in journalism, but also when reporting about this technology’s application in many other fields, even baseball!
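One simple check a reporter could run before trusting such a model is to look at who is actually in the training sample. The Python sketch below is only an illustration under stated assumptions: the function names, the 30% threshold and the ten-record sample are all invented, and the figures are not taken from the Amazon case.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the sample for the given field."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, threshold=0.3):
    """List groups whose share falls below an (arbitrary) threshold."""
    return sorted(group for group, share in shares.items() if share < threshold)


# Invented toy sample standing in for resumes labelled as "excellent".
sample = [{"gender": "man"}] * 8 + [{"gender": "woman"}] * 2

shares = group_shares(sample, "gender")
print(shares)
print("underrepresented:", underrepresented(shares))
```

A skewed sample does not prove that the resulting model is biased, and a balanced one does not prove it is fair, but a check like this is a cheap first question to ask about any system of this kind.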
Opinions expressed in the blog posts are those of the author
and do not necessarily represent the views of WCSJ2019