Who’s the creator “JC Shakespeare”? – Terence Eden’s Weblog
Knowledge graphs are tricky beasts to create. Trying to extract semantic metadata from documents is a gargantuan task. Mix them together and you have a recipe for disaster.
While yak-shaving for my MSc, I found an interesting-looking research paper authored by one JC Shakespeare.
As you can probably tell from that snippet, there is something a bit hinkey going on here. Here’s the page that Google Scholar has scraped:
It’s pretty easy to see what has happened here. The algorithm (whether simple AI or a complex regular expression) “knows” that a typical surname, followed by a comma, followed by a typical given name is almost certainly an author’s name.
And so “JC Shakespeare” becomes the author of a delightfully diverse set of papers.
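To make the failure mode concrete, here is a minimal Python sketch of the kind of heuristic described above. It is not Google Scholar’s actual code, which isn’t public. The name lists, the regular expression and the `naive_authors` function are all hypothetical. The sketch treats any “Surname, Given Name” pair as an author and turns the given names into initials, which is enough to turn a play’s title into “JC Shakespeare”.

```python
import re

# Hypothetical lookup tables; a real system would use far larger name lists.
TYPICAL_SURNAMES = {"Shakespeare", "Smith", "Klotz"}
TYPICAL_GIVEN_NAMES = {"Julius", "Alex", "Anne"}

# "Surname, Given Name(s)": a capitalised word, a comma and a space,
# then one or more capitalised words separated by single spaces.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+), ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def naive_authors(text: str) -> list[str]:
    """Return 'initials surname' strings for every span that looks like a name."""
    authors = []
    for match in NAME_PATTERN.finditer(text):
        surname, given = match.group(1), match.group(2).split()
        # Only the first given name is checked against the list, so any
        # capitalised words after it (here, "Caesar") become extra initials.
        if surname in TYPICAL_SURNAMES and given[0] in TYPICAL_GIVEN_NAMES:
            initials = "".join(word[0] for word in given)
            authors.append(f"{initials} {surname}")
    return authors


print(naive_authors("Shakespeare, Julius Caesar. Act 3, Scene 2."))
# ['JC Shakespeare']
```

Every individual rule looks sensible, yet together they produce an error no human reader would make: a person would recognise “Julius Caesar” as the title of a play, not as the forenames of someone called Shakespeare.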
Of course, Julius Caesar isn’t the only play which gets picked up in this way:
Remember, AI is a great tool. It can be remarkably quick at drawing nearly correct conclusions from a diverse data set. When talking about AI, we usually discuss false positives and false negatives. But we also need to ask: “Is this the sort of mistake a human would make?”
As it happens, Google has been making this class of mistakes for a few years:
Google Scholar has parsed this cafeteria lunch menu as an author list, and it’s delightful pic.twitter.com/jobE6Z7bpI
— Alex Klotz (@AlexanderRKlotz) April 21, 2020