2021-12-25 Limits of AI training

Miles Davis still playing on the stereo. I’m starting to think about lunch, and having more coffee.

Recently Greta Goetz sent me a link to a post by Markus Strasser:

TL;DR: I worked on biomedical literature search, discovery and recommender web applications for many months and concluded that extracting, structuring or synthesizing “insights” from academic publications (papers) or building knowledge bases from a domain corpus of literature has negligible value in industry. Close to nothing of what makes science actually work is published as text on the web – The Business of Extracting Knowledge from Academic Publications

This matches my experience regarding fancy user interfaces:

Interactive graphs, trees, cluster visualizations, dendrograms, causal diagrams and what have you are much less satisfying than just lists, and most often lists will do. … I realized that when I sat next to a pro “reading” a paper. It goes like this: ‘Scan abstract. Read last sentence of introduction. Scan figures. Check results that discuss the figures....’ which was entirely different from my non-expert approach to reading papers.

This matches my experience regarding domain knowledge:

Sometimes being an outsider is an advantage, but in a field full of smart, creative people the majority of remaining inefficiencies are likely to come from incentives, politics and culture and not bad tooling.

I keep telling a related anecdote at work. I was working on a project where the customer wanted to automate the extension of leasing contracts. It was clumsy: print empty PDF, fill out manually, send by mail. We invited two actual car sales-managers. It turns out they weren’t interested at all because leasing contract extensions don’t pay. They wanted to sell more cars, so they wanted us to focus on that instead. I was an outsider to all of this and had no idea how it works.

I’ll have to remember the following the next time people at the office start dreaming about artificial intelligence:

I mentioned that there were major advances in relevant fields of machine learning; that current interfaces are impoverished; that an hour of searching could be collapsed into a minute in many cases. All of that is still true but for the reasons I tried to share in this essay nothing of it matters. I had to wrap my head around the fact that close to nothing of what makes science actually work is published as text on the web.

I think the same is true for many other sources of text. I remember doing some data science exploration where we wanted to help call centre agents write their replies to incoming email. We basically had two million emails that had been answered in the past, we had processes, steps, text fragments, and all we had to do was suggest some text fragments to use based on the email. Previous teams had already failed at picking the correct process, with the explanation that some of these emails had led to multiple processes, or that humans had picked the wrong process and had then changed it later, and on and on. Lots of excuses. And we ended up adding our own. There was so much data cleanup to do! Mails had to be cleaned up: we probably wanted to remove quotes. (Do we? How do we recognise them?) Some emails were in fact faxes with a PDF attachment and broken optical character recognition (OCR). Some emails were forwarded from other portals and contained lots of unrelated text. So much useless text!
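
To make the quote question concrete, here is a minimal sketch of the naive approach. It assumes plain-text emails where quoted material starts with ">" and is introduced by an English "On … wrote:" line. The function name `strip_quotes` and both patterns are hypothetical illustrations, not what the project actually used.

```python
import re

# Hypothetical heuristics: neither pattern comes from the project described above.
QUOTE_LINE = re.compile(r"^\s*>")               # lines quoted with a leading ">"
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")  # e.g. "On Monday, Alice wrote:"


def strip_quotes(body: str) -> str:
    """Drop quoted lines and attribution lines from a plain-text email body."""
    kept = []
    for line in body.splitlines():
        if QUOTE_LINE.match(line) or ATTRIBUTION.match(line):
            continue  # assumed to be quoted material, not the sender's own text
        kept.append(line)
    # Collapse the runs of blank lines left behind by removed blocks.
    text = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


if __name__ == "__main__":
    sample = (
        "Hello,\n\nplease extend my contract.\n\n"
        "On Monday, Alice wrote:\n> We received your request.\n> Regards\n\nThanks"
    )
    print(strip_quotes(sample))
```

A sketch like this only catches the tidy cases. It misses attribution lines in other languages, replies quoted without any ">" marker, forwarded portal text and OCR output, which is why the cleanup kept growing.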

​#Programming ​#Artificial Intelligence

Comments

(Please contact me if you want to remove your comment.)

Humans produce lots of junk, yes!

That is how you have to read papers to get through them - and when you have read X number on a topic, you basically go ’No, No, No, No - Yes: Interesting, maybe, No, No...’ etc.

However, a robot wrote this: http://cosmicheroes.space/blog/index.php/2021/12/24/ai-tomb-of-horrors-rooms-4-to-6/

– bluetyson 2021-12-25 23:38 UTC

---

Might be good to get the brain going!

– Alex 2021-12-26 12:44 UTC