I’ve been intrigued by the explosion of Artificial Intelligence products on the market, especially in healthcare. First, I have to say that the technology is impressive. The ability to analyse an image and spot abnormalities suggesting potential cancers can only be a good thing. Even deploying speech recognition into a consultation or online meeting system seems to have scope for improving digital dictation, and in some settings it may help make clinicians more efficient.
I suppose the area I am most apprehensive about is where this technology could be used to change or, potentially, summarise information. My thoughts on this do not just apply to applications in healthcare; they are probably more philosophical, about the nature of intelligence itself and the assumptions people make about what a computer may tell them.
In general, people are quite simple beings. We are not very good at being critical of all the information we receive. If something seems plausible, we accept it as the working truth until proved otherwise. We have had to evolve this way to survive, but we aren’t always aware of the risks in a modern world where we are bombarded with so much information every moment. It’s like the Little Britain ‘Computer says no!’ phenomenon. Faced with a suggestion from a trusted device, we will simply accept it; alternatively, we get pop-up fatigue, where too many warnings leave us so used to them that we ignore them, clicking through almost automatically. Another evolutionary trait is to assume that others are like us. We see that in the numerous security failures where, in retrospect, people’s behaviour in planning a terrorist attack could have been seen as suspect, but went unnoticed because the security services or those around them did not ‘see the signs’. Con artists have long taken advantage of this trait, whether by speaking confidently about a potential investment or by using a fake email or phone call to obtain your banking details. Victims often feel it is their fault, but in fact it is part of being human to accept what we are told unless we are really alert to the signs. This same evolutionary trait is also a risk when it comes to generative AI.
Generative AI and the Large Language Models (LLMs) that power them have taken the world by storm in the last few years. Most people have played with OpenAI’s ChatGPT, Microsoft Copilot or Google Gemini. They work by using powerful analytic tools on very large amounts of training data to work out, in effect, the probability of words belonging together. Mathematically, each word is mapped into a multidimensional vector space, with each dimension representing some type of context. In simple terms, we could think of the word ‘dog’ and assume it is stored at a certain position, represented by a series of numbers a bit like a point on a simple (x,y) graph. The computers behind LLMs can plot a word using several hundred dimensions (x, y, z, a, b, c, d, e, f…). We can regard each of these as a likelihood of something; for example, ‘f’ could be the likelihood of the word referring to a pet. If we looked at the dimensions for dog and cat, we might find that f, g and h are much the same. If we ask an LLM to ‘give me examples of pets’, it will look at the words we typed (the ‘prompt’) and then look for words in a similar region of the space. It could draw on words where f, g and h are the same as for ‘pet’, but where the other several hundred dimensions vary slightly. All of this is based on the association of words in the training text. It does not represent any understanding of what a dog, cat or pet is. That is obvious to us, but what if we were looking for people’s ‘pet hates’? True, this could be encoded in the other dimensions, but the model can only draw on word associations that were in the training text, not new ideas. Our understanding may also be context-based. Take the example of someone asking ‘Where is the bank?’. The meaning is different if the person is holding a bank card compared with sitting in a boat.
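To make the idea of words sitting ‘close together’ in a vector space a little more concrete, here is a minimal Python sketch. The words, the four dimensions and every number in it are invented purely for illustration; real models learn hundreds or thousands of dimensions from training text, and none of this amounts to knowing what a dog actually is.

```python
import math

# Hypothetical 4-dimensional "embeddings" with invented values.
# Real LLMs learn hundreds or thousands of dimensions from training text.
embeddings = {
    "dog":  [0.9, 0.8, 0.1, 0.3],   # imagine one dimension loosely tracking "is a pet"
    "cat":  [0.9, 0.7, 0.2, 0.2],
    "bank": [0.0, 0.1, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """How closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))   # high: the words appear in similar contexts
print(cosine_similarity(embeddings["dog"], embeddings["bank"]))  # low: they rarely share contexts
```

A prompt about pets pulls in whatever happens to sit nearby in this space, which is why the association is a statistical echo of the training text rather than an understanding of the concept.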
The point is that, using vast amounts of data, LLMs can provide very plausible responses, which carries two risks, especially in medical applications. First, the plausible response can give the impression of being like us, a person. We anthropomorphise the response and treat it, and trust it, as we would a human. Second, the response can only be based on the training text, so it will be biased by that text and can only reproduce words based on that training, not on any actual meaning. The danger is that people accept information created in this way at face value and assume it is accurate when it may not be. One of the well-known drawbacks of LLMs is the hallucination: output that appears plausible but is not true. Many treat hallucinations as something to fix, but as must be clear from the way the technology works, they are inevitable in a probability-based text generator. LLM chatbots have also been known to produce racist content, or plausible-sounding drug dose advice that is dangerously wrong. This has been dealt with by building filtering technology around the output, but it is impossible to prepare for every possible output.

This problem is a key issue in all AI-based technologies: however much training data is used, you cannot prepare for every edge case. A key area where this has held a technology back from developing at the pace people expected is driverless cars. With enough training data, a driverless system may learn to stop if a child runs out into the middle of the road or if there is a car in the way, but the system does not understand what a child or a tractor is. This means it cannot extrapolate the way you would. What happens if it is a child dressed as a zebra, on a pogo stick, on the way to a party? Or a three-wheeled car on its side with a promotional beer-can-shaped top? I know these may sound far-fetched, but the point is that you would just deal with it; an AI system may not even recognise there is an issue.

As a clinician, I find the diagnostic process intriguing. As much as 80–90% of the information comes from the history. The rapport, the words, the emotion, the context and even the sub-context are so important. I think many clinicians will have taken a history where, looking purely at the words, they would find no problem, but something about the way a word was said, the body posture, or maybe even a micro-expression suggested a diagnosis. With enough training data, maybe AI could start to pick out these patterns, but I suspect it is not so much what is said or how it is said, but the co-produced meaning behind the interaction that gives the answer. You could argue that this is subjective and that practising medicine should be more objective, but you could also argue that what we call subjective here is not only the important human element, but that a presentation is itself completely subjective. Two people with a similar diagnosis may present in quite different ways, and the diagnosis may rest on a subjective understanding of each individual’s character.
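Returning to the earlier point that an LLM is, at heart, a probability-based text generator, here is a toy, deliberately over-simplified sketch of why fluent-but-wrong output is always a possible outcome. The prompt, the candidate words and the probabilities are all invented for illustration; the structural point is that nothing in the generation step checks the result against reality.

```python
import random

# Invented vocabulary and probabilities, purely for illustration.
# A real LLM does the same thing at enormous scale: it samples each next
# word from a learned probability distribution, with no step that checks
# whether the finished sentence is actually true or safe.
next_word_probs = {
    "the usual dose is": {"500mg": 0.6, "5mg": 0.3, "50g": 0.1},
}

def generate(prompt):
    choices = next_word_probs[prompt]
    words = list(choices)
    weights = list(choices.values())
    # Weighted sampling produces a plausible-looking continuation,
    # but nothing here represents what a dose actually means.
    return prompt + " " + random.choices(words, weights=weights)[0]

print(generate("the usual dose is"))
```

Most of the time the sampler picks the common, sensible-looking answer; occasionally it picks the rare, dangerously wrong one, and both look equally fluent on the page.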
Whilst it is important to recognise the limitations of current technology, especially in a world with so much hype, a bigger question may be: how can we make sure non-technical people have enough knowledge to appraise a technology well enough to assess its true capability and safety in potential applications such as clinical settings? An interesting consideration is that a technology that is only 60% effective may be very useful if our current effectiveness is only 40%. Human systems make lots of mistakes and errors of judgement, so an imperfect technology that is less imperfect than us could be a game changer, as long as we are comfortable with the risks.
Finally, the hype demonstrates a desire for technology that has a degree of trustworthy intelligence. We can recognise that current technology merely mimics humans in a plausible way, but how would a truly intelligent machine differ? This is an area to explore further in another blog, but I would suggest the starting point is a theory of intelligence, or at least being clear about key features, such as demonstrating an understanding of meaning.