This symposium, organized by the Alan Turing Institute, was held at the IET, Savoy Place, on 23 February and attracted around 350 delegates, including what seemed to be the entire UK machine learning research community. There were seven presentations and a panel session, which I was not able to stay for.
(Note from the Editor: Two months seems to be a lifetime in the LLM world, and I thought twice about including this! However, it does highlight the very proactive role that the Alan Turing Institute is taking on behalf of the UK AI community.)
I came away with pages of notes from the symposium, but as I worked through them I decided not to report paper by paper and instead to synthesize what were, to me, some (certainly not all!) of the take-aways of the day.
- We are in the middle of a period of innovation. ChatGPT emerged from an extended period of research and development, and there are many other LLMs that we need to be aware of. What we do not yet know is how close we are to an asymptote, the point beyond which further progress levels off. There was a strong feeling that there are substantial opportunities for smaller, more focused foundation models. A couple of speakers questioned the purpose of ‘foundation models’: a foundation for what, exactly? The point was also made that the initial work on text summarization was carried out by H. P. Luhn at IBM in 1955!
- Reality is certainly creeping in. We are perhaps descending the hype curve into the trough of disillusionment, and that trough might well be driven by computation costs, both financial ($600k a day was mentioned for OpenAI) and environmental. Smaller models might reduce the computation cost of each individual model, but collectively they could have an even greater impact on our climate.
- Many of the speakers referred to some of the silliness of foundation models, including hallucinations, bias (arising from the training data), short text outputs (a computational resource issue), susceptibility to prompt injection attacks and the lack of source citations. One speaker spent 25 minutes giving examples but did not discuss the causes of these issues or how they could be addressed, which was disappointing.
- An issue that many remarked on was the desirability of foundation models knowing what they don’t know. In my opinion, it would also be very useful to know what is actually in the OpenAI training data, other than that it stops in 2021.
- There were some positive comments about the intersection of foundation models and search, where search provides the citation information. This would reassure users and would also reduce the need to continually retrain the model. This is what Microsoft are doing with Bing at present, and there was a strong hint that the OpenAI model will be introduced into Microsoft Office sooner rather than later to take advantage of, for example, its summarization and translation capabilities. Over lunch, a couple of delegates and I arrived at a common view that Google is going to have to work very hard to find a niche (because there are only niches left) against Microsoft and OpenAI.
- Evaluating foundation models is a massive challenge. It starts with the fact that there are three access models: black box (OpenAI), access through a web API, and completely open access with transparent disclosure of the underlying technical architecture. One of the reasons for the focus on evaluation is the need to understand what we mean by ‘progress’ in foundation model development.
- Professor Percy Liang talked about the HELM (Holistic Evaluation of Language Models) benchmarking initiative at Stanford University. This currently assesses 34 models against 42 scenarios, generating 57 metrics, although not all scenarios and metrics are applicable to every model. The paper on which the benchmark is based runs to 163 pages but is absolutely essential reading.
Overall, it was a very rewarding day in a comfortable lecture theatre, and the Alan Turing Institute should be congratulated on this initiative. Clearly the Institute is playing an important role in underpinning a UK programme of work on foundation models, and there was a strong hint that the UK should have its own foundation model. My only disappointment was that several speakers were in ‘research seminar’ delivery mode, seemingly with a mission to impress rather than to communicate and inspire.