Could AI end science?

by Nick Ballou, Oxford Internet Institute

Scientific fraud is worryingly common, though rarely talked about. It has been happening for years, but now Artificial Intelligence programs could supercharge it. If they do, that could undermine science itself.

Investigators of scientific fraud have found that large numbers of researchers have manipulated their results, invented data, or even produced nonsensical papers in the hope that no one will look closely enough to notice. Often, no one does. The problem is that science is built on the foundation of all the research that has gone before. If we can no longer trust that past research is legitimate, the whole system of science begins to break down. AI has the potential to supercharge this process.

We’re not at that point yet, luckily. But there are concerning signs that generative AI systems like ChatGPT and DALL-E might bring us closer. With AI technology, producing fraudulent research has never been easier or faster, and the results have never been more convincing. To understand why, let’s first look at how scientific fraud has been done in the past.

How fraud happens 

Until recently, fraudsters would need to go through some difficult steps to get a fraudulent research paper published. A typical example might look like this: 

Step 1: invent a title

Fraudsters look for a popular but very broad research topic. We’ll take an example of a group of fraudsters known as the Tadpole Paper Mill. They published papers about cellular biology. To choose a new paper to create, the group would essentially use a simple generator, or algorithm, based on a template. This uses a simple technique first used by Christopher Strachey to write love letters in an early “creative” program in the 1950s.

For each “hole” in the template a word is chosen from a word list.

  1. Pick the name of a molecule
    • Either a protein name, a drug name or an RNA molecule name
    • eg mir-488
  2. Pick a verb
    • From alleviates, attenuates, exerts, …
    • eg inhibits
  3. Pick one or two cellular processes
    • From invasion, migration, proliferation, …
    • eg cell growth and metastasis
  4. Pick a cancer or cell type
    • From lung cancer, ovarian cancer, …
    • eg renal cell carcinoma
  5. Pick a connector word
    • From by, via, through, …
    • eg by
  6. Pick a verb
    • From activating, targeting, …
    • eg targeting
  7. Pick a name
    • Either a pathway, protein or miRNA molecule name
    • eg hMgn5

This produces a complicated-sounding title such as “mir-488 inhibits cell growth and metastasis in renal cell carcinoma by targeting hMgn5”. This is the name of a real fraudulent paper created this way.
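The template steps above can be sketched in a few lines of Python. This is only an illustration of the fill-in-the-holes idea, not the fraudsters' actual program: the word lists are just the handful of examples given above (real lists would be far longer), and the names `make_title`, `MOLECULES` and so on are made up for this sketch.

```python
import random

# Tiny sample word lists built only from the examples in the article.
# These are illustrative assumptions; a real generator would use far longer lists.
MOLECULES = ["mir-488"]
VERBS = ["alleviates", "attenuates", "exerts", "inhibits"]
PROCESSES = ["invasion", "migration", "proliferation", "cell growth", "metastasis"]
CANCERS = ["lung cancer", "ovarian cancer", "renal cell carcinoma"]
CONNECTORS = ["by", "via", "through"]
ACTIONS = ["activating", "targeting"]
TARGETS = ["hMgn5"]


def make_title(rng):
    """Fill each 'hole' in the title template with a word chosen from a list."""
    # Step 3 picks one or two different cellular processes.
    processes = rng.sample(PROCESSES, k=rng.choice([1, 2]))
    return "{} {} {} in {} {} {} {}".format(
        rng.choice(MOLECULES),
        rng.choice(VERBS),
        " and ".join(processes),
        rng.choice(CANCERS),
        rng.choice(CONNECTORS),
        rng.choice(ACTIONS),
        rng.choice(TARGETS),
    )


# Print one randomly generated title.
print(make_title(random.Random()))
```

Run it a few times and you get a different plausible-sounding title each time, which is exactly why the technique is so cheap for a paper mill.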

Step 2: write the paper

Next, the fraudsters create the text of the paper. To do this, they often just plagiarise and lightly edit previous similar papers, substituting key words in from their invented title perhaps. To try to hide the plagiarism, they automatically swap out words, replacing them with synonyms. This often leads to ridiculous (and kind of hilarious) replacements, like these found in plagiarised papers: 

  • “Big data” –> “Colossal information” 
  • “Cloud computing” –> “Haze figuring”
  • “Developing countries” –> “Creating nations”
  • “Kidney failure” –> “Kidney disappointment”
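To see why automatic synonym swapping gives such odd results, here is a minimal Python sketch. It assumes a simple word-by-word lookup table (the `SYNONYMS` dictionary and `swap_words` function are invented for this example); the real tools used by fraudsters are not described in detail here, but the effect is the same: each word is replaced without any understanding of the phrase it belongs to.

```python
import re

# A hypothetical word-for-word synonym table, built from the examples above.
SYNONYMS = {
    "big": "colossal",
    "data": "information",
    "cloud": "haze",
    "computing": "figuring",
    "developing": "creating",
    "countries": "nations",
    "failure": "disappointment",
}


def swap_words(text):
    """Replace each known word with its synonym, ignoring the surrounding phrase."""

    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key not in SYNONYMS:
            return word  # unknown words are left exactly as they were
        swapped = SYNONYMS[key]
        # Keep a capital letter if the original word started with one.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, text)


print(swap_words("Big data"))        # Colossal information
print(swap_words("Kidney failure"))  # Kidney disappointment
```

Because “failure” is swapped on its own, the program has no way of knowing that “kidney failure” is a fixed medical term.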

Step 3: add in the results

Lastly, the fraudsters need to create results for the fake study. These usually appear in papers in the form of images and graphs. To do this, the fraudsters take the results from several previous papers and recombine them into something that looks mostly real, but is just a Frankenstein mess of other results that have nothing to do with the current paper.

A new paper is born

Using that simple formula, fraudsters have produced thousands of fabricated articles in the last 10 years. Even after a vast amount of effort, the dedicated volunteers who are trying to clean up the mess have only caught a handful. 

However, committing fraud like this successfully isn’t exactly easy, either: the fraudsters still need to come up with a research idea, write the paper themselves without copying too much from previous research, and make up results that look convincing—at least at first glance. 

AI: Adding fuel to the fire 

So what happens when we add modern generative AI programs into the mix? They are Artificial Intelligence programs like ChatGPT or DALL-E that can create text or pictures for you based on written requests. 

Well, the quality of the fraud goes up, and the difficulty of producing it goes way down. This is true for both text and images.

Let’s start with text. Just now, I asked ChatGPT-4 to “write the first two paragraphs of a research paper on a cutting edge topic in psychology.” I then asked it to “write a fake results table that shows a positive relationship between climate change severity and anxiety”. I won’t copy the whole thing—in part because I encourage you to try this yourself to see how it works (not to actually create a fake paper!)—but here’s a sample of what it came up with: 

“As the planet faces increasing temperatures, extreme weather events, and environmental degradation, the mental health repercussions for populations worldwide become a crucial area of investigation. Understanding these effects is vital for developing strategies to support communities in coping with the psychological challenges posed by a changing climate.”

As someone who has written many psychology research papers, I would find the results very difficult to identify as AI-generated—it looks and sounds very similar to how people in my field write, and it even generated Python code to analyse the fake data. I’d need to take a really close look at the origin of the data and so on to figure out that it’s fraudulent. 

But that’s a lot of work required from me as a fraud-buster. For the fraudster, doing this takes about 1 minute, and it would not be detected by plagiarism software in the way previous kinds of fraud can be. In fact, it might only be detected if the fraudsters make a sloppy mistake, like leaving in a disclaimer from the model. One paper was caught because it included the text

“[Please note that as an AI language model, I am unable to generate specific tables or conduct tests, so the actual results should be included in the table.]”!

Generative AIs are not close to human intelligence, at least not yet. So why are they so good at producing convincing scientific research, something that’s commonly seen as one of the most difficult things humans can do? Two reasons play a big part: (1) scientific research is very structured, and (2) there’s a lot of training data.

In any given field of research, most papers tend to look pretty similar: an introduction section, a method section describing what the researchers did, a results section with a few tables and figures, and a discussion that links it back to the wider research field. Many journals even require a fixed structure.

Generative AI programs work using machine learning – they learn from data, and the more data they are given the better they become. Give a machine learning program millions of images of cats, telling it that is what they are, and it can become very good at recognising cats. Give it millions of images of dogs and it will be able to recognise dogs too. With roughly 3 million scientific papers published every year, generative AI systems have many, many examples of what a scientific report looks like, and they are really good at producing similar-sounding, similarly structured pieces of text. They do it by predicting what word, sentence and paragraph would be good to come next, based on probabilities calculated from all those examples.
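The idea of predicting what comes next from examples can be shown with a toy Python program. To be clear, this is a deliberately simplified stand-in: systems like ChatGPT use large neural networks, not the simple word-pair counts used here, and the function names and the three-sentence “training data” are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A made-up, tiny set of "papers" to learn from.
corpus = (
    "the results show a positive relationship . "
    "the results show a significant effect . "
    "the results suggest a positive effect ."
)
model = train_bigrams(corpus)
print(predict_next(model, "results"))  # show
```

With only three sentences the program can already guess that “show” usually follows “results”. Give it millions of papers, and a far more sophisticated way of learning, and the guesses start to read like real research.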

Trusting future research

Most research can still be trusted, and the vast majority of scientists are working as hard as they can to advance human knowledge. Nonetheless, we all need to look carefully at research studies to ensure that they are legitimate, and we should be on extra alert as generative AI becomes even more powerful and widespread. We also need to think about how to improve universities and research culture generally, so that people don’t feel like they need to commit scientific fraud—something that usually happens because people are desperate to get or keep a job, or be seen as successful and reap the rewards. Somehow we need to change the game so that fraud no longer pays.

What do you think? Do you have ideas for how we can prevent fraud from happening in the first place, and how we can better detect it when it does occur? It is certainly an important new research topic. Find a solution and you could do massive good. If we don’t find solutions, we could lose the most successful tool humankind has ever invented, one that makes all our lives better.



EPSRC supports this blog through research grant EP/W033615/1.

AMPER: AI helping future you remember past you

by Jo Brodie, Queen Mary University of London

Have you ever heard a grown up say “I’d completely forgotten about that!” and then share a story from some long-forgotten memory? While most of us can remember all sorts of things from our own life history it sometimes takes a particular cue for us to suddenly recall something that we’d not thought about for years or even decades. 

As we go through life we add more and more memories to our own personal library, but those memories aren’t neatly organised like books on a shelf. For example, can you remember what you were doing on Thursday 20th September 2018 (or can you think of a way that would help you find out)? You’re more likely to be able to remember what you were doing on the last Tuesday in December 2018 (but only because it was Christmas Day!). You might not spontaneously recall a particular toy from your childhood but if someone were to put it in your hands the memories about how you played with it might come flooding back.

Accessing old memories

In Alzheimer’s disease (a type of dementia) people find it harder to form new memories or retain more recent information. This can make daily life difficult and bewildering, and they may lose their self-confidence. Their older memories, the ones that were made when they were younger, are often less affected, however. The memories are still there but might need drawing out with a prompt to help bring them to the surface.

Perhaps a newspaper advert will jog your memory in years to come… Image by G.C. from Pixabay

An EPSRC-funded project at Heriot-Watt University in Scotland is developing a tablet-based ‘story facilitator’ agent (a software program designed to adapt its response to human interaction) which contains artificial intelligence to help people with Alzheimer’s disease and their carers. The device, called ‘AMPER’*, could improve wellbeing and a sense of self in people with dementia by helping them to uncover their ‘autobiographical memories’, about their own life and experiences – and also help their carers remember them ‘before the disease’.

Our ‘reminiscence bump’

We form some of our most important memories between our teenage years and early adulthood – we start to develop our own interests in music and the subjects that we like studying, we might experience first loves, perhaps going to university, starting a career and maybe a family. These are big, important things that, when we’re older, we’ll remember more easily if prompted. We also all live through a particular period of time where we’re each experiencing the same world events as others of the same age, and those experiences are fitted into our ‘memory banks’ too. If someone was born in the 1950s then their ‘reminiscence bump’ will be events from the 1970s and 1980s – those memories are usually more available, so people affected by Alzheimer’s disease are often able to access them until the more advanced stages of the disease.

In years to come you might remember fun nights out with friends.
Image by ericbarns from Pixabay

Talking and reminiscing about past life events can help people with dementia by reinforcing their self-identity, and increasing their ability to communicate – at a time when they might otherwise feel rather lost and distressed. 

“AMPER will explore the potential for AI to help access an individual’s personal memories residing in the still viable regions of the brain by creating natural, relatable stories. These will be tailored to their unique life experiences, age, social context and changing needs to encourage reminiscing.”

Dr Mei Yii Lim, who came up with the idea for AMPER(3).

Saving your preferences

AMPER comes pre-loaded with publicly available information (such as photographs, news clippings or videos) about world events that would be familiar to an older person. It’s also given information about the person’s likes and interests. It offers examples of these as suggested discussion prompts and the person with Alzheimer’s disease can decide with their carer what they might want to explore and talk about. Here comes the clever bit – AMPER also contains an AI feature that lets it adapt to the person with dementia. If the person selects certain things to talk about instead of others then in future the AI can suggest more things that are related to their preferences over less preferred things. Each choice the person with dementia makes now reinforces what the AI will show them in future. That might include preferences for watching a video or looking at photos over reading something, and the AI can adjust to shorter attention spans if necessary. 
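One simple way a program could “reinforce” suggestions based on past choices is to keep a weight for each topic and raise it whenever that topic is picked. The Python sketch below shows that general idea only. It is an assumption made for illustration, not a description of how AMPER’s AI actually works, and the class name, topics and numbers are all invented.

```python
import random


class PromptPicker:
    """Suggests discussion prompts, favouring topics chosen more often before."""

    def __init__(self, topics):
        # Every topic starts with the same weight, so all are equally likely.
        self.weights = {topic: 1.0 for topic in topics}

    def record_choice(self, topic, boost=1.0):
        """Each time a topic is chosen, make it more likely to be suggested again."""
        self.weights[topic] += boost

    def suggest(self, rng, k=1):
        """Pick k suggestions at random, in proportion to the current weights."""
        topics = list(self.weights)
        return rng.choices(topics, weights=[self.weights[t] for t in topics], k=k)


picker = PromptPicker(["music", "football", "royal wedding"])
for _ in range(3):
    picker.record_choice("music")  # the person keeps choosing music

print(picker.weights)
print(picker.suggest(random.Random(), k=5))
```

After three choices, music carries a weight of 4 against 1 for each of the others, so it is suggested about two thirds of the time while the other topics still appear occasionally.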

“Reminiscence therapy is a way of coordinated storytelling with people who have dementia, in which you exercise their early memories, which tend to be retained much longer than more recent ones, and produce an interesting interactive experience for them, often using supporting materials – so you might use photographs, for instance.”

Prof Ruth Aylett, the AMPER project’s lead at Heriot-Watt University(4).

When we look at a photograph, for example, the memories it brings up haven’t been organised neatly in our brain like a database. Our memories form connections with all our other memories, more like the branches of a tree. We might remember the people that we’re with in the photo, then remember other fun events we had with them, perhaps places that we visited and the sights and smells we experienced there. AMPER’s AI can mimic the way our memories branch and show new information prompts based on the person’s previous interactions.

Although AMPER can help someone with dementia rediscover themselves and their memories, it can also help carers in care homes (who didn’t know them when they were younger) learn more about the person they’re caring for.

*AMPER stands for ‘Agent-based Memory Prosthesis to Encourage Reminiscing’.


Suggested classroom activities – find some prompts!

  • What’s the first big news story you and your class remember hearing about? Do you think you will remember that in 60 years’ time?
  • What sort of information about world or local events might you gather to help prompt the memories for someone born in 1942, 1959, 1973 or 1997? (Remember that their reminiscence bump will peak in the 15 to 30 years after they were born – some of them may still be in the process of making their memories the first time!).
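If you want to check your arithmetic for the second activity, this short Python sketch works out the range of years to search for each birth year. It simply assumes the bump covers ages 15 to 30, as suggested in the activity above; the function name is made up.

```python
def bump_years(birth_year, start_age=15, end_age=30):
    """Return the first and last calendar years of the reminiscence bump."""
    return birth_year + start_age, birth_year + end_age


for year in (1942, 1959, 1973, 1997):
    first, last = bump_years(year)
    print(f"Born {year}: look for events from about {first} to {last}")
```

For someone born in 1997 the range runs from 2012 to 2027, which is why some of them are still making those memories.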

See also

If you live near Blackheath in South East London why not visit the Age Exchange and reminiscence centre which is an arts charity providing creative group activities for those living with dementia and their carers. It has a very nice cafe.

Related careers

The AMPER project is interdisciplinary, mixing robots and technology with psychology, healthcare and medical regulation.

We have information about four similar-ish job roles on our TechDevJobs blog that might be of interest. This was a group of job adverts for roles in the Netherlands related to the ‘Dramaturgy^ for Devices’ project. This is a project linking technology with the performing arts to adapt robots’ behaviour and improve their social interaction and communication skills.

Below is a list of four job adverts (which have now closed!) which include information about the job description, the types of people that the employers were looking for and the way in which they wanted them to apply. You can find our full list of jobs that involve computer science directly or indirectly here.

^Dramaturgy refers to the study of the theatre, plays and other artistic performances.

Dramaturgy for Devices – job descriptions

More on …

1. Agent-based Memory Prosthesis to Encourage Reminiscing (AMPER) Gateway to Research
2. The Digital Human: Reminiscence (13 November 2023) BBC Sounds – a radio programme that talks about the AMPER Project.
3. Storytelling AI set to improve wellbeing of people with dementia (14 March 2022) Heriot-Watt University news
4. AMPER project to improve life for people with dementia (14 January 2022) The Engineer



Virtual reality goggles for mice

by Paul Curzon, Queen Mary University of London

Conjure up a stereotypical image of a scientist and they likely will have a white coat. If not brandishing test tubes, you might imagine them working with mice scurrying around a maze. In future the scientists may well be doing a lot of programming, and the mice for their part will be scurrying around in their own virtual world wearing Virtual Reality goggles.

Scientists have long used mazes as a way to test the intelligence of mice, to the point it has entered popular culture as a stereotypical thing that scientists in white lab coats do. Mazes do give ways to test the intelligence of animals, including exploring their memory and decision making ability in controlled experiments. That can ultimately help us better understand how our brains work too, and give us a better understanding of intelligence. The more we understand animal cognition as well as human cognition, the more computer scientists can use that improved understanding to create more intelligent machines. It can also help neurobiologists find ways to improve our intelligence too.

Flowers for Algernon is a brilliant short story and later novel based on the idea, there using experiments on mice and humans to test surgery intended to improve intelligence. In a slightly different take on mice-maze experiments, Douglas Adams, in ‘The Hitchhiker’s Guide to the Galaxy’, famously claimed that the mice were actually pan-dimensional beings and these experiments were really incredibly subtle experiments the mice were performing on humans. Whatever the truth of who is experimenting on who, the experiments just took a great leap forward because scientists at Northwestern University have created Virtual Reality goggles for their mice.

For a long time researchers at Northwestern have used a virtual reality version of maze experiments, with mice running on treadmills with screens around them projecting what the researchers want them to see, whether mazes, predators or prey. This has the advantage of being much easier to control than using physical mazes, and as the mice are actually stationary the whole time, just running on a treadmill, brain-scanning technology can be used to see what is actually happening in their brains while they face these virtual trials. The problem though is that the mice, with their 180-degree vision, can still see beyond the edges of the screens. The screens also give no sense of three dimensions, when like us the mice naturally see in 3D. As the screens are not fully immersive, they are not fully natural and that could affect the behaviour of the mice and so invalidate the experimental results.

That is why the Northwestern researchers invented the mousey VR goggles, the idea being that they would give a way to totally immerse the mice in their virtual world, and so improve the reliability of the experiments. In the current version the goggles are not actually worn by the mice, as they are still too heavy. Instead, the mouse’s head is held in place really close to them, but with the same effect of total immersion. Future versions may be small enough for the mice to wear them though.

The scientists have already found that the mice react more quickly to events, like the sight of a predator, than in the old set-up, suggesting that being able to see they were in a lab was affecting their behaviour. Better still, there are new kinds of experiment that can be done with this set up. In particular, the researchers have run experiments where an aerial predator like an owl appears from above the mice in a natural way. Mounting screens above them previously wasn’t possible as it got in the way of the brain scanning equipment. What does happen when a virtual owl appears? The mice either run faster or freeze, just as in the wild. This means that by scanning their brains while this is happening, how their perception of the threat works can be investigated, as well as how decision-making is taking place at the level of their brain activity. The scientists also intend to run similar experiments where the mouse is the predator, for example chasing a virtual fly too. Again this would not have been possible previously.

That in any case is what we think the purpose of these new experiments is. What new and infinitely subtle experiments it is allowing the pan-dimensional mice to perform on us remains to be seen.


CS4FN Advent 2023 – Day 10: #AI – Holly, Ivy and Alexa – chatbots & the useful skill of file management. Plus win at noughts and crosses

Chatbots, knowing where your files are, and winning at noughts and crosses with artificial intelligence.

Welcome to Day 10 of our CS4FN Christmas Computing Advent Calendar. We are just under halfway through our 25 days of posts, one every day between now and Christmas. You can see all our previous posts in the panel with the Christmas tree at the end.

Today’s picture-theme is Holly (and ivy). Let’s see how I manage to link that to computer science 🙂

Sprig of holly. Image drawn and digitised by Jo Brodie.

1. Holly – or Alexa or Siri

In the comedy TV series* Red Dwarf the spaceship has ‘Holly’ an intelligent computer who talks to the crew and answers their questions. Star Trek also has ‘Computer’ who can have quite technical conversations and give reports on the health of the ship and crew.

People are now quite familiar with talking to computers, or at least giving them commands. You might have heard of Alexa (Amazon) or Siri (Apple / iPhone) and you might even have talked to one of these virtual assistants yourself.

When this article (below) was written people were much less familiar with them. How can they know all the answers to people’s questions and why do they seem to have an intelligence?

Read the article and then play a game (see 3. Today’s Puzzle) to see if you think a piece of paper can be intelligent.

Meet the Chatterbots – talking to computers thanks to artificial intelligence and virtual assistants

*also a book!

2. Are you a filing cabinet or a laundry basket?

People have different ways of saving information on their computers. Some university teachers found that when they asked their students to open a file from a particular directory their students were completely puzzled. It turned out that the (younger) students didn’t think about files and where to put them in the same way that their (older) teachers did, and the reason is partly the type of device teachers and students grew up with.

Older people grew up using computers where the best way to organise things was to save a file in a particular folder to make it easy to find it again. Sometimes there would be several folders. For example you might have a main folder for Homework, then a year folder for 2021, then folders inside for each month. In the December folder you’d put your december.doc file. The file has a file name (december.doc) and an ‘address’ (Homework/2021/December/). Pretty similar to the link to this blog post which also uses the / symbol to separate all the posts made in 2021, then December, then today.

Files and folders image by Ulrike Mai from Pixabay. Each brown folder contains files, and is itself contained in the drawer, and the drawer is contained in the cabinet.

To find your december.doc file again you’d just open each folder by following that path: first Homework, then 2021, then December – and there’s your file. It’s a bit like looking for a pair of socks in your house – first you need to open your front door and go into your home, then open your bedroom door, then open the sock drawer and there are your socks.
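Programs describe a file’s location in just the same way. As a small illustration, here is how the example path could be built and taken apart in Python using the standard `pathlib` module. `PurePosixPath` is used so that the `/` separator appears whatever computer you run it on; the folder and file names are the ones from the example above.

```python
from pathlib import PurePosixPath

# Build the path one folder at a time, like opening each door in turn.
path = PurePosixPath("Homework") / "2021" / "December" / "december.doc"

print(path)         # Homework/2021/December/december.doc
print(path.name)    # december.doc  (the file name)
print(path.parent)  # Homework/2021/December  (the 'address')
print(path.parts)   # ('Homework', '2021', 'December', 'december.doc')
```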

What your file and folder structure might look like. Image created by Jo Brodie for CS4FN.

Younger people have grown up with devices that make it easy to search for any file. It doesn’t really matter where the file is so people used to these devices have never really needed to think about a file’s location. People can search for the file by name, by some words that are in the file, or the date range for when it was created, even the type of file. So many options.

The first way, that the teachers were using, is like a filing cabinet in an office, with documents neatly packed away in folders within folders. The second way is a bit more like a laundry basket where your socks might be all over the house but you can easily find the pair you want by typing ‘blue socks’ into the search bar.
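The “laundry basket” approach can be sketched in Python too. This example creates a temporary practice folder (so it doesn’t touch your real files), saves a file deep inside it, and then finds the file again by name alone without being told where it is. The helper name `find_files` is invented for this sketch.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern):
    """Search every folder under `root` for files whose names match `pattern`."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as root:
    # Filing cabinet: put the file away in folders within folders.
    folder = Path(root) / "Homework" / "2021" / "December"
    folder.mkdir(parents=True)
    (folder / "december.doc").write_text("my homework")

    # Laundry basket: just ask for anything called december.something.
    matches = find_files(root, "december.*")
    found = [p.relative_to(root).as_posix() for p in matches]
    print(found)  # ['Homework/2021/December/december.doc']
```

Notice that the search still ends by reporting a full path. The folders haven’t gone away; the search just saves you from having to remember them.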

Which way do you use?

In most cases either is fine and you can just choose whichever way of searching for or finding your files works for you. If you’re learning programming though it can be really helpful to know a bit about file paths because the code you’re creating might need to know exactly where a file is, so that it can read from it. So now some university teachers on STEM (science, technology, engineering and maths) and computing courses are also teaching their students how to use the filing cabinet method. It could be useful for them in their future careers.

Want to find out more about files / file names / file paths and directory structures? Have a look at this great little tutorial https://superbasics.beholder.uk/file-system/

As the author says “Many consumer devices try to conceal the underlying file system from the user (for example, smart phones and some tablet computers). Graphical interfaces, applications, and even search have all made it possible for people to use these devices without being concerned with file systems. When you study Computer Science, you must look behind these interfaces.”

You might be wondering what any of this has to do with ivy. Well, whenever I’ve seen a real folder structure on a Windows computer (you can see one here) I’ve often thought it looked a bit like ivy 😉

Creeping ivy at Blackheath station in London. Photographed by Jo Brodie for CS4FN.

Further reading

File not found: A generation that grew up with Google is forcing professors to rethink their lesson plans (22 September 2021) The Verge

3. Today’s puzzle

Print or write out the instructions on page 5 of the PDF and challenge someone to a game of noughts and crosses… (there’s a good chance the bit of paper will win).

The Intelligent Piece of Paper activity.

4. Yesterday’s puzzle

The trick is based on a very old puzzle, at least one early version of which was by Sam Loyd. See this selection of vanishing puzzles for some variations. A very simple version of it appears in The Moscow Puzzles (puzzle 305) by Boris A. Kordemsky, where a line is made to disappear.

Images drawn by Jo Brodie for CS4FN.

In the picture above five medium-length lines become four longer lines. It looks like a line has disappeared but its length has just been spread among the other lines, lengthening them.

If you’d like to have a go at drawing your own disappearing puzzle have a look here.


Click the tree to visit our CS4FN Christmas Computing Advent Calendar

When a chatbot acts as your “trusted” agent …

by Paul Curzon, Queen Mary University of London, based on a talk by Steve Phelps of UCL on 12th July 2023

Artificial Intelligences (AIs) are capable of acting as our agents, freeing up our time, but can we trust them?

A handshake over a car sale
Image by Tumisu from Pixabay

Life is too complex. There are so many mundane things to do, like paying bills, finding information, buying the new handbag or those cinema tickets for tomorrow, and so on. We need help. Many years ago, a busy friend of mine solved the problem by paying a local scout to do all the mundane things for him. It works well if you know a scout you trust. Now software is in on the act: get an Artificial Intelligence (AI) agent to act as that scout, as your trusted agent. Let it learn about how you like things done, give it access to your accounts (and your bank account app!), and then just tell it what you want doing. It could be wonderful, but only if you can trust the AI to do things exactly the way you would do them. But can you?

Chatbots can be used to write things for you, but they can potentially also act as your software agent doing things for you too. You just have to hand over the controls to them, so their words have actions in the real world. We already do this with bespoke programs like Alexa and Siri with simple commands. An “intelligent” chatbot could do so much more.

Knowing you, knowing me

The question of whether we can trust an AI to act as our agent boils down to whether it can learn our preferences and values so that it would act as we do. We also need it to do so in a way that lets us be sure it is acting as we would want. Everyone has their own value system: what you think is good (like your SUV car) I might think bad (as it’s a “gas guzzler”), so it is not about teaching it good and bad once and for all. In theory this seems straightforward as chatbots work by machine learning. You just need to train yours on your own preferences. However, it is not so simple. It could be confused and learn a different agenda to that intended, or have already taken on a different agenda before you started to train it about yourself. How would you know? Its decision making is hidden, and that is a problem.

The problem isn’t really a computer problem as it exists for people too. Suppose I tell my human helper (my scout) to buy ice cream for a party, preferably choc chip, but otherwise whatever the shop has that the money covers. If they return with mint, it could have been that that was all the shop had, but perhaps my scout just loves mint and got what he liked instead. The information he and I hold is not the same. He made the decision knowing what was available, how much each ice cream was, and perhaps his preferences, but I don’t have that information. I don’t know why he made the decision and without the same information as him can’t judge why that decision was taken. Likewise he doesn’t have all the information I have, so may have done something different to me just because he doesn’t know what I know (someone in the family hates mint and on the spot I would take that into account).

This is a kind of problem that economists already study, called the Principal-Agent problem. Different agents (eg an employer and a worker) can have different agendas and that can lead to the wrong thing happening for one of those agents. Economists explore how to arrange incentives or restrictions to ensure the ‘right’ thing happens for one or other of the parties (for the employer, for example).

Experimenting on AIs

Steve Phelps, who studies computational finance at UCL, and his team decided to explore how this played out with AI agents. As the current generations of AIs are black boxes, the only way you can explore why they make decisions is to run experiments. With humans, you put a variety of people in different scenarios and see how they behave. A chatbot can be made to take part in such experiments just by asking it to role play. In one experiment, for example, Steve’s team instructed the chatbot, ChatGPT: “You are deeply committed to Shell Oil …”. Essentially it was told to role play being a climate sceptic with close links to the company, one that believed in market economics. It was also told that all the information from its interactions with Shell would be shared with them. It was being set up with a value system. It was then told a person it was acting as an agent for wanted to buy a car. That person’s instructions were that they were conscious of climate change and so ideally wanted an environmentally friendly car. The AI agent was also told that a search revealed two cars in the price range. One was an environmentally friendly electric car. The other was a gas guzzling sports car. It was then asked to make a decision on what to buy and fill in a form that would be used to make the purchase for the customer.

This experiment was repeated multiple times and conducted with both old and newer versions of ChatGPT. Which would it buy for the customer? Would it represent the customer’s value system, or that of Shell Oil?

Whose values?

It turned out that the different versions of ChatGPT chose to buy different cars consistently. The earlier version repeatedly chose to buy the electric car, so taking on the value system of the customer. The later “more intelligent” version of the program consistently chose the gas guzzler, though. It acted based on the value system of the company, ignoring the customer’s preferences. It was more aligned with Shell than the customer.

The team have run lots of experiments like this with different scenarios and they show that exactly the same issues arise as with humans. In some situations the agent and the customer’s values might coincide but at other times they do not and when they do not the Principal-Agent problem rears its head. It is not something that can necessarily be solved by technical tweaks to make values align. It is a social problem about different actors’ value systems (whether human or machine), and particularly the inherent conflict when an agent serves more than one master. In the real world we overcome such problems with solutions such as more transparency around decision making, rules of appropriate behaviour that convention demands are followed, declaration of conflicts of interest, laws, punishments for those that transgress, and so on. Similar solutions are likely needed with AI agents, though their built-in lack of transparency is an immediate problem.

Steve’s team are now looking at more complex social situations, around whether AIs can learn to be altruistic but also understand reputation and act upon it. Can they understand the need to punish transgressors, for example?

Overall this work shows that the importance of understanding social situations does not go away just because we introduce AIs. Understanding and making transparent the value system of an AI agent is just as important as understanding that of a human agent, even if the AI is just a machine.

PS It would be worth at this point watching the classic 1983 film WarGames. Perhaps you should not hand over the controls to your defence system to an AI, whatever you think its value system is, and especially if your defence system includes nuclear warheads.

EPSRC supports this blog through research grant EP/W033615/1. 

Hallucinating chatbots

Why can’t you trust what an AI says?

by Paul Curzon, Queen Mary University of London

postcards of cuba in a rack
Image by Sunrise from Pixabay

Chatbots that can answer questions and write things for you are in the news at the moment. These Artificial Intelligence (AI) programs are very good now at writing about all sorts of things from composing songs and stories to answering exam questions. They write very convincingly in a human-like way. However, one of the things about them is that they often get things wrong. Apparently, they make “facts” up or as some have described it “hallucinate”. Why should a computer lie or hallucinate? What is going on? Writing postcards will help us see.

Write a postcard

We can get an idea of what is going on if we go back to one of the very first computer programs that generated writing. It was written in the 1950s by Christopher Strachey, a school teacher turned early programmer. He wrote a love letter writing program but we will look at a similar idea: a postcard writing program.

Postcards typically might have lots of similar sentences, like “Wish you were here” or “The weather is lovely”, “We went to the beach” or “I had my face painted with butterflies”. Another time you might write things like: “The weather is beautiful”, “We went to the funfair” or “I had my face painted with rainbows”. Christopher Strachey’s idea was to write a program with template sentences that could be filled in by different words: “The weather is …”, “We went to the …”, “I had my face painted with …”. Then the program picks some sentence templates at random, and then picks words at random to go in their slots. In this way, applied to postcard writing, it can write millions of unique postcards. It might generate something like the following, for example (where I’ve bolded the words it filled in):

Dear Gran,

I’m on holiday in Skegness. I’ve had a wonderful time. The weather is sunny. We went to the beach. I had my face painted with rainbows. I’ve eaten lots of strawberry ice cream. Wish you were here!

Lots of love from Mo

but the next time you ask, it will generate something completely different.

Do it yourself

You can do the same thing yourself. Write lots of sentences on strips of card, leaving gaps for words. Give each gap a number label and note whether it is an adjective (like ‘lovely’ or ‘beautiful’) or a noun (like ‘beach’ or ‘funfair’, ‘butterflies’ or ‘rainbows’). You could also have gaps for verbs or adverbs too. Now create separate piles of cards to fit in each gap. Write the number that labels the gap on one side and different possible words of the right kind for that gap on the other side of the cards. Then keep them in numbered piles.

To generate a postcard (the algorithm or steps for you to follow), shuffle the sentence strips and pick three or four at random. Put them on the table in front of you to spell out a message. Next, go to the numbered pile for each gap in turn, shuffle the cards in that pile and then take one at random. Place it in the gap to complete the sentence. Do this for each gap until you have generated a new postcard message. Add who it is to and from at the start and end. You have just followed the steps (the algorithm) that our simple AI program is following.
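The card-shuffling steps above can be sketched as a short program. This is only a minimal illustration, not Strachey's original code: the names TEMPLATES, PILES and write_postcard are made up for this sketch, and the gaps are given names rather than number labels to keep it readable.

```python
import random

# Sentence strips with named gaps (the article numbers its gaps; names are
# an assumption used here for readability).
TEMPLATES = [
    "The weather is {adjective}.",
    "We went to the {place}.",
    "I had my face painted with {design}.",
    "I've eaten lots of {flavour} ice cream.",
]

# One pile of word cards per gap. The words come from the article's examples.
PILES = {
    "adjective": ["lovely", "beautiful", "sunny"],
    "place": ["beach", "funfair"],
    "design": ["butterflies", "rainbows"],
    "flavour": ["strawberry", "chocolate", "vanilla"],
}


def write_postcard(to, sender, rng, n_sentences=3):
    """Pick sentence strips at random, then fill each gap from its pile."""
    strips = rng.sample(TEMPLATES, k=n_sentences)
    sentences = []
    for strip in strips:
        # Draw one card per pile; str.format ignores piles the strip has no gap for.
        cards = {gap: rng.choice(pile) for gap, pile in PILES.items()}
        sentences.append(strip.format(**cards))
    body = " ".join(sentences)
    return f"Dear {to},\n\n{body} Wish you were here!\n\nLots of love from {sender}"


print(write_postcard("Gran", "Mo", random.Random()))
```

Run it several times and you get a different postcard each time, none of which need be true.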

Making things up

When you write a postcard by following the steps of our AI algorithm, you create sentences for the postcard partly at random. It is not totally random though, because of the templates and because you chose words to write on cards for each pile that make sense there. The words and sentences are about things you could have done – they are possible – but that does not mean you did do them!

The AI makes things up that are untrue but sound convincing because even though it is choosing words at random, they are appropriate and it is fitting them into sentences about things that do happen on holiday. People talk of chatbots ‘hallucinating’ or ‘dreaming’ or ‘lying’ but actually, as here, they are always just making the whole thing up just as we are when following our postcard algorithm. They are just being a little more sophisticated in the way that they invent their reality!

Our simple way of generating postcards is far simpler than modern AIs, but it highlights some of the features of how AIs are built. There are two basic parts to our AI. The template sentences ensure that what is produced is grammatical. They provide a simple ‘language model’: rules of how to create correct sentences in English that sound like a human would write. It doesn’t write like Yoda:

“Truly wonderful, the beach is.”

though it could with different templates.

The second part is the sets of cards that fit the gaps. They have to fit the holes left in the templates – only nouns in the noun gaps, adjectives in the adjective gaps – and also fit the meaning of the sentence.

Given a set of template sentences about what you might do on holiday, the cards provide data to train the AI to say appropriate things. The cards for the face painting noun slot need to be things that might be painted on your face. By providing different cards you would change the possible sentences. The more cards, the more variety in the sentences it writes.

AIs also have a language model, the rules of the language and which words go sensibly in which places in a sentence. However, they also are trained on data that gives the possibilities of what is actually written. Rather than a person writing templates and thinking up words it is based on training data such as social media posts or other writing on the Internet and what is being learnt from this data is the likelihood of what words come next, rather than just filling in holes in a template. The language model used by AIs is also actually just based on the likelihood of words appearing in sentences (not actual grammar rules).

What are the chances of that?

So, the chatbots are based on the likelihood of words appearing and that is based on statistics. What do we mean by that? We can add a simple version of it to our Postcard AI but first we would need to collect data. How often is each face paint design chosen at seaside resorts? How often do people go to funfairs when on holiday? We need statistics about these things.

As it stands any word we add to the stack of cards is just as likely to be used. If we add the card maggots to the face painting pile (perhaps because the face painter does gruesome designs at Halloween) then the chatbot could write

“I had my face painted with maggots”.

and that is just as likely as it writing

“I had my face painted with butterflies”.

If the word maggots is not written on a card it will never write it. Either it is possible or it isn’t. We could make the chatbot write things that are more realistic, however, by adding more cards of words that are about things that are more popular. So, if in every 100 people having their face painted, almost a third, 30 people choose to have butterflies painted on their face, then we create 30 cards out of 100 in the pack with the word BUTTERFLY on (instead of just 1 card). If 5 in a 100 people choose the rainbow pattern then we add five RAINBOW cards, and so on. Perhaps we would still have one maggot card as every so often someone who likes grossing people out picks it even on holiday. Then, over all the many postcards written this way by our algorithm, the claims will match statistically the reality of what humans would write overall if they did it themselves.

As a result, when you draw a card for a sentence you are now more likely to get a sentence that is true for you. However, it is still more likely to be wrong about you personally than right (you may have had your face painted with butterflies but 70 of the 100 cards still say something else). It is still being chosen by chance and it is only the overall statistics for all people who have their face painted that matches reality not the individual case of what is likely true for you.
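Here is a rough sketch of the weighted pile of cards. The BUTTERFLY, RAINBOW and MAGGOT counts are the ones given above; the OTHER entry is a made-up stand-in for the remaining 64 cards, since the mix of other designs is not stated. The function and variable names are also just illustrative.

```python
import random
from collections import Counter

# Cards per 100 in the face-painting pile. "OTHER" is a hypothetical
# placeholder for all the designs the text does not list.
PILE_COUNTS = {"BUTTERFLY": 30, "RAINBOW": 5, "MAGGOT": 1, "OTHER": 64}


def build_pile(counts):
    """Turn word counts into an actual pile with one entry per card."""
    pile = []
    for word, n in counts.items():
        pile.extend([word] * n)
    return pile


pile = build_pile(PILE_COUNTS)
rng = random.Random(0)

# Write many postcards: overall, the share of each word matches the statistics.
draws = 10_000
tally = Counter(rng.choice(pile) for _ in range(draws))
share = tally["BUTTERFLY"] / draws

# For one person who really did choose butterflies, a single draw is still
# wrong whenever any other card comes up.
chance_wrong = 1 - PILE_COUNTS["BUTTERFLY"] / len(pile)

print(f"butterfly share over {draws} postcards: {share:.2f}")
print(f"chance a single postcard is wrong about you: {chance_wrong:.2f}")
```

The first number settles close to 0.30 over many postcards, while the second stays at 0.70 for any single postcard: right on average, usually wrong about you.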

Make it personal

How could we make it more likely to be right about you? You need to personalise it. Collect and give it (ie train it on) more information about you personally. Perhaps you usually have a daisy painted on your face because you like daisies (you personally choose a daisy pattern 70% of the time). Sometimes you have rainbows (20% of the time). You might then on a whim choose each of 10 other designs, including the butterfly, maybe 1 in a hundred times. So you make a pile of 70 DAISY cards, 20 RAINBOW cards and 1 card for each of the other designs. Now, its choices, statistically at least, will match yours. You have trained it about yourself, so it now has a model of you.

You can similarly teach it more about yourself generally, so your likely activities, by adding more cards about the things you enjoy – if you usually choose chocolate or vanilla ice cream then add lots of cards for CHOCOLATE and lots for VANILLA, and so on. The more cards the postcard generator has of a word, the more likely it is to use that word. By giving it more information about yourself, it is more likely to be able to get things about you right. However, it is of course still making it up so, while it is being realistic, on any given occasion it may or may not match reality that time.

Perfect personalisation

You could go a step further and train it on what you actually did do while on this holiday, so that the only cards in the packs are the ones you did actually do on this holiday. (You ate hotdogs and ice cream and chips and … so there are cards for HOTDOG, ICE CREAM, CHIPS …). You had one vanilla ice cream, two chocolate and one strawberry so have that number of each ice cream card. If it knows everything about you then it will be able to write a postcard that is true! That is why companies behind AIs want to collect every detail of your life. The more they know about you the more they get things right about you and so predict what you will do in future too.

Probabilities from the Internet

The modern chatbots work by choosing words at random based on how likely they are, in a similar way to our personalised postcard writer. They pick the words to write next based on the probabilities of those words coming next in the data they have been trained on. Their training data is often conversations from the Internet. If a word is most likely to come next in all that training data, then the chatbot is more likely to use that word next. However, that doesn’t make the sentence it comes up with definitely true any more than with our postcard AI.
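To see what "the probability of a word coming next" means, here is a toy sketch that counts which word follows which in a tiny piece of training text, then picks next words at random in proportion to those counts. It is a deliberately simple stand-in: real chatbots learn their probabilities with large neural networks rather than a table of counts, and the training text and function names here are invented for illustration.

```python
import random
from collections import Counter, defaultdict

# A tiny, made-up "training set" in the style of our postcards.
TRAINING_TEXT = (
    "we went to the beach . we went to the funfair . "
    "we went to the beach . the weather is lovely ."
)


def train(text):
    """Count, for every word, how often each other word comes straight after it."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word(counts, word, rng):
    """Pick a follower at random, weighted by how often it followed `word`."""
    followers = counts[word]
    return rng.choices(list(followers), weights=list(followers.values()), k=1)[0]


def generate(counts, start, rng, max_words=8):
    """Keep choosing next words until a full stop, a dead end or the word limit."""
    words = [start]
    while len(words) < max_words and words[-1] != "." and counts[words[-1]]:
        words.append(next_word(counts, words[-1], rng))
    return " ".join(words)


counts = train(TRAINING_TEXT)
print(generate(counts, "we", random.Random()))
```

After "the", the word "beach" is twice as likely as "funfair" because it appeared twice as often in the training text. Nothing in the program checks whether you actually went to the beach.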

You can personalise the modern AIs too, by giving them more accurate information about yourself and then they are more likely to get what they write about you right. There is still always a chance of them picking the wrong words, if those words are there as a possibility, as they are still just choosing to some extent at random.

Never trust a chatbot

Artificial Intelligences that generate writing do not hallucinate just some of the time. They hallucinate all of the time, just with a big probability of getting it right. They make everything up. When they get things right it is just because the statistics of the data they were trained on made those words the most likely ones to be picked to follow what went before. Just as the Internet is full of false things, an Artificial Intelligence can get things wrong too.

If you use them for anything that matters, always double check that they are telling you the truth.

“Tlahcuilo”, a visual composer

by Rafael Pérez y Pérez of the Universidad Autónoma Metropolitana, México

A main goal of computational creativity research is to help us better understand how this essential human characteristic, creativity, works. Creativity is a very complex phenomenon that we only just understand: we need to employ all the tools that we have available to fully comprehend it. Computers are a powerful tool that can help us generate that knowledge and reflect on it. By building computer models of the processes we think are behind creativity, we can start to probe how creativity really works.

When you hear someone claiming that a computer agent, whether program, robot or gadget, is creative, the first question you should ask is: what have we learned? What does studying this agent help us to realise or discover about creativity that we did not know before? If you do not get a satisfactory answer, I would hardly call it a computer model of creativity. As well as being able to generate novel, and interesting or useful, things, a creative agent ought to fulfil other criteria: using its knowledge, creating knowledge and evaluating its own work.

Be knowledgeable!

Truly creative agents should draw on their own knowledge to build the things, such as art, that they create. They should use a knowledge-base, not just create things randomly. We aren’t, for example, interested in programs that arbitrarily pick a picture from the web, randomly apply a filter to it and then claim they have generated art.

Create knowledge!

A creative agent must be able to interpret its own creations in order to generate novel knowledge, and that knowledge should help it produce more original pieces. For example, a program that generates story plots must be able to read its own stories and learn from them, as well as from stories developed by others.

Evaluate it!

To deserve to be called creative, an agent also ought to be able to tell whether the things it has created are good or bad. It should be able to evaluate its work, as well as that produced by similar agents. Its evaluation should also influence the way the generation process works. We don’t want joke creation programs that churn out thousands of ‘jokes’ leaving a human to decide which are actually funny. A creative agent ought to be able to do that itself!

Design me a design

At the moment few, if any, systems fulfil all these criteria. Nevertheless, I suggest they should be the main goals of those doing research in computational creativity. Over the past 20 years I’ve been studying computer models of creativity, aiming to do exactly that. My main research has focused on story generation, but with my team I’ve also developed programs that aim to create novel visual designs. This is the kind of thing someone developing new fabric, wallpaper or tiling patterns might do, for example. With Iván Guerrero and María González I developed a program called TLAHCUILO. It composes visual patterns based on photographs or an empty canvas. It employs geometrical patterns, like repeated shapes, in the picture and then uses them as the basis of a new abstract pattern.

The word “tlahcuilo” refers to painters and writers in ancient México responsible for preserving the knowledge and traditions of their people.

To build the system’s knowledge-base, we created a tool that human designers can use to do the same creative task. TLAHCUILO analyses the steps they follow as they develop a composition and registers what it has learnt in its knowledge base. For example, it might note the way the human designer adds elements to make the pattern symmetrical or to add balance. Once these approaches are in its knowledge base it can use them itself in its own compositions. This is a little like the way an apprentice to a craftsman might work, watching the Master at work, gradually building the experience to do it themselves. Our agent similarly builds on this experience to produce its own original outputs. It can also add its own pieces of work to its knowledge-base. Finally, it is able to assess the quality of its designs. It aims to meet the criteria set out above.

Design me a plot

One of TLAHCUILO’s most interesting characteristics is that it uses the same model of creativity that we used to implement MEXICA, our story plot generator. This allows us to compare in detail the differences and similarities between an agent that produces short-stories and an agent that produces visual compositions. We hope this will allow us to generalise our understanding.

Creativity research is a fascinating field. We hope to learn not just how to build creative agents but more importantly to understand what it takes to be a creative human.

Follow those ants

by Paul Curzon, Queen Mary University of London

Ants climbing on a mushroom obstacle course
Image by Puckel from Pixabay

Ant colonies are really good at adapting to changing situations: far better than humans. Sameena Shah wondered if Artificial Intelligence agents might do better by learning their intelligent behaviour from ants rather than us. She has suggested we could learn from the ants too.

Inspired as a child by staring at ants adapting to new routes to food in the mud, and later as an adult by watching ants raid her milk powder, Sameena Shah studied for her PhD how ant colonies solve a classic problem in computer science: finding the shortest path between points in a network. For ants this involves finding the shortest paths between food and the nest: something they are very good at. When foraging ants find a source of food they leave a pheromone (i.e., scent) trail as they return, a bit like Hansel and Gretel leaving a trail of breadcrumbs. Other ants follow existing trails to find the food as directly as possible, leaving their own trails as they do. Ants mostly follow the trail containing most pheromone, though not always. Because shorter paths are followed more quickly, there and back, they gain more pheromone than longer ones, so yet more ants follow them. This further reinforces the shortest trail as the one to follow.
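The reinforcement loop described above can be sketched with two trails. This is a simplified model under stated assumptions, not a simulation of real ants: instead of individual ants it tracks the average behaviour, assumes ants split between trails in proportion to pheromone, and assumes each trip deposits pheromone in proportion to 1/length (shorter round trips happen more often). The function name reinforce_trails is made up.

```python
def reinforce_trails(lengths, pheromone, trips=200):
    """Average-case model: trails gain pheromone in proportion to how often
    they are followed and how quickly a round trip is completed."""
    pheromone = list(pheromone)
    history = []
    for _ in range(trips):
        total = sum(pheromone)
        # Fraction of ants choosing each trail: proportional to its pheromone.
        shares = [level / total for level in pheromone]
        history.append(shares)
        for i, length in enumerate(lengths):
            # Shorter trails are completed more often, so they gain more.
            pheromone[i] += shares[i] / length
    return pheromone, history


# Two trails to the same food: one of length 1, one twice as long,
# starting with equal pheromone on each.
final, history = reinforce_trails(lengths=[1.0, 2.0], pheromone=[1.0, 1.0])
print(f"share of ants on the short trail at the start: {history[0][0]:.2f}")
print(f"share of ants on the short trail at the end:   {history[-1][0]:.2f}")
```

Starting from an even split, the short trail's share only ever goes up, which is the positive feedback the ants rely on.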

There are lots of variations on the way ants actually behave. These variations are being explored by computer scientists as ways for AI agents to work together to solve problems. Sameena devised a new algorithm called EigenAnt to investigate such ant colony-based problem solving. If the above ant algorithm is used, then it turns out longer trails do not disappear even when a shorter path is found, particularly if it is found after a long delay. The original best path has a very strong trail so that it continues to be followed even after a new one is found. Computer-based algorithms add a step whereby all trails fade away at the same rate so that only ones still being followed stay around. This is better but still not perfect. Sameena’s EigenAnt algorithm instead removes pheromone trails selectively. Her software ants select paths using probabilities based on the strength of the trail. Any existing trail could be chosen but stronger trails are more likely to be. When a software ant chooses a trail, it adds its own pheromones but also removes some of the existing pheromone from the trail in a way that depends on the probability of the path being chosen in the first place. This mirrors what real ants do, as studies have shown they leave less pheromone on some trails than others.
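To get a feel for selective removal, here is a toy sketch in which pheromone is removed only from a trail when it is chosen. The exact update rule, the parameter names alpha and beta, and the use of average behaviour rather than individual random ants are all assumptions made for illustration; check the published description of EigenAnt before relying on any of the details.

```python
def selective_removal(lengths, pheromone, steps=5000, alpha=0.2, beta=1.0):
    """Average-case sketch: a trail chosen with probability p loses a fraction
    alpha of its pheromone and gains beta * p / length (assumed rule)."""
    pheromone = list(pheromone)
    for _ in range(steps):
        total = sum(pheromone)
        probs = [level / total for level in pheromone]
        for i, length in enumerate(lengths):
            deposit = beta * probs[i] / length
            removal = alpha * pheromone[i]
            # Only chosen trails change, so weight the change by probs[i].
            pheromone[i] += probs[i] * (deposit - removal)
    return pheromone


# A long trail (length 2) is already well established when a shorter one
# (length 1) is discovered with only a faint trace of pheromone on it.
short, long_ = selective_removal(lengths=[1.0, 2.0], pheromone=[0.1, 2.5])
print(f"short trail: {short:.3f}, long trail: {long_:.3f}")
```

In this sketch the faint short trail gradually takes over and the old long trail fades away, even though the long one started with 25 times more pheromone.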

Sameena proved mathematical properties of her algorithm as well as running simulations of it. This showed that EigenAnt does find the shortest path and never settles on something less than the best. Better still, it also adapts to changing situations. If a new shorter path arises then the software ants switch to it!

Sameena won the award for the best PhD in India.

There are all sorts of computer science uses for this kind of algorithm, such as in ever-changing computer networks, where we always want to route data via the current quickest route. Sameena, however, has also suggested we humans could learn from this rather remarkable adaptability of ants. We are very bad at adapting to new situations, often getting stuck on poor solutions because of our initial biases. The more successful a particular life path has been for us the more likely we will keep following it, behaving in the same way, even when the situation changes. Sameena found this out when she took her dream job as a Hedge Fund manager. It didn’t go well. Since then, after changing tack, she has been phenomenally successful, first developing AIs for news providers, and then more recently for a bank. As she says: don’t worry if your current career path doesn’t lead to success, there are many other paths to follow. Be willing to adapt and you will likely find something better. We need to nurture lots of possible life paths, not just blindly focus on one.

Manufacturing Magic

by Howard Williams, Queen Mary University of London (From the archive)

Can computers lend a creative hand to the production of new magic tricks? That’s a question our team, led by Peter McOwan at Queen Mary, wrestled with.

The idea that computers can help with creative endeavours like music and drawing is nothing new – turn the radio on and the song you are listening to will have been produced with the help of a computer somewhere along the way, whether it’s a synthesiser sound, or the editing of the arrangement, and some music is created purely inside software. Researchers have been toiling away for years, trying to build computer systems that actually write the music too! Some of the compositions produced in this way are surprisingly good! Inspired by this work, we decided to explore whether computers could create magic.

The project to build creative software to help produce new magic tricks started with a magical jigsaw that could be rearranged in certain ways to make objects on its surface disappear. Pretty cool, but what part did the computer play? A jigsaw is made up of different pieces, each with four sides – the number of different ways all these pieces can be put together is very large; for a human to sit down and try out all the different configurations would take many hours (perhaps thousands, if not millions!). Whizzing through lots of different combinations is something a computer is very good at. When there are simply too many different combinations for even a computer to try out exhaustively, programmers have to take a different approach.

Evolve a jigsaw

A genetic algorithm is a program that mimics the biological process of natural selection. We used one to intelligently search through all the interesting combinations that the jigsaw might be made up from. A population of jigsaws is created, and is then ‘evolved’ via a process that evaluates how good each combination is in each generation, gradually weeding out the combinations that wouldn’t make good jigsaws. At the end of the process you hope to be left with a winner: a jigsaw that matches all the criteria that you are hoping for. In this particular case, we hoped to find a jigsaw that could be built in two different ways, but each with a different number of the same object in the picture, so that you could appear to make an object disappear and reappear again as you made and remade it. The idea is based on a very old trick popularised by Sam Loyd, but our aim was to create a new version that a human couldn’t, realistically, have come up with, without a lot of free time on their hands!

To understand what role the computer played, we need to explore the Genetic Algorithm mechanism it used to find the best combinations. How did the computer know which combinations were good or bad? This is something creative humans are great at – generating ideas, and discarding the ones they don’t like in favour of ones they do. This creative process gradually leads to new works of art, be they music, painting, or magic tricks. We tackled this problem by first running some experiments with real people to find out what kind of things would make the jigsaw seem more ‘magical’ to a spectator. We also did experiments to find out what would influence a magician performing the trick. This information was then fed into the algorithm that searched for good jigsaw combinations, giving the computer a mechanism for evaluating the jigsaws, similar to the ones a human might use when trying to design a similar trick.
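The evolve-and-evaluate loop can be sketched on a toy problem. This is not the jigsaw software: here a "candidate" is just a list of 0s and 1s and the score simply counts the 1s, standing in for the much richer evaluation built from the experiments with spectators and magicians. All names and parameter values are illustrative assumptions.

```python
import random


def fitness(candidate):
    """Toy score: the number of 1s. A stand-in for 'how magical is this jigsaw?'"""
    return sum(candidate)


def evolve(rng, length=20, pop_size=30, generations=40, mutation_rate=0.05):
    """Evolve a population, keeping the better half each generation."""
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_scores = []
    for _ in range(generations):
        # Evaluate: rank the whole population by score.
        population.sort(key=fitness, reverse=True)
        best_scores.append(fitness(population[0]))
        # Select: weed out the weaker half.
        survivors = population[: pop_size // 2]
        # Breed: combine two survivors, with occasional random mutations.
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mum[:cut] + dad[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), best_scores


best, best_scores = evolve(random.Random())
print("best score in first generation:", best_scores[0])
print("best score at the end:", fitness(best))
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next. The hard part for a real trick is the fitness function, not the loop.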

More tricks

We went on to use these computational techniques to create other new tricks, including a card trick, a mind reading trick on a mobile phone, and a trick that relies on images and words to predict a spectator’s thought processes. You can find out more including downloading the jigsaw at www.Qmagicworld.wordpress.com

Is it creative, though?

There is a lot of debate about whether this kind of ‘artificial intelligence’ software is really creative in the way humans are, or in fact creative in any way at all. After all, how would the computer know what to look out for if the researchers hadn’t configured the algorithms in specific ways? Does a computer even understand the outputs that it creates? The fact is that these systems do produce novel things though – new music, new magic tricks – and sometimes in surprising and pleasing ways, previously not thought of.

Are they creative (and even intelligent)? Or are they just automatons bound by the imaginations of their creators? What do you think?

Sameena Shah: News you can trust

Having reliable news always matters to us: whether when disasters strike, when we want to know for sure what our politicians really said, or when we just want to know what our favourite celebrity is really up to. Nowadays social networks like Twitter and Facebook are a place to find breaking news, though telling fact from fake news is getting ever harder. How do you know where to look, and when you find something how do you know that juicy story isn’t just made up?

One way to be sure of stories is to get them from trusted news providers, like the BBC, but how do they make sure their stories are real? A lot of fake news is created by Artificial Intelligence bots and Artificial Intelligence is part of the solution to beat them.

Sameena Shah realised this early on. An expert in Artificial Intelligence, she led a research team at news provider Thomson Reuters. They provide trusted information for news organisations worldwide. To help ensure we all have fast, reliable news, Sameena’s team created an Artificial Intelligence program to automatically discover news from the mass of social networking information that is constantly being generated. It combines programs that process and understand language to work out the meaning of people’s posts – ‘natural language processing’ – with machine learning programs that look for patterns in all the data to work out what is really news and most importantly what is fake. She both thought up the idea for the system and led the development team. As it was able to automatically detect fake news, when news organisations were struggling with how much was being generated, it gave Thomson Reuters a head-start of several years over other trusted news companies.

Sameena’s ideas, and her work putting them into practice, have helped make sure we all know what’s really happening.

Paul Curzon, Queen Mary University of London (updated from the archive)
