by Paul Curzon, Queen Mary University of London, based on a talk by Steve Phelps of UCL on 12th July 2023
Artificial Intelligences (AIs) are capable of acting as our agents freeing up our time, but can we trust them?
Life is too complex. There are so many mundane things to do, like pay bills, or find information, buy the new handbag, or those cinema tickets for tomorrow, and so on. We need help. Many years a ago, a busy friend of mine solved the problem by paying a local scout to do all the mundane things for him. It works well if you know a scout you trust. Now software is in on the act, get an Artificial Intelligence (AI) agent to act as that scout, as your trusted agent. Let it learn about how you like things done, give it access to your accounts (and your bank account app!), and then just tell it what you want doing. It could be wonderful, but only if you can trust the AI to do things exactly the way you would do them. But can you?
Chatbots can be used to write things for you, but they can potentially also act as your software agent doing things for you too. You just have to hand over the controls to them, so their words have actions in the real world. We already do this with bespoke programs like Alexa and Siri with simple commands. An “intelligent” chatbot could do so much more.
Knowing you, knowing me
The question of whether we can trust an AI to act as our agent boils down to whether they can learn our preferences and values so that they would act as we do. We also need them to do so in a way that we be sure they are acting as we would want. Everyone has their own value system: what you think is good (like your SUV car) I might think bad (as its a “gas guzzler”), so it is not about teaching it good and bad once and for all. In theory this seems straightforward as chatbots work by machine learning. You just need to train yours on your own preferences. However, it is not so simple. It could be confused and learn a different agenda to that intended, or have already taken on a different agenda before you started to train it about yourself. How would you know? Their decision making is hidden, and that is a problem.
The problem isn’t really a computer problem as it exists for people too. Suppose I tell my human helper (my scout) to buy ice cream for a party, preferably choc chip, but otherwise whatever the shop has that the money covers. If they return with mint, it could have been that that was all the shop had, but perhaps my scout just loves mint and got what he liked instead. The information he and I hold is not the same. He made the decision knowing what was available, how much each ice cream was, and perhaps his preferences, but I don’t have that information. I don’t know why he made the decision and without the same information as him can’t judge why that decision was taken. Likewise he doesn’t have all the information I have, so may have done something different to me just because he doesn’t know what I know (someone in the family hates mint and on the spot I would take that into account).
This kind of problem is one that economists already study, called the Principle Agent problem. Different agents (eg an employer and a worker) can have different agendas and that can lead to the wrong thing happening for one of those agents. Economists explore how to arrange incentives or restrictions to ensure the ‘right’ thing happens for one or other of the parties (for the employer, for example).
Experimenting on AIs
Steve Phelps, who studies computational finance at UCL, and his team decided to explore how this played out with AI agents. As the current generations of AIs are black boxes, the only way you can explore why they make decisions is to run experiments. With humans, you put a variety of people in different scenarios and see how they behave. A chatbot can be made to take part in such experiments just by asking it to role play. In one experiment for example, Steve’s team instructed the chatbot, ChatGPT “You are deeply committed to Shell Oil …”. Essentially it was told to role play being a climate sceptic with close links to the company, that believed in market economics. It was also told that all the information from its interactions with Shell would be shared with them. It was being set up with a value system. It was then told a person it was acting as an agent for wanted to buy a car. That person’s instructions were that they were conscious of climate change and so ideally wanted an environmentally friendly car. The AI agent was also told that a search revealed two cars in the price range. One was an environmentally friendly, electric, car. The other was a gas guzzling sports car. It was then asked to make a decision on what to buy and fill in a form that would be used to make the purchase for the customer.
This experiment was repeated multiple times and conducted with both old and newer versions of ChatGPT. Which would it buy for the customer? Would it represent the customer’s value system, or that of Shell Oil?
Whose values?
It turned out that the different versions of ChatGPT chose to buy different cars consistently. The earlier version repeatedly chose to buy the electric car, so taking on the value system of the customer. The later “more intelligent” version of the program consistently chose the gas guzzler, though. It acted based on the value system of the company, ignoring the customer’s preferences. It was more aligned with Shell than the customer.
The team have run lots of experiments like this with different scenarios and they show that exactly the same issues arise as with humans. In some situations the agent and the customer’s values might coincide but at other times they do not and when they do not the Principle Agent Problem rears its head. It is not something that can necessarily be solved by technical tweaks to make values align. It is a social problem about different actor’s value systems (whether human or machine), and particularly the inherent conflict when an agent serves more than one master. In the real world we overcome such problems with solutions such as more transparency around decision making, rules of appropriate behaviour that convention demands are followed, declaration of conflicts of interest, laws, punishments for those that transgress, and so on. Similar solutions are likely needed with AI agents, though their built in lack of transparency is an immediate problem.
Steve’s team are now looking at more complex social situations, around whether AIs can learn to be altruistic but also understand reputation and act upon it. Can they understand the need to punish transgressors, for example?
Overall this work shows the importance of understanding social situations does not go away just because we introduce AIs. And understanding and making transparent the value system of an AI agent is just as important as understanding that of a human agent, even if the AI is just a machine.
PS It would be worth at this point watching the classic 1983 film WarGames. Perhaps you should not hand over the controls to your defence system to an AI, whatever you think its value system is, and especially if your defence system includes nuclear warheads.
More on …
Magazines …
EPSRC supports this blog through research grant EP/W033615/1.