As Easy As A Bee Sees

A bee sitting on the leaves of a blossom tree in Blackheath
Bee on blossom by Jodiepedia, Public Domain Dedication (CC0) via Flickr.

If it weren’t for the bees we would be in trouble. In the worst case, life on Earth could go the way of Mars. No plants, no animals, no life. Bees are the main way that flowers get pollinated. As the bees sup the nectar they carry pollen from flower to flower, allowing new generations of flowers to grow. But the way a flower looks to our eyes isn’t the same way a bee sees it. For example, bee vision works into the ultraviolet part of the spectrum, and under the correct lighting in a laboratory the wonderful, normally invisible, patterns that bees can see are revealed. Biologists all over the world have been collecting information about the sorts of patterns that particular flowers display. This information is called a spectral profile, and Samia Faruq, a computer science undergraduate at Queen Mary, University of London, has done her bit to help these scientists peer into the world of the bees.

Her project involved creating a massive online database containing worldwide spectral profile information, so scientists can search this information easily. They can also combine information to help discover new facts using a method called clustering, where the computer pulls together all the data with similar properties.

Samia enjoyed the project: “I met and worked with amazing biologists during the project. It was great to find out what they needed and to be able to create it for them. I got the chance to collaborate and publish material together with them too. To know it will be used in their research is also very rewarding.”

Peter W McOwan, Queen Mary University of London


More on

Related Magazines …

A hoverfly on a leaf

Ancient Egyptian Binary

A segment of the Rhind Mathematical Papyrus, unknown (c. 2000 B.C), Public domain, via Wikimedia Commons

What are the origins of binary? It is the way of representing numbers (and all other data) that underpins all digital computers, which might suggest it is a very modern idea. Binary may be linked to modern technology, but it goes back a long way. Leibniz designed machines based on it hundreds of years ago, inspired by its use in the I Ching thousands of years before that. Even the Ancient Egyptians used a form of binary around 4000 years ago.

A papyrus called the Rhind Mathematical Papyrus, found near Luxor and dating from around 1550 BC, makes use of binary. It is actually a copy of a much older, long-lost document from about 4000 years ago. It shows how to solve a variety of mathematical problems in arithmetic, algebra and geometry. It does not introduce binary explicitly, but it does give a way to do multiplication that uses a binary representation and is the basis of binary multiplication.

Any number can be made up by adding together powers of 2 (i.e. adding some combination of 1, 2, 4, 8, 16, …). So, for example, 6 is just 2+4; 7 is 1+2+4; 11 is 1+2+8; 13 is 1+4+8, and so on. That mathematical fact is the basis of the binary representation of numbers: a number is replaced by 1s and 0s that indicate which powers of 2 to include in the addition. 13 is 1101 in binary. Each column in the binary number stands for a power of 2.

8 4 2 1
---------
1 1 0 1 = 13 because
(8x1) + (4x1) + (2x0) + (1x1)
= 8 + 4 + 1
= 13

The first 1 in the binary says DO include 8 in the addition, the second 1 says DO include 4, the 0 says DO NOT include 2 and the final 1 says DO include 1, giving 8 + 4 + 1.

The Egyptians used this idea as the basis of an algorithm to make multiplication easier.

To multiply, say, 13 by 123, you note that 13 is 8 + 4 + 1 (1101 in binary), so 123 x 13 = 123 x (8 + 4 + 1). You therefore do the following series of multiplications, adding the results:

1 x 123 =  123
4 x 123 =  492
8 x 123 =  984
          ----- +
           1599

The Ancient Egyptians were effectively converting one of the two numbers being multiplied to binary to do the multiplication. This way you do not need to learn all the different times tables as we all do at school. Hang on though: doesn’t it mean you still have to do lots of hardish multiplications like 8 x 123? In fact, all you need to be able to do is double numbers, so know your 2 times table! Why? Because each row can be calculated by doubling the previous row’s answer, as long as you work out all the rows rather than missing out the ones not needed in the final addition. So here is the calculation again, this time including 2 x 123 but leaving it out of the final addition:

1 x 123 =  123
2 x 123 =  246   (above answer x 2; not added in)
4 x 123 =  492   (above answer x 2)
8 x 123 =  984   (above answer x 2)
          ----- +
           1599
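In modern terms the whole method is a short algorithm. Here is a sketch of it in Python (the function name is mine): it loops through the binary digits of the first number, doubling as it goes and adding in the rows that are needed.

```python
def egyptian_multiply(a, b):
    """Multiply a x b using only doubling and addition,
    as in the Rhind Mathematical Papyrus method."""
    total = 0
    row = b                  # the "1 x b" row
    while a > 0:
        if a % 2 == 1:       # this power of 2 is part of a, so add the row in
            total += row
        row += row           # double: the only "times table" needed
        a //= 2              # move to the next binary digit of a
    return total

print(egyptian_multiply(13, 123))   # 1599
```

Notice the only arithmetic used is doubling (`row += row`) and addition, exactly as in the papyrus.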

They used a similar algorithm to do division too, one that involved multiplying one number by all the powers of 2 in this way. Perhaps you can work it out.

Doubling in binary is actually very easy: you just shift the number one place to the left, adding a 0 on the end (just as we do when multiplying by 10 in decimal). That is why this trick of turning all multiplication into doubling is a good thing for the ALU of a computer to do to multiply!
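You can see this shift trick directly in Python, where `0b…` is the notation for writing a number in binary and `<<` shifts left:

```python
x = 0b1111011          # 123 written in binary
print(bin(x << 1))     # one shift left doubles it: 0b11110110 is 246
print(x << 3)          # three shifts = three doublings: 8 x 123 = 984
```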

The Ancient Egyptians may not have explicitly written numbers in binary, and they missed the trick of turning both numbers into binary to make the doubling itself easy, but they did use binary and converted numbers to it to make arithmetic easier. That is why, if you were an Ancient Egyptian administrator, having a copy of the Rhind Mathematical Papyrus would have helped you pass your exams and then do the job.

More on …

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


When an app becomes part of prayer

How do you mix religion and technology? Riasat Islam, a Computer Science lecturer at Queen Mary University of London, tells us about his research as part of a team investigating how technology can best support faith.

Around one in four people in the world are Muslim. That is about two billion people and many now use mobile apps as part of everyday religious life. These apps can show prayer times, provide Qur’an reading, list dates for fasting, suggest supplications, or help find the Qibla: the direction of the Kaaba in Makkah, which Muslims face during prayer.

This may sound like a small corner of the app world, but it is not. Some Islamic lifestyle apps have reached tens of millions of users. Muslim Pro, one of the best-known examples, reports more than 190 million downloads worldwide. Its parent company, Bitsmedia, has also raised US$20 million in an early round of funding. So Islamic apps are not tiny side projects. They are part of a large digital ecosystem used by millions of people, but they can still go unnoticed in mainstream technology research.

That is what made us interested. Our research asked a simple question:

How should technology be designed when it supports something as personal as faith?

We reviewed 11 popular Islamic lifestyle apps and interviewed Muslim app users about their experiences. We didn’t only look at what features the apps offered. We wanted to understand how those features supported religious practice, learning, motivation, and connection.

Many apps were good at providing information. Prayer times, Qur’an text, Qibla tools, supplications, Islamic dates, and reminders were common. These can be genuinely useful, especially when someone is travelling, studying, working, or living in a place where prayer times and mosque access are not part of everyday public life.

Is information enough?

But information alone is not always enough. A reminder can tell someone that it is time to pray. A tracker can record Qur’an reading or fasting days. A calendar can list important dates. These are useful features, but they do not automatically help someone understand, reflect, or grow.

That is where Human-Computer Interaction, or HCI, becomes important. HCI studies how people interact with technology. It asks whether technology fits into people’s lives, supports their goals, and respects what matters to them. For Islamic lifestyle apps, and for the growing area of Islamic Computing, this matters because the technology is entering a sensitive space: faith, worship, identity, learning, habit, and community.

Reminders

One issue is reminders. A prayer reminder can be helpful at the right moment. But if it becomes just another phone alert, it may fade into the background. If it is too forceful, it may feel uncomfortable. Good design means thinking carefully about timing, tone, and context.

Tracking

Another issue is tracking. Some apps let users track prayers, Qur’an reading, or fasting. This can support consistency, but it can also reduce spiritual practice to streaks, badges, or numbers. Worship is not the same as a fitness challenge. A better design might support reflection: helping users set personal goals, continue learning, or return gently after missing a routine.

Community also matters

Some apps let users share Islamic quotes or images. That can be useful, but it is not the same as learning with others or asking questions in a trusted space. Many Muslims learn religion through teachers, family, mosques, study circles, and scholars. Apps could do more to support trusted learning and connection, while also handling privacy and misinformation carefully.

Thinking more widely

The wider point is not only about Islamic apps. Computer scientists now design technology for health, education, wellbeing, accessibility, relationships, and faith. In these areas, success is not just whether the software works. The deeper question is whether it supports people well.

  • Does it respect the user’s values?
  • Does it help them understand?
  • Does it support meaningful progress?
  • Does it connect them to trustworthy help?
  • Does it fit into real life?

A prayer app can tell you the time. A better-designed Islamic lifestyle app might help you practise, learn, reflect, and connect, without getting in the way of the spiritual life it is trying to support.

Riasat Islam, Queen Mary University of London

More on

Getting Technical

  • Read Riasat’s team’s journal paper
    • Kabir, M., Kabir, M. R. and Islam, R. (2025). Islamic Lifestyle Applications: Meeting the Spiritual Needs of Modern Muslims. International Journal of Human–Computer Interaction. DOI: 10.1080/10447318.2025.2595545. (Taylor & Francis Online)
  • How the Global Religious Landscape Changed From 2010 to 2020 [EXTERNAL]
    • Hackett, C., Stonawski, M., Tong, Y., Kramer, S., Shi, A. F. and Fahmy, D. (2025). How the Global Religious Landscape Changed From 2010 to 2020. Pew Research Center. DOI: 10.58094/fj71-ny11. (Pew Research Center)

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


Your magnetic personality

Since their discovery over a century ago, X-rays have become invaluable in the medical world, allowing doctors to see inside our bodies. Whilst the basic technology of taking medical X-rays is unchanged – essentially taking a photograph of the shadow left when X-rays are shone through the body – X-rays have entered the computer age. From digital X-rays that work like digital cameras, dispensing with the need for film, to tomography that allows 3D X-ray pictures of the body to be created, computer technology has revolutionised the way doctors peer into our bodies.

One of the most exciting ways to use computers to look into our bodies is called magnetic resonance imaging (MRI). While X-rays are very useful, they only work when dense materials in the body absorb the X-rays’ energy (this loss of energy as they pass through the body is called attenuation). But lots of interesting stuff in our innards isn’t dense or filled with attenuating materials. We are after all flesh and blood, and those don’t show up on X-rays. Enter the magnet, or to be precise the proton. Water makes up the majority of the structures in our body, and water has an interesting property. In a high magnetic field the protons in its hydrogen atoms act like atomic magnets, and line up with (align with) the applied external magnetic field, like soldiers all standing to attention.

Wobbly Protons

We can then apply a radio wave; this radio wave has a magnetic element to it (that’s why it’s called an electromagnetic wave), and if we apply the radio wave with just the right frequency those aligned protons start to wobble. It’s like pushing a kid on a swing. If you keep pushing at the right time during the swing they will go higher and higher. It’s called resonance. Similarly if we hit the protons with the correct resonant frequency they start to spin round and round and round, and fall over. Take away the radio wave and the protons start to align themselves with the magnetic field again, like soldiers trying to regain their dignity, but as they do they give off energy in the form of a radio wave that we can pick up and measure.

Back to attention

Now rather than put the same magnetic field over all the protons we put a slope on the field, a gradient, with different magnetic fields in different places. Each proton then has its own little magnetic environment, and so its own resonant frequency. Now since we know what the magnetic field is at different places (as we put the slope on) we can fire the right radio wave frequency to unbalance only protons in certain places. After the pulse, as these protons try to realign, the strength of the signal they give out is in proportion to the number of protons in that area. In effect we can measure the amount of water in a particular area.
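To make the position-to-frequency idea concrete, here is a rough sketch in Python. The constant of about 42.58 MHz per tesla for hydrogen protons is real (the Larmor frequency), but the field and gradient values below are ballpark illustrations, not any real scanner’s settings:

```python
GYROMAGNETIC_MHZ_PER_TESLA = 42.58   # resonant frequency of hydrogen protons

def resonant_freq_mhz(field_tesla):
    """Larmor frequency: resonance is proportional to the local field."""
    return GYROMAGNETIC_MHZ_PER_TESLA * field_tesla

# A 1.5 T main field with a small gradient of 0.01 T per metre:
for position_m in (-0.2, 0.0, 0.2):
    local_field = 1.5 + 0.01 * position_m
    print(f"{position_m:+.1f} m -> {resonant_freq_mhz(local_field):.4f} MHz")
```

Because each position has its own local field, each has its own resonant frequency, so firing a pulse at one frequency only unbalances the protons at one position.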

Magnets in the body

So how does this let us image inside bodies? In an MRI (Magnetic Resonance Imaging) scanner we have a big external magnetic field, and on top of this we add a smaller magnetic gradient across the body. That means each location across the body has a different local magnetic field. We then apply a radio pulse, at a range of resonant frequencies, and for each pulse we measure how much signal we get back. From this we can reconstruct an image of the water (proton) distribution across the body. We can also use tomographic techniques: by rotating the magnetic gradient around the body we can produce image slices and build them up into a 3D picture. The water content is different in different body tissues, so the scan shows us all the soft interesting stuff, where it is and what shape it’s in. Better still, the way that the protons realign after they have been toppled is dependent on the chemicals around them, so we can even get some data on that too by looking at the speed at which they realign and give off their energy.

Blood in the brain

Blood contains a great deal of water, and blood that is oxygenated (contains oxygen) has different magnetic properties to blood that is deoxygenated. So by looking at where blood is giving up its oxygen we can see for example which parts of the brain are active when you look at colours, or listen to sounds. This is called functional imaging or fMRI, and through the wonders of computers and new algorithms we can now see not just structure inside the body, but also how it is working.

Paul Curzon, Queen Mary University of London


Related Magazine …

Silly Sequences and Data Compression

Here is a fun challenge for you: can you find the rule behind, and the next term of, this sequence?

1
11
21
1211
111221
312211

This is not just some maths exercise – it also gives a clue to a clever way of compressing data! Have you figured it out? What if I told you this is called the “look-and-say” sequence? Say each row out loud in turn…

Here’s the answer: the next term in the sequence is 13112221. Each term describes the previous one. The last row, 312211, read aloud becomes “one three, one one, two twos, two ones”, which when written down with numbers is 13112221!

The sequence doesn’t necessarily need to start with 1: try picking a different starting number (a seed) and see where you end up!
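To experiment with different seeds, here is a small Python version of the look-and-say rule (the function name is mine). It scans each run of identical digits and writes down the count followed by the digit:

```python
def look_and_say(term):
    """Read the digits of term aloud: '312211' -> '13112221'."""
    result = []
    i = 0
    while i < len(term):
        run = 1
        while i + run < len(term) and term[i + run] == term[i]:
            run += 1                       # count the run of identical digits
        result.append(str(run) + term[i])  # e.g. "three 1s" -> "31"
        i += run
    return "".join(result)

term = "1"                                 # try other seeds here
for _ in range(7):
    print(term)
    term = look_and_say(term)
```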

If you’re perplexed by this sequence, you’re in good company. The look-and-say sequence, while not invented by him, was analysed in depth by the mathematician John Conway after it was introduced to him at a party: a party where maths was discussed openly – sign me up!

Conway is famous for his Game of Life [EXTERNAL]: a cellular automaton which uses a few simple rules to create seemingly living patterns. We won’t go into detail on it here, but interestingly, if you look more at Conway’s Game of Life you might start to see some similar features – compare the look-and-say sequence starting 22… with a 2×2 block on the Life grid.

In his investigations, Conway discovered some very interesting properties of the look-and-say sequence. One of these is the interesting fact that no digits other than 1, 2, and 3 will appear in the sequence (unless the seed number contains such a digit or a run of more than three of the same digit). 

Take the sequence above starting 1, 11, etc. – this will never contain a digit 4 or above. Can you figure out why?

We can understand this by going backwards: what would we need to get a digit 4 in one of the terms of the sequence? We’d need four of the same digit in a row in the previous term (e.g. 1111). But that can never be generated. For 1111 to appear, it would have to be the reading of an earlier term as “one 1, followed by one 1”. The reading process never splits up a run of identical digits like that, though: two 1s next to each other are always read together, as “two 1s”, and so written 21, not 1111.

Compress it?

You’re perhaps wondering how this links into computer science? Imagine a black-and-white image stored with 0s and 1s where 0 is a white pixel and 1 is a black pixel: 0000111111001111… Storing this as a long sequence might seem a bit inefficient though, so let’s try applying a look-and-say methodology to this.

We would end up with 40 (“four 0s”), 61, 20, 41, or written in full 40612041. Using just two digits to store the data is especially convenient: since we know that the runs of 0s and 1s always alternate, we can even remove the digits themselves and just store the count of each run: 4624…, a much shorter sequence to store compared to our original.

This style of compression is called Run-Length Encoding and is especially useful when you have files with long sequences of identical data (like the black-and-white binary image in our example).
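Here is a minimal Python sketch of the idea for an alternating bit string (function names are mine, and the decoder assumes the string starts with a 0, as in the example above):

```python
def rle_encode(bits):
    """Store only the run lengths, since runs of 0s and 1s alternate."""
    runs = []
    i = 0
    while i < len(bits):
        run = 1
        while i + run < len(bits) and bits[i + run] == bits[i]:
            run += 1
        runs.append(run)
        i += run
    return runs

def rle_decode(runs, first="0"):
    """Rebuild the bit string from the run lengths."""
    out, bit = [], first
    for run in runs:
        out.append(bit * run)
        bit = "1" if bit == "0" else "0"   # runs alternate
    return "".join(out)

image = "0000111111001111"
runs = rle_encode(image)
print(runs)                        # [4, 6, 2, 4]
print(rle_decode(runs) == image)   # True: no information was lost
```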

Of course, it’s not universally efficient. As we’ve seen with the ever-growing look-and-say sequences, if the data doesn’t contain many repeating sequences, the file size may even get bigger: sometimes known as negative compression.

This is yet another example of how mathematics and computer science can go hand-in-hand to solve real problems! Keep looking out for interesting patterns – you never know what you might discover!

Daniel Gill, Queen Mary University of London

More on…

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


The glove that controls your cords…

Ariana Grande has added something new to her sell-out stadium tours. She is controlling her vocals using gloves. Yep, gloves! To add reverb to her voice, Ariana pinches her thumb and forefinger. She changes background sounds with a sweep of the hand.

Imogen Heap, a Grammy award winning UK recording artist with a passion for technology, is behind the gesture control gloves that Florida born pop diva Ariana is wowing audiences across the world with.

Using technology to augment and change vocals is not new: sound engineers with banks of buttons and sliders have manipulated and improved performances for years. But now artists can do it for themselves, using wearable tech with Bluetooth to control their sounds live.

So puff out your chest, robin, and hear the humans notch up the sound gymnastics: we are not just limited to our vocal cords. Have a go at making wearables that control sound yourself. Maybe try Sonic Pi with a BBC micro:bit, and search for the BBC’s ‘Strictly micro:bit live lesson’ for more on making your own wearable tech.

Jane Waite, QMUL / Raspberry Pi

More on …


Related Magazine …


Subscribe to be notified whenever we publish a new post to the CS4FN blog.


How do you sleep? (Like a parrot or a tortoise?) #Fitbit

Google’s Fitbit is a smart wristwatch which doesn’t just tell you the time but can also monitor your movements and your heartbeat. A particular time of day when your heartbeat slows down and you move much less is at night, when you’re fast asleep in bed.

Not everyone sleeps well though. Some people struggle to get to sleep, wake up often during the night and so feel tired during the day. Fitbit’s “Sleep Profiles” is an AI-supported sleep tracking tool (available to Premium subscribers) that may be able to help them. If the sleeper regularly wears their watch in bed it can monitor their sleep, build up a picture of how long it takes them to fall asleep and how often they wake up, and offer some suggestions on how to get a better night’s rest.

So far Google has analysed 22 billion hours of sleep data from Fitbit users (who all agree to share their information so that they and everyone else can benefit from that shared knowledge). They used unsupervised machine learning to find out more about the data. This method gives an artificial intelligence lots of information but doesn’t tell it what to do with it. Instead the AI was asked to cluster similar data together for the scientists to analyse and interpret. The result was six clusters of data showing the most common different ways that people sleep.
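Fitbit’s actual analysis is far more sophisticated, but the clustering idea can be sketched in a few lines. Here is a toy k-means clusterer in Python on made-up sleep numbers (hours asleep, minutes to fall asleep): it repeatedly assigns each night to its nearest cluster centre, then moves each centre to the average of its group.

```python
# Made-up data: (hours asleep, minutes to fall asleep) for six nights.
nights = [(7.8, 10), (8.1, 12), (7.5, 15),    # sound sleepers
          (5.9, 45), (6.2, 50), (5.5, 40)]    # restless sleepers

def kmeans(points, k, steps=10):
    # Start the centres at k spread-out points (real systems choose
    # starting points more carefully, e.g. with k-means++).
    centres = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(steps):
        # Assign each point to its nearest centre...
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                                + (p[1] - centres[c][1]) ** 2)
            groups[nearest].append(p)
        # ...then move each centre to the average of its group.
        centres = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centres[j]
                   for j, g in enumerate(groups)]
    return groups

for group in kmeans(nights, 2):
    print(group)
```

Run it and the six nights separate into the two sleep styles, with no labels ever given: that is what “unsupervised” means.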

To make it easy for users to understand what the data meant, and how closely their own sleep pattern matched one of the clusters, Fitbit named each cluster after an animal. They took a bit of care over selecting the animals, as they wanted people to have positive associations (no one wants to be called a sloth, for example!), and came up with bear 🐻, tortoise 🐢, dolphin 🐬, giraffe 🦒, parrot 🦜 and hedgehog 🦔. People’s ‘sleep animals’ don’t stay the same though (just like our sleep) and you might be a dolphin one month and a tortoise the next. Tortoise-sleepers spend longer in bed but also take longer to fall asleep, and dolphin-sleepers sleep very lightly and tend to spend more time awake in bed.

Elena Perez, one of the product managers for Fitbit, said that parents of little children had told her that their children had seen the icon of the sleep animal appear on a parent’s watch and knew that it was time to go to bed. Sweet dreams…

Did you know?

Dolphins and many birds use ‘unihemispheric sleep’ which means that one half of their brain (like humans their brains are also divided into two hemispheres) falls asleep first and the other stays awake. Then the hemispheres swap over!

Jo Brodie, Queen Mary University of London


More on…


Related Magazine …

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


Faster fiber

Polina Bayvel, Professor of Optical Communications at UCL, and her team have just set a new speed record for sending data over real-world optical cable. They managed to send about 10 times more data than the best commercial services. Remarkably, this was without changing the cable or other core infrastructure: the record was set over existing fibre running through a city centre, with all the interference that causes and all the grime, wear and tear that comes with real use.

What was the secret? Commercial fibre optics typically use wavelengths of 850, 1300 and 1550 nanometres. That is infrared light, which some animals can see, including some snakes, fish and insects (and vampire bats). However, we need special cameras that convert it to the visible range before we can see infrared. What we can do though is create lasers that send pulses of infrared at these wavelengths. We can also design hardware that turns infrared pulses back into data. Polina’s team developed special hardware that could send data over a much larger range of frequencies of light than the existing commercial systems, using wavelengths between 1264 and 1618 nanometres. By mixing these wavelengths together they could send more data at the same time – but that is only useful if their hardware could extract the separate signals from the mixed-up result at the other end. The test showed that their hardware could do that in the real-world conditions of sending data from their lab in central London out to a data centre at Canary Wharf over the existing cables, and back: around 10 miles in total.

It means that in future we will be able to send far more data over existing cable networks with no need to replace the cables, avoiding the extra time and costs (never mind the road works). The speed of 450 terabits per second is enough to stream 50 million films at the same time. No one actually needs to do that, of course. However, our technologies do seem to voraciously use up whatever capacity we create, and with the ever-increasing use of AI tools and their need for masses of data, it may well be that this ability to send more data is needed sooner than we might think.
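The 50-million-films figure is simple arithmetic, assuming a high-definition stream of roughly 9 megabits per second (my assumption, a typical HD rate):

```python
record_bits_per_s = 450e12        # the record: 450 terabits per second
stream_bits_per_s = 9e6           # assumed HD film stream: 9 megabits per second
print(int(record_bits_per_s / stream_bits_per_s))   # 50000000 simultaneous streams
```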

Paul Curzon, Queen Mary University of London

More on …

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


The Hidden Code in Toy Adverts

A boy in blue and girl in pink playing on a beach
Image by Ben Kerckx from Pixabay

The music in a toy commercial isn’t just background noise. It tells you who the advert is for, and a machine learning model can hear it (even when you barely notice the difference). Luca Marinelli tells us more.

Next time you’re watching TV, try muting the adverts and then turning the sound back on. You’ll probably notice something odd. The music in adverts for dolls and playsets sounds completely different from the music in adverts for action figures and toy cars. One sounds smooth and tuneful. The other sounds loud and chaotic. But here’s the question: is that just your imagination, or is the difference real and measurable? For my PhD research at Queen Mary University of London I decided to find out using machine learning.

I collected over 600 toy commercials from a UK retailer’s YouTube channel, split into three groups: ads aimed at girls, ads aimed at boys, and ads aimed at mixed audiences. Then I fed the soundtracks into a computer program and had it extract dozens of measurements from each one. Not “does this sound nice?” (computers can’t answer that) but more precise numerical values like “how rough does the sound spectrum look”, “how regular is the beat” or “how clearly does this audio sit in a musical key”. Think of it as turning every piece of music into a long list of numbers that each describe a property of it.

Then I trained a type of machine learning model called a classifier to look at those numbers and predict: is this intended as a girls’ ad, a boys’ ad, or a mixed one? The classifier got it right a remarkable 91% of the time when comparing girls-only and boys-only ads. That’s not luck. That’s a genuine, detectable pattern hidden in the sound.

But which measurements were actually doing the work? This is where the research gets interesting, and where a technique called SHAP (SHapley Additive exPlanations) comes in. SHAP is a way of asking a machine learning model to explain its own decisions. Instead of just getting a yes/no answer, you can ask: “which features pushed you towards saying this was a girls’ ad, and which ones pushed you the other way?” It’s a bit like asking a judge not just for a verdict, but for their full reasoning.

What SHAP revealed was striking. Ads targeting girls consistently had higher harmonicity, meaning the sounds fit together into clear, pleasant musical patterns, and more rhythmic regularity, meaning the beat was steady and predictable. Their audio spectrum (a kind of fingerprint of all the frequencies present) was also broader and smoother. Boys’ ads, by contrast, scored higher on spectral roughness (sounds that are abrasive) and spectral entropy (a measure of how chaotic or unpredictable the sound is). They were also simply louder. In plain terms: girls’ ads sound harmonious and organised. Boys’ ads sound noisy, aggressive, and jagged. And a machine learning model can tell the difference with 91% accuracy just from the audio alone, without seeing a single frame of video. These patterns almost certainly aren’t accidental. Marketers are making deliberate choices about music to signal who a product is “for”. The sound itself carries a hidden message.

We showed how AI can be used to hold up a mirror to human behaviour. When we use explainable AI we can spot patterns in the world that are so familiar we’ve stopped noticing them. The music in a toy advert might seem trivial, but if an algorithm can reliably predict the intended audience just from the soundtrack, that tells us something important: gender stereotypes aren’t just visible, they’re audible too.

Luca Marinelli, Queen Mary University of London

More on …

Getting Technical…

Subscribe to be notified whenever we publish a new post to the CS4FN blog.


Humanity’s Last Exam

Generative Artificial Intelligences (GenAI) can now pass exams we set for humans and even do better than many humans. They can do that even without being able to think in the way a human does, and certainly without being conscious. They are learning to reason, and are combining that with having hoovered up all the knowledge we have generated and recorded, whether on the web or elsewhere. In effect, they use it to predict what comes next. In an exam, what comes next after a question is the answer, so that is what they generate. But how good are they at doing that, really? As good as a good school student? As good as a university student? A PhD student? A professor? Better than any human? Is there any question we could come up with, as examiners representing the human race, that a GenAI couldn’t answer? The SafeAI Benchmark Competition “Humanity’s Last Exam” is an attempt to find out.

Computer systems, including AI-based ones, are typically evaluated using benchmark questions that assess their intelligence and performance. They are the equivalent of big standardised exams. However, as AI models have rapidly advanced, existing benchmarks have become too easy. The “Humanity’s Last Exam” competition aimed to change this by collecting a new benchmark set of exceptionally difficult questions. The aim was to push artificial intelligence to its limits by challenging it with truly expert-level questions. To stack the deck in our favour, any AI aiming to pass needed to be an expert in every subject, not just one or two!

Experts from across the disciplines were challenged to come up with questions in their area that they thought an AI would not be able to answer. The competition was a big success. It attracted more than 1,000 researchers and other experts. They submitted questions (with the correct answers) spanning over 100 different subjects. From all these suggested questions a solid set was selected in three stages.

First came AI evaluation: five of the best AI models of late 2024 attempted each question. If they all failed, the question advanced to the next stage. Second came expert review: human experts refined and assessed the questions and answers. They had to make sure that each question had a known answer that they were sure was correct. The questions also had to be clear: they couldn’t be ambiguous, with more than one answer that might be considered correct. Finally came the final selection: a panel of experts and organisers made the final call on which questions were actually to be used.

Out of over 70,000 questions submitted to stage 1, only 2,500 made it into the final benchmark, with the top 50 declared winners and the people submitting those questions earning prizes. In addition, they were invited to become co-authors of the research paper accompanying the competition.

Two computer scientists from QMUL, Søren Riis and Marc Roth contributed multiple questions to the competition, and despite how many questions failed to make the grade, both were joint winners. Moreover, one of Marc’s questions was selected to be featured in the Nature paper about the results. 

But what does a good question look like? To see, let’s look at one of Marc’s selected questions. It concerned the process of “discovering” a network, meaning visiting all the nodes of an unknown network. What does this involve? Imagine a mouse is placed in a maze and starts to explore it. The maze is a kind of network with nodes (the junctions) and edges (the paths between them). The mouse, as it explores, is discovering that network. Suppose it does it randomly. Whenever it reaches a junction, it chooses one of the outgoing directions totally at random and continues exploring in that direction. We are interested in several things: how long will it take the mouse, on average, to explore the entire maze? How often will any specific location be visited? And how likely is it for the mouse to be at any specific location at the end of its exploration?

The AIs were asked about a variation of this in which the mouse uses a specific but cleverer random strategy, given in the question, rather than just choosing a direction totally at random at each junction. The AIs had to predict the behaviour of a mouse following this new strategy on different types of mazes. Surprisingly perhaps, even the best AIs at the time of the competition (2024) were unable to solve the problem correctly. They all claimed that the updated strategy makes no difference to the overall behaviour compared to the original naive random strategy, in terms of the things of interest (like time taken). This is wrong, as there are actually clear differences in the behaviour resulting from the two strategies. That was something that Marc himself was able to correctly work out. Humans: 1 (well, at least if you are Marc), AIs: 0.
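You can get a feel for the basic version of the problem by simulating the naive random walk yourself. The maze below is my own toy example, not the one from Marc’s question: it counts how many steps a randomly wandering mouse needs to visit every junction.

```python
import random

# A toy "maze" as an adjacency list: junctions 0-5 and the paths between them.
maze = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}

def cover_time(graph, start=0, rng=random):
    """Steps a naive random walker takes to visit every junction at least once."""
    node, seen, steps = start, {start}, 0
    while len(seen) < len(graph):
        node = rng.choice(graph[node])   # pick an outgoing path at random
        seen.add(node)
        steps += 1
    return steps

rng = random.Random(42)                  # fixed seed so runs are repeatable
trials = [cover_time(maze, rng=rng) for _ in range(2000)]
print(sum(trials) / len(trials))         # average steps to discover the maze
```

Averaging over many trials like this is exactly the kind of quantity (expected cover time) the question asked the AIs to reason about for a cleverer strategy.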

The first version of the overall benchmark (so AI exam) was set and finalised in early 2025. The best two AIs (OpenAI o1 and Deepseek R1) got about 8% of the questions right. One year later, Gemini 3 Pro achieved a staggering 38.3%! Its true performance might be even better, since the benchmark set might still contain some ambiguous questions with no clear right answer, and some questions where the given expert answers are incomplete or incorrect. This is mainly believed to be a possibility for the text-only chemistry and biology questions: so more work for the chemists and biologists!

Because of the need to continue to work on the questions to make sure they are definitely correct and unambiguous, the “Humanity’s Last Exam” team has now switched to working on the questions on a rolling basis, aiming to improve them over the coming years. The AIs are not going to be free from taking exams for some time to come! But it may not be long before humanity runs out of questions. In the meantime, anyone thinking that human examiners just need to come up with better questions to avoid the problem of students asking AIs to answer questions for them had better think again. Even the best experts in the world are struggling to find questions no AI can answer. And if the AIs can’t answer them this year, there is always next year, or the year after…

Marc Roth and Paul Curzon, Queen Mary University of London

More on…

Getting Technical

Subscribe to be notified whenever we publish a new post to the CS4FN blog.