Why ‘Correct’ Computers Can Be So Wrong

A dark, brooding vending machine — Image by Alexander Antropov from Pixabay

Have you ever followed a recipe perfectly, only to find the cake tastes… well, a bit weird? You followed every rule, measured every ingredient, and checked every step. The recipe said you did it right. But your taste buds, and your grandma’s disappointed sigh, told you something was definitely wrong. In the world of computer science, we have this problem all the time. We call it the difference between verification and validation.

Verification is like checking the recipe. Did you follow the rules exactly as written? A computer program called a “verifier” can look at the code a programmer wrote and check it against the “recipe” (the formal specification). If the code matches the recipe, it gets a big green checkmark. It’s officially “correct.”

But validation is like the taste test. Does the final cake actually taste good? Does it make people happy? This is about asking: did we make the right thing?

The Case of the Annoying Vending Machine

Imagine a new high-tech vending machine. Its specification (its recipe) says: “If a user inserts money and the selected item is out of stock, return the money.”

A programmer builds the machine. A verifier checks it. You put in a pound, press “B2” for salt and vinegar crisps, and it’s out of stock. The machine correctly spits your pound coin back out onto the floor.

Is the machine correct? According to the recipe, yes! Verification passed. Is it the right machine? No! It’s incredibly annoying!

Your expectation as a human was probably completely different. You expected it to say, “Sorry, B2 is out of stock. Please choose something else.” You expected it to hold onto your money and let you make another choice. Cheese and Onion Crisps are just as good. The machine followed the rules, but it completely misunderstood what you, the user, actually wanted. It was correct, but wrong.

This is the frustrating gap where most “bad” software lives. It follows its own strange rules perfectly but seems to have no common sense about what people actually expect.

Teaching the Genie to Read Minds (Almost!)

So how do we fix this? Our research introduces an idea called Semantic Expectation Logic (SEL). It sounds complicated, but the core idea is simple: what if we could teach the computer to understand not just its own recipe, but also the collection of fuzzy, unwritten expectations that people have in their heads?

Think of it like upgrading from a rule-following genie to a mind-reading one.

A rule-following genie gives you exactly what you wish for. If you wish to “be on a flight”, it might put you on the wing of a 747. Technically correct, but not what you expected!

SEL is our attempt to build a “mind-reading” genie for software. We do it in three steps:

Write Down the Rules of the World: We tell the computer about fundamental truths. For the vending machine, a rule might be “Money can only be returned after a transaction is finished or explicitly cancelled by the user.”
Ask People What They Expect: Instead of guessing, we show people what the machine does in different situations. “The machine just spat your coin on the floor. Is that what you expected?” We collect all these “yes” and “no” answers, along with why. We call this “mining expectations”.
Check for Gaps: Our SEL tool then looks at what the machine actually does and compares it to both the “Rules of the World” and the “Mined Expectations”.

It might find:

A Bug: The machine sets itself on fire. (It violates a Rule of the World: “Vending machines should not set themselves on fire.”)
A Validation Failure (The “Aha!” Moment): The machine follows its own rules perfectly but violates a strong expectation. Our tool would flag this: “Warning! 98% of people expected the machine to ask for another choice, but it just returned the money. This is a Surprise!”

We even created a “Surprise Factor” metric that gives a score from 0% to 100% for how surprising a piece of software is. A low score means the software behaves as people expect. A high score means you’ve built a technically “correct” but incredibly annoying vending machine.

By making expectations a central part of the testing process, we can start building software that isn’t just correct according to its own bizarre logic, but is also correct in a way that actually makes sense to the people using it. We can build the cake that not only follows the recipe but also tastes great.

Vasileios Klimis, Queen Mary University of London

Getting Technical

Read the research paper:
- A Logic for Validating Stakeholder Expectations

Subscribe to be notified whenever we publish a new post to the CS4FN blog.

The Proof of Love

A pseudocode poem in a red heart:
Violets are violet
if roses are red and violets are blue
then
Life is sweet
else
I love you — Image by CS4FN

For Valentine’s day we created this card with a pseudocode love poem. But what does it mean? Is it a statement of love or not? Now that Valentines Day is gone and logic and rational thinking are starting to reassert themselves, here is a logical argument of what it means: an informal proof of love.

We are using our own brand of poetic pseudocode here, and what matters is the formal semantics of the language (ie mathematical meaning).

What we will do is simplify the code to code that is mathematically equivalent but far simpler, explaining the semantics as we do. We will by reasoning in a similar way to doing algebra, just replacing equals with equals.

First let’s write the poem out in more program looking notation, making clearer what are statements (actions) and expressions (things that evaluate to a value. In the English reading version we are using the verb TO BE in both ways which confuses things a little.

Let’s make clear the difference between variables (that hold values and values). The colours are all values (data that can be stored) – so we will write them in capitals: RED, BLUE, VIOLET. The flowers are variables (places to store values) so we will leave them as lowercase: violets, roses. Then the title becomes an assignment (written more formally as :=). It sets the value of variable violets to value VIOLET. (We are pedantic and believe the colour of violet flowers is violet not blue!)

What comes after the keyword IF is an expression – it evaluates to a boolean (TRUE value or FALSE value) so is referred to as a boolean expression. You can think of boolean expressions as tests – true or false questions. We will use == to indicate a true/false question about whether two things are the same value.

The other two lines are statements – think of them as print statements that print a message as the action they do.

The algorithm of the poem then is:

violets := VIOLET;
IF ((roses == RED) AND (violets == BLUE))
THEN
    PRINT "Life is Sweet"
ELSE
    PRINT "I love you"

Now let us simplify the algorithm to an equivalent version a step at a time. In the assignment of the first line, the variable violets is set to value VIOLET and then not changed, so we can substitute the value (a colour in uppercase) VIOLET for variable violets (in lowercase) where it appears, and remove the assignment. This gives a simpler version of the algorithm that does exactly the same:

IF ((roses == RED) AND (VIOLET == BLUE))
THEN
    PRINT "Life is Sweet"
ELSE
    PRINT "I love you"

Now in the boolean expression, we have the subexpression

(VIOLET==BLUE)

but these are two different values and this is asking whether they are the same or not. It evaluates to true if they are the same and FALSE if they are different. It is therefore equivalent to FALSE so the whole expression becomes:

(roses==RED) AND FALSE

The original algorithm is equivalent to

IF ((roses == RED) AND FALSE)
THEN
    PRINT "Life is Sweet"
ELSE
    PRINT "I love you"

Now whatever X is a boolean expression (X & FALSE) will always evaluate to FALSE because it needs both to be TRUE for the whole to be TRUE. The whole boolean expression therefore simplifies to FALSE and the algorithm to:

IF (FALSE)
THEN
    PRINT "Life is Sweet"
ELSE
    PRINT "I love you"

Notice how this means it does not matter what colour the roses are at all. Whether roses are red, white, blue or something else, the algorithm result will not be affected.

The semantics of an IF statement is that it evaluates exactly one of its two statements. Where its test (boolean expression) evaluates to TRUE, it executes the first THEN branch (here the Life is Sweet branch) and ignores the other ELSE branch (the I love you branch). Where its test evaluates to FALSE it instead ignores the THEN branch completely and executes the ELSE branch.

Here the test is FALSE, so it ignores the THEN branch and is equivalent to the ELSE branch. The algorithm as a whole is exactly equivalent to the far simpler algorithm:

PRINT "I love you"

Going back to the poem, it is therefore logically completely equivalent to a poem
I Love You

We have given a mathematical proof of love! The idea though is that only computer scientists with a formal semantics (ie maths meaning) of the pseudocode language used would see it!

The Alien Cookbook

An alien looking on distraught that two bowls of soup are different, one purple, one green. — Image by CS4FN from original soup bowls (thick and thin) by OpenClipart-Vectors and alien image by Clker-Free-Vector-Images from Pixabay

How to spot a bad chef when you’ve never tasted the food (OR How to spot a bad quantum simulator when you do not know what the quantum circuit it is simulating is supposed to do.)

Imagine you’re a judge on a wild cooking competition. The contestants are two of the best chefs in the world, Chef Qiskit and Chef Cirq. Today’s challenge is a strange one. You hand them both a mysterious, ancient cookbook found in a crashed spaceship. The recipe you’ve chosen is called “Glorp Soup”. The instructions are very precise and scientific: “… Heat pan to 451 degrees. Stir counter-clockwise for exactly 18.7 seconds. … Add exactly 3 grams of powdered meteorite (with the specified composition). …” The recipe is a perfectly clear algorithm, but since no human has ever made Glorp Soup, nobody knows what it’s supposed to taste, look, or smell like. Both chefs go to their identical kitchens with the exact same alien ingredients. After an hour, they present their dishes.

Chef Qiskit brings out a bowl of thick, bubbling, bright purple soup that smells like cinnamon.
Chef Cirq brings out a bowl of thin, clear, green soup that smells like lemons.

Now you have a fascinating situation. You have no idea which one is the “real” Glorp Soup. Maybe it’s supposed to be purple, or maybe it’s green. But you have just learned something incredibly important: at least one of your expert chefs made a mistake. They were given the exact same, precise recipe, but they produced two completely different results. You’ve found a flaw in one of their processes without ever knowing the correct answer.

This powerful idea is called Differential Testing.

Cooking with Quantum Rules

In our research, the “alien recipes” we use are called quantum circuits. These are the step-by-step instructions for a quantum computer. And the “chefs” are incredibly complex computer programs called quantum simulators, built by places like Google and IBM.

Scientists give these simulators a recipe (a circuit) to predict what a real quantum computer will cook up. These “dishes” could be the design for a new medicine or a new type of battery. If the simulator-chef gets the recipe wrong, the final result could be useless or even dangerous. But how do you check a chef’s work when the recipe is for a food you’ve never tasted? How do you test a quantum simulator when you do not know exactly what a quantum circuit should do.

FuzzQ: The Robot Quantum Food Critic

We can’t just try one recipe, one quantum circuit. We need to try thousands. So we built a robot “quantum food critic”, a program we call FuzzQ. FuzzQ’s job is to invent new “alien recipes” ie quantum circuits and see if the two “chefs” cook the same dish (i.e. different simulators do the same thing when simulating it). This process of trying out thousands of different, and sometimes very weird, recipes is called Fuzzing.

Here’s how our quantum circuit food critic works:

It writes a recipe: FuzzQ uses a rulebook for “alien cooking” to invent a new, unique, and often very strange quantum circuit.
It gives the recipe to both chefs: It sends the exact same quantum circuit to “Chef Qiskit” (the Qiskit simulator) and “Chef Cirq” (the Cirq simulator).
It tastes the soup: FuzzQ looks at the final result from both. If they’re identical, it assumes they’re correct. But if they do different things, so one did the equivalent of make a purple, bubbling soup and the other made the equivalent of a clear, green soup, FuzzQ sounds the alarm. It has found a bug!

We had FuzzQ invent and “taste-test”, so check the results of, over 800,000 different quantum recipes.

The Tale of the Two Ovens

Our robot critic found 8 major types of quantum “cooking” errors. One of the most interesting was for a simple instruction called a “SWAP”, which was discovered by looking at how the two chefs used their high-tech “ovens”.

Imagine both chefs have an identical oven with two compartments, a Top Oven and a Bottom Oven. They preheat them according to the recipe: the Top Oven to a very hot 250°C, and the Bottom Oven to a low 100°C. The recipe then has a smart-oven command:

“Logically SWAP the Top Oven and Bottom Oven.”

Both chefs press the button to do the “SWAP”.

Chef Cirq’s oven works as expected. It starts the long process of cooling the top oven and heating the bottom one.
Chef Qiskit’s oven, however, is a “smarter” model. It takes a shortcut. It doesn’t change the temperatures at all but just swaps the labels on its digital display so that the one at the top previously labelled the Top Oven is now labelled as the Bottom Oven, and vice versa. The screen now lies, showing Top Oven: 100°C and Bottom Oven: 250°C, even though the physical reality is the opposite: the one at the top is still the incredibly hot, 250°C and the one below it is still 100°C.

The final instruction is:

“Place the delicate soufflé into the physical TOP OVEN.”

Chef Cirq opens his top oven (ie the one positioned above the other and labelled Top Oven), which is now correctly at 100°C, having cooled down, and bakes a perfect soufflé.
Chef Qiskit, trusting his display, opens his top oven (ie the one positioned above the other but internally now labelled Bottom Oven) and puts his soufflé inside. But that physical oven that is at the top is still at 250°C. A few minutes later, he has a burnt, smoky crisp.

Our robot judge, FuzzQ, doesn’t need to know how to bake. It just looks at the two final soufflés. One is perfect, and the other is charcoal. The results are different, so FuzzQ sounds the alarm: “Disagreement found!”

This is how we found the bug. We didn’t need to know the “correct temperature”. We only needed to see that the two expert simulators, when given the same instructions, produced two wildly different outcomes. Knowing something now is amiss, further investigation of what each quantum simulator did with those identical instructions, can determine what actually went wrong and the problematic quantum simulator improved. By finding these disagreements, we’re helping to make sure the amazing tools of quantum science are trustworthy.

Vasileios Klimis, Queen Mary University of London

Getting Technical …

Read the OOPSLA academic paper

Subscribe to be notified whenever we publish a new post to the CS4FN blog.

This blog is funded by EPSRC on research agreement EP/W033615/1.

	Music AI Kriss Kross… on The day the music didn’t …
	Music AI Kriss Kross… on Separate your stems
	Musical Algorithms… on You’ll be Bach! – create…
	The art of animatron… on I’m (not) a little …
	The Decline and Fall… on Victorian volunteers needed…

Category: Verification and Validation

Why ‘Correct’ Computers Can Be So Wrong

The Case of the Annoying Vending Machine

Teaching the Genie to Read Minds (Almost!)

More on …

Getting Technical

The Proof of Love

More on …

The Alien Cookbook

Cooking with Quantum Rules

FuzzQ: The Robot Quantum Food Critic

The Tale of the Two Ovens

More on …

Getting Technical …