Margaret Hamilton: Apollo Emergency! Take a deep breath, hold your nerve and count to 5

Buzz Aldrin standing on the moon. Image by Neil Armstrong, NASA via Wikimedia Commons – Public Domain

You have no doubt heard of Neil Armstrong, the first human on the moon. But have you heard of Margaret Hamilton? She was the lead engineer responsible for the Apollo mission software that got him there, and ultimately for ensuring the lunar module didn’t crash-land due to a last-minute emergency.

Being a great software engineer means you have to think of everything. You are writing software that will run in the future encountering all the messiness of the real world (or real solar system in the case of a moon landing). If you haven’t written the code to be able to deal with everything then one day the thing you didn’t think about will bite back. That is why so much software is buggy or causes problems in real use. Margaret Hamilton was an expert not just in programming and software engineering generally, but also in building practically dependable systems with humans in the loop. A key interaction design principle is that of error detection and recovery – does your software help the human operators realise when a mistake has been made and quickly deal with it? This, it turned out, mattered a lot in safely landing Neil Armstrong and Buzz Aldrin on the moon.

As the lunar module was in its final descent, dropping from orbit to the moon with only minutes to landing, multiple alarms were triggered. An emergency was in progress at the worst possible time. What it boiled down to was that the system could only handle seven programs running at once, but Buzz Aldrin had just set an eighth running. Suddenly, the guidance system started replacing the normal screens with priority alarm displays, in effect shouting “EMERGENCY! EMERGENCY!” These were coded into the system but were never supposed to be shown, as the situations triggering them were never supposed to happen. The astronauts suddenly had to deal with situations that they should not have had to deal with, and they were minutes away from crashing into the surface of the moon.

Margaret Hamilton was in charge of the team writing the Apollo in-flight software, and was the person responsible for the emergency displays. By including them she was covering all bases, even those that were supposedly never going to arise. She did more than that though. Long before the moon landing she had thought through the consequences if these “never events” did ever happen. Her team had therefore also included code in the Apollo software to prioritise what the computer was doing. In the situation that happened, it worked out what was actually needed to land the lunar module and prioritised that, shutting down the other software that was no longer vital. That meant that despite the problems, as long as the astronauts did the right things and carried on with the landing, everything would ultimately be fine.
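The real Apollo Guidance Computer’s executive is far beyond a short article, but the core idea – when overloaded, shed the low-priority jobs so the vital ones still run – can be sketched in a few lines of Python. This is only an illustration: the task names and priorities below are invented, and the limit of seven comes from the article above, not from studying the actual flight code.

```python
MAX_TASKS = 7  # the "seven programs at once" limit described above

def schedule(tasks):
    """tasks: list of (priority, name) pairs, higher number = more important.
    Returns the tasks kept running and the tasks shed when overloaded."""
    by_priority = sorted(tasks, key=lambda t: t[0], reverse=True)
    return by_priority[:MAX_TASKS], by_priority[MAX_TASKS:]

# Invented task list: the landing jobs outrank everything else,
# so when an eighth task appears it is the least important that gets shed.
tasks = [(9, "guidance"), (9, "engine control"), (8, "landing radar"),
         (7, "display update"), (6, "telemetry"), (5, "attitude hold"),
         (4, "housekeeping"), (2, "rendezvous radar")]
kept, shed = schedule(tasks)
print("running:", [name for _, name in kept])
print("shed:   ", [name for _, name in shed])
```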

Margaret Hamilton. Image by Daphne Weld Nichols, CC BY-SA 3.0 via Wikimedia Commons

There was still a potential problem though. When an emergency like this happened, the displays appeared immediately so that the astronauts could understand the problem as soon as possible. However, behind the scenes the software was also dealing with it, switching between programs and shutting down the ones that were not needed. Such switchovers took time on the 1960s Apollo computer, as computers then were much slower than today. It was only a matter of seconds, but the highly trained astronauts could easily process the warning information and start to deal with it faster than that. The problem was that if they pressed buttons, doing their part of the job by continuing with the landing, before the switchover completed, they would be sending commands to the original code, not the code that was still starting up to deal with the warning. That could be disastrous, and it is the kind of problem that can easily evade testing and only be discovered when code is running live, if the programmers do not deeply understand how their code works and spend time worrying about it.

Margaret Hamilton had thought all this through though. She had understood what could happen, and not only written the code, but also come up with a simple instruction for the astronauts to deal with the pilot and software being out of synch. Because she had thought about it in advance, the astronauts knew about the issue and its solution, and so followed her instructions. What it boiled down to was: “If a priority display appears, count to 5 before you do anything about it.” That was all it took for the computer to get back in synch, and so for Buzz Aldrin and Neil Armstrong to recover the situation, land safely on the moon and make history.

Without Margaret Hamilton’s code and her deep understanding of it, we would most likely now be commemorating the 20th of July as the day the first humans died on the moon, rather than the day humans first walked on it.

– Paul Curzon, Queen Mary University of London



This page is funded by EPSRC on research agreement EP/W033615/1.


You cannot be serious! …Wimbledon line calls go wrong

Image by Felix Heidelberger from Pixabay (cropped)

The 2025 tennis championships are the first time Wimbledon has completely replaced its human line judges with an AI vision and decision system, Hawk-Eye. After only a week it caused controversy, with the system being updated after it failed to call a ball that was glaringly out in a Centre Court match between Brit Sonay Kartal and Anastasia Pavlyuchenkova. Apparently it had been switched off by mistake mid-game. This raises issues inherent in all computer technology replacing humans: that it can go wrong, the need for humans-in-the-loop, the possibility of human error in its use, and what you do when it does go wrong.

Perhaps because it is a vision system rather than generative AI, there has been little talk of whether Hawk-Eye is 100% accurate or not. Vision systems do not hallucinate in the way generative AI does, but they are still not infallible. The opportunity for players to appeal has been removed, however: in the original way Hawk-Eye was used, humans made the call and players could ask for Hawk-Eye to check. Now, Hawk-Eye makes a decision and basically that is it. A picture is shown on screen of a circle relative to the line, generated by Hawk-Eye, to ‘prove’ the ball was in or out as claimed. It is then taken as gospel. Of course, it is just reflecting Hawk-Eye’s decision – what it “saw” – not reality and not any sort of separate evidence. It is just a visual version of the call being shouted. However, it is taken as though it is absolute proof with no argument possible. If it is aiming to be really, really dependable then Hawk-Eye will have multiple independent systems sensing in different ways and voting on the result, as that is one of the ways computer scientists have invented to program dependability. However, whether it is 100% accurate isn’t really the issue. What matters is whether it is more accurate, making fewer mistakes, than human line judges. Undoubtedly it is, so it is an improvement, and some uncaught mistakes are not actually the point.

However, the mistake behind this problem call was different. The operators of the system had mistakenly switched it off mid-match due to “human error”. That raises two questions. First, why was it designed so that a human could accidentally turn it off mid-match? Don’t blame that person, as it should not have been possible in the first place. Fix the system so it can’t happen again. That is what, within a day, the Lawn Tennis Association claimed to have done (whether resiliently remains to be seen).

However, the mistake raises another question. Wimbledon had not handed the match over to the machines completely. A human umpire was still in charge. There was a human in the loop. They, however, we were told, had no idea the system was switched off until the call for a ball that was very obviously out was not made. If that is so, why not? Hawk-Eye supposedly made two calls of “Stop”. Was that its way of saying “I am not working so stop the match”? If it was an actual message to the umpire, it is not a very clear way to make it, and it is guaranteed to be disruptive. It sounds a lot like a 404 error message, added by a programmer for a situation that they do not expect to occur!

A basic requirement of a good interactive system is that the system state is visible. The fact that it was not even switched on should have been totally obvious from the controls the umpire had, well before the bad call. That needs to be fixed too, just in case there is still a way Hawk-Eye can be switched off. It also raises the question of how often the system has been accidentally switched off, or powered down temporarily for other reasons, with no one knowing, because there was no glaringly bad call to miss at the time.
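How the real Hawk-Eye controls work is not public, so the sketch below is purely hypothetical. It just illustrates the two design principles at stake: keep the system state visible, and make a dangerous action like switching tracking off mid-match require a deliberate, confirmed step rather than an accidental press.

```python
class LineCallSystem:
    """Hypothetical controller illustrating visible state and an interlock."""

    def __init__(self):
        self.tracking_on = True
        self.match_in_progress = False

    def status(self):
        # Visible state: something the umpire's display should always show.
        return "line tracking " + ("ON" if self.tracking_on else "OFF")

    def switch_off(self, override_code=None):
        # Interlock: refuse a casual switch-off while play is in progress.
        if self.match_in_progress and override_code != "REFEREE-OVERRIDE":
            raise PermissionError("Cannot disable line calling mid-match "
                                  "without a deliberate override.")
        self.tracking_on = False

system = LineCallSystem()
system.match_in_progress = True
print(system.status())       # the umpire can always see: "line tracking ON"
try:
    system.switch_off()      # an accidental press...
except PermissionError as e:
    print("blocked:", e)     # ...is refused rather than silently obeyed
```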

Another issue is that the umpire supposedly did follow the proper procedure, which was not to just call the point (as might have happened in the past, given he apparently knew “the ball was out!”) but instead to have the point replayed. That was unsurprisingly considered unfair by the player who lost a point they should have won. Why couldn’t the umpire make a decision on the point? Perhaps because humans are no longer trusted at all in the way they were before. As suggested by Pavlyuchenkova, there is no reason why there cannot be a video review process in place so that the umpire can make a proper decision. That would be a way to add back in a proper appeal process.

Also, as was pointed out, what happens if the system goes down completely? Does Wimbledon now have to just stop until Hawk-Eye is fixed: “AI stopped play”? We have seen many examples, over many decades as well as recently, of complex computer systems crashing. Hawk-Eye is a complex system, so problems are likely possible. Programmers make mistakes (especially when making quick fixes to patch other problems, as was apparently just done). If you replace people with computers, you need a reliable and appropriate backup that can kick into place immediately, from the outset. A standard design principle is that programs should help humans avoid mistakes, help them quickly detect the ones they do make, and help them recover.

A tennis match is not actually high stakes by human standards. No one dies because of mistakes (though a LOT of money is at stake), but the issues are very similar in a wide range of systems where people can die – from control of medical devices, to military applications, space, aircraft and nuclear power plant control – in all of which computers are replacing humans. We need good solutions, and they need to be in place before something goes wrong, not after. One issue as systems become more and more automated is that the human left in the loop to avoid disaster has more and more trouble tracking what the machine is doing as they do less and less, making it harder to step in and correct problems in a timely way (as was likely the case with the Wimbledon umpire). The humans need to be not just a little bit in the loop but centrally so. How you do that for different situations is not easy to work out, but as tennis has shown it can’t just be ignored. There are better solutions than the ones Wimbledon is using, but to even consider them you have to first accept that computers do make mistakes, and so know there is a problem to be solved.

– Paul Curzon, Queen Mary University of London



This page is funded by EPSRC on research agreement EP/W033615/1.


Maria Cunitz: astronomer and algorithmic thinker

When did women first contribute to the subject we now call Computer Science: developing useful algorithms, for example? Perhaps you would guess Ada Lovelace in the Victorian era, so the mid-1800s? She corrected one of Charles Babbage’s algorithms for the computer he was trying to build. Think earlier. Two centuries or so earlier! Maria Cunitz improved an algorithm published by the astronomer Kepler and then applied it to create a work more accurate than his.

A stary sky with the milky way
Image by Rene Tittmann from Pixabay

Very few women, until the 20th century, were given the opportunity to take part in any kind of academic study. They did not get enough education, and even if they did, they were not generally welcome in the circles of mathematicians and natural philosophers. Maria, who was Polish, from an educated family of doctors and scientists, was tutored and supported in becoming a polymath with an interest in lots of subjects, from history to mathematics. Her husband was a doctor who was also interested in astronomy, something that became a shared passion, with him teaching her the extra maths she needed. They lived at the time of the Thirty Years War that was waged across most of Europe, a spat turned into a war about religion between Catholic and Protestant countries. In Poland, where they lived, it was not safe to be a Protestant. The couple had a choice of convert or flee, so they left their home, taking sanctuary in a convent.

This actually gave Cunitz a chance to pursue an astronomical ambition based on the work of Johannes Kepler. Kepler was famous for his three Laws of Planetary Motion, published in the early 1600s, describing how the planets orbit the sun. His work built on the new understanding from Copernicus that the planets rotate around the sun and so the Earth is not the centre of everything. Kepler’s work gave a new way to compute the positions of the planets.

Cunitz had a detailed understanding of Kepler’s work and of the mathematics behind it. She therefore spent her time in the convent computing tables that gave the positions of all the planets in the sky. This was based on a particular work of Kepler called the Rudolphine Tables, one of his great achievements stemming from his planetary laws. Such astronomical tables became vital for navigating ships at sea, as the planetary positions could be used to determine longitude. At the time, however, the main use was for astrology, as casting someone’s horoscope required knowledge of the precise positions of the planets. In creating the tables, Cunitz was acting as an early human computer, following an algorithm to compute the table entries. It involved her doing a vast amount of detailed calculation.

Kepler himself spent years creating his version of the tables. When asked to hurry up and complete the work he said: “I beseech thee, my friends, do not sentence me entirely to the treadmill of mathematical computations…” He couldn’t face the role of being a human computer! And yet a whole series of women who came after him dedicated their lives to doing exactly that, each pushing forward astronomy as a result. Maria herself took on the specific task he had been reluctant to complete in working out tables of planetary positions.

Kepler had published his algorithm for computing the tables along with the tables themselves. Following his algorithm, though, was time consuming and difficult, making errors likely. Maria realised it could be improved upon, making it simpler to do the calculations for the tables and making it more likely they were correct. In particular, Kepler was using logarithms for the calculations, but she realised that was not necessary. By sacrificing some accuracy in the results to avoid the risk of larger errors, the version she followed was simpler still. Through algorithmic thinking she had avoided at least some of the chore that Kepler himself had dreaded. This is exactly the kind of thing good programmers do today, improving the algorithms behind their programs so the programs are more efficient. The result was that Maria produced a set of tables that was more accurate than Kepler’s, and in fact the most accurate set of planetary tables ever produced to that point in time. She published them privately as a book, ‘Urania Propitia’, in 1650. Having a mastery of languages as well as maths and science, she, uniquely, wrote it in both German and Latin.

Women do not figure greatly in the early history of science and maths simply because societal restrictions, prejudices and stereotypes meant few were given the chance. Those who were, like Maria Cunitz, showed their contributions could be amazing. It just took the right education, the right opportunities, and a lot of dedication. That applies to modern computer science too, and as the modern computer scientist Karen Spärck Jones, responsible for the algorithm behind search engines, said: “Computing is too important to be left to men.”

– Paul Curzon, Queen Mary University of London



This page and talk are funded by EPSRC on research agreement EP/W033615/1.

Oh no! Not again…

What a mess. There’s flour all over the kitchen floor. A fortnight ago I opened the cupboard to get sugar for my hot chocolate. As I pulled out the sugar, it knocked against the bag of flour which was too close to the edge… Luckily the bag didn’t burst and I cleared it up quickly before anyone found out. Now it’s two weeks later and exactly the same thing just happened to my brother. This time the bag did burst and it went everywhere. Now he’s in big trouble for being so clumsy!

Flour cascading everywhere
Image by Anastasiya Komarova from Pixabay (cropped)

In safety-critical industries, like healthcare and the airline industry especially, it is really important that there is a culture of reporting incidents, including near misses. It also turns out to be important that the mechanisms for reporting issues are appropriately designed, and especially that there is a no-blame culture, so that people are encouraged to report incidents and to do so accurately and without ambiguity.

Was the flour incident my brother’s fault? Should he have been more careful? He didn’t choose to put the sugar in a high cupboard with the flour. Maybe it was my fault? I didn’t choose to put the sugar there either. But I didn’t tell anyone about the first time it happened. I didn’t move the sugar to a lower cupboard so it was easier to reach either. So maybe it was my fault after all? I knew it was a problem, and I didn’t do anything about it. Perhaps thinking about blame is the wrong thing to do!

Now think about your local hospital.

James is a nurse, working in intensive care. Penny is really ill and is being given insulin by a machine that pumps it directly into her vein. The insulin is causing a side effect though – a drop in blood potassium level – and that is life threatening. They don’t have time to set up a second pump, so the doctor decides to stop the insulin for a while and to give a dose of potassium through a second tube controlled by the same pump. James sets up the bag of potassium and carefully programs the pump to deliver it, then turns his attention to his next task. A few minutes later, he glances at the pump again and realises that he forgot to release the clamp on the tube from the bag of potassium. Penny is still receiving insulin, not the potassium she urgently needs. He quickly releases the clamp, and the potassium starts to flow. An hour later, Penny’s blood potassium levels are pretty much back to normal: she’s still ill, but out of danger. Phew! Good job he noticed in time and no-one else knows about the mistake!

Two weeks later, James’ colleague, Julia, is on duty. She makes a similar mistake treating a different patient, Peter. Except that she doesn’t notice her mistake until the bag of insulin has emptied. Because it took so long to spot, Peter needs emergency treatment. It’s touch-and-go for a while, but luckily he recovers.

Julia reports the incident through the hospital’s incident reporting system, so at least it can be prevented from happening again. She is wracked with guilt for making the mistake, but also hopes fervently that she won’t be blamed, and so punished, for what happened.

Don’t miss the near misses

Why did it happen? There are a whole bunch of problems that are nothing to do with Julia or James. Why wasn’t it standard practice to always have a second pump set up for critically ill patients in case such emergency treatment is needed? Why can’t the pump detect which bag the fluid is being pumped from? Why isn’t it really obvious whether the clamp is open or closed? Why can’t the pump detect that? If the first incident – a ‘near miss’ – had been reported, perhaps some of these problems might have been spotted and fixed. How many other times has it happened but not been reported?

What can we learn from this? One thing is that there are lots of ways of setting up and using systems, and some may well make them safer. Another is that reporting “near misses” is really important. They are a valuable source of learning that can alert other people to mistakes they might make and lead to a search for ways of making the system safer, perhaps by redesigning the equipment or changing the way it is used, for example – but only if people tell others about the incidents. Reporting near-misses can help prevent the same thing happening again.

The above was just a story, but it’s based on an account of a real incident… one that has been reported so it might just save lives in the future.

Report it!

The mechanisms used to do it, as well as the culture around reporting incidents, can make a big difference to whether incidents are reported. However, even when incidents are reported, the reporting systems and culture can help or hinder the learning that results.

Chrystie Myketiak at Queen Mary analysed actual incident reports for the kind of language used by those writing them. She found that the people doing the reporting used different strategies in the way they wrote the reports depending on the kind of incident it was. In situations where there was no obvious implication that a person had made a mistake (such as where sterilisation equipment had not worked successfully) they used one kind of language. Where those involved were likely to be seen as responsible, so blamed (when a wrong number had been entered into a medical device, for example), they used a different kind of language.

Where “user errors” might have been involved, those doing the reporting were more likely to write in a way that hid the identity of any person involved, for example saying “The pump was programmed” or writing about “he” or “she” rather than naming a person. They were also more likely to write in a way that added ambiguity. For example, in the user error reports it was less clear whether the person making the report was the one involved or whether someone else was writing it, such as a witness or someone not involved at all.

Writing in these kinds of ways, and the fact that the style differed from reports where no one was likely to be blamed, suggests that those completing the reports were aware that their words might be misinterpreted by those who read them. The fact that people might be blamed hung over the reporting.

The result of adding what Chrystie called “precise ambiguity” might be that important information was inadvertently concealed, making it harder to understand why the incident happened and so work out how best to avoid it. As a result, patient safety might not be improved even though the incident was reported. This shows one of the reasons why a strong culture of no-fault reporting is needed if a system is to be made as safe as possible. In the airline industry, which is incredibly safe, there is a clear system of no-fault reporting, with pilots, for example, being praised for reporting near misses rather than being punished for any mistake that led to the near miss.

This work was part of the EPSRC-funded CHI+MED research project, led by Ann Blandford at UCL, looking at design for patient safety. In separate work on the project, Alexis Lewis, at Swansea University, explored how best to design the actual incident reporting forms as part of her PhD. A variety of forms are used in hospitals across the UK and she examined more than 20 different ones. Many had features that would make it harder than necessary for nurses and doctors to report incidents accurately, even if they wanted to report openly so that hospital staff would learn as much as possible from the incidents that did happen. Some forms failed to ask about important facts and many didn’t encourage feedback. It wasn’t clear how much detail, or even what, should be reported. She used the results to design a new reporting form that avoided the problems and that could be built into a system that encourages the reporting of incidents. Ultimately her work led to changes to the reporting form and process used within at least one health board she was working with.

People make mistakes, but safety does not come from blaming those that make them. That just discourages a learning culture. To really improve safety you need to praise those who report near misses, as well as ensuring the forms and mechanisms they must use to do so help them provide the information needed.

Updated from the archive, written by the CHI+MED team.




EPSRC supports this blog through research grant EP/W033615/1. 

Much ado about nothing

A blurred image of a hospital ward
Image by Tyli Jura from Pixabay

The nurse types in a dose of 100.1 mg [milligrams] of a powerful drug and presses start. It duly injects 1001 mg into the patient without telling the nurse that it didn’t do what it was told. You wouldn’t want to be that patient!

Designing a medical device is difficult. It’s not creating the physical machine that causes problems so much as writing the software that controls everything the machine does. The software is complex and it has to be right. But what do we mean by “right”? The most obvious thing is that when a nurse sets it to do something, that is exactly what it does.

Getting it right is subtler than that though. It must also be easy to use and not mislead the nurse: the human-computer interface has to be right too. It is the software that allows you to interact with a gadget – what buttons you press to get things done and what feedback you are given. There are some basic principles to follow when designing interfaces. One is that the person using it should always be clearly told what it is doing.

Manufacturers need ways to check their devices meet these principles: to know that they got it right.

It’s not just the manufacturers, though. Regulators have the job of checking that machines that might harm people are ‘right’ before they allow them to be sold. That’s really difficult given the software could be millions of lines long. Worse, they only have a short time to give an answer.

Million to one chances are guaranteed to happen.

Problems may only happen once in a million times a device is used. They are virtually impossible to find by having someone try possibilities to see what happens, the traditional way software is checked. Of course, if a million devices are bought, then a million to one chance will happen to someone, somewhere almost immediately!

Paolo Masci, at Queen Mary University of London, has come up with a way to help, and in doing so found a curious problem. He has been working with the US regulator for medical devices – the FDA – and developed a way to use maths to find problems. It involves creating a mathematical description of what critical parts of the interface program do. Properties, like the user always knowing what is going on, can then be checked using maths. Paolo tried it out on the number-entry code of a real medical device and found some subtle problems. He showed that if you typed in certain numbers, the machine actually treated them as a number ten times bigger. Type in a dose of 100.1 and the machine really did set the dose to be 1001. It ignored the decimal point because, on such a large dose, it assumed small fractions were irrelevant. However, another part of the code allowed you to continue typing digits. Worse still, the device ignored the decimal point silently. It didn’t make any attempt to help a nurse notice the change. A busy nurse would need to be extremely vigilant to see that the tiny decimal point was missing, given the lack of warning.
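The device’s actual code isn’t public, so the snippet below is only a hypothetical reconstruction of the kind of keystroke-handling behaviour described: above some dose threshold (the value used here is a guess, purely for illustration) the decimal point key is silently dropped, yet digits typed after it are still accepted, so the number grows ten times too big.

```python
def enter_dose(keys, decimal_ignored_above=100):
    """Hypothetical sketch of the flawed number entry described above.
    Above the threshold the '.' key is silently ignored, but digits typed
    after it are still appended, multiplying the dose by ten."""
    display = ""
    for key in keys:
        if key == ".":
            value = float(display) if display else 0
            if value >= decimal_ignored_above:
                continue      # decimal point dropped - and no warning given!
        display += key
    return float(display)

print(enter_dose("100.1"))  # 1001.0 -- ten times the intended dose
print(enter_dose("10.1"))   # 10.1   -- smaller doses behave as expected
```

A property check of the kind Paolo used would flag this: the number that ends up on the display (and in the pump) is not always the number the nurse typed.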

A useful thing about Paolo’s approach is that it gives you the button presses that lead to the problem. With that you can check other devices very quickly. He found that medical devices from three other manufacturers had exactly the same problem. Different teams had all programmed in the same problem. None had thought that if their code ignored a decimal point, it ought to warn the nurse about it rather than create a number ten times bigger. It turns out that different programmers are likely to think the same way and so make the same mistakes (see ‘Double or Nothing‘).

Now the problem is known, nurses can be warned to be extra careful and the manufacturers can update the software. Better still they and the regulators now have an easy way to check their programmers haven’t made the same mistake in future devices. In future, whether vigilant or not, a nurse won’t be able to get it wrong.

Paul Curzon, Queen Mary University of London



This page is funded by EPSRC on research agreement EP/W033615/1.


Double or nothing: an extra copy of your software, just in case

Ariane 5 on the launch pad. Photo Credit: (NASA/Chris Gunn) Public Domain via Wikimedia Commons.

If you spent billions of dollars on a gadget you’d probably like it to last more than a minute before it blows up. That’s what happened to a European Space Agency rocket. How do you make sure the worst doesn’t happen to you? How do you make machines reliable?

A powerful way to improve reliability is to use redundancy: double things up. A plane with four engines can keep flying if one fails. Worried about a flat tyre? You carry a spare in the boot. These situations are about making physical parts reliable. Most machines are a combination of hardware and software though. What about software redundancy?

You can have spare copies of software too. Rather than a single version of a program you can have several copies running on different machines. If one program goes wrong another can take over. It would be nice if it was that simple, but software is different to hardware. Two identical programs will fail in the same way at the same time: they are both following the same instructions so if one goes wrong the other will too. That was vividly shown by the maiden flight of the Ariane 5 rocket. Less than 40 seconds from launch things went wrong. The problem was to do with a big number that needed 64 bits of storage space to hold it. The program’s instructions moved it to a storage place with only 16 bits. With not enough space, the number was mangled to fit. That led to calculations by its guidance system going wrong. The rocket veered off course and exploded. The program was duplicated, but both versions were the same so both agreed on the same wrong answers. Seven billion dollars went up in smoke.
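The real Ariane 5 code was written in Ada, and the out-of-range conversion raised a hardware exception rather than silently wrapping, but the essence of the problem can be shown with a simplified sketch (the numbers below are made up): a value that needs 64 bits simply cannot survive being squeezed into 16.

```python
def to_int16(value):
    """Truncate an integer to a signed 16-bit result, the way a careless
    conversion from a much bigger storage place might."""
    value &= 0xFFFF                     # keep only the low 16 bits
    return value - 0x10000 if value >= 0x8000 else value

print(to_int16(30_000))    # 30000  -- small values survive the squeeze
print(to_int16(400_000))   # 6784   -- a big value silently becomes nonsense
```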

Can you get round this? One solution is to get different teams to write programs to do the same thing. The separate teams may make mistakes but surely they won’t all get the same thing wrong! Run them on different machines and let them vote on what to do. Then as long as more than half agree on the right answer the system as a whole will do the right thing. That’s the theory anyway. Unfortunately in practice it doesn’t always work. Nancy Leveson, an expert in software safety from MIT, ran an experiment where different programmers were given programs to write. She found they wrote code that gave the same wrong answers. Even if it had used independently written redundant code it’s still possible Ariane 5 would have exploded.
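The voting part of the idea is easy to sketch (this is a generic illustration, not code from any real flight system): run the independently written versions and take the answer most of them agree on.

```python
from collections import Counter

def majority_vote(versions, *args):
    """Run several independently written versions of the same calculation
    and return the answer that most of them agree on."""
    answers = [version(*args) for version in versions]
    answer, count = Counter(answers).most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError("No majority - treat the result as unreliable")
    return answer

# Three made-up implementations of "double the input"...
def team_a(x): return x * 2
def team_b(x): return x + x
def team_c(x): return x * 2 + 1   # ...one team has a bug

print(majority_vote([team_a, team_b, team_c], 21))   # 42: the bug is outvoted
```

As Nancy Leveson’s experiment showed, though, this only helps if the different versions really do fail independently, and in practice they often don’t.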

Redundancy is a big help but it can’t guarantee software works correctly. When designing systems to be highly reliable you have to assume things will still go wrong. You must still have ways to check for problems and to deal with them so that a mistake (whether by human or machine) won’t turn into a disaster.

Paul Curzon, Queen Mary University of London




This page is funded by EPSRC on research agreement EP/W033615/1.


Negligent nurses? Or dodgy digital? – device design can unintentionally mask errors

Magicians often fool their audience into ‘looking over there’ (literally or metaphorically), getting them to pay attention to the wrong thing so that they’re not focusing on what the magician is doing and can enjoy the trick without seeing how it was done. Computers, phones and medical devices let you interact with them using a human-friendly interface (such as a ‘graphical user interface’) which makes them easier to use, but which can also hide the underlying computing processes from view. Normally that’s exactly what you want, but if there’s a problem, and one that you’d really need to know about, how well does the device make that clear? Sometimes the design of the device itself can mask important information; sometimes the way in which devices are used can mask it too. Here is a case where nurses were blamed but it was later found that the medical devices involved, blood glucose meters, had (unintentionally) tripped everyone up. A useful workaround seemed to be working well, but caused problems later on.

Negligent nurses? Or dodgy digital?

by Harold Thimbleby, Swansea University and Paul Curzon, Queen Mary University of London

It’s easy to get excited about new technology and assume it must make things better. It’s rarely that easy. Medical technology is a case in point, as one group of nurses found out. It was all about one simple device and wearable ID bracelets. Nurses were taken to court, blamed for what went wrong.

The nurses taken to court worked in a stroke unit and were charged with wilfully neglecting their patients. Around 70 others were also disciplined though not sent to court.

There were problems with many nurses’ record-keeping. A few were selected to be charged by the police on the rather arbitrary basis that they had more odd records than the others.

Critical Tests

The case came about because of a single complaint. As the hospital, and then police, investigated, they found more and more oddities, with lots of nurses suddenly implicated. They all seemed to have fabricated their records. Repeatedly, their paper records did not tally with the computer logs. Therefore, the nurses must have been making up the patient records.

The gadget at the centre of the story was a portable glucometer. Glucometers allow the blood-glucose (aka blood sugar) levels of patients to be tested. This matters. If blood-sugar problems are not caught quickly, seriously ill patients could die.

Whenever they did a test, the nurses recorded it in the patient’s paper record. The glucometer system also had a better, supposedly infallible, way to do this. The nurse scanned their ID badge using the glucometer, telling it who they were. They then scanned the patient’s barcode bracelet, and took the patient’s blood-sugar reading. They finally wrote down what the glucometer said in the paper records, and the glucometer automatically added the reading to that patient’s electronic record.

Over and over again, the nurses were claiming in the notes of patients that they had taken readings, when the computer logs showed no reading had been taken. As machines don’t lie, the nurses must all be liars. They had just pretended to take these vital tests. It was a clear case of lazy nurses colluding to have an easy life!

What really happened?

In court, witnesses gave evidence. A new story unfolded. The glucometers were not as simple as they seemed. No-one involved actually understood them, how the system really worked, or what had actually happened.

In reality the nurses were looking after their patients … despite the devices.

The real story starts with those barcode bracelets that the patients wore. Sometimes the reader couldn’t read the barcode. You’ve probably seen this happen in supermarkets. Every so often the reader can’t tell what is being scanned. The nurses needed to sort it out as they had lots of ill patients to look after. Luckily, there was a quick and easy solution. They could just scan their own ID twice. The system accepted this ‘double tapping’. The first scan was their correct staff ID. The second scan was of their staff card ID instead of the patient ID. That made the glucometer happy so they could use it, but of course they weren’t using a valid patient ID.

As they wrote the test result in the patient’s paper record, no harm was done. When checked, it turned out that over 200 nurses had sometimes used double tapping to take readings. It was a well-known (at least by nurses), and commonly used, workaround for a problem with the barcode system.

The system was also much more complicated than that anyway. It involved a complex computing network and a lot of complex software, not just a glucometer. Records often didn’t make it to the computer database for a variety of reasons. The network went down, manually entered details contained mistakes, the database sometimes crashed, and the way the glucometers had been programmed meant they had no way to check that the data they sent to the database actually got there. Results didn’t go straight to the patient record anyway: that only happened when the glucometer was docked (for recharging), but the glucometers were constantly in use so might not be docked for days. Indeed, a fifth of the entries in the database had an error flag indicating something had gone wrong. In reality, you just couldn’t rely on the electronic record. It was the nurses’ old-fashioned paper records that really were the ones you could trust.

The police had got it the wrong way round! They thought the computers were reliable and the nurses untrustworthy, but the nurses were doing a good job and the computers were somehow failing to record the patient information. Worse, they were failing to record that they were failing to record things correctly! … So nobody realised.

Disappearing readings

What happened to all the readings with invalid patient IDs? There was no place to file them so the system silently dropped them into a separate electronic bin of unknowns. They could then be manually assigned, but no way had been set up to do that.

During the trial the defence luckily noticed an odd discrepancy in the computer logs. It was really spiky in an unexplained way. On some days hardly any readings seemed to be taken, for example. One odd trough corresponded to a day the manufacturer said they had visited the hospital. They were asked to explain what they had done…

The hospital had asked them to get the data ready to give to the police. The manufacturer’s engineer who visited therefore ‘tidied up’ the database, deleting all the incomplete records…including all the ones the nurses had supposedly fabricated! The police had no idea this had been done.

Suddenly, no evidence

When this was revealed in court, the judge ruled that all the prosecution’s evidence was unusable. The prosecution said, therefore, they had no evidence at all to present. In this situation, the trial ‘collapses’: the nurses were completely innocent, and the trial immediately stopped.

The trial had already blighted the careers of lots of good nurses though. In fact, some of the other nurses pleaded guilty as they had no memory of what had actually happened but had been confronted with the ‘fact’ that they must have been negligent as “the computers could not lie”. Some were jailed. In the UK, you can be given a much shorter jail sentence, or maybe none at all, if you plead guilty. It can make sense to plead guilty even if you know you aren’t — you only need to think the court will find you guilty. Which isn’t the same thing.

Silver bullets?

Governments see digitalisation as a silver bullet to save money and improve care. It can do that if you get it right. But digital is much harder to get right than most people realise. In the story here, not getting the digital right — and not understanding it — caused serious problems for lots of nurses.

It takes skill and deep understanding to design digital things to work in a way that really makes things better. It’s hard for hospitals to understand the complexities in what they are buying. Ultimately, it’s nurses and doctors who make it work. They have to.

They shouldn’t be automatically blamed when things go wrong because digital technology is hard to design well.




This page is funded by EPSRC on research agreement EP/W033615/1.


Mary Clem: getting it right

by Paul Curzon, Queen Mary University of London

Mary Clem was a pioneer of dependable computing long before the first computers existed. She was a computer herself, but became more like a programmer.

A tick on a target of red concentric zeros
Image by Paul Curzon

Back before there were computers there were human computers: people who did the calculations that machines now do. Victorian inventor, Charles Babbage, worked as one. It was the inspiration for him to try to build a steam-powered computer. Often, however, it was women who worked as human computers especially in the first half of the 20th century. One was Mary Clem in the 1930s. She worked for Iowa State University’s statistical lab. Despite having no mathematical training and finding maths difficult at school, she found the work fascinating and rose to become the Chief Statistical Clerk. Along the way she devised a simple way to make sure her team didn’t make mistakes.

The start of stats

Big Data, the idea of processing lots of data to turn that data into useful information, is all the rage now, but its origins lie at the start of the 20th century, driven by human computers using early calculating machines. The 1920s marked the birth of statistics as a practical mathematical science. A key idea was that of calculating whether there were correlations between different data sets, such as rainfall and crop growth, or holding agricultural fairs and improved farm output. Correlation is the first step to working out what causes what. It allows scientists to make progress in working out how the world works, and that can then be turned into improved profits by business, or into positive change by governments. It became big business between the wars, with lots of work for statistical labs.

Calculations and cards

Originally, in and before the 19th century, human computers did all the calculations by hand. Then simple calculating machines were invented, which the human computers could use to do the basic calculations needed. In 1890 Herman Hollerith invented his Tabulator machine (his company later became the computing powerhouse IBM). The Tabulator was originally just a counting machine created for the US census, though later versions could do arithmetic too. The human computers started to use them in their work. The Tabulator worked using punch cards: cards that held data in patterns of holes punched into them. A card representing a person in the census might have a hole punched in one place if they were male, and in a different place if they were female. You could then count how many people had a given property by counting the appropriate holes.

Mary was being more than a computer,
and becoming more like a programmer

Mary’s job ultimately didn’t just involve doing calculations. It also involved preparing punch cards for input into the machines (so representing data as different holes on a card). She also had to develop the formulae needed for doing the calculations for different tasks. Essentially she was creating simple algorithms for the human computers using the machines to follow, including preparing their input. Her work was therefore moving closer to that of a computer operator, and then to that of a programmer.

Zero check

She was also responsible for checking calculations to make sure mistakes were not being made. If the calculations were wrong, the results were worse than useless. Human computers could easily make mistakes in calculations, but even with machines doing the calculations it was still possible for the formulae to be wrong or for mistakes to be made preparing the punch cards. Today we call this kind of checking of the correctness of programs verification and validation. Since accuracy mattered, this part of her job mattered too. Even today professional programming teams spend far more time checking and testing their code than writing it.

Mary took the role of checking for mistakes very seriously and, like any modern computational thinker, started to work out better ways of doing it that were more likely to catch mistakes. She was a pioneer in the area of dependable computing. What she came up with was what she called the Zero Check. She realised that the best way to check for mistakes was to do more calculations. For the calculations she was responsible for, she noticed that it was possible to devise an extra calculation whereby, if the other answers (the ones actually needed) had been correctly calculated, then the answer to this new calculation was 0. This meant that instead of checking lots of individual calculations with different answers (which is slow and in itself error prone), she could just do this one extra calculation. Then, if the answer was not zero, she had found a mistake.

A trivial version of this general idea when you are doing a single calculation is to just do it a second time, but in a different way. Rather than checking manually if answers are the same, though, if you have a computer it can subtract the two answers. If there are no mistakes, the answer to this extra check calculation should be 0. All you have to do is to look for zero answers to the extra subtractions. If you are checking lots of answers then, spotting zeros amongst non-zeros is easier for a human than looking for two numbers being the same.
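Here is a minimal sketch of the zero-check idea in modern code. The figures and the recorded answer are made up, not one of Mary Clem’s actual calculations: the point is simply that the check is itself a calculation whose answer should be zero.

```python
def zero_check(values, recorded_total):
    """Mary Clem's idea, in miniature: devise an extra calculation whose
    answer is zero when the recorded answer is right. Here the recorded
    total minus a freshly recomputed total should come out as 0."""
    return recorded_total - sum(values)

# A column of figures and the total a (possibly error-prone) computer
# wrote down earlier.
column = [12.5, 7.25, 19.0, 3.75]
recorded_total = 42.5

difference = zero_check(column, recorded_total)
print("zero check:", difference)   # 0 -> the recorded answer passes
# Had the total been miscopied as 45.2, the check would print a non-zero
# answer, flagging that a mistake had crept in somewhere.
assert difference == 0, "non-zero: recheck the figures!"
```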

Defensive Programming

This idea of doing extra calculations to help detect errors is a part of defensive programming. Programmers add in extra checking code or “assertions” to their programs to check that values calculated at different points in the program meet expected properties automatically. If they don’t then the program itself can do something about it (issue a warning, or apply a recovery procedure, for example).

A similar idea is also used now to catch errors whenever data is sent over networks. An extra calculation is done on the 1s and 0s being sent and the answer is added on to the end of the message. When the data is received a similar calculation is performed with the answer indicating if the data has been corrupted in transmission. 
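A toy version of that idea is sketched below (real networks use more sophisticated check calculations, such as CRCs, and this simple sum can miss some combinations of errors): the sender appends an extra number computed from the data, and the receiver redoes the calculation to see whether anything was corrupted on the way.

```python
def check_value(bits):
    """A very simple check calculation: the count of 1s, modulo 256."""
    return sum(bits) % 256

def send(bits):
    return bits + [check_value(bits)]        # append the check value

def receive(message):
    *bits, received_check = message
    if check_value(bits) != received_check:
        raise ValueError("Data corrupted in transmission!")
    return bits

message = send([1, 0, 1, 1, 0, 1, 0, 0])
message[2] ^= 1                              # a bit gets flipped on the way
try:
    receive(message)
except ValueError as error:
    print(error)                             # the corruption is detected
```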

A pioneering human computer

Mary Clem was a pioneer as a human computer, realising there could be more to the job than just doing computations. She realised that what mattered was that those computations were correct. Charles Babbage’s answer to the problem was to try to build a computing machine. Mary’s was to think about how to validate the computation done, whether by a human or a machine.



EPSRC supports this blog through research grant EP/W033615/1. 

Recognising (and addressing) bias in facial recognition tech #BlackHistoryMonth

By Jo Brodie and Paul Curzon, Queen Mary University of London

A unit containing four sockets, 2 USB and 2 for a microphone and speakers.
Happy, though surprised, sockets. Photo taken by Jo Brodie in 2016 at Gladesmore School in London.

Some people have a neurological condition called face blindness (also known as ‘prosopagnosia’) which means that they are unable to recognise people, even those they know well – this can include their own face in the mirror! They only know who someone is once they start to speak; until then they can’t be sure who it is. They can certainly detect faces, though they might struggle to classify them in terms of gender or ethnicity. In general, though, most people have an exceptionally good ability to detect and recognise faces – so good, in fact, that we even detect faces when they’re not actually there. This is called pareidolia: perhaps you see a surprised face in the picture of USB sockets above.

How about computers? There is a lot of hype about face recognition technology as a simple solution to help police forces prevent crime, spot terrorists and catch criminals. What could be bad about being able to pick out wanted people automatically from CCTV images, and so quickly catch them?

What if facial recognition technology isn’t as good at recognising faces as it has sometimes been claimed to be, though? If the technology is being used in the criminal justice system, and gets the identification wrong, this can cause serious problems for people (see Robert Williams’ story in “Facing up to the problems of recognising faces“).

“An audit of commercial facial-analysis tools found that dark-skinned faces are misclassified at a much higher rate than are faces from any other group. Four years on, the study is shaping research, regulation and commercial practices.”

The unseen Black faces of AI algorithms (19 October 2022) Nature

In 2018 Joy Buolamwini and Timnit Gebru shared the results of research they’d done, testing three different commercial facial recognition systems. They found that these systems were much more likely to wrongly classify darker-skinned female faces compared to lighter- or darker-skinned male faces. In other words, the systems were not reliable. (Read more about their research in “The gender shades audit“).
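In essence, their audit measured accuracy separately for each group of faces rather than relying on one overall score, which can hide poor performance on a minority of the test set. The sketch below is not their code and the figures are invented; it just shows what a disaggregated error-rate check looks like.

```python
from collections import defaultdict

def error_rates_by_group(results):
    """results: list of (group, correct) pairs from testing a classifier.
    Returns the error rate for each group separately, so a system that is
    accurate on average cannot hide a much higher error rate for one group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Invented test results, not the Gender Shades data.
results = [("darker-skinned female", False), ("darker-skinned female", True),
           ("darker-skinned male", True),    ("darker-skinned male", True),
           ("lighter-skinned female", True), ("lighter-skinned male", True)]
print(error_rates_by_group(results))
```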

“The findings raise questions about how today’s neural networks, which … (look for) patterns in huge data sets, are trained and evaluated.”

Study finds gender and skin-type bias in commercial artificial-intelligence systems (11 February 2018) MIT News

Their work has shown that face recognition systems do have biases and so are not currently at all fit for purpose. There is some good news though. The three companies whose products they studied made changes to improve their facial recognition systems, and several US cities have already banned the use of this tech in criminal investigations. More cities are calling for bans too, and in Europe the EU is moving closer to banning the use of live face recognition technology in public places. Others, however, are still rolling it out. It is important not just to believe the hype about new technology, but to make sure we understand its limitations and risks.


Further reading

More technical articles

• Joy Buolamwini and Timnit Gebru (2018) Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, Proceedings of Machine Learning Research 81:1-15. [EXTERNAL]
• The unseen Black faces of AI algorithms (19 October 2022) Nature News & Views [EXTERNAL]


See more in ‘Celebrating Diversity in Computing’

We have free posters to download and some information about the different people who’ve helped make modern computing what it is today.




EPSRC supports this blog through research grant EP/W033615/1.

The red sock of doom – trying to catch mistakes before they happen

A red sock in with your white clothes wash – guess what happened next? What can you do to prevent it from happening again? Why should a computer scientist care? It turns out that red socks have something to teach us about medical gadgets.

Washing machine mistake
Washing machine with red sock in white washing. Image by Dominic Furniss from Errordiary and Flickr (CC BY-NC-SA 2.0), provided for this article

How can we stop red socks from ever turning our clothes pink again? We need a strategy. Here are some possibilities.

  • Don’t wear red socks.
  • Take a ‘how to wash your clothes’ course.
  • Never make mistakes.
  • Get used to pink clothes.

Let’s look at them in turn – will they work?

Don’t wear red socks: That might help but it’s not much use if you like red socks or if you need them to match your outfit. And how would it help when you wear purple, blue or green socks? Perhaps your clothes will just turn green instead.

Take a ‘how to wash your clothes’ course: Training might help: you’d certainly learn that a red sock and white clothes shouldn’t be mixed, but you probably knew that anyway. It won’t stop you making a similar mistake again.

Never make misteaks: Just never leave a red sock in your white wash. If only! Unfortunately everyone makes mistakes – that’s why we have erasers on pencils and a delete key on computers – this idea just won’t work.

Get used to pink clothes: Maybe, but it’s not ideal. It might not be so great turning up to school in a pink shirt if everyone else is wearing a white one.

What if the problem’s more serious?

We can probably live with pink clothes, but what happens if a similar mistake is made at a hospital? Not socks, but medicines. We know everyone makes mistakes so how do we stop those mistakes from harming patients? Special machines are used in hospitals to pump medicine directly into a patient’s arm, for example, and a nurse needs to tell it how much medicine to give – if the dose is wrong the patient won’t get better, and might even get worse.

What have we learned from our red sock strategies? We can’t stop giving patients medicine and we don’t want to get used to mistakes, so our first and fourth strategies won’t work. We can give nurses more training, but everyone makes mistakes even when trained, so the second and third suggestions aren’t good enough either, and training doesn’t stop someone else making the same mistake.

We need to stop thinking of mistakes as a problem that people make and instead as a problem that systems thinking can solve. That way we can find solutions that work for everyone. One possibility is to check whether changes to the device might make mistakes less likely in the first place.

Errors? Or arrows?

Most medical machines are controlled with a panel with numbered keys (a number keypad) like on mobile phones, or up and down arrows (an arrow keypad) like you sometimes get on alarm clocks. CHI+MED researchers have been asking questions like: which way is best for entering numbers quickly, but also which is best for entering numbers accurately? They’ve been running experiments where people use different keypads, are timed and their mistakes are recorded. The researchers also track where people are looking while they use the keypads. Another approach has been to create mathematical descriptions of the different keypads and then mathematically explore how bad different errors might be.

It turns out that if you can see the numbers on a keypad in front of you it’s very easy to type them in quickly, though not always correctly! You need to check the display to see if you have actually put in the right ones. Worse, the mistakes that are made are often massive – ten times too much or more. The arrow keypads are a little slower to use, but because people are already looking at the display (to see what numbers are appearing) they help nurses be more accurate: not only are fewer mistakes made, but those that are made tend to be smaller.

Smart machines help users

A medical device that actively helps users avoid mistakes helps everyone using it (and the patients it’s being used on!). Changing the interface to reduce errors isn’t the only solution though. Modern machines have ‘intelligent drug libraries’ that contain information about the medicines and what sort of doses are likely and safe. Someone might still mistakenly tell the machine to give too high a dose but now it can catch the error and ask the nurse to double-check. That’s like having a washing machine that can spot a brightly coloured sock in a white wash and that refuses to switch on till it has been removed.
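A minimal sketch of what such a drug-library check might look like is below. The drug name and dose limits are invented for illustration (not real clinical values): the point is simply that the pump compares the programmed dose against known safe ranges and asks for a double-check rather than silently obeying.

```python
# Hypothetical safe dose ranges in mg -- invented numbers for illustration only.
DRUG_LIBRARY = {"examplamycin": (10, 200)}

def check_dose(drug, dose_mg):
    """Return 'ok', or a warning asking the nurse to double-check."""
    low, high = DRUG_LIBRARY[drug]
    if not low <= dose_mg <= high:
        return (f"WARNING: {dose_mg} mg of {drug} is outside the usual range "
                f"{low}-{high} mg. Please confirm before starting.")
    return "ok"

print(check_dose("examplamycin", 100.1))   # ok
print(check_dose("examplamycin", 1001))    # warning: ten times too much
```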

Building machines with a better ability to catch errors (remember, we all make mistakes) and helping users to recover from them easily is much more reliable than trying to get rid of all possible errors by training people. It’s not about avoiding red socks, or errors, but about putting better systems in place to make sure that we find them before we press that big ‘Start’ button.

Further reading / watching
You can find a copy of this article on pages 4 and 5 of issue 17 of CS4FN, ‘Machines Making Medicine Safer’. This article was originally published on CHI+MED.