3D models in motion

by Paul Curzon, Queen Mary University of London
based on a 2016 talk by Lourdes Agapito

Wireframe mesh image by Gerd Altmann from Pixabay

The cave paintings in Lascaux, France are early examples of human culture from 15,000 BC. There are images of running animals and even primitive stop motion sequences – a single animal painted over and over as it moves. Even then, humans were intrigued with the idea of capturing the world in motion! Computer scientist Lourdes Agapito is also captivated by moving images. She is investigating whether it’s possible to create algorithms that allow machines to make sense of the moving world around them just like we do. Over the last 10 years her team have shown, rather spectacularly, that the answer is yes.

People have been working on this problem for years, not least because the techniques are behind the amazing realism of CGI characters in blockbuster movies. When we see the world, somehow our brain turns all that information about colour and intensity of light hitting our eyes into a scene we make sense of – we can pick out different objects and tell which are in front and which behind, for example. In the 1950s psychophysics* researcher Gunnar Johansson showed how our brain does this. He dressed people in black with lightbulbs fastened around their bodies. He then filmed them walking, cycling, doing press-ups, climbing a ladder, all in the dark … with only the lightbulbs visible. He found that people watching the films could still tell exactly what they were seeing, despite the limited information. They could even tell apart two people dancing together, including who was in front and who behind. This showed that we can reconstruct 3D objects from even the most limited of 2D information when it involves motion. We can keep track of a knee, and see it as the same point as it moves around. It also shows that we use lots of ‘prior’ information – knowledge of how the world works – to fill in the gaps.

Shortcuts

Film-makers already create 3D versions of actors, but they use shortcuts. The first shortcut makes it easier to track specific points on an actor over time. You fix highly visible stickers (equivalent to Johansson’s light bulbs) all over the actor. These give the algorithms clear points to track. This is a bit of a pain for the actors, though. It also could never be used to make sense of random YouTube or CCTV footage, or whatever a robot is looking at.

The second shortcut is to surround the action with cameras so it’s seen from lots of angles. That makes it easier to track motion in 3D space, by linking up the points. Again this is fine for a movie set, but in other situations it’s impractical.

A third shortcut is to create a computer model of an object in advance. If you are going to be filming an elephant, then hand-create a 3D model of a generic elephant first, giving the algorithms something to match. Need to track a banana? Then create a model of a banana instead. This is fine when you have time to create models for anything you might want your computer to spot.

It is all possible for big budget film studios, if a bit inconvenient, but it’s totally impractical anywhere else.

No Shortcuts

Lourdes took on a bigger challenge than the film industry. She decided to do it without the shortcuts: to create moving 3D models from single cameras, applied to any traditional 2D footage, with no pre-placed stickers or fixed models created in advance.

When she started, a dozen or so years ago, making any progress looked incredibly difficult. Now she has largely solved the problem. Her team’s algorithms are even close to doing it all in real time, so making sense of the world as it happens, just like us. They are able to make really accurate models down to details like the subtle movements of their face as a person talks and changes expression.

There are several secrets to their success, but Johansson’s revelation that we rely on prior knowledge is key. One of the first breakthroughs was to come up with ways that individual points in the scene like the tip of a person’s nose could be tracked from one frame of video to the next. Doing this well relies on making good use of prior information about the world. For example, points on a surface are usually well-behaved in that they move together. That can be used to guess where a point might be in the next frame, given where others are.

The next challenge was to reconstruct all the pixels rather than just a few easy to identify points like the tip of a nose. This takes more processing power but can be done by lots of processors working on different parts of the problem. Key to this was to take account of the smoothness of objects. Essentially a virtual fine 3D mesh is stuck over the object – like a mask over a face – and the mesh is tracked. You can then even stick new stuff on top of the mesh so they move together – adding a moustache, or painting the face with a flag, for example, in a way that changes naturally in the video as the face moves.

Once this could all be done, if slowly, the challenge was to increase the speed and accuracy. Using the right prior information was again what mattered. For example, rather than assuming points have constant brightness, taking account of the fact that brightness changes, especially on flexible things like mouths, mattered. Other innovations were to split off the effect of colour from light and shade.

There is lots more to do, but already the moving 3D models created from YouTube videos are very realistic, and being processed almost as they happen. This opens up amazing opportunities for robots; augmented reality that mixes reality with the virtual world; games, telemedicine; security applications, and lots more. It’s all been done a little at a time, taking an impossible-seeming problem and instead of tackling it all at once, solving simpler versions. All the small improvements, combined with using the right information about how the world works, have built over the years into something really special.

*psychophysics is the “subfield of psychology devoted to the study of physical stimuli and their interaction with sensory systems.”

This article was first published on the original CS4FN website and a copy appears on pages 14 and 15 in “The women are (still) here”, the 23rd issue of the CS4FN magazine. You can download a free PDF copy by clicking on the magazine’s cover below, along with all of our free material.

Another article on 3D research is Making sense of squishiness – 3D modelling the natural world (21 November 2022).

Related Magazine …

CS4FN Issue 23 – The women are (still) here

EPSRC supports this blog through research grant EP/W033615/1.

Hidden Figures: NASA’s brilliant calculators

Full Moon with a blue filter — Full Moon image by PIRO from Pixabay

NASA Langley was the birthplace of the U.S. space program where astronauts like Neil Armstrong learned to land on the moon. Everyone knows the names of astronauts, but behind the scenes a group of African-American women were vital to the space program: Katherine Johnson, Mary Jackson and Dorothy Vaughan. Before electronic computers were invented ‘computers’ were just people who did calculations and that’s where they started out, as part of a segregated team of mathematicians. Dorothy Vaughan became the first African-American woman to supervise staff there and helped make the transition from human to electronic computers by teaching herself and her staff how to program in the early programming language, FORTRAN.

The women switched from being the computers to programming them. These hidden women helped put the first American, John Glenn, in orbit, and over many years worked on calculations like the trajectories of spacecraft and their launch windows (the small period of time when a rocket must be launched if it is to get to its target). These complex calculations had to be correct. If they got them wrong, the mistakes could ruin a mission, putting the lives of the astronauts at risk. Get them right, as they did, and the result was a giant leap for humankind.

See the film ‘Hidden Figures’ for more of their story.

Paul Curzon, Queen Mary University of London

from the archive

Magazines …

Front cover of CS4FN issue 29 - Diversity in Computing

Subscribe to be notified whenever we publish a new post to the CS4FN blog.

This blog is funded by EPSRC on research agreement EP/W033615/1.

Executable Biology

Computing cancer using computational modelling

(From the archive)

Cells Mitosis — Image by Colin Behrens from Pixabay

Can a robot get cancer? Silly question. Our bodies are made of cells. Robots aren’t. Cells are the basic building blocks of life and come in lots of different forms from long thin nerve cells that allow us to sense the world, to round blood cells that carry oxygen around our bodies. Cancer occurs when cells go rogue and start reproducing in an uncontrolled way. A computer can’t get cancer, but you can allow virtual diseases to attack virtual cells inside a computer. Doing that may just help find cures. That is what Jasmin Fisher, who leads a research group at Microsoft Research in Cambridge, has devoted her career to.

Becoming a medic isn’t the only way to help save lives!

Computational Modelling is changing the way the sciences are done. It is the idea that you can run experiments on virtual versions of things you are investigating. A computer model is essentially just a program that simulates the phenomena of interest. For example, by writing a program that simulates the laws of Physics, you can use it to run virtual Physics experiments about the motion of the planets, say. If your virtual planets do follow the paths real planets do, then you have evidence the laws are right. If they don’t your laws (or the models) need to change. You can also make predictions such as when an eclipse will happen. If you are right it suggests the laws you coded are good descriptions of reality. If wrong, back to the drawing board.

Jasmin has been pioneering this idea with the stuff of life and death. She focusses on modelling cells and the specific ways that we think cancer attacks them. It gives a way of exploring what is going on at the level of the molecules inside cells, and so how well new medicines might, or might not, work. Experiments can be done quickly and easily on the programmed models by running simulations. That means the real experiments, taking up expensive lab time, can focus on things that are most likely to be successful. Jasmin’s work has helped researchers design more effective actual experiments because they start with a better understanding of what is going on. One of the most important questions she is studying is how cells end up becoming what they are, and how this differs between normal cells and cancer cells. Understand this and we will be much closer to understanding how to stop cancer.