Briefly Noted

Things Read, Seen or Heard Elsewhere

Samsung's AI lab releases video showing how it can manipulate a single image to emulate someone talking. An AI startup creates a near-perfect reproduction of a popular podcaster's voice. It's only just the beginning.

Samsung Labs’ AI team just released videos demonstrating that they can emulate someone talking from a single still image. Add that to the recent Joe Rogan voice emulation released by AI startup Dessa and it’s easy to imagine a near future of incredibly authentic-looking (and sounding) forgery in disinformation campaigns.

The Samsung video below starts about four minutes in, using single photos of historical figures, and then paintings, to demonstrate the technology’s ability to generate realistic speech gestures. You can rewind to the beginning for the full science and technology behind the machine learning used to create the effect, along with more sophisticated emulations.1

Now here’s the Joe Rogan voice simulation:

Technology’s going to do what technology’s going to do: rifle through Pandora’s box to figure out the possible.

This is all fun and games when integrated into a phone app, or game, or utilized to generate characters in a future film. But we know enough about state and non-state actors to fear what this might mean for society, security and political futures.

Here’s Dessa in a blog post accompanying the release of their Rogan voice simulation (emphasis mine):

As AI practitioners building real-world applications, we’re especially cognizant of the fact that we need to be talking about the implications of this.

Because clearly, the societal implications for technologies like speech synthesis are massive. And the implications will affect everyone. Poor consumers and rich consumers. Enterprises and governments.

Right now, technical expertise, ingenuity, computing power and data are required to make models like RealTalk perform well. So not just anyone can go out and do it. But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet.

It’s pretty f*cking scary.

I write this as a garden variety manipulated video of Nancy Pelosi is making the rounds. In this case, the video’s simply slowed to make her appear to drunkenly slur her words. Give it time. A few months, a few years, and disinformation campaigns will generate words never spoken over video never shot.

F*cking scary, indeed.


Venture Beat, Samsung’s AI animates paintings and photos without 3D modeling
ArXiv (PDF), Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Motherboard, This AI-Generated Joe Rogan Voice Sounds So Real It’s Scary
Lawfare, Deep Fakes: A Looming Crisis for National Security, Democracy and Privacy?

Lead Image: Volcan Fuego, Antigua, Guatemala by Ben Turnbull

  • Venture Beat has a relatively genteel primer on the technology being used. If you want to jump deep into the technology, visit arXiv to read the research (PDF) that accompanies the video.

Self-driving cars are only as smart as the artificial intelligence controlling them. A new study indicates that the darker your skin, the more likely you are to be hit by one. There's a simple, and unfortunate, reason why.

A new study exploring how self-driving cars avoid pedestrian collisions draws an unfortunate conclusion: the darker your skin, the more likely you are to be hit.

The study, published in February, suggests an inherent algorithmic bias in the technologies guiding these vehicles, one that stems from the data sets used to train their systems.

As Sigal Samuel explains at Vox:

[T]he authors of the self-driving car study note that a couple of factors are likely fueling the disparity in their case. First, the object-detection models had mostly been trained on examples of light-skinned pedestrians. Second, the models didn’t place enough weight on learning from the few examples of dark-skinned people that they did have.

More heavily weighting that sample in the training data can help correct the bias, the researchers found. So can including more dark-skinned examples in the first place.

Basically, algorithmic systems learn from the datasets fed to them and extrapolate from that data. If the data isn’t diverse, the system won’t learn diversity. See, for example, Google circa 2015, when its Photos app identified black people as “gorillas.”
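To make the mechanism concrete, here is a toy sketch – entirely hypothetical numbers and a hand-rolled kernel classifier, not the study’s actual object-detection models or the Berkeley Driving Dataset. A “detector” trained on far fewer examples of one group misses that group far more often, and upweighting the scarce examples, as the study’s authors suggest, narrows the gap:

```python
import math
import random

random.seed(0)

def cluster(n, cx, cy):
    # n points scattered around a center (purely synthetic data).
    return [(random.gauss(cx, 1.0), random.gauss(cy, 1.0)) for _ in range(n)]

# Training data: pedestrian "detections" (positives) from two groups,
# plus background clutter (negatives). Group B is badly underrepresented
# and overlaps the background more than group A does.
pos_a = cluster(500, 4.0, 4.0)   # well-represented group
pos_b = cluster(25, 2.0, 0.0)    # scarce group
neg = cluster(1000, 0.0, 0.0)    # background

def kernel(p, q):
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.exp(-0.5 * (dx * dx + dy * dy))

def predict(x, w_b=1.0):
    # Weighted kernel-density classifier: call x a pedestrian if the
    # (weighted) positive density at x beats the negative density.
    pos_score = sum(kernel(x, p) for p in pos_a)
    pos_score += w_b * sum(kernel(x, p) for p in pos_b)
    neg_score = sum(kernel(x, q) for q in neg)
    return pos_score > neg_score

def recall(test_points, w_b=1.0):
    # Fraction of true pedestrians the model actually detects.
    return sum(predict(x, w_b) for x in test_points) / len(test_points)

test_a = cluster(150, 4.0, 4.0)
test_b = cluster(150, 2.0, 0.0)

r_a = recall(test_a)             # near 1.0: group A is rarely missed
r_b = recall(test_b)             # far lower: group B is drowned out
r_b_w = recall(test_b, w_b=20)   # upweighting the scarce group helps
```

Note that the fix is sketched, not solved: upweighting trades some false positives elsewhere for fewer misses on the underrepresented group – exactly the kind of trade-off currently hidden inside proprietary models.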

There’s a caveat to the study, though, one that illustrates another problem and provides a lesson. The study’s authors didn’t have access to the actual object-detection models used by companies making self-driving cars. These are carefully guarded trade secrets, the special-sauce formulas of each company’s machine learning.

Instead, the authors used what’s available to researchers studying such issues, in this case, the Berkeley Driving Dataset.

So, the problem: companies developing self-driving cars guard their data. This is more or less understandable, but it leads to public health and legal conundrums. When the public has no insight into how the cars make life-and-death decisions – such as whether to hit the cat or the dog, or the old man or the young child, in a split second when a collision is bound to happen – how is it to determine a host of issues, ranging from when and where these vehicles can operate to who’s culpable in the event of a tragic accident?

Let’s back up a minute though. Self-driving cars use a few main systems to perform as they do. The Economist explains it like so:

The computer systems that drive cars consist of three modules. The first is the perception module, which takes information from the car’s sensors and identifies relevant objects nearby… Cameras can spot features such as lane markings, road signs and traffic lights. Radar measures the velocity of nearby objects. LIDAR determines the shape of the car’s surroundings in fine detail, even in the dark. The readings from these sensors are combined to build a model of the world, and machine-learning systems then identify nearby cars, bicycles, pedestrians and so on. The second module is the prediction module, which forecasts how each of those objects will behave in the next few seconds. Will that car change lane? Will that pedestrian step into the road? Finally, the third module uses these predictions to determine how the vehicle should respond (the so-called “driving policy”): speed up, slow down, or steer left or right.
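The Economist’s three modules can be sketched schematically. This is a deliberately cartoonish Python illustration – real perception and prediction stacks are large machine-learning systems, and every name and number here is invented:

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    distance_m: float  # metres ahead in our lane
    speed_mps: float   # closing speed relative to us (negative = receding)

def perceive(sensor_readings):
    # Perception module: fuse (hypothetical) sensor readings into a
    # world model -- here, simply a list of obstacles in our lane.
    return [Obstacle(d, v) for d, v in sensor_readings]

def predict(obstacles, horizon_s=3.0):
    # Prediction module: forecast where each obstacle will be a few
    # seconds from now, assuming constant velocity.
    return [Obstacle(o.distance_m - o.speed_mps * horizon_s, o.speed_mps)
            for o in obstacles]

def driving_policy(predicted, safe_gap_m=10.0):
    # Driving-policy module: slow down if anything is forecast to end
    # up inside the safety gap; otherwise keep speed.
    if any(o.distance_m < safe_gap_m for o in predicted):
        return "slow_down"
    return "keep_speed"

# Two tracked objects: one closing fast at 20 m, one receding at 80 m.
readings = [(20.0, 5.0), (80.0, -2.0)]
action = driving_policy(predict(perceive(readings)))  # → "slow_down"
```

The point of the chain is that bias in the perception module – the subject of the study above – propagates: an undetected pedestrian never reaches prediction or policy at all.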

Here’s a helpful illustration from The New York Times:

Now, the lesson. Other industries, such as pharmaceuticals, also guard their special sauce but, ostensibly, put their products through rigorous trials before bringing them to market.

Our software, our AI, needs to go through something similar. States are implementing rules and regulations around autonomous vehicles, but these largely address the mechanical.

Whether the lesson comes from the drug industry or elsewhere, there needs to be greater transparency – the kind that allows deep investigation into algorithmic biases of this type before self-driving cars appear on the streets.

Because, of course, we have well-intentioned yet face-palming initiatives such as this:


Image: Photo by Gareth Harrison on Unsplash