“Looking to Listen at the Cocktail Party” is Google’s “deep learning audio-visual model for isolating a single speech signal from a mixture of sounds such as other voices and background noise,” as described on the Google Research blog, which explains:
In this work, we are able to computationally produce videos in which speech of specific people is enhanced while all other sounds are suppressed. Our method works on ordinary videos with a single audio track, and all that is required from the user is to select the face of the person in the video they want to hear, or to have such a person be selected algorithmically based on context. We believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking.
Meanwhile, “local law enforcement in China used facial recognition to recognize and arrest a man in a crowd of 60,000 people,” Mic reported.
Facial recognition technology is widespread in China. Cross the street improperly, for example, and facial recognition systems will text you a fine. As Maya Wang, a researcher at Human Rights Watch, pointed out to the Washington Post, what sets China apart is its “complete lack of effective privacy protections” combined with its targeting of individuals viewed as “politically threatening.” According to the Atlantic, China’s facial-recognition-powered surveillance, combined with the upcoming social credit system, could make the state all but omniscient.