[Image: a stage filled with lights and music equipment]

This AI can harness sound to reveal the structure of invisible spaces

Imagine walking through a series of rooms, getting closer and closer to a sound source, whether it’s music playing through a loudspeaker or a person speaking. The sound you hear as you move through this maze will distort and fluctuate depending on where you are. With a scenario like this in mind, a team of researchers from MIT and Carnegie Mellon University built a model that can realistically describe how the sound around a listener changes as they move through a space. They published their work on the topic in a preprint paper last week.

The sounds we hear in the world can vary depending on factors such as the types of spaces the sound waves bounce around in, the materials they hit or pass through, and the distance they have to travel. These characteristics influence how sound scatters and decays. But researchers can also reverse the process: they can take a sound sample and use it to infer what the environment looks like (in a way, it’s like how animals use echolocation to “see” their surroundings).

“We mainly model spatial acoustics, so the [focus is on] reverberations,” says Yilun Du, an MIT graduate student and an author of the paper. “Maybe if you’re in a concert hall there’s a lot of reverberation, maybe if you’re in a cathedral there’s a lot of echo, versus if you’re in a small room there’s no real echo.”

Their model, called a neural acoustic field (NAF), is a neural network that accounts for both the position of the sound source and the position of the listener, as well as the geometry of the space through which the sound travels.
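The article does not spell out the network’s exact architecture, but a minimal sketch of the idea, assuming a simple PyTorch multilayer perceptron with made-up layer sizes and output dimensions, might map a source and listener position to a predicted spectrogram roughly like this:

```python
# Minimal sketch (not the authors' code): a neural acoustic field as a small
# MLP that maps a (source position, listener position) pair to a spectrogram
# of the sound heard at the listener. All names and sizes are assumptions.
import torch
import torch.nn as nn

class NeuralAcousticField(nn.Module):
    def __init__(self, freq_bins=64, time_bins=32, hidden=256):
        super().__init__()
        self.out_shape = (freq_bins, time_bins)
        self.net = nn.Sequential(
            nn.Linear(6, hidden),        # 3D source position + 3D listener position
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, freq_bins * time_bins),
        )

    def forward(self, source_pos, listener_pos):
        # Concatenate the two positions and predict a log-magnitude spectrogram.
        x = torch.cat([source_pos, listener_pos], dim=-1)
        return self.net(x).view(-1, *self.out_shape)

# Query the field at an arbitrary listener position in the room.
model = NeuralAcousticField()
src = torch.tensor([[2.0, 1.5, 1.0]])    # loudspeaker position (metres, assumed)
lis = torch.tensor([[0.5, 3.0, 1.6]])    # listener position (metres, assumed)
predicted_spectrogram = model(src, lis)  # shape: (1, 64, 32)
```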

To train the NAF, the researchers fed it visual information about the scene along with spectrograms (visual representations that capture the amplitude, frequency, and duration of sounds) of audio gathered from what the listener would hear at different viewpoints and positions.
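For illustration, here is a small, hedged example of what such a spectrogram looks like in code, using SciPy on a synthetic decaying tone as a stand-in for real recorded audio; the sample rate and window sizes are placeholders:

```python
# Sketch of the kind of training target described above: a spectrogram computed
# from an audio clip recorded (or simulated) at one listener position.
import numpy as np
from scipy.signal import spectrogram

sample_rate = 16000
duration_s = 1.0
# Stand-in audio: a decaying 440 Hz tone, loosely imitating a reverberant sound.
t = np.linspace(0.0, duration_s, int(sample_rate * duration_s), endpoint=False)
audio = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)

# freqs: frequency bins (Hz), times: time bins (s), S: power in each (freq, time) cell.
freqs, times, S = spectrogram(audio, fs=sample_rate, nperseg=256, noverlap=128)
log_spectrogram = np.log(S + 1e-9)   # log scale is a common choice for training targets
print(log_spectrogram.shape)         # (frequency bins, time bins)
```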

“We have a limited number of data points; from there, we fit a type of model that can accurately synthesize what the sound would be like from any position in the room, including positions it hasn’t seen,” says Du. “Once we have fit this model, you can simulate all kinds of virtual walkthroughs.”

The team used audio data obtained from a virtually simulated room. “We also have results in real scenes, but the problem is that collecting this data in the real world takes a long time,” Du notes.

Using this data, the model can learn to predict how the sounds the listener hears would change if they moved to another position. For example, if the music was coming from a loudspeaker in the center of the room, that sound would become louder as the listener moved closer to it and would become more muffled if the listener entered another room. The NAF can also use this information to predict the structure of the world around the listener.
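One plausible way to picture that rendering step, not taken from the paper, is to convolve a “dry” source signal with an impulse response predicted for each listener position, so loudness and reverberation change as the listener moves; the predictor below is a toy stand-in for a trained model such as the NAF sketched earlier:

```python
# Hedged sketch: render what a listener hears at a given position by convolving
# the dry source audio with a position-dependent impulse response.
import numpy as np

def render_at_position(dry_audio, source_pos, listener_pos, predict_impulse_response):
    ir = predict_impulse_response(source_pos, listener_pos)  # 1-D impulse response
    # Full convolution, trimmed back to the original length for playback.
    return np.convolve(dry_audio, ir)[: len(dry_audio)]

# Toy impulse-response predictor: nearer listeners get a louder, shorter response.
def toy_predictor(source_pos, listener_pos):
    distance = np.linalg.norm(np.asarray(source_pos) - np.asarray(listener_pos))
    decay = np.exp(-np.linspace(0, 5 + distance, 2048))
    return decay / (1.0 + distance)

dry = np.random.randn(16000)                     # stand-in for music or speech
near = render_at_position(dry, [2, 1, 1], [2.5, 1, 1], toy_predictor)
far = render_at_position(dry, [2, 1, 1], [8.0, 5, 1], toy_predictor)
print(np.abs(near).mean() > np.abs(far).mean())  # True: the nearer position is louder
```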

Du says a great application for this type of model is virtual reality, where sounds could be accurately generated for a listener moving through a virtual space. The other big use he sees is in artificial intelligence.

“We have a lot of models that focus on vision. But perception is not limited to vision; sound is also very important. We can think of this as an attempt at perception using sound,” he says.

Sound isn’t the only medium researchers are exploring with AI. Machine learning models today can take 2D images and use them to generate a 3D model of an object, providing different perspectives and novel views. This technique is especially useful in virtual reality environments, where engineers and artists need to build realism into on-screen spaces.

Additionally, designs like this sound-focused one could improve current sensors and devices in low-light or underwater conditions. “Sound also allows you to see around corners. There is a lot of variability depending on lighting conditions, and objects look very different,” Du says. “But sound bounces pretty much the same way most of the time. It’s a different sensory modality.”

For now, one of the main obstacles to further development of the model is a lack of data. “One thing that was surprisingly difficult was actually getting data, because people haven’t explored this very much,” he says. “When you try to synthesize new views in virtual reality, there are tons of datasets, all these real images. With more datasets, it would be very interesting to explore these approaches further, especially in real scenes.”

Watch (and listen to) a walkthrough of one of the virtual spaces below:


