Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery.


*Haikun Huang, *Michael S. Solah, Dingzeyu Li, Lap-Fai Yu
*Equal contributors
Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2019)
Conditionally accepted, more details coming soon.

[Paper], [Paper LowRes], [Supplementary Paper], [Supplementary Package], [Raw Video], [Bibtex], [Result & Sound Database]

Media Coverage

  • Seamless VR, a Japanese media outlet: [Website] [Twitter] [English Twitter]

Abstract

    As 360° cameras and virtual reality headsets become more popular, panorama images have become increasingly ubiquitous. While sound is essential to delivering immersive and interactive user experiences, most panorama images do not come with native audio. In this paper, we propose an automatic algorithm to augment static panorama images through realistic audio assignment. We accomplish this goal through object detection, scene classification, object depth estimation, and audio source placement. We built an audio file database composed of over 500 audio files to facilitate this process.

    We designed and conducted a user study to verify the efficacy of various components in our pipeline. We ran our method on a large variety of panorama images of indoor and outdoor scenes. By analyzing the resulting statistics, we learned the relative importance of these components, which can be used to prioritize them in power-sensitive, time-critical tasks such as mobile augmented reality (AR) applications.
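
    The audio source placement step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an equirectangular panorama, a detection given by its pixel centre, and a depth estimate in metres, and the names `SoundSource` and `place_source` are hypothetical. It only shows the standard mapping from panorama pixels to a direction and a 3D position around the listener.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundSource:
    """Hypothetical container for one placed audio source."""
    label: str
    azimuth_deg: float    # 0 = panorama centre, positive to the right
    elevation_deg: float  # 0 = horizon, positive upwards
    position: tuple       # (x, y, z) in metres, listener at the origin


def place_source(label, cx, cy, width, height, depth_m):
    """Map a detection centre (cx, cy) in an equirectangular panorama
    of size width x height to a 3D position at distance depth_m.

    Assumption: the image spans 360 degrees horizontally and
    180 degrees vertically, with the horizon on the middle row.
    """
    # Horizontal pixel position -> longitude in [-pi, pi]
    azimuth = (cx / width - 0.5) * 2.0 * math.pi
    # Vertical pixel position -> latitude in [-pi/2, pi/2] (top row is up)
    elevation = (0.5 - cy / height) * math.pi

    # Spherical to Cartesian: x right, y up, z forward
    x = depth_m * math.cos(elevation) * math.sin(azimuth)
    y = depth_m * math.sin(elevation)
    z = depth_m * math.cos(elevation) * math.cos(azimuth)

    return SoundSource(label, math.degrees(azimuth),
                       math.degrees(elevation), (x, y, z))


if __name__ == "__main__":
    # A detection at the image centre sits straight ahead of the listener.
    print(place_source("car", 2048, 1024, 4096, 2048, 5.0))
```

    The resulting position could then be handed to any spatial audio renderer; which renderer and coordinate convention the paper uses is not stated on this page.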

    Keywords

    immersive media, spatial audio, panorama images, virtual reality, augmented reality

    Bibtex

    @inproceedings{ambient,
       author = "Haikun Huang and Michael S. Solah and Dingzeyu Li and Lap-Fai Yu",
       title = "Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery",
       booktitle = "Proceedings of the 37th Annual ACM Conference on Human Factors in Computing Systems",
       year = "2019"
    }

    Acknowledgments

    We are grateful to the anonymous reviewers for their useful comments and suggestions. We would also like to thank the user study participants, freesound.org for the free audio files, and all the Flickr users who shared their panorama images.

    This research project is supported by the National Science Foundation under award number 1565978.