Abstract:
Human-robot interaction often requires many sub-systems to work together to facilitate natural and intelligent interactions with multiple humans. In this work, the relevant sub-systems perform audio and visual direction-of-arrival estimation within an encompassing sensor fusion framework. These sub-systems run concurrently online, allowing humanoid robots to identify active speakers in a scene, track human subjects, and recognize when other subjects may require attention. We evaluate the performance of these systems and implement relevant humanlike behaviors on the REEM-C humanoid robot. A user study provides valuable feedback on these systems, which form a strong foundation for improved humanoid intelligence and more innovative human-robot interaction.
Download here:
https://ieeexplore.ieee.org/document/10375198
Citation:
P. Barot, E. N. MacDonald, and K. Mombaur, “An Audio-Video Sensor Fusion Framework To Augment Humanoid Capabilities For Identifying
And Interacting With Human Conversational Partners,” 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids),
Austin, TX, USA, 2023, pp. 1-8, doi: 10.1109/Humanoids57100.2023.10375198.
Keywords: Visualization; Direction-of-arrival estimation; Humanoid robots; Human-robot interaction; Sensor fusion; Robot sensing systems