An Attention-based Approach to Hierarchical Multi-label Music Instrument ClassificationView Publication
Although music is typically multi-label, many works have studied hierarchical music tagging with simplified settings such as single-label data. Moreover, there lacks a framework to describe various joint training methods under the multi-label setting. In order to discuss the above topics, we introduce hierarchical multi-label music instrument classification task. The task provides a realistic setting where multi-instrument real music data is assumed. Various hierarchical methods that jointly train a DNN are summarized and explored in the context of the fusion of deep learning and conventional techniques. For the effective joint training in the multi-label setting, we propose two methods to model the connection between fine- and coarse-level tags, where one uses rule-based grouped max-pooling, the other one uses the attention mechanism obtained in a data-driven manner. Our evaluation reveals that the proposed methods have advantages over the method without joint training. In addition, the decision procedure within the proposed methods can be interpreted by visualizing attention maps or referring to fixed rules.
Related PublicationsView All
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Kazuki Shimada, Archontis Politis*, Parthasaarathy Sudarsanam*, Daniel Krause*, Kengo Uchida, Sharath Adavanne*, Aapo Hakala*, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen*, Yuki MitsufujiWhile direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded […]
Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki MitsufujiTaking long-term spectral and temporal dependencies into account is essential for automatic piano transcriptio […]