Self-supervised audio encoder with contrastive pretraining for Respiratory Anomaly Detection

Research Area

Audio, Speech & NLP

Author

Kulkarni Shubham, Watanabe Hideaki, Fuminori Homma

Company

Sony Group Corporation

Venue

ICASSP

Date

2023

Abstract

Accurate analysis of lung sounds is essential for early disease detection and monitoring. We propose a self-supervised contrastive audio encoder for automated respiratory anomaly detection. The model consists of a direct waveform audio encoder trained in two stages. First, self-supervised pretraining on a general acoustic dataset (AudioSet) is used to extract high-level representations of the input audio. Second, domain-specific semi-supervised contrastive training is employed on a respiratory database to distinguish cough and breathing sounds. This direct waveform-based encoder outperforms conventional mel-frequency cepstral coefficient (MFCC) and image spectrogram features used with CNN-ResNet-based detection models. We also show that pretraining on varied audio sounds significantly improves detection accuracy compared to speech featurization models such as Wav2Vec 2.0 and HuBERT. The proposed model achieves the highest accuracy score (91%) and inter-patient (specificity and sensitivity) evaluation score (84.1%) on the largest respiratory anomaly detection dataset. Our work further contributes to remote patient care via accurate continuous monitoring of respiratory abnormalities.
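
The abstract does not state which contrastive objective is used. As a rough illustration of the general idea only, the sketch below computes an InfoNCE-style loss, a common choice for contrastive pretraining, over a small batch of embeddings. The function names, the temperature value, and the use of cosine similarity are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (illustrative, not the paper's loss).

    anchors[i] and positives[i] are assumed to be embeddings of two views of
    the same audio clip; every positives[j] with j != i serves as a negative
    for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching view.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

In this sketch the loss is small when each anchor is most similar to its own paired view and large when it is more similar to another clip's view, which is the behaviour a contrastive encoder is trained toward.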

ICASSP SASB workshop
