Self-supervised audio encoder with contrastive pretraining for Respiratory Anomaly Detection
Abstract
Accurate analysis of lung sounds is essential for early disease detection and monitoring. We propose a self-supervised contrastive audio encoder for automated respiratory anomaly detection. The model is a direct-waveform audio encoder trained in two stages. First, self-supervised pretraining on a general acoustic dataset (AudioSet) is used to extract high-level representations of the input audio. Second, domain-specific semi-supervised contrastive training is performed on a respiratory sound database to distinguish cough and breathing sounds. This direct waveform-based encoder outperforms conventional mel-frequency cepstral coefficient (MFCC) and spectrogram image features paired with CNN-ResNet-based detection models. We also show that pretraining on varied audio sounds significantly improves detection accuracy compared to speech featurization models such as Wav2Vec 2.0 and HuBERT. The proposed model achieves the highest accuracy (91%) and inter-patient evaluation score (specificity and sensitivity, 84.1%) on the largest respiratory anomaly detection dataset. Our work further contributes to remote patient care by enabling accurate continuous monitoring of respiratory abnormalities.
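To illustrate the kind of objective the second, contrastive training stage refers to, below is a minimal sketch of an InfoNCE-style contrastive loss in PyTorch. The encoder architecture, function names, tensor shapes, and temperature value are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (assumed setup, not the
# authors' exact method). Two augmented views of the same recording form a
# positive pair; other items in the batch act as negatives.
import torch
import torch.nn.functional as F


def info_nce_loss(z_i: torch.Tensor, z_j: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss over a batch of paired embeddings of shape (batch, dim)."""
    z_i = F.normalize(z_i, dim=-1)
    z_j = F.normalize(z_j, dim=-1)
    logits = z_i @ z_j.t() / temperature                     # (batch, batch) similarities
    targets = torch.arange(z_i.size(0), device=z_i.device)   # matching rows are positives
    # Symmetric cross-entropy: each view should identify its own pair.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Stand-in direct-waveform encoder (hypothetical layers and sizes).
    encoder = torch.nn.Sequential(
        torch.nn.Conv1d(1, 64, kernel_size=10, stride=5),
        torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool1d(1),
        torch.nn.Flatten(),
        torch.nn.Linear(64, 128),
    )
    wave_a = torch.randn(8, 1, 16000)  # two augmented views of the same 1 s clips
    wave_b = torch.randn(8, 1, 16000)
    loss = info_nce_loss(encoder(wave_a), encoder(wave_b))
    print(loss.item())
```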
ICASSP SASB Workshop