Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models
Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer
Explainable Data Bias Mitigation
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation
Nonparallel Emotional Voice Conversion for Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge
E-Branchformer-Based E2E SLU toward STOP On-Device Challenge
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing toward STOP Quality Challenge
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
Streaming Joint Speech Recognition and Disfluency Detection
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
Unsupervised Vocal Dereverberation with Diffusion-based Generative Models
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
Regularizing Score-based Models with Score Fokker-Planck Equations