Character Dialogue AI Technology
Residual Language Model for End-to-end Speech Recognition
Distributed and Adaptive Edge-based AI Models for Sensor Networks (DAISeN)
Towards Developing a Multi-Modal Video Recommendation System
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
M2FNet: Multi-Modal Fusion Network for Emotion Recognition in Conversation
Bootstrapped Representation Learning for Skeleton-Based Action Recognition
Degree-of-linear-polarization-based Color Constancy
An Inductive System Monitoring Approach for GNSS Activation
An Embedded-Oriented Speech Classification System Using Reservoir Computing
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models
Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
Amicable Examples for Informed Source Separation
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Source Mixing and Separation Robust Audio Steganography
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR
Polyphone Disambiguation and Accent Prediction Using Pre-Trained Language Models in Japanese TTS Front-End
NVC-Net: End-to-End Adversarial Voice Conversion
Music Source Separation with Deep Equilibrium Models