
  • Sriram Pingali*, Prabir Mondal*, Daipayan Chakder*, Sriparna Saha*, Angshuman Ghosh
  • * External authors


  • Sony Research India Private Limited (SRI)




  • 2022


Towards Developing a Multi-Modal Video Recommendation System


With the surge of digitized entertainment systems in recent years, a personalised user experience gives entertainment businesses a competitive edge; hence the importance of recommendation systems, in particular for video or movie recommendation, is evident. However, the majority of recommendation systems are learned in a supervised fashion on empirical data such as user ratings. This approach has certain shortcomings, specifically the cold-start problem and the data-sparsity problem. Moreover, existing recommendation systems do not utilize the multiple sources of information associated with videos/movies, such as their textual summaries, meta-data, and audio and video signals. Motivated by the growth of multi-modal information processing across different fields of artificial intelligence, we propose the task of multi-modal recommendation in the current study. To represent a movie/video, feature vectors generated from the text, meta-data, audio, and video modalities are concatenated. A novel knowledge-graph-based approach is applied to generate feature vectors from the text modality. Finally, a Siamese-architecture-based deep learning technique is proposed to perform regression over similarities using multi-modal user-item embeddings. As no dataset was available for the task of multi-modal recommendation, we have also enhanced the existing benchmark MovieLens 100K dataset with text, video, and audio information and used it for our experiments. Experimental results establish the efficacy of using multi-modal information for movie embedding generation.
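The pipeline the abstract describes — concatenating per-modality feature vectors into a movie embedding, then scoring user-item pairs with a shared-weight (Siamese) branch — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the modality dimensions, the random stand-in features, the `project` branch, and the cosine-similarity score are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions (placeholders, not from the paper).
DIMS = {"text": 8, "meta": 4, "audio": 6, "video": 6}

def movie_embedding(features: dict) -> np.ndarray:
    """Concatenate per-modality feature vectors, in a fixed modality order,
    into a single movie embedding, as the abstract describes."""
    return np.concatenate([features[m] for m in ("text", "meta", "audio", "video")])

# Toy features for one movie (random stand-ins for the real feature extractors).
movie = {m: rng.standard_normal(d) for m, d in DIMS.items()}
item_vec = movie_embedding(movie)              # shape (24,)

# A Siamese setup applies the SAME projection to both sides of a pair,
# then regresses on the similarity of the two projected embeddings.
W = rng.standard_normal((16, item_vec.size))   # shared projection (untrained here)

def project(x: np.ndarray) -> np.ndarray:
    return np.tanh(W @ x)                      # shared-weight "twin" branch

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    pa, pb = project(a), project(b)
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))

user_vec = rng.standard_normal(item_vec.size)  # toy user embedding
score = similarity(user_vec, item_vec)         # predicted affinity in [-1, 1]
```

In a trained system, `W` (and any further layers) would be learned by regressing `score` against observed ratings; the sketch only shows how the shared branch makes the two sides of the pair comparable.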