Dialogue Generation Conditional on Predefined Stories: Preliminary Results
This paper introduces dialogue generation conditioned on predefined stories, a task we call Story2Dialogue. As a starting point, the paper presents benchmark performances using simple but modern baseline methods, together with an error analysis. The experimental results show that few-shot prompting with large-scale pre-trained language models outperforms human writers on some objective evaluation metrics, but the generated dialogues remain far inferior to human-written ones in terms of suitability as entertainment content and semantic equivalence to the input story. Regarding suitability as a movie script, human evaluators preferred automatically generated dialogues over those created by human writers in only 20% to 29% of cases. As for semantic equivalence to the input story, 75% to 80% of the automatically generated dialogues were judged semantically insufficient. The error analysis shows that around 80% of the semantically insufficient dialogues lacked information from the given stories; conversely, irrelevant utterances (including unprompted continuations of conversations) were added in 20% to 26% of the dialogues.