
Show HN: Critical Role Dungeons and Dragons dialogue summarization dataset - cosmicskewl
https://github.com/RevanthRameshkumar/CRD3
======
cosmicskewl
The abstract from the paper:

This paper describes the Critical Role Dungeons and Dragons Dataset (CRD3) and
related analyses. Critical Role is an unscripted, live-streamed show where a
fixed group of people play Dungeons and Dragons, an open-ended role-playing
game. The dataset is collected from 159 Critical Role episodes transcribed to
text dialogues, consisting of 398,682 turns. It also includes corresponding
abstractive summaries collected from the Fandom wiki. The dataset is
linguistically unique in that the narratives are generated entirely through
player collaboration and spoken interaction. For each dialogue, there are a
large number of turns, multiple abstractive summaries with varying levels of
detail, and semantic ties to the previous dialogues. In addition, we provide a
data augmentation method that produces 34,243 summary-dialogue chunk pairs to
support current neural ML approaches, and we provide an abstractive
summarization benchmark and evaluation.

------
Fjolsvith
Or, how to train your AI to be a geek.

