
Show HN: A dataset of 40k professionally-written summaries of news articles - CurationCorp
https://github.com/curationcorp/curation-corpus
======
CurationCorp
Hi HackerNews, we have recently open-sourced a subset of our database of
human-written abstracts of news articles for use by the NLP community. These
summaries have been written and edited by professional writers for use in our
bespoke news solution.

We're hoping to spur development in AI abstractive summarisation. We've
written an introduction here:
https://medium.com/curation-corporation/teaching-an-ai-to-abstract-a-new-dataset-for-abstractive-auto-summarisation-5227f546caa8

and technical posts on BERT-based abstractive summarisation here:
https://medium.com/curation-corporation/fine-tuning-bert-for-abstractive-summarisation-with-the-curation-dataset-79ea4b40a923

and BART-based abstractive summarisation here:
https://medium.com/curation-corporation/fine-tuning-bart-for-abstractive-text-summarisation-with-fastai2-d7a2ad676a13?source=collection_home---6------0-----------------------

If you're interested in the much larger commercially available dataset, please
get in touch: cto@curationcorp.com

