
Not specific to Wikipedia:

https://aarontay.medium.com/3-new-tools-to-try-for-literatur...

https://archive.is/Ul13s

Specific to Wikipedia:

Wikipedia Citations: Reproducible Citation Extraction from Multilingual Wikipedia [2024]

https://arxiv.org/abs/2406.19291v1

https://doi.org/10.48550/arXiv.2406.19291

> Wikipedia is an essential component of the open science ecosystem, yet it is poorly integrated with academic open science initiatives. Wikipedia Citations is a project that focuses on extracting and releasing comprehensive datasets of citations from Wikipedia. A total of 29.3 million citations were extracted from English Wikipedia in May 2020. Following this one-off research project, we designed a reproducible pipeline that can process any given Wikipedia dump in the cloud-based settings. To demonstrate its usability, we extracted 40.6 million citations in February 2023 and 44.7 million citations in February 2024. Furthermore, we equipped the pipeline with an adapted Wikipedia citation template translation module to process multilingual Wikipedia articles in 15 European languages so that they are parsed and mapped into a generic structured citation template. This paper presents our open-source software pipeline to retrieve, classify, and disambiguate citations on demand from a given Wikipedia dump.
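To make the "parsed and mapped into a generic structured citation template" step a bit more concrete, here is a minimal sketch of the general idea. It is not the authors' pipeline (see the repos under Code for that). The regex, the `extract_citations` function, and the `PARAM_ALIASES` table are all my own illustrative assumptions, and the approach ignores nested templates and wikilinks containing `|`.

```python
import re

# Hypothetical alias table: maps a few localized parameter names onto
# generic English ones. Illustrative only, not taken from the paper.
PARAM_ALIASES = {
    "titre": "title",   # French
    "titel": "title",   # German
    "année": "year",    # French
    "jahr": "year",     # German
}

# Matches flat {{cite xxx |...}} and {{citation |...}} templates.
# Templates containing nested braces are skipped by design.
CITE_RE = re.compile(
    r"\{\{\s*(cite \w+|citation)\s*\|([^{}]*)\}\}",
    re.IGNORECASE,
)


def extract_citations(wikitext):
    """Return one dict per citation template found in the wikitext."""
    citations = []
    for match in CITE_RE.finditer(wikitext):
        fields = {"template": match.group(1).lower()}
        for part in match.group(2).split("|"):
            if "=" not in part:
                continue  # ignore positional parameters
            key, value = part.split("=", 1)
            key = key.strip().lower()
            # Normalize localized names to the generic ones when known.
            fields[PARAM_ALIASES.get(key, key)] = value.strip()
        citations.append(fields)
    return citations


sample = (
    "Text<ref>{{cite journal |title=Foo |doi=10.1/x |year=2020}}</ref> "
    "and {{Citation | titre = Bar }}"
)
for citation in extract_citations(sample):
    print(citation)
```

For anything beyond a toy, a real wikitext parser (several are listed on the Alternative_parsers page under Bonus links) is a safer choice than a regex.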

Prior work referenced in the abstract above, with some team overlap:

Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia [2021]

https://direct.mit.edu/qss/article/2/1/1/97565/Wikipedia-cit...

https://doi.org/10.1162/qss_a_00105

Datasets:

A Comprehensive Dataset of Classified Citations with Identifiers from English Wikipedia [2024]

https://zenodo.org/records/10782978

https://doi.org/10.5281/zenodo.10782978

A Comprehensive Dataset of Classified Citations with Identifiers from Multilingual Wikipedia [2024]

https://zenodo.org/records/11210434

https://doi.org/10.5281/zenodo.11210434

Code (MIT License):

https://github.com/albatros13/wikicite

https://github.com/albatros13/wikicite/tree/multilang

Bonus links:

https://www.mediawiki.org/wiki/Alternative_parsers

https://scholarlykitchen.sspnet.org/2022/11/01/guest-post-wi...





