WEXEA: Wikipedia EXhaustive Entity Annotation

Michael Strobl
Amine Trabelsi
Osmar R. Zaiane, University of Alberta (Database)

Full Text: LREC2020.pdf

In this paper, we discuss an approach for creating a text corpus based on Wikipedia with exhaustive annotations of entity mentions. Editors on Wikipedia are only expected to add hyperlinks that help the reader understand the content, and are discouraged from adding links that do not aid understanding of an article. Therefore, many mentions of popular entities (such as countries or popular events in history), of previously linked articles, and of the article entity itself are not linked. This leaves considerable potential for additional annotations that can be used for downstream NLP tasks, such as Relation Extraction. We show that our annotations are useful for creating distantly supervised datasets for this task. Furthermore, we publish all code necessary to derive a corpus from a raw Wikipedia dump, so that it can be reproduced by anyone.

Citation

M. Strobl, A. Trabelsi, O. Zaiane. "WEXEA: Wikipedia EXhaustive Entity Annotation". International Conference on Language Resources and Evaluation, (ed: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, et al.), pp 1944-1951, May 2020.

Keywords:	Wikipedia, Knowledge Graphs, Relation Extraction
Category:	In Conference
Web Links:	ACL

BibTeX

@incollection{Strobl+al:20,
  author = {Michael Strobl and Amine Trabelsi and Osmar R. Zaiane},
  title = {WEXEA: Wikipedia EXhaustive Entity Annotation},
  Editor = {Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, et
    al.},
  Pages = {1944-1951},
  booktitle = {International Conference on Language Resources and Evaluation},
  year = 2020,
}

Last Updated: September 15, 2020
Submitted by Sabina P
