Language independent authorship attribution using character level language models
- Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
- Dale Schuurmans, AICML
- Shaojun Wang, Dept of Computing Science
We present a method for computer assisted authorship attribution based on character level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive preprocessing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state of the art performance in each of these cases. In particular, we obtain a 20% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
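The idea in the abstract can be illustrated with a minimal sketch: train one character-level n-gram model per candidate author, then attribute an unseen text to the author whose model assigns it the lowest cross-entropy. The function names (`train_char_ngram`, `cross_entropy`, `attribute`), the toy training strings, and the add-one smoothing are assumptions made for illustration only; they are not taken from the paper, whose smoothing and evaluation setup may differ.

```python
import math
from collections import Counter


def train_char_ngram(text, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    padded = " " * (n - 1) + text  # pad so the first characters have a context
    positions = range(len(padded) - n + 1)
    ngrams = Counter(padded[i:i + n] for i in positions)
    contexts = Counter(padded[i:i + n - 1] for i in positions)
    return {"n": n, "ngrams": ngrams, "contexts": contexts, "vocab": set(padded)}


def cross_entropy(model, text, vocab_size):
    """Average negative log2 probability per character of `text`.

    Uses add-one smoothing (an illustrative assumption, not the paper's choice).
    """
    if not text:
        raise ValueError("text must be non-empty")
    n = model["n"]
    padded = " " * (n - 1) + text
    total = 0.0
    count = 0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        context = gram[:-1]
        p = (model["ngrams"][gram] + 1) / (model["contexts"][context] + vocab_size)
        total -= math.log2(p)
        count += 1
    return total / count


def attribute(models, text):
    """Return the author whose model gives `text` the lowest cross-entropy."""
    vocab = set(text)
    for model in models.values():
        vocab |= model["vocab"]
    vocab_size = len(vocab)
    return min(models, key=lambda a: cross_entropy(models[a], text, vocab_size))


# Hypothetical toy corpora standing in for each author's training texts.
models = {
    "author_a": train_char_ngram("the quick brown fox jumps over the lazy dog " * 20),
    "author_b": train_char_ngram("zzz qqq xxx zzz qqq xxx " * 20),
}
predicted = attribute(models, "the lazy dog jumps")
```

Because the model operates on raw characters, the sketch needs no tokenizer or word segmentation, which is the property that makes this kind of approach applicable to languages as different as Greek, English, and Chinese.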
Citation
F. Peng, D. Schuurmans, S. Wang. "Language independent authorship attribution using character level language models". EACL, April 2003.
Keywords: | machine learning |
Category: | In Conference |
BibTeX
@incollection{Peng+al:EACL03,
  author = {Fuchun Peng and Dale Schuurmans and Shaojun Wang},
  title = {Language independent authorship attribution using character level language models},
  booktitle = {EACL},
  year = 2003,
}
Last Updated: June 01, 2007
Submitted by Staurt H. Johnson