Native Language Identification Using Probabilistic Graphical Models
- Garrett Nicolai
- Md Asadul Islam
- Russ Greiner, Dept of Computing Science; PI of AICML
Native Language Identification (NLI) is the task of identifying the native language of the author of a text written in a second language. Support Vector Machines (SVMs) and Maximum Entropy learners are the most common methods used to solve this problem, but we consider it from the point of view of probabilistic graphical models. We hypothesize that graphical models are well suited to this task, as they can capture feature inter-dependencies that cannot be exploited by SVMs. Using progressively more connected graphical models, we show that these models outperform SVMs on reduced feature sets. Furthermore, on full feature sets, even naïve Bayes improves on SVMs, raising accuracy from 82.06% to 83.41% on a 5-language classification task.
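For readers unfamiliar with the baseline named in the abstract, naïve Bayes is the least connected graphical model for classification: every feature depends only on the class label. The following is a minimal sketch of a multinomial naïve Bayes classifier with add-alpha smoothing. It is not the authors' implementation; the function names, the placeholder feature tokens, and the class labels "X" and "Y" are all hypothetical, and the paper's actual feature sets are not described on this page.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Collect the counts needed by a multinomial naive Bayes model.

    docs   -- list of token lists (hypothetical features, e.g. word n-grams)
    labels -- one class label per document
    alpha  -- add-alpha (Laplace) smoothing constant
    """
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        feature_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["feature_counts"][label]
        denom = sum(counts.values()) + alpha * vocab_size
        # Log prior plus a sum of per-feature log likelihoods: the
        # conditional-independence assumption of naive Bayes.
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in model["vocab"]:  # skip features unseen in training
                score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data with placeholder tokens and labels, invented for illustration only.
toy_docs = [["a", "b", "a"], ["a", "c"], ["d", "e", "d"], ["e", "f"]]
toy_labels = ["X", "X", "Y", "Y"]
toy_model = train_naive_bayes(toy_docs, toy_labels)
print(predict(toy_model, ["a", "b"]))
```

The more connected models mentioned in the abstract would add dependencies between features, which this sketch deliberately omits.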
Citation
G. Nicolai, M. Islam, R. Greiner. "Native Language Identification Using Probabilistic Graphical Models". International Conference on Electrical Information and Communication Technology, pp n/a, February 2014.
Keywords: PGM, NLU, machine learning
Category: In Conference
BibTeX
@incollection{Nicolai+al:EICT14,
  author = {Garrett Nicolai and Md Asadul Islam and Russ Greiner},
  title = {Native Language Identification Using Probabilistic Graphical Models},
  Pages = {n/a},
  booktitle = {International Conference on Electrical Information and Communication Technology},
  year = 2014,
}
Last Updated: February 12, 2020
Submitted by Sabina P