
Dynamic Web Log Session Identification With Statistical Language Models

We present a novel session identification method based on statistical language modeling. Unlike standard timeout methods, which use fixed time thresholds for session identification, we use an information-theoretic approach that yields more robust results for identifying session boundaries. We evaluate our new approach by learning interesting association rules from the segmented session files. We then compare the performance of our approach to three standard session identification methods (the standard timeout method, the reference length method, and the maximal forward reference method) and find that our statistical language modeling approach generally yields superior results. However, as with every method, the performance of our technique varies with parameter settings. Therefore, we also analyze the influence of the two key factors in our language-modeling-based approach: the choice of smoothing technique and the language model order. We find that all standard smoothing techniques, save one, perform well, and that performance is robust to language model order.
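The abstract contrasts fixed-threshold timeouts with a language-model criterion for placing session boundaries. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. All names (`timeout_sessions`, `BigramModel`, `lm_sessions`) and parameters (`gap`, `k`, `margin`) are hypothetical. The 30-minute `gap` is a commonly used timeout value assumed for illustration. The boundary rule (split when the next request is more than `margin` bits more surprising than the current session's running average surprisal) is an assumed stand-in for the paper's information-theoretic criterion, and a bigram model with add-k smoothing stands in for the smoothing techniques and model orders the paper compares.

```python
# Hypothetical sketch: illustrates the idea, not the authors' code.
import math
from collections import Counter


def timeout_sessions(timestamps, pages, gap=30 * 60):
    """Baseline: start a new session whenever the idle time between two
    consecutive requests exceeds `gap` seconds (30 min is an assumed default)."""
    sessions, current, prev_t = [], [], None
    for t, page in zip(timestamps, pages):
        if prev_t is not None and t - prev_t > gap:
            sessions.append(current)
            current = []
        current.append(page)
        prev_t = t
    if current:
        sessions.append(current)
    return sessions


class BigramModel:
    """Bigram model over page IDs with add-k smoothing (Laplace when k = 1).
    Order and smoothing method are assumptions chosen for simplicity."""

    def __init__(self, training_sessions, k=1.0):
        self.k = k
        self.context_counts = Counter()
        self.pair_counts = Counter()
        vocab = set()
        for session in training_sessions:
            padded = ["<s>"] + list(session)  # <s> marks a session start
            vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.pair_counts[(prev, cur)] += 1
        # One extra slot keeps some probability mass for unseen pages.
        self.vocab_size = len(vocab) + 1

    def surprisal(self, prev, cur):
        """Return -log2 P(cur | prev) under add-k smoothing."""
        num = self.pair_counts[(prev, cur)] + self.k
        den = self.context_counts[prev] + self.k * self.vocab_size
        return -math.log2(num / den)


def lm_sessions(model, pages, margin=1.0):
    """Hypothetical rule: start a new session when the next request is more
    than `margin` bits more surprising than the current session's average
    per-request surprisal (its empirical entropy) so far."""
    sessions, current, total_bits = [], [], 0.0
    for page in pages:
        prev = current[-1] if current else "<s>"
        bits = model.surprisal(prev, page)
        if current and bits - total_bits / len(current) > margin:
            sessions.append(current)
            current, total_bits = [], 0.0
            bits = model.surprisal("<s>", page)  # rescore as a session start
        current.append(page)
        total_bits += bits
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Toy training data: two recurring navigation patterns.
    train = [["home", "products", "cart", "checkout"]] * 5 + [["home", "news", "sports"]] * 5
    model = BigramModel(train)
    log = ["home", "products", "cart", "checkout", "home", "news", "sports"]
    times = [60 * i for i in range(len(log))]  # one request per minute
    print(timeout_sessions(times, log))  # one session with all seven pages
    print(lm_sessions(model, log))  # [['home', 'products', 'cart', 'checkout'], ['home', 'news', 'sports']]
```

In the toy log, all seven requests arrive within six minutes, so the timeout baseline keeps them in one session, while the language model places a boundary at the second `home` request, because the transition from `checkout` to `home` never occurs in training. Trying a different smoothing method or model order would only require changing `BigramModel`.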

Citation

X. Huang, F. Peng, A. An, D. Schuurmans. "Dynamic Web Log Session Identification With Statistical Language Models". Journal of the American Society for Information Science and Technology (JASIST), 55(14), pp. 1290-1303, December 2004.

Keywords: session identification, web mining, language modeling, machine learning
Category: In Journal

BibTeX

@article{Huang+al:JASTIS04,
  author  = {Xiangji Huang and Fuchun Peng and Aijun An and Dale Schuurmans},
  title   = {Dynamic Web Log Session Identification With Statistical Language
             Models},
  journal = {Journal of the American Society for Information Science and
             Technology (JASIST)},
  volume  = {55},
  number  = {14},
  pages   = {1290--1303},
  month   = dec,
  year    = {2004},
}

Last Updated: March 14, 2007
Submitted by AICML Admin Assistant
