Modeling data using directional distributions: Part II

Sra, S; Jain, P; Dhillon, I

Report

Fulltext (public)

TR-07-05.pdf
(Publisher version), 236KB

Citation

Sra, S., Jain, P., & Dhillon, I. (2007). Modeling data using directional distributions: Part II (TR-07-05). Austin, TX, USA: University of Texas.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-CEBF-9

Abstract

High-dimensional data is central to most data mining applications, and only recently has it
been modeled via directional distributions. In [Banerjee et al., 2003] the authors introduced the
use of the von Mises-Fisher (vMF) distribution for modeling high-dimensional directional data,
particularly for text and gene expression analysis. The vMF distribution is one of the simplest
directional distributions. The Watson, Bingham, and Fisher-Bingham distributions have an
increasing number of parameters and thereby commensurately increased modeling
power. This report provides a follow-up study to the initial development in [Banerjee et al., 2003]
by presenting Expectation Maximization (EM) procedures for estimating parameters of a mixture
of Watson (moW) distributions. The numerical challenges associated with parameter estimation
for both of these distributions are significantly more difficult than for the vMF distribution. We
develop new numerical approximations for estimating the parameters, permitting us to model real-life
data more accurately. Our experimental results establish that, for certain data sets, improved
modeling power translates into better results.