Sparse Dictionary Learning with Simplex Constraints and Application to Topic Modeling

Zheng, Q. (2012). Sparse Dictionary Learning with Simplex Constraints and Application to Topic Modeling. Master Thesis, Universität des Saarlandes, Saarbrücken.

Basic Information

Genre: Thesis

Files

File: 2012_Qinqing Zheng_Master's Thesis.pdf (full text), 19MB
File permalink: -
Name: 2012_Qinqing Zheng_Master's Thesis.pdf
Description: -
OA-Status:
Visibility: Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
CC License: -


Creators

Zheng, Qinqing1, Author
Hein, Matthias2, Referee
Slawski, Martin2, Advisor
Affiliations:
1 International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
2 External Organizations, ou_persistent22

Content

Keywords: -
Abstract: A probabilistic mixture model is a powerful tool for providing a low-dimensional representation of count data. In the context of topic modeling, this amounts to representing the distribution of a document as a mixture of multiple distributions known as topics. The mixing proportions are called coefficients. A common approach is to introduce sparsity into both the topics and the coefficients for better interpretability. We first discuss the problem of recovering sparse coefficients of given documents when the topics are known. This is formulated as a penalized least squares problem on the probability simplex, where the sparsity is achieved through regularization. However, the typical ℓ1 regularizer is toothless in this case, since it is constant over the simplex. To overcome this issue, we propose a group of concave penalties for inducing sparsity. An alternative approach is to post-process the solution of the non-negative lasso to produce results that conform to the simplex constraint. Our experiments show that both kinds of approaches can effectively recover the sparsity pattern of the coefficients. We then compare in detail their robustness with respect to different characteristics of the input data. The second problem we discuss is modeling both the topics and the coefficients of a collection of documents via matrix factorization. We propose the LpT approach, in which all the topics and coefficients are constrained to the simplex, and the ℓp penalty is imposed on each topic to promote sparsity. We also consider procedures that post-process the solutions of other methods. For example, the L1 approach first solves the problem where the simplex constraints imposed on the topics are relaxed into non-negativity constraints, and the ℓp penalty is replaced by the ℓ1 penalty. Afterwards, L1 normalizes the estimated topics to produce results satisfying the simplex constraints.
As detecting the number of mixture components inherent in the data is of central importance for the probabilistic mixture model, we analyze how the regularization techniques can help us automatically determine this number. We compare the capabilities of these approaches to recover the low-rank structure underlying the data when the number of topics is correctly specified and over-specified, respectively. The empirical results demonstrate that LpT and L1 can discover the sparsity pattern of the ground truth. In addition, when the number of topics is over-specified, they adapt to the true number of topics.
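The abstract's observation that the ℓ1 penalty is constant over the probability simplex, while a concave ℓp penalty (0 < p < 1) is not, can be checked numerically. The sketch below uses NumPy with two illustrative coefficient vectors of my own choosing; it is not code from the thesis.

```python
import numpy as np

# Two points on the probability simplex: non-negative entries summing to 1.
sparse = np.array([0.75, 0.25, 0.0, 0.0])   # sparse mixing proportions
dense = np.full(4, 0.25)                    # dense (uniform) proportions

# On the simplex the l1 norm is identically 1, so an l1 penalty
# cannot distinguish sparse from dense coefficient vectors.
print(np.abs(sparse).sum())  # 1.0
print(np.abs(dense).sum())   # 1.0

# A concave lp penalty sum(x**p) with 0 < p < 1 does discriminate:
# it is strictly smaller for the sparser vector.
p = 0.5
print((sparse ** p).sum())   # ≈ 1.366
print((dense ** p).sum())    # 2.0
```

This is why the thesis replaces the ℓ1 regularizer with concave penalties when optimizing directly over the simplex.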

Details

Language(s): eng - English
Dates: 2012-03, 2012
Publication status: Published
Pages: -
Publishing info: Saarbrücken : Universität des Saarlandes
Table of Contents: -
Review method: -
Identifiers (DOI, ISBN, etc.): BibTex Citekey: Zheng2012
Degree: Master
