Title: | GSDMM Short Text Clustering via Dirichlet Mixture Models |
---|---|
Description: | This package implements the Dirichlet Mixture Model and accompanying Gibbs sampler for short text clustering proposed by Yin and Wang (2014). |
Authors: | Till Tietz [aut, cre], Akiru Kato [ctb] |
Maintainer: | Till Tietz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.6 |
Built: | 2024-12-30 06:36:49 UTC |
Source: | https://github.com/paithiov909/gsdmm |
Fit Dirichlet Mixture Model
gsdmm( texts, n_iter = 30L, n_clust = 8L, alpha = 0.1, beta = 0.1, progress = TRUE )
texts | a list of character vectors |
n_iter | integer number of iterations to run the Gibbs sampler for |
n_clust | integer upper bound on the number of clusters. The number of clusters returned will be less than or equal to n_clust. |
alpha | double governing the probability of assigning a text to a currently empty cluster (a larger alpha means a higher probability). |
beta | double governing the tradeoff between cluster size and fit. Smaller values make clustering more sensitive to the congruence between cluster-word and document-word distributions, while larger values make it more sensitive to cluster size. |
progress | logical indicating whether to print a progress bar. |
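The roles of alpha and beta can be made concrete with the conditional probability from Yin and Wang (2014, Eq. 4), which a Gibbs sampler for this model evaluates when it reassigns a text. The base-R sketch below is illustrative only: `cluster_log_scores()` and all of its argument names are hypothetical and not part of this package, and the package's internal implementation may differ.

```r
# Illustrative sketch (not the package's code): log of the unnormalised
# probability of assigning one text to each cluster, per Yin and Wang (2014).
# doc    : character vector of tokens for the text being reassigned
# m_z    : number of texts in each cluster (excluding this text)
# n_zw   : cluster x word count matrix (excluding this text), columns named by word
# n_docs : total number of texts, including this one
cluster_log_scores <- function(doc, m_z, n_zw, alpha, beta, n_docs) {
  K <- length(m_z)       # upper bound on the number of clusters
  V <- ncol(n_zw)        # vocabulary size
  n_z <- rowSums(n_zw)   # total word count per cluster
  # occ[i] is 1 for the first occurrence of a word in doc, 2 for the second, ...
  occ <- ave(seq_along(doc), doc, FUN = seq_along)
  pos <- seq_along(doc)
  vapply(seq_len(K), function(z) {
    # Cluster-size term: alpha keeps empty clusters (m_z = 0) reachable.
    size_term <- log(m_z[z] + alpha) - log(n_docs - 1 + K * alpha)
    # Fit term: how often the text's words already occur in cluster z,
    # smoothed by beta.
    fit_num <- sum(log(n_zw[z, doc] + beta + occ - 1))
    fit_den <- sum(log(n_z[z] + V * beta + pos - 1))
    size_term + fit_num - fit_den
  }, numeric(1))
}

# Toy state: cluster 1 holds pet words, cluster 2 finance words, cluster 3 is empty.
vocab <- c("cat", "dog", "stock", "bond")
n_zw <- rbind(c(2, 2, 0, 0), c(0, 0, 2, 2), c(0, 0, 0, 0))
colnames(n_zw) <- vocab
m_z <- c(2, 2, 0)

scores <- cluster_log_scores(c("cat", "dog"), m_z, n_zw,
                             alpha = 0.1, beta = 0.1, n_docs = 5)
probs <- exp(scores - max(scores))
probs <- probs / sum(probs)   # normalised assignment probabilities
print(round(probs, 4))
```

In this toy setup the first cluster, which already contains both words, receives the highest probability, and the empty third cluster keeps a nonzero probability only because alpha is greater than zero.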
a list containing an integer vector of cluster assignments and a word-cluster matrix.
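A minimal usage sketch follows, using only the arguments documented above. It assumes the package is installed under the name `gsdmm` and that each list element holds the tokens of one text; the toy corpus and the whitespace tokenisation are illustrative choices, not recommendations from the package.

```r
# Toy corpus; whitespace tokenisation is a simplification for illustration.
raw <- c("cats chase mice", "dogs chase cats",
         "stocks and bonds fell", "bonds beat stocks")
texts <- strsplit(tolower(raw), "\\s+")   # list of character vectors

# Assumption: the package is installed under the name "gsdmm".
if (requireNamespace("gsdmm", quietly = TRUE)) {
  fit <- gsdmm::gsdmm(texts, n_iter = 30L, n_clust = 3L,
                      alpha = 0.1, beta = 0.1, progress = FALSE)
  str(fit)  # inspect the returned list rather than assuming element names
}
```

Because n_clust is only an upper bound, the fit may use fewer than three clusters on a corpus this small.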