| Title: | GSDMM Short Text Clustering via Dirichlet Mixture Models |
|---|---|
| Description: | This package implements the Dirichlet Mixture Model and accompanying Gibbs sampler for short text clustering proposed by Yin and Wang (2014). |
| Authors: | Till Tietz [aut, cre], Akiru Kato [ctb] |
| Maintainer: | Till Tietz <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.6 |
| Built: | 2026-05-17 06:48:11 UTC |
| Source: | https://github.com/paithiov909/gsdmm |
Fit Dirichlet Mixture Model
gsdmm(texts, n_iter = 30L, n_clust = 8L, alpha = 0.1, beta = 0.1, progress = TRUE)
| Argument | Description |
|---|---|
| texts | A list of character vectors. |
| n_iter | Integer number of iterations to run the Gibbs sampler for. |
| n_clust | Integer upper bound on the number of clusters. The number of clusters returned will be less than or equal to n_clust. |
| alpha | Double governing the probability of assigning a text to a currently empty cluster (a larger alpha means a higher probability). |
| beta | Double governing the tradeoff between cluster size and fit. Smaller values of beta make clustering more sensitive to congruence between cluster-word and document-word distributions, while larger values make it more sensitive to cluster size. |
| progress | Logical indicating whether to print a progress bar. |
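
To make the roles of alpha and beta more concrete, the sketch below computes the unnormalised log probability of assigning a single document to each cluster, following the collapsed Gibbs update described by Yin and Wang (2014). It is an illustration only: `cluster_log_prob` and all of its argument names are hypothetical, and nothing here is taken from the package's internal implementation, which may differ.

```r
# Hypothetical helper, NOT part of the gsdmm package: unnormalised log
# probability of assigning one document to each cluster, based on the
# update rule in Yin and Wang (2014).
cluster_log_prob <- function(doc, m_z, n_z, n_zw, n_docs, alpha, beta) {
  # doc:    character vector of tokens in the document being reassigned
  # m_z:    documents per cluster (with this document removed)
  # n_z:    tokens per cluster (with this document removed)
  # n_zw:   cluster-by-word count matrix, words as column names
  # n_docs: total number of documents, including this one
  k <- length(m_z)
  v <- ncol(n_zw)
  word_counts <- table(doc)
  vapply(seq_len(k), function(z) {
    # Cluster-size term: alpha keeps empty clusters reachable.
    size_term <- log(m_z[z] + alpha) - log(n_docs - 1 + k * alpha)
    # Word-fit numerator: one factor per occurrence of each word.
    num <- 0
    for (idx in seq_along(word_counts)) {
      w <- names(word_counts)[idx]
      j <- seq_len(as.integer(word_counts[idx]))
      num <- num + sum(log(n_zw[z, w] + beta + j - 1))
    }
    # Word-fit denominator: one factor per token in the document.
    i <- seq_along(doc)
    den <- sum(log(n_z[z] + v * beta + i - 1))
    size_term + num - den
  }, numeric(1))
}

# Toy state: cluster 1 holds pet words, cluster 2 finance words,
# cluster 3 is currently empty.
vocab <- c("cat", "dog", "stock", "bond")
n_zw <- matrix(c(4, 3, 0, 0,
                 0, 0, 5, 2,
                 0, 0, 0, 0),
               nrow = 3, byrow = TRUE, dimnames = list(NULL, vocab))
m_z <- c(3, 3, 0)
n_z <- rowSums(n_zw)

lp <- cluster_log_prob(c("cat", "dog"), m_z, n_z, n_zw,
                       n_docs = 7, alpha = 0.1, beta = 0.1)
probs <- exp(lp - max(lp)) / sum(exp(lp - max(lp)))  # normalise safely
print(round(probs, 4))
```

In this toy state the document shares both of its words with cluster 1, so that cluster receives most of the probability mass. Re-running with a larger alpha shifts mass toward the empty cluster, and a larger beta weakens the penalty for words a cluster has not seen.
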
A list containing an integer vector of clusters and a word-cluster matrix.
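
A minimal usage sketch, using only the signature documented above. The toy corpus, the whitespace tokenisation with `strsplit`, and the choice of `n_clust = 4L` are assumptions made for illustration. The names of the elements in the returned list are not documented here, so the example inspects the result with `str()` rather than guessing them.

```r
# Toy corpus: each element is one short text.
raw <- c(
  "cheap flights to paris",
  "paris hotel deals",
  "book flights and hotel",
  "python pandas dataframe merge",
  "merge two pandas dataframes",
  "pandas groupby example"
)

# gsdmm() expects a list of character vectors, so split each text into
# tokens (assumption: simple lower-casing and whitespace splitting).
texts <- strsplit(tolower(raw), " ", fixed = TRUE)

# Only fit if the package is installed.
if (requireNamespace("gsdmm", quietly = TRUE)) {
  # May have no effect if the sampler does not use R's random number generator.
  set.seed(42)
  fit <- gsdmm::gsdmm(
    texts,
    n_iter = 30L,
    n_clust = 4L,   # upper bound; fewer clusters may be returned
    alpha = 0.1,
    beta = 0.1,
    progress = FALSE
  )
  str(fit)  # inspect the cluster vector and the word-cluster matrix
}
```
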