Package 'gsdmm'

Title: GSDMM Short Text Clustering via Dirichlet Mixture Models
Description: This package implements the Dirichlet Mixture Model and accompanying Gibbs sampler for short text clustering proposed by Yin and Wang (2014).
Authors: Till Tietz [aut, cre], Akiru Kato [ctb]
Maintainer: Till Tietz <[email protected]>
License: MIT + file LICENSE
Version: 0.0.6
Built: 2024-10-31 20:17:30 UTC
Source: https://github.com/paithiov909/gsdmm

Help Index


gsdmm: GSDMM Short Text Clustering via Dirichlet Mixture Models

Description

This package implements the Dirichlet Mixture Model and accompanying Gibbs sampler for short text clustering proposed by Yin and Wang (2014).

Author(s)

Maintainer: Till Tietz [email protected]

Other contributors:

  • Akiru Kato [contributor]


Fit Dirichlet Mixture Model

Description

Fits a Dirichlet Mixture Model to a collection of short texts using a Gibbs sampler.

Usage

gsdmm(
  texts,
  n_iter = 30L,
  n_clust = 8L,
  alpha = 0.1,
  beta = 0.1,
  progress = TRUE
)

Arguments

texts

a list of character vectors, one vector per text.

n_iter

integer number of iterations to run the Gibbs sampler for.

n_clust

integer upper bound on the number of clusters. The number of clusters returned will be less than or equal to n_clust.

alpha

double governing the probability of assigning a text to a currently empty cluster (larger alpha means higher probability).

beta

double governing the tradeoff between cluster size and fit. Smaller betas make clustering more sensitive to congruence between cluster-word and document-word distributions while larger betas make it more sensitive to cluster size.

progress

logical indicating whether to print a progress bar.
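
Details

The roles of alpha and beta described above can be made concrete with a small base R sketch of the conditional assignment probability from Yin and Wang (2014). This is an illustration only and is not taken from the package source; the function cluster_log_scores(), its arguments, and the toy counts are all hypothetical names and values chosen for this example. All counts are assumed to exclude the text currently being reassigned.

```r
# Unnormalised log-probability of assigning one text to each cluster.
# Hypothetical helper for illustration; not part of the gsdmm package.
#   doc   : character vector of the text's words
#   m_z   : number of texts in each cluster (current text excluded)
#   n_z   : number of words in each cluster (current text excluded)
#   n_zw  : cluster-by-word count matrix with vocabulary as column names
#   D     : total number of texts, including the current one
cluster_log_scores <- function(doc, m_z, n_z, n_zw, alpha, beta, D) {
  K <- length(m_z)                      # number of clusters
  V <- ncol(n_zw)                       # vocabulary size
  tab <- table(doc)
  counts <- setNames(as.integer(tab), names(tab))  # word frequencies in doc
  N_d <- length(doc)                    # number of words in doc

  vapply(seq_len(K), function(z) {
    # Size term: alpha keeps empty clusters (m_z == 0) reachable.
    size_term <- log(m_z[z] + alpha) - log(D - 1 + K * alpha)

    # Fit term: beta smooths the cluster-word counts.
    fit_num <- sum(vapply(names(counts), function(w) {
      sum(log(n_zw[z, w] + beta + seq_len(counts[[w]]) - 1))
    }, numeric(1)))
    fit_den <- sum(log(n_z[z] + V * beta + seq_len(N_d) - 1))

    size_term + fit_num - fit_den
  }, numeric(1))
}

# Convert log scores to probabilities in a numerically stable way.
to_probs <- function(s) {
  p <- exp(s - max(s))
  p / sum(p)
}

# Toy state: two populated clusters and one empty cluster.
vocab <- c("cat", "dog", "stock", "bond")
n_zw <- matrix(c(5, 4, 0, 0,
                 0, 0, 6, 3,
                 0, 0, 0, 0),
               nrow = 3, byrow = TRUE, dimnames = list(NULL, vocab))
m_z <- c(4, 4, 0)
n_z <- rowSums(n_zw)
doc <- c("cat", "dog", "cat")

probs_small <- to_probs(cluster_log_scores(doc, m_z, n_z, n_zw,
                                           alpha = 0.1, beta = 0.1, D = 9))
probs_large <- to_probs(cluster_log_scores(doc, m_z, n_z, n_zw,
                                           alpha = 1, beta = 0.1, D = 9))
```

In this toy state the text is assigned to the first cluster with high probability because its words match that cluster's counts, and raising alpha from 0.1 to 1 increases the probability of the empty third cluster, consistent with the description of alpha above.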

Value

a list containing an integer vector of cluster assignments and a word-cluster matrix.
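
Examples

A minimal usage sketch, assuming the package is installed and that each element of texts holds the tokens of one text. The whitespace tokenisation and the toy corpus are assumptions of this example, not requirements stated by the package. Because the names of the returned list elements are not documented here, the result is inspected with str() rather than accessed by name.

```r
# Toy corpus of short texts (illustrative data only).
raw <- c(
  "the cat sat on the mat",
  "dogs and cats make good pets",
  "stocks fell as bond yields rose",
  "the market rallied on strong earnings"
)

# Build the list of character vectors expected by `texts`.
# Splitting on whitespace is an assumption of this sketch.
texts <- strsplit(tolower(raw), "\\s+")

# Only call the package if it is available.
if (requireNamespace("gsdmm", quietly = TRUE)) {
  fit <- gsdmm::gsdmm(
    texts,
    n_iter = 30L,
    n_clust = 4L,     # upper bound; fewer clusters may be returned
    alpha = 0.1,
    beta = 0.1,
    progress = FALSE
  )
  str(fit)            # inspect the cluster vector and word-cluster matrix
}
```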