audubon - Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, converting Japanese character types, segmenting text by phrase, and normalizing text based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Last updated 4 days ago
japanese javascript
5.43 score 10 stars 1 packages 3 scripts 740 downloads
gibasa - An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper of 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Last updated 4 days ago
mecab pos-tagging rcpp
5.05 score 14 stars 3 scripts 264 downloads
baritsu - Wrappers for 'mlpack'
A collection of wrappers for the 'mlpack' package that allow passing a formula as an argument.
Last updated 4 days ago
tidymodels
3.18 score 3 stars 1 scripts
shikakusphere - Miscellaneous Functions for Japanese Mahjong
A collection of miscellaneous functions for Japanese mahjong that wraps C++ sources of 'shanten-number' <https://github.com/tomohxx/shanten-number> and 'cmajiang' <https://github.com/TadaoYamaoka/cmajiang>.
Last updated 3 days ago
rcpp
3.15 score 4 stars 3 scripts
pipian - Tiny Interface to CaboCha for R
A tiny interface to 'CaboCha', a Japanese dependency structure parser. The main goal of this package is to implement a parser for its XML output.
Last updated 4 days ago
cabocha
3.15 score 4 stars 1 scripts
RcppMeCab - 'Rcpp' Wrapper for 'MeCab' Library
An R package based on 'Rcpp' for 'MeCab': Yet Another Part-of-Speech and Morphological Analyzer. The purpose of this package is to provide a seamless development and analysis environment for CJK texts. This package uses parallel programming to provide a highly efficient text preprocessing function, 'posParallel()'.
Last updated 1 year ago
2.90 score 40 scripts 354 downloads
jprailway - Dataset of Japanese Railway
Provides an extended dataset of Japanese railway revised from <https://github.com/Seo-4d696b75/station_database>. The original dataset is sourced from <https://www.ekidata.jp/>, the digital national land information download site, or other resources, and licensed under 'CC BY 4.0' <https://creativecommons.org/licenses/by/4.0/>.
Last updated 3 months ago
2.54 score 1 stars
ldccr - Utilities for Various Japanese Corpora
The goal of the ldccr package is to make Japanese language resources easy to use. This package provides parsers for several Japanese corpora that are free or openly licensed, and a downloader for zipped text files published on Aozora Bunko.
Last updated 1 month ago
2.40 score 1 stars 1 scripts
apportita - Utility for Handling 'magnitude' Word Embeddings
A partial R port of 'magnitude', a fast, simple utility library for handling vector embeddings. The main goal of this package is to enable access to a user's local magnitude data store.
Last updated 10 months ago
embeddings
2.18 score 1 stars 4 scripts
jisx0402 - Datasets Related to 'JIS X 0402:2020'
Provides datasets for handling Japanese municipality codes defined in 'JIS X 0402' and 'JIS X 0401'.
Last updated 10 months ago
2.18 score 3 stars
tangela - rJava Interface to Kuromoji
An rJava wrapper for atilika/kuromoji (v0.7.7). This package works, but it is too slow to be used in production.
Last updated 2 months ago
kuromoji rjava
2.00 score 1 stars
aznyan - An 'Utanet' Scraper and Utilities
Scrapes lyrics from the 'Utanet' website.
Last updated 6 months ago
2.00 score 1 scripts
kelpbeds - Dictionary Tool for 'MeCab'
Provides the source 'IPAdic' for 'MeCab'.
Last updated 7 months ago
1.70 score