
audubon - Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Last updated 7 days ago
japanese javascript
5.65 score 10 stars 1 dependents 3 scripts 745 downloads
gibasa - An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Last updated 7 days ago
mecab pos-tagging rcpp cpp
5.02 score 15 stars 3 scripts 414 downloads
shikakusphere - Miscellaneous Functions for Japanese Mahjong
A collection of miscellaneous functions for Japanese mahjong that wraps C++ sources of 'shanten-number' <https://github.com/tomohxx/shanten-number> and 'cmajiang' <https://github.com/TadaoYamaoka/cmajiang>.
Last updated 24 days ago
mahjong rcpp cpp
3.41 score 4 stars 5 scripts
baritsu - Wrappers for 'mlpack'
A collection of wrappers for the 'mlpack' package that allow passing a formula as an argument.
Last updated 30 days ago
tidymodels
3.08 score 3 stars 1 scripts
pipian - Tiny Interface to CaboCha for R
A tiny interface to 'CaboCha', a Japanese dependency structure parser. The main goal of this package is to implement a parser for its XML output.
Last updated 2 months ago
cabocha cpp
3.00 score 4 stars 1 scripts
jprailway - Dataset of Japanese Railway
Provides an extended dataset of Japanese railway revised from <https://github.com/Seo-4d696b75/station_database>. The original dataset is sourced from <https://www.ekidata.jp/>, the digital national land information download site, or other resources, and licensed under 'CC BY 4.0' <https://creativecommons.org/licenses/by/4.0/>.
Last updated 23 days ago
2.65 score 1 stars
sudachir2 - R Wrapper for 'sudachi.rs'
Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of 'Sudachi' Japanese morphological analyzer.
Last updated 4 days ago
pos-tagging rust cargo
2.48 score 3 stars 3 scripts
convlog - Read Mahjong Logs From 'tenhou.net/6' Format
Offers wrappers for the 'convlog' crate from 'mjai-reviewer' <https://github.com/Equim-chan/mjai-reviewer> that can directly read mahjong logs from 'tenhou.net/6' format into tibbles.
Last updated 24 days ago
rust cargo
2.40 score 1 stars 3 scripts
ldccr - Utilities for Various Japanese Corpora
The goal of the ldccr package is to make Japanese language resources easy to use. This package provides parsers for several free or openly licensed Japanese corpora, and a downloader for zipped text files published on Aozora Bunko.
Last updated 24 days ago
cpp
2.40 score 1 stars 1 scripts
skiagd - R wrapper for 'rust-skia'
A toy R wrapper for 'rust-skia' <https://github.com/rust-skia/rust-skia> (the Rust crate 'skia_safe' <https://rust-skia.github.io/doc/skia_safe/>, a binding for 'Skia' <https://skia.org/>).
Last updated 1 day ago
graphics rust cargo fontconfig freetype
2.30 score 1 stars
vibrrt - An R Wrapper for 'vibrato'
An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.
Last updated 24 days ago
pos-tagging rust cargo
2.30 score 1 scripts
jisx0402 - Datasets Related to 'JIS X 0402:2020'
Provides datasets for handling the Japanese municipality codes defined in 'JIS X 0402' and 'JIS X 0401'.
Last updated 1 year ago
2.18 score 3 stars
aznyan - An 'Utanet' Scraper and Utilities
Scrape lyrics from 'Utanet' website.
Last updated 10 months ago
cpp
2.00 score 1 scripts
apportita - Utility for Handling 'magnitude' Word Embeddings
A partial R port from 'magnitude', which is a fast, simple utility library for handling vector embeddings. The main goal of this package is to enable access to user's local magnitude data store.
Last updated 2 months ago
embeddings
1.70 score 1 stars 4 scripts
kelpbeds - Dictionary Tool for 'MeCab'
Provides the source 'IPAdic' for 'MeCab'.
Last updated 11 months ago
1.70 score