Package: ldccr 2024.10.10
ldccr: Utilities for Various Japanese Corpora
The goal of the ldccr package is to make Japanese language resources easy to use. It provides parsers for several free or openly licensed Japanese corpora and a downloader for the zipped text files published on Aozora Bunko.
Authors:
ldccr_2024.10.10.tar.gz
ldccr_2024.10.10.zip (r-4.5), ldccr_2024.10.10.zip (r-4.4), ldccr_2024.10.10.zip (r-4.3)
ldccr_2024.10.10.tgz (r-4.4-any), ldccr_2024.10.10.tgz (r-4.3-any)
ldccr_2024.10.10.tar.gz (r-4.5-noble), ldccr_2024.10.10.tar.gz (r-4.4-noble)
ldccr_2024.10.10.tgz (r-4.4-emscripten), ldccr_2024.10.10.tgz (r-4.3-emscripten)
ldccr.pdf | ldccr.html
ldccr/json (API)
# Install 'ldccr' in R:
install.packages('ldccr', repos = c('https://paithiov909.r-universe.dev', 'https://cloud.r-project.org'))
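As a minimal sketch of the install step, the snippet below installs the package only when it is missing and then lists what the installed copy exports. It uses only base R functions (`requireNamespace`, `install.packages`, `getNamespaceExports`); the variable name `repos` is chosen here for illustration and is not part of the package.

```r
# Repositories taken from the install command: r-universe first, CRAN second.
repos <- c(
  "https://paithiov909.r-universe.dev",
  "https://cloud.r-project.org"
)

# Install only when 'ldccr' is not already available in the library.
if (!requireNamespace("ldccr", quietly = TRUE)) {
  install.packages("ldccr", repos = repos)
}

# If the package is now available, print its exported names in sorted order.
if (requireNamespace("ldccr", quietly = TRUE)) {
  print(sort(getNamespaceExports("ldccr")))
}
```

Listing the second repository lets dependencies that are not hosted on the r-universe repository resolve from CRAN.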
Bug tracker: https://github.com/paithiov909/ldccr/issues
- AozoraBunkoSnapshot - Meta data of text files published on Aozora Bunko
- NekoText - Whole text of ‘Wagahai Wa Neko Dearu’ written by Natsume Souseki from Aozora Bunko
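The two datasets listed above can probably be inspected as sketched below. This assumes they are loadable through base R's `data()`, which is the usual mechanism for packaged datasets but is not confirmed by this page. The sketch makes no assumption about their columns or types and only prints their structure.

```r
# Load into a private environment so the global workspace is left untouched.
env <- new.env()

if (requireNamespace("ldccr", quietly = TRUE)) {
  # Dataset names are taken from the list above; data() only warns if one is missing.
  data(
    list = c("AozoraBunkoSnapshot", "NekoText"),
    package = "ldccr",
    envir = env
  )

  # Show the top-level structure of whatever was loaded.
  for (name in ls(env)) {
    cat(name, "\n")
    str(env[[name]], max.level = 1)
  }
}
```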
Last updated 1 month ago from:6b79ddaf25. Checks: OK: 3, NOTE: 4. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Nov 09 2024 |
R-4.5-win | OK | Nov 09 2024 |
R-4.5-linux | OK | Nov 09 2024 |
R-4.4-win | NOTE | Nov 09 2024 |
R-4.4-mac | NOTE | Nov 09 2024 |
R-4.3-win | NOTE | Nov 09 2024 |
R-4.3-mac | NOTE | Nov 09 2024 |
Exports: clean_emoji, clean_url, download_unidic, is_within_era, jrte_rte_files, ldnws_categories, parse_jrte_reasoning, parse_to_jdate, read_aozora, read_ja_text8, read_jrte, read_ldnws, unidic_availables
Dependencies: bit, bit64, cachem, cli, clipr, cpp11, crayon, dplyr, fansi, fastmap, generics, glue, hms, lifecycle, magrittr, memoise, pillar, pkgconfig, prettyunits, progress, purrr, R6, Rcpp, RcppSimdJson, readr, rlang, stringi, tibble, tidyselect, tzdb, utf8, vctrs, vroom, withr, yesno
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Meta data of text files published on Aozora Bunko | AozoraBunkoSnapshot |
Remove emojis | clean_emoji |
Remove URLs | clean_url |
Download and unzip 'UniDic' | download_unidic |
Check if dates are within Japanese era | is_within_era |
Data for Textual Entailment | jrte_rte_files |
List of categories of the Livedoor News Corpus | ldnws_categories |
Whole text of ‘Wagahai Wa Neko Dearu’ written by Natsume Souseki from Aozora Bunko | NekoText |
Parse reasoning column of 'rte.*.tsv' | parse_jrte_reasoning |
Parse dates to Japanese dates | parse_to_jdate |
Download text file from Aozora Bunko | read_aozora |
Read the ja.text8 corpus | read_ja_text8 |
Read the JRTE Corpus | read_jrte |
Read the Livedoor News Corpus | read_ldnws |
List of available 'UniDic' | unidic_availables |