| Title: | Interface to 'MeCab' |
|---|---|
| Description: | Parses Japanese texts with 'MeCab'. The original 'MeCab' is licensed under the BSD 3-Clause "New" or "Revised" License. See the "LICENSE.note" file for its license notice. |
| Authors: | Motohiro Ishida [aut, cre], Taku Kudo [cph], Nippon Telegraph and Telephone Corporation [cph] |
| Maintainer: | Motohiro Ishida <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.15 |
| Built: | 2026-05-28 05:58:22 UTC |
| Source: | https://github.com/paithiov909/rmecab-doc |
Checks if any mecabrc file exists.
anyRcfileExists()
This is a helper function that checks whether any mecabrc file exists before the tagger is initialized.
'MeCab' expects a mecabrc file to be present; if not, it will raise an error (without any message!).
A logical.
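Because 'MeCab' fails silently when no mecabrc file is found, it can help to check up front and stop with a readable message. The following is a minimal sketch; `check_mecab_ready()` is a hypothetical helper written for this illustration and is not part of 'RMeCab'.

```r
# Hypothetical helper (not part of 'RMeCab'): fail early with a readable message
# instead of the silent error that 'MeCab' raises when no mecabrc file exists.
check_mecab_ready <- function() {
  if (!requireNamespace("RMeCab", quietly = TRUE)) {
    stop("The 'RMeCab' package is not installed.")
  }
  if (!RMeCab::anyRcfileExists()) {
    stop("No mecabrc file was found; check the 'MeCab' installation.")
  }
  invisible(TRUE)
}

# Usage: returns TRUE when everything is in place, FALSE (with a message) otherwise.
ready <- tryCatch(
  check_mecab_ready(),
  error = function(e) {
    message(conditionMessage(e))
    FALSE
  }
)
```

Calling the helper once at the top of a script keeps the failure close to its cause rather than inside a later tokenizing call.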
Finds collocations from the specified text file.
Takes a node word and a window span as arguments.
collocate(filename, node, span = 3, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| node | Node word. |
| span | Window span. Defaults to |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
out <- collocate(text_file, "\u6570\u5b66")
out
## End(Not run)
Calculates T-score and MI-score according to the result of collocate().
collScores(kekka, node, span)

| Argument | Description |
|---|---|
| kekka | Result of |
| node | Node word. |
| span | Window span. |
A data frame.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
out <- collocate(text_file, "\u6570\u5b66")
collScores(out, "\u6570\u5b66", 3)
## End(Not run)
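To make the two scores concrete, here is a small base R sketch using the definitions commonly given in corpus linguistics: T = (O - E) / sqrt(O) and MI = log2(O / E), where O is the observed co-occurrence count and E the expected count. Whether collScores() computes exactly these quantities is an assumption, and all counts below are made up for illustration.

```r
# Toy counts (made-up numbers, for illustration only).
total_tokens   <- 1000  # tokens in the whole text
node_freq      <- 10    # occurrences of the node word
span           <- 3     # window size on each side of the node
collocate_freq <- 20    # occurrences of the collocate in the whole text
observed       <- 6     # co-occurrences of the collocate within the window

# Expected co-occurrences if the collocate were spread evenly over the text:
# relative frequency of the collocate times the number of window slots.
expected <- collocate_freq / total_tokens * (2 * span * node_freq)

# Textbook definitions (assumed, not confirmed against the package source).
t_score  <- (observed - expected) / sqrt(observed)
mi_score <- log2(observed / expected)

round(c(expected = expected, T = t_score, MI = mi_score), 3)
```

A T-score rewards collocates that are frequent in absolute terms, while an MI-score rewards collocates that are rare elsewhere, so the two rankings often differ.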
Counts tokens (characters, terms, or N-grams) within target.
target can be a file, directory, or a data.frame.
docDF(target, column = 0, type = 0, pos = NULL, minFreq = 1, N = 1, Genkei = 0, weight = "", nDF = 0, co = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| target | A file, directory, or a data.frame. |
| column | Column number or name that contains the text to analyze. |
| type | Kind of tokens. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| N | Unit of tokens. If |
| Genkei | If |
| weight | Method to weight term frequencies. |
| nDF | If |
| co | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- docDF(text_dir, column = 0, type = 1, minFreq = 2)
head(out)
## End(Not run)
Creates a document-term matrix out of all files in a given directory. Each cell of the matrix shows the actual frequency of each word.
docMatrix(mydir, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| mydir | A directory where text files are stored. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| weight | Method to weight term frequencies. |
| kigo | If |
| co | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
An integer matrix is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- docMatrix(text_dir)
head(out)
## End(Not run)
Creates a document-term matrix out of all files in a given directory. Each cell of the matrix shows the actual frequency of each word.
docMatrix2(directory, pos = "Default", minFreq = 1, weight = "no", kigo = 0, co = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| directory | A directory where text files are stored or a single file. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| weight | Method to weight term frequencies. |
| kigo | If |
| co | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
An integer matrix is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- docMatrix2(text_dir)
head(out)
## End(Not run)
Creates a document-term matrix out of a character vector. Each cell of the matrix shows the actual frequency of each word.
docMatrixDF(charVec = c("MeCab", "CaBoCha"), pos = "Default", minFreq = 1, weight = "no", co = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| charVec | A character vector. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| weight | Method to weight term frequencies. |
| co | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
An integer matrix is invisibly returned.
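As an illustration of passing a character vector rather than a directory, the sketch below builds a matrix from two short sentences, treating each element as one document. The sentences are made up for this example, every argument other than `charVec` is left at its default, and the call is guarded so that nothing runs unless 'RMeCab' and a mecabrc file are available.

```r
# Two toy documents, written with Unicode escapes for portability:
# "studying mathematics" and "reading a book".
docs <- c(
  "\u6570\u5b66\u3092\u5b66\u3076",
  "\u672c\u3092\u8aad\u3080"
)

# Run only when 'RMeCab' is installed and 'MeCab' can find a mecabrc file.
if (requireNamespace("RMeCab", quietly = TRUE) && RMeCab::anyRcfileExists()) {
  # The result is returned invisibly, so assign it before inspecting it.
  dtm <- RMeCab::docMatrixDF(docs)
  print(dtm)
}
```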
Creates a data.frame of N-grams out of all files in a given directory.
docNgram(mydir, type = 1, N = 2, pos = "Default", dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| mydir | A directory where text files are stored. |
| type | Kind of tokens. |
| N | Unit of tokens. If |
| pos | Parts of speech that should be extracted. If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- docNgram(text_dir, type = 1)
head(out)
## End(Not run)
Creates a data frame of N-grams out of all files in a given directory.
docNgram2(directory, type = 0, pos = "Default", minFreq = 1, N = 2, kigo = 0, weight = "no", dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| directory | A directory in which text files are stored or a single file. |
| type | Kind of tokens. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| N | Unit of tokens. If |
| kigo | If |
| weight | Method to weight term frequencies. |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- docNgram2(text_dir, type = 1)
head(out)
## End(Not run)
Creates a data.frame of N-grams out of a character vector.
docNgramDF(mojiVec = "MeCab", type = 0, pos = "Default", baseform = 0, minFreq = 1, N = 1, kigo = 0, weight = "no", co = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| mojiVec | A character vector. |
| type | Kind of tokens. |
| pos | Parts of speech that should be extracted. If |
| baseform | Genkei. See |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| N | Unit of tokens. If |
| kigo | If |
| weight | Method to weight term frequencies. |
| co | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data frame is invisibly returned.
Returns a data.frame of N-grams.
Ngram(filename, type = 0, N = 2, pos = "Default", dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| type | Kind of tokens. |
| N | Unit of tokens. If |
| pos | Parts of speech that should be extracted. If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
out <- Ngram(text_file, type = 1)
head(out)
## End(Not run)
Returns a data frame of N-grams.
NgramDF(filename, type = 0, N = 2, pos = "Default", dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| type | Kind of tokens. |
| N | Unit of tokens. If |
| pos | Parts of speech that should be extracted. If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
out <- NgramDF(text_file, type = 1)
head(out)
## End(Not run)
Creates a data.frame of N-grams out of all files in a given directory.
NgramDF2(directory, type = 0, pos = "Default", minFreq = 1, N = 2, kigo = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| directory | A directory in which text files are stored or a single file. |
| type | Kind of tokens. |
| pos | Parts of speech that should be extracted. If |
| minFreq | Minimum document frequency for filtering terms. Terms that appear less than |
| N | Unit of tokens. If |
| kigo | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame is invisibly returned.
## Not run:
text_dir <- system.file("samples", package = "RMeCab")
out <- NgramDF2(text_dir, type = 1)
head(out)
## End(Not run)
Takes a string as an argument and tokenizes it into a list of length-1 terms.
RMeCabC(str, mypref = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| str | A string scalar to be tokenized. |
| mypref | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A list.
## Not run:
text <- scan(
  system.file("samples/doc1.txt", package = "RMeCab"),
  what = character()
)
unlist(RMeCabC(text))
## End(Not run)
Takes a data frame as an argument and tokenizes the specified column into lists of length-1 terms.
RMeCabDF(dataf, coln, mypref = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| dataf | A data.frame. |
| coln | Column number or name that contains the text to analyze. |
| mypref | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
This is a wrapper of RMeCabC().
Any blanks should be replaced with NA for coln.
A list.
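The note about blanks can be handled in base R before tokenizing. The sketch below is only an illustration: the data frame and its column name `text` are made up, and the RMeCabDF() call is guarded so that it runs only when 'RMeCab' and a mecabrc file are available.

```r
# Toy data frame; the column name `text` is hypothetical.
# Row 2 is blank on purpose.
df <- data.frame(
  id = 1:3,
  text = c(
    "\u6570\u5b66\u3092\u5b66\u3076",  # "studying mathematics"
    "",
    "\u672c\u3092\u8aad\u3080"         # "reading a book"
  ),
  stringsAsFactors = FALSE
)

# Replace blank (empty or whitespace-only) cells with NA, as the note asks.
df$text[!is.na(df$text) & trimws(df$text) == ""] <- NA

# Tokenize only when 'RMeCab' is installed and a mecabrc file can be found.
if (requireNamespace("RMeCab", quietly = TRUE) && RMeCab::anyRcfileExists()) {
  tokens <- RMeCab::RMeCabDF(df, "text")
  str(tokens)
}
```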
Takes a file as an argument and tokenizes it into a list of terms.
RMeCabDoc(filename, mypref = 1, kigo = 0, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| mypref | If |
| kigo | If |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A list.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
unlist(RMeCabDoc(text_file))
## End(Not run)
Takes a text file as its first argument and returns parts of speech and frequencies as a data.frame.
RMeCabFreq(filename, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A data.frame.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
RMeCabFreq(text_file)
## End(Not run)
Takes a file as an argument and tokenizes it into a list of terms and parts of speech.
RMeCabText(filename, dic = "", mecabrc = "", etc = "")

| Argument | Description |
|---|---|
| filename | An input file. |
| dic | Path to a user dictionary file such as |
| mecabrc | Path to a mecabrc file. |
| etc | Other options for 'MeCab' tagger. |
A list.
## Not run:
text_file <- system.file("samples/doc1.txt", package = "RMeCab")
RMeCabText(text_file)
## End(Not run)