Package 'pipian' reference manual

Title:	Tiny Interface to CaboCha for R
Description:	A tiny interface to 'CaboCha', a Japanese dependency structure parser. The main goal of this package is to implement a parser for its XML output.
Authors:	Akiru Kato [aut, cre], Marcin Kalicinski [aut] (Author of rapidxml)
Maintainer:	Akiru Kato <[email protected]>
License:	MIT + file LICENSE
Version:	0.4.0
Built:	2025-03-01 06:53:46 UTC
Source:	https://github.com/paithiov909/pipian

Ngrams tokenizer

Description

Makes an n-gram tokenizer function.

Usage

ngram_tokenizer(n = 1L)

Arguments

n

Integer; the number of tokens in each n-gram.

Value

An n-gram tokenizer function.
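
As a rough illustration of what an n-gram tokenizer function does, the sketch below builds one as a closure in base R. `make_ngram_tokenizer` is a hypothetical helper written for this example and is not the package's implementation; its `sep` argument is modeled on the `sep` parameter of `pack()`.

```r
# Hypothetical stand-in for illustration only; not pipian's implementation.
make_ngram_tokenizer <- function(n = 1L) {
  stopifnot(is.numeric(n), length(n) == 1L, n >= 1L)
  n <- as.integer(n)
  # The returned closure joins each run of `n` consecutive tokens with `sep`.
  function(tokens, sep = " ") {
    len <- length(tokens)
    if (len < n) {
      # Too few tokens to form a single n-gram.
      return(character(0))
    }
    starts <- seq_len(len - n + 1L)
    vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1L)], collapse = sep),
      character(1)
    )
  }
}

bigram <- make_ngram_tokenizer(2L)
bigram(c("a", "b", "c"), sep = "-")
# "a-b" "b-c"
```

Returning a function lets the n-gram size be fixed once and then reused across many token vectors.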

Pack prettified data.frame of tokens

Description

Packs a prettified data.frame of tokens into a new data.frame of corpus, which is compatible with the Text Interchange Formats.

Usage

pack(tbl, pull = "token", n = 1L, sep = "-", .collapse = " ")

Arguments

`tbl`	A prettified data.frame of tokens.
`pull`	Column to be packed into text or ngrams body. Default value is 'token'.
`n`	Integer internally passed to the n-gram tokenizer function created by `audubon::ngram_tokenizer()`.
`sep`	Character scalar internally used as the concatenator of ngrams.
`.collapse`	This argument is passed to `stringi::stri_join()`.

Value

A data.frame.

Text Interchange Formats (TIF)

The Text Interchange Formats (TIF) is a set of standards that allows R text analysis packages to target defined inputs and outputs for corpora, tokens, and document-term matrices.

Valid data.frame of tokens

The prettified data.frame of tokens here is a data.frame object compatible with the TIF.

A TIF-valid data.frame of tokens is expected to have one unique key column (named 'doc_id') identifying each text and several feature columns for each token. The feature columns must contain at least 'token' itself.
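
To make the expected shapes concrete, here is a minimal base R sketch of a TIF-style data.frame of tokens and of collapsing it into a corpus data.frame with 'doc_id' and 'text' columns. `is_tif_tokens` and `pack_sketch` are hypothetical helpers written for this illustration; they only approximate the `n = 1L` case and are not how `pack()` is implemented.

```r
# A TIF-style data.frame of tokens: one row per token, keyed by 'doc_id'.
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2"),
  token = c("a", "b", "c", "x", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical check for the two column requirements described above.
is_tif_tokens <- function(tbl) {
  is.data.frame(tbl) && all(c("doc_id", "token") %in% names(tbl))
}

# Hypothetical approximation of packing unigrams:
# collapse the tokens of each document into a single text.
pack_sketch <- function(tbl, pull = "token", .collapse = " ") {
  stopifnot(is_tif_tokens(tbl), pull %in% names(tbl))
  ids <- unique(tbl$doc_id)
  text <- vapply(
    ids,
    function(id) paste(tbl[[pull]][tbl$doc_id == id], collapse = .collapse),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(doc_id = ids, text = text, stringsAsFactors = FALSE)
}

corpus <- pack_sketch(tokens)
corpus$text
# "a b c" "x y"
```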

Execute cabocha command

Description

Executes the 'cabocha -f3 -n1' command using `system2()`, then returns the paths to the temporary XML files.

Usage

ppn_cabocha(text, rcpath = NULL)

Arguments

`text`	A character vector to be parsed with CaboCha.
`rcpath`	String; path to the 'mecabrc' file if any.

Value

Paths to the CaboCha XML output are returned.

Examples

## Not run: 
ppn_cabocha(enc2utf8("\u96e8\u306b\u3082\u8ca0\u3051\u305a"))

## End(Not run)
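
The three `ppn_*` functions are documented as feeding into one another: `ppn_parse_xml()` takes output from `ppn_cabocha()`, and `ppn_make_graph()` takes output of `ppn_parse_xml()`. The sketch below chains them under some assumptions: that a `cabocha` executable is on the search path, that the 'igraph' package is installed, and that each returned path is parsed separately (the `lapply()` calls are a choice made for this example, not a documented requirement).

```r
# Run the chain only when every piece is available on this machine.
can_run <- requireNamespace("pipian", quietly = TRUE) &&
  requireNamespace("igraph", quietly = TRUE) &&
  nzchar(Sys.which("cabocha"))

if (can_run) {
  # Same sample text as in the example above.
  paths <- pipian::ppn_cabocha(enc2utf8("\u96e8\u306b\u3082\u8ca0\u3051\u305a"))
  # Assumption: parse one XML file at a time, since `path` is documented as a string.
  parsed <- lapply(paths, pipian::ppn_parse_xml)
  graphs <- lapply(parsed, pipian::ppn_make_graph)
}
```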

Cast dependency structure as an igraph

Description

Casts the dependency structure parsed by `ppn_parse_xml()` as an 'igraph' object.

Usage

ppn_make_graph(df)

Arguments

`df`	Output of `pipian::ppn_parse_xml`.

Value

An 'igraph' object is returned.

Examples

xml <- ppn_parse_xml(system.file("sample.xml", package = "pipian"))
ppn_make_graph(xml)

Parse XML output of CaboCha

Description

Parses the XML output of CaboCha, as written by `ppn_cabocha()`, into a tibble.

Usage

ppn_parse_xml(
  path,
  into = c("POS1", "POS2", "POS3", "POS4", "X5StageUse1", "X5StageUse2", "Original",
    "Yomi1", "Yomi2"),
  col_select = seq_along(into)
)

Arguments

`path`	String; output from `pipian::ppn_cabocha`.
`into`	Character vector; feature names of output.
`col_select`	Character or integer vector; features that will be kept in the result.

Value

A tibble.

Examples

head(ppn_parse_xml(system.file("sample.xml", package = "pipian")))
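
Since `col_select` is documented as accepting a character or integer vector, a call that keeps only some of the features might look like the following sketch. The two names are taken from the default `into` vector; the guard is there only so the snippet does nothing when 'pipian' is not installed.

```r
# Runs only when the 'pipian' package is installed.
has_pipian <- requireNamespace("pipian", quietly = TRUE)

if (has_pipian) {
  path <- system.file("sample.xml", package = "pipian")
  # Keep only two of the default feature names listed in `into`.
  res <- pipian::ppn_parse_xml(path, col_select = c("POS1", "Original"))
  head(res)
}
```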
