Title: | Utility for Handling 'magnitude' Word Embeddings |
---|---|
Description: | A partial R port of 'magnitude', a fast, simple utility library for handling vector embeddings. The main goal of this package is to give users access to their local 'magnitude' data stores. |
Authors: | Akiru Kato [aut, cre] |
Maintainer: | Akiru Kato <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.5 |
Built: | 2024-11-05 04:12:17 UTC |
Source: | https://github.com/paithiov909/apportita |
Calculate distances from keys to keys
calc_dist( conn, keys, q, normalized = TRUE, method = c("euclidean", "chisquared", "kullback", "manhattan", "maximum", "canberra", "minkowski", "hamming"), ... )
Argument | Description |
---|---|
conn | a Magnitude connection. |
keys | character vector. |
q | character vector. |
normalized | logical; whether the vector embeddings should be normalized. |
method | string; the method used to compute distances. |
... | other arguments are passed to |
a sparse matrix from the 'Matrix' package.
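The base-R sketch below illustrates what a call such as `calc_dist(conn, keys, q, normalized = TRUE, method = "euclidean")` conceptually computes; it is not the package's implementation. It assumes that `normalized = TRUE` means scaling every embedding to unit Euclidean length, uses a hypothetical toy matrix `emb` in place of a Magnitude store, and returns a dense base matrix rather than a sparse 'Matrix' object.

```r
# Hypothetical toy embeddings standing in for a Magnitude store:
# one row per key, one column per dimension.
emb <- rbind(
  cat = c(1, 0, 0),
  dog = c(2, 0, 0),
  car = c(0, 3, 4)
)

# Scale each row to unit Euclidean length (assumed meaning of `normalized = TRUE`).
l2_normalize <- function(m) m / sqrt(rowSums(m^2))

# Euclidean distances between every row of `a` and every row of `b`,
# using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y
euclid_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

e <- l2_normalize(emb)
d <- euclid_dist(e[c("cat", "dog"), , drop = FALSE], e["car", , drop = FALSE])
print(d)  # rows = keys, columns = query keys; both entries equal sqrt(2)
```

Because `cat` and `dog` differ only in length in this toy data, their distance drops to 0 after normalization, whereas on the raw vectors it is 1.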
Calculate similarities from keys to keys
calc_simil( conn, keys, q, normalized = TRUE, method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamann", "simple matching", "faith"), ... )
Argument | Description |
---|---|
conn | a Magnitude connection. |
keys | character vector. |
q | character vector. |
normalized | logical; whether the vector embeddings should be normalized. |
method | string; the method used to compute similarities. |
... | other arguments are passed to |
a sparse matrix from the 'Matrix' package.
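As a rough picture of `method = "cosine"`, the base-R sketch below computes cosine similarity as the dot product of L2-normalized rows. The toy matrix `emb` and the helper `cosine_simil()` are hypothetical, and the result is a dense base matrix rather than the sparse 'Matrix' object the package returns.

```r
# Hypothetical toy embeddings (one row per key).
emb <- rbind(
  king  = c(0.9, 0.1, 0.0),
  queen = c(0.8, 0.2, 0.1),
  apple = c(0.0, 0.1, 0.9)
)

# Cosine similarity between every row of `a` and every row of `b`:
# normalize each row to unit length, then take dot products.
cosine_simil <- function(a, b) {
  an <- a / sqrt(rowSums(a^2))
  bn <- b / sqrt(rowSums(b^2))
  an %*% t(bn)
}

s <- cosine_simil(emb["king", , drop = FALSE], emb[c("queen", "apple"), , drop = FALSE])
print(round(s, 3))  # "queen" scores far higher than "apple"
```

When the stored vectors are already unit length, cosine similarity reduces to a plain dot product.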
Calculate Word Rotator's Distance from keys to keys
calc_wrd(conn, keys, q, normalized = TRUE, ...)
Argument | Description |
---|---|
conn | a Magnitude connection. |
keys | character vector. |
q | character vector. |
normalized | logical; whether the vector embeddings should be normalized. |
... | other arguments are passed to |
numeric scalar.
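In Word Rotator's Distance (Yokoi et al., EMNLP 2020), each word vector's norm acts as its transport mass and the cosine distance between word directions acts as the transport cost. The base-R sketch below builds only those two ingredients for two toy sets of word vectors; the helper `wrd_inputs()` is hypothetical and is not part of the package. It also shows a consequence worth keeping in mind for `normalized = TRUE`: once every vector has unit length, the masses become uniform.

```r
# Build the WRD ingredients for two sets of word vectors (rows = words).
wrd_inputs <- function(x, y) {
  nx <- sqrt(rowSums(x^2))
  ny <- sqrt(rowSums(y^2))
  list(
    a = nx / sum(nx),                    # mass of each word in the first set
    b = ny / sum(ny),                    # mass of each word in the second set
    cost = 1 - (x / nx) %*% t(y / ny)    # cosine distance between directions
  )
}

x <- rbind(c(3, 0), c(0, 1))  # toy "sentence" 1: two words
y <- rbind(c(1, 1))           # toy "sentence" 2: one word
raw <- wrd_inputs(x, y)
raw$a  # 0.75 0.25: longer vectors carry more mass

# After L2 normalization every word has norm 1, so the masses are uniform.
xn <- x / sqrt(rowSums(x^2))
wrd_inputs(xn, y)$a  # 0.5 0.5
```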
Close a Magnitude connection
## S4 method for signature 'Magnitude'
close(con)
Argument | Description |
---|---|
con | a Magnitude connection. |
the result of RSQLite::dbDisconnect is returned invisibly.
Dimensions of a Magnitude table
## S4 method for signature 'Magnitude'
dim(x)
Argument | Description |
---|---|
x | a Magnitude connection. |
a numeric vector.
Order keys by their distances to a key
doesnt_match( conn, key, q, n = 1L, normalized = TRUE, method = c("euclidean", "chisquared", "kullback", "manhattan", "maximum", "canberra", "minkowski", "hamming") )
Argument | Description |
---|---|
conn | a Magnitude connection. |
key | string. |
q | character vector; elements identical to key are dropped from the result. |
n | integer. |
normalized | logical; whether the vector embeddings should be normalized. |
method | string; the method used to compute distances. |
a tibble.
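The base-R sketch below mimics the ranking step under two stated assumptions: that "doesn't match" means candidates farthest from `key` come first, and that Euclidean distance is used. The toy matrix `emb`, the helper `rank_by_distance()` and its column names are hypothetical, and it returns a base data frame rather than a tibble.

```r
# Hypothetical toy embeddings, already unit length (one row per key).
emb <- rbind(
  breakfast = c(1, 0),
  lunch     = c(0.8, 0.6),
  dinner    = c(0.6, 0.8),
  car       = c(-1, 0)
)

# Rank candidates in `q` by Euclidean distance to `key`, farthest first
# (assumed ordering), after dropping candidates identical to `key`.
rank_by_distance <- function(emb, key, q, n = 1L) {
  q <- setdiff(q, key)
  diffs <- sweep(emb[q, , drop = FALSE], 2, emb[key, ])  # subtract key vector from each row
  res <- data.frame(
    key = key, q = q, distance = unname(sqrt(rowSums(diffs^2))),
    stringsAsFactors = FALSE
  )
  res <- res[order(res$distance, decreasing = TRUE), ]
  rownames(res) <- NULL
  head(res, n)
}

res <- rank_by_distance(emb, "breakfast", c("breakfast", "lunch", "dinner", "car"), n = 2L)
print(res)  # "car" first, then "dinner"
```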
Check whether keys exist in a Magnitude table
has_exact(conn, keys)
Argument | Description |
---|---|
conn | a Magnitude connection. |
keys | a character vector. |
a tibble.
Create a Magnitude connection
magnitude(path, ...)
Argument | Description |
---|---|
path | string; a path to a magnitude file. |
... | other arguments are passed to |
a Magnitude connection object that inherits from the SQLiteConnection class of the 'RSQLite' package.
Order keys by their similarity to a key
most_similar( conn, key, q, n = 1L, normalized = TRUE, method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamann", "simple matching", "faith") )
Argument | Description |
---|---|
conn | a Magnitude connection. |
key | string. |
q | character vector; elements identical to key are dropped from the result. |
n | integer. |
normalized | logical; whether the vector embeddings should be normalized. |
method | string; the method used to compute similarities. |
a tibble.
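The base-R sketch below shows the ranking idea behind `most_similar()` with `method = "cosine"`: drop `key` from the candidates, score each remaining candidate by cosine similarity, and keep the top `n`. The toy matrix `emb`, the helper `rank_by_similarity()` and its column names are hypothetical, and a base data frame stands in for the tibble the package returns.

```r
# Hypothetical toy embeddings (one row per key).
emb <- rbind(
  cat   = c(0.9, 0.1, 0.0),
  dog   = c(0.8, 0.3, 0.0),
  tiger = c(0.7, 0.0, 0.3),
  car   = c(0.0, 0.2, 0.9)
)

# Rank candidates in `q` by cosine similarity to `key`, most similar first.
rank_by_similarity <- function(emb, key, q, n = 1L) {
  q <- setdiff(q, key)                          # drop candidates identical to key
  unit <- emb / sqrt(rowSums(emb^2))            # unit-length rows
  s <- as.vector(unit[q, , drop = FALSE] %*% unit[key, ])
  res <- data.frame(key = key, q = q, similarity = s, stringsAsFactors = FALSE)
  res <- res[order(res$similarity, decreasing = TRUE), ]
  rownames(res) <- NULL
  head(res, n)
}

res <- rank_by_similarity(emb, "cat", c("cat", "dog", "tiger", "car"), n = 2L)
print(res)  # "dog" first, then "tiger"
```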
Get vector embeddings of keys. Embeddings for out-of-vocabulary keys are generated at random.
query( conn, q, normalized = TRUE, ngram_beg = NULL, ngram_end = NULL, topn = 5L )
Argument | Description |
---|---|
conn | a Magnitude connection. |
q | a character vector. |
normalized | logical; whether the vector embeddings should be normalized. |
ngram_beg | integer. If supplied, out-of-vocabulary vectors are built from character n-grams of length 'ngram_end - ngram_beg'. |
ngram_end | integer. |
topn | integer; used when building out-of-vocabulary vectors. |
a tibble.
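Character n-grams are the raw material for out-of-vocabulary vectors. As a hedged illustration only, the base-R helper `char_ngrams()` below (a hypothetical name) lists every character n-gram of a word whose length lies in an inclusive range `n_min` to `n_max`; how this package actually maps `ngram_beg` and `ngram_end` onto n-gram lengths, and whether it pads words with boundary markers, is not confirmed here.

```r
# List all character n-grams of `word` with lengths n_min..n_max (assumes n_min <= n_max).
char_ngrams <- function(word, n_min, n_max) {
  len <- nchar(word)
  lens <- seq.int(n_min, n_max)
  lens <- lens[lens <= len]  # skip lengths longer than the word
  out <- character(0)
  for (n in lens) {
    starts <- seq_len(len - n + 1L)
    out <- c(out, substring(word, starts, starts + n - 1L))
  }
  out
}

char_ngrams("word", 2L, 3L)  # returns c("wo", "or", "rd", "wor", "ord")
```

`substring()` counts characters rather than bytes, so the helper also works on multibyte (for example Japanese) keys.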
Slice samples by fraction from a Magnitude table
slice_frac(conn, frac = 0.001, normalized = TRUE)
Argument | Description |
---|---|
conn | a Magnitude connection. |
frac | numeric; the fraction of rows to sample. |
normalized | logical; whether the vector embeddings should be normalized. |
a tibble.
Slice samples by index from a Magnitude table
slice_index(conn, index, normalized = TRUE)
Argument | Description |
---|---|
conn | a Magnitude connection. |
index | integer vector of row indices. |
normalized | logical; whether the vector embeddings should be normalized. |
a tibble.
Slice samples from a Magnitude table
slice_n(conn, n, offset = 0, normalized = TRUE)
Argument | Description |
---|---|
conn | a Magnitude connection. |
n | integer. |
offset | integer. |
normalized | logical; whether the vector embeddings should be normalized. |
a tibble.
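Because a Magnitude store is an SQLite database, `n` and `offset` plausibly behave like SQL's `LIMIT` and `OFFSET`: skip `offset` rows, then read up to `n` rows. That mapping is an assumption, not something this manual states. The base-R sketch below shows which rows those semantics would select from an in-memory matrix; `slice_rows()` is a hypothetical helper.

```r
# Select up to `n` rows after skipping `offset` rows (LIMIT/OFFSET-like semantics).
slice_rows <- function(m, n, offset = 0) {
  idx <- offset + seq_len(n)
  idx <- idx[idx <= nrow(m)]  # never read past the last row
  m[idx, , drop = FALSE]
}

m <- matrix(1:20, nrow = 10, dimnames = list(letters[1:10], NULL))
rownames(slice_rows(m, n = 3, offset = 2))  # "c" "d" "e"
rownames(slice_rows(m, n = 5, offset = 8))  # "i" "j": only two rows remain
```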
Calculate Word Rotator's Distance between two distributions.
wrd(x, y, ...)
Argument | Description |
---|---|
x | a dense or sparse matrix. |
y | a dense or sparse matrix. |
... | other arguments are passed to |
Word Rotator's Distance is a measure of textual similarity that improves on Word Mover's Distance.
numeric scalar.
http://dx.doi.org/10.18653/v1/2020.emnlp-main.236
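In the cited paper, WRD is an optimal-transport problem whose masses are the normalized word-vector norms and whose costs are cosine distances between word directions. To stay dependency-free, the base-R sketch below swaps exact optimal transport for the entropy-regularized Sinkhorn approximation, so its result is close to, but not identical to, exact WRD. The function `wrd_sinkhorn()` and its parameters `lambda` and `n_iter` are hypothetical; this is not how the package's `wrd()` is implemented.

```r
# Approximate Word Rotator's Distance with Sinkhorn iterations.
# x, y: dense matrices with one word vector per row (no all-zero rows).
wrd_sinkhorn <- function(x, y, lambda = 0.05, n_iter = 500L) {
  nx <- sqrt(rowSums(x^2))
  ny <- sqrt(rowSums(y^2))
  a <- nx / sum(nx)                       # source masses from vector norms
  b <- ny / sum(ny)                       # target masses from vector norms
  cost <- 1 - (x / nx) %*% t(y / ny)      # cosine distance between directions
  K <- exp(-cost / lambda)                # Gibbs kernel of the regularized problem
  u <- rep(1, length(a))
  v <- rep(1, length(b))
  for (i in seq_len(n_iter)) {
    u <- a / as.vector(K %*% v)           # rescale to match row marginals
    v <- b / as.vector(crossprod(K, u))   # rescale to match column marginals
  }
  plan <- sweep(u * K, 2, v, "*")         # transport plan diag(u) %*% K %*% diag(v)
  sum(plan * cost)                        # transport cost under that plan
}

x <- diag(3)                                             # three orthogonal word vectors
wrd_sinkhorn(x, x)                                       # close to 0: identical inputs
wrd_sinkhorn(rbind(c(1, 0), c(0, 1)), rbind(c(1, 0)))    # about 0.5
wrd_sinkhorn(rbind(c(1, 0)), rbind(c(-1, 0)))            # about 2: opposite directions
```

Smaller `lambda` values bring the result closer to exact optimal transport but can make `exp()` underflow to zero; a log-domain Sinkhorn implementation avoids that.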