R/RcppExports.R
dist_many_many.Rd
This function computes the distance/dissimilarity between two sets of probability density functions.
dist_many_many(
  dists1,
  dists2,
  method,
  p = NA_real_,
  testNA = TRUE,
  unit = "log",
  epsilon = 1e-05
)
dists1: a numeric matrix storing distributions in its rows.
dists2: a numeric matrix storing distributions in its rows.
method: a character string indicating the distance measure that should be computed.
p: power of the Minkowski distance.
testNA: a logical value indicating whether or not distributions shall be checked for NA values.
unit: type of log function. Options are
unit = "log"
unit = "log2"
unit = "log10"
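To illustrate what the choice of unit changes, here is a minimal base R sketch. The helper kl_divergence() is hypothetical and written only for this illustration; it is not part of the package. It assumes strictly positive inputs, and that unit = "log" refers to the natural logarithm.

```r
# Hypothetical helper for illustration only; assumes strictly positive p and q.
kl_divergence <- function(p, q, unit = c("log", "log2", "log10")) {
  unit <- match.arg(unit)
  # Pick the base R logarithm that matches the requested unit.
  log_fun <- switch(unit, log = log, log2 = log2, log10 = log10)
  sum(p * log_fun(p / q))
}

p <- c(0.5, 0.3, 0.2)
q <- c(0.4, 0.4, 0.2)

kl_nats  <- kl_divergence(p, q, unit = "log")
kl_bits  <- kl_divergence(p, q, unit = "log2")
kl_log10 <- kl_divergence(p, q, unit = "log10")

# Changing the base only rescales the result by a constant factor.
all.equal(kl_bits, kl_nats / log(2))
all.equal(kl_log10, kl_nats / log(10))
```

Under these assumptions the three units produce the same ranking of distances; only the scale of the values differs.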
epsilon: a small value to address cases in the distance computation where division by zero occurs. In these cases, x / 0 or 0 / 0 will be replaced by epsilon. The default is epsilon = 0.00001. However, we recommend choosing a custom epsilon value depending on the size of the input vectors, the expected similarity between the compared probability density functions, and whether or not many 0 values are present within the compared vectors. As a rough rule of thumb, when dealing with very large input vectors that are very similar and contain many 0 values, the epsilon value should be set even smaller (e.g. epsilon = 0.000000001), whereas when vector sizes are small or the distributions are very divergent, higher epsilon values may also be appropriate (e.g. epsilon = 0.01). Addressing this epsilon issue is important to avoid cases where distance metrics return negative values, which are not defined and occur only because of the technical issues of computing x / 0 or 0 / 0 cases.
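The replacement rule described above can be sketched in base R as follows. The helper safe_ratio() is hypothetical and is not the package's internal code; it only mirrors the stated rule that x / 0 and 0 / 0 are replaced by epsilon.

```r
# Hypothetical helper for illustration (not the package's internal code):
# element-wise x / y, with every x / 0 or 0 / 0 entry replaced by epsilon.
safe_ratio <- function(x, y, epsilon = 1e-05) {
  ratio <- x / y            # x / 0 gives Inf, 0 / 0 gives NaN
  ratio[y == 0] <- epsilon  # overwrite exactly those entries
  ratio
}

x <- c(0.5, 0.5, 0.0, 0.0)
y <- c(0.5, 0.0, 0.5, 0.0)

r_default <- safe_ratio(x, y)                   # default epsilon
r_small   <- safe_ratio(x, y, epsilon = 1e-09)  # smaller value, e.g. for large, sparse inputs
r_large   <- safe_ratio(x, y, epsilon = 0.01)   # larger value, e.g. for small or divergent inputs

print(rbind(r_default, r_small, r_large))
```

Only the entries with a zero denominator change with epsilon. This is why the recommended value depends on how many 0 values the compared vectors contain.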
A matrix of distance values
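A minimal usage sketch follows. It builds two small matrices with one distribution per row and computes a reference result with a hypothetical base R helper, pairwise_euclidean(), which is written only for this illustration. The optional comparison at the end assumes that the philentropy package is installed, and that the returned matrix has one row per row of dists1 and one column per row of dists2.

```r
# Hypothetical base R helper, for illustration only (not part of the package):
# Euclidean distance between every row of m1 and every row of m2.
pairwise_euclidean <- function(m1, m2) {
  stopifnot(is.matrix(m1), is.matrix(m2), ncol(m1) == ncol(m2))
  out <- matrix(NA_real_, nrow = nrow(m1), ncol = nrow(m2))
  for (i in seq_len(nrow(m1))) {
    for (j in seq_len(nrow(m2))) {
      out[i, j] <- sqrt(sum((m1[i, ] - m2[j, ])^2))
    }
  }
  out
}

set.seed(1)
# Each row is one distribution: positive weights normalised to sum to 1.
M1 <- t(replicate(3, { w <- runif(5); w / sum(w) }))
M2 <- t(replicate(4, { w <- runif(5); w / sum(w) }))

ref <- pairwise_euclidean(M1, M2)
dim(ref)  # 3 rows (from M1) by 4 columns (from M2)

# Optional comparison with the package function, if philentropy is installed.
if (requireNamespace("philentropy", quietly = TRUE)) {
  res <- philentropy::dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
  print(all.equal(unname(res), ref))
}
```

The reference helper is deliberately simple and slow. It is meant for checking results on small inputs, not as a replacement for the compiled function.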