Changelog

New Features

Updates

Fixing warning on Debian systems:

Result: WARN
  Found the following significant warnings:
    RcppExports.cpp:865:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:899:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:933:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:967:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/philentropy.Rcheck/00install.out’ for details.
  * used C++ compiler: ‘Debian clang version 17.0.5 (1)’

The fix was to reinstall Rcpp v1.0.11.6 via devtools::install_github("https://github.com/RcppCore/Rcpp") and then rerun Rcpp::compileAttributes() to regenerate RcppExports.cpp.

New Features

Updates

the Distances vignette now has corrected documentation for benchmarking the low-level distance functions (many thanks to @Nowosad, #30)
in ../src/correlation.h, logical operators are now used instead of bitwise ones (| -> or), which otherwise raise Wbitwise warnings in clang14
the vector element limit is now extended to long vectors for all distance measures by declaring indices as R_xlen_t instead of int.
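
To illustrate why the index type matters: R's integer type (like a 32-bit C int) tops out at 2^31 - 1, whereas long vectors may hold more elements than that, which is what R_xlen_t is meant to accommodate. The following base-R sketch only demonstrates the limit itself; it does not touch the package's C++ code.

```r
# Largest value representable by R's integer type (a 32-bit signed int)
int_max <- .Machine$integer.max

# Smallest vector length that no longer fits into an int index
first_long_length <- 2^31

# Integer arithmetic past the limit yields NA (with a warning) ...
overflowed <- suppressWarnings(int_max + 1L)

# ... whereas a double can still represent such a length exactly
fits_in_double <- (int_max + 1) == first_long_length

print(int_max)
print(overflowed)
print(fits_in_double)
```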

New Features

distance() and all other individual information theory functions receive a new argument epsilon with default value epsilon = 0.00001 to handle cases where individual distance or similarity computations yield x / 0 or 0 / 0. Instead of relying on a hard-coded epsilon, users can now set epsilon according to their input vectors (many thanks to Joshua McNeill, #26, for this great question).
three new functions dist_one_one(), dist_one_many(), dist_many_many() are added. They are fairly flexible intermediaries between distance() and single distance functions. dist_one_one() expects two vectors (probability density functions) and returns a single value. dist_one_many() expects one vector (a probability density function) and one matrix (a set of probability density functions), and returns a vector of values. dist_many_many() expects two matrices (two sets of probability density functions), and returns a matrix of values. (Many thanks to Jakub Nowosad, see #27, #28, and New Vignette Many_Distance)
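
The two items above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper names one_one_sketch(), one_many_sketch(), and many_many_sketch() are hypothetical, the distance used is a Pearson chi-squared style sum, and the rule "replace a zero denominator by epsilon" is an assumption about how such cases might be treated. The sketch shows how epsilon keeps x / 0 and 0 / 0 terms finite, and which shapes the three kinds of comparison return.

```r
# Hypothetical helper (not a philentropy function): Pearson chi-squared style
# distance, sum((p - q)^2 / q). Assumption: a zero denominator is replaced by
# epsilon, so x / 0 becomes x / epsilon and 0 / 0 becomes 0 / epsilon = 0.
one_one_sketch <- function(p, q, epsilon = 0.00001) {
  denominator <- ifelse(q == 0, epsilon, q)
  sum((p - q)^2 / denominator)
}

# One vector against every row of a matrix: returns a vector of values.
one_many_sketch <- function(p, m, epsilon = 0.00001) {
  vapply(seq_len(nrow(m)),
         function(i) one_one_sketch(p, m[i, ], epsilon),
         numeric(1))
}

# Every row of m1 against every row of m2: returns a matrix of values.
many_many_sketch <- function(m1, m2, epsilon = 0.00001) {
  out <- matrix(NA_real_, nrow = nrow(m1), ncol = nrow(m2))
  for (i in seq_len(nrow(m1))) {
    out[i, ] <- one_many_sketch(m1[i, ], m2, epsilon)
  }
  out
}

p <- c(0.5, 0.5, 0)
q <- c(1, 0, 0)        # contains zeros, so x / 0 and 0 / 0 both occur
m <- rbind(p, q)

single_value  <- one_one_sketch(p, q)                  # finite despite zeros
coarser_value <- one_one_sketch(p, q, epsilon = 0.01)  # user-chosen epsilon
value_vector  <- one_many_sketch(p, m)
value_matrix  <- many_many_sketch(m, m)

print(single_value)
print(value_vector)
print(value_matrix)
```

A larger epsilon shrinks the contribution of x / 0 terms, which is why choosing it relative to the scale of the input vectors can matter.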

Updates

a new Vignette Comparing many probability density functions (Many thanks to Jakub Nowosad)
the dplyr package dependency was removed and replaced by the poorman package due to the heavy dependency burden of dplyr, since philentropy only used dplyr::between(), which is now poorman::between() (Many thanks to Patrice Kiener for this suggestion)
distance(..., as.dist.obj = TRUE) now returns the same values as stats::dist() when working with 2 dimensional input matrices (2 vector inputs) (see #29) (Many thanks to Jakub Nowosad (@Nowosad)) Example:

library(philentropy)

m1 = matrix(c(1, 2), ncol = 1)

dist(m1)
#> 1
#> 2 1
distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
#> 1
#> 2 1

New Features

the distance() function receives a new argument mute.message allowing users to mute message printing when running large-scale distance computations. Example:

distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)), 
         method = "euclidean", 
         mute.message = TRUE)

adding markdown dependency to DESCRIPTION

New Features

the distance() function receives a new argument use.row.names to enable passing the row names from the input probability or count matrix to the output distance matrix
the distance() function can now handle data.table and tibble input #16
adding new functionality and arguments as.dist.obj, diag, and upper to philentropy::distance() to allow users to retrieve a stats::dist() object when working with philentropy::distance() (Many thanks to Hugo Tavares #18 - see also #13) When using philentropy::distance(..., as.dist.obj = TRUE) users can now directly pass the distance() output into hclust:

Before:

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard")
true.dist.mat <- as.dist(dist.mat)
clust.res <- hclust(true.dist.mat, method = "complete")
clust.res

Call:
hclust(d = true.dist.mat, method = "complete")

Cluster method   : complete
Number of objects: 3

Now:

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard", as.dist.obj = TRUE)
clust.res <- hclust(dist.mat, method = "complete")
clust.res

Call:
hclust(d = dist.mat, method = "complete")

Cluster method   : complete
Number of objects: 3
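
For readers unfamiliar with stats::dist objects, the following base-R sketch shows what row names, diag, and upper mean for such an object. It builds the full distance matrix by hand with a Euclidean distance and converts it with as.dist(); it is only an illustration of the target data structure and makes no claim about how philentropy::distance() performs the conversion internally.

```r
# Three probability vectors stacked as rows, with row names
prob_matrix <- rbind(a = 1:10 / sum(1:10),
                     b = 20:29 / sum(20:29),
                     c = 30:39 / sum(30:39))

# Full symmetric Euclidean distance matrix, built by hand for illustration
n <- nrow(prob_matrix)
full_matrix <- matrix(0, n, n,
                      dimnames = list(rownames(prob_matrix),
                                      rownames(prob_matrix)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    full_matrix[i, j] <- sqrt(sum((prob_matrix[i, ] - prob_matrix[j, ])^2))
  }
}

# as.dist() keeps the row names as labels; diag and upper only control
# whether the diagonal and upper triangle are shown when printing
dist_obj <- as.dist(full_matrix, diag = TRUE, upper = TRUE)

# A dist object can be passed to hclust() directly
clust_res <- hclust(dist_obj, method = "complete")
print(clust_res$labels)
```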

Bug fixes

fixing a bug in gJSD() which tested transposed matrix rows rather than transposed matrix columns for sum > 1 (see issue #17; many thanks to @wkc1986)

New functionality

exporting all Rcpp distance measure functions individually (see issue #9), this enables access to much faster computations (see micro benchmarks at https://hajkd.github.io/philentropy/articles/Distances.html)

Bug fixes

fixing a bug that caused the KL distance to return NaN when P == 0 (see issue #10; Many thanks to @KaiserDominici)
fixing a bug that caused a stack overflow when computing distance matrices with many rows (see issue #7; Many thanks to @wkc1986 and @elbamos)
fixing bug in gJSD() where an rbind() input matrix is not properly transposed (Many thanks to @vrodriguezf; see issue #14)
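
The NaN in the KL case can be reproduced in base R. The sketch below assumes the textbook Kullback-Leibler formula with log base 2 and the usual convention that terms with p = 0 contribute 0; kl_sketch() is a hypothetical helper for illustration and is not the package's C++ code.

```r
p <- c(0, 0.5, 0.5)
q <- c(0.2, 0.3, 0.5)

# Naive formula: the first term is 0 * log2(0) = 0 * -Inf, which is NaN in R
kl_naive <- sum(p * log2(p / q))

# Hypothetical helper: terms with p == 0 are skipped, i.e. they contribute 0
kl_sketch <- function(p, q) {
  nonzero <- p > 0
  sum(p[nonzero] * log2(p[nonzero] / q[nonzero]))
}

kl_fixed <- kl_sketch(p, q)

print(kl_naive)
print(kl_fixed)
```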

New Features

gJSD() receives a new argument est.prob to enable empirical estimation of probability vectors from input count vectors (non-probabilistic vectors)
Jaccard and Tanimoto similarity measures now return 0 instead of NaN when probability vectors contain zeros (Many thanks to @JonasMandel; see issue #15)
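
As a rough illustration of the first item, empirical estimation turns each count vector into relative frequencies before the divergence is computed. The base-R sketch below assumes equal weights, log base 2, count vectors stored as rows, and the standard definition of the generalized Jensen-Shannon divergence (entropy of the weighted mixture minus the weighted mean of the individual entropies). The helpers entropy_bits() and gjsd_sketch() are hypothetical, and the options and conventions of the actual gJSD() may differ.

```r
# Shannon entropy in bits; terms with p == 0 are skipped (0 * log2(0) := 0)
entropy_bits <- function(p) {
  nonzero <- p > 0
  -sum(p[nonzero] * log2(p[nonzero]))
}

# Hypothetical helper: rows are count vectors, equal weights are assumed
gjsd_sketch <- function(counts) {
  probs <- counts / rowSums(counts)          # empirical probabilities per row
  w <- rep(1 / nrow(probs), nrow(probs))     # equal weights (assumption)
  mixture <- colSums(probs * w)              # weighted mixture distribution
  entropy_bits(mixture) - sum(w * apply(probs, 1, entropy_bits))
}

disjoint_counts  <- rbind(c(10, 0), c(0, 5))  # no overlap between the rows
identical_counts <- rbind(c(1, 3), c(2, 6))   # same relative frequencies

gjsd_disjoint  <- gjsd_sketch(disjoint_counts)
gjsd_identical <- gjsd_sketch(identical_counts)

print(gjsd_disjoint)
print(gjsd_identical)
```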

Bug fixes

Fixing a bug that caused Jensen-Shannon computations to return wrong values when 0 values were present in the input vectors (see issue #4; Many thanks to @wkc1986)
Fixing a bug that caused Jensen-difference computations to return wrong values when 0 values were present in the input vectors
Fixing bugs in all distance metrics when handling 0/0, 0/x or x/0 cases

New Features

new message system
extending documentation

Bug fixes

Fixing a bug that caused JSD() to return NaN when any probability is 0 - see https://github.com/HajkD/philentropy/issues/1 (Thanks to William Kurtis Chang)

Bug fixes

Fixing C++ memory leaks in dist.diversity() and distance() when the check for colSums(x) > 1.001 was performed (the leak was found with rhub::check_with_valgrind())

Initial submission version.

Version 0.9.0.9000

New Features

Version 0.8.0 (2023-12-02)

Updates

Version 0.7.0 (2022-11-05)

New Features

Updates

Version 0.6.0 (2022-02-14)

New Features

Updates

Version 0.5.0 (2021-05-12)

New Features

Version 0.4.0 (2020-01-09)

New Features

Bug fixes

Version 0.3.0 (2019-02-13)

New functionality

Bug fixes

New Features

Version 0.2.0 (2018-05-22)

Bug fixes

Version 0.1.0 (2018-04-10)

New Features

Bug fixes

Version 0.0.2 (2017-05-04)

Bug fixes

Version 0.0.1 (2017-04-25)