New Features

Updates

Fixing warning on Debian systems:

Result: WARN
  Found the following significant warnings:
    RcppExports.cpp:865:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:899:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:933:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    RcppExports.cpp:967:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
  See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/philentropy.Rcheck/00install.out’ for details.
  * used C++ compiler: ‘Debian clang version 17.0.5 (1)’
  • The fix was to reinstall Rcpp v1.0.11.6 via devtools::install_github("https://github.com/RcppCore/Rcpp") and rerun Rcpp::compileAttributes().

New Features

Updates

  • the Distances vignette now has corrected documentation for the benchmarking of low-level distance functions. Many thanks to (@Nowosad) #30
  • in ../src/correlation.h, logical operators are now used instead of bitwise ones (| -> or); the bitwise form otherwise raises Wbitwise warnings in clang14
  • vector element limit is now extended to long vectors for all distance measures by declaring R_xlen_t instead of int during indexing.

New Features

  • distance() and all other individual information theory functions receive a new argument epsilon with default value epsilon = 0.00001 to handle cases where individual distance or similarity computations yield x / 0 or 0 / 0. Instead of relying on a hard-coded epsilon, users can now set epsilon according to their input vectors. (Many thanks to Joshua McNeill #26 for this great question).
  • three new functions dist_one_one(), dist_one_many(), dist_many_many() are added. They are fairly flexible intermediaries between distance() and single distance functions. dist_one_one() expects two vectors (probability density functions) and returns a single value. dist_one_many() expects one vector (a probability density function) and one matrix (a set of probability density functions), and returns a vector of values. dist_many_many() expects two matrices (two sets of probability density functions), and returns a matrix of values. (Many thanks to Jakub Nowosad, see #27, #28, and New Vignette Many_Distance)
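The three helpers and the epsilon argument described above could be exercised roughly as follows. This is a minimal sketch rather than an excerpt from the package documentation: the variable names (P, Q, M, M1, P0, Q0) are illustrative, it assumes that each row of a matrix is one distribution, and it assumes "kullback-leibler" is an accepted method name in the installed version.

```r
library(philentropy)

# Two probability vectors (each sums to 1)
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# one vs. one: two vectors in, a single value out
d11 <- dist_one_one(P, Q, method = "euclidean")

# one vs. many: P against every row of M, a vector out
M <- rbind(Q, 30:39 / sum(30:39))
d1m <- dist_one_many(P, M, method = "euclidean")

# many vs. many: every row of M1 against every row of M, a matrix out
M1 <- rbind(P, Q)
dmm <- dist_many_many(M1, M, method = "euclidean")

# Vectors containing zeros, so that some terms would be x / 0.
# The value 1e-3 is an arbitrary illustrative choice for epsilon.
P0 <- c(0.5, 0.5, 0)
Q0 <- c(0.5, 0, 0.5)
kl <- distance(rbind(P0, Q0), method = "kullback-leibler", epsilon = 1e-3)
```

Which epsilon is appropriate depends on the scale of the input vectors, so the value used here should not be read as a recommendation.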

Updates

  • a new Vignette Comparing many probability density functions (Many thanks to Jakub Nowosad)
  • the dplyr package dependency was removed and replaced by the poorman package due to the heavy dependency burden of dplyr; philentropy only used dplyr::between(), which is now poorman::between() (Many thanks to Patrice Kiener for this suggestion)
  • distance(..., as.dist.obj = TRUE) now returns the same values as stats::dist() when working with 2 dimensional input matrices (2 vector inputs) (see #29) (Many thanks to Jakub Nowosad (@Nowosad)) Example:
library(philentropy)

m1 = matrix(c(1, 2), ncol = 1)

dist(m1)
#> 1
#> 2 1
distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
#> 1
#> 2 1

New Features

  • the distance() function receives a new argument mute.message allowing users to mute message printing when running large-scale distance computations. Example:
distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)), 
         method = "euclidean", 
         mute.message = TRUE)

New Features

  • the distance() function receives a new argument use.row.names to enable passing the row names from the input probability or count matrix to the output distance matrix

  • the distance() function can now handle data.table and tibble input #16

  • adding new functionality and arguments as.dist.obj, diag, and upper to philentropy::distance() to allow users to retrieve a stats::dist() object when working with philentropy::distance() (Many thanks to Hugo Tavares #18 - see also #13) When using philentropy::distance(..., as.dist.obj = TRUE) users can now directly pass the distance() output into hclust:

Before:

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard")
true.dist.mat <- as.dist(dist.mat)
clust.res <- hclust(true.dist.mat, method = "complete")
clust.res
Call:
hclust(d = true.dist.mat, method = "complete")

Cluster method   : complete
Number of objects: 3 

Now:

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39))
dist.mat <- distance(ProbMatrix, method = "jaccard", as.dist.obj = TRUE)
clust.res <- hclust(dist.mat, method = "complete")
clust.res
Call:
hclust(d = dist.mat, method = "complete")

Cluster method   : complete
Number of objects: 3 
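The use.row.names argument listed above can be combined with as.dist.obj. A minimal sketch, in which the row labels (sample_a, sample_b, sample_c) are made up for illustration:

```r
library(philentropy)

ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))
# Hypothetical labels; any unique row names would do
rownames(ProbMatrix) <- c("sample_a", "sample_b", "sample_c")

# Row names of the input are carried over to the distance matrix
dist.mat <- distance(ProbMatrix, method = "jaccard", use.row.names = TRUE)
rownames(dist.mat)

# The same call returning a stats::dist() object instead of a matrix
dist.obj <- distance(ProbMatrix, method = "jaccard",
                     use.row.names = TRUE, as.dist.obj = TRUE)
class(dist.obj)
```

Carrying the labels through means downstream tools such as hclust() can report the original sample names rather than generated ones.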

Bug fixes

  • fixing a bug in gJSD() which tested transposed matrix rows rather than transposed matrix columns for sum > 1 (see issue #17; many thanks to @wkc1986)

New functionality

Bug fixes

  • fixing a bug that caused the KL distance to return NaN when P == 0 (see issue #10; Many thanks to @KaiserDominici)

  • fixing bug which caused stack overflow when computing distance matrices with many rows (see issue #7; Many thanks to @wkc1986 and @elbamos)

  • fixing bug in gJSD() where an rbind() input matrix is not properly transposed (Many thanks to @vrodriguezf; see issue #14)

New Features

  • gJSD() receives new argument est.prob to enable empirical estimation of probability vectors from input count vectors (non-probabilistic vectors)

  • Jaccard and Tanimoto similarity measures now return 0 instead of NaN when probability vectors contain zeros (Many thanks to @JonasMandel; see issue #15)
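As a rough illustration of est.prob in gJSD(): the sketch below assumes that est.prob = "empirical" divides each count vector by its sum and that rows of the rbind() input are the individual vectors. The names counts, probs, jsd_counts and jsd_probs are illustrative.

```r
library(philentropy)

# Count vectors (not probabilities), one per row
counts <- rbind(1:10, 20:29, 30:39)

# Let gJSD() estimate probability vectors from the counts
jsd_counts <- gJSD(counts, est.prob = "empirical")

# The same probabilities computed by hand, for comparison
probs <- counts / rowSums(counts)
jsd_probs <- gJSD(probs)
```

Under the stated assumption the two results should agree up to floating-point error.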

Bug fixes

  • Fixing a bug that caused jensen-shannon computations to return wrong values when 0 values were present in the input vectors (see issue #4; Many thanks to @wkc1986)
  • Fixing a bug that caused jensen-difference computations to return wrong values when 0 values were present in the input vectors
  • Fixing bugs in all distance metrics when handling 0/0, 0/x or x/0 cases

New Features

  • new message system
  • extending documentation

Bug fixes

Bug fixes

  • Fixing C++ memory leaks in dist.diversity() and distance() that occurred when the check for colSums(x) > 1.001 was performed (the leak was found with rhub::check_with_valgrind())

Initial submission version.