Generalized Jensen-Shannon Divergence

This function computes the Generalized Jensen-Shannon Divergence of a probability matrix.

gJSD(x, unit = "log2", weights = NULL, est.prob = NULL)

Arguments

x

a probability matrix.

unit

a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations.

weights

a numeric vector specifying the weights for each distribution in x. Default: weights = NULL; in this case all distributions are weighted equally (= uniform distribution of weights). In case users wish to specify non-uniform weights for e.g. 3 distributions, they can specify the argument weights = c(0.5, 0.25, 0.25). This notation denotes that vec1 is weighted by 0.5, vec2 is weighted by 0.25, and vec3 is weighted by 0.25 as well.

est.prob

method to estimate probabilities from input count vectors such as non-probability vectors. Default: est.prob = NULL. Options are:

est.prob = "empirical": The relative frequencies of each vector are computed internally. For example an input matrix rbind(1:10, 11:20) will be transformed to a probability vector rbind(1:10 / sum(1:10), 11:20 / sum(11:20))

Value

The Jensen-Shannon divergence between all possible combinations of comparisons.

Details

Function to compute the Generalized Jensen-Shannon Divergence

\(JSD_{\pi_1,...,\pi_n}(P_1, ..., P_n) = H(\sum_{i = 1}^n \pi_i * P_i) - \sum_{i = 1}^n \pi_i*H(P_i)\)

where \(\pi_1,...,\pi_n\) denote the weights selected for the probability vectors P_1,...,P_n and H(P_i) denotes the Shannon Entropy of probability vector P_i.

Author

Hajk-Georg Drost

Examples

# define input probability matrix
Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39))

# compute the Generalized JSD comparing the PS probability matrix
gJSD(Prob)
#> No weights were specified ('weights = NULL'), thus equal weights for all distributions will be calculated and applied.
#> Metric: 'gJSD'; unit = 'log2'; comparing: 3 vectors (v1, ... , v3).
#> Weights: v1 = 0.333333333333333, v2 = 0.333333333333333, v3 = 0.333333333333333
#> [1] 0.03512892

# Generalized Jensen-Shannon Divergence between three vectors using different log bases
gJSD(Prob, unit = "log2") # Default
#> No weights were specified ('weights = NULL'), thus equal weights for all distributions will be calculated and applied.
#> Metric: 'gJSD'; unit = 'log2'; comparing: 3 vectors (v1, ... , v3).
#> Weights: v1 = 0.333333333333333, v2 = 0.333333333333333, v3 = 0.333333333333333
#> [1] 0.03512892
gJSD(Prob, unit = "log")
#> No weights were specified ('weights = NULL'), thus equal weights for all distributions will be calculated and applied.
#> Metric: 'gJSD'; unit = 'log'; comparing: 3 vectors (v1, ... , v3).
#> Weights: v1 = 0.333333333333333, v2 = 0.333333333333333, v3 = 0.333333333333333
#> [1] 0.02434951
gJSD(Prob, unit = "log10")
#> No weights were specified ('weights = NULL'), thus equal weights for all distributions will be calculated and applied.
#> Metric: 'gJSD'; unit = 'log10'; comparing: 3 vectors (v1, ... , v3).
#> Weights: v1 = 0.333333333333333, v2 = 0.333333333333333, v3 = 0.333333333333333
#> [1] 0.01057486

# Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count
P.count <- 1:10
Q.count <- 20:29
R.count <- 30:39
x.count <- rbind(P.count, Q.count, R.count)
gJSD(x.count, est.prob = "empirical")
#> No weights were specified ('weights = NULL'), thus equal weights for all distributions will be calculated and applied.
#> Metric: 'gJSD'; unit = 'log2'; comparing: 3 vectors (v1, ... , v3).
#> Weights: v1 = 0.333333333333333, v2 = 0.333333333333333, v3 = 0.333333333333333
#> [1] 0.03512892