Efficient Sum and Mean

In many situations we are not particularly interested in the individual loss values (or derivatives) of each observation, but in their sum or mean, be it weighted or unweighted. For example, by computing the unweighted mean of the loss over our training set, we effectively compute what is known as the empirical risk. This is usually the quantity (or an important part of it) that we are interested in minimizing.
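
Expressed as a formula: for a data set of $n$ observations with targets $y_i$ and predicted outputs $\hat{y}_i$, the empirical risk with respect to a loss $L$ is simply the unweighted mean of the individual loss values.

$$\hat{R}_{\mathrm{emp}} = \frac{1}{n} \sum_{i=1}^{n} L(y_i, \hat{y}_i)$$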

When we say "weighted" or "unweighted", we are referring to whether we explicitly specify the influence of individual observations on the result. "Weighting" an observation means multiplying its value by some number (the "weight" of that observation), giving that observation a stronger or weaker influence on the result. In order to weight an observation we have to know which array dimension (if there is more than one) denotes the observations. For computing an unweighted result, on the other hand, we don't need to know anything about the meaning of the array dimensions, as long as the targets and the outputs are of compatible shape and size.

The naive way to compute such an unweighted reduction would be to call mean or sum on the result of the element-wise operation. The following code snippet shows an example of that. We say "naive" because this approach does not give acceptable performance.

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
 1.0
 3.0
 5.0

julia> sum(value(L1DistLoss(), [1.,2,3], [2,5,-2])) # WARNING: Bad code
9.0

This works as expected, but it comes at a price. Before the sum can be computed, value has to allocate a temporary array and fill it with the element-wise results. After that, sum iterates over this temporary array and accumulates the values. Bottom line: we allocate temporary memory that we don't need in the end and could avoid.
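
Conceptually, the efficient methods introduced below simply fuse the element-wise computation with the reduction. As a rough sketch (not the package's actual implementation), the same result can be obtained without a temporary array by feeding the scalar method of value into a generator:

julia> sum(value(L1DistLoss(), t, o) for (t, o) in zip([1.,2,3], [2,5,-2])) # no temporary array
9.0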

For that reason we provide special methods that compute the common accumulations efficiently, without allocating temporary arrays. These methods can be invoked with an additional parameter that specifies how the values should be accumulated / averaged. The type of this parameter has to be a subtype of AverageMode.

Average Modes

Before we discuss these memory-efficient methods, let us briefly introduce the available average mode types. We provide a number of different average modes, all of which are contained in the namespace AvgMode. An instance of such a type can then be passed as an additional parameter to value, deriv, and deriv2, as we will see further down.

The following average modes are available. Each of them can be used as an additional parameter to the functions mentioned above; a short usage sketch follows the list.

AvgMode.None
AvgMode.Sum
AvgMode.Mean
AvgMode.WeightedSum
AvgMode.WeightedMean
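
All of these are plain Julia types; the weighted variants are constructed with a vector that assigns one weight per observation. As a quick, purely illustrative preview of the syntax discussed in the rest of this section:

julia> mode = AvgMode.WeightedSum([1,2,1]); # one weight per observation

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], mode)
12.0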

Unweighted Sum and Mean

As hinted before, we provide special memory-efficient methods for computing the sum or the mean of the element-wise (or broadcasted) results of value, deriv, and deriv2. These methods avoid the allocation of a temporary array and instead compute the result directly.

LearnBase.value - Method.
value(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode) -> Number

Compute the weighted or unweighted sum or mean (depending on avgmode) of the individual values of the loss function for each pair in targets and outputs. This method will not allocate a temporary array.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

Examples

julia> value(L1DistLoss(), [1,2,3], [2,5,-2], AvgMode.Sum())
9

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.Sum())
9.0

julia> value(L1DistLoss(), [1,2,3], [2,5,-2], AvgMode.Mean())
3.0

julia> value(L1DistLoss(), Float32[1,2,3], Float32[2,5,-2], AvgMode.Mean())
3.0f0
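
To briefly illustrate the broadcast behavior mentioned above, the following made-up example pairs a target vector with each column of an output matrix; the element-wise results are then reduced into a single number:

julia> value(L1DistLoss(), [1., 2], [1. 2.; 3. 4.], AvgMode.Sum())
4.0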

The exact same method signature is also implemented for deriv and deriv2 respectively.

LearnBase.deriv - Method.
deriv(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode) -> Number

Compute the weighted or unweighted sum or mean (depending on avgmode) of the individual derivatives of the loss function for each pair in targets and outputs. This method will not allocate a temporary array.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

Examples

julia> deriv(L2DistLoss(), [1,2,3], [2,5,-2], AvgMode.Sum())
-2

julia> deriv(L2DistLoss(), [1,2,3], [2,5,-2], AvgMode.Mean())
-0.6666666666666666
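
Since the derivative of L2DistLoss with respect to the output is $2(\hat{y} - y)$, the sum above is easy to verify by hand (allocating, for illustration only):

julia> sum(2 .* ([2,5,-2] .- [1,2,3]))
-2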

LearnBase.deriv2 - Method.
deriv2(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode) -> Number

Compute the weighted or unweighted sum or mean (depending on avgmode) of the individual second derivatives of the loss function for each pair in targets and outputs. This method will not allocate a temporary array.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

Examples

julia> deriv2(LogitDistLoss(), [1.,2,3], [2,5,-2], AvgMode.Sum())
0.49687329928636825

julia> deriv2(LogitDistLoss(), [1.,2,3], [2,5,-2], AvgMode.Mean())
0.1656244330954561

Sum and Mean per Observation

When the targets and predicted outputs are multi-dimensional arrays instead of vectors, we may be interested in accumulating the values over all but one dimension. This is typically the case when we work in a multi-variable regression setting, where each observation has multiple outputs and thus multiple targets. In those scenarios we may be more interested in the average loss for each observation, rather than the total average over all the data.

To be able to accumulate the values for each observation separately, we have to know and explicitly specify the dimension that denotes the observations. For that purpose we provide the types contained in the namespace ObsDim.
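
For a matrix, ObsDim.First() declares the first array dimension as the observation dimension (each row is one observation), while ObsDim.Last() declares the last one (each column is one observation). Assuming the usual LearnBase definitions, a specific dimension can also be selected by number via ObsDim.Constant, so for two-dimensional data the following should hold:

julia> A, B = rand(2, 4), rand(2, 4);

julia> value(L1DistLoss(), A, B, AvgMode.Sum(), ObsDim.Last()) ==
       value(L1DistLoss(), A, B, AvgMode.Sum(), ObsDim.Constant(2))
true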

LearnBase.value - Method.
value(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> AbstractVector

Compute the values of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation (depending on avgmode). This method will not allocate a temporary array, but it will allocate the resulting vector.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

Consider the following two matrices, targets and outputs. We will fill them with some generated example values in order to better understand the effects of later operations.

julia> targets = reshape(1:8, (2, 4)) ./ 8
2×4 Array{Float64,2}:
 0.125  0.375  0.625  0.875
 0.25   0.5    0.75   1.0

julia> outputs = reshape(1:2:16, (2, 4)) ./ 8
2×4 Array{Float64,2}:
 0.125  0.625  1.125  1.625
 0.375  0.875  1.375  1.875

There are two ways to interpret the shape of these arrays if one dimension is supposed to denote the observations. The first interpretation would be to say that the first dimension denotes the observations. Thus this data would consist of two observations with four variables each.

julia> value(L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.First())
2-element Array{Float64,1}:
 1.5
 2.0

julia> value(L1DistLoss(), targets, outputs, AvgMode.Mean(), ObsDim.First())
2-element Array{Float64,1}:
 0.375
 0.5

The second possible interpretation would be to say that the second/last dimension denotes the observations. In that case our data consists of four observations with two variables each.

julia> value(L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.Last())
4-element Array{Float64,1}:
 0.125
 0.625
 1.125
 1.625

julia> value(L1DistLoss(), targets, outputs, AvgMode.Mean(), ObsDim.Last())
4-element Array{Float64,1}:
 0.0625
 0.3125
 0.5625
 0.8125
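
As a quick sanity check, these per-observation results agree with a manual reduction over the non-observation dimension (allocating, and shown for illustration only):

julia> vec(sum(abs.(outputs .- targets), dims=1)) # same as AvgMode.Sum() with ObsDim.Last()
4-element Array{Float64,1}:
 0.125
 0.625
 1.125
 1.625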

Because this method returns a vector of values, we also provide a mutating version that can make use of a preallocated vector to write the results into.

LearnBase.value! - Method.
value!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> buffer

Compute the values of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation, depending on avgmode. The results are stored into the given vector buffer. This method will not allocate a temporary array.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

Examples

julia> targets = reshape(1:8, (2, 4)) ./ 8;

julia> outputs = reshape(1:2:16, (2, 4)) ./ 8;

julia> buffer = zeros(2);

julia> value!(buffer, L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.First())
2-element Array{Float64,1}:
 1.5
 2.0

julia> buffer = zeros(4);

julia> value!(buffer, L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.Last())
4-element Array{Float64,1}:
 0.125
 0.625
 1.125
 1.625

Naturally we also provide both of these methods for deriv and deriv2 respectively.

LearnBase.deriv - Method.
deriv(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> AbstractVector

Compute the derivative of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation (depending on avgmode). This method will not allocate a temporary array, but it will allocate the resulting vector.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

LearnBase.deriv! - Method.
deriv!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> buffer

Compute the derivative of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation, depending on avgmode. The results are stored into the given vector buffer. This method will not allocate a temporary array.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

Examples

julia> targets = reshape(1:8, (2, 4)) ./ 8;

julia> outputs = reshape(1:2:16, (2, 4)) ./ 8;

julia> buffer = zeros(2);

julia> deriv!(buffer, L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.First())
2-element Array{Float64,1}:
 3.0
 4.0

julia> buffer = zeros(4);

julia> deriv!(buffer, L1DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.Last())
4-element Array{Float64,1}:
 1.0
 2.0
 2.0
 2.0

LearnBase.deriv2 - Method.
deriv2(loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> AbstractVector

Compute the second derivative of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation (depending on avgmode). This method will not allocate a temporary array, but it will allocate the resulting vector.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

LearnBase.deriv2! - Method.
deriv2!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray, avgmode::AverageMode, obsdim::ObsDimension) -> buffer

Compute the second derivative of the loss function for each pair in targets and outputs individually, and return either the weighted or unweighted sum or mean for each observation, depending on avgmode. The results are stored into the given vector buffer. This method will not allocate a temporary array.

Both arrays have to be of the same shape and size. Furthermore they have to have at least two array dimensions (i.e. they must not be vectors).

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

  • avgmode::AverageMode: Must be one of the following: AvgMode.Sum(), AvgMode.Mean(), AvgMode.WeightedSum(weights), or AvgMode.WeightedMean(weights).

  • obsdim::ObsDimension: Specifies which of the array dimensions denotes the observations. See ?ObsDim for more information.

Examples

julia> targets = reshape(1:8, (2, 4)) ./ 8;

julia> outputs = reshape(1:2:16, (2, 4)) ./ 8;

julia> buffer = zeros(2);

julia> deriv2!(buffer, L2DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.First())
2-element Array{Float64,1}:
 8.0
 8.0

julia> buffer = zeros(4);

julia> deriv2!(buffer, L2DistLoss(), targets, outputs, AvgMode.Sum(), ObsDim.Last())
4-element Array{Float64,1}:
 4.0
 4.0
 4.0
 4.0

Weighted Sum and Mean

Up to this point, all the averaging was performed in an unweighted manner: each observation was treated as equal and thus had the same potential influence on the result. In this subsection we consider the situations in which we do want to explicitly specify the influence of each observation (i.e. we want to weight them). When we say we "weight" an observation, what it effectively boils down to is multiplying the result for that observation (i.e. the computed loss or derivative) by some number. This is done for every observation individually.

To get a better understanding of what we are talking about, let us perform a weighting scheme manually. The following code computes the loss for three observations and then multiplies the result of the second observation by 2, while the other two remain as they are. If we then sum up the results, we see that the loss of the second observation was effectively counted twice.

julia> result = value.(L1DistLoss(), [1.,2,3], [2,5,-2]) .* [1,2,1]
3-element Array{Float64,1}:
 1.0
 6.0
 5.0

julia> sum(result)
12.0

The point of weighting observations is to tell the learning algorithm that predicting some observations correctly is more important to us than predicting others. So really, the concrete weight factor matters less than the ratio between the different weights. In the example above, the second observation was considered twice as important as either of the other two observations, as the following comparison illustrates.
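
Concretely, multiplying all weights by the same factor leaves the (normalized) weighted mean unchanged, because the weight vector is normalized by its sum:

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedMean([1,2,1]))
3.0

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedMean([2,4,2])) # same ratio, same result
3.0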

For multi-dimensional arrays the process is not quite as simple. In such a scenario, computing the weighted sum (or weighted mean) can be thought of as having an additional step: first we compute the sum or (unweighted) mean for each observation (which results in a vector), and then we compute the weighted sum over all observations.

The following code snippet demonstrates how to compute the result of AvgMode.WeightedSum([2,1]) manually. This is not meant as a template for how to do it in practice, but simply to show qualitatively what is happening. In this example we assume that we are working in a multi-variable regression setting, in which our data set has two observations (the rows) with four target variables each.

julia> targets = reshape(1:8, (2, 4)) ./ 8
2×4 Array{Float64,2}:
 0.125  0.375  0.625  0.875
 0.25   0.5    0.75   1.0

julia> outputs = reshape(1:2:16, (2, 4)) ./ 8
2×4 Array{Float64,2}:
 0.125  0.625  1.125  1.625
 0.375  0.875  1.375  1.875

julia> # WARNING: BAD CODE - ONLY FOR ILLUSTRATION

julia> tmp = sum(value.(L1DistLoss(), targets, outputs), dims=2) # assuming ObsDim.First()
2×1 Array{Float64,2}:
 1.5
 2.0

julia> sum(tmp .* [2, 1]) # weigh 1st observation twice as high
5.0

To manually compute the result for AvgMode.WeightedMean([2,1]) we follow a similar approach, but use the normalized weight vector in the last step.

julia> using Statistics # for access to "mean"

julia> # WARNING: BAD CODE - ONLY FOR ILLUSTRATION

julia> tmp = mean(value.(L1DistLoss(), targets, outputs), dims=2) # ObsDim.First()
2×1 Array{Float64,2}:
 0.375
 0.5

julia> sum(tmp .* [0.6666, 0.3333]) # weigh 1st observation twice as high
0.416625

Note that you can explicitly specify whether the weight vector should be normalized. This option is supported for computing the weighted sum as well as for computing the weighted mean. See the documentation for AvgMode.WeightedSum and AvgMode.WeightedMean for more information.
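
For instance, assuming the option is exposed as a normalize keyword of the respective constructors (as in recent versions of the package), a weighted sum whose weights are rescaled to sum to one reproduces the weighted mean shown above:

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedSum([1,2,1], normalize = true))
3.0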

The code snippets above are of course very inefficient, because they allocate (multiple) temporary arrays. We only included them to demonstrate the desired result. To perform those computations efficiently, we provide special methods for value, deriv, deriv2 and their mutating counterparts.

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedSum([1,2,1]))
12.0

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedMean([1,2,1]))
3.0

julia> value(L1DistLoss(), targets, outputs, AvgMode.WeightedSum([2,1]), ObsDim.First())
5.0

julia> value(L1DistLoss(), targets, outputs, AvgMode.WeightedMean([2,1]), ObsDim.First())
0.4166666666666667

We also provide this functionality for deriv and deriv2 respectively.

julia> deriv(L2DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedSum([1,2,1]))
4.0

julia> deriv(L2DistLoss(), [1.,2,3], [2,5,-2], AvgMode.WeightedMean([1,2,1]))
1.0

julia> deriv(L2DistLoss(), targets, outputs, AvgMode.WeightedSum([2,1]), ObsDim.First())
10.0

julia> deriv(L2DistLoss(), targets, outputs, AvgMode.WeightedMean([2,1]), ObsDim.First())
0.8333333333333334