Working with Losses
Even though they are called loss "functions", this package implements them as immutable types instead of true Julia functions. There are good reasons for that. For example, it allows us to query the properties of a loss explicitly (e.g. `isconvex(myloss)`). It also makes for a more consistent API when it comes to computing the value or the derivative. Some losses even have additional parameters that need to be specified, such as the $\epsilon$ in the case of the $\epsilon$-insensitive loss. Here, types allow member variables to hide that information away from the method signatures.
In order to avoid potential confusion with true Julia functions, we will refer to "loss functions" as "losses" instead. The available losses share a common interface for the most part. This section provides an overview of the basic functionality that is available for all the different types of losses. We will discuss how to create a loss, how to compute its value and derivative, and how to query its properties.
Instantiating a Loss
Losses are immutable types. As such, one has to instantiate one in order to work with it. For most losses, the constructors do not expect any parameters.
julia> L2DistLoss()
LPDistLoss{2}()
julia> HingeLoss()
L1HingeLoss()
We just said that we need to instantiate a loss in order to work with it. One could be inclined to believe that it would be more memory-efficient to "pre-allocate" a loss when using it in more than one place.
julia> loss = L2DistLoss()
LPDistLoss{2}()
julia> value(loss, 2, 3)
1
However, that is a common misconception. Because all losses are immutable types, they can live on the stack and thus do not come with a heap-allocation overhead.

Even more interesting, for losses such as `L2DistLoss`, which have no constructor parameters or member variables, there is no additional code executed at all. Such singletons are only used for dispatch and don't even produce any additional code, which you can observe for yourself in the code below. As such they are zero-cost abstractions.
julia> v1(loss,t,y) = value(loss,t,y)
julia> v2(t,y) = value(L2DistLoss(),t,y)
julia> @code_llvm v1(loss, 2, 3)
define i64 @julia_v1_70944(i64, i64) #0 {
top:
%2 = sub i64 %1, %0
%3 = mul i64 %2, %2
ret i64 %3
}
julia> @code_llvm v2(2, 3)
define i64 @julia_v2_70949(i64, i64) #0 {
top:
%2 = sub i64 %1, %0
%3 = mul i64 %2, %2
ret i64 %3
}
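The singleton design itself can be sketched in a few lines of plain Julia. The types and functions below are illustrative stand-ins of our own (not part of the package): a field-less immutable struct carries no data, so dispatching on it is free.

```julia
# Toy stand-ins illustrating the singleton design described above; the
# type and function names are ours, not part of the package.
struct MyL2DistLoss end            # singleton: no fields, lives on the stack
struct MyLPDistLoss{P} end         # a whole family, indexed by a type parameter

myvalue(::MyL2DistLoss, target, output) = abs2(output - target)
myvalue(::MyLPDistLoss{P}, target, output) where {P} = abs(output - target)^P

loss = MyL2DistLoss()              # "instantiation" of a singleton is free
println(myvalue(loss, 2, 3))               # 1
println(myvalue(MyLPDistLoss{3}(), 2, 4))  # 8
println(sizeof(loss))              # 0 -- the instance stores nothing
```

Because the instance carries no runtime data, all the information the compiler needs is in the type itself, which is why the two `@code_llvm` listings above are identical.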
On the other hand, some loss types are more comparable to whole families of losses than to a single one. For example, the immutable type `L1EpsilonInsLoss` has a free parameter $\epsilon$. Each concrete $\epsilon$ results in a different concrete loss of the same family of epsilon-insensitive losses.
julia> L1EpsilonInsLoss(0.5)
L1EpsilonInsLoss{Float64}(0.5)
julia> L1EpsilonInsLoss(1)
L1EpsilonInsLoss{Float64}(1.0)
For such losses that do have parameters, it can make a slight difference to pre-instantiate a loss. While they will still live on the stack, the constructor usually performs some assertions and conversions for the given parameter, which can come at a slight overhead. At the very least it will not produce the exact same code when not pre-instantiated. Still, the fact that they are immutable makes them very efficient abstractions with little to no performance overhead, and zero memory allocations on the heap.
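The pattern for a parameterized loss can be sketched as follows. This is a toy stand-in of our own, not the package's implementation: the free parameter becomes a field of the immutable struct, and the constructor validates and converts it once.

```julia
# Toy sketch of a parameterized loss: the free parameter becomes a field of
# the immutable struct, and the constructor validates it once. The names
# here are illustrative stand-ins, not the package's implementation.
struct MyEpsilonInsLoss{T<:AbstractFloat}
    eps::T
    function MyEpsilonInsLoss{T}(eps::T) where {T<:AbstractFloat}
        eps > 0 || throw(ArgumentError("eps must be positive"))
        new{T}(eps)
    end
end
MyEpsilonInsLoss(eps::Number) = MyEpsilonInsLoss{Float64}(Float64(eps))

myvalue(l::MyEpsilonInsLoss, target, output) = max(zero(l.eps), abs(output - target) - l.eps)

loss = MyEpsilonInsLoss(0.5)       # validation and conversion happen once, here
println(myvalue(loss, 1.0, 2.0))   # 0.5
println(myvalue(loss, 1.0, 1.2))   # 0.0
```

Pre-instantiating `loss` means the constructor's assertion runs only once, rather than on every call site.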
Computing the Values
The first thing we may want to do is compute the loss for a single observation. In fact, all losses are implemented on single observations under the hood. The core function to compute the value of a loss is `value`. We will see throughout the documentation that this function allows for a lot of different method signatures to accomplish a variety of tasks.
LearnBase.value — Method. `value(loss, target::Number, output::Number) -> Number`

Compute the (non-negative) numeric result for the loss function denoted by the parameter `loss` and return it. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the value with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> value(L1DistLoss(), 1.0, 2.0)
1.0
julia> value(L1DistLoss(), 1, 2)
1
julia> value(L1HingeLoss(), -1, 2)
3
julia> value(L1HingeLoss(), -1f0, 2f0)
3.0f0
It may be interesting to note that this function also supports broadcasting and all the syntax benefits that come with it. Thus, it is quite simple to make use of preallocated memory for storing the element-wise results.
julia> value.(L1DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
1
3
5
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
1.0
3.0
5.0
Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.
julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
2.0
3.0
2.5
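The fused element-wise losses can also be reduced to a single aggregate, such as a weighted mean. The sketch below is self-contained plain Julia; `myvalue` is a stand-in for the package's `value` with the L1 distance written out by hand.

```julia
# Reducing the fused element-wise losses to one weighted mean. `myvalue`
# is a plain-Julia stand-in (the L1 distance), not the package's `value`.
myvalue(target, output) = abs(output - target)

targets = [1.0, 2.0, 3.0]
outputs = [2.0, 5.0, -2.0]
weights = [2.0, 1.0, 0.5]

wmean = sum(myvalue.(targets, outputs) .* weights) / sum(weights)
println(wmean)   # (2.0 + 3.0 + 2.5) / 3.5
```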
Even though broadcasting is supported, we also expose a vectorized method natively. This is done mainly for API consistency reasons. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.
LearnBase.value — Method. `value(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the result of the loss function for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.
In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> value(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
1.0
9.0
25.0
We also provide a mutating version for the same reasons. It even utilizes `broadcast!` underneath.
LearnBase.value! — Method. `value!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the result of the loss function for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> value!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
1.0
9.0
25.0
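The mutating pattern described above can be sketched in a few lines of plain Julia using `broadcast!`. This is a toy stand-in, not the package's actual `value!` implementation.

```julia
# Sketch of the mutating pattern with `broadcast!` (a toy stand-in, not the
# package's actual `value!` implementation).
myvalue(target, output) = abs2(output - target)

function myvalue!(buffer::AbstractArray, targets::AbstractArray, outputs::AbstractArray)
    broadcast!(myvalue, buffer, targets, outputs)  # writes results into buffer
    return buffer
end

buffer = zeros(3)
myvalue!(buffer, [1.0, 2.0, 3.0], [2, 5, -2])
println(buffer)   # [1.0, 9.0, 25.0]
```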
Computing the 1st Derivatives
Perhaps the more interesting aspect of loss functions is their derivatives. In fact, most of the popular learning algorithms in supervised learning, such as gradient descent, utilize the derivatives of the loss in one way or another during the training process.
To compute the derivative of some loss we expose the function `deriv`. It supports the exact same method signatures as `value`. It is worth noting explicitly that we always compute the derivative with respect to the predicted `output`, since we are interested in deducing in which direction the output should change.
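To make the role of the derivative concrete, here is a minimal gradient-descent sketch in plain Julia. `myderiv` is a toy stand-in for the derivative of the squared distance loss; nothing here is part of the package API.

```julia
# Minimal gradient descent for a one-parameter linear model, using the
# derivative of the squared distance loss w.r.t. the predicted output.
# `myderiv` is a toy stand-in, not the package's `deriv`.
myderiv(target, output) = 2 * (output - target)

function fit_slope(x, y; lr = 0.05, steps = 200)
    w = 0.0
    for _ in 1:steps
        # chain rule: d loss / d w = (d loss / d output) * x, averaged over data
        grad = sum(myderiv.(y, w .* x) .* x) / length(x)
        w -= lr * grad
    end
    return w
end

x = [1.0, 2.0, 3.0]
println(fit_slope(x, 2.0 .* x))   # converges toward the true slope 2.0
```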
LearnBase.deriv — Method. `deriv(loss, target::Number, output::Number) -> Number`

Compute the derivative for the loss function (denoted by the parameter `loss`) with respect to the `output`. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the derivative with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> deriv(L2DistLoss(), 1.0, 2.0)
2.0
julia> deriv(L2DistLoss(), 1, 2)
2
julia> deriv(L2HingeLoss(), -1, 2)
6
julia> deriv(L2HingeLoss(), -1f0, 2f0)
6.0f0
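As a plain-Julia sanity check of this convention, one can compare against a central finite difference. The squared-distance formula below is written out by hand; nothing here uses the package itself.

```julia
# Sanity-checking the derivative convention with a central finite difference.
# `f` is the squared-distance formula written out by hand; the derivative is
# taken with respect to the predicted output, as described above.
f(target, output) = abs2(output - target)
fd(target, output; h = 1e-6) = (f(target, output + h) - f(target, output - h)) / (2h)

println(fd(1.0, 2.0))   # close to 2.0, the analytic derivative 2 * (output - target)
```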
Similar to `value`, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.
julia> deriv.(L2DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
2
6
-10
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.
julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
4.0
6.0
-5.0
While broadcasting is supported, we also expose a vectorized method natively. This is done mainly for API consistency reasons. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.
LearnBase.deriv — Method. `deriv(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> deriv(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
We also provide a mutating version for the same reasons. It even utilizes `broadcast!` underneath.
LearnBase.deriv! — Method. `deriv!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> deriv!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
It is also possible to compute the value and derivative at the same time. For some losses that means less computation overhead.
LearnBase.value_deriv — Method. `value_deriv(loss, target::Number, output::Number) -> Tuple`

Return the results of `value` and `deriv` as a tuple, in which the first element is the value and the second element the derivative.

In some cases this function can yield better performance, because the losses can make use of shared variables when computing the results. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> value_deriv(L2DistLoss(), -1.0, 3.0)
(16.0, 8.0)
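Why a fused computation can save work is easy to see for the squared distance: both results share the residual. The sketch below is a toy illustration of the idea, not the package's code.

```julia
# Why a fused value/derivative can be cheaper: both results of the squared
# distance share the residual. A toy sketch, not the package's implementation.
function my_value_deriv(target, output)
    r = output - target      # shared intermediate computed once
    return (r * r, 2r)
end

println(my_value_deriv(-1.0, 3.0))   # (16.0, 8.0)
```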
Computing the 2nd Derivatives
In addition to the first derivative, we also provide the corresponding methods for the second derivative through the function `deriv2`. Note again that we always compute the derivative with respect to the predicted `output`.
LearnBase.deriv2 — Method. `deriv2(loss, target::Number, output::Number) -> Number`

Compute the second derivative for the loss function (denoted by the parameter `loss`) with respect to the `output`. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the second derivative with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> deriv2(LogitDistLoss(), -0.5, 0.3)
0.42781939304058886
julia> deriv2(LogitMarginLoss(), -1f0, 2f0)
0.104993574f0
Just like `deriv` and `value`, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.
julia> deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
0.42781939304058886
0.3747397590950412
0.013296113341580313
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
0.42781939304058886
0.3747397590950412
0.013296113341580313
Furthermore, `deriv2` supports all the same method signatures as `deriv` does.
LearnBase.deriv2 — Method. `deriv2(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the second derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.
In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> deriv2(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
2.0
2.0
LossFunctions.deriv2! — Method. `deriv2!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the second derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> deriv2!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
2.0
2.0
Function Closures
In some circumstances it may be convenient to have the loss function or its derivative as a proper Julia function. Instead of exporting special function names for every implemented loss (like `l2distloss(...)`), we provide the ability to generate a true function on the fly for any given loss.
LossFunctions.value_fun — Method. `value_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `value` for the given `loss`. This new function will support all the signatures that `value` does.
julia> f = value_fun(L2DistLoss());
julia> f(-1.0, 3.0) # computes the value of L2DistLoss
16.0
julia> f.([1.,2], [4,7])
2-element Array{Float64,1}:
9.0
25.0
LossFunctions.deriv_fun — Method. `deriv_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `deriv` for the given `loss`. This new function will support all the signatures that `deriv` does.
julia> g = deriv_fun(L2DistLoss());
julia> g(-1.0, 3.0) # computes the deriv of L2DistLoss
8.0
julia> g.([1.,2], [4,7])
2-element Array{Float64,1}:
6.0
10.0
LossFunctions.deriv2_fun — Method. `deriv2_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `deriv2` (i.e. second derivative) for the given `loss`. This new function will support all the signatures that `deriv2` does.
julia> g2 = deriv2_fun(L2DistLoss());
julia> g2(-1.0, 3.0) # computes the second derivative of L2DistLoss
2.0
julia> g2.([1.,2], [4,7])
2-element Array{Float64,1}:
2.0
2.0
LossFunctions.value_deriv_fun — Method. `value_deriv_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `value_deriv` for the given `loss`. This new function will support all the signatures that `value_deriv` does.
julia> fg = value_deriv_fun(L2DistLoss());
julia> fg(-1.0, 3.0) # computes the value and derivative of L2DistLoss
(16.0, 8.0)
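A closure in the spirit of `value_fun` can be sketched in plain Julia by capturing a loss instance in an anonymous function. `MyL2`, `myvalue`, and `make_value_fun` below are toy stand-ins for the package's types and functions.

```julia
# A closure in the spirit of `value_fun`: capture a loss instance in an
# anonymous function. `MyL2` and `myvalue` are toy stand-ins.
struct MyL2 end
myvalue(::MyL2, target, output) = abs2(output - target)

make_value_fun(loss) = (target, output) -> myvalue(loss, target, output)

f = make_value_fun(MyL2())
println(f(-1.0, 3.0))              # 16.0
println(f.([1.0, 2.0], [4, 7]))    # [9.0, 25.0]
```

Because the closure is just a regular Julia function, it broadcasts and composes like any other.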
Properties of a Loss
In some situations it can be quite useful to assert certain properties about a loss function. One such scenario could be when implementing an algorithm that requires the loss to be strictly convex or Lipschitz continuous. Note that we will only skim over the definitions in most cases. A good treatment of all of the concepts involved can be found in either [BOYD2004] or [STEINWART2008].
[BOYD2004]: Stephen Boyd and Lieven Vandenberghe. "Convex Optimization". Cambridge University Press, 2004.

[STEINWART2008]: Ingo Steinwart and Andreas Christmann. "Support Vector Machines". Springer Science & Business Media, 2008.
This package uses functions to represent individual properties of a loss. What follows is a list of implemented property functions defined in LearnBase.jl.
LearnBase.isconvex — Function. `isconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex if its domain is a convex set and if for all $x, y$ in that domain, with $\theta$ such that $0 \leq \theta \leq 1$, we have

$$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$$
Examples
julia> isconvex(LPDistLoss(0.5))
false
julia> isconvex(ZeroOneLoss())
false
julia> isconvex(L1DistLoss())
true
julia> isconvex(L2DistLoss())
true
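Such property queries make it easy for an algorithm to adapt itself to the loss it is given. The sketch below shows the idea with toy stand-ins of our own (`ToyL2Loss`, `my_isconvex`), not the package's actual traits.

```julia
# Sketch of how such property queries enable algorithm selection. The types
# and the `my_isconvex` predicate are toy stand-ins for the package's traits.
struct ToyL2Loss end
struct ToyZeroOneLoss end

my_isconvex(::ToyL2Loss) = true
my_isconvex(::ToyZeroOneLoss) = false

choose_solver(loss) = my_isconvex(loss) ? "gradient-based" : "derivative-free"
println(choose_solver(ToyL2Loss()))       # gradient-based
println(choose_solver(ToyZeroOneLoss()))  # derivative-free
```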
LearnBase.isstrictlyconvex — Function. `isstrictlyconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a strictly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is strictly convex if its domain is a convex set and if for all $x, y$ in that domain where $x \neq y$, with $\theta$ such that $0 < \theta < 1$, we have

$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$$
Examples
julia> isstrictlyconvex(L1DistLoss())
false
julia> isstrictlyconvex(LogitDistLoss())
true
julia> isstrictlyconvex(L2DistLoss())
true
LearnBase.isstronglyconvex — Function. `isstronglyconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a strongly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is $m$-strongly convex if its domain is a convex set and if $\forall x, y \in$ dom $f$ where $x \neq y$, and $\theta$ such that $0 \le \theta \le 1$, we have

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{m}{2} \theta (1 - \theta) \| x - y \|_2^2$$

In a more familiar setting, if the loss function is differentiable we have

$$\left( \nabla f(x) - \nabla f(y) \right)^\top (x - y) \ge m \| x - y \|_2^2$$
Examples
julia> isstronglyconvex(L1DistLoss())
false
julia> isstronglyconvex(LogitDistLoss())
false
julia> isstronglyconvex(L2DistLoss())
true
LearnBase.isdifferentiable — Function. `isdifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool`

Return `true` if the given `loss` is differentiable (optionally limited to the given point `x` if specified).

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is differentiable at a point $x \in$ int dom $f$, if there exists a matrix $Df(x) \in \mathbb{R}^{m \times n}$ such that it satisfies

$$\lim_{z \neq x, z \to x} \frac{\| f(z) - f(x) - Df(x) (z - x) \|_2}{\| z - x \|_2} = 0$$

A function is differentiable if its domain is open and it is differentiable at every point $x$.
Examples
julia> isdifferentiable(L1DistLoss())
false
julia> isdifferentiable(L1DistLoss(), 1)
true
julia> isdifferentiable(L2DistLoss())
true
LearnBase.istwicedifferentiable — Function. `istwicedifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool`

Return `true` if the given `loss` is twice differentiable (optionally limited to the given point `x` if specified).

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is said to be twice differentiable at a point $x \in$ int dom $f$, if the derivative of $\nabla f$ exists at $x$.

A function is twice differentiable if its domain is open and it is twice differentiable at every point $x$.
Examples
julia> istwicedifferentiable(L1DistLoss())
false
julia> istwicedifferentiable(L1DistLoss(), 1)
true
julia> istwicedifferentiable(L2DistLoss())
true
LearnBase.islocallylipschitzcont — Function. `islocallylipschitzcont(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is locally Lipschitz continuous.

A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is called locally Lipschitz continuous if for all $a \ge 0$ there exists a constant $c_a \ge 0$ such that

$$\sup_{y \in Y} \left| L(y, t) - L(y, t') \right| \le c_a | t - t' |, \qquad t, t' \in [-a, a]$$

Every convex function is locally Lipschitz continuous.
Examples
julia> islocallylipschitzcont(ExpLoss())
true
julia> islocallylipschitzcont(SigmoidLoss())
true
LearnBase.islipschitzcont — Function. `islipschitzcont(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is Lipschitz continuous.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is Lipschitz continuous if there exists a finite constant $M < \infty$ such that

$$| L(y, t) - L(y, t') | \le M | t - t' |, \qquad \forall y \in Y, \; t, t' \in \mathbb{R}$$
Examples
julia> islipschitzcont(SigmoidLoss())
true
julia> islipschitzcont(ExpLoss())
false
LearnBase.isnemitski — Function. `isnemitski(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a Nemitski loss function.

We call a supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ a Nemitski loss if there exist a measurable function $b : Y \rightarrow [0, \infty)$ and an increasing function $h : [0, \infty) \rightarrow [0, \infty)$ such that

$$L(y, t) \le b(y) + h(| t |), \qquad (y, t) \in Y \times \mathbb{R}$$

If a loss is locally Lipschitz continuous, then it is a Nemitski loss.
LearnBase.isclipable — Function. `isclipable(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is clipable. A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ can be clipped at $M > 0$ if, for all $(y, t) \in Y \times \mathbb{R}$,

$$L(y, \hat{t}) \le L(y, t)$$

where $\hat{t}$ denotes the clipped value of $t$ at $\pm M$. That is

$$\hat{t} = \begin{cases} -M & \text{if } t < -M \\ t & \text{if } t \in [-M, M] \\ M & \text{if } t > M \end{cases}$$
Examples
julia> isclipable(ExpLoss())
false
julia> isclipable(L2DistLoss())
true
LearnBase.ismarginbased — Function. `ismarginbased(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` is a margin-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be margin-based if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying

$$L(y, \hat{y}) = \psi(y \cdot \hat{y}), \qquad (y, \hat{y}) \in Y \times \mathbb{R}$$
Examples
julia> ismarginbased(HuberLoss(2))
false
julia> ismarginbased(L2MarginLoss())
true
LearnBase.isclasscalibrated — Function. `isclasscalibrated(loss::SupervisedLoss) -> Bool`
LearnBase.isdistancebased — Function. `isdistancebased(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` is a distance-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be distance-based if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying $\psi(0) = 0$ and

$$L(y, \hat{y}) = \psi(\hat{y} - y), \qquad (y, \hat{y}) \in Y \times \mathbb{R}$$
Examples
julia> isdistancebased(HuberLoss(2))
true
julia> isdistancebased(L2MarginLoss())
false
LinearAlgebra.issymmetric — Function. `issymmetric(loss::SupervisedLoss) -> Bool`

Return `true` if the given loss is a symmetric loss.

A function $f : \mathbb{R} \rightarrow [0, \infty)$ is said to be symmetric about the origin if we have

$$f(x) = f(-x), \qquad \forall x \in \mathbb{R}$$

A distance-based loss is said to be symmetric if its representing function is symmetric.
Examples
julia> issymmetric(QuantileLoss(0.2))
false
julia> issymmetric(LPDistLoss(2))
true
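The distinction can be illustrated numerically with representing functions written out by hand. `psi_l2` mirrors the representing function of `L2DistLoss`, and `psi_pinball` a quantile (pinball) loss with $\tau = 0.2$; both are our own stand-ins, not the package's code.

```julia
# Numeric illustration of (a)symmetry via representing functions written out
# by hand: psi_l2 mirrors L2DistLoss, psi_pinball a tau = 0.2 quantile loss.
psi_l2(r) = abs2(r)
psi_pinball(r; tau = 0.2) = r >= 0 ? tau * r : (tau - 1) * r

println(psi_l2(1.5) == psi_l2(-1.5))            # true: symmetric
println(psi_pinball(1.5) == psi_pinball(-1.5))  # false: asymmetric
```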