Altering existing Losses

There are situations in which one wants to work with slightly altered versions of specific loss functions. This package provides two generic ways to create such meta losses for specific families of loss functions.

  1. Scaling a supervised loss by a constant real number. This is done at compile time and can in some situations even lead to simpler code (e.g. in the case of the derivative of an L2DistLoss).

  2. Weighting the classes of a margin-based loss differently in order to better deal with unbalanced binary classification problems.

Scaling a Supervised Loss

It is quite common in machine learning courses to define the least squares loss as $\frac{1}{2} (\hat{y} - y)^2$, while this package implements that type of loss as an $L_2$ distance loss using $(\hat{y} - y)^2$, i.e. without the constant scale factor.

For situations in which one wants a scaled version of an existing loss type, we provide the concept of a scaled loss. The only difference is a constant real number that the existing implementation of the loss function (and its derivatives) is multiplied by.
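
To make the decorator idea concrete, here is a minimal standalone sketch (the names MyScaled and scaledvalue are hypothetical and not part of the package; the package's actual mechanism is the scaled function documented below). The key point is that the scale factor lives in the type rather than in a field, so it is a compile-time constant:

using LossFunctions

# hypothetical decorator mirroring the idea behind `scaled`:
# the factor K is a type parameter, i.e. a compile-time constant
struct MyScaled{L<:SupervisedLoss,K}
    loss::L
end
MyScaled(loss::L, k::Real) where {L} = MyScaled{L,k}(loss)

# K is recovered from the type alone, so the compiler can fold it
# into the surrounding computation
scaledvalue(s::MyScaled{L,K}, target, output) where {L,K} =
    K * value(s.loss, target, output)

scaledvalue(MyScaled(L2DistLoss(), 0.5), 0.0, 4.0)  # == 8.0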

LearnBase.scaled - Function.
scaled(loss::SupervisedLoss, K)

Returns a version of loss that is uniformly scaled by K. This function dispatches on the type of loss in order to choose the appropriate type of scaled loss that will be used as the decorator. For example, if typeof(loss) <: DistanceLoss then the given loss will be boxed into a ScaledDistanceLoss.

Note: If typeof(K) <: Number, then this method will poison the type-inference of the calling scope. This is because K will be promoted to a type parameter. For a type-stable version, use the following signature: scaled(loss, Val(K))

julia> lsloss = 1/2 * L2DistLoss()
ScaledDistanceLoss{LPDistLoss{2},0.5}(LPDistLoss{2}())

julia> value(L2DistLoss(), 0.0, 4.0)
16.0

julia> value(lsloss, 0.0, 4.0)
8.0

While the resulting loss belongs to the same basic family as the original loss (i.e. it is still margin-based or distance-based), it is not a subtype of the original loss type.

julia> typeof(lsloss) <: DistanceLoss
true

julia> typeof(lsloss) <: L2DistLoss
false

As you have probably noticed, the constant scale factor gets promoted to a type parameter. This can cause quite an overhead when done on the fly every time the loss value is computed. To avoid this, one can use Val to specify the scale factor in a type-stable manner.

julia> lsloss = scaled(L2DistLoss(), Val(0.5))
ScaledDistanceLoss{LPDistLoss{2},0.5}(LPDistLoss{2}())

Storing the scale factor as a type parameter instead of a member variable has some nice advantages. For one, it makes it possible to define new types of losses using simple type-aliases.

julia> const LeastSquaresLoss = LossFunctions.ScaledDistanceLoss{L2DistLoss,0.5}
ScaledDistanceLoss{LPDistLoss{2},0.5}

julia> value(LeastSquaresLoss(), 0.0, 4.0)
8.0

Furthermore, it allows the compiler to perform some quite convenient optimizations where possible. For example, the compiler is able to figure out that the derivative simplifies for our newly defined LeastSquaresLoss, because the factors 1/2 and 2 cancel each other out. This is accomplished with the help of @fastmath.

julia> @code_llvm deriv(L2DistLoss(), 0.0, 4.0)
define double @julia_deriv_71652(double, double) #0 {
top:
  %2 = fsub double %1, %0
  %3 = fmul double %2, 2.000000e+00
  ret double %3
}

julia> @code_llvm deriv(LeastSquaresLoss(), 0.0, 4.0)
define double @julia_deriv_71659(double, double) #0 {
top:
  %2 = fsub double %1, %0
  ret double %2
}
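
The simplification is also visible in plain numbers. For the unscaled loss, the derivative at these arguments is 2 * (4 - 0) = 8, whereas for LeastSquaresLoss the factors 1/2 and 2 cancel, leaving just output - target:

julia> deriv(L2DistLoss(), 0.0, 4.0)
8.0

julia> deriv(LeastSquaresLoss(), 0.0, 4.0)
4.0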

Reweighting a Margin Loss

It is not uncommon in classification scenarios to find yourself working with imbalanced data sets, where one class has many more observations than the other. There are different strategies to deal with this kind of problem. The approach that this package provides is to weight the loss for the classes differently. This basically means that we penalize mistakes for one class more than mistakes for the other. More specifically, we scale the loss of the positive class by the weight factor $w$ and the loss of the negative class by $1-w$.

if target > 0
    w * value(loss, target, output)
else
    (1 - w) * value(loss, target, output)
end
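
As a minimal runnable rendering of this rule (the helper name weighted_value is hypothetical and only serves as illustration; the package's actual mechanism is the weightedloss function described below):

using LossFunctions

# hypothetical helper applying the weighting rule from above
function weighted_value(loss::MarginLoss, w::Real, target, output)
    if target > 0
        w * value(loss, target, output)
    else
        (1 - w) * value(loss, target, output)
    end
end

weighted_value(HingeLoss(), 0.8, 1.0, -4.0)  # 0.8 * 5.0 == 4.0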

Rather than providing special functions to compute a class-weighted loss, we expose a generic way to create new weighted versions of already existing unweighted losses. This way, every existing subtype of MarginLoss can be reweighted arbitrarily. Furthermore, it allows every algorithm that expects a binary loss to work with weighted binary losses as well.

LossFunctions.weightedloss - Function.
weightedloss(loss, weight)

Returns a weighted version of loss for which the value of the positive class is changed to be weight times its original, and the value of the negative class is changed to be 1 - weight times its original.

Note: If typeof(weight) <: Number, then this method will poison the type-inference of the calling scope. This is because weight will be promoted to a type parameter. For a type-stable version, use the following signature: weightedloss(loss, Val(weight))

julia> myloss = weightedloss(HingeLoss(), 0.8)
WeightedBinaryLoss{L1HingeLoss,0.8}(L1HingeLoss())

julia> value(myloss, 1.0, -4.0) # positive class
4.0

julia> value(HingeLoss(), 1.0, -4.0)
5.0

julia> value(myloss, -1.0, 4.0) # negative class
0.9999999999999998

julia> value(HingeLoss(), -1.0, 4.0)
5.0

Note that the weighted version of a margin-based loss no longer belongs to the family of margin-based losses itself. In other words, the resulting loss is neither a subtype of MarginLoss nor of the original type of loss.

julia> typeof(myloss) <: MarginLoss
false

julia> typeof(myloss) <: HingeLoss
false

Similar to scaled losses, the constant weight factor gets promoted to a type parameter. This can cause quite an overhead when done on the fly every time the loss value is computed. To avoid this, one can use Val to specify the weight factor in a type-stable manner.

julia> myloss = weightedloss(HingeLoss(), Val(0.8))
WeightedBinaryLoss{L1HingeLoss,0.8}(L1HingeLoss())

Storing the weight factor as a type parameter instead of a member variable has a nice advantage: it makes it possible to define new types of losses using simple type-aliases.

julia> const MyWeightedHingeLoss = LossFunctions.WeightedBinaryLoss{HingeLoss,0.8}
WeightedBinaryLoss{L1HingeLoss,0.8}

julia> value(MyWeightedHingeLoss(), 1.0, -4.0)
4.0
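
Since the alias behaves like any other loss, it can be applied observation-wise over a batch. The data below is hypothetical, and the results follow directly from the hinge-loss definition (this assumes the MyWeightedHingeLoss alias from above is in scope):

using LossFunctions

targets = [1.0, -1.0, 1.0]   # hypothetical labels
outputs = [0.5, 0.3, -2.0]   # hypothetical predictions

# per-observation weighted hinge loss: positives scaled by 0.8,
# negatives by 1 - 0.8 = 0.2
losses = [value(MyWeightedHingeLoss(), t, o) for (t, o) in zip(targets, outputs)]
# ≈ [0.4, 0.26, 2.4]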