Working with Losses

Even though they are called loss "functions", this package implements them as immutable types instead of true Julia functions. There are good reasons for that. For example, it allows us to specify the properties of a loss function explicitly (e.g. isconvex(myloss)). It also makes for a more consistent API when it comes to computing the value or the derivative. Some loss functions even have additional parameters that need to be specified, such as the $\epsilon$ in the case of the $\epsilon$-insensitive loss. Here, types allow member variables to hide that information away from the method signatures.

In order to avoid potential confusion with true Julia functions, we will refer to "loss functions" as "losses" instead. For the most part, the available losses share a common interface. This section provides an overview of the basic functionality that is available for all the different types of losses. We will discuss how to create a loss, how to compute its value and derivative, and how to query its properties.

Instantiating a Loss

Losses are immutable types. As such, one has to instantiate one in order to work with it. For most losses, the constructors do not expect any parameters.

julia> L2DistLoss()
LPDistLoss{2}()

julia> HingeLoss()
L1HingeLoss()

We just said that we need to instantiate a loss in order to work with it. One could be inclined to believe that it would be more memory-efficient to "pre-allocate" a loss when using it in more than one place.

julia> loss = L2DistLoss()
LPDistLoss{2}()

julia> value(loss, 2, 3)
1

However, that is a common oversimplification. Because all losses are immutable types, they can live on the stack and thus do not come with a heap-allocation overhead.

Even more interesting in the example above is that for losses such as L2DistLoss, which have neither constructor parameters nor member variables, no additional code is executed at all. Such singleton types are used purely for dispatch and do not even show up in the generated code, which you can observe for yourself below. As such, they are zero-cost abstractions.

julia> v1(loss,t,y) = value(loss,t,y)

julia> v2(t,y) = value(L2DistLoss(),t,y)

julia> @code_llvm v1(loss, 2, 3)
define i64 @julia_v1_70944(i64, i64) #0 {
top:
  %2 = sub i64 %1, %0
  %3 = mul i64 %2, %2
  ret i64 %3
}

julia> @code_llvm v2(2, 3)
define i64 @julia_v2_70949(i64, i64) #0 {
top:
  %2 = sub i64 %1, %0
  %3 = mul i64 %2, %2
  ret i64 %3
}
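The zero-cost nature of such parameter-free losses can also be observed directly: a singleton immutable type carries no data at all. The type SketchLoss below is a hypothetical stand-in for illustration, not part of the package.

```julia
# A hypothetical parameter-free loss type, mirroring the structure of L2DistLoss.
struct SketchLoss end

# Singleton immutable types are plain bits-types of size zero:
isbitstype(SketchLoss)  # true
sizeof(SketchLoss)      # 0 bytes; the value exists purely for dispatch
```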

On the other hand, some loss types actually represent whole families of losses rather than a single one. For example, the immutable type L1EpsilonInsLoss has a free parameter $\epsilon$. Each concrete $\epsilon$ results in a different concrete loss of the same family of $\epsilon$-insensitive losses.

julia> L1EpsilonInsLoss(0.5)
L1EpsilonInsLoss{Float64}(0.5)

julia> L1EpsilonInsLoss(1)
L1EpsilonInsLoss{Float64}(1.0)

For losses that do have parameters, pre-instantiating the loss can make a slight difference. While parametrized losses still live on the stack, the constructor usually performs some assertions and conversions on the given parameter, which can come with a slight overhead. At the very least, it will not produce exactly the same code as the pre-instantiated version. Still, the fact that they are immutable makes them very efficient abstractions with little to no performance overhead, and zero memory allocations on the heap.
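To make the role of the parameter concrete, here is a hand-rolled sketch of the $\epsilon$-insensitive loss, $L(y, \hat{y}) = \max(0, |\hat{y} - y| - \epsilon)$. The type EpsInsSketch is illustrative only; the package's own L1EpsilonInsLoss implements this formula with additional checks and conversions.

```julia
# Hypothetical sketch of an ε-insensitive loss; not the package implementation.
struct EpsInsSketch{T<:Real}
    ε::T
end

# Errors inside the ε-tube cost nothing; outside it, the cost grows linearly.
lossvalue(l::EpsInsSketch, y, ŷ) = max(zero(ŷ - y), abs(ŷ - y) - l.ε)

loss = EpsInsSketch(0.5)
lossvalue(loss, 1.0, 1.3)  # inside the tube → 0.0
lossvalue(loss, 1.0, 2.0)  # |2.0 - 1.0| - 0.5 → 0.5
```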

Computing the Values

The first thing we may want to do is compute the loss for a single observation. In fact, all losses are implemented for single observations under the hood. The core function for computing the value of a loss is value. We will see throughout the documentation that this function supports many different method signatures to accomplish a variety of tasks.

LearnBase.valueMethod.
value(loss, target::Number, output::Number) -> Number

Compute the (non-negative) numeric result for the loss-function denoted by the parameter loss and return it. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

\[L : Y \times \mathbb{R} \rightarrow [0,\infty)\]

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we want to compute the value with.

  • target::Number: The ground truth $y \in Y$ of the observation.

  • output::Number: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.

Examples

#               loss        y    ŷ
julia> value(L1DistLoss(), 1.0, 2.0)
1.0

julia> value(L1DistLoss(), 1, 2)
1

julia> value(L1HingeLoss(), -1, 2)
3

julia> value(L1HingeLoss(), -1f0, 2f0)
3.0f0
source

It may be interesting to note that this function also supports broadcasting and all the syntax benefits that come with it. Thus, it is quite simple to make use of preallocated memory for storing the element-wise results.

julia> value.(L1DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
 1
 3
 5

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
 1.0
 3.0
 5.0

Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.

julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
 2.0
 3.0
 2.5
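The same fusion machinery also lets you aggregate without materializing the element-wise results at all, for example with a generator. The following sketch writes the L1 distance loss out by hand so that it runs standalone; with the package loaded you would call value(L1DistLoss(), y, ŷ) instead.

```julia
# Hand-written L1 distance loss, standing in for value(L1DistLoss(), y, ŷ)
l1(y, ŷ) = abs(ŷ - y)

targets = [1.0, 2.0, 3.0]
outputs = [2.0, 5.0, -2.0]
weights = [2.0, 1.0, 0.5]

# Weighted total loss without allocating a temporary array:
total = sum(l1(y, ŷ) * w for (y, ŷ, w) in zip(targets, outputs, weights))
# 1.0*2.0 + 3.0*1.0 + 5.0*0.5 = 7.5
```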

Even though broadcasting is supported, we also expose a vectorized method natively. This is done mainly for reasons of API consistency. Internally it uses broadcast itself, but it provides the additional benefit of more reliable type inference.

LearnBase.valueMethod.
value(loss, targets::AbstractArray, outputs::AbstractArray)

Compute the result of the loss function for each index-pair in targets and outputs individually and return the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> value(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
  1.0
  9.0
 25.0
source

We also provide a mutating version for the same reasons. It even utilizes broadcast! underneath.

LearnBase.value!Method.
value!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer

Compute the result of the loss function for each index-pair in targets and outputs individually, and store them in the preallocated buffer. Note that buffer has to be of the appropriate size.

In the case that the two parameters, targets and outputs, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> buffer = zeros(3); # preallocate a buffer

julia> value!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
  1.0
  9.0
 25.0
source

Computing the 1st Derivatives

Perhaps the more interesting aspect of loss functions is their derivatives. In fact, most of the popular learning algorithms in supervised learning, such as gradient descent, utilize the derivatives of the loss in one way or another during the training process.

To compute the derivative of some loss we expose the function deriv. It supports the exact same method signatures as value. It is worth noting explicitly that we always compute the derivative with respect to the predicted output, since we are interested in deducing in which direction the output should change.
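As a toy illustration of why the derivative with respect to the output is the useful quantity, the following sketch drives a single prediction toward its target by repeatedly stepping against the hand-written derivative of the squared distance loss. The names here are illustrative, not package API.

```julia
# d/dŷ (ŷ - y)^2 = 2(ŷ - y): positive when we overshoot, negative when we undershoot
deriv_l2(y, ŷ) = 2 * (ŷ - y)

# Gradient descent on a single observation: step opposite the derivative's sign.
function fit_one(y, ŷ; η = 0.1, steps = 100)
    for _ in 1:steps
        ŷ -= η * deriv_l2(y, ŷ)
    end
    return ŷ
end

fit_one(1.0, 5.0)  # converges toward the target 1.0
```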

LearnBase.derivMethod.
deriv(loss, target::Number, output::Number) -> Number

Compute the derivative for the loss-function (denoted by the parameter loss) with respect to the output. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we want to compute the derivative with.

  • target::Number: The ground truth $y \in Y$ of the observation.

  • output::Number: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.

Examples

#               loss        y    ŷ
julia> deriv(L2DistLoss(), 1.0, 2.0)
2.0

julia> deriv(L2DistLoss(), 1, 2)
2

julia> deriv(L2HingeLoss(), -1, 2)
6

julia> deriv(L2HingeLoss(), -1f0, 2f0)
6.0f0
source

Similar to value, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.

julia> deriv.(L2DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
   2
   6
 -10

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0

Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.

julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
  4.0
  6.0
 -5.0

While broadcasting is supported, we also expose a vectorized method natively. This is done mainly for reasons of API consistency. Internally it uses broadcast itself, but it provides the additional benefit of more reliable type inference.

LearnBase.derivMethod.
deriv(loss, targets::AbstractArray, outputs::AbstractArray)

Compute the derivative of the loss function with respect to the output for each index-pair in targets and outputs individually and return the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> deriv(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0
source

We also provide a mutating version for the same reasons. It even utilizes broadcast! underneath.

LearnBase.deriv!Method.
deriv!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer

Compute the derivative of the loss function with respect to the output for each index-pair in targets and outputs individually, and store them in the preallocated buffer. Note that buffer has to be of the appropriate size.

In the case that the two parameters, targets and outputs, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> buffer = zeros(3); # preallocate a buffer

julia> deriv!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0
source

It is also possible to compute the value and derivative at the same time. For some losses this means less computational overhead.

value_deriv(loss, target::Number, output::Number) -> Tuple

Return the results of value and deriv as a tuple, in which the first element is the value and the second element the derivative.

In some cases this function can yield better performance, because the losses can make use of shared variables when computing the results. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • target::Number: The ground truth $y \in Y$ of the observation.

  • output::Number: The predicted output $\hat{y} \in \mathbb{R}$

Examples

#                     loss         y    ŷ
julia> value_deriv(L2DistLoss(), -1.0, 3.0)
(16.0, 8.0)
source
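The potential saving comes from shared intermediate results. For the squared distance loss, for instance, the residual can be computed once and reused, as this hand-written sketch (not the package implementation) shows:

```julia
# Fused value and derivative of the squared distance loss:
# the residual r = ŷ - y is computed once and reused for both results.
function value_deriv_l2(y, ŷ)
    r = ŷ - y
    return (r^2, 2r)
end

value_deriv_l2(-1.0, 3.0)  # (16.0, 8.0)
```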

Computing the 2nd Derivatives

In addition to the first derivative, we also provide the corresponding methods for the second derivative through the function deriv2. Note again that we always compute the derivative with respect to the predicted output.

LearnBase.deriv2Method.
deriv2(loss, target::Number, output::Number) -> Number

Compute the second derivative for the loss-function (denoted by the parameter loss) with respect to the output. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we want to compute the second derivative with.

  • target::Number: The ground truth $y \in Y$ of the observation.

  • output::Number: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.

Examples

#               loss             y    ŷ
julia> deriv2(LogitDistLoss(), -0.5, 0.3)
0.42781939304058886

julia> deriv2(LogitMarginLoss(), -1f0, 2f0)
0.104993574f0
source

Just like deriv and value, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.

julia> deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
 0.42781939304058886
 0.3747397590950412
 0.013296113341580313

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
 0.42781939304058886
 0.3747397590950412
 0.013296113341580313
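When implementing a custom loss, a central finite difference is a handy sanity check for second derivatives. The following standalone sketch checks the hand-written squared distance loss, whose second derivative is constantly 2; the helper name is illustrative, not package API.

```julia
# Central-difference approximation of a second derivative:
# f''(x) ≈ (f(x+h) - 2f(x) + f(x-h)) / h^2
deriv2_numeric(f, x; h = 1e-4) = (f(x + h) - 2 * f(x) + f(x - h)) / h^2

l2(y, ŷ) = (ŷ - y)^2

deriv2_numeric(ŷ -> l2(1.0, ŷ), 2.0)  # ≈ 2.0
```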

Furthermore deriv2 supports all the same method signatures as deriv does.

LearnBase.deriv2Method.
deriv2(loss, targets::AbstractArray, outputs::AbstractArray)

Compute the second derivative of the loss function with respect to the output for each index-pair in targets and outputs individually and return the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> deriv2(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
 2.0
 2.0
 2.0
source
deriv2!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer

Compute the second derivative of the loss function with respect to the output for each index-pair in targets and outputs individually, and store them in the preallocated buffer. Note that buffer has to be of the appropriate size.

In the case that the two parameters, targets and outputs, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn't, you likely found a bug.

Arguments

  • buffer::AbstractArray: Array to store the computed values in. Old values will be overwritten and lost.

  • loss::SupervisedLoss: The loss-function $L$ we are working with.

  • targets::AbstractArray: The array of ground truths $\mathbf{y}$.

  • outputs::AbstractArray: The array of predicted outputs $\mathbf{\hat{y}}$.

Examples

julia> buffer = zeros(3); # preallocate a buffer

julia> deriv2!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
 2.0
 2.0
 2.0
source

Function Closures

In some circumstances it may be convenient to have the loss function or its derivative as a proper Julia function. Instead of exporting special function names for every implemented loss (like l2distloss(...)), we provide the ability to generate a true function on the fly for any given loss.
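The idea behind these generators can be sketched in a few lines: wrap the loss in a closure so that it can be passed around as an ordinary function. Everything below is a hypothetical stand-in, not the package's implementation.

```julia
# A hypothetical loss type and value function, mirroring the shape of the API
struct SketchL2Dist end
lossvalue(::SketchL2Dist, y, ŷ) = abs2(ŷ - y)

# What a function generator boils down to: capture the loss in a closure
value_fun_sketch(loss) = (y, ŷ) -> lossvalue(loss, y, ŷ)

f = value_fun_sketch(SketchL2Dist())
f(-1.0, 3.0)                # 16.0
f.([1.0, 2.0], [4.0, 7.0])  # broadcasts like any other Julia function
```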

value_fun(loss::SupervisedLoss) -> Function

Returns a new function that computes the value for the given loss. This new function will support all the signatures that value does.

julia> f = value_fun(L2DistLoss());

julia> f(-1.0, 3.0) # computes the value of L2DistLoss
16.0

julia> f.([1.,2], [4,7])
2-element Array{Float64,1}:
  9.0
 25.0
source
deriv_fun(loss::SupervisedLoss) -> Function

Returns a new function that computes the deriv for the given loss. This new function will support all the signatures that deriv does.

julia> g = deriv_fun(L2DistLoss());

julia> g(-1.0, 3.0) # computes the deriv of L2DistLoss
8.0

julia> g.([1.,2], [4,7])
2-element Array{Float64,1}:
  6.0
 10.0
source
deriv2_fun(loss::SupervisedLoss) -> Function

Returns a new function that computes the deriv2 (i.e. second derivative) for the given loss. This new function will support all the signatures that deriv2 does.

julia> g2 = deriv2_fun(L2DistLoss());

julia> g2(-1.0, 3.0) # computes the second derivative of L2DistLoss
2.0

julia> g2.([1.,2], [4,7])
2-element Array{Float64,1}:
 2.0
 2.0
source
value_deriv_fun(loss::SupervisedLoss) -> Function

Returns a new function that computes the value_deriv for the given loss. This new function will support all the signatures that value_deriv does.

julia> fg = value_deriv_fun(L2DistLoss());

julia> fg(-1.0, 3.0) # computes the value and derivative of L2DistLoss
(16.0, 8.0)
source

Properties of a Loss

In some situations it can be quite useful to assert certain properties about a loss function. One such scenario could be when implementing an algorithm that requires the loss to be strictly convex or Lipschitz continuous. Note that we will only skim over the definitions in most cases. A good treatment of all the concepts involved can be found in either [BOYD2004] or [STEINWART2008].
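A typical use is a guard clause at the start of an algorithm. The sketch below defines a hypothetical trait and loss types to stay self-contained; with the package loaded you would query the documented property functions, such as isconvex, directly.

```julia
# Hypothetical stand-ins for losses and the isconvex property query
struct ConvexSketch end
struct NonconvexSketch end
isconvex_sketch(::ConvexSketch) = true
isconvex_sketch(::NonconvexSketch) = false

# Guard an algorithm that only works for convex losses
function require_convex(loss)
    isconvex_sketch(loss) || throw(ArgumentError("this solver requires a convex loss"))
    return loss
end

require_convex(ConvexSketch())    # passes
# require_convex(NonconvexSketch()) would throw an ArgumentError
```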

[BOYD2004]

Stephen Boyd and Lieven Vandenberghe. "Convex Optimization". Cambridge University Press, 2004.

[STEINWART2008]

Steinwart, Ingo, and Andreas Christmann. "Support vector machines". Springer Science & Business Media, 2008.

This package uses functions to represent individual properties of a loss. What follows is a list of the implemented property functions defined in LearnBase.jl.

LearnBase.isconvexFunction.
isconvex(loss::SupervisedLoss) -> Bool

Return true if the given loss denotes a convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex if its domain is a convex set and if for all $x, y$ in that domain and all $\theta$ with $0 \leq \theta \leq 1$, we have

\[f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)\]

Examples

julia> isconvex(LPDistLoss(0.5))
false

julia> isconvex(ZeroOneLoss())
false

julia> isconvex(L1DistLoss())
true

julia> isconvex(L2DistLoss())
true
source
isstrictlyconvex(loss::SupervisedLoss) -> Bool

Return true if the given loss denotes a strictly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is strictly convex if its domain is a convex set and if for all $x, y$ in that domain where $x \neq y$, and all $\theta$ with $0 < \theta < 1$, we have

\[f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)\]

Examples

julia> isstrictlyconvex(L1DistLoss())
false

julia> isstrictlyconvex(LogitDistLoss())
true

julia> isstrictlyconvex(L2DistLoss())
true
source
isstronglyconvex(loss::SupervisedLoss) -> Bool

Return true if the given loss denotes a strongly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is $m$-strongly convex if its domain is a convex set and if for all $x, y \in$ dom $f$ where $x \neq y$, and all $\theta$ with $0 \le \theta \le 1$, we have

\[f(\theta x + (1 - \theta)y) < \theta f(x) + (1 - \theta) f(y) - 0.5 m \cdot \theta (1 - \theta) {\| x - y \|}_2^2\]

In a more familiar setting, if the loss function is differentiable we have

\[\left( \nabla f(x) - \nabla f(y) \right)^\top (x - y) \ge m {\| x - y\|}_2^2\]

Examples

julia> isstronglyconvex(L1DistLoss())
false

julia> isstronglyconvex(LogitDistLoss())
false

julia> isstronglyconvex(L2DistLoss())
true
source
isdifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool

Return true if the given loss is differentiable (optionally limited to the given point x if specified).

A function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ is differentiable at a point $x \in$ int dom $f$, if there exists a matrix $Df(x) \in \mathbb{R}^{m \times n}$ such that it satisfies:

\[\lim_{z \neq x, z \to x} \frac{{\|f(z) - f(x) - Df(x)(z-x)\|}_2}{{\|z - x\|}_2} = 0\]

A function is differentiable if its domain is open and it is differentiable at every point $x$.

Examples

julia> isdifferentiable(L1DistLoss())
false

julia> isdifferentiable(L1DistLoss(), 1)
true

julia> isdifferentiable(L2DistLoss())
true
source
istwicedifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool

Return true if the given loss is twice differentiable (optionally limited to the given point x if specified).

A function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ is said to be twice differentiable at a point $x \in$ int dom $f$, if the derivative of $\nabla f$ exists at $x$.

\[\nabla^2 f(x) = D \nabla f(x)\]

A function is twice differentiable if its domain is open and it is twice differentiable at every point $x$.

Examples

julia> istwicedifferentiable(L1DistLoss())
false

julia> istwicedifferentiable(L1DistLoss(), 1)
true

julia> istwicedifferentiable(L2DistLoss())
true
source
islocallylipschitzcont(loss::SupervisedLoss) -> Bool

Return true if the given loss function is locally Lipschitz continuous.

A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is called locally Lipschitz continuous if for all $a \ge 0$ there exists a constant $c_a \ge 0$, such that

\[\sup_{y \in Y} \left| L(y,t) - L(y,t') \right| \le c_a |t - t'|, \qquad t,t' \in [-a,a]\]

Every convex function is locally Lipschitz continuous.

Examples

julia> islocallylipschitzcont(ExpLoss())
true

julia> islocallylipschitzcont(SigmoidLoss())
true
source
islipschitzcont(loss::SupervisedLoss) -> Bool

Return true if the given loss function is Lipschitz continuous.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is Lipschitz continuous if there exists a finite constant $M < \infty$ such that

\[|L(y, t) - L(y, t')| \le M |t - t'|, \qquad \forall y \in Y, \; t, t' \in \mathbb{R}\]

Examples

julia> islipschitzcont(SigmoidLoss())
true

julia> islipschitzcont(ExpLoss())
false
source
LearnBase.isnemitskiFunction.
isnemitski(loss::SupervisedLoss) -> Bool

Return true if the given loss denotes a Nemitski loss function.

We call a supervised loss function $L : Y \times \mathbb{R} \rightarrow [0,\infty)$ a Nemitski loss if there exist a measurable function $b : Y \rightarrow [0, \infty)$ and an increasing function $h : [0, \infty) \rightarrow [0, \infty)$ such that

\[L(y,\hat{y}) \le b(y) + h(|\hat{y}|), \qquad (y, \hat{y}) \in Y \times \mathbb{R}.\]

If a loss is locally Lipschitz continuous, then it is a Nemitski loss.

source
LearnBase.isclipableFunction.
isclipable(loss::SupervisedLoss) -> Bool

Return true if the given loss function is clipable. A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ can be clipped at $M > 0$ if, for all $(y,t) \in Y \times \mathbb{R}$,

\[L(y, \hat{t}) \le L(y, t)\]

where $\hat{t}$ denotes the clipped value of $t$ at $\pm M$. That is

\[\hat{t} = \begin{cases} -M & \quad \text{if } t < -M \\ t & \quad \text{if } t \in [-M, M] \\ M & \quad \text{if } t > M \end{cases}\]

Examples

julia> isclipable(ExpLoss())
false

julia> isclipable(L2DistLoss())
true
source
ismarginbased(loss::SupervisedLoss) -> Bool

Return true if the given loss is a margin-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be margin-based, if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying

\[L(y, \hat{y}) = \psi (y \cdot \hat{y}), \qquad (y, \hat{y}) \in Y \times \mathbb{R}\]

Examples

julia> ismarginbased(HuberLoss(2))
false

julia> ismarginbased(L2MarginLoss())
true
source
isclasscalibrated(loss::SupervisedLoss) -> Bool
source
isdistancebased(loss::SupervisedLoss) -> Bool

Return true if the given loss is a distance-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be distance-based, if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying $\psi (0) = 0$ and

\[L(y, \hat{y}) = \psi (\hat{y} - y), \qquad (y, \hat{y}) \in Y \times \mathbb{R}\]

Examples

julia> isdistancebased(HuberLoss(2))
true

julia> isdistancebased(L2MarginLoss())
false
source
issymmetric(loss::SupervisedLoss) -> Bool

Return true if the given loss is a symmetric loss.

A function $f : \mathbb{R} \rightarrow [0,\infty)$ is said to be symmetric about the origin if we have

\[f(x) = f(-x), \qquad \forall x \in \mathbb{R}\]

A distance-based loss is said to be symmetric if its representing function is symmetric.

Examples

julia> issymmetric(QuantileLoss(0.2))
false

julia> issymmetric(LPDistLoss(2))
true
source