Margin-based Losses
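Margin-based loss functions are particularly useful for binary classification. In contrast to the distance-based losses, they do not depend on the difference between the true target and the prediction. Instead, they penalize a prediction according to its agreement with the target, a = y ⋅ ŷ, where the target y is either -1 or 1 and ŷ is the real-valued prediction. The agreement is positive when the prediction has the same sign as the target and negative otherwise.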
This section lists all the subtypes of MarginLoss that are implemented in this package.
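To make the notion of agreement concrete, here is a minimal sketch in plain Julia. It assumes targets are encoded as -1 or 1. The helper name `agreement` is hypothetical and is not part of the package's API.

```julia
# Hypothetical helper for illustration; not a function exported by the package.
# y is a target encoded as -1 or 1, ŷ is a real-valued prediction.
agreement(y, ŷ) = y * ŷ

agreement(1, 2.5)     # 2.5: signs match, confident prediction
agreement(-1, 2.5)    # -2.5: signs differ, confidently wrong
agreement(-1, -0.3)   # 0.3: signs match, low confidence
```

Every loss in this section is a function of this single number.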
ZeroOneLoss
LossFunctions.ZeroOneLoss (Type)
ZeroOneLoss <: MarginLoss
The classical classification loss. It penalizes every misclassified observation with a loss of 1, while every correctly classified observation has a loss of 0. It is neither convex nor continuous and is thus seldom used directly. Instead, one usually works with a classification-calibrated surrogate loss, such as L1HingeLoss.
\[L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{if } a \ge 0\\ \end{cases}\]
[Plots: loss function (left, vertical axis 0 to 1) and derivative (right, vertical axis -1 to 1), both over the agreement y ⋅ ŷ from -2 to 2.]
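As an illustration, averaging the zero-one loss over a data set gives the misclassification rate. The sketch below is plain Julia with made-up data. The name `zero_one` is hypothetical and does not come from the package.

```julia
# Hypothetical stand-alone version of the formula above; not the package API.
zero_one(a) = a < 0 ? 1 : 0

targets = [1, -1, 1, -1]            # made-up targets in {-1, 1}
outputs = [0.8, 0.4, -1.2, -2.0]    # made-up real-valued predictions

losses = zero_one.(targets .* outputs)      # [0, 1, 1, 0]
error_rate = sum(losses) / length(losses)   # 0.5
```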
PerceptronLoss
LossFunctions.PerceptronLoss (Type)
PerceptronLoss <: MarginLoss
The perceptron loss linearly penalizes every prediction where the resulting agreement <= 0. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, -a \}\]
[Plots: loss function L (left, vertical axis 0 to 2) and derivative L' (right, vertical axis -1 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
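A minimal sketch of the formula and its slope in plain Julia follows. The names are hypothetical, and the value returned at the kink a = 0 is one possible subgradient chosen here for illustration.

```julia
# Hypothetical stand-alone definitions; not the package API.
perceptron(a) = max(zero(a), -a)

# One valid subgradient; the value 0 at the kink a == 0 is an assumption.
perceptron_deriv(a) = a < 0 ? -one(a) : zero(a)

perceptron(-2.0)         # 2.0
perceptron(0.5)          # 0.0
perceptron_deriv(-2.0)   # -1.0
perceptron_deriv(0.5)    # 0.0
```

The slope only takes the values -1 and 0, which is why the loss is Lipschitz continuous with constant 1.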
L1HingeLoss
LossFunctions.L1HingeLoss (Type)
L1HingeLoss <: MarginLoss
The hinge loss linearly penalizes every prediction where the resulting agreement < 1. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}\]
[Plots: loss function L (left, vertical axis 0 to 3) and derivative L' (right, vertical axis -1 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
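The following plain-Julia sketch evaluates the formula at three agreements. The name `hinge` is hypothetical. Note that a correctly classified observation with agreement below 1 still incurs a loss.

```julia
# Hypothetical stand-alone definition; not the package API.
hinge(a) = max(zero(a), 1 - a)

hinge(1.5)    # 0.0: correct and outside the margin
hinge(0.5)    # 0.5: correct sign, but inside the margin
hinge(-1.0)   # 2.0: wrong sign
```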
SmoothedL1HingeLoss
LossFunctions.SmoothedL1HingeLoss (Type)
SmoothedL1HingeLoss <: MarginLoss
As the name suggests, this is a smoothed version of the L1 hinge loss, with the parameter γ controlling the width of the quadratic region below a = 1. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \frac{0.5}{\gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise}\\ \end{cases}\]
[Plots for γ=2: loss function L (left, vertical axis 0 to 2) and derivative L' (right, vertical axis -1 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
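The piecewise formula can be sketched in plain Julia as below. The function name and its argument order are hypothetical. The checks show that both branches meet at a = 1 - γ, where each evaluates to γ/2.

```julia
# Hypothetical stand-alone definition; not the package API.
function smoothed_hinge(a, γ)
    if a >= 1 - γ
        return 0.5 / γ * max(0, 1 - a)^2   # quadratic region near the hinge
    else
        return 1 - γ / 2 - a               # linear region far from the hinge
    end
end

smoothed_hinge(1.5, 2)    # 0.0
smoothed_hinge(0.0, 2)    # 0.25
smoothed_hinge(-1.0, 2)   # 1.0, the quadratic branch at the boundary a = 1 - γ
smoothed_hinge(-2.0, 2)   # 2.0, the linear branch
```

For small γ the quadratic region shrinks and the values approach those of the L1 hinge loss; for example `smoothed_hinge(0.0, 0.01)` is 0.995 against a hinge loss of 1.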
ModifiedHuberLoss
LossFunctions.ModifiedHuberLoss (Type)
ModifiedHuberLoss <: MarginLoss
A special (4 times scaled) case of the SmoothedL1HingeLoss with γ=2. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}\]
[Plots: loss function L (left, vertical axis 0 to 5) and derivative L' (right, vertical axis -5 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
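The scaling relation can be checked numerically. The sketch below writes out both formulas from this page in plain Julia under hypothetical names and compares them on a grid of agreements.

```julia
# Hypothetical stand-alone definitions of the two formulas; not the package API.
modified_huber(a) = a >= -1 ? max(0, 1 - a)^2 : -4a
smoothed_hinge(a, γ) = a >= 1 - γ ? 0.5 / γ * max(0, 1 - a)^2 : 1 - γ / 2 - a

grid = -3:0.5:3
# true: the modified Huber loss is 4 times the smoothed hinge loss with γ = 2
same = all(modified_huber(a) ≈ 4 * smoothed_hinge(a, 2) for a in grid)
```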
DWDMarginLoss
LossFunctions.DWDMarginLoss (Type)
DWDMarginLoss <: MarginLoss
The distance weighted discrimination margin loss, with parameter q. It is a differentiable generalization of the L1HingeLoss that is distinct from the SmoothedL1HingeLoss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise}\\ \end{cases}\]
[Plots for q=1: loss function L (left, vertical axis 0 to 2) and derivative L' (right, vertical axis -1 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
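A plain-Julia sketch of the piecewise formula is given below, with a hypothetical function name and argument order. For q = 1 the branches meet at a = 0.5, and unlike the hinge loss the value stays positive for large agreements.

```julia
# Hypothetical stand-alone definition; not the package API.
function dwd(a, q)
    if a <= q / (q + 1)
        return 1 - a                              # linear branch
    else
        return (q^q / (q + 1)^(q + 1)) / a^q      # decaying branch
    end
end

dwd(-1.0, 1)   # 2.0
dwd(0.5, 1)    # 0.5, the boundary a = q / (q + 1)
dwd(2.0, 1)    # 0.125
dwd(10.0, 1) > 0   # true: the loss decays but never reaches zero
```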
L2MarginLoss
LossFunctions.L2MarginLoss (Type)
L2MarginLoss <: MarginLoss
The margin-based least-squares loss for classification, which quadratically penalizes every prediction where agreement != 1. It is locally Lipschitz continuous and strongly convex.
\[L(a) = {\left( 1 - a \right)}^2\]
[Plots: loss function L (left, vertical axis 0 to 5) and derivative L' (right, vertical axis -3 to 2), both over the agreement y ⋅ ŷ from -2 to 2.]
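The sketch below, in plain Julia with a hypothetical function name, shows two consequences of the formula: confident correct predictions are penalized as well, and for targets in {-1, 1} the loss coincides with the squared distance between target and prediction.

```julia
# Hypothetical stand-alone definition; not the package API.
l2_margin(a) = (1 - a)^2

l2_margin(-1.0)   # 4.0
l2_margin(3.0)    # 4.0: a confident correct prediction is penalized too

# With y in {-1, 1}, (1 - y*ŷ)^2 equals (y - ŷ)^2 because y^2 == 1.
y, ŷ = -1, 0.4
matches = l2_margin(y * ŷ) ≈ (y - ŷ)^2   # true
```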
L2HingeLoss
LossFunctions.L2HingeLoss (Type)
L2HingeLoss <: MarginLoss
The truncated least-squares loss quadratically penalizes every prediction where the resulting agreement < 1. It is locally Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}^2\]
[Plots: loss function L (left, vertical axis 0 to 5) and derivative L' (right, vertical axis -5 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
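A plain-Julia sketch of the formula and its derivative follows, under hypothetical names. The derivative is obtained by differentiating the formula above and approaches 0 as the agreement approaches 1 from below, so there is no jump in slope at a = 1.

```julia
# Hypothetical stand-alone definitions; not the package API.
l2_hinge(a) = max(zero(a), 1 - a)^2
l2_hinge_deriv(a) = a < 1 ? -2 * (1 - a) : zero(a)

l2_hinge(-1.0)          # 4.0
l2_hinge(3.0)           # 0.0: no penalty beyond the margin
l2_hinge_deriv(-1.0)    # -4.0
l2_hinge_deriv(0.999)   # close to 0
l2_hinge_deriv(1.5)     # 0.0
```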
LogitMarginLoss
LossFunctions.LogitMarginLoss (Type)
LogitMarginLoss <: MarginLoss
The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.
\[L(a) = \ln (1 + e^{-a})\]
[Plots: loss function L (left, vertical axis 0 to 2) over the agreement y ⋅ ŷ from -2 to 2, and derivative L' (right, vertical axis -1 to 0) over the agreement y ⋅ ŷ from -4 to 4.]
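Evaluating the formula literally can overflow for strongly negative agreements. The plain-Julia sketch below uses a mathematically equivalent two-branch form built on `log1p`. This is one common way to keep the computation stable and is not necessarily how the package computes it; the names are hypothetical.

```julia
# Hypothetical stand-alone definitions; not the package API.
# Both branches equal ln(1 + exp(-a)); the split keeps exp from overflowing.
logit_margin(a) = a >= 0 ? log1p(exp(-a)) : -a + log1p(exp(a))
logit_margin_deriv(a) = -1 / (1 + exp(a))

logit_margin(0.0) ≈ log(2)   # true
logit_margin(-1000.0)        # 1000.0, finite
log1p(exp(1000.0))           # Inf: the literal formula overflows here
logit_margin_deriv(0.0)      # -0.5
```

The derivative stays between -1 and 0 for every agreement, which matches the Lipschitz continuity stated above.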
ExpLoss
LossFunctions.ExpLoss (Type)
ExpLoss <: MarginLoss
The margin-based exponential loss for classification, which penalizes every prediction exponentially. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clippable.
\[L(a) = e^{-a}\]
[Plots: loss function L (left, vertical axis 0 to 5) and derivative L' (right, vertical axis -5 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
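The plain-Julia sketch below, with hypothetical names, shows how quickly the penalty and the slope grow for wrong predictions. The unbounded slope is the reason the loss is only locally Lipschitz continuous.

```julia
# Hypothetical stand-alone definitions; not the package API.
exp_loss(a) = exp(-a)
exp_loss_deriv(a) = -exp(-a)

exp_loss(0.0)          # 1.0
exp_loss(2.0)          # about 0.135: small but never zero
exp_loss(-5.0)         # about 148.4
exp_loss_deriv(-5.0)   # about -148.4: the slope is unbounded as a decreases
```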
SigmoidLoss
LossFunctions.SigmoidLoss (Type)
SigmoidLoss <: MarginLoss
Continuous loss which penalizes every prediction with a loss within the range (0, 2). It is infinitely many times differentiable and Lipschitz continuous, but nonconvex.
\[L(a) = 1 - \tanh(a)\]
[Plots: loss function L (left, vertical axis 0 to 2) and derivative L' (right, vertical axis -1 to 0), both over the agreement y ⋅ ŷ from -2 to 2.]
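The boundedness and the nonconvexity can both be seen from a few evaluations. The sketch below is plain Julia with a hypothetical function name. The final line tests the midpoint inequality that any convex function would satisfy.

```julia
# Hypothetical stand-alone definition; not the package API.
sigmoid_loss(a) = 1 - tanh(a)

sigmoid_loss(0.0)    # 1.0
sigmoid_loss(3.0)    # about 0.005
sigmoid_loss(-3.0)   # about 1.995

# Convexity would require L(-2) <= (L(-3) + L(-1)) / 2, which fails here.
midpoint_ok = sigmoid_loss(-2.0) <= (sigmoid_loss(-3.0) + sigmoid_loss(-1.0)) / 2   # false
```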