Margin-based Losses
Margin-based loss functions are particularly useful for binary classification. In contrast to distance-based losses, they do not depend on the difference between the true target and the prediction. Instead, they penalize a prediction based on how well its sign agrees with the sign of the target, i.e. based on the agreement y ⋅ ŷ.
This section lists all the subtypes of MarginLoss that are implemented in this package.
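To make the role of the agreement concrete, here is a minimal Julia sketch that evaluates an arbitrary margin loss on the agreement y ⋅ ŷ of each observation and averages the result. It assumes targets coded as ±1 and is written directly against the definitions below, not against the package API; margin_risk and hinge are hypothetical helper names.

    # Margin losses are functions of the agreement a = y * ŷ, where the target
    # y is coded as ±1 and ŷ is the real-valued prediction.
    margin_risk(L, targets, predictions) =
        sum(L(y * ŷ) for (y, ŷ) in zip(targets, predictions)) / length(targets)

    # Example with a hinge-style loss defined inline (see L1HingeLoss below):
    hinge(a) = max(0, 1 - a)
    targets     = [1, -1, 1, -1]
    predictions = [0.8, -1.2, -0.3, 0.4]
    margin_risk(hinge, targets, predictions)   # mean loss over the batch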
ZeroOneLoss
LossFunctions.ZeroOneLoss — Type
ZeroOneLoss <: MarginLoss
The classical classification loss. It penalizes every misclassified observation with a loss of 1, while every correctly classified observation has a loss of 0. It is neither convex nor continuous and is thus seldom used directly. Instead one usually works with some classification-calibrated surrogate loss, such as L1HingeLoss.
\[L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{if } a \ge 0 \end{cases}\]
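A direct transcription of this definition in Julia (a sketch of the formula, not the package's ZeroOneLoss type; the name is illustrative):

    # 1 if the agreement a = y * ŷ is negative (misclassified), 0 otherwise.
    zero_one(a) = a < 0 ? 1.0 : 0.0

    zero_one(0.7)    # 0.0 — prediction has the correct sign
    zero_one(-0.2)   # 1.0 — misclassified
    # Averaged over a dataset this gives the misclassification rate, but the
    # function is neither convex nor continuous, hence the surrogates below.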
Plot: loss function (left) and derivative (right) versus the agreement y * h(x).
PerceptronLoss
LossFunctions.PerceptronLoss — Type
PerceptronLoss <: MarginLoss
The perceptron loss linearly penalizes every prediction where the resulting agreement ≤ 0. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, -a \}\]
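A sketch of this formula and one of its subgradients in Julia (the names are illustrative, not the package implementation):

    perceptron(a)       = max(0, -a)           # L(a) = max(0, -a)
    perceptron_deriv(a) = a < 0 ? -1.0 : 0.0   # a subgradient; 0 chosen at a = 0

    perceptron(-0.5)   # 0.5 — disagreement is penalized linearly
    perceptron(0.5)    # 0.0 — any positive agreement is loss-free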
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
L1HingeLoss
LossFunctions.L1HingeLoss — Type
L1HingeLoss <: MarginLoss
The hinge loss linearly penalizes every prediction where the resulting agreement < 1. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}\]
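Transcribing the formula directly shows how the hinge loss differs from the perceptron loss: correct predictions inside the margin of 1 are still penalized (a sketch with an illustrative name, not the package implementation):

    l1_hinge(a) = max(0, 1 - a)

    l1_hinge(0.5)   # 0.5 — correct sign, but inside the margin
    l1_hinge(1.5)   # 0.0 — outside the margin, no penalty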
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
SmoothedL1HingeLoss
LossFunctions.SmoothedL1HingeLoss — Type
SmoothedL1HingeLoss <: MarginLoss
As the name suggests, a smoothed version of the L1 hinge loss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \frac{0.5}{\gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise}\\ \end{cases}\]
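A sketch of the piecewise definition with its smoothing parameter γ (illustrative only, not the package implementation):

    function smoothed_l1_hinge(a; γ = 1.0)
        a ≥ 1 - γ ? 0.5 / γ * max(0, 1 - a)^2 : 1 - γ / 2 - a
    end

    smoothed_l1_hinge(0.9; γ = 1.0)    # 0.005 — quadratic branch near the margin
    smoothed_l1_hinge(-2.0; γ = 1.0)   # 2.5   — linear branch far from it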
Plot: loss function for γ = 2 (left) and its derivative (right) versus the agreement y ⋅ ŷ.
ModifiedHuberLoss
LossFunctions.ModifiedHuberLoss — Type
ModifiedHuberLoss <: MarginLoss
A special (4-times-scaled) case of the SmoothedL1HingeLoss with γ = 2. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}\]
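A sketch of the formula together with a quick numerical check of the stated relationship to the smoothed L1 hinge loss with γ = 2 (the helper names are illustrative, not the package implementation):

    modified_huber(a)       = a ≥ -1 ? max(0, 1 - a)^2 : -4a
    smoothed_l1_hinge(a; γ) = a ≥ 1 - γ ? 0.5 / γ * max(0, 1 - a)^2 : 1 - γ / 2 - a

    # ModifiedHuberLoss is the 4-times-scaled SmoothedL1HingeLoss with γ = 2:
    all(modified_huber(a) ≈ 4 * smoothed_l1_hinge(a; γ = 2.0) for a in -3:0.25:3)   # true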
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
DWDMarginLoss
LossFunctions.DWDMarginLoss — Type
DWDMarginLoss <: MarginLoss
The distance weighted discrimination margin loss. It is a differentiable generalization of the L1HingeLoss that is different from the SmoothedL1HingeLoss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise}\\ \end{cases}\]
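A sketch of the piecewise definition with its parameter q (illustrative, not the package implementation):

    function dwd_margin(a; q = 1.0)
        a ≤ q / (q + 1) ? 1 - a : (1 / a^q) * q^q / (q + 1)^(q + 1)
    end

    dwd_margin(0.0; q = 1.0)   # 1.0   — hinge-like linear branch
    dwd_margin(2.0; q = 1.0)   # 0.125 — slowly decaying tail instead of exact zero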
Plot: loss function for q = 1 (left) and its derivative (right) versus the agreement y ⋅ ŷ.
L2MarginLoss
LossFunctions.L2MarginLoss — Type
L2MarginLoss <: MarginLoss
The margin-based least-squares loss for classification, which quadratically penalizes every prediction where the agreement ≠ 1. It is locally Lipschitz continuous and strongly convex.
\[L(a) = {\left( 1 - a \right)}^2\]
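A one-line sketch of the formula; note that, unlike the hinge-type losses, agreements larger than 1 are penalized as well (the name is illustrative):

    l2_margin(a) = (1 - a)^2

    l2_margin(0.0)   # 1.0
    l2_margin(2.0)   # 1.0 — overshooting the margin also costs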
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
L2HingeLoss
LossFunctions.L2HingeLoss — Type
L2HingeLoss <: MarginLoss
The truncated least squares loss quadratically penalizes every prediction where the resulting agreement < 1. It is locally Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}^2\]
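A one-line sketch of the formula: the square of the L1 hinge loss, so agreements above 1 remain loss-free, unlike with L2MarginLoss (the name is illustrative):

    l2_hinge(a) = max(0, 1 - a)^2

    l2_hinge(0.5)   # 0.25
    l2_hinge(2.0)   # 0.0 — no penalty beyond the margin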
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
LogitMarginLoss
LossFunctions.LogitMarginLoss — Type
LogitMarginLoss <: MarginLoss
The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.
\[L(a) = \ln (1 + e^{-a})\]
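A sketch of the formula in Julia; writing it with log1p is an assumption made here for numerical precision when e^{-a} is tiny, not necessarily how the package evaluates it:

    logit_margin(a) = log1p(exp(-a))   # ln(1 + e^(-a))

    logit_margin(0.0)    # log(2) ≈ 0.6931
    logit_margin(10.0)   # ≈ 4.54e-5 — decays smoothly but never reaches zero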
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
ExpLoss
LossFunctions.ExpLoss — Type
ExpLoss <: MarginLoss
The margin-based exponential loss for classification, which penalizes every prediction exponentially. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clipable.
\[L(a) = e^{-a}\]
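A sketch of the formula; the unbounded growth on the negative side is why the loss is not clipable (the name is illustrative):

    exp_loss(a) = exp(-a)

    exp_loss(2.0)    # ≈ 0.135
    exp_loss(-2.0)   # ≈ 7.389 — penalty grows without bound for wrong predictions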
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.
SigmoidLoss
LossFunctions.SigmoidLoss — Type
SigmoidLoss <: MarginLoss
Continuous loss which penalizes every prediction with a loss within the range (0, 2). It is infinitely many times differentiable, Lipschitz continuous, but nonconvex.
\[L(a) = 1 - \tanh(a)\]
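A sketch of the formula, showing that the values stay strictly inside (0, 2) (the name is illustrative):

    sigmoid_loss(a) = 1 - tanh(a)

    sigmoid_loss(-2.0)   # ≈ 1.964 — approaches 2 for badly wrong predictions
    sigmoid_loss(2.0)    # ≈ 0.036 — approaches 0 but never reaches it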
Plot: loss function (left) and derivative (right) versus the agreement y ⋅ ŷ.