Margin-based Losses
Margin-based loss functions are particularly useful for binary classification. In contrast to the distance-based losses, these do not care about the difference between the true target and the prediction. Instead, they penalize a prediction based on how well it agrees with the sign of the target.
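To make the notion of agreement concrete, here is a minimal sketch in plain Julia (the function names are illustrative, not the package API): a margin loss is evaluated on the agreement a = y ⋅ ŷ rather than on the difference between target and prediction.

```julia
# Minimal sketch in plain Julia (not the package API): a margin loss is a
# function of the agreement a = y * ŷ, where y ∈ {-1, +1} is the target
# and ŷ is the real-valued prediction.
hinge(a) = max(0.0, 1 - a)    # illustrative stand-in for L1HingeLoss

y, ŷ = 1, -0.4                # misclassified observation: agreement < 0
a = y * ŷ                     # -0.4
hinge(a)                      # 1.4; the penalty grows as the agreement drops
```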
This section lists all the subtypes of MarginLoss that are implemented in this package.
ZeroOneLoss
LossFunctions.ZeroOneLoss — Type.

ZeroOneLoss <: MarginLoss

The classical classification loss. It penalizes every misclassified observation with a loss of 1, while every correctly classified observation has a loss of 0. It is neither convex nor continuous and thus seldom used directly. Instead one usually works with some classification-calibrated surrogate loss, such as the L1HingeLoss.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
1 │------------┐ │ 1 │ │
│ | │ │ │
│ | │ │ │
│ | │ │_________________________│
│ | │ │ │
│ | │ │ │
│ | │ │ │
0 │ └------------│ -1 │ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                 y * h(x)                       y * h(x)

| Loss function | Derivative |
|---|---|
| $L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise}\\ \end{cases}$ | $L'(a) = 0$ |
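As a quick illustration of the surrogate idea, the following plain-Julia sketch (illustrative, not the package API) checks that the L1 hinge loss upper-bounds the zero-one loss for every agreement value on a grid:

```julia
# Illustrative sketch (plain Julia): the L1 hinge loss upper-bounds the
# zero-one loss for every agreement, which is one reason it serves as a
# convex surrogate.
zero_one(a) = a < 0 ? 1.0 : 0.0
hinge(a)    = max(0.0, 1 - a)

all(hinge(a) >= zero_one(a) for a in -2:0.01:2)   # true
```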
PerceptronLoss
LossFunctions.PerceptronLoss — Type.

PerceptronLoss <: MarginLoss

The perceptron loss linearly penalizes every prediction where the resulting agreement <= 0. It is Lipschitz continuous and convex, but not strictly convex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
2 │\. │ 0 │ ┌------------│
│ '.. │ │ | │
│ \. │ │ | │
│ '. │ │ | │
L │ '. │ L' │ | │
│ \. │ │ | │
│ '. │ │ | │
0 │ \.____________│ -1 │------------┘ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \max \{ 0, - a \}$ | $L'(a) = \begin{cases} -1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise}\\ \end{cases}$ |
L1HingeLoss
LossFunctions.L1HingeLoss — Type.

L1HingeLoss <: MarginLoss

The hinge loss linearly penalizes every prediction where the resulting agreement < 1. It is Lipschitz continuous and convex, but not strictly convex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
3 │'\. │ 0 │ ┌------│
│ ''_ │ │ | │
│ \. │ │ | │
│ '. │ │ | │
L │ ''_ │ L' │ | │
│ \. │ │ | │
│ '. │ │ | │
0 │ ''_______│ -1 │------------------┘ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \max \{ 0, 1 - a \}$ | $L'(a) = \begin{cases} -1 & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise}\\ \end{cases}$ |
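The only difference to the perceptron loss above is where the penalty sets in. A small plain-Julia sketch (illustrative functions, formulas copied from the two tables):

```julia
# Both losses are piecewise linear, but the hinge loss already penalizes
# predictions inside the margin (0 ≤ a < 1), not only misclassifications.
perceptron(a) = max(0.0, -a)
l1_hinge(a)   = max(0.0, 1 - a)

a = 0.5                       # correct sign, but inside the margin
perceptron(a), l1_hinge(a)    # (0.0, 0.5)
```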
SmoothedL1HingeLoss
LossFunctions.SmoothedL1HingeLoss — Type.

SmoothedL1HingeLoss <: MarginLoss

As the name suggests, a smoothed version of the L1 hinge loss. It is Lipschitz continuous and convex, but not strictly convex.
Lossfunction (γ=2) Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
2 │\. │ 0 │ ,r------│
│ '. │ │ ./' │
│ \. │ │ ,/ │
│ '. │ │ ./' │
L │ '. │ L' │ ,' │
│ \. │ │ ,/ │
│ ', │ │ ./' │
0 │ '*-._________│ -1 │______./ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \begin{cases} \frac{1}{2 \gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise}\\ \end{cases}$ | $L'(a) = \begin{cases} - \frac{1}{\gamma} \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge 1 - \gamma \\ - 1 & \quad \text{otherwise}\\ \end{cases}$ |
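Below is a plain-Julia sketch of the piecewise definition above (γ is the user-chosen smoothing parameter; the function name and the value γ = 2 are illustrative), verifying that the two branches meet at a = 1 - γ:

```julia
# Piecewise definition from the table above; the quadratic and the linear
# branch both evaluate to γ/2 at the junction a = 1 - γ.
smoothed_l1_hinge(a; γ = 2.0) =
    a >= 1 - γ ? max(0.0, 1 - a)^2 / (2γ) : 1 - γ/2 - a

γ = 2.0
smoothed_l1_hinge(1 - γ; γ = γ)   # 1.0 (= γ/2), from the quadratic branch
1 - γ/2 - (1 - γ)                 # 1.0 (= γ/2), from the linear branch
```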
ModifiedHuberLoss
LossFunctions.ModifiedHuberLoss — Type.

ModifiedHuberLoss <: MarginLoss

A special (4 times scaled) case of the SmoothedL1HingeLoss with γ = 2. It is Lipschitz continuous and convex, but not strictly convex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
5 │ '. │ 0 │ .+-------│
│ '. │ │ ./' │
│ '\ │ │ ,/ │
│ \ │ │ ,/ │
L │ '. │ L' │ ./ │
│ '. │ │ ./' │
│ \. │ │______/' │
0 │ '-.________│ -5 │ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}$ | $L'(a) = \begin{cases} - 2 \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge -1 \\ - 4 & \quad \text{otherwise}\\ \end{cases}$ |
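The "4 times scaled" relationship to the SmoothedL1HingeLoss can be verified numerically; the following plain-Julia sketch simply copies both definitions from the tables above:

```julia
# Numerical check that ModifiedHuberLoss(a) == 4 * SmoothedL1HingeLoss(a)
# when γ = 2, on a grid of agreement values.
smoothed_l1_hinge(a; γ = 2.0) =
    a >= 1 - γ ? max(0.0, 1 - a)^2 / (2γ) : 1 - γ/2 - a
modified_huber(a) = a >= -1 ? max(0.0, 1 - a)^2 : -4a

all(modified_huber(a) ≈ 4 * smoothed_l1_hinge(a) for a in -3:0.1:3)   # true
```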
DWDMarginLoss
LossFunctions.DWDMarginLoss — Type.

DWDMarginLoss <: MarginLoss

The distance weighted discrimination margin loss. It is a differentiable generalization of the L1HingeLoss that is different from the SmoothedL1HingeLoss. It is Lipschitz continuous and convex, but not strictly convex.
Lossfunction (q=1) Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
2 │ ". │ 0 │ ._r-│
│ \. │ │ ./ │
│ ', │ │ ./ │
│ \. │ │ / │
L │ "\. │ L' │ . │
│ \. │ │ / │
│ ":__ │ │ ; │
0 │ '""---│ -1 │---------------┘ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise}\\ \end{cases}$ | $L'(a) = \begin{cases} - 1 & \quad \text{if } a \le \frac{q}{q+1} \\ - \frac{1}{a^{q+1}} \left( \frac{q}{q+1} \right)^{q+1} & \quad \text{otherwise}\\ \end{cases}$ |
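A plain-Julia sketch of the piecewise definition above (q is the user-chosen exponent; the value q = 1 here is illustrative). At the junction a = q/(q+1) both branches take the same value and, per the derivative table, the same slope of -1, which is what makes the loss differentiable there:

```julia
# Piecewise definition from the table above (q > 0 is the user-chosen exponent).
dwd(a; q = 1.0) = a <= q/(q + 1) ? 1 - a : (q^q / (q + 1)^(q + 1)) / a^q

# Both branches evaluate to the same value at the junction a = q / (q + 1):
q = 1.0
a₀ = q / (q + 1)
1 - a₀, (q^q / (q + 1)^(q + 1)) / a₀^q   # (0.5, 0.5)
```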
L2MarginLoss
LossFunctions.L2MarginLoss — Type.

L2MarginLoss <: MarginLoss

The margin-based least-squares loss for classification, which quadratically penalizes every prediction where the agreement is not equal to 1. It is locally Lipschitz continuous and strongly convex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
5 │ . │ 2 │ ,r│
│ '. │ │ ,/ │
│ '\ │ │ ,/ │
│ \ │ ├ ,/ ┤
L │ '. │ L' │ ./ │
│ '. │ │ ./ │
│ \. .│ │ ./ │
0 │ '-.____.-' │ -3 │ ./ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = {\left( 1 - a \right)}^2$ | $L'(a) = 2 \left( a - 1 \right)$ |
L2HingeLoss
LossFunctions.L2HingeLoss — Type.

L2HingeLoss <: MarginLoss

The truncated least squares loss quadratically penalizes every prediction where the resulting agreement < 1. It is locally Lipschitz continuous and convex, but not strictly convex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
5 │ . │ 0 │ ,r------│
│ '. │ │ ,/ │
│ '\ │ │ ,/ │
│ \ │ │ ,/ │
L │ '. │ L' │ ./ │
│ '. │ │ ./ │
│ \. │ │ ./ │
0 │ '-.________│ -5 │ ./ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \max \{ 0, 1 - a \} ^2$ | $L'(a) = \begin{cases} 2 \left( a - 1 \right) & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise}\\ \end{cases}$ |
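A short plain-Julia sketch relating this loss to the L2MarginLoss above: both are quadratic in 1 - a, but the hinge version is truncated to zero once the agreement exceeds 1 (formulas copied from the two tables):

```julia
# L2HingeLoss truncates L2MarginLoss to zero once the agreement exceeds 1.
l2_margin(a) = (1 - a)^2
l2_hinge(a)  = max(0.0, 1 - a)^2

l2_margin(0.5), l2_hinge(0.5)   # (0.25, 0.25), identical for a ≤ 1
l2_margin(1.5), l2_hinge(1.5)   # (0.25, 0.0), truncated beyond the margin
```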
LogitMarginLoss
LossFunctions.LogitMarginLoss — Type.

LogitMarginLoss <: MarginLoss

The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
2 │ \. │ 0 │ ._--/""│
│ \. │ │ ../' │
│ \. │ │ ./ │
│ \.. │ │ ./' │
L │ '-_ │ L' │ .,' │
│ '-_ │ │ ./ │
│ '\-._ │ │ .,/' │
0 │ '""*-│ -1 │__.--'' │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -4 4
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = \ln (1 + e^{-a})$ | $L'(a) = - \frac{1}{1 + e^a}$ |
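The derivative above is a negated logistic sigmoid, so it saturates near -1 for strongly misclassified observations and approaches, but never reaches, zero for confidently correct ones. A small plain-Julia sketch (illustrative names, not the package API):

```julia
# Formulas from the table above; log1p(exp(-a)) is a numerically stable
# way to write ln(1 + e^(-a)).
logit_loss(a)  = log1p(exp(-a))
logit_deriv(a) = -1 / (1 + exp(a))

logit_deriv(-10), logit_deriv(10)   # ≈ (-0.99995, -4.5e-5)
```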
ExpLoss
LossFunctions.ExpLoss — Type.

ExpLoss <: MarginLoss

The margin-based exponential loss for classification, which penalizes every prediction exponentially. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clipable.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
5 │ \. │ 0 │ _,,---:'""│
│ l │ │ _r/"' │
│ l. │ │ .r/' │
│ ": │ │ .r' │
L │ \. │ L' │ ./ │
│ "\.. │ │ .' │
│ '":,_ │ │ ,' │
0 │ ""---:.__│ -5 │ ./ │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = e^{-a}$ | $L'(a) = - e^{-a}$ |
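The following plain-Julia comparison illustrates why the loss is not clipable: the penalty grows without bound as the agreement becomes more negative, much faster than the linearly growing L1 hinge loss.

```julia
# The exponential penalty grows without bound for misclassified observations,
# much faster than the linearly growing L1 hinge loss.
exp_loss(a) = exp(-a)
hinge(a)    = max(0.0, 1 - a)

exp_loss(-5), hinge(-5)   # (≈ 148.4, 6.0)
```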
SigmoidLoss
LossFunctions.SigmoidLoss — Type.

SigmoidLoss <: MarginLoss

Continuous loss which penalizes every prediction with a loss within the range (0, 2). It is infinitely many times differentiable, Lipschitz continuous, but nonconvex.
Lossfunction Derivative
┌────────────┬────────────┐ ┌────────────┬────────────┐
2 │""'--,. │ 0 │.. ..│
│ '\. │ │ "\. ./" │
│ '. │ │ ', ,' │
│ \. │ │ \ / │
L │ "\. │ L' │ \ / │
│ \. │ │ \. ./ │
│ \, │ │ \. ./ │
0 │ '"-:.__│ -1 │ ',_,' │
└────────────┴────────────┘ └────────────┴────────────┘
-2 2 -2 2
                  y ⋅ ŷ                           y ⋅ ŷ

| Loss function | Derivative |
|---|---|
| $L(a) = 1 - \tanh(a)$ | $L'(a) = - \textrm{sech}^2 (a)$ |
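The non-convexity can be checked numerically. The following plain-Julia sketch (the step size h is an illustrative choice) estimates the second derivative of L(a) = 1 - tanh(a) with central finite differences and shows that it changes sign at a = 0:

```julia
# L''(a) = 2 * sech(a)^2 * tanh(a) changes sign at a = 0, so the loss is
# neither convex nor concave; checked here with central finite differences.
sigmoid_loss(a) = 1 - tanh(a)
second_deriv(f, a; h = 1e-3) = (f(a + h) - 2 * f(a) + f(a - h)) / h^2

second_deriv(sigmoid_loss, -1), second_deriv(sigmoid_loss, 1)   # ≈ (-0.64, 0.64)
```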