Margin-based Losses

Margin-based loss functions are particularly useful for binary classification. In contrast to the distance-based losses, these do not care about the difference between the true target and the prediction. Instead, they penalize a prediction based on how well it agrees with the sign of the target.

This section lists all the subtypes of MarginLoss that are implemented in this package.
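
To make the notion of "agreement" concrete, here is a minimal plain-Julia sketch (the names are illustrative only, not this package's API): a margin loss is always evaluated on the product of target and prediction.

# agreement a = y * ŷ, with target y ∈ {-1, 1} and real-valued prediction ŷ
hinge(a) = max(0, 1 - a)   # one example of a margin loss (the L1 hinge loss)

y, ŷ = 1, -0.3             # a misclassified observation
a = y * ŷ                  # agreement = -0.3 (a negative sign means misclassified)
hinge(a)                   # 1.3; the worse the disagreement, the larger the loss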

ZeroOneLoss

ZeroOneLoss <: MarginLoss

The classical classification loss. It penalizes every misclassified observation with a loss of 1, while every correctly classified observation incurs a loss of 0. It is neither convex nor continuous and is thus seldom used directly. Instead one usually works with some classification-calibrated surrogate loss, such as L1HingeLoss.

\[L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{if } a \ge 0 \end{cases}\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    1 │------------┐            │    1 │                         │
      │            |            │      │                         │
      │            |            │      │                         │
      │            |            │      │_________________________│
      │            |            │      │                         │
      │            |            │      │                         │
      │            |            │      │                         │
    0 │            └------------│   -1 │                         │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                y * h(x)                         y * h(x)
Lossfunction: $L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise} \end{cases}$
Derivative:   $L'(a) = 0$
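
As a quick illustration of the definition above, consider this plain-Julia sketch (the helper name is hypothetical, not the package API); it also shows why the loss is unsuited for gradient-based training.

zero_one(a) = a < 0 ? 1.0 : 0.0   # 1 for misclassified, 0 for correctly classified

zero_one(-0.7)   # 1.0
zero_one( 0.7)   # 0.0
# The derivative is 0 wherever it exists, so gradient methods get no signal;
# in practice one minimizes a surrogate such as L1HingeLoss instead.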

PerceptronLoss

PerceptronLoss <: MarginLoss

The perceptron loss linearly penalizes every prediction where the resulting agreement is <= 0. It is Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \max \{ 0, -a \}\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    2 │\.                       │    0 │            ┌------------│
      │ '..                     │      │            |            │
      │   \.                    │      │            |            │
      │     '.                  │      │            |            │
    L │      '.                 │   L' │            |            │
      │        \.               │      │            |            │
      │         '.              │      │            |            │
    0 │           \.____________│   -1 │------------┘            │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \max \{ 0, - a \}$
Derivative:   $L'(a) = \begin{cases} -1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise} \end{cases}$
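
A minimal sketch of the formula (illustrative name, not the package API); note that it is simply the hinge loss without the margin of 1.

perceptron(a) = max(0, -a)

perceptron(-0.5)   # 0.5 (penalized in proportion to the disagreement)
perceptron( 0.1)   # 0.0 (any positive agreement is already good enough)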

L1HingeLoss

L1HingeLoss <: MarginLoss

The hinge loss linearly penalizes every prediction where the resulting agreement is < 1. It is Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \max \{ 0, 1 - a \}\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    3 │'\.                      │    0 │                  ┌------│
      │  ''_                    │      │                  |      │
      │     \.                  │      │                  |      │
      │       '.                │      │                  |      │
    L │         ''_             │   L' │                  |      │
      │            \.           │      │                  |      │
      │              '.         │      │                  |      │
    0 │                ''_______│   -1 │------------------┘      │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \max \{ 0, 1 - a \}$
Derivative:   $L'(a) = \begin{cases} -1 & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise} \end{cases}$
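
The following plain-Julia sketch (hypothetical helpers, not the package API) evaluates the loss and the subderivative stated above.

l1hinge(a)       = max(0, 1 - a)
l1hinge_deriv(a) = a < 1 ? -1.0 : 0.0

l1hinge(0.5)         # 0.5 (correct sign, but still inside the margin)
l1hinge(1.5)         # 0.0 (outside the margin, no penalty)
l1hinge_deriv(0.5)   # -1.0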

SmoothedL1HingeLoss

SmoothedL1HingeLoss <: MarginLoss

As the name suggests, this is a smoothed version of the L1 hinge loss. It is Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \begin{cases} \frac{1}{2 \gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise} \end{cases}\]

              Lossfunction (γ=2)               Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    2 │\.                       │    0 │                 ,r------│
      │ '.                      │      │               ./'       │
      │   \.                    │      │              ,/         │
      │     '.                  │      │            ./'          │
    L │      '.                 │   L' │           ,'            │
      │        \.               │      │         ,/              │
      │          ',             │      │       ./'               │
    0 │            '*-._________│   -1 │______./                 │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \begin{cases} \frac{1}{2 \gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise} \end{cases}$
Derivative:   $L'(a) = \begin{cases} - \frac{1}{\gamma} \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge 1 - \gamma \\ - 1 & \quad \text{otherwise} \end{cases}$
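
A small sketch with an explicit γ keyword (illustrative only, not the package API); it checks that the quadratic and the linear branch agree at the transition point a = 1 - γ.

smoothed_l1hinge(a; γ=2.0) =
    a >= 1 - γ ? max(0, 1 - a)^2 / (2γ) : 1 - γ/2 - a

γ = 2.0
smoothed_l1hinge(1 - γ, γ=γ)   # 1.0, from the quadratic branch
1 - γ/2 - (1 - γ)              # 1.0, the same value from the linear branch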

ModifiedHuberLoss

ModifiedHuberLoss <: MarginLoss

A special case of the SmoothedL1HingeLoss with γ = 2, scaled by a factor of 4. It is Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    5 │    '.                   │    0 │                .+-------│
      │     '.                  │      │              ./'        │
      │      '\                 │      │             ,/          │
      │        \                │      │           ,/            │
    L │         '.              │   L' │         ./              │
      │          '.             │      │       ./'               │
      │            \.           │      │______/'                 │
    0 │              '-.________│   -5 │                         │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise} \end{cases}$
Derivative:   $L'(a) = \begin{cases} - 2 \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge -1 \\ - 4 & \quad \text{otherwise} \end{cases}$
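
As a consistency check of the "scaled by a factor of 4" relationship, here is a plain-Julia sketch (illustrative names) comparing the formula above against the smoothed L1 hinge loss with γ = 2.

modified_huber(a)          = a >= -1 ? max(0, 1 - a)^2 : -4a
smoothed_l1hinge(a; γ=2.0) = a >= 1 - γ ? max(0, 1 - a)^2 / (2γ) : 1 - γ/2 - a

all(modified_huber(a) ≈ 4 * smoothed_l1hinge(a) for a in -3:0.25:3)   # true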

DWDMarginLoss

DWDMarginLoss <: MarginLoss

The distance weighted discrimination margin loss. It is a differentiable generalization of the L1HingeLoss, distinct from the SmoothedL1HingeLoss. It is Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise} \end{cases}\]

              Lossfunction (q=1)               Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    2 │      ".                 │    0 │                     ._r-│
      │        \.               │      │                   ./    │
      │         ',              │      │                 ./      │
      │           \.            │      │                 /       │
    L │            "\.          │   L' │                .        │
      │              \.         │      │                /        │
      │               ":__      │      │               ;         │
    0 │                   '""---│   -1 │---------------┘         │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise} \end{cases}$
Derivative:   $L'(a) = \begin{cases} - 1 & \quad \text{if } a \le \frac{q}{q+1} \\ - \frac{1}{a^{q+1}} \left( \frac{q}{q+1} \right)^{q+1} & \quad \text{otherwise} \end{cases}$
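
A small sketch with an explicit q keyword (illustrative names only, not the package API); it verifies that the linear part and the power-law tail meet continuously at a = q / (q + 1).

dwd(a; q=1.0) = a <= q/(q+1) ? 1 - a : (1/a^q) * q^q / (q+1)^(q+1)

q = 1.0
dwd(q/(q+1), q=q)                        # 0.5, from the linear branch
(1 / (q/(q+1))^q) * q^q / (q+1)^(q+1)    # 0.5, the same value from the power-law branch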

L2MarginLoss

L2MarginLoss <: MarginLoss

The margin-based least-squares loss for classification, which quadratically penalizes every prediction where the agreement is != 1. It is locally Lipschitz continuous and strongly convex.

\[L(a) = {\left( 1 - a \right)}^2\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    5 │     .                   │    2 │                       ,r│
      │     '.                  │      │                     ,/  │
      │      '\                 │      │                   ,/    │
      │        \                │      ├                 ,/      ┤
    L │         '.              │   L' │               ./        │
      │          '.             │      │             ./          │
      │            \.          .│      │           ./            │
    0 │              '-.____.-' │   -3 │         ./              │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = {\left( 1 - a \right)}^2$
Derivative:   $L'(a) = 2 \left( a - 1 \right)$
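
A one-line sketch (illustrative name) highlighting that, unlike the hinge-type losses, agreements larger than 1 are penalized as well.

l2margin(a) = (1 - a)^2

l2margin(0.5)   # 0.25
l2margin(2.0)   # 1.0 (even a confidently correct prediction incurs a loss)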

L2HingeLoss

L2HingeLoss <: MarginLoss

The truncated least squares loss quadratically penalizes every prediction where the resulting agreement is < 1. It is locally Lipschitz continuous and convex, but not strictly convex.

\[L(a) = \max \{ 0, 1 - a \}^2\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    5 │     .                   │    0 │                 ,r------│
      │     '.                  │      │               ,/        │
      │      '\                 │      │             ,/          │
      │        \                │      │           ,/            │
    L │         '.              │   L' │         ./              │
      │          '.             │      │       ./                │
      │            \.           │      │     ./                  │
    0 │              '-.________│   -5 │   ./                    │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \max \{ 0, 1 - a \} ^2$
Derivative:   $L'(a) = \begin{cases} 2 \left( a - 1 \right) & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise} \end{cases}$
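
The corresponding sketch for the truncated version (illustrative name); the only difference from L2MarginLoss is that agreements beyond the margin are not penalized.

l2hinge(a) = max(0, 1 - a)^2

l2hinge(0.5)   # 0.25 (identical to L2MarginLoss inside the margin)
l2hinge(2.0)   # 0.0  (no penalty once the agreement reaches 1)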

LogitMarginLoss

LogitMarginLoss <: MarginLoss

The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.

\[L(a) = \ln (1 + e^{-a})\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    2 │ \.                      │    0 │                  ._--/""│
      │   \.                    │      │               ../'      │
      │     \.                  │      │              ./         │
      │       \..               │      │            ./'          │
    L │         '-_             │   L' │          .,'            │
      │            '-_          │      │         ./              │
      │               '\-._     │      │      .,/'               │
    0 │                    '""*-│   -1 │__.--''                  │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -4                        4
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = \ln (1 + e^{-a})$
Derivative:   $L'(a) = - \frac{1}{1 + e^a}$
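
A short sketch (illustrative name, not the package's implementation); using log1p is a common way to evaluate ln(1 + e^(-a)) without losing precision for large positive agreements.

logit_margin(a) = log1p(exp(-a))   # assumes a is not strongly negative (exp(-a) would overflow)

logit_margin(0.0)    # ≈ 0.6931 (= ln 2)
logit_margin(10.0)   # ≈ 4.54e-5 (small, but never exactly zero)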

ExpLoss

ExpLoss <: MarginLoss

The margin-based exponential loss for classification, which penalizes every prediction exponentially. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clipable.

\[L(a) = e^{-a}\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    5 │  \.                     │    0 │               _,,---:'""│
      │   l                     │      │           _r/"'         │
      │    l.                   │      │        .r/'             │
      │     ":                  │      │      .r'                │
    L │       \.                │   L' │     ./                  │
      │        "\..             │      │    .'                   │
      │           '":,_         │      │   ,'                    │
    0 │                ""---:.__│   -5 │  ./                     │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = e^{-a}$
Derivative:   $L'(a) = - e^{-a}$
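
A minimal sketch (illustrative names) of the loss and its derivative; the exponential growth for negative agreements is what makes the loss sensitive to outliers and not clipable.

exp_loss(a)  = exp(-a)
exp_deriv(a) = -exp(-a)   # the loss is its own negated derivative

exp_loss(2.0)    # ≈ 0.135
exp_loss(-5.0)   # ≈ 148.4 (badly misclassified points dominate the objective)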

SigmoidLoss

SigmoidLoss <: MarginLoss

A continuous loss which penalizes every prediction with a loss in the range (0, 2). It is infinitely many times differentiable, Lipschitz continuous, but nonconvex.

\[L(a) = 1 - \tanh(a)\]

              Lossfunction                     Derivative
      ┌────────────┬────────────┐      ┌────────────┬────────────┐
    2 │""'--,.                  │    0 │..                     ..│
      │      '\.                │      │ "\.                 ./" │
      │         '.              │      │    ',             ,'    │
      │           \.            │      │      \           /      │
    L │            "\.          │   L' │       \         /       │
      │              \.         │      │        \.     ./        │
      │                \,       │      │         \.   ./         │
    0 │                  '"-:.__│   -1 │          ',_,'          │
      └────────────┴────────────┘      └────────────┴────────────┘
      -2                        2      -2                        2
                 y ⋅ ŷ                            y ⋅ ŷ
Lossfunction: $L(a) = 1 - \tanh(a)$
Derivative:   $L'(a) = - \textrm{sech}^2 (a)$
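
A final sketch (illustrative names); because the loss saturates towards 0 and 2, individual outliers cannot dominate the objective, at the price of losing convexity.

sigmoid_loss(a)  = 1 - tanh(a)
sigmoid_deriv(a) = -sech(a)^2

sigmoid_loss(-10.0)   # ≈ 2.0 (the penalty saturates for badly misclassified points)
sigmoid_loss( 10.0)   # ≈ 0.0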