Working with Losses
Even though they are called loss "functions", this package implements them as immutable types instead of true Julia functions. There are good reasons for that. For example, it allows us to query the properties of a loss explicitly (e.g. `isconvex(myloss)`). It also makes for a more consistent API when it comes to computing the value or the derivative. Some losses even have additional parameters that need to be specified, such as the $\epsilon$ in the case of the $\epsilon$-insensitive loss. Here, types allow member variables to hide that information away from the method signatures.
In order to avoid potential confusion with true Julia functions, we will refer to "loss functions" as "losses" instead. The available losses share a common interface for the most part. This section provides an overview of the basic functionality that is available for all the different types of losses. We will discuss how to create a loss, how to compute its value and derivative, and how to query its properties.
Instantiating a Loss
Losses are immutable types. As such, one has to instantiate one in order to work with it. For most losses, the constructors do not expect any parameters.
julia> L2DistLoss()
LPDistLoss{2}()
julia> HingeLoss()
L1HingeLoss()
We just said that we need to instantiate a loss in order to work with it. One could be inclined to believe that it would be more memory-efficient to "pre-allocate" a loss when using it in more than one place.
julia> loss = L2DistLoss()
LPDistLoss{2}()
julia> value(loss, 2, 3)
1
However, that is a common misconception. Because all losses are immutable types, they can live on the stack and thus do not come with a heap-allocation overhead.

Even more interesting, for losses such as `L2DistLoss`, which have no constructor parameters or member variables, there is no additional code executed at all. Such singletons are only used for dispatch and don't even produce any additional code, which you can observe for yourself in the code below. As such they are zero-cost abstractions.
julia> v1(loss,t,y) = value(loss,t,y)
julia> v2(t,y) = value(L2DistLoss(),t,y)
julia> @code_llvm v1(loss, 2, 3)
define i64 @julia_v1_70944(i64, i64) #0 {
top:
%2 = sub i64 %1, %0
%3 = mul i64 %2, %2
ret i64 %3
}
julia> @code_llvm v2(2, 3)
define i64 @julia_v2_70949(i64, i64) #0 {
top:
%2 = sub i64 %1, %0
%3 = mul i64 %2, %2
ret i64 %3
}
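The singleton design itself can be sketched in a few lines of plain Julia. The types and functions below are illustrative stand-ins of our own (not part of the package): a field-less immutable struct carries no data, so dispatching on it is free.

```julia
# Toy stand-ins illustrating the singleton design described above; the
# type and function names are ours, not part of the package.
struct MyL2DistLoss end            # singleton: no fields, lives on the stack
struct MyLPDistLoss{P} end         # a whole family, indexed by a type parameter

myvalue(::MyL2DistLoss, target, output) = abs2(output - target)
myvalue(::MyLPDistLoss{P}, target, output) where {P} = abs(output - target)^P

loss = MyL2DistLoss()              # "instantiation" of a singleton is free
println(myvalue(loss, 2, 3))               # 1
println(myvalue(MyLPDistLoss{3}(), 2, 4))  # 8
println(sizeof(loss))              # 0 -- the instance stores nothing
```

Because the instance carries no runtime data, all the information the compiler needs is in the type itself, which is why the two `@code_llvm` listings above are identical.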
On the other hand, some loss types are more comparable to whole families of losses than to a single one. For example, the immutable type `L1EpsilonInsLoss` has a free parameter $\epsilon$. Each concrete $\epsilon$ results in a different concrete loss of the same family of epsilon-insensitive losses.
julia> L1EpsilonInsLoss(0.5)
L1EpsilonInsLoss{Float64}(0.5)
julia> L1EpsilonInsLoss(1)
L1EpsilonInsLoss{Float64}(1.0)
For such losses that do have parameters, it can make a slight difference to pre-instantiate a loss. While they will still live on the stack, the constructor usually performs some assertions and conversions for the given parameter, which can come at a slight overhead. At the very least it will not produce the exact same code when not pre-instantiated. Still, the fact that they are immutable makes them very efficient abstractions with little to no performance overhead, and zero memory allocations on the heap.
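The pattern for a parameterized loss can be sketched as follows. This is a toy stand-in of our own, not the package's implementation: the free parameter becomes a field of the immutable struct, and the constructor validates and converts it once.

```julia
# Toy sketch of a parameterized loss: the free parameter becomes a field of
# the immutable struct, and the constructor validates it once. The names
# here are illustrative stand-ins, not the package's implementation.
struct MyEpsilonInsLoss{T<:AbstractFloat}
    eps::T
    function MyEpsilonInsLoss{T}(eps::T) where {T<:AbstractFloat}
        eps > 0 || throw(ArgumentError("eps must be positive"))
        new{T}(eps)
    end
end
MyEpsilonInsLoss(eps::Number) = MyEpsilonInsLoss{Float64}(Float64(eps))

myvalue(l::MyEpsilonInsLoss, target, output) = max(zero(l.eps), abs(output - target) - l.eps)

loss = MyEpsilonInsLoss(0.5)       # validation and conversion happen once, here
println(myvalue(loss, 1.0, 2.0))   # 0.5
println(myvalue(loss, 1.0, 1.2))   # 0.0
```

Pre-instantiating `loss` means the constructor's assertion runs only once, rather than on every call site.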
Computing the Values
The first thing we may want to do is compute the loss for a single observation. In fact, all losses are implemented on single observations under the hood. The core function to compute the value of a loss is `value`. We will see throughout the documentation that this function allows for a lot of different method signatures to accomplish a variety of tasks.
LearnBase.value — Method. `value(loss, target::Number, output::Number) -> Number`

Compute the (non-negative) numeric result for the loss function denoted by the parameter `loss` and return it. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the value with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> value(L1DistLoss(), 1.0, 2.0)
1.0
julia> value(L1DistLoss(), 1, 2)
1
julia> value(L1HingeLoss(), -1, 2)
3
julia> value(L1HingeLoss(), -1f0, 2f0)
3.0f0
It may be interesting to note that this function also supports broadcasting and all the syntax benefits that come with it. Thus, it is quite simple to make use of preallocated memory for storing the element-wise results.
julia> value.(L1DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
1
3
5
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
1.0
3.0
5.0
Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.
julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
2.0
3.0
2.5
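The fused element-wise losses can also be reduced to a single aggregate, such as a weighted mean. The sketch below is self-contained plain Julia; `myvalue` is a stand-in for the package's `value` with the L1 distance written out by hand.

```julia
# Reducing the fused element-wise losses to one weighted mean. `myvalue`
# is a plain-Julia stand-in (the L1 distance), not the package's `value`.
myvalue(target, output) = abs(output - target)

targets = [1.0, 2.0, 3.0]
outputs = [2.0, 5.0, -2.0]
weights = [2.0, 1.0, 0.5]

wmean = sum(myvalue.(targets, outputs) .* weights) / sum(weights)
println(wmean)   # (2.0 + 3.0 + 2.5) / 3.5
```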
Even though broadcasting is supported, we also expose a vectorized method natively. This is done mainly for API consistency reasons. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.
LearnBase.value — Method. `value(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the result of the loss function for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.
In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> value(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
1.0
9.0
25.0
We also provide a mutating version for the same reasons. It even utilizes `broadcast!` underneath.
LearnBase.value! — Method. `value!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the result of the loss function for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> value!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
1.0
9.0
25.0
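The mutating pattern described above can be sketched in a few lines of plain Julia using `broadcast!`. This is a toy stand-in, not the package's actual `value!` implementation.

```julia
# Sketch of the mutating pattern with `broadcast!` (a toy stand-in, not the
# package's actual `value!` implementation).
myvalue(target, output) = abs2(output - target)

function myvalue!(buffer::AbstractArray, targets::AbstractArray, outputs::AbstractArray)
    broadcast!(myvalue, buffer, targets, outputs)  # writes results into buffer
    return buffer
end

buffer = zeros(3)
myvalue!(buffer, [1.0, 2.0, 3.0], [2, 5, -2])
println(buffer)   # [1.0, 9.0, 25.0]
```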
Computing the 1st Derivatives
Perhaps the more interesting aspect of loss functions is their derivatives. In fact, most of the popular learning algorithms in supervised learning, such as gradient descent, utilize the derivatives of the loss in one way or another during the training process.
To compute the derivative of some loss we expose the function `deriv`. It supports the exact same method signatures as `value`. It is worth noting explicitly that we always compute the derivative with respect to the predicted `output`, since we are interested in deducing in which direction the output should change.
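To make the role of the derivative concrete, here is a minimal gradient-descent sketch in plain Julia. `myderiv` is a toy stand-in for the derivative of the squared distance loss; nothing here is part of the package API.

```julia
# Minimal gradient descent for a one-parameter linear model, using the
# derivative of the squared distance loss w.r.t. the predicted output.
# `myderiv` is a toy stand-in, not the package's `deriv`.
myderiv(target, output) = 2 * (output - target)

function fit_slope(x, y; lr = 0.05, steps = 200)
    w = 0.0
    for _ in 1:steps
        # chain rule: d loss / d w = (d loss / d output) * x, averaged over data
        grad = sum(myderiv.(y, w .* x) .* x) / length(x)
        w -= lr * grad
    end
    return w
end

x = [1.0, 2.0, 3.0]
println(fit_slope(x, 2.0 .* x))   # converges toward the true slope 2.0
```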
LearnBase.deriv — Method. `deriv(loss, target::Number, output::Number) -> Number`

Compute the derivative for the loss function (denoted by the parameter `loss`) with respect to the `output`. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the derivative with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> deriv(L2DistLoss(), 1.0, 2.0)
2.0
julia> deriv(L2DistLoss(), 1, 2)
2
julia> deriv(L2HingeLoss(), -1, 2)
6
julia> deriv(L2HingeLoss(), -1f0, 2f0)
6.0f0
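As a plain-Julia sanity check of this convention, one can compare against a central finite difference. The squared-distance formula below is written out by hand; nothing here uses the package itself.

```julia
# Sanity-checking the derivative convention with a central finite difference.
# `f` is the squared-distance formula written out by hand; the derivative is
# taken with respect to the predicted output, as described above.
f(target, output) = abs2(output - target)
fd(target, output; h = 1e-6) = (f(target, output + h) - f(target, output - h)) / (2h)

println(fd(1.0, 2.0))   # close to 2.0, the analytic derivative 2 * (output - target)
```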
Similar to `value`, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.
julia> deriv.(L2DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
2
6
-10
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.
julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
4.0
6.0
-5.0
While broadcasting is supported, we also expose a vectorized method natively. This is done mainly for API consistency reasons. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.
LearnBase.deriv — Method. `deriv(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> deriv(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
We also provide a mutating version for the same reasons. It even utilizes `broadcast!` underneath.
LearnBase.deriv! — Method. `deriv!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> deriv!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
6.0
-10.0
It is also possible to compute the value and derivative at the same time. For some losses that means less computation overhead.
LearnBase.value_deriv — Method. `value_deriv(loss, target::Number, output::Number) -> Tuple`

Return the results of `value` and `deriv` as a tuple, in which the first element is the value and the second element the derivative.

In some cases this function can yield better performance, because the losses can make use of shared variables when computing the results. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> value_deriv(L2DistLoss(), -1.0, 3.0)
(16.0, 8.0)
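Why a fused computation can save work is easy to see for the squared distance: both results share the residual. The sketch below is a toy illustration of the idea, not the package's code.

```julia
# Why a fused value/derivative can be cheaper: both results of the squared
# distance share the residual. A toy sketch, not the package's implementation.
function my_value_deriv(target, output)
    r = output - target      # shared intermediate computed once
    return (r * r, 2r)
end

println(my_value_deriv(-1.0, 3.0))   # (16.0, 8.0)
```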
Computing the 2nd Derivatives
In addition to the first derivative, we also provide the corresponding methods for the second derivative through the function `deriv2`. Note again that we always compute the derivative with respect to the predicted `output`.
LearnBase.deriv2 — Method. `deriv2(loss, target::Number, output::Number) -> Number`

Compute the second derivative for the loss function (denoted by the parameter `loss`) with respect to the `output`. Note that `target` and `output` can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we want to compute the second derivative with.
- `target::Number`: The ground truth $y \in Y$ of the observation.
- `output::Number`: The predicted output $\hat{y} \in \mathbb{R}$ for the observation.
Examples
# loss y ŷ
julia> deriv2(LogitDistLoss(), -0.5, 0.3)
0.42781939304058886
julia> deriv2(LogitMarginLoss(), -1f0, 2f0)
0.104993574f0
Just like `deriv` and `value`, this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.
julia> deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
0.42781939304058886
0.3747397590950412
0.013296113341580313
julia> buffer = zeros(3); # preallocate a buffer
julia> buffer .= deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
0.42781939304058886
0.3747397590950412
0.013296113341580313
Furthermore, `deriv2` supports all the same method signatures as `deriv` does.
LearnBase.deriv2 — Method. `deriv2(loss, targets::AbstractArray, outputs::AbstractArray)`

Compute the second derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually and return the result as an array of the appropriate size.
In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> deriv2(L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
2.0
2.0
LossFunctions.deriv2! — Method. `deriv2!(buffer::AbstractArray, loss, targets::AbstractArray, outputs::AbstractArray) -> buffer`

Compute the second derivative of the loss function with respect to the output for each index-pair in `targets` and `outputs` individually, and store them in the preallocated `buffer`. Note that `buffer` has to be of the appropriate size.

In the case that the two parameters, `targets` and `outputs`, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.
Note: This function should always be type-stable. If it isn't, you likely found a bug.
Arguments
- `buffer::AbstractArray`: Array to store the computed values in. Old values will be overwritten and lost.
- `loss::SupervisedLoss`: The loss function $L$ we are working with.
- `targets::AbstractArray`: The array of ground truths $\mathbf{y}$.
- `outputs::AbstractArray`: The array of predicted outputs $\mathbf{\hat{y}}$.
Examples
julia> buffer = zeros(3); # preallocate a buffer
julia> deriv2!(buffer, L2DistLoss(), [1.0, 2.0, 3.0], [2, 5, -2])
3-element Array{Float64,1}:
2.0
2.0
2.0
Function Closures
In some circumstances it may be convenient to have the loss function or its derivative as a proper Julia function. Instead of exporting special function names for every implemented loss (like `l2distloss(...)`), we provide the ability to generate a true function on the fly for any given loss.
LossFunctions.value_fun — Method. `value_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `value` for the given `loss`. This new function will support all the signatures that `value` does.
julia> f = value_fun(L2DistLoss());
julia> f(-1.0, 3.0) # computes the value of L2DistLoss
16.0
julia> f.([1.,2], [4,7])
2-element Array{Float64,1}:
9.0
25.0
LossFunctions.deriv_fun — Method. `deriv_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `deriv` for the given `loss`. This new function will support all the signatures that `deriv` does.
julia> g = deriv_fun(L2DistLoss());
julia> g(-1.0, 3.0) # computes the deriv of L2DistLoss
8.0
julia> g.([1.,2], [4,7])
2-element Array{Float64,1}:
6.0
10.0
LossFunctions.deriv2_fun — Method. `deriv2_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `deriv2` (i.e. second derivative) for the given `loss`. This new function will support all the signatures that `deriv2` does.
julia> g2 = deriv2_fun(L2DistLoss());
julia> g2(-1.0, 3.0) # computes the second derivative of L2DistLoss
2.0
julia> g2.([1.,2], [4,7])
2-element Array{Float64,1}:
2.0
2.0
LossFunctions.value_deriv_fun — Method. `value_deriv_fun(loss::SupervisedLoss) -> Function`

Returns a new function that computes the `value_deriv` for the given `loss`. This new function will support all the signatures that `value_deriv` does.
julia> fg = value_deriv_fun(L2DistLoss());
julia> fg(-1.0, 3.0) # computes the value and derivative of L2DistLoss
(16.0, 8.0)
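A closure in the spirit of `value_fun` can be sketched in plain Julia by capturing a loss instance in an anonymous function. `MyL2`, `myvalue`, and `make_value_fun` below are toy stand-ins for the package's types and functions.

```julia
# A closure in the spirit of `value_fun`: capture a loss instance in an
# anonymous function. `MyL2` and `myvalue` are toy stand-ins.
struct MyL2 end
myvalue(::MyL2, target, output) = abs2(output - target)

make_value_fun(loss) = (target, output) -> myvalue(loss, target, output)

f = make_value_fun(MyL2())
println(f(-1.0, 3.0))              # 16.0
println(f.([1.0, 2.0], [4, 7]))    # [9.0, 25.0]
```

Because the closure is just a regular Julia function, it broadcasts and composes like any other.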
Properties of a Loss
In some situations it can be quite useful to assert certain properties about a loss function. One such scenario could be when implementing an algorithm that requires the loss to be strictly convex or Lipschitz continuous. Note that we will only skim over the definitions in most cases. A good treatment of all of the concepts involved can be found in either [BOYD2004] or [STEINWART2008].
[BOYD2004]: Stephen Boyd and Lieven Vandenberghe. "Convex Optimization". Cambridge University Press, 2004.

[STEINWART2008]: Ingo Steinwart and Andreas Christmann. "Support Vector Machines". Springer Science & Business Media, 2008.
This package uses functions to represent individual properties of a loss. What follows is a list of implemented property functions defined in LearnBase.jl.
LearnBase.isconvex — Function. `isconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex if its domain is a convex set and if for all $x, y$ in that domain, with $\theta$ such that $0 \leq \theta \leq 1$, we have

$$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$$
Examples
julia> isconvex(LPDistLoss(0.5))
false
julia> isconvex(ZeroOneLoss())
false
julia> isconvex(L1DistLoss())
true
julia> isconvex(L2DistLoss())
true
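Such property queries make it easy for an algorithm to adapt itself to the loss it is given. The sketch below shows the idea with toy stand-ins of our own (`ToyL2Loss`, `my_isconvex`), not the package's actual traits.

```julia
# Sketch of how such property queries enable algorithm selection. The types
# and the `my_isconvex` predicate are toy stand-ins for the package's traits.
struct ToyL2Loss end
struct ToyZeroOneLoss end

my_isconvex(::ToyL2Loss) = true
my_isconvex(::ToyZeroOneLoss) = false

choose_solver(loss) = my_isconvex(loss) ? "gradient-based" : "derivative-free"
println(choose_solver(ToyL2Loss()))       # gradient-based
println(choose_solver(ToyZeroOneLoss()))  # derivative-free
```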
LearnBase.isstrictlyconvex — Function. `isstrictlyconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a strictly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is strictly convex if its domain is a convex set and if for all $x, y$ in that domain where $x \neq y$, with $\theta$ such that $0 < \theta < 1$, we have

$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$$
Examples
julia> isstrictlyconvex(L1DistLoss())
false
julia> isstrictlyconvex(LogitDistLoss())
true
julia> isstrictlyconvex(L2DistLoss())
true
LearnBase.isstronglyconvex — Function. `isstronglyconvex(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a strongly convex function. A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is $m$-strongly convex if its domain is a convex set and if $\forall x, y \in$ dom $f$ where $x \neq y$, and $\theta$ such that $0 \le \theta \le 1$, we have

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{m}{2} \theta (1 - \theta) \| x - y \|_2^2$$

In a more familiar setting, if the loss function is differentiable we have

$$\left( \nabla f(x) - \nabla f(y) \right)^\top (x - y) \ge m \| x - y \|_2^2$$
Examples
julia> isstronglyconvex(L1DistLoss())
false
julia> isstronglyconvex(LogitDistLoss())
false
julia> isstronglyconvex(L2DistLoss())
true
LearnBase.isdifferentiable — Function. `isdifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool`

Return `true` if the given `loss` is differentiable (optionally limited to the given point `x` if specified).

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is differentiable at a point $x \in$ int dom $f$, if there exists a matrix $Df(x) \in \mathbb{R}^{m \times n}$ such that it satisfies

$$\lim_{z \neq x, z \to x} \frac{\| f(z) - f(x) - Df(x) (z - x) \|_2}{\| z - x \|_2} = 0$$

A function is differentiable if its domain is open and it is differentiable at every point $x$.
Examples
julia> isdifferentiable(L1DistLoss())
false
julia> isdifferentiable(L1DistLoss(), 1)
true
julia> isdifferentiable(L2DistLoss())
true
LearnBase.istwicedifferentiable — Function. `istwicedifferentiable(loss::SupervisedLoss, [x::Number]) -> Bool`

Return `true` if the given `loss` is twice differentiable (optionally limited to the given point `x` if specified).

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is said to be twice differentiable at a point $x \in$ int dom $f$, if the derivative of $\nabla f$ exists at $x$.

A function is twice differentiable if its domain is open and it is twice differentiable at every point $x$.
Examples
julia> istwicedifferentiable(L1DistLoss())
false
julia> istwicedifferentiable(L1DistLoss(), 1)
true
julia> istwicedifferentiable(L2DistLoss())
true
LearnBase.islocallylipschitzcont — Function. `islocallylipschitzcont(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is locally Lipschitz continuous.

A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is called locally Lipschitz continuous if for all $a \ge 0$ there exists a constant $c_a \ge 0$ such that

$$\sup_{y \in Y} \left| L(y, t) - L(y, t') \right| \le c_a | t - t' |, \qquad t, t' \in [-a, a]$$

Every convex function is locally Lipschitz continuous.
Examples
julia> islocallylipschitzcont(ExpLoss())
true
julia> islocallylipschitzcont(SigmoidLoss())
true
LearnBase.islipschitzcont — Function. `islipschitzcont(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is Lipschitz continuous.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is Lipschitz continuous if there exists a finite constant $M < \infty$ such that

$$| L(y, t) - L(y, t') | \le M | t - t' |, \qquad \forall y \in Y, \; t, t' \in \mathbb{R}$$
Examples
julia> islipschitzcont(SigmoidLoss())
true
julia> islipschitzcont(ExpLoss())
false
LearnBase.isnemitski — Function. `isnemitski(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` denotes a Nemitski loss function.

We call a supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ a Nemitski loss if there exist a measurable function $b : Y \rightarrow [0, \infty)$ and an increasing function $h : [0, \infty) \rightarrow [0, \infty)$ such that

$$L(y, t) \le b(y) + h(| t |), \qquad (y, t) \in Y \times \mathbb{R}$$

If a loss is locally Lipschitz continuous, then it is a Nemitski loss.
LearnBase.isclipable — Function. `isclipable(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` function is clipable. A supervised loss $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ can be clipped at $M > 0$ if, for all $(y, t) \in Y \times \mathbb{R}$,

$$L(y, \hat{t}) \le L(y, t)$$

where $\hat{t}$ denotes the clipped value of $t$ at $\pm M$. That is

$$\hat{t} = \begin{cases} -M & \text{if } t < -M \\ t & \text{if } t \in [-M, M] \\ M & \text{if } t > M \end{cases}$$
Examples
julia> isclipable(ExpLoss())
false
julia> isclipable(L2DistLoss())
true
LearnBase.ismarginbased — Function. `ismarginbased(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` is a margin-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be margin-based if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying

$$L(y, \hat{y}) = \psi(y \cdot \hat{y}), \qquad (y, \hat{y}) \in Y \times \mathbb{R}$$
Examples
julia> ismarginbased(HuberLoss(2))
false
julia> ismarginbased(L2MarginLoss())
true
LearnBase.isclasscalibrated — Function. `isclasscalibrated(loss::SupervisedLoss) -> Bool`
LearnBase.isdistancebased — Function. `isdistancebased(loss::SupervisedLoss) -> Bool`

Return `true` if the given `loss` is a distance-based loss.

A supervised loss function $L : Y \times \mathbb{R} \rightarrow [0, \infty)$ is said to be distance-based if there exists a representing function $\psi : \mathbb{R} \rightarrow [0, \infty)$ satisfying $\psi(0) = 0$ and

$$L(y, \hat{y}) = \psi(\hat{y} - y), \qquad (y, \hat{y}) \in Y \times \mathbb{R}$$
Examples
julia> isdistancebased(HuberLoss(2))
true
julia> isdistancebased(L2MarginLoss())
false
LinearAlgebra.issymmetric — Function. `issymmetric(loss::SupervisedLoss) -> Bool`

Return `true` if the given loss is a symmetric loss.

A function $f : \mathbb{R} \rightarrow [0, \infty)$ is said to be symmetric about the origin if we have

$$f(x) = f(-x), \qquad \forall x \in \mathbb{R}$$

A distance-based loss is said to be symmetric if its representing function is symmetric.
Examples
julia> issymmetric(QuantileLoss(0.2))
false
julia> issymmetric(LPDistLoss(2))
true
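The distinction can be illustrated numerically with representing functions written out by hand. `psi_l2` mirrors the representing function of `L2DistLoss`, and `psi_pinball` a quantile (pinball) loss with $\tau = 0.2$; both are our own stand-ins, not the package's code.

```julia
# Numeric illustration of (a)symmetry via representing functions written out
# by hand: psi_l2 mirrors L2DistLoss, psi_pinball a tau = 0.2 quantile loss.
psi_l2(r) = abs2(r)
psi_pinball(r; tau = 0.2) = r >= 0 ? tau * r : (tau - 1) * r

println(psi_l2(1.5) == psi_l2(-1.5))            # true: symmetric
println(psi_pinball(1.5) == psi_pinball(-1.5))  # false: asymmetric
```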