Transforms

Below is the list of transforms that are available in this package.

Select

TableTransforms.Select - Type
Select(col₁, col₂, ..., colₙ)
Select([col₁, col₂, ..., colₙ])
Select((col₁, col₂, ..., colₙ))

The transform that selects columns col₁, col₂, ..., colₙ.

Select(col₁ => newcol₁, col₂ => newcol₂, ..., colₙ => newcolₙ)

Selects the columns col₁, col₂, ..., colₙ and renames them to newcol₁, newcol₂, ..., newcolₙ.

Select(regex)

Selects the columns that match with regex.

Examples

Select(1, 3, 5)
Select([:a, :c, :e])
Select(("a", "c", "e"))
Select(1 => :x, 3 => :y)
Select(:a => :x, :b => :y)
Select("a" => "x", "b" => "y")
Select(r"[ace]")

Reject

TableTransforms.Reject - Type
Reject(col₁, col₂, ..., colₙ)
Reject([col₁, col₂, ..., colₙ])
Reject((col₁, col₂, ..., colₙ))

The transform that discards columns col₁, col₂, ..., colₙ.

Reject(regex)

Discards the columns that match with regex.

Examples

Reject(:b, :d, :f)
Reject(["b", "d", "f"])
Reject((2, 4, 6))
Reject(r"[bdf]")

Satisfies

TableTransforms.Satisfies - Type
Satisfies(pred)

Selects the columns where pred(column) returns true.

Examples

Satisfies(allunique)
Satisfies(x -> sum(x) > 100)
Satisfies(x -> eltype(x) <: Integer)

Only

TableTransforms.Only - Function
Only(S)

Selects the columns that have scientific type S.

Examples

import DataScienceTraits as DST
Only(DST.Continuous)

Except

TableTransforms.Except - Function
Except(S)

Selects the columns that don't have scientific type S.

Examples

import DataScienceTraits as DST
Except(DST.Categorical)

Rename

TableTransforms.Rename - Type
Rename(:col₁ => :newcol₁, :col₂ => :newcol₂, ..., :colₙ => :newcolₙ)

The transform that renames col₁, col₂, ..., colₙ to newcol₁, newcol₂, ..., newcolₙ.

Examples

Rename(1 => :x, 3 => :y)
Rename(:a => :x, :c => :y)
Rename("a" => "x", "c" => "y")

StdNames

TableTransforms.StdNames - Type
StdNames(spec = :uppersnake)

Standardizes column names according to the given spec. Defaults to the :uppersnake case specification.

Specs

  • :uppersnake - Upper Snake Case, e.g. COLUMN_NAME
  • :uppercamel - Upper Camel Case, e.g. ColumnName
  • :upperflat - Upper Flat Case, e.g. COLUMNNAME
  • :snake - Snake Case, e.g. column_name
  • :camel - Camel Case, e.g. columnName
  • :flat - Flat Case, e.g. columnname

StdFeats

TableTransforms.StdFeats - Type
StdFeats()

Standardizes the columns of the table based on scientific types:

  • Continuous: ZScore
  • Categorical: Identity
  • Unknown: Identity

Sort

TableTransforms.Sort - Type
Sort(col₁, col₂, ..., colₙ; kwargs...)
Sort([col₁, col₂, ..., colₙ]; kwargs...)
Sort((col₁, col₂, ..., colₙ); kwargs...)

Sorts the rows of the table by the selected columns col₁, col₂, ..., colₙ, forwarding the kwargs to the sortperm function.

Sort(regex; kwargs...)

Sort the rows of columns that match with regex.

Examples

Sort(:a)
Sort(:a, :c, rev=true)
Sort([1, 3, 5], by=row -> abs.(row))
Sort(("a", "c", "e"))
Sort(r"[ace]")
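
As a rough illustration of how keyword arguments reach sortperm, the sketch below sorts a plain NamedTuple of columns by a single column using only Base Julia. The helper sortrows and the toy table are hypothetical names made up for this example; they are not part of the package.

```julia
# Toy column table; names and values are made up for illustration.
table = (a = [3, 1, 2], b = ["x", "y", "z"])

# Hypothetical helper: reorder every column by the permutation of one column,
# forwarding keyword arguments (e.g. rev, by) to sortperm.
function sortrows(table, col; kwargs...)
    perm = sortperm(table[col]; kwargs...)
    map(c -> c[perm], table)
end

asc = sortrows(table, :a)             # ascending order of column a
desc = sortrows(table, :a; rev=true)  # descending order of column a
```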

Sample

TableTransforms.Sample - Type
Sample(size, [weights]; replace=true, ordered=false, rng=GLOBAL_RNG)

Samples size rows of the table using weights, with or without replacement depending on the option replace. The option ordered can be used to return the samples in the same order as in the original table.

Examples

Sample(1000)
Sample(1000, replace=false)
Sample(1000, replace=false, ordered=true)

# with rng
using Random
rng = MersenneTwister(2)
Sample(1000, rng=rng)

# with weights
Sample(10, rand(100))

Filter

TableTransforms.Filter - Type
Filter(pred)

Filters the table returning only the rows where the predicate pred is true.

Examples

Filter(row -> sum(row) > 10)
Filter(row -> row.a == true && row.b < 30)
Filter(row -> row."a" == true && row."b" < 30)
Filter(row -> row[1] == true && row[2] < 30)
Filter(row -> row[:a] == true && row[:b] < 30)
Filter(row -> row["a"] == true && row["b"] < 30)

Notes

  • The schema of the table is preserved by the transform.
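
To make the row-wise predicate concrete, here is a minimal sketch that applies the same kind of predicate to a vector of NamedTuple rows with Base.filter. The rows are made-up data, and the package itself may iterate rows differently.

```julia
# Made-up rows for illustration; each row exposes its columns as fields.
rows = [(a = true, b = 10), (a = false, b = 20), (a = true, b = 40)]

# Same shape of predicate as in the examples above.
pred = row -> row.a == true && row.b < 30

# Keep only the rows for which the predicate holds.
kept = filter(pred, rows)
```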

DropMissing

TableTransforms.DropMissing - Type
DropMissing()
DropMissing(:)

Drop all rows with missing values in table.

DropMissing(col₁, col₂, ..., colₙ)
DropMissing([col₁, col₂, ..., colₙ])
DropMissing((col₁, col₂, ..., colₙ))

Drop all rows with missing values in selected columns col₁, col₂, ..., colₙ.

DropMissing(regex)

Drop all rows with missing values in columns that match with regex.

Examples

DropMissing()
DropMissing("b", "c", "e")
DropMissing([2, 3, 5])
DropMissing((:b, :c, :e))
DropMissing(r"[bce]")

Notes

  • The transform can alter the element type of columns from Union{Missing,T} to T.
  • If the transformed column has only missing values, it will be converted to an empty column of type Any.

DropExtrema

TableTransforms.DropExtrema - Type
DropExtrema(col; low=0.25, high=0.75)

Drops the rows where the values in the column col are outside the interval [quantile(col, low), quantile(col, high)].

Examples

DropExtrema(1)
DropExtrema(:a, low=0.2, high=0.8)
DropExtrema("a", low=0.3, high=0.7)
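
The interval rule can be sketched on a single vector with the Statistics standard library. This is only an illustration of the formula above with made-up values; the variable names are assumptions and the package operates on table rows rather than on a bare vector.

```julia
using Statistics

# Made-up column values for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

low, high = 0.25, 0.75
xl, xh = quantile(x, low), quantile(x, high)

# Keep only the rows whose value lies inside [xl, xh].
keep = findall(v -> xl <= v <= xh, x)
kept = x[keep]
```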

DropUnits

TableTransforms.DropUnits - Type
DropUnits()
DropUnits(:)

Drop units from all columns in the table.

DropUnits(col₁, col₂, ..., colₙ)
DropUnits([col₁, col₂, ..., colₙ])
DropUnits((col₁, col₂, ..., colₙ))

Drop units from selected columns col₁, col₂, ..., colₙ.

DropUnits(regex)

Drop units from columns that match with regex.

Examples

DropUnits()
DropUnits([2, 3, 5])
DropUnits([:b, :c, :e])
DropUnits(("b", "c", "e"))
DropUnits(r"[bce]")

AbsoluteUnits

TableTransforms.AbsoluteUnits - Type
AbsoluteUnits()
AbsoluteUnits(:)

Converts the units of all columns in the table to absolute units.

AbsoluteUnits(col₁, col₂, ..., colₙ)
AbsoluteUnits([col₁, col₂, ..., colₙ])
AbsoluteUnits((col₁, col₂, ..., colₙ))

Converts the units of selected columns col₁, col₂, ..., colₙ to absolute units.

AbsoluteUnits(regex)

Converts the units of columns that match with regex to absolute units.

Examples

AbsoluteUnits()
AbsoluteUnits([2, 3, 5])
AbsoluteUnits([:b, :c, :e])
AbsoluteUnits(("b", "c", "e"))
AbsoluteUnits(r"[bce]")

Map

TableTransforms.Map - Type
Map(cols₁ => fun₁ => target₁, cols₂ => fun₂, ..., colsₙ => funₙ => targetₙ)

Applies the funᵢ function to the columns selected by colsᵢ using the map function and saves the result in a new column named targetᵢ.

The column selection can be a single column identifier (index or name), a collection of identifiers or a regular expression (regex).

Passing a target column name is optional and when omitted a new name is generated by joining the selected column names with the function name. If the target column already exists in the table, the original column will be replaced.

Examples

Map(1 => sin)
Map(:a => sin, "b" => cos => :b_cos)
Map([2, 3] => ((b, c) -> 2b + c))
Map([:a, :c] => ((a, c) -> 2a * 3c) => :col1)
Map(["c", "a"] => ((c, a) -> 3c / a) => :col1, "c" => tan)
Map(r"[abc]" => ((a, b, c) -> a^2 - 2b + c) => "col1")

Notes

  • Anonymous functions must be passed with parentheses as in the examples above.
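
The following sketch mimics the effect of one cols => fun => target entry with Base.map on a NamedTuple of columns. The toy table is made up, and storing the result next to the existing columns via merge is an assumption based on the description above.

```julia
# Toy column table; names and values are made up for illustration.
table = (a = [1, 2, 3], b = [10, 20, 30], c = [1, 1, 1])

# In the spirit of Map([:b, :c] => ((b, c) -> 2b + c) => :col1):
# the function receives one value per selected column, row by row.
fun = (b, c) -> 2b + c
col1 = map(fun, table.b, table.c)

# Store the result under the target name; merge would also overwrite
# an existing column with the same name.
newtable = merge(table, (col1 = col1,))
```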

Replace

TableTransforms.Replace - Type
Replace(cols₁ => pred₁ => new₁, pred₂ => new₂, ..., colsₙ => predₙ => newₙ)

Replaces all values where the predᵢ predicate returns true with the newᵢ value in the columns selected by colsᵢ.

Passing a column selection is optional and when omitted all columns in the table will be selected. The column selection can be a single column identifier (index or name), a collection of identifiers, or a regular expression (regex).

The predicate can be a function that accepts a single argument and returns a boolean, or a value. If the predicate is a value, it will be transformed into the following function: x -> x === value.

Examples

Replace(1 => -1, 5 => -5)
Replace(2 => 0.0 => 1.5, 5.0 => 5.5)
Replace(:b => 0.0 => 1.5, 5.0 => 5.5)
Replace("b" => 0.0 => 1.5, 5.0 => 5.5)
Replace([1, 3] => >(5) => 5)
Replace([:a, :c] => isequal(2) => -2)
Replace(["a", "c"] => (x -> 4 < x < 6) => 0)
Replace(r"[abc]" => (x -> isodd(x) && x > 10) => 2)

Notes

  • Anonymous functions must be passed with parentheses as in the examples above.
  • Replacements are applied in the sequence in which they are defined. Therefore, if there is more than one replacement for the same column, the first valid one will be applied.
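
The two rules above (values become x -> x === value predicates, and the first matching replacement wins) can be sketched in plain Julia. The helpers aspred and replaceone are hypothetical names for this illustration and do not come from the package.

```julia
# Turn a value into a predicate as described above; functions pass through unchanged.
aspred(p::Function) = p
aspred(v) = x -> x === v

# Replacement rules in definition order (pred => new); values are made up.
rules = [aspred(>(5)) => 5, aspred(2) => -2, aspred(iseven) => 0]

# Apply the first rule whose predicate holds; otherwise keep the value.
function replaceone(x, rules)
    for (pred, val) in rules
        pred(x) && return val
    end
    x
end

column = [1, 2, 4, 8]
result = [replaceone(x, rules) for x in column]
```

Note that 2 is even but is caught by the earlier value rule, and 8 is even but is caught by the earlier >(5) rule.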

Coalesce

TableTransforms.Coalesce - Type
Coalesce(; value)

Replaces all missing values from the table with value.

Coalesce(col₁, col₂, ..., colₙ; value)
Coalesce([col₁, col₂, ..., colₙ]; value)
Coalesce((col₁, col₂, ..., colₙ); value)

Replaces all missing values from the columns col₁, col₂, ..., colₙ with value.

Coalesce(regex; value)

Replaces all missing values from the columns that match with regex with value.

Examples

Coalesce(value=0)
Coalesce(1, 3, 5, value=1)
Coalesce([:a, :c, :e], value=2)
Coalesce(("a", "c", "e"), value=3)
Coalesce(r"[ace]", value=4)

Notes

  • The transform can alter the element type of columns from Union{Missing,T} to T.
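
The element type change mentioned in the note can be observed with Base.coalesce on a single vector. This is a sketch of the idea on made-up data, not the package's internal code.

```julia
# Made-up column with missing values; its element type is Union{Missing,Int}.
column = [1, missing, 3, missing]

# Replace each missing with the chosen value; broadcasting Base.coalesce
# narrows the element type from Union{Missing,Int} to Int.
filled = coalesce.(column, 0)
```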

Coerce

TableTransforms.Coerce - Type
Coerce(col₁ => S₁, col₂ => S₂, ..., colₙ => Sₙ)

Return a copy of the table, ensuring that the scientific types of the columns match the new specification.

This transform uses the DataScienceTraits.coerce function. Please see its docstring for more details.

Examples

import DataScienceTraits as DST
Coerce(1 => DST.Continuous, 2 => DST.Continuous)
Coerce(:a => DST.Continuous, :b => DST.Continuous)
Coerce("a" => DST.Continuous, "b" => DST.Continuous)

Levels

TableTransforms.Levels - Type
Levels(col₁ => levels₁, col₂ => levels₂, ..., colₙ => levelsₙ; ordered=nothing)

Convert columns col₁, col₂, ..., colₙ to categorical arrays with given levels levels₁, levels₂, ..., levelsₙ. Optionally, specify which columns are ordered.

Examples

Levels(1 => 1:3, 2 => ["a", "b"], ordered=r"a")
Levels(:a => 1:3, :b => ["a", "b"], ordered=[:a])
Levels("a" => 1:3, "b" => ["a", "b"], ordered=["b"])

Indicator

TableTransforms.Indicator - Type
Indicator(col; k=10, scale=:quantile, categ=false)

Transforms a continuous variable into k indicator variables defined by half-intervals of col values in a given scale. Optionally, specify the categ option to return binary categorical values as opposed to raw 1s and 0s.

Given a sequence of increasing threshold values t1 < t2 < ... < tk, the indicator transform converts a continuous variable Z into a sequence of k variables Z_1 = Z <= t1, Z_2 = Z <= t2, ..., Z_k = Z <= tk.

Scales:

  • :quantile - threshold values are calculated using the quantile(Z, p) function with a linear range of probabilities.
  • :linear - threshold values are calculated using a linear range.
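
A minimal sketch of the :quantile scale on a single vector is shown below. The choice of probabilities (1/k, 2/k, ..., 1) is an assumption about what "a linear range of probabilities" means here, and the data is made up.

```julia
using Statistics

# Made-up continuous variable.
z = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
k = 3

# Assumed probabilities for the :quantile scale: 1/k, 2/k, ..., 1.
ps = (1:k) ./ k
ts = quantile(z, ps)

# Indicator variables Z_i = (Z <= t_i), one Boolean vector per threshold.
indicators = [z .<= t for t in ts]
```

With this choice the last threshold is the maximum of z, so the last indicator is true for every row.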

Examples

Indicator(1, k=3)
Indicator(:a, k=6, scale=:linear)
Indicator("a", k=9, scale=:linear, categ=true)

OneHot

TableTransforms.OneHot - Type
OneHot(col; categ=false)

Transforms categorical column col into one-hot columns of levels returned by the levels function of CategoricalArrays.jl. The categ option can be used to convert resulting columns to categorical arrays as opposed to boolean vectors.

Examples

OneHot(1)
OneHot(:a)
OneHot("a")
OneHot("a", categ=true)

Identity

Center

TableTransforms.Center - Type
Center()

Applies the center transform to all columns of the table. The center transform of the column x, with mean μ, is defined by x .- μ.

Center(col₁, col₂, ..., colₙ)
Center([col₁, col₂, ..., colₙ])
Center((col₁, col₂, ..., colₙ))

Applies the Center transform on columns col₁, col₂, ..., colₙ.

Center(regex)

Applies the Center transform on columns that match with regex.

Examples

Center(1, 3, 5)
Center([:a, :c, :e])
Center(("a", "c", "e"))
Center(r"[ace]")

Scale

TableTransforms.Scale - Type
Scale(; low=0.25, high=0.75)

Applies the Scale transform to all columns of the table. The scale transform of the column x is defined by (x .- xl) ./ (xh - xl), where xl = quantile(x, low) and xh = quantile(x, high).

Scale(col₁, col₂, ..., colₙ; low=0.25, high=0.75)
Scale([col₁, col₂, ..., colₙ]; low=0.25, high=0.75)
Scale((col₁, col₂, ..., colₙ); low=0.25, high=0.75)

Applies the Scale transform on columns col₁, col₂, ..., colₙ.

Scale(regex; low=0.25, high=0.75)

Applies the Scale transform on columns that match with regex.

Examples

Scale()
Scale(low=0, high=1)
Scale(low=0.3, high=0.7)
Scale(1, 3, 5, low=0, high=1)
Scale([:a, :c, :e], low=0.3, high=0.7)
Scale(("a", "c", "e"), low=0.25, high=0.75)
Scale(r"[ace]", low=0.3, high=0.7)
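
The Scale formula can be checked on a single vector with the Statistics standard library. The helper scalecol is a hypothetical name for this sketch, and the data is made up.

```julia
using Statistics

# Made-up column values.
x = [2.0, 4.0, 6.0, 8.0, 10.0]

# Formula from the text: (x .- xl) ./ (xh - xl) with quantile-based bounds.
function scalecol(x; low=0.25, high=0.75)
    xl, xh = quantile(x, low), quantile(x, high)
    (x .- xl) ./ (xh - xl)
end

iqr = scalecol(x)                      # default bounds: 0.25 and 0.75 quantiles
mm = scalecol(x, low=0.0, high=1.0)    # minimum maps to 0, maximum maps to 1
```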

MinMax

TableTransforms.MinMax - Function
MinMax()

Applies the MinMax transform to all columns of the table. The MinMax transform is equivalent to Scale(low=0, high=1).

MinMax(col₁, col₂, ..., colₙ)
MinMax([col₁, col₂, ..., colₙ])
MinMax((col₁, col₂, ..., colₙ))

Applies the MinMax transform on columns col₁, col₂, ..., colₙ.

MinMax(regex)

Applies the MinMax transform on columns that match with regex.

Examples

MinMax(1, 3, 5)
MinMax([:a, :c, :e])
MinMax(("a", "c", "e"))
MinMax(r"[ace]")

See also Scale.


Interquartile

TableTransforms.Interquartile - Function
Interquartile()

Applies the Interquartile transform to all columns of the table. The Interquartile transform is equivalent to Scale(low=0.25, high=0.75).

Interquartile(col₁, col₂, ..., colₙ)
Interquartile([col₁, col₂, ..., colₙ])
Interquartile((col₁, col₂, ..., colₙ))

Applies the Interquartile transform on columns col₁, col₂, ..., colₙ.

Interquartile(regex)

Applies the Interquartile transform on columns that match with regex.

Examples

Interquartile(1, 3, 5)
Interquartile([:a, :c, :e])
Interquartile(("a", "c", "e"))
Interquartile(r"[ace]")

See also Scale.


ZScore

TableTransforms.ZScore - Type
ZScore()

Applies the z-score transform (a.k.a. normal score) to all columns of the table. The z-score transform of the column x, with mean μ and standard deviation σ, is defined by (x .- μ) ./ σ.

ZScore(col₁, col₂, ..., colₙ)
ZScore([col₁, col₂, ..., colₙ])
ZScore((col₁, col₂, ..., colₙ))

Applies the ZScore transform on columns col₁, col₂, ..., colₙ.

ZScore(regex)

Applies the ZScore transform on columns that match with regex.

Examples

ZScore(1, 3, 5)
ZScore([:a, :c, :e])
ZScore(("a", "c", "e"))
ZScore(r"[ace]")
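
For reference, the centering and z-score formulas can be evaluated directly on one vector. The sketch assumes the sample standard deviation returned by Statistics.std; the values are made up.

```julia
using Statistics

# Made-up column values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
μ, σ = mean(x), std(x)

centered = x .- μ      # Center formula: x .- μ
z = (x .- μ) ./ σ      # ZScore formula: (x .- μ) ./ σ
```

After the z-score the column has mean zero and unit standard deviation, up to floating-point error.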

Quantile

TableTransforms.Quantile - Type
Quantile(; dist=Normal())

Applies the quantile transform, which maps the values of each column to the given distribution dist, to all columns of the table.

Quantile(col₁, col₂, ..., colₙ; dist=Normal())
Quantile([col₁, col₂, ..., colₙ]; dist=Normal())
Quantile((col₁, col₂, ..., colₙ); dist=Normal())

Applies the Quantile transform on columns col₁, col₂, ..., colₙ.

Quantile(regex; dist=Normal())

Applies the Quantile transform on columns that match with regex.

Examples

using Distributions

Quantile()
Quantile(dist=Normal())
Quantile(1, 3, 5, dist=Beta())
Quantile([:a, :c, :e], dist=Gamma())
Quantile(("a", "c", "e"), dist=Beta())
Quantile(r"[ace]", dist=Normal())

Functional

TableTransforms.Functional - Type
Functional(fun)

The transform that applies the function fun elementwise.

Functional(col₁ => fun₁, col₂ => fun₂, ..., colₙ => funₙ)

Apply the corresponding funᵢ function to each colᵢ column.

Examples

Functional(exp)
Functional(log)
Functional(1 => exp, 2 => log)
Functional(:a => exp, :b => log)
Functional("a" => exp, "b" => log)
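
The per-column form amounts to broadcasting each function over its column. A minimal sketch with NamedTuples follows; the table and the pairing of functions to columns are made up for illustration.

```julia
# Toy column table and one function per column (made-up data).
table = (a = [0.0, 1.0], b = [1.0, exp(1.0)])
funs = (a = exp, b = log)

# Broadcast each function over the column with the same name.
result = map((f, col) -> f.(col), funs, table)
```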

EigenAnalysis

TableTransforms.EigenAnalysis - Type
EigenAnalysis(proj; [maxdim], [pratio])

The eigenanalysis of the covariance with a given projection proj. Optionally specify the maximum number of dimensions in the output maxdim and the percentage of variance to retain pratio.

Projections

  • :V - Uncorrelated variables (PCA transform)
  • :VD - Uncorrelated variables and variance one (DRS transform)
  • :VDV - Uncorrelated variables and variance one (SDS transform)

The :V projection used in the PCA transform projects the data onto the eigenvectors V of the covariance matrix.

The :VD projection is used in the DRS transform. It is similar to the :V projection, but the eigenvectors are multiplied by the inverse square root of the eigenvalues D.

The :VDV projection is used in the SDS transform. It is similar to the :VD projection, but the data is projected back to the basis of the original variables using the Vᵀ matrix.

See https://geostatisticslessons.com/lessons/sphereingmaf for more details about these three variants of eigenanalysis.
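
The three projections can be sketched with the LinearAlgebra and Statistics standard libraries. The data matrix is made up, and the scaling by 1 ./ sqrt.(λ) is an assumption: it is the factor that yields the unit variance stated for :VD and :VDV. The package may organize the computation differently.

```julia
using LinearAlgebra, Statistics

# Made-up data matrix: rows are samples, columns are variables.
X = [2.0 0.5; 1.0 1.5; 3.0 2.5; 4.0 2.0; 5.0 4.0]
Y = X .- mean(X, dims=1)               # center the columns

Σ = cov(Y)
λ, V = eigen(Symmetric(Σ))             # eigenvalues λ and eigenvectors V

S_V   = V                              # :V projection
S_VD  = V * Diagonal(1 ./ sqrt.(λ))    # :VD projection (assumed scaling)
S_VDV = S_VD * V'                      # :VDV projection

Z_V, Z_VD, Z_VDV = Y * S_V, Y * S_VD, Y * S_VDV
```

In this sketch cov(Z_V) is diagonal with the eigenvalues on the diagonal, while cov(Z_VD) and cov(Z_VDV) are approximately the identity matrix.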

Examples

EigenAnalysis(:V)
EigenAnalysis(:VD)
EigenAnalysis(:VDV)
EigenAnalysis(:V, maxdim=3)
EigenAnalysis(:VD, pratio=0.99)
EigenAnalysis(:VDV, maxdim=3, pratio=0.99)

PCA

TableTransforms.PCA - Function
PCA([options])

Principal component analysis.

See EigenAnalysis for a detailed description of the available options.

Examples

PCA(maxdim=2)
PCA(pratio=0.86)
PCA(maxdim=2, pratio=0.86)

Notes

  • PCA() is a shortcut for ZScore() → EigenAnalysis(:V).

DRS

TableTransforms.DRS - Function
DRS([options])

Dimension reduction sphering.

See EigenAnalysis for a detailed description of the available options.

Examples

DRS(maxdim=3)
DRS(pratio=0.87)
DRS(maxdim=3, pratio=0.87)

Notes

  • DRS() is a shortcut for ZScore() → EigenAnalysis(:VD).

SDS

TableTransforms.SDS - Function
SDS([options])

Standardized data sphering.

See EigenAnalysis for a detailed description of the available options.

Examples

SDS()
SDS(maxdim=4)
SDS(pratio=0.88)
SDS(maxdim=4, pratio=0.88)

Notes

  • SDS() is a shortcut for ZScore() → EigenAnalysis(:VDV).

ProjectionPursuit

TableTransforms.ProjectionPursuit - Type
ProjectionPursuit(; tol=1e-6, maxiter=100, deg=5, perc=0.9, n=100, rng=Random.GLOBAL_RNG)

The projection pursuit multivariate transform converts any multivariate distribution into the standard multivariate Gaussian distribution.

This iterative algorithm repeatedly finds a direction of projection α that maximizes a score of non-Gaussianity known as the projection index I(α). The samples projected along α are then transformed with the Quantile transform to remove the non-Gaussian structure. The other coordinates in the rotated orthonormal basis Q = [α ...] are left untouched.

The non-singularity of Q is controlled by ensuring that norm(det(Q)) ≥ tol. The iterative process terminates whenever the transformed samples are "more Gaussian" than a fraction perc of n randomly generated samples from the standard multivariate Gaussian distribution, or when the number of iterations reaches the maximum maxiter.

Examples

ProjectionPursuit()
ProjectionPursuit(deg=10)
ProjectionPursuit(perc=0.85, n=50)
ProjectionPursuit(tol=1e-4, maxiter=250, deg=5, perc=0.95, n=100)

# with rng
using Random
rng = MersenneTwister(2)
ProjectionPursuit(perc=0.85, n=50, rng=rng)

See https://doi.org/10.2307/2289161 for further details.


Closure

TableTransforms.Closure - Type
Closure()

The transform that applies the closure operation (i.e. x ./ sum(x)) to all rows of the input table. The rows of the output table sum to one.

See also Remainder.
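
The closure operation can be illustrated on a small matrix whose rows play the role of table rows. The values are made up, and this is a sketch of the formula rather than the package's code.

```julia
# Made-up table stored as a matrix: one row per sample.
X = [1.0 3.0; 2.0 2.0; 5.0 15.0]

# Closure: divide every row by its own sum.
C = X ./ sum(X, dims=2)
```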


Remainder

TableTransforms.Remainder - Type
Remainder([total])

The transform that takes a table with columns x₁, x₂, …, xₙ and returns a new table with an additional column containing the remainder value xₙ₊₁ = total .- (x₁ + x₂ + ⋯ + xₙ). If the total value is not specified, it defaults to the maximum sum across rows.

See also Closure.
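
A small sketch of the remainder column, using the default total described above (the maximum row sum). The matrix stands in for a table and its values are made up.

```julia
# Made-up table stored as a matrix: one row per sample.
X = [1.0 3.0; 2.0 2.0; 5.0 15.0]

rowsums = vec(sum(X, dims=2))
total = maximum(rowsums)          # default when no total is given
remainder = total .- rowsums

# New table with the remainder appended as the last column.
R = hcat(X, remainder)
```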


Compose

TableTransforms.Compose - Type
Compose(; as=:CODA)

Converts all columns of the table into parts of a composition in a new column named as, using the CoDa.compose function.

Compose(col₁, col₂, ..., colₙ; as=:CODA)
Compose([col₁, col₂, ..., colₙ]; as=:CODA)
Compose((col₁, col₂, ..., colₙ); as=:CODA)

Converts the selected columns col₁, col₂, ..., colₙ into parts of a composition.

Compose(regex; as=:CODA)

Converts the columns that match with regex into parts of a composition.

Examples

Compose(as=:comp)
Compose([2, 3, 5])
Compose([:b, :c, :e])
Compose(("b", "c", "e"))
Compose(r"[bce]", as="COMP")

ALR

TableTransforms.ALR - Type
ALR([refvar])

Additive log-ratio transform.

Optionally, specify the reference variable refvar for the ratios. Defaults to the last column of the input table.
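
For a single row, the additive log-ratio with the last part as reference can be sketched as follows. The composition is made up, and the sketch uses the standard textbook definition rather than the package's implementation.

```julia
# Made-up composition row; the last part is the reference variable.
x = [0.2, 0.3, 0.5]

# Log of each remaining part divided by the reference part.
alr = log.(x[1:end-1] ./ x[end])
```

The result has one fewer entry than the input, since the reference part is dropped.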


CLR

ILR

TableTransforms.ILR - Type
ILR([refvar])

Isometric log-ratio transform.

Optionally, specify the reference variable refvar for the ratios. Defaults to the last column of the input table.


RowTable

ColTable