Transforms

Below is the list of transforms that are available in this package.

Assert

TableTransforms.Assert — Type

Assert(; cond, msg="")

Asserts all columns of the table by throwing an AssertionError(msg) if cond(column) returns false, otherwise returns the input table.

The msg argument can be a string, or a function that receives the column name and returns a string, e.g.: nm -> "error in column $nm".

Assert(col₁, col₂, ..., colₙ; cond, msg="")
Assert([col₁, col₂, ..., colₙ]; cond, msg="")
Assert((col₁, col₂, ..., colₙ); cond, msg="")

Asserts the selected columns col₁, col₂, ..., colₙ.

Assert(regex; cond, msg="")

Asserts the columns that match with regex.

Examples

Assert(cond=allunique, msg="assertion error")
Assert([2, 3, 5], cond=x -> sum(x) > 100)
Assert([:b, :c, :e], cond=x -> eltype(x) <: Integer)
Assert(("b", "c", "e"), cond=allunique, msg=nm -> "error in column $nm")
Assert(r"[bce]", cond=x -> sum(x) > 100)
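
The semantics described above can be pictured with a minimal sketch in plain Julia. It does not call the package: assertcols is a hypothetical helper, and the table is assumed to be a NamedTuple of column vectors.

```julia
# Hypothetical helper mimicking the documented semantics of Assert;
# it is not part of TableTransforms.jl.
function assertcols(table::NamedTuple; cond, msg="")
  for (nm, col) in pairs(table)
    # msg may be a string or a function of the column name
    text = msg isa AbstractString ? msg : msg(nm)
    cond(col) || throw(AssertionError(text))
  end
  table  # the input table is returned unchanged when all checks pass
end

table = (a=[1, 2, 3], b=[4, 4, 5])

# Passes for both columns and returns the table
result = assertcols(table, cond=x -> sum(x) > 5)

# Column b has a repeated value, so the following call would throw
# an AssertionError with the message "error in column b":
# assertcols(table, cond=allunique, msg=nm -> "error in column $nm")
```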

StdNames

TableTransforms.StdNames — Type

StdNames(spec = :uppersnake)

Standardizes column names according to the given spec. Defaults to the :uppersnake case specification.

Specs

:uppersnake - Upper Snake Case, e.g. COLUMN_NAME
:uppercamel - Upper Camel Case, e.g. ColumnName
:upperflat - Upper Flat Case, e.g. COLUMNNAME
:snake - Snake Case, e.g. column_name
:camel - Camel Case, e.g. columnName
:flat - Flat Case, e.g. columnname
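
As a rough illustration of the six specs, the sketch below converts a name made of space-separated words. The helper stdname and its word-splitting rule are assumptions for illustration; the package's own name parsing may differ.

```julia
# Hypothetical helper; words are split on any non-alphanumeric characters.
function stdname(name::AbstractString, spec::Symbol)
  words = lowercase.(split(name, r"[^A-Za-z0-9]+", keepempty=false))
  if spec == :uppersnake
    join(uppercase.(words), "_")
  elseif spec == :uppercamel
    join(uppercasefirst.(words))
  elseif spec == :upperflat
    join(uppercase.(words))
  elseif spec == :snake
    join(words, "_")
  elseif spec == :camel
    first(words) * join(uppercasefirst.(words[2:end]))
  elseif spec == :flat
    join(words)
  else
    throw(ArgumentError("unknown spec: $spec"))
  end
end

specs = (:uppersnake, :uppercamel, :upperflat, :snake, :camel, :flat)
names = [stdname("column name", spec) for spec in specs]
```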

StdFeats

TableTransforms.StdFeats — Type

StdFeats()

Standardizes the columns of the table based on scientific types:

Continuous: ZScore
Categorical: Identity
Unknown: Identity

Filter

TableTransforms.Filter — Type

Filter(pred)

Filters the table returning only the rows where the predicate pred is true.

Examples

Filter(row -> sum(row) > 10)
Filter(row -> row.a == true && row.b < 30)
Filter(row -> row."a" == true && row."b" < 30)
Filter(row -> row[1] == true && row[2] < 30)
Filter(row -> row[:a] == true && row[:b] < 30)
Filter(row -> row["a"] == true && row["b"] < 30)

Notes

The schema of the table is preserved by the transform.
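
The row predicates in the examples can be tried out without the package. In this sketch, rows are modelled as a vector of NamedTuples (an assumption for illustration) and filtered with Base.filter.

```julia
# Rows are modelled as NamedTuples; this is plain Julia, not the package API.
rows = [(a=true, b=10), (a=false, b=20), (a=true, b=40)]

# Same predicate style as Filter(row -> row.a == true && row.b < 30)
kept = filter(row -> row.a == true && row.b < 30, rows)

# Positional access also works on NamedTuples
kept2 = filter(row -> row[1] == true && row[2] < 30, rows)
```

Only the first row satisfies the predicate, and the element type of the result is the same as that of the input rows.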

DropMissing

TableTransforms.DropMissing — Type

DropMissing()
DropMissing(:)

Drop all rows with missing values in the table.

DropMissing(col₁, col₂, ..., colₙ)
DropMissing([col₁, col₂, ..., colₙ])
DropMissing((col₁, col₂, ..., colₙ))

Drop all rows with missing values in selected columns col₁, col₂, ..., colₙ.

DropMissing(regex)

Drop all rows with missing values in columns that match with regex.

Examples

DropMissing()
DropMissing("b", "c", "e")
DropMissing([2, 3, 5])
DropMissing((:b, :c, :e))
DropMissing(r"[bce]")

Notes

The transform can alter the element type of columns from Union{Missing,T} to T.
If the transformed column has only missing values, it will be converted to an empty column of type Any.
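
The change of element type mentioned in the notes can be reproduced with Base functions alone. This is a sketch on two standalone column vectors, not a call to the transform itself.

```julia
b = [1, missing, 3]        # Vector{Union{Missing, Int}}
c = ["x", "y", missing]    # Vector{Union{Missing, String}}

# Keep only the rows where neither column is missing
keep = .!(ismissing.(b) .| ismissing.(c))

# skipmissing narrows the element type from Union{Missing,T} to T
newb = collect(skipmissing(b[keep]))
newc = collect(skipmissing(c[keep]))
```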

DropConstant

TableTransforms.DropConstant — Type

DropConstant()

Drops the constant columns using the allequal function.

Unitify

TableTransforms.Unitify — Type

Unitify()

Add units to columns of the table using bracket syntax. A column named col [unit] will be renamed to a unitful column col with a valid unit from Unitful.jl.

In the case that the unit is not recognized by Unitful.jl, no units are added. Empty brackets are also allowed to represent columns without units, e.g. col [].
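
The bracket syntax can be illustrated by splitting a column name into its base name and unit string. The helper splitunit and its regular expression are hypothetical; how the package actually parses names and looks up units in Unitful.jl is not shown here.

```julia
# Hypothetical helper: split "col [unit]" into a name and a unit string.
function splitunit(name::AbstractString)
  m = match(r"^(.*?)\s*\[(.*)\]\s*$", name)
  m === nothing ? (String(name), "") : (String(m[1]), String(m[2]))
end

withunit = splitunit("temp [K]")  # name and unit are separated
nounit   = splitunit("id []")     # empty brackets give an empty unit
plain    = splitunit("name")      # no brackets, name is kept as is
```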

Map

TableTransforms.Map — Type

Map(cols₁ => fun₁ => target₁, cols₂ => fun₂, ..., colsₙ => funₙ => targetₙ)

Applies the funᵢ function to the columns selected by colsᵢ using the map function and saves the result in a new column named targetᵢ.

The column selection can be a single column identifier (index or name), a collection of identifiers or a regular expression (regex).

Passing a target column name is optional and when omitted a new name is generated by joining the function name with the selected column names. If the target column already exists in the table, the original column will be replaced.

Examples

Map(1 => sin)
Map(:a => sin, "b" => cos => :cos_b)
Map([2, 3] => ((b, c) -> 2b + c))
Map([:a, :c] => ((a, c) -> 2a * 3c) => :col1)
Map(["c", "a"] => ((c, a) -> 3c / a) => :col1, "c" => tan)
Map(r"[abc]" => ((a, b, c) -> a^2 - 2b + c) => "col1")

Notes

Anonymous functions must be passed with parentheses, as in the examples above.
Some function names are treated in a special way when generating the target name:
- Anonymous functions: #1 -> f1
- Composed functions: outer ∘ inner -> outer_inner
- Base.Fix1 functions: Base.Fix1(f, x) -> fix1_f
- Base.Fix2 functions: Base.Fix2(f, x) -> fix2_f
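
Since the functions are applied with map, the values of a new column can be previewed with Base.map on plain vectors. The sketch below covers only the row-by-row evaluation; the generation of target names is not modelled.

```julia
a = [1, 2, 3]
c = [4, 5, 6]

# Analogue of Map([:a, :c] => ((a, c) -> 2a * 3c) => :col1):
# the function receives one value from each selected column per row
col1 = map((a, c) -> 2a * 3c, a, c)

# Analogue of Map(:a => sin): one column, one function
sines = map(sin, a)
```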

Replace

TableTransforms.Replace — Type

Replace(cols₁ => pred₁ => new₁, pred₂ => new₂, ..., colsₙ => predₙ => newₙ)

Replaces all values where predᵢ predicate returns true with newᵢ value in the columns selected by colsᵢ.

Passing a column selection is optional and when omitted all columns in the table will be selected. The column selection can be a single column identifier (index or name), a collection of identifiers, or a regular expression (regex).

The predicate can be a function that accepts a single argument and returns a boolean, or a value. If the predicate is a value, it will be transformed into the following function: x -> x === value.

Examples

Replace(1 => -1, 5 => -5)
Replace(2 => 0.0 => 1.5, 5.0 => 5.5)
Replace(:b => 0.0 => 1.5, 5.0 => 5.5)
Replace("b" => 0.0 => 1.5, 5.0 => 5.5)
Replace([1, 3] => >(5) => 5)
Replace([:a, :c] => isequal(2) => -2)
Replace(["a", "c"] => (x -> 4 < x < 6) => 0)
Replace(r"[abc]" => (x -> isodd(x) && x > 10) => 2)

Notes

Anonymous functions must be passed with parentheses as in the examples above.
Replacements are applied in the sequence in which they are defined. Therefore, if there is more than one replacement for the same column, the first valid one is applied.
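
The ordering rule can be sketched in plain Julia. The helpers aspred and replacefirst are hypothetical, and the rule is read here as applying per value, which is an assumption about the package's behaviour.

```julia
# A plain value is turned into the predicate x -> x === value
aspred(p) = p isa Function ? p : (x -> x === p)

# Hypothetical helper: apply the first rule whose predicate matches
function replacefirst(x, rules)
  for (pred, new) in rules
    aspred(pred)(x) && return new
  end
  x  # no rule matched, keep the original value
end

rules = (>(5) => 5, isodd => 0, 4 => -4)
col = [1, 4, 7, 10]
newcol = map(x -> replacefirst(x, rules), col)
```

The value 7 is both greater than 5 and odd; because the >(5) rule is listed first, it is replaced with 5 rather than 0.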

Coalesce

TableTransforms.Coalesce — Type

Coalesce(; value)

Replaces all missing values from the table with value.

Coalesce(col₁, col₂, ..., colₙ; value)
Coalesce([col₁, col₂, ..., colₙ]; value)
Coalesce((col₁, col₂, ..., colₙ); value)

Replaces all missing values from the columns col₁, col₂, ..., colₙ with value.

Coalesce(regex; value)

Replaces all missing values from the columns that match with regex with value.

Examples

Coalesce(value=0)
Coalesce(1, 3, 5, value=1)
Coalesce([:a, :c, :e], value=2)
Coalesce(("a", "c", "e"), value=3)
Coalesce(r"[ace]", value=4)

Notes

The transform can alter the element type of columns from Union{Missing,T} to T.
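
The effect on a single column, including the change of element type, matches what Base.coalesce does when broadcast over a vector. A minimal sketch without the package:

```julia
x = [1, missing, 3]    # Vector{Union{Missing, Int}}

# Every missing entry is replaced with the given value
y = coalesce.(x, 0)
```

Here y no longer allows missing values, so its element type is Int.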

Coerce

TableTransforms.Coerce — Type

Coerce(col₁ => S₁, col₂ => S₂, ..., colₙ => Sₙ)

Return a copy of the table, ensuring that the scientific types of the columns match the new specification.

Coerce(S)

Coerce all columns of the table with scientific type S.

This transform uses the DataScienceTraits.coerce function. Please see its docstring for more details.

Examples

using DataScienceTraits
Coerce(1 => Continuous, 2 => Continuous)
Coerce(:a => Continuous, :b => Continuous)
Coerce("a" => Continuous, "b" => Continuous)

Levels

TableTransforms.Levels — Type

Levels(col₁ => levels₁, col₂ => levels₂, ..., colₙ => levelsₙ; ordered=nothing)

Convert columns col₁, col₂, ..., colₙ to categorical arrays with given levels levels₁, levels₂, ..., levelsₙ. Optionally, specify which columns are ordered.

Examples

Levels(1 => 1:3, 2 => ["a", "b"], ordered=r"a")
Levels(:a => 1:3, :b => ["a", "b"], ordered=[:a])
Levels("a" => 1:3, "b" => ["a", "b"], ordered=["b"])

Indicator

TableTransforms.Indicator — Type

Indicator(col; k=10, scale=:quantile, categ=false)

Transforms a continuous variable into k indicator variables defined by half-intervals of col values in a given scale. Optionally, specify the categ option to return binary categorical values as opposed to raw 1s and 0s.

Given a sequence of increasing threshold values t1 < t2 < ... < tk, the indicator transform converts a continuous variable Z into a sequence of k variables Z_1 = Z <= t1, Z_2 = Z <= t2, ..., Z_k = Z <= tk.

Scales:

:quantile - threshold values are calculated using the quantile(Z, p) function with a linear range of probabilities.
:linear - threshold values are calculated using a linear range.

Examples

Indicator(1, k=3)
Indicator(:a, k=6, scale=:linear)
Indicator("a", k=9, scale=:linear, categ=true)
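
The definition Z_j = Z <= t_j can be sketched for the :quantile scale using the Statistics standard library. The probabilities (1:k) ./ k are an assumed choice of linear range; the exact range used by the package may differ.

```julia
using Statistics

z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = 3

# Thresholds from evenly spaced probabilities (assumed range)
probs = (1:k) ./ k
ts = quantile(z, probs)

# One indicator column per threshold: Z_j = Z <= t_j
indicators = [z .<= t for t in ts]
```

Because the thresholds are increasing, each indicator column contains at least as many true values as the previous one.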

OneHot

TableTransforms.OneHot — Type

OneHot(col; categ=false)

Transforms categorical column col into one-hot columns of levels returned by the levels function of CategoricalArrays.jl. The categ option can be used to convert resulting columns to categorical arrays as opposed to boolean vectors.

Examples

OneHot(1)
OneHot(:a)
OneHot("a")
OneHot("a", categ=true)
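
A one-hot encoding of a small column can be sketched without CategoricalArrays.jl. Here sort(unique(col)) stands in for the levels function, which is an assumption made to keep the example self-contained.

```julia
col = ["a", "b", "a", "c"]

# Stand-in for the levels of a categorical column
lvls = sort(unique(col))

# One boolean column per level
onehot = [col .== l for l in lvls]
```

Each row has exactly one true entry across the resulting columns.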

Identity

TransformsBase.Identity — Type

Identity()

The identity transform that maps any object to itself.

Center

TableTransforms.Center — Type

Center()

Applies the center transform to all columns of the table. The center transform of the column x, with mean μ, is defined by x .- μ.

Center(col₁, col₂, ..., colₙ)
Center([col₁, col₂, ..., colₙ])
Center((col₁, col₂, ..., colₙ))

Applies the Center transform on columns col₁, col₂, ..., colₙ.

Center(regex)

Applies the Center transform on columns that match with regex.

Examples

Center(1, 3, 5)
Center([:a, :c, :e])
Center(("a", "c", "e"))
Center(r"[ace]")

LowHigh

TableTransforms.LowHigh — Type

LowHigh(; low=0.25, high=0.75)

Applies the LowHigh transform to all columns of the table. The LowHigh transform of the column x is defined by (x .- xl) ./ (xh - xl), where xl = quantile(x, low) and xh = quantile(x, high).

LowHigh(col₁, col₂, ..., colₙ; low=0.25, high=0.75)
LowHigh([col₁, col₂, ..., colₙ]; low=0.25, high=0.75)
LowHigh((col₁, col₂, ..., colₙ); low=0.25, high=0.75)

Applies the LowHigh transform on columns col₁, col₂, ..., colₙ.

LowHigh(regex; low=0.25, high=0.75)

Applies the LowHigh transform on columns that match with regex.

Examples

LowHigh()
LowHigh(low=0, high=1)
LowHigh(low=0.3, high=0.7)
LowHigh(1, 3, 5, low=0, high=1)
LowHigh([:a, :c, :e], low=0.3, high=0.7)
LowHigh(("a", "c", "e"), low=0.25, high=0.75)
LowHigh(r"[ace]", low=0.3, high=0.7)
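
The formula (x .- xl) ./ (xh - xl) can be checked on a single column with the Statistics standard library. The function lowhigh below is a hypothetical helper, not the package's implementation.

```julia
using Statistics

# Hypothetical helper implementing the documented formula
function lowhigh(x; low=0.25, high=0.75)
  xl, xh = quantile(x, low), quantile(x, high)
  (x .- xl) ./ (xh - xl)
end

x = [2.0, 4.0, 6.0, 8.0, 10.0]

minmax_scaled = lowhigh(x, low=0.0, high=1.0)  # scales to the interval [0, 1]
iqr_scaled = lowhigh(x)                        # uses the first and third quartiles
```

With low=0 and high=1 the quantiles are the minimum and maximum of the column, so the smallest value maps to 0 and the largest to 1.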

MinMax

TableTransforms.MinMax — Function

MinMax()

Applies the MinMax transform to all columns of the table. The MinMax transform is equivalent to LowHigh(low=0, high=1).

MinMax(col₁, col₂, ..., colₙ)
MinMax([col₁, col₂, ..., colₙ])
MinMax((col₁, col₂, ..., colₙ))

Applies the MinMax transform on columns col₁, col₂, ..., colₙ.

MinMax(regex)

Applies the MinMax transform on columns that match with regex.

Examples

MinMax(1, 3, 5)
MinMax([:a, :c, :e])
MinMax(("a", "c", "e"))
MinMax(r"[ace]")

Interquartile

TableTransforms.Interquartile — Function

Interquartile()

Applies the Interquartile transform to all columns of the table. The Interquartile transform is equivalent to LowHigh(low=0.25, high=0.75).

Interquartile(col₁, col₂, ..., colₙ)
Interquartile([col₁, col₂, ..., colₙ])
Interquartile((col₁, col₂, ..., colₙ))

Applies the Interquartile transform on columns col₁, col₂, ..., colₙ.

Interquartile(regex)

Applies the Interquartile transform on columns that match with regex.

Examples

Interquartile(1, 3, 5)
Interquartile([:a, :c, :e])
Interquartile(("a", "c", "e"))
Interquartile(r"[ace]")

ZScore

TableTransforms.ZScore — Type

ZScore()

Applies the z-score transform (a.k.a. normal score) to all columns of the table. The z-score transform of the column x, with mean μ and standard deviation σ, is defined by (x .- μ) ./ σ.

ZScore(col₁, col₂, ..., colₙ)
ZScore([col₁, col₂, ..., colₙ])
ZScore((col₁, col₂, ..., colₙ))

Applies the ZScore transform on columns col₁, col₂, ..., colₙ.

ZScore(regex)

Applies the ZScore transform on columns that match with regex.

Examples

ZScore(1, 3, 5)
ZScore([:a, :c, :e])
ZScore(("a", "c", "e"))
ZScore(r"[ace]")
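
The definition (x .- μ) ./ σ can be verified on one column with the Statistics standard library. The sketch uses Julia's default std (the sample standard deviation); whether the package uses the same estimator is an assumption.

```julia
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
μ, σ = mean(x), std(x)

centered = x .- μ    # the Center transform of the same column
z = (x .- μ) ./ σ    # the z-score transform
```

The transformed column z has mean zero and standard deviation one, up to floating-point error.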

Quantile

TableTransforms.Quantile — Type

Quantile(; dist=Normal())

The quantile transform to a given distribution.

Quantile(col₁, col₂, ..., colₙ; dist=Normal())
Quantile([col₁, col₂, ..., colₙ]; dist=Normal())
Quantile((col₁, col₂, ..., colₙ); dist=Normal())

Applies the Quantile transform on columns col₁, col₂, ..., colₙ.

Quantile(regex; dist=Normal())

Applies the Quantile transform on columns that match with regex.

Examples

using Distributions

Quantile()
Quantile(dist=Normal())
Quantile(1, 3, 5, dist=Beta())
Quantile([:a, :c, :e], dist=Gamma())
Quantile(("a", "c", "e"), dist=Beta())
Quantile(r"[ace]", dist=Normal())

Functional

TableTransforms.Functional — Type

Functional(fun)

The transform that applies the function fun elementwise.

Functional(col₁ => fun₁, col₂ => fun₂, ..., colₙ => funₙ)

Apply the corresponding funᵢ function to each colᵢ column.

Examples

Functional(exp)
Functional(log)
Functional(1 => exp, 2 => log)
Functional(:a => exp, :b => log)
Functional("a" => exp, "b" => log)

EigenAnalysis

TableTransforms.EigenAnalysis — Type

EigenAnalysis(proj; [maxdim], [pratio])

The eigenanalysis of the covariance with a given projection proj. Optionally specify the maximum number of dimensions in the output maxdim and the percentage of variance to retain pratio.

Projections

:V - Uncorrelated variables (PCA transform)
:VD - Uncorrelated variables and variance one (DRS transform)
:VDV - Uncorrelated variables and variance one (SDS transform)

The :V projection, used in the PCA transform, projects the data on the eigenvectors V of the covariance matrix.

The :VD projection, used in the DRS transform, is similar to the :V projection, but the eigenvectors are scaled by the inverse square root of the eigenvalues D.

The :VDV projection, used in the SDS transform, is similar to the :VD projection, but the data is projected back to the basis of the original variables using the Vᵀ matrix.

See https://geostatisticslessons.com/lessons/sphereingmaf for more details about these three variants of eigenanalysis.

Examples

EigenAnalysis(:V)
EigenAnalysis(:VD)
EigenAnalysis(:VDV)
EigenAnalysis(:V, maxdim=3)
EigenAnalysis(:VD, pratio=0.99)
EigenAnalysis(:VDV, maxdim=3, pratio=0.99)
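
The three projections can be sketched with the LinearAlgebra and Statistics standard libraries. This is an illustration on a small made-up data matrix, not the package's implementation; the scaling by the inverse square root of the eigenvalues is what yields unit variance.

```julia
using LinearAlgebra, Statistics

# Made-up data matrix: rows are samples, columns are variables
X = [2.0 0.5; 1.0 1.5; 3.0 2.5; 4.0 2.0; 5.0 4.5]
Y = X .- mean(X, dims=1)            # center the columns

# Eigenvalues λ and eigenvectors V of the covariance matrix
λ, V = eigen(Symmetric(cov(Y)))

PV   = Y * V                               # :V   (uncorrelated)
PVD  = Y * V * Diagonal(1 ./ sqrt.(λ))     # :VD  (uncorrelated, variance one)
PVDV = PVD * V'                            # :VDV (back in the original basis)
```

The covariance of PV is diagonal, while the covariances of PVD and PVDV are both the identity matrix, up to floating-point error.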

PCA

TableTransforms.PCA — Function

PCA([options])

Principal component analysis.

See EigenAnalysis for a detailed description of the available options.

Examples

PCA(maxdim=2)
PCA(pratio=0.86)
PCA(maxdim=2, pratio=0.86)

Notes

PCA() is a shortcut for ZScore() → EigenAnalysis(:V).

DRS

TableTransforms.DRS — Function

DRS([options])

Dimension reduction sphering.

See EigenAnalysis for a detailed description of the available options.

Examples

DRS(maxdim=3)
DRS(pratio=0.87)
DRS(maxdim=3, pratio=0.87)

Notes

DRS() is a shortcut for ZScore() → EigenAnalysis(:VD).

SDS

TableTransforms.SDS — Function

SDS([options])

Standardized data sphering.

See EigenAnalysis for a detailed description of the available options.

Examples

SDS()
SDS(maxdim=4)
SDS(pratio=0.88)
SDS(maxdim=4, pratio=0.88)

Notes

SDS() is a shortcut for ZScore() → EigenAnalysis(:VDV).

ProjectionPursuit

TableTransforms.ProjectionPursuit — Type

ProjectionPursuit(; tol=1e-6, maxiter=100, deg=5, perc=0.9, n=100, rng=Random.default_rng())

The projection pursuit multivariate transform converts any multivariate distribution into the standard multivariate Gaussian distribution.

This iterative algorithm repeatedly finds a direction of projection α that maximizes a score of non-Gaussianity known as the projection index I(α). The samples projected along α are then transformed with the Quantile transform to remove the non-Gaussian structure. The other coordinates in the rotated orthonormal basis Q = [α ...] are left untouched.

The non-singularity of Q is controlled by assuring that norm(det(Q)) ≥ tol. The iterative process terminates whenever the transformed samples are "more Gaussian" than perc% of n randomly generated samples from the standard multivariate Gaussian distribution, or when the number of iterations reaches a maximum maxiter.

Examples

ProjectionPursuit()
ProjectionPursuit(deg=10)
ProjectionPursuit(perc=0.85, n=50)
ProjectionPursuit(tol=1e-4, maxiter=250, deg=5, perc=0.95, n=100)

# with rng
using Random
rng = MersenneTwister(2)
ProjectionPursuit(perc=0.85, n=50, rng=rng)

See https://doi.org/10.2307/2289161 for further details.

KMedoids

TableTransforms.KMedoids — Type

KMedoids(k; tol=1e-4, maxiter=10, weights=nothing, rng=Random.default_rng())

Assign labels to rows of table using the k-medoids algorithm.

The iterative algorithm is interrupted if the relative change in the average distance to medoids is smaller than a tolerance tol or if the number of iterations exceeds the maximum number of iterations maxiter.

Optionally, specify a dictionary of weights for each column to affect the underlying table distance from TableDistances.jl, and a random number generator rng to obtain reproducible results.

Examples

KMedoids(3)
KMedoids(4, maxiter=20)
KMedoids(5, weights=Dict(:col1 => 1.0, :col2 => 2.0))

References

Kaufman, L. & Rousseeuw, P. J. 1990. Partitioning Around Medoids (Program PAM)
Kaufman, L. & Rousseeuw, P. J. 1991. Finding Groups in Data: An Introduction to Cluster Analysis

Closure

TableTransforms.Closure — Type

Closure()

The transform that applies the closure operation (i.e. x ./ sum(x)) to all rows of the input table. The rows of the output table sum to one.
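
A minimal sketch of the closure operation on a single row, written in plain Julia rather than with the transform itself:

```julia
row = [1.0, 3.0, 4.0]

# Divide each part by the row total
closed = row ./ sum(row)
```

The parts become 0.125, 0.375 and 0.5, which sum to one.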

Remainder

TableTransforms.Remainder — Type

Remainder([total])

The transform that takes a table with columns x₁, x₂, …, xₙ and returns a new table with an additional column containing the remainder value xₙ₊₁ = total .- (x₁ + x₂ + ⋯ + xₙ). If the total value is not specified, it defaults to the maximum sum across rows.
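
The remainder column and its default total can be sketched as follows. The table is represented as a matrix with one row per observation, which is an assumption made for brevity.

```julia
# Two columns x₁ and x₂, one row per observation
X = [1.0 2.0; 3.0 4.0; 2.0 2.0]
rowsums = vec(sum(X, dims=2))

# Default total: the maximum sum across rows
total = maximum(rowsums)

# The additional column xₙ₊₁
remainder = total .- rowsums
```

With this default, the row with the largest sum gets a remainder of zero and all remainders are non-negative.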

Compose

TableTransforms.Compose — Type

Compose(; as=:coda)

Converts all columns of the table into parts of a composition in a new column named as, using the CoDa.compose function.

Compose(col₁, col₂, ..., colₙ; as=:coda)
Compose([col₁, col₂, ..., colₙ]; as=:coda)
Compose((col₁, col₂, ..., colₙ); as=:coda)

Converts the selected columns col₁, col₂, ..., colₙ into parts of a composition.

Compose(regex; as=:coda)

Converts the columns that match with regex into parts of a composition.

Examples

Compose(as=:composition)
Compose([2, 3, 5])
Compose([:b, :c, :e])
Compose(("b", "c", "e"))
Compose(r"[bce]", as="composition")

ALR

TableTransforms.ALR — Type

ALR([refvar])

Additive log-ratio transform.

Optionally, specify the reference variable refvar for the ratios. Defaults to the last column of the input table.
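
The docstring does not spell out the formula. The sketch below uses the standard definition of the additive log-ratio from compositional data analysis, applied to one row with the last part as the reference; it is an illustration, not the package's implementation.

```julia
x = [1.0, 2.0, 4.0]    # one row of the table; the last part is the reference

# Log of each remaining part divided by the reference part
alr = log.(x[1:end-1] ./ x[end])
```

The result has one fewer entry than the input, since the reference part is used only as the denominator.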

CLR

TableTransforms.CLR — Type

CLR()

Centered log-ratio transform.
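
The docstring does not spell out the formula. The sketch below uses the standard definition of the centered log-ratio, in which the log of each part is centered by the mean log of the row (the log of the geometric mean); it is an illustration, not the package's implementation.

```julia
using Statistics

x = [1.0, 2.0, 4.0]    # one row of the table

# Subtract the mean of the logs from the log of each part
clr = log.(x) .- mean(log.(x))
```

The entries of clr sum to zero, up to floating-point error.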

ILR

TableTransforms.ILR — Type

ILR([refvar])

Isometric log-ratio transform.

Optionally, specify the reference variable refvar for the ratios. Defaults to the last column of the input table.

RowTable

TableTransforms.RowTable — Type

RowTable()

The transform that applies the function Tables.rowtable to the input table.

ColTable

TableTransforms.ColTable — Type

ColTable()

The transform that applies the function Tables.columntable to the input table.
