ML Utils

The following page lists the extra utility functions that can be useful when running machine learning algorithms.

Feature Engineering

The following utility functions cover ML feature engineering: normalization, bias columns, PCA, k-means clustering and polynomial features.

dataset &quotek::ml::normalize(dataset &X)

normalize takes a data sample and applies a zero-mean, unit-variance transformation to it.

Return
reference to transformed dataset (X)
Parameters
  • X -

    dataset to normalize.
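The transformation can be sketched as follows. The sketch assumes a plain `std::vector<std::vector<double>>` (aliased `Matrix`) as a hypothetical stand-in for quotek's `dataset`, with rows as samples and columns as features, and assumes each column is standardized independently using the population variance. The library's own choices may differ, and the `sketch` namespace is illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`: rows are samples, columns are features.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Standardizes each column in place to zero mean and unit (population) variance.
// Assumption: per-column z-scoring; a zero-variance column is only centered.
Matrix& normalize(Matrix& X) {
    if (X.empty()) return X;
    const std::size_t n = X.size(), d = X[0].size();
    for (std::size_t j = 0; j < d; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += X[i][j];
        mean /= static_cast<double>(n);

        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) var += (X[i][j] - mean) * (X[i][j] - mean);
        var /= static_cast<double>(n);
        const double sd = std::sqrt(var);

        for (std::size_t i = 0; i < n; ++i) {
            X[i][j] -= mean;
            if (sd > 0.0) X[i][j] /= sd;  // avoid dividing by zero for constant columns
        }
    }
    return X;  // same object that was passed in, as the reference return suggests
}

}  // namespace sketch
```

Each value x in column j becomes (x - mean_j) / sd_j, so a feature measured in large units no longer dominates distance-based or gradient-based algorithms.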

dataset &quotek::ml::add_bias(dataset &X)

Adds a bias column of ones to the dataset.

Return
reference to the modified dataset.
Parameters
  • X -

    dataset to add the bias column to.
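A minimal sketch of the idea, using a hypothetical `Matrix` alias as a stand-in for `dataset`. It assumes the column of ones is prepended, matching the leading `1` in the polynomial_features example; the real add_bias may place it elsewhere.

```cpp
#include <vector>

// Hypothetical stand-in for quotek's `dataset`.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Inserts a 1.0 at the start of every row, in place.
// Assumption: the bias column goes first.
Matrix& add_bias(Matrix& X) {
    for (auto& row : X) row.insert(row.begin(), 1.0);
    return X;
}

}  // namespace sketch
```

With a column of ones in front, a linear model's first weight acts as the intercept: y = w0 * 1 + w1 * x1 + ... + wd * xd.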

dataset quotek::ml::pca(dataset &X, int feats)

pca performs Principal Component Analysis on a given n-dimensional dataset.

Return
PCA-reduced dataset.
Parameters
  • X -

    dataset to perform PCA on.

  • feats -

    Number of features to keep after PCA reduction. Note: feats must be between 1 and columns(X) - 1.
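One way to compute the reduction is sketched below, again with a hypothetical `Matrix` stand-in for `dataset`. It centers the columns, builds the covariance matrix, extracts the top `feats` eigenvectors by power iteration with deflation, and projects the centered data onto them. This is an assumed technique, not necessarily how quotek::ml::pca works internally (a library would more likely call an eigensolver or an SVD); power iteration also assumes distinct leading eigenvalues and a start vector that is not orthogonal to the wanted component.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`: rows are samples.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// PCA via the covariance matrix and power iteration with deflation.
Matrix pca(const Matrix& X, int feats, int iters = 1000) {
    const std::size_t n = X.size(), d = X.empty() ? 0 : X[0].size();
    const std::size_t k = static_cast<std::size_t>(feats);

    // 1. Center each column.
    std::vector<double> mean(d, 0.0);
    for (const auto& row : X)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / static_cast<double>(n);
    Matrix Xc = X;
    for (auto& row : Xc)
        for (std::size_t j = 0; j < d; ++j) row[j] -= mean[j];

    // 2. Covariance C = Xc^T Xc / n (the scale does not change the eigenvectors).
    Matrix C(d, std::vector<double>(d, 0.0));
    for (const auto& row : Xc)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b)
                C[a][b] += row[a] * row[b] / static_cast<double>(n);

    // 3. Top-k eigenvectors, one at a time.
    Matrix components;
    for (std::size_t c = 0; c < k; ++c) {
        std::vector<double> v(d);
        double norm0 = 0.0;
        for (std::size_t j = 0; j < d; ++j) {
            v[j] = 1.0 + static_cast<double>(j);  // arbitrary non-zero start vector
            norm0 += v[j] * v[j];
        }
        for (double& x : v) x /= std::sqrt(norm0);

        double lambda = 0.0;
        for (int it = 0; it < iters; ++it) {
            std::vector<double> w(d, 0.0);
            for (std::size_t a = 0; a < d; ++a)
                for (std::size_t b = 0; b < d; ++b) w[a] += C[a][b] * v[b];
            double norm = 0.0;
            for (double x : w) norm += x * x;
            norm = std::sqrt(norm);
            if (norm < 1e-300) break;  // no variance left in this direction
            for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / norm;
            lambda = norm;  // ||C v|| for unit v approximates the eigenvalue
        }
        // Deflate: remove the found direction so the next pass finds the next component.
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) C[a][b] -= lambda * v[a] * v[b];
        components.push_back(v);
    }

    // 4. Project the centered data onto the k components (n x k result).
    Matrix Y(n, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t c = 0; c < k; ++c)
            for (std::size_t j = 0; j < d; ++j) Y[i][c] += Xc[i][j] * components[c][j];
    return Y;
}

}  // namespace sketch
```

Principal components are defined only up to sign, so projected values may come out negated compared with another implementation.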

dvector quotek::ml::kmeans(dataset &X, int nb_clusters)

kmeans runs the k-means clustering algorithm on a given dataset in order to label its samples. It is useful when you don't have pre-labelled data and don't want to label it manually.

Return
vector of labels, one per row of the dataset.
Parameters
  • X -

    dataset to label.

  • nb_clusters -

    number of clusters (categories) to create.
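The clustering can be sketched with Lloyd's algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until no label changes. Assumptions: `Matrix` and `Labels` are hypothetical stand-ins for `dataset` and `dvector`, centroids start from the first `nb_clusters` rows (real implementations usually use random or k-means++ initialization), and the dataset has at least `nb_clusters` rows.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for quotek's `dataset` and `dvector`.
using Matrix = std::vector<std::vector<double>>;
using Labels = std::vector<double>;

namespace sketch {

// Lloyd's k-means with deterministic initialization from the first rows.
Labels kmeans(const Matrix& X, int nb_clusters, int max_iters = 100) {
    const std::size_t n = X.size(), d = X.empty() ? 0 : X[0].size();
    const std::size_t k = static_cast<std::size_t>(nb_clusters);
    Matrix centroids(X.begin(), X.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> label(n, k);  // k means "not assigned yet"

    for (int it = 0; it < max_iters; ++it) {
        bool changed = false;
        // Assignment step: nearest centroid by squared Euclidean distance.
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t best = 0;
            double best_dist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double dist = 0.0;
                for (std::size_t j = 0; j < d; ++j) {
                    const double diff = X[i][j] - centroids[c][j];
                    dist += diff * diff;
                }
                if (dist < best_dist) { best_dist = dist; best = c; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break;  // converged

        // Update step: each centroid moves to the mean of its assigned rows.
        Matrix sum(k, std::vector<double>(d, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            ++count[label[i]];
            for (std::size_t j = 0; j < d; ++j) sum[label[i]][j] += X[i][j];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)  // an empty cluster keeps its previous centroid
                for (std::size_t j = 0; j < d; ++j)
                    centroids[c][j] = sum[c][j] / static_cast<double>(count[c]);
    }
    return Labels(label.begin(), label.end());
}

}  // namespace sketch
```

The labels are arbitrary cluster indices from 0 to nb_clusters - 1; they only say which rows were grouped together.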

dataset quotek::ml::polynomial_features(dataset &X, int degree)

polynomial_features increases the number of dimensions of a dataset by adding polynomial terms of its features: for instance, features [a, b] become [1, a, b, ab, a^2, b^2] at degree 2.

Return
dataset extended with the polynomial features.
Parameters
  • X -

    dataset to create polynomial features for.

  • degree -

    Maximum degree of the generated polynomial terms.
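The expansion can be sketched as generating every monomial of total degree 0 through `degree`, where each monomial multiplies features with non-decreasing indices so that ab and ba are counted once. The sketch uses a hypothetical `Matrix` stand-in for `dataset` and groups columns by degree, so for [a, b] at degree 2 it yields [1, a, b, a^2, ab, b^2]; the column order of the library's polynomial_features may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Appends one value per monomial of exactly `deg` factors, using feature
// indices >= `first` in non-decreasing order; `acc` is the running product.
static void monomials(const std::vector<double>& row, std::size_t first, int deg,
                      double acc, std::vector<double>& out) {
    if (deg == 0) { out.push_back(acc); return; }
    for (std::size_t j = first; j < row.size(); ++j)
        monomials(row, j, deg - 1, acc * row[j], out);
}

// Expands each row into all monomials of total degree 0..degree, grouped by degree.
Matrix polynomial_features(const Matrix& X, int degree) {
    Matrix out;
    out.reserve(X.size());
    for (const auto& row : X) {
        std::vector<double> feats;
        for (int deg = 0; deg <= degree; ++deg) monomials(row, 0, deg, 1.0, feats);
        out.push_back(feats);
    }
    return out;
}

}  // namespace sketch
```

With d input features the output has C(d + degree, degree) columns (the constant included), e.g. 10 columns for 3 features at degree 2, so the width grows quickly with the degree.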

Non-Linearity

This section gathers the non-linear transfer functions: sigmoid, tanh and ReLU.

double quotek::ml::nl_sigmoid(double input)

computes the sigmoid function of a single double value.

Return
the sigmoid value for input.
Parameters
  • input -

    input to compute sigmoid for.

dataset quotek::ml::nl_sigmoid(dataset &input)

vectorized version of nl_sigmoid.
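Both forms can be sketched in a few lines, computing the logistic function 1 / (1 + e^-x). The sketch uses a hypothetical `Matrix` stand-in for `dataset` and assumes the vectorized overload applies the scalar function element-wise and returns a new matrix, which fits the by-value `dataset` return type but is not confirmed by it.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Scalar logistic sigmoid; maps any real input into (0, 1).
// For very negative inputs std::exp overflows to +inf, which still yields 0.
double nl_sigmoid(double input) { return 1.0 / (1.0 + std::exp(-input)); }

// Element-wise sigmoid; assumption: returns a new matrix, `input` is untouched.
Matrix nl_sigmoid(const Matrix& input) {
    Matrix out = input;
    for (auto& row : out)
        for (double& x : row) x = nl_sigmoid(x);  // calls the scalar overload
    return out;
}

}  // namespace sketch
```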

dataset quotek::ml::nl_tanh(dataset &input)

vectorized implementation of tanh.
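A matching element-wise sketch, with the same hypothetical `Matrix` stand-in for `dataset` and the assumption that a new matrix is returned. tanh is a rescaled sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, but its outputs lie in (-1, 1) and are centered on 0.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Applies std::tanh to every element; assumption: `input` is left unchanged.
Matrix nl_tanh(const Matrix& input) {
    Matrix out = input;
    for (auto& row : out)
        for (double& x : row) x = std::tanh(x);
    return out;
}

}  // namespace sketch
```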

double quotek::ml::nl_rectifier(double input)

nl_rectifier, or Rectified Linear Unit (ReLU), computes max(0, input) for a single value.

Return
the rectified value for input.
Parameters
  • input -

    input to rectify.

dataset quotek::ml::nl_rectifier(dataset &input)

vectorized version of nl_rectifier.
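Both the scalar and the element-wise rectifier can be sketched with `std::max`, using a hypothetical `Matrix` stand-in for `dataset` and assuming the vectorized overload returns a new matrix.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for quotek's `dataset`.
using Matrix = std::vector<std::vector<double>>;

namespace sketch {

// Scalar ReLU: negative inputs become 0, others pass through unchanged.
double nl_rectifier(double input) { return std::max(0.0, input); }

// Element-wise ReLU; assumption: returns a new matrix, `input` is untouched.
Matrix nl_rectifier(const Matrix& input) {
    Matrix out = input;
    for (auto& row : out)
        for (double& x : row) x = nl_rectifier(x);  // calls the scalar overload
    return out;
}

}  // namespace sketch
```

Unlike sigmoid and tanh, ReLU does not saturate for positive inputs: away from 0 its derivative is either 0 or 1.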