Logistic Regressions

Logistic regression models (also known as classifiers) are also a simple class of machine learning algorithms. Once trained, their goal is to tell whether the input data belongs to a specific class or not.

Figure: Typical logistic regression graph.

Note: To classify data into multiple classes, train one classifier object per class and keep the class whose classifier outputs the highest score.
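The selection step of this one-vs-rest scheme can be sketched as below. This is only an illustration and not quotek code: `pickClass` is a hypothetical helper, and the per-class scores are assumed to be already available as numbers (how to obtain a raw score from a quotek classifier is not covered here).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of quotek): given one score per
// one-vs-rest classifier, return the index of the highest score.
// Returns 0 for an empty input; ties resolve to the lowest index.
std::size_t pickClass(const std::vector<double>& scores) {
  std::size_t best = 0;
  for (std::size_t k = 1; k < scores.size(); ++k) {
    if (scores[k] > scores[best]) best = k;
  }
  return best;
}
```

For instance, with scores {0.1, 0.7, 0.2} the helper selects index 1, i.e. the second class.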

class quotek::ml::logisticRegression

The logisticRegression class runs logistic regression learning algorithms on datasets.
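To give an idea of what such a learning algorithm does, here is a minimal sketch of logistic regression fitted by batch gradient descent on a single feature, with an optional L2 penalty. It is an assumption-laden illustration: the optimizer actually used by quotek is not described in this page, and `Model`, `logistic` and `fit` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical one-feature model: intercept b and weight w.
struct Model { double b = 0.0; double w = 0.0; };

// Standard logistic function: maps any real z into (0, 1).
double logistic(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Batch gradient descent on the logistic log-loss.
// lambda > 0 adds an L2 penalty on w (the intercept is not penalized).
Model fit(const std::vector<double>& x, const std::vector<double>& y,
          double lambda = 0.0, double rate = 0.1, int steps = 2000) {
  Model m;
  if (x.empty() || x.size() != y.size()) return m;
  const double n = static_cast<double>(x.size());
  for (int s = 0; s < steps; ++s) {
    double gb = 0.0, gw = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
      const double err = logistic(m.b + m.w * x[i]) - y[i];
      gb += err;         // gradient contribution for the intercept
      gw += err * x[i];  // gradient contribution for the weight
    }
    m.b -= rate * gb / n;
    m.w -= rate * (gw / n + lambda * m.w);
  }
  return m;
}
```

With a positive lambda the fitted weight stays smaller than without it, which is the usual effect of regularization on a separable dataset.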

Public Functions

logisticRegression()

Default constructor.

~logisticRegression()

Object Destructor

int train(dataset &X)

train takes a dataset and fits a model to the provided data. Note: it assumes that the last column of the dataset stores the expected results (y).

Parameters
  • X -

    Dataset to model.
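The layout expected by this overload (features first, expected result in the last column) can be illustrated with plain standard containers. The sketch below is not quotek code: `Rows` stands in for a dataset and `splitLastColumn` is a hypothetical helper showing how such a table separates into features and labels.

```cpp
#include <utility>
#include <vector>

// Stand-in for a dataset: one inner vector per row.
using Rows = std::vector<std::vector<double>>;

// Hypothetical helper: split rows of the form {x1, ..., xn, y}
// into a feature table X and a label vector y.
std::pair<Rows, std::vector<double>> splitLastColumn(const Rows& data) {
  Rows X;
  std::vector<double> y;
  for (const auto& row : data) {
    if (row.empty()) continue;  // skip malformed rows
    X.emplace_back(row.begin(), row.end() - 1);  // all but the last value
    y.push_back(row.back());                     // last value is the label
  }
  return {X, y};
}
```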

int train(dataset &X, dvector &y)

Same as the regular train(), except that it assumes the expected results are stored in a separate, m-dimensional vector.

Parameters
  • X -

    Dataset to model.

  • y -

    Expected results vector.

int predict(dataset &X, std::vector<int> &y)

predict takes a small dataset and tries to guess the outputs according to the previously learned model.

Parameters
  • X -

    Data to predict outputs for.

  • y -

    Predicted outputs, stored as a vector of ints.

int predict(dataset &X)

predict takes a small dataset and tries to guess the output according to the previously learned model.

Return
Predicted output, as an int.
Parameters
  • X -

    Data to predict the output for (the array must have exactly one row).

Public Members

bool regularize

stores whether regularization is used or not.

double thereshold

thereshold is the cutoff in [0,1] that decides whether an element is labeled as belonging to the class or not. Default value is 0.5.

VectorXd coefficients

stores the coefficients for each dimension of the dataset.
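How a cutoff and a coefficient vector typically combine into a class label can be sketched as follows. This shows the textbook logistic decision rule, not quotek's internals: the `classify` helper, the placement of the intercept in `coefficients[0]`, and the use of `>=` at the cutoff are all assumptions made for the illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard logistic (sigmoid) function: maps any real z into (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical decision rule. Assumed layout: coefficients[0] is the
// intercept and coefficients[i + 1] multiplies feature x[i].
int classify(const std::vector<double>& coefficients,
             const std::vector<double>& x, double threshold = 0.5) {
  double z = coefficients.empty() ? 0.0 : coefficients[0];
  for (std::size_t i = 0; i < x.size() && i + 1 < coefficients.size(); ++i) {
    z += coefficients[i + 1] * x[i];
  }
  // Label 1 (in class) when the estimated probability reaches the cutoff.
  return sigmoid(z) >= threshold ? 1 : 0;
}
```

Lowering the cutoff makes the rule more willing to label an element as in class; raising it makes the rule stricter.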

Example

#include <cassert>
#include <vector>

#include <quotek/quotek.hpp>
#include <quotek/logisticregression.hpp>

int main(int argc, char** argv) {

  //We declare both a dataset and an expected result object.
  quotek::ml::dataset X;
  quotek::ml::dvector y;

  X =  MatrixXd(22,1);
  y =  VectorXd(22);

  /* This dataset will train the classifier so
   * that any number <= 10 is in class, and any number > 10 is not.
   */
  X << 10,89,112,8,32,9,12,5,23,8,56,2,3,45,6,7,15,11,13,1,999,189;
  y << 1, 0, 0, 1,0, 1,0, 1,0, 1,0, 1,1,0, 1, 1, 0, 0, 0,1,0,  0  ;

  quotek::ml::logisticRegression lr1;

  lr1.thereshold = 0.4;
  lr1.train(X,y);

  //Two samples to predict, one per row.
  quotek::ml::dataset Xpred = MatrixXd(2,1);
  Xpred << 92, 5;

  //We create a vector to store the predictions results.
  std::vector<int> mpred;

  //We make our prediction.
  lr1.predict(Xpred, mpred);

  //We should get 2 prediction results.
  assert(mpred.size() == 2);

  //92 is not in class.
  assert(mpred[0] == 0);

  //5 is in class.
  assert(mpred[1] == 1);

}