Regression In Machine Learning

Regression models linear and non-linear are used for predicting a real unknown values by set of known similar values.

Types Of Regression Models

  • Simple Linear Regression

Ordinary list squares - is a method allowing to find best linear regression approximation between all other possible.

  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector for Regression (SVR)
  • Decision Tree Regression
  • Random Forest Regression

Assumptions Of Linear Regression

Linear regression fit not every scenario. Before applying it, need to test some assumptions on our data set, which will allow to understand if it is suitable.

  • There should be approximate linear relation between dependent and independent parameters.
  • Homoscedasticity - there should not be a cone shape of dataset distribution
  • There should be a normal distribution of our dataset
  • Independence of dataset. Which means that each value in our dataset does not depend on any characteristic of other values in this dataset
  • Coefficients should not correlate to each other
  • Absence of outliers

Categorical Data

To represent categorical data each discreet value of category split into separate coefficient with value 1 if it is a value of this category and 0 otherwise.

Dummy variable trap - important to note that there should be always n-1 coefficient if there is n discreet values of categorical value. Otherwise model will work unreliably.

Logistic Regression

Allows to predict categorical values. Usually yes or no and their likelihood. For example predict whether customer with given characteristics will buy or not.