Logistic Regression

By: Vineeta Tawney

CNESystems

What is Logistic regression?

Logistic regression is a classification algorithm used for predictive analysis. It describes data and explains the relationship between one dependent variable and one or more independent variables.

Dependent variable: the variable being predicted. For binary logistic regression it should be binary (dichotomous).

Independent variable: any variable other than the dependent variable that influences the dependent variable.

Logistic regression estimates the probability of an outcome using the logistic function. It is used when the response variable is categorical in nature.

Logistic regression is used in spam detection and credit card fraud detection, and in fields such as health, marketing, and banking.

Linear regression is not appropriate for such problems because it can predict values outside the range 0 to 1. Linear regression is used when the response is a continuous variable, and it models the linear relationship between a target and one or more predictors.

The equation of linear regression is,

Y=a+bX

Where X is the explanatory variable,

Y is the dependent variable,

a is the intercept,

b is the slope of the line.

On the other hand, logistic regression produces a probability between 0 and 1, which is then mapped to one of two classes.

Linear regression and logistic regression.
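The range problem described above can be seen on a small example. The sketch below fits Y = a + bX by ordinary least squares to a made-up binary data set (the values in xs and ys are hypothetical, chosen only for illustration) and shows that the fitted line produces predictions below 0 and above 1.

```python
# Toy data (hypothetical): one input x and a binary label y.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


a, b = fit_line(xs, ys)
linear_predictions = [a + b * x for x in xs]

# The fitted line is not confined to [0, 1]: the predictions at the two
# ends of the data fall below 0 and above 1.
print("smallest prediction:", min(linear_predictions))
print("largest prediction:", max(linear_predictions))
```

Because these outputs cannot be read as probabilities, a function that squeezes any number into the range 0 to 1 is needed.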

Logistic Function

It is also called the sigmoid function. It is an S-shaped curve, as shown in the figure.

It can map any real number to a value between 0 and 1.

1 / (1 + e^-value)

Here, e is the base of the natural logarithm, and value is the number being transformed.

Logistic Function.

It is the curve obtained when the values from -5 to 5 are mapped into the range 0 to 1.
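A minimal sketch of the logistic function in Python, evaluated at the integers from -5 to 5. The function name sigmoid and the two-branch form are choices made for this illustration; the second branch is algebraically the same formula, written so that large negative inputs do not overflow.

```python
import math


def sigmoid(value):
    """Logistic (sigmoid) function: 1 / (1 + e^-value)."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    # For negative inputs use the equivalent form e^value / (1 + e^value),
    # which avoids overflow in math.exp for very negative values.
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


# Print the mapped value for each integer from -5 to 5.
for value in range(-5, 6):
    print(value, round(sigmoid(value), 4))
```

An input of 0 maps to exactly 0.5, and the outputs for -5 and 5 sit close to 0 and 1 respectively, which gives the S shape.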

The equation of the generalized logistic function can be written as,

Y(t) = A + \frac{K - A}{(C + Q e^{-Bt})^{1/\nu}}

It can also be written as,

Y(t) = A + \frac{K - A}{(C + e^{-B(t - M)})^{1/\nu}}

Where M is the starting time t_0, at which

Y(t_0) = A + \frac{K - A}{(C + 1)^{1/\nu}}

Including both Q and M,

Y(t) = A + \frac{K - A}{(C + Q e^{-B(t - M)})^{1/\nu}}

Where, A = lower asymptote;

K = upper asymptote (when C = 1);

B = the growth rate;

\nu > 0 = affects near which asymptote the maximum growth occurs;

Q = related to the value Y(0);

C = typically takes a value of 1.
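The generalized form above can be sketched in Python as follows. The function name and the default parameter values are assumptions made for this illustration; they are chosen so that, with all defaults, the function reduces to the simple logistic function 1 / (1 + e^-t).

```python
import math


def generalized_logistic(t, A=0.0, K=1.0, B=1.0, Q=1.0, C=1.0, M=0.0, nu=1.0):
    """Y(t) = A + (K - A) / (C + Q * e^(-B * (t - M)))^(1 / nu)."""
    return A + (K - A) / (C + Q * math.exp(-B * (t - M))) ** (1.0 / nu)


# With the default parameters this is the plain logistic function.
print(generalized_logistic(0.0))

# With C = 1 the curve runs from the lower asymptote A to the upper asymptote K.
print(generalized_logistic(-50.0, A=2.0, K=10.0))
print(generalized_logistic(50.0, A=2.0, K=10.0))
```

Changing M shifts the curve along the t axis, and changing B makes the transition between the two asymptotes steeper or flatter.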

Types of logistic regression:

• Binary logistic regression: The dependent variable is binary (dichotomous), so there are only two possible outcomes, e.g. an email is spam or not spam. The model treats the dependent variable as a random (stochastic) outcome and predicts the log odds of the event. The data should be free of noise, and highly correlated inputs should not be present.

• Multinomial logistic regression: It can be considered an extension of binary logistic regression that allows more than two outcomes. The dependent variable is nominal with more than two values. For example, multinomial logistic regression can be used to understand which type of drink a customer prefers on the basis of location in the US and age.

• Ordinal logistic regression: It extends multinomial logistic regression to a dependent variable whose categories are ordered. With ordinal regression we can determine which of the independent variables have a statistically significant effect on the dependent variable. It can be used to describe data and to explain the relationship between one ordinal dependent variable and one or more independent variables.
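The log odds mentioned for the binary case can be illustrated with a short sketch. The function names and the example probability are hypothetical. The softmax function is included as an assumption about how the multinomial case is commonly handled: it turns one score per class into probabilities that sum to 1, and with two classes it gives the same result as the logistic function.

```python
import math


def log_odds(p):
    """Log odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def probability_from_log_odds(z):
    """Inverse of log_odds: the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of per-class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps math.exp stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


p_spam = 0.8                # hypothetical probability that an email is spam
z = log_odds(p_spam)        # odds of 4 to 1, so z is log(4)
print(z, probability_from_log_odds(z))

# Two classes: softmax over [z, 0] recovers the binary probability.
binary_as_softmax = softmax([z, 0.0])
print(binary_as_softmax)

# Three classes (hypothetical scores): the probabilities sum to 1.
three_class = softmax([2.0, 1.0, 0.1])
print(three_class)
```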

Preparing data for logistic regression:

The assumptions made in logistic regression are much the same as the assumptions made in linear regression.

Binary output variable: Logistic regression is intended for two-class classification problems. It calculates the probability that an instance belongs to class 1, which can then be turned into a 0 or 1 prediction.

Remove noise: Logistic regression assumes no error in the output variable, so consider removing outliers and possibly misclassified instances from the training data.

Removing correlated inputs: If the inputs are highly correlated, the model can overfit.

Gaussian distribution: Logistic regression is a linear algorithm (with a non-linear transform on the output). Data transforms of the input variables that better expose this linear relationship can produce a more accurate model.

Fail to converge: The estimation process that learns the coefficients can fail to converge. This can happen if the data is sparse or if there are many highly correlated inputs.
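One way to look for the correlated inputs mentioned above is to compute the Pearson correlation between every pair of input columns and flag the pairs above a cut-off. The sketch below does this in plain Python. The column names, the data values, and the 0.9 threshold are all hypothetical choices for illustration, not recommended settings.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_correlation(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs


# Hypothetical input columns: the two height columns carry the same information.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34, 21, 45, 29, 52],
}

flagged = highly_correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(name_a, name_b, round(r, 3))
```

Here only the two height columns are flagged, so one of them could be dropped before fitting the model.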

The coefficients in logistic regression are estimated using a procedure called maximum likelihood estimation.
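To make maximum likelihood estimation concrete, here is a minimal sketch for a model with one input, where the predicted probability is sigmoid(b0 + b1 * x). It uses plain gradient ascent on the log-likelihood, which is only one of several ways to find the maximum and is chosen here for simplicity. The data, the learning rate, and the iteration count are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary labels ys under p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate (b0, b1) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the log-likelihood: sum of (y - p) and (y - p) * x.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        grad_b0 = sum(errors)
        grad_b1 = sum(e * x for e, x in zip(errors, xs))
        b0 += learning_rate * grad_b0 / n
        b1 += learning_rate * grad_b1 / n
    return b0, b1


# Hypothetical data: the two classes overlap, so the estimates stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope:", b1)
print("P(y=1 | x=0.5):", sigmoid(b0 + b1 * 0.5))
print("P(y=1 | x=5.0):", sigmoid(b0 + b1 * 5.0))
```

The fitted coefficients give a higher log-likelihood than the starting point (0, 0), which is what the procedure is trying to achieve.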

Logistic regression coefficient:

Application of logistic regression:

• It can be used for image categorization and segmentation.

• It is used in image processing for many applications.

• It is used for handwriting recognition.

• In the healthcare industry, it is used to analyze data on millions of people to predict the risk of myocardial infarction within a period of 10 years.

• Reconstructing the budget of an organization or industry.