p(X) = Pr(Y = 1|X)
Machine Learning: Understanding Logistic Regression
Upasana | May 22, 2019
Introduction
This article introduces the logistic regression algorithm. It covers:
- When to use logistic regression
- How it works
- The logistic function
- Working of the logistic function
- Analysing model results
Classification
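Classification is the process of assigning each observation to one of a set of labels (the unique values of the response variable). In machine learning, classification is usually a supervised task, where the labels are known for the training data; grouping unlabelled data is an unsupervised task, more commonly called clustering.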
But why classification, why not regression?
Suppose that we are trying to predict the medical condition of a patient in the emergency room on the basis of her symptoms. In this simplified example, there are three possible diagnoses: stroke, drug overdose, and epileptic seizure. We could consider encoding these values as a quantitative response variable, Y , as follows:
Y = 1 (if stroke), 2 (if drug overdose), 3 (if epileptic seizure)
Using this coding, least squares could be used to fit a linear regression model to predict Y on the basis of a set of predictors X1 , . . . , Xp . Unfortunately, this coding implies an ordering on the outcomes, putting drug overdose in between stroke and epileptic seizure, and insisting that the difference between stroke and drug overdose is the same as the difference between drug overdose and epileptic seizure. (ref: ISLR)
This implied ordering is wrong, because the values of Y are categorical, not continuous. This is why we should treat the problem as classification rather than regression.
Classification treats these encodings as unordered categories and tries to predict the probability of each category given a set of observations.
There are many classification techniques; here we discuss logistic regression.
Logistic Regression
In logistic regression, we predict the probability that the response variable Y is true (Y = 1) given a set of observations X, written p(X) = Pr(Y = 1|X). Logistic regression is therefore a model of a conditional probability.
Because we are predicting probabilities, the predicted value always stays between 0 and 1.
Logistic Function
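Next, we need to define the relationship between X and p(X). For a single predictor X with coefficients β0 and β1, this relationship is given by the logistic function:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
This function can be rearranged to give the odds and then the logit:
$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}$$
The quantity $p(X)/(1 - p(X))$ is the odds, and it ranges from 0 to infinity.
Probability and odds have different properties: a probability is bounded between 0 and 1, while odds can take any non-negative value. The odds express how likely Y is to be true relative to being false, as a function of X.
Taking logs on both sides gives:
$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
In the equation above, $\log(p(X)/(1 - p(X)))$ is called the log-odds or logit.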
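As a rough illustration of how probability, odds and log-odds relate, here is a minimal Python sketch. The coefficient values `beta0` and `beta1` are hypothetical and chosen only for the example; the helper names `logistic`, `odds` and `log_odds` are likewise not from any library.

```python
import math

def logistic(x, beta0, beta1):
    """p(X) = e^(beta0 + beta1*x) / (1 + e^(beta0 + beta1*x))."""
    z = beta0 + beta1 * x
    # Algebraically the same as e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds p / (1 - p); ranges from 0 to infinity."""
    return p / (1.0 - p)

def log_odds(p):
    """Log-odds (logit) of a probability."""
    return math.log(odds(p))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.0, 0.5

for x in (-4.0, 0.0, 2.0, 6.0):
    p = logistic(x, beta0, beta1)
    print(f"x={x:5.1f}  p={p:.4f}  odds={odds(p):.4f}  log-odds={log_odds(p):.4f}")
```

In the printed table, the probability stays between 0 and 1, while the log-odds column is exactly `beta0 + beta1 * x`, a straight line in x.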
Working of Logistic Function
From the logistic function, we can see that:
- If X increases by one unit, the log-odds change by β1; equivalently, the odds are multiplied by $e^{\beta_1}$.
- The change in p(X) caused by a one-unit change in X depends on the current value of X.
- If β1 is positive, increasing X increases p(X).
- If β1 is negative, increasing X decreases p(X).
From this, we can also conclude that the logistic function is a natural fit for binary classification, that is, for cases where the response variable has exactly two classes.
Another conclusion is that the logistic function is non-linear: although the log-odds are linear in X, the relationship between p(X) and X follows an S-shaped curve.
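The effect of a one-unit change in X can be checked numerically. The sketch below uses hypothetical coefficients (`beta0 = -1.0`, `beta1 = 0.5`) and a hypothetical helper `unit_change`; it is meant only to illustrate the points above, not to reproduce any particular fitted model.

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function p(X) for a single predictor."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.0, 0.5

def unit_change(x):
    """Return (change in p, ratio of odds) when x increases by one unit."""
    p_before = logistic(x, beta0, beta1)
    p_after = logistic(x + 1.0, beta0, beta1)
    return p_after - p_before, odds(p_after) / odds(p_before)

print(f"e^beta1 = {math.exp(beta1):.4f}")
for x in (-6.0, 2.0, 10.0):
    delta_p, ratio = unit_change(x)
    print(f"x={x:5.1f}  change in p={delta_p:.4f}  odds ratio={ratio:.4f}")
```

The odds ratio is the same at every starting point and equals `e^beta1`, while the change in p(X) is largest near the middle of the S-shaped curve and small in the tails.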
Analysing Model results
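The null hypothesis is that the coefficient of the feature is zero, H0: β1 = 0, meaning the feature has no effect on the probability of the response.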
In a logistic model, each coefficient has a z-statistic. For a feature with coefficient β1, the z-statistic is the ratio of the estimate of β1 to the standard error of that estimate.
A large absolute value of the z-statistic is evidence against the null hypothesis, which also means that the p-value is small. When the p-value is small, we reject the null hypothesis and conclude that the feature is associated with the response, so it should be kept in the model. A large p-value, by contrast, suggests that the feature may be a candidate for removal.
In the same way, we can evaluate whether the other features in the data have a significant effect on the response variable.
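The calculation can be sketched in a few lines of Python. The coefficient estimates and standard errors below are hypothetical, the feature names `x1` and `x2` are placeholders, and the p-value assumes the usual large-sample normal approximation for the z-statistic; statistical packages report these values directly.

```python
import math

def z_statistic(beta_hat, std_error):
    """z = estimated coefficient / its standard error."""
    return beta_hat / std_error

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical (estimate, standard error) pairs, for illustration only.
coefficients = {"x1": (1.2, 0.3), "x2": (0.1, 0.25)}

for name, (beta_hat, std_error) in coefficients.items():
    z = z_statistic(beta_hat, std_error)
    p = two_sided_p_value(z)
    verdict = "reject H0 (keep feature)" if p < 0.05 else "fail to reject H0"
    print(f"{name}: z={z:.2f}  p-value={p:.4g}  -> {verdict}")
```

The 0.05 threshold is a common convention rather than a rule; the appropriate significance level depends on the application.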
Evaluating model
The following metrics can be used to evaluate the model:
- Accuracy score
- Confusion matrix
- If the data is imbalanced (the classes of the response variable do not occur in roughly equal proportions), use precision, recall and F-score to evaluate the model.
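To make these metrics concrete, here is a small sketch that computes them by hand for a binary problem. The label lists are made-up example data and the function names are hypothetical; in practice a library such as scikit-learn provides equivalent functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up imbalanced data: 8 negatives and 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

On this example the accuracy looks reasonable because most observations are negative, while precision, recall and F1 show that only half of the positive cases are handled correctly. This is why accuracy alone can be misleading on imbalanced data.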
Thanks for reading this article.