Usually in Linear Regression we consider X as a explanatory variable whose columns are X1,X2.....Xp are the variables which we use predict are the independent variable y , we measure these values on a continuous scale,When the dependent variable y is dichotomous such as, Male or Female , Pass or Fail , Malignant or Benign.
When we have dependent variable y is a qualitative, we can indicate it by indicator variable such as
Remember first column of independent variable matrix X is 1 , for the constant β0
Our dependent variable y , that we have to predict is indicator suppose it takes two values , assume y follows a bernoulli distribution
yi=1withP(yi=1)=πiyi=0withP(yi=0)=1−πi
Assuming E(ϵi)=0,
E(yi)=1⋅πi+0⋅(1−πi)=πiE(yi)=Xβ=π
where
π=[π1π2π3..πn]T
Now we know in Linear Regression ϵ is supposed to follow normal distribution , whereas here we cannot suppose ϵ to follow normal distribution, because here it take only two discrete values
so we have E(yi)=πi=β0+β1xi1+β2xi2+.....+βpxip where E(yi)∈[0,1] that put bound on the expected value of y
In logistic regression we use Standard logistic function , some people call it a Sigmoid function. It can be given by
Our main work in logistic regression our main aim is to predict π , the bernoulli parameter for Y , and generally we took decision by πi greater than 0.5 or less than 0.5
Link Function
Usually every model have a link function which relates the linear predictor ηi to the mean response μi. First of all we have to understand what is linear predictor, it is a systematic component where ηi=E(y∣xi) ,So if g(.) is a link function then
g(μi)=ηiorμi=g−1(ηi)
In the Linear regression this link is a identity link , whereas in the logistic regression μi=E(yi)=πi so the relation between πi and ηi=E(y∣xi)=β0+β1xi1+β2xi2+.....+βpxip is a logistic regression so
g(Xβ)=π
We have similar equation \eqref1 we can use that to get link function
π=1+exp(Xβ)exp(Xβ)Xβ=η=ln(1−ππ)
where 1−ππ is odds and its log is known as log-odds ,this transformation is logit transformation.
It is very hard to estimate β theoretically , so we choose gradient-descent algorithm for calculation of the parameter