Saturday, January 25, 2020

Logistic Regression (Loss Method) in Machine Learning using Python



Logistic Regression
Logistic regression is similar to linear regression, but it is used when the output is binary (i.e. when the outcome can take only two possible values). The prediction for this final output is passed through a non-linear S-shaped function called the logistic function, g().
This logistic function maps the intermediate output onto a value between 0 and 1, which can be interpreted as the probability that Y = 1. The properties of the S-shaped logistic function make logistic regression well suited to classification tasks.
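As a minimal sketch, the logistic function can be written in a couple of lines of Python (the name g simply mirrors the symbol above):

import numpy as np

def g(z):
    # Logistic (sigmoid) function: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(g(-5), g(0), g(5))  # ≈ 0.0067, 0.5, ≈ 0.9933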




Logistic Model
Consider a model with features x1, x2, x3, …, xn. Let the binary output be denoted by Y, which can take the values 0 or 1.
Let p be the probability of Y = 1; we denote this as p = P(Y = 1).
The mathematical relationship between these variables can then be written as:

p = 1 / (1 + e^-(b0 + b1*x1 + b2*x2 + … + bn*xn))

where b0, b1, …, bn are the parameters (weights) the model learns. Equivalently, the log-odds log(p / (1 - p)) is a linear function of the features.
Loss Function
The loss is basically the error in our predicted value; in other words, it is the difference between our predicted value and the actual value. We will be using the L2 (squared-error) loss function to calculate the error, though in principle you can use any suitable function. The computation can be broken down as follows (the steps combine into the formula shown after this list):
  1. Let the actual value be yᵢ and let the value predicted by our model be ȳᵢ. Find the difference between the actual and the predicted value.
  2. Square this difference.
  3. Sum these squared differences across all the values in the training data.
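Putting the three steps together, the L2 loss over n training samples is:

Loss = Σ (yᵢ - ȳᵢ)²   (summing over i = 1, …, n)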

  
Now that we have the error, we need to update the values of our parameters to minimize it. This is where the “learning” actually happens, since the model updates itself based on its previous output to produce a more accurate output in the next step; with each iteration the model therefore becomes more accurate. We will be using the Gradient Descent algorithm to estimate our parameters. (Another commonly used approach is Maximum Likelihood Estimation.)
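Concretely, writing the learning rate as L, each gradient descent iteration applies the following updates; the derivatives come from differentiating the L2 loss through the logistic function, and they reappear verbatim in the code below:

D_b0 = -2 Σ (yᵢ - ȳᵢ) · ȳᵢ · (1 - ȳᵢ)
D_b1 = -2 Σ xᵢ · (yᵢ - ȳᵢ) · ȳᵢ · (1 - ȳᵢ)
b0 ← b0 - L · D_b0
b1 ← b1 - L · D_b1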
[Plot: the loss or error on the y-axis against the number of iterations on the x-axis.]





Implementing the Model
The data was taken from Kaggle and describes whether a product was purchased through an advertisement on social media. We will be predicting the value of Purchased, using the single feature Age. (You can use multiple features as well.)

# Predicting whether a product was purchased through an advertisement on social media.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from math import exp
plt.rcParams["figure.figsize"] = (10, 6)
## Load the Input Data
data = pd.read_csv("Social_Network_Ads.csv")
data.head()
   User ID    Gender  Age  EstimatedSalary  Purchased
0  15624510   Male     19            19000          0
1  15810944   Male     35            20000          0
2  15668575   Female   26            43000          0
3  15603246   Female   27            57000          0
4  15804002   Male     19            76000          0
# Let's visualise the given data.
plt.scatter(data['Age'], data['Purchased'])
plt.show()
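Before building the model we split the feature (Age) and the target (Purchased) into training and test sets; the training code below relies on X_train and y_train. The test_size and random_state values here are one reasonable choice, matching the split used in the scikit-learn section later.

# Split feature and target into train/test sets.
# test_size and random_state are assumptions, mirroring the later sklearn split.
X = data['Age']
y = data['Purchased']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)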

# Next we normalize the training data, shifting the feature's mean to the origin (i.e. 0).
# Creating the logistic regression model
# Helper function to normalize data
def normalize(X):
    return X - X.mean()
# Method to make predictions
def predict(X, b0, b1):
    return np.array([1 / (1 + exp(-(b0 + b1 * x))) for x in X])
# Method to train the model
def logistic_regression(X, Y):
    X = normalize(X)
    # Initializing the parameters, the learning rate L and the number of epochs
    b0 = 0
    b1 = 0
    L = 0.001
    epochs = 300
    for epoch in range(epochs):
        y_pred = predict(X, b0, b1)
        D_b0 = -2 * sum((Y - y_pred) * y_pred * (1 - y_pred))  # Derivative of loss wrt b0
        D_b1 = -2 * sum(X * (Y - y_pred) * y_pred * (1 - y_pred))  # Derivative of loss wrt b1
        # Update b0 and b1 by stepping against the gradient
        b0 = b0 - L * D_b0
        b1 = b1 - L * D_b1
    return b0, b1
# Training the model
b0, b1 = logistic_regression(X_train, y_train)
# Making predictions
X_test_norm = normalize(X_test)  # note: the test feature is normalized with its own mean here
y_pred = predict(X_test_norm, b0, b1)
y_pred = [1 if p >= 0.5 else 0 for p in y_pred]
plt.clf()
plt.scatter(X_test, y_test)
plt.scatter(X_test, y_pred, c="red")
plt.show()
# The accuracy
accuracy = 0
for i in range(len(y_pred)):
    if y_pred[i] == y_test.iloc[i]:
        accuracy += 1
print(f"Accuracy = {accuracy / len(y_pred)}")

Accuracy = 0.85
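The same accuracy can be computed in one line with numpy, assuming the arrays defined above:

accuracy = np.mean(np.array(y_pred) == y_test.values)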

# Making predictions using scikit learn
from sklearn.linear_model import LogisticRegression

# Create an instance and fit the model 
lr_model = LogisticRegression()
lr_model.fit(X_train.values.reshape(-1, 1), y_train.values)  # y must be a 1-D array

# Making predictions
y_pred_sk = lr_model.predict(X_test.values.reshape(-1, 1))
plt.clf()
plt.scatter(X_test, y_test)
plt.scatter(X_test, y_pred_sk, c="red")
plt.show()

# Accuracy
print(f"Accuracy = {lr_model.score(X_test.values.reshape(-1, 1), y_test.values.reshape(-1, 1))}")


Accuracy = 0.8625
Thus we have implemented a seemingly complicated algorithm easily in Python from scratch, and we have also compared it with a standard sklearn model that does the same thing. I think the most crucial part here is the gradient descent algorithm and learning how the weights are updated at each step. Once you have understood this basic concept, you will be able to estimate parameters for other functions as well.

Now, to predict whether a user will purchase the product or not, we need to model the relationship between the outcome and both Age and EstimatedSalary. User ID and Gender are not useful predictors here, so we drop them.

# input: Age and EstimatedSalary (columns 2 and 3)
x = data.iloc[:, [2, 3]].values

# output: Purchased (column 4)
y = data.iloc[:, 4].values

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Standardize both features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)

print(xtrain[0:10, :])

[[ 0.58164944 -0.88670699]
 [-0.60673761  1.46173768]
 [-0.01254409 -0.5677824 ]
 [-0.60673761  1.89663484]
 [ 1.37390747 -1.40858358]
 [ 1.47293972  0.99784738]
 [ 0.08648817 -0.79972756]
 [-0.01254409 -0.24885782]
 [-0.21060859 -0.5677824 ]
 [-0.21060859 -0.19087153]]
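The printed values no longer look like raw ages and salaries because StandardScaler standardizes each feature as z = (x - μ) / σ, where μ and σ are the mean and standard deviation estimated from the training set. The scaler is fit on xtrain only and then merely applied to xtest, which avoids leaking test-set information into training.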

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(xtrain, ytrain)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=0, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

y_pred = classifier.predict(xtest)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(ytest, y_pred)

print ("Confusion Matrix : \n", cm)

Confusion Matrix :
 [[65  3]
 [ 8 24]]
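Reading the matrix: 65 true negatives, 3 false positives, 8 false negatives and 24 true positives, so the accuracy works out to (65 + 24) / (65 + 3 + 8 + 24) = 89 / 100 = 0.89, which matches the score printed below.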

from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(ytest, y_pred))

Accuracy :  0.89

from matplotlib.colors import ListedColormap
X_set, y_set = xtest, ytest

# Build a fine grid covering the (scaled) Age / EstimatedSalary plane
X1, X2 = np.meshgrid(
    np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))

# Colour each grid point by the class the classifier predicts there,
# which makes the decision boundary visible
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

# Overlay the actual test points, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)

plt.title('Classifier (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
