Build a simple, practical classifier with gradient descent.
# Preface
Implementing the cost computation, the gradient computation, and gradient descent itself as separate functions makes each step more explicit and concise, and keeps the overall logic clear. This post records the process of implementing logistic regression with gradient descent in exactly that way.
# import modules
import numpy
import pandas
# sigmoid function
This activation function is used repeatedly later when computing the cost and the gradient, so it is worth defining it once for reuse. The formula of the activation function is:
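$$g(z) = \frac{1}{1 + e^{-z}}$$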
The implementation is shown below. Note that the argument z is expected to be the linear term wx + b.
def sigmoid(z):
    """Logistic (sigmoid) activation; works on scalars and ndarrays alike."""
    return 1 / (1 + numpy.exp(-z))
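A quick sanity check (not part of the original post): the sigmoid should return 0.5 at z = 0 and broadcast over ndarrays.

    # sanity check: sigmoid(0) == 0.5, and the function broadcasts over ndarrays
    print(sigmoid(0))                          # 0.5
    print(sigmoid(numpy.array([-10, 0, 10])))  # roughly [4.54e-05, 0.5, 0.99995]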
# compute cost
A function that conveniently returns the cost of the current parameters makes it easy to record the loss of every iteration for debugging and plotting. The cost is computed by the following formula:
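$$J(\mathbf{w}, b) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big)\Big], \qquad f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g\big(\mathbf{w}\cdot\mathbf{x}^{(i)} + b\big)$$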
The implementation is as follows:
def compute_cost(X, y, w_in, b_in):
    """
    compute the cost over all examples
    :param X: (ndarray Shape(m, n)) dataset, m samples by n features
    :param y: (array_like Shape(m,)) target values for all samples
    :param w_in: (array_like Shape(n,)) values of parameters of the model
    :param b_in: (scalar) value of the bias parameter of the model
    :return: the cross-entropy cost over all m examples
    """
    m, n = X.shape
    cost = 0
    for i in range(m):
        # model output for sample i
        f_i = sigmoid(numpy.dot(w_in, X[i]) + b_in)
        cost += y[i] * numpy.log(f_i) + (1 - y[i]) * numpy.log(1 - f_i)
    return (-1 / m) * cost
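As a small check (my own addition, not from the original post): with w = 0 and b = 0 every prediction is 0.5, so the cost should be ln 2 ≈ 0.693 regardless of the labels.

    # with all-zero parameters the model predicts 0.5 everywhere, so the cost is ln(2)
    X_demo = numpy.array([[1.0, 2.0], [3.0, 4.0]])
    y_demo = numpy.array([0, 1])
    print(compute_cost(X_demo, y_demo, numpy.zeros(2), 0))  # ≈ 0.6931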
# compute gradient
Gradient descent runs for many iterations, and the gradient has to be recomputed in every one of them, so it is convenient to wrap the gradient computation in its own function; this also simplifies the gradient-descent code.
The mathematical formulas for the gradient are:
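$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)\,x_j^{(i)}, \qquad \frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)$$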
The implementation is as follows:
def compute_gradient(X, y, w_in, b_in):
    """
    compute the gradient for one iteration
    :param X: (ndarray Shape(m, n)) dataset, m samples by n features
    :param y: (array_like Shape(m,)) target values for all samples
    :param w_in: (array_like Shape(n,)) values of parameters of the model
    :param b_in: (scalar) value of the bias parameter of the model
    :return: the gradients dj_dw and dj_db
    """
    m, n = X.shape
    dj_dw = numpy.zeros(n)
    dj_db = 0
    # gradient with respect to each weight w_j
    for j in range(n):
        for i in range(m):
            dj_dw[j] += (sigmoid(numpy.dot(w_in, X[i]) + b_in) - y[i]) * X[i][j]
    dj_dw /= m
    # gradient with respect to the bias b
    for i in range(m):
        dj_db += (sigmoid(numpy.dot(w_in, X[i]) + b_in) - y[i])
    dj_db /= m
    return dj_dw, dj_db
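The double loop above recomputes the sigmoid for every (sample, feature) pair. A vectorized equivalent, shown below, computes the same gradients with matrix operations; this is a sketch of my own (the name compute_gradient_vectorized is illustrative, not from the original post), and it assumes y has shape (m,).

    def compute_gradient_vectorized(X, y, w_in, b_in):
        # error vector f(x) - y for all m samples at once; assumes y has shape (m,)
        err = sigmoid(X @ w_in + b_in) - y
        dj_dw = X.T @ err / X.shape[0]   # shape (n,)
        dj_db = err.mean()               # scalar
        return dj_dw, dj_db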
# gradient descent
Pass in the training data, the initial model parameters, the learning rate, and the number of iterations, then call the gradient and cost functions defined above to run gradient descent. The parameters are updated simultaneously according to the following formulas:
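$$w_j := w_j - \alpha\,\frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad (j = 1, \dots, n), \qquad b := b - \alpha\,\frac{\partial J(\mathbf{w},b)}{\partial b}$$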
The implementation is as follows:
def gradient_descent(X, y, w_in, b_in, alpha, iters):
    """
    run gradient descent
    :param X: (ndarray Shape(m, n)) dataset, m samples by n features
    :param y: (array_like Shape(m,)) target values for all samples
    :param w_in: (array_like Shape(n,)) initial values of parameters of the model
    :param b_in: (scalar) initial value of the bias parameter of the model
    :param alpha: learning rate α
    :param iters: number of iterations of gradient descent
    :return: the learned parameters w and b, and the loss after the last iteration
    """
    m_in, n_in = X.shape
    for inum in range(iters):
        # compute the gradients at the current parameters, then update all parameters simultaneously
        dj_dw, dj_db = compute_gradient(X, y, w_in, b_in)
        for j in range(n_in):
            w_in[j] = w_in[j] - alpha * dj_dw[j]
        b_in = b_in - alpha * dj_db
        loss_in = compute_cost(X, y, w_in, b_in)
    return w_in, b_in, loss_in
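The preface mentions recording the loss of every iteration for debugging and plotting, but the function above only keeps the value from the last iteration. A minimal variant that also collects the per-iteration loss could look like this (my own sketch; the name gradient_descent_with_history is illustrative, not from the original post):

    def gradient_descent_with_history(X, y, w_in, b_in, alpha, iters):
        # same update rule as above, but the loss of every iteration is kept for plotting
        loss_history = []
        for _ in range(iters):
            dj_dw, dj_db = compute_gradient(X, y, w_in, b_in)
            w_in = w_in - alpha * dj_dw
            b_in = b_in - alpha * dj_db
            loss_history.append(compute_cost(X, y, w_in, b_in))
        return w_in, b_in, loss_history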
# predict
Classification accuracy is one of the most intuitive metrics for judging a model. Plug the learned parameters into the logistic regression model to compute predictions, then compare them with the original targets to obtain the accuracy. For this classifier, a sample is predicted as positive when the activation output is greater than or equal to 0.5, and as negative otherwise. The function below only predicts the target variable; it does not compute the final accuracy. To get the accuracy, the predictions still have to be compared with the true labels.
The implementation of the prediction function is as follows:
def predict(X_pred, w_pred, b_pred):
    """
    make predictions with the learned w and b
    :param X_pred: (ndarray Shape(m, n)) dataset with m samples and n features
    :param w_pred: (array_like Shape(n,)) learned values of parameters of the model
    :param b_pred: (scalar) learned value of the bias parameter of the model
    :return: (ndarray Shape(m,)) predicted labels, 1 for positive and 0 for negative
    """
    # probability of the positive class for every sample
    predictions = sigmoid(numpy.dot(X_pred, w_pred) + b_pred)
    # threshold at 0.5 to obtain the class labels
    p = [1 if item >= 0.5 else 0 for item in predictions]
    return numpy.array(p)
Note that the function returns an ndarray. When comparing its values with another ndarray, an extra reshape may be needed to make the shapes match. In an earlier version, comparing a list output against an ndarray kept causing problems.
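A concrete illustration of the pitfall (my own example, not from the original post): comparing a shape (m,) array against a shape (m, 1) array broadcasts into an m × m matrix instead of an element-wise comparison, which silently distorts the accuracy.

    p_demo = numpy.array([1, 0, 1])              # shape (3,), like the output of predict
    t_demo = numpy.array([[1], [0], [1]])        # shape (3, 1), like labels taken straight from a DataFrame
    print((p_demo == t_demo).shape)              # (3, 3): broadcasting, not an element-wise comparison
    print((p_demo == t_demo.reshape(-1)).shape)  # (3,): element-wise comparison after the reshape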
# main
Here we load the data, define the training set and the target values, and then combine the functions above into a complete gradient-descent-based logistic regression.
# load the data into a DataFrame
data = pandas.read_csv('data/data.txt', header=None, names=['x1', 'x2', 'target'])
# number of columns (including the target variable)
colNum = data.shape[1]
# training set (without the target variable)
X_train = data.iloc[:, :colNum - 1]
X_train = numpy.array(X_train.values)
# target values, flattened to shape (m,) to match the function docstrings
y_train = data.iloc[:, colNum - 1: colNum]
y_train = numpy.array(y_train.values).reshape(-1)
# number of samples and number of features
m, n = X_train.shape
numpy.random.seed(1)
w_init = numpy.zeros(n)
b_init = 0
w, b, loss = gradient_descent(X_train, y_train, w_init, b_init, alpha=0.001, iters=10000)
print(f'w = {w}\nb = {b}\nloss = {loss}')
# accuracy
pred = predict(X_train, w, b)
print('Train Accuracy: %f'%(numpy.mean(pred == y_train.reshape(-1)) * 100))
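If data/data.txt is not at hand, the whole pipeline can still be smoke-tested on synthetic data. The snippet below is my own sketch (the names X_fake and y_fake are illustrative, not from the original post):

    # smoke test on synthetic, linearly separable data (no external file needed)
    rng = numpy.random.default_rng(0)
    X_fake = rng.normal(size=(100, 2))
    y_fake = (X_fake[:, 0] + X_fake[:, 1] > 0).astype(int)   # label 1 above the line x1 + x2 = 0
    w_fake, b_fake, _ = gradient_descent(X_fake, y_fake, numpy.zeros(2), 0, alpha=0.1, iters=1000)
    print(numpy.mean(predict(X_fake, w_fake, b_fake) == y_fake))  # should be close to 1.0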
# Additional notes
Plotting and feature engineering will be shown in the regularized_logistic_regression post, because this example has only two features and no higher-order terms, so the decision boundary is just a straight line.
Full code