Wu Enda logistic regression 2
2022-07-25 17:07:00 【starmultiple】
Regularized logistic regression
In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
1. Data visualization
plotData is used to generate a figure like the one shown below, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are plotted with different markers.
path = 'ex2data2.txt'
df = pd.read_csv(path, header=None, names=['Test1', 'Test2', 'Accepted'])  # name the columns Test1/Test2/Accepted so that df.Test1 and df.Test2 used below resolve correctly
df.head()
df.describe()
pos = df[df['Accepted'].isin([1])]
neg = df[df['Accepted'].isin([0])]
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(pos['Test1'], pos['Test2'], s=50, c='black', marker='+', label='Accepted')
ax.scatter(neg['Test1'], neg['Test2'], s=50, c='y', marker='o', label='Rejected')
ax.legend()
ax.set_xlabel('Test1 Score')
ax.set_ylabel('Test2 Score')
plt.show()

2. Feature mapping
One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, we map the features into all polynomial terms of x1 and x2 up to the sixth power.
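Concretely, for power = 6 the mapping turns each two-dimensional point into a 28-dimensional feature vector containing every monomial of total degree at most six; this is exactly what the feature_mapping function below builds, one column per term:

$$\text{mapFeature}(x_1, x_2) = \left[\, 1,\; x_1,\; x_2,\; x_1^2,\; x_1 x_2,\; x_2^2,\; x_1^3,\; \dots,\; x_1 x_2^5,\; x_2^6 \,\right]^T$$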
def feature_mapping(x, y, power, as_ndarray=False):
    # build every term x^(i-p) * y^p with 0 <= p <= i <= power,
    # stored under the name f{i-p}{p} (e.g. f10 = x, f01 = y, f21 = x^2 * y)
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
x1 = df.Test1.values
x2 = df.Test2.values
Y = df.Accepted
data = feature_mapping(x1, x2, power=6)
# data = data.sort_index(axis=1, ascending=True)
data.head()
data.describe()
3. Cost function and gradient
Now you will implement code to compute the cost function and gradient for regularized logistic regression. Complete the code in costFunctionReg.m so that it returns the cost and the gradient.
Recall that the regularized cost function in logistic regression is:
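$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where $h_\theta(x) = g(\theta^T x)$ is the sigmoid hypothesis and the bias term $\theta_0$ is not regularized. The corresponding gradient, which regularized_gradient below implements, is

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_0^{(i)}, \qquad \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)$$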
theta = np.zeros(data.shape[1])
X = feature_mapping(x1, x2, power=6, as_ndarray=True)
X.shape, Y.shape, theta.shape
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)
def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]  # skip theta_0: the bias term is not regularized
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term
cost(theta, X, Y)
regularized_cost(theta, X, Y, l=1)  # with the initial theta of zeros, both equal log(2) ≈ 0.693
def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)
def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    # prepend a 0 so that theta_0 receives no regularization
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term
gradient(theta, X, Y)
regularized_gradient(theta, X, Y)
import scipy.optimize as opt
res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, Y), method='Newton-CG', jac=regularized_gradient)
res
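A quick sanity check, not part of the original exercise code: confirm that the optimizer converged and that its final objective agrees with regularized_cost at the solution.

print(res.success)                          # True if the optimizer converged
print(res.fun)                              # final value of the objective
print(regularized_cost(res.x, X, Y, l=1))   # should match res.fun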
def predict(theta, X):
    probability = sigmoid(X @ theta.T)
    return (probability >= 0.5).astype(int)  # threshold at 0.5: 1 = accepted, 0 = rejected
from sklearn.metrics import classification_report
Y_pred = predict(res.x, X)
print(classification_report(Y, Y_pred))
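Alongside the per-class report, the overall training accuracy can be computed directly; a minimal check using the variables above (for reference, the original exercise reports roughly 83% training accuracy at λ = 1):

accuracy = np.mean(Y_pred == Y)
print('Train accuracy: {:.1%}'.format(accuracy))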
# obtain theta
def find_theta(power, l):
    '''
    power: int, raise x1, x2 to polynomial power
    l: int, lambda constant for the regularization term
    '''
    path = 'ex2data2.txt'
    df = pd.read_csv(path, header=None, names=['Test1', 'Test2', 'Accepted'])
    Y = df.Accepted
    x1 = df.Test1.values
    x2 = df.Test2.values
    X = feature_mapping(x1, x2, power, as_ndarray=True)
    theta = np.zeros(X.shape[1])
    # res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, Y, l), method='Newton-CG', jac=regularized_gradient)
    res = opt.minimize(fun=regularized_cost, x0=theta, args=(X, Y, l), method='TNC', jac=regularized_gradient)
    return res.x
# Decision boundary: theta^T x = 0; in practice keep grid points where |theta^T x| <= threshold
def find_decision_boundary(density, power, theta, threshold):
    t1 = np.linspace(-1, 1.2, density)
    t2 = np.linspace(-1, 1.2, density)
    coordinates = [(x, y) for x in t1 for y in t2]
    x_cord, y_cord = zip(*coordinates)
    mapped_cord = feature_mapping(x_cord, y_cord, power)
    pred = mapped_cord.values @ theta.T
    decision = mapped_cord[np.abs(pred) <= threshold]
    return decision.f10, decision.f01  # f10 is x1, f01 is x2
# Draw decision boundaries
def draw_boundary(power, l):
    density = 1000
    threshold = 2 * 10 ** -3
    theta = find_theta(power, l)
    x, y = find_decision_boundary(density, power, theta, threshold)
    pos = df[df['Accepted'].isin([1])]
    neg = df[df['Accepted'].isin([0])]
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.scatter(pos['Test1'], pos['Test2'], s=50, c='black', marker='+', label='y=1')
    ax.scatter(neg['Test1'], neg['Test2'], s=50, c='y', marker='o', label='y=0')
    ax.scatter(x, y, s=50, c='g', marker='.', label='Decision Boundary')
    ax.legend()
    ax.set_xlabel('Test1 Score')
    ax.set_ylabel('Test2 Score')
    plt.show()
draw_boundary(6, l=1)
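The exercise also suggests varying the regularization parameter λ and watching how the decision boundary changes; a quick way to do that with the helpers above (λ = 0 and λ = 100 are the illustrative extremes used in the exercise's figures):

draw_boundary(6, l=0)    # no regularization: the boundary overfits the training set
draw_boundary(6, l=100)  # heavy regularization: the boundary underfits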
