Machine Learning Basics: A House-Price Prediction Model Using Linear Regression (PyTorch Implementation)


import torch  # import the library
torch.cuda.is_available()  # check whether a CUDA-capable GPU is available
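To explain linear regression, consider a practical example: we want to estimate the price of a house (in dollars) from its area (in square feet) and its age (in years).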

The target (the house price) can be expressed as a weighted sum of the features (area and age):

$$price = w_{area} \times area + w_{age} \times age + b$$

$w_{area}$ and $w_{age}$ are called weights. The weights determine how much each feature influences the predicted value.

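$b$ is called the bias, offset, or intercept. The bias is the value the prediction takes when all features are 0.
Even though no real house has an area of 0 or an age of exactly 0 years, we still need the bias term. Without it, every prediction would be forced through the origin, which limits the expressive power of the model.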
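As a rough illustration of the weighted sum and the role of the bias, here is a minimal plain-Python sketch. The function name `predict_price` and the parameter values are hypothetical, chosen only to make the arithmetic easy to follow; they are not fitted to any data.

```python
def predict_price(area, age, w_area, w_age, b):
    """Weighted sum of the two features plus the bias term."""
    return w_area * area + w_age * age + b


# Hypothetical parameters, for illustration only.
w_area, w_age, b = 150.0, -1000.0, 20000.0

# 150 * 1000 + (-1000) * 10 + 20000
price = predict_price(area=1000.0, age=10.0, w_area=w_area, w_age=w_age, b=b)

# With every feature at 0, the prediction reduces to the bias.
baseline = predict_price(area=0.0, age=0.0, w_area=w_area, w_age=w_age, b=b)

print(price)     # 160000.0
print(baseline)  # 20000.0
```

Setting `b = 0.0` in this sketch would force `baseline` to 0 regardless of the weights, which is the restriction the bias term removes.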

Gradient descent summary

So far in this course, you have developed a linear model that predicts $f_{w,b}(x^{(i)})$:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$
In linear regression, you use the training data to fit the parameters $w$, $b$ by minimizing a measure of the error between the predictions $f_{w,b}(x^{(i)})$ and the actual data $y^{(i)}$. The measure is called the cost, $J(w,b)$.
In training, you measure the cost over all of the training samples $x^{(i)}, y^{(i)}$:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \tag{2}$$
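To make equation (2) concrete, the following plain-Python sketch evaluates the cost for a single-feature model on three made-up points that lie exactly on the line y = 2x + 1. The function name `compute_cost` and the data are assumptions made for this illustration.

```python
def compute_cost(xs, ys, w, b):
    """Equation (2): squared error summed over all samples, divided by 2m."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = w * x + b          # f_{w,b}(x^{(i)}) from equation (1)
        total += (prediction - y) ** 2
    return total / (2 * m)


# Made-up data generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

perfect_fit_cost = compute_cost(xs, ys, w=2.0, b=1.0)   # every error is 0
zero_params_cost = compute_cost(xs, ys, w=0.0, b=0.0)   # (9 + 25 + 49) / 6

print(perfect_fit_cost)
print(zero_params_cost)
```

The cost is 0 only when the parameters reproduce every label; any other choice of `w` and `b` gives a positive value.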

In lecture, gradient descent was described as:

$$
\begin{align*}
&\text{repeat until convergence: } \lbrace \\
&\quad w = w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{3} \\
&\quad b = b - \alpha \frac{\partial J(w,b)}{\partial b} \\
&\rbrace
\end{align*}
$$
where the parameters $w$, $b$ are updated simultaneously.
The gradient is defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}
\end{align}
$$
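One way to gain confidence in equations (4) and (5) is to compare them with a numerical derivative of the cost. The sketch below does this in plain Python for a single feature. The function names and the three data points are assumptions for illustration, and the finite-difference check is an added sanity test, not part of the original material.

```python
def compute_cost(xs, ys, w, b):
    """Equation (2) for a single-feature model."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def compute_gradient(xs, ys, w, b):
    """Equations (4) and (5): analytic partial derivatives of the cost."""
    m = len(xs)
    dj_dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m
    dj_db = sum((w * x + b - y) for x, y in zip(xs, ys)) / m
    return dj_dw, dj_db


# Made-up data generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
w, b = 0.0, 0.0

dj_dw, dj_db = compute_gradient(xs, ys, w, b)

# Central finite differences of the cost, used only as a cross-check.
eps = 1e-6
num_dw = (compute_cost(xs, ys, w + eps, b) - compute_cost(xs, ys, w - eps, b)) / (2 * eps)
num_db = (compute_cost(xs, ys, w, b + eps) - compute_cost(xs, ys, w, b - eps)) / (2 * eps)

print(dj_dw, num_dw)  # both close to -34/3
print(dj_db, num_db)  # both close to -5
```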

**Here, simultaneously means that you calculate the partial derivatives for all the parameters before updating any of the parameters.**
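The simultaneous update can be sketched in plain Python as follows: both partial derivatives are computed from the current `w` and `b`, and only then are the two parameters overwritten. The function names, the data, the learning rate `alpha=0.1`, and the iteration count are all assumptions chosen so that this small example converges.

```python
def compute_gradient(xs, ys, w, b):
    """Equations (4) and (5) for a single-feature model."""
    m = len(xs)
    dj_dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m
    dj_db = sum((w * x + b - y) for x, y in zip(xs, ys)) / m
    return dj_dw, dj_db


def gradient_descent(xs, ys, w, b, alpha, num_iters):
    """Equation (3), repeated for a fixed number of iterations."""
    for _ in range(num_iters):
        # Both derivatives use the OLD values of w and b ...
        dj_dw, dj_db = compute_gradient(xs, ys, w, b)
        # ... and only afterwards are the parameters updated.
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b


# Made-up data generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

w_fit, b_fit = gradient_descent(xs, ys, w=0.0, b=0.0, alpha=0.1, num_iters=2000)
print(w_fit, b_fit)  # approaches w = 2, b = 1
```

A non-simultaneous variant would update `w` first and then compute `dj_db` with the new `w`, which is a different algorithm from the one in equation (3).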


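First, we generate some data ourselves. The data may have nothing to do with reality, but it can stand in for a linear relationship: the relationship between two independent variables (house area and house age) and the house price (the dependent variable).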
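# Generate data: build a dataset and its labels from the linear model
# parameters w = [2, -3.4]^T, b = 4.2 and a noise term epsilon:
# y = Xw + b + epsilon
# Arguments: w, b, number of examples
def synthetic_data(w, b, num_examples):
    # X is a (num_examples, len(w)) matrix drawn from a normal distribution
    # with mean 0 and standard deviation 1; len(w) is the number of features.
    X = torch.normal(0, 1, (num_examples, len(w)))
    # X: matrix (1000, 2) (1000 examples, 2 input features: area and age)
    # w: vector (2)
    y = torch.matmul(X, w) + b  # y: vector (1000)
    # y = Xw + b + noise; the noise is normal with standard deviation 0.01.
    y += torch.normal(0, 0.01, y.shape)
    # (-1, 1) returns y as a column vector (1 column, row count inferred).
    # In torch, telling a row vector from a column vector requires a matrix;
    # to the computer, a lone row or column is just a one-dimensional array.
    return X, y.reshape((-1, 1))

# Set the true parameters w, b in advance. (In practice we may not know
# them and must discover the pattern from a large amount of data in order
# to predict new data.)
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print(f"features:{features[:3]},\n labels:{labels[:3]}")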
Look at the first three samples. The third sample means that when the house area is 1.3822 and the house age is 0.9644, the house price is 3.6739.
features:tensor([[-0.3011, -0.6250],
[ 0.5002, -0.3313],
[ 1.3822, 0.9644]]),
labels:tensor([[5.7323],
[6.3256],
[3.6739]])
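The printed samples can be checked by hand against the generating formula y = Xw + b + noise. The plain-Python sketch below copies the three printed feature rows and labels (rounded to four decimals, as shown above) and computes the noise-free price for each; the variable names are assumptions for illustration. The leftover difference should be on the order of the noise standard deviation, 0.01.

```python
# True parameters from the article.
true_w = [2.0, -3.4]
true_b = 4.2

# The three samples printed above (values as displayed, rounded to 4 decimals).
features = [[-0.3011, -0.6250], [0.5002, -0.3313], [1.3822, 0.9644]]
labels = [5.7323, 6.3256, 3.6739]

residuals = []
for (area, age), label in zip(features, labels):
    noise_free = true_w[0] * area + true_w[1] * age + true_b
    residuals.append(label - noise_free)
    print(f"noise-free price: {noise_free:.4f}, label: {label:.4f}, "
          f"difference: {label - noise_free:+.4f}")
```

Each difference is roughly 0.01 in magnitude or smaller, which is consistent with labels produced by the linear model plus a small noise term.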

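Note that all of the data above is synthetic, including the true parameters w and b. Our task now is this: assuming we do not know the true parameters, how can we infer them (or approximate them as closely as possible) from this large set of samples? That is the task of linear regression.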

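First, we initialize the parameters w and b to 0.
# Initialize the parameters to zero and enable gradient tracking.
# Key point: w and b need to be updated, so requires_grad is set to True.
w = torch.tensor([0., 0.], requires_grad=True)
b = torch.tensor([0.], requires_grad=True)
w, b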
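Define the squared loss function
# Compute the total loss over the samples.
def calc_cost(X, w, b, labels, m):
    # Batch gradient descent: add up the loss of all samples.
    prices = torch.matmul(X, w) + b
    cost = 0
    # Corresponds to equation (2) above.
    for i in range(m):
        cost += (prices[i] - labels[i]) ** 2
    total_cost = 1 / (2 * m) * cost
    return total_cost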
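Define the batch gradient descent algorithm
# Batch gradient descent; arguments: parameters, learning rate.
# (The function is named sgd, but each step uses the full dataset.)
def sgd(params, lr):
    # The update itself is excluded from gradient tracking (simultaneous
    # update: the partial derivatives with respect to w and b are computed
    # first, and only then are w and b updated; see equation (3)).
    with torch.no_grad():
        # Inside no_grad, operations are not recorded for automatic
        # differentiation, which saves GPU or CPU memory.
        for param in params:
            param -= lr * param.grad
            print(f"Parameter updated to: {param}")
            # Zero each parameter's gradient after every batch
            # (here one batch is the entire dataset).
            param.grad.zero_()
The training process is as follows; feel free to change the number of iterations and the learning rate.
# Training
lr = 0.008        # learning rate
num_epochs = 600  # number of iterations
for epoch in range(num_epochs):
    # Total batch loss (equation (2)).
    total_cost = calc_cost(features, w, b, labels, len(labels))
    print(f"Total loss at iteration {epoch+1}: {total_cost}")
    # Automatic differentiation (equations (4) and (5)).
    total_cost.backward()
    print(f"Gradients: {w.grad},{b.grad}")  # current partial derivatives
    # Update the parameters from the gradients (equation (3)).
    sgd([w, b], lr)
    print(f"True weights for area and age: {true_w}\nTrue bias: {true_b}\n")

Total loss at iteration 597: tensor([0.0014], grad_fn=<MulBackward0>)
Gradients: tensor([-0.0192, 0.0320]),tensor([-0.0352])
Parameter updated to: tensor([ 1.9793, -3.3672], requires_grad=True)
Parameter updated to: tensor([4.1645], requires_grad=True)
True weights for area and age: tensor([ 2.0000, -3.4000])
True bias: 4.2

Total loss at iteration 598: tensor([0.0014], grad_fn=<MulBackward0>)
Gradients: tensor([-0.0190, 0.0318]),tensor([-0.0350])
Parameter updated to: tensor([ 1.9795, -3.3674], requires_grad=True)
Parameter updated to: tensor([4.1648], requires_grad=True)
True weights for area and age: tensor([ 2.0000, -3.4000])
True bias: 4.2

Total loss at iteration 599: tensor([0.0013], grad_fn=<MulBackward0>)
Gradients: tensor([-0.0189, 0.0315]),tensor([-0.0347])
Parameter updated to: tensor([ 1.9796, -3.3677], requires_grad=True)
Parameter updated to: tensor([4.1651], requires_grad=True)
True weights for area and age: tensor([ 2.0000, -3.4000])
True bias: 4.2

Total loss at iteration 600: tensor([0.0013], grad_fn=<MulBackward0>)
Gradients: tensor([-0.0188, 0.0313]),tensor([-0.0344])
Parameter updated to: tensor([ 1.9798, -3.3679], requires_grad=True)
Parameter updated to: tensor([4.1653], requires_grad=True)
True weights for area and age: tensor([ 2.0000, -3.4000])
True bias: 4.2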
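As we can see, during training the parameters w and b are continually adjusted along the direction of gradient descent, and the loss keeps decreasing. In the end the loss drops to nearly 0, and w and b come close to the true parameters.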
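The small gap that remains after 600 iterations (for example 1.9798 instead of 2) can be roughly explained with a back-of-the-envelope estimate. The sketch below rests on an assumption that is not stated in the article: because the features are drawn from a standard normal distribution, the curvature of the cost is close to 1 in every direction, so each step shrinks the distance to the optimum by a factor of about `1 - lr`. Starting from zero, a parameter would then reach about `true_value * (1 - (1 - lr) ** num_epochs)`. This is an approximation, not an exact reproduction of the run.

```python
# Hyperparameters and true parameters taken from the article.
lr = 0.008
num_epochs = 600
true_params = {"w_area": 2.0, "w_age": -3.4, "b": 4.2}

# Assumed per-step shrink factor of the error (curvature of the cost ~ 1).
remaining = (1 - lr) ** num_epochs

# Starting from 0, the fraction (1 - remaining) of each true value is recovered.
estimates = {name: value * (1 - remaining) for name, value in true_params.items()}

print(f"fraction of the initial error left: {remaining:.4f}")
for name, value in estimates.items():
    print(f"{name}: about {value:.4f}")
```

Under this assumption the estimates land within about 0.005 of the logged values 1.9798, -3.3679, and 4.1653, which suggests the remaining gap comes mostly from stopping at 600 iterations; more iterations or a larger learning rate should close it further.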
