1. Let's start with the conclusion

nn.BCEWithLogitsLoss is equivalent to nn.Sigmoid followed by nn.BCELoss.
It is mainly used for binary classification and multi-label classification problems.

The PyTorch documentation describes BCEWithLogitsLoss the same way: this loss combines a Sigmoid layer and BCELoss in one single class.

2. Formula decomposition

* BCEWithLogitsLoss

Suppose there are $N$ batches, each of which predicts $n$ labels. Then the loss is:

$$Loss = \{ l_1, \ldots, l_N \}, \quad l_n = - \left[ y_n \cdot \log(\sigma(x_n)) + (1 - y_n) \cdot \log(1 - \sigma(x_n)) \right]$$

where $\sigma(x_n)$ is the Sigmoid function, which maps $x$ to the interval $(0, 1)$:

$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$

* BCELoss

Again, suppose there are $N$ batches, each of which predicts $n$ labels. Then the loss is:

$$Loss = \{ l_1, \ldots, l_N \}, \quad l_n = - \left[ y_n \cdot \log(x_n) + (1 - y_n) \cdot \log(1 - x_n) \right]$$

Compared with BCEWithLogitsLoss, this is simply missing the $\sigma(x)$ function; the sketch below checks both formulas numerically.
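To make the two formulas concrete, here is a minimal sketch that evaluates $l_n$ by hand (the logit and target values are illustrative, not from the original experiment) and compares the result against the built-in module with reduction='none':

```python
import torch
import torch.nn as nn

x = torch.tensor([0.5, -1.2, 2.0])  # logits (illustrative values)
y = torch.tensor([1.0, 0.0, 1.0])   # binary targets

# Manual BCEWithLogitsLoss: apply sigmoid first, then the BCE formula per element
p = torch.sigmoid(x)
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))

builtin = nn.BCEWithLogitsLoss(reduction='none')(x, y)
print(torch.allclose(manual, builtin))  # True
```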

3. Experiment code
```python
import torch
import torch.nn as nn

# Randomly initialize the label values: two batches, each with 3 labels.
# Note that this is a multi-label problem, so each sample may belong to
# several labels at once; each label is its own binary classification
# problem (belongs to the label or not).
label = torch.empty((2, 3)).random_(2)
# tensor([[0., 1., 0.],
#         [0., 1., 1.]])

# Randomly initialize x, representing the model's predictions (logits)
x = torch.randn((2, 3))
# tensor([[-0.6117,  0.1446,  0.0415],
#         [-1.5376, -0.2599, -0.9680]])

sigmoid = nn.Sigmoid()
x1 = sigmoid(x)  # normalize to the (0, 1) interval
# tensor([[0.3517, 0.5361, 0.5104],
#         [0.1769, 0.4354, 0.2753]])

bceloss = nn.BCELoss()
bceloss(x1, label)  # tensor(0.6812)

# Recompute with BCEWithLogitsLoss and compare the results
bce_with_logits_loss = nn.BCEWithLogitsLoss()
bce_with_logits_loss(x, label)  # tensor(0.6812)
```
4. Log-sum-exp numerical stability
Besides being more convenient than BCELoss, BCEWithLogitsLoss also benefits from the integrated Sigmoid function: it can apply the LogSumExp trick internally and thereby gain numerical stability.
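One common stable formulation (a sketch of the idea; PyTorch's exact internal implementation may differ) rewrites $l_n$ as $\max(x, 0) - x \cdot y + \log(1 + \exp(-|x|))$, so the exponential is always taken of a non-positive number and can never overflow:

```python
import torch
import torch.nn as nn

def stable_bce_with_logits(x, y):
    # max(x, 0) - x*y + log1p(exp(-|x|)): since -|x| <= 0, exp() cannot
    # overflow, and log1p stays accurate when its argument is near zero
    return torch.clamp(x, min=0) - x * y + torch.log1p(torch.exp(-x.abs()))

x = torch.tensor([-100.0, 0.0, 100.0])
y = torch.tensor([1.0, 1.0, 0.0])
print(stable_bce_with_logits(x, y))
# tensor([100.0000,   0.6931, 100.0000])
print(nn.BCEWithLogitsLoss(reduction='none')(x, y))
# tensor([100.0000,   0.6931, 100.0000])
```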

However, in my tests, simply using Sigmoid+BCELoss did not produce $\inf$ or $-\inf$ overflows either; I would welcome advice on why.
```python
x = torch.tensor(1e+10)
x1 = sigmoid(x)  # tensor(1.)
label = torch.tensor(1.)
bceloss(x1, label)              # tensor(0.)
bce_with_logits_loss(x, label)  # tensor(0.)
```
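One plausible explanation, based on the PyTorch documentation for BCELoss (which states that its log outputs are clamped to be greater than or equal to -100): the overflow case only arises when the label disagrees with a saturated prediction, and even then BCELoss returns the clamped value rather than inf, while BCEWithLogitsLoss returns the exact loss. A small check under that assumption:

```python
# Mismatched label: sigmoid(1e10) saturates to exactly 1.0 in float32,
# so log(1 - 1.0) would be -inf without safeguards
x = torch.tensor(1e+10)
label = torch.tensor(0.)
bceloss(sigmoid(x), label)      # tensor(100.) -- log clamped at -100, per the docs
bce_with_logits_loss(x, label)  # tensor(1.0000e+10) -- exact loss from the logit
```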
