1. Let's start with the conclusion

nn.BCEWithLogitsLoss is equivalent to nn.Sigmoid followed by nn.BCELoss.
It is mainly used for binary classification and multi-label classification problems.

As the PyTorch documentation for BCEWithLogitsLoss describes, this loss function combines Sigmoid and BCELoss in a single class.

2. Formula decomposition

* BCEWithLogitsLoss
  Suppose there are $N$ batches, and each batch predicts $n$ labels; then the loss is:
  $$Loss = \{ l_1, \dots, l_N \}, \quad l_n = -\left[ y_n \cdot \log(\sigma(x_n)) + (1 - y_n) \cdot \log(1 - \sigma(x_n)) \right]$$
  where $\sigma(x_n)$ is the Sigmoid function, which maps $x$ to the interval $(0, 1)$:
  $$\sigma(x) = \frac{1}{1 + \exp(-x)}$$

* BCELoss
  Again, suppose there are $N$ batches, and each batch predicts $n$ labels; then the loss is:
  $$Loss = \{ l_1, \dots, l_N \}, \quad l_n = -\left[ y_n \cdot \log(x_n) + (1 - y_n) \cdot \log(1 - x_n) \right]$$
  Compared with BCEWithLogitsLoss, this lacks the $\sigma(x)$ function, so the inputs $x_n$ must already lie in $(0, 1)$ (both definitions are verified numerically in the sketch below).
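
To make the two definitions concrete, here is a minimal sketch that evaluates $l_n$ by hand and checks it against both built-in losses, using `reduction='none'` so the per-label values stay visible; the tensors are arbitrary random data:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3)             # raw model outputs (logits)
y = torch.empty(2, 3).random_(2)  # binary labels in {0, 1}

# Evaluate l_n by hand: apply sigma(x_n) first, then the cross-entropy terms
p = torch.sigmoid(x)
l_n = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))

# BCELoss expects probabilities in (0, 1); BCEWithLogitsLoss expects raw logits
print(torch.allclose(l_n, nn.BCELoss(reduction='none')(p, y)))            # True
print(torch.allclose(l_n, nn.BCEWithLogitsLoss(reduction='none')(x, y)))  # True
```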

3. Experiment code
```python
import torch
import torch.nn as nn

# Randomly initialize label values: two batches, each containing 3 labels.
# Note that this is a multi-label problem, so each sample may carry several
# labels at once; within each label it is a binary classification problem
# (belongs / does not belong to that label).
label = torch.empty((2, 3)).random_(2)
# tensor([[0., 1., 0.],
#         [0., 1., 1.]])

# Randomly initialize x, representing the model's raw predictions
x = torch.randn((2, 3))
# tensor([[-0.6117,  0.1446,  0.0415],
#         [-1.5376, -0.2599, -0.9680]])

sigmoid = nn.Sigmoid()
x1 = sigmoid(x)  # normalize to the interval (0, 1)
# tensor([[0.3517, 0.5361, 0.5104],
#         [0.1769, 0.4354, 0.2753]])

bceloss = nn.BCELoss()
bceloss(x1, label)  # tensor(0.6812)

# Compute again with BCEWithLogitsLoss and compare the results
bce_with_logits_loss = nn.BCEWithLogitsLoss()
bce_with_logits_loss(x, label)  # tensor(0.6812)
```
4. Log-sum-exp and numerical stability
When we use the BCEWithLogitsLoss loss function, besides being more convenient than BCELoss, we also gain numerical stability: because the Sigmoid is fused into the loss, the implementation can apply the log-sum-exp trick.
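
As a rough sketch of what that trick looks like (an algebraic rewrite of $l_n$, not necessarily the exact kernel PyTorch ships): expanding $\log\sigma(x)$ gives $l_n = (1 - y_n)\,x_n + \log(1 + \exp(-x_n))$, and pulling the maximum out of the logarithm yields $l_n = \max(x_n, 0) - x_n y_n + \log(1 + \exp(-|x_n|))$, in which $\exp$ never receives a large positive argument:

```python
import torch

def stable_bce_with_logits(x, y):
    # max(x, 0) - x*y carries the large-magnitude part exactly, while
    # log1p(exp(-|x|)) stays within [0, log 2], so nothing overflows
    return torch.clamp(x, min=0) - x * y + torch.log1p(torch.exp(-torch.abs(x)))

x = torch.randn(2, 3)
y = torch.empty(2, 3).random_(2)
manual = stable_bce_with_logits(x, y).mean()
builtin = torch.nn.BCEWithLogitsLoss()(x, y)
print(torch.allclose(manual, builtin))  # True
```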

However, in my own tests, simply using Sigmoid + BCELoss did not produce an $\infty$ or $-\infty$ overflow either; I would welcome advice on why.
```python
x = torch.tensor(1e+10)
x1 = sigmoid(x)  # tensor(1.)
label = torch.tensor(1.)
bceloss(x1, label)              # tensor(0.)
bce_with_logits_loss(x, label)  # tensor(0.)
```
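
One plausible explanation for the absence of $\infty$ (my reading of the PyTorch docs, corrections welcome): BCELoss clamps its log terms to be greater than or equal to -100, while BCEWithLogitsLoss avoids the overflow analytically via the rewrite above. The difference only shows when the label disagrees with a saturated prediction, e.g. label 0 with the same x = 1e+10 (continuing with the objects defined above):

```python
label = torch.tensor(0.)        # now the log(1 - sigmoid(x)) branch is active
-torch.log(1 - sigmoid(x))      # tensor(inf): the unprotected formula overflows
bceloss(sigmoid(x), label)      # tensor(100.): BCELoss clamps log at -100
bce_with_logits_loss(x, label)  # tensor(1.0000e+10): large but finite
```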
