1. Let's start with the conclusion
nn.BCEWithLogitsLoss is equivalent to nn.Sigmoid followed by nn.BCELoss.
It is mainly used for binary classification and multi-label classification problems.
The PyTorch documentation describes BCEWithLogitsLoss the same way: this loss combines a Sigmoid layer and the BCELoss in one single class.
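In practice, this means a model trained with BCEWithLogitsLoss omits the final Sigmoid layer and feeds raw logits straight into the loss. A minimal sketch of this design choice (the model and its layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical multi-label model: note there is NO final Sigmoid layer,
# because BCEWithLogitsLoss applies the Sigmoid internally.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 3),  # 3 labels -> 3 independent binary outputs
)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 16)                 # a batch of 8 samples
target = torch.empty(8, 3).random_(2)  # 0/1 multi-label targets
loss = criterion(model(x), target)     # raw logits go in directly
loss.backward()
```

With BCELoss instead, an nn.Sigmoid() would have to be appended to the model (or applied to its output) before computing the loss.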
2. Formula decomposition
* BCEWithLogitsLoss

Suppose there are N batches, and each batch predicts n labels. Then the loss is:

$$Loss = \{ l_1, \dots, l_N \}, \quad l_n = -\left[ y_n \cdot \log(\sigma(x_n)) + (1 - y_n) \cdot \log(1 - \sigma(x_n)) \right]$$

where $\sigma(x_n)$ is the Sigmoid function, which maps $x$ to the interval $(0, 1)$:

$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$
* BCELoss

Again, suppose there are N batches, and each batch predicts n labels. Then the loss is:

$$Loss = \{ l_1, \dots, l_N \}, \quad l_n = -\left[ y_n \cdot \log(x_n) + (1 - y_n) \cdot \log(1 - x_n) \right]$$
Compared with BCEWithLogitsLoss, the only difference is the missing $\sigma(x)$: BCELoss expects inputs that are already probabilities in $(0, 1)$, while BCEWithLogitsLoss expects raw logits, as the sketch below makes explicit.
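To make the decomposition concrete, here is a minimal sketch that computes $l_n$ by hand and checks it against both built-in losses (the variable names are mine; note that the default reduction averages over all elements):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3)             # raw logits
y = torch.empty(2, 3).random_(2)  # 0/1 targets

# Manual l_n = -[ y*log(sigma(x)) + (1-y)*log(1-sigma(x)) ]
p = torch.sigmoid(x)              # sigma(x), applied by hand
l_n = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))

# The default reduction is the mean over all elements
print(torch.allclose(l_n.mean(), nn.BCEWithLogitsLoss()(x, y)))  # True: takes logits
print(torch.allclose(l_n.mean(), nn.BCELoss()(p, y)))            # True: takes probabilities
```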
3. Experiment code
```python
import torch
import torch.nn as nn

# Randomly initialize label values: two batches, each with 3 labels.
# Note that this is a multi-label problem, so each sample may carry
# several labels at once; each label is its own binary classification
# (the sample either belongs to that label or it does not).
label = torch.empty((2, 3)).random_(2)
# tensor([[0., 1., 0.],
#         [0., 1., 1.]])

# Randomly initialize x, representing the model's predictions (logits)
x = torch.randn((2, 3))
# tensor([[-0.6117,  0.1446,  0.0415],
#         [-1.5376, -0.2599, -0.9680]])

sigmoid = nn.Sigmoid()
x1 = sigmoid(x)  # normalize to the (0, 1) interval
# tensor([[0.3517, 0.5361, 0.5104],
#         [0.1769, 0.4354, 0.2753]])

bceloss = nn.BCELoss()
bceloss(x1, label)  # tensor(0.6812)

# Recompute with BCEWithLogitsLoss and compare the results
bce_with_logits_loss = nn.BCEWithLogitsLoss()
bce_with_logits_loss(x, label)  # tensor(0.6812)
```
4. Log-sum-exp numerical stability
When we use the BCEWithLogitsLoss loss function, besides being more convenient than BCELoss, we also gain numerical stability: because the Sigmoid function is integrated into the loss, the implementation can apply the log-sum-exp trick.
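To spell the trick out: substituting $\sigma(x) = \frac{1}{1 + \exp(-x)}$ into $l_n$ gives $l_n = (1 - y_n) \cdot x_n + \log(1 + \exp(-x_n))$, whose second term overflows for very negative $x_n$; it can instead be evaluated as $\max(x_n, 0) - x_n y_n + \log(1 + \exp(-|x_n|))$, where the exponent is never positive. A minimal sketch of this stable form (the standard trick, not necessarily PyTorch's exact kernel):

```python
import torch

def stable_bce_with_logits(x, y):
    # max(x, 0) - x*y + log(1 + exp(-|x|)); exp(-|x|) <= 1, so no overflow
    return x.clamp(min=0) - x * y + torch.log1p(torch.exp(-x.abs()))

x = torch.tensor([-100.0, 0.0, 100.0])
y = torch.tensor([0.0, 1.0, 0.0])

naive = (1 - y) * x + torch.log1p(torch.exp(-x))  # exp(100) overflows float32
print(naive)                       # tensor([inf, 0.6931, 100.])
print(stable_bce_with_logits(x, y))  # finite everywhere: ~[0., 0.6931, 100.]
```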
However, in my tests, a plain Sigmoid + BCELoss did not show the $\inf$ / $-\inf$ overflow either; I would welcome any advice on this.
```python
x = torch.tensor(1e+10)
x1 = sigmoid(x)  # tensor(1.)
label = torch.tensor(1.)
bceloss(x1, label)              # tensor(0.)
bce_with_logits_loss(x, label)  # tensor(0.)
```
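A likely explanation, for what it's worth: the test above uses label = 1, where the loss reduces to $-\log(\sigma(x)) = -\log(1) = 0$, so no overflow can occur. The dangerous case is label = 0 with a large positive logit, where $\log(1 - \sigma(x)) = \log(0) = -\inf$; there the overflow is hidden because, per the PyTorch documentation, BCELoss clamps its log outputs to be greater than or equal to -100:

```python
x = torch.tensor(100.)
label = torch.tensor(0.)        # dangerous case: log(1 - sigmoid(x)) = log(0)

x1 = sigmoid(x)                 # tensor(1.) -- saturates in float32
bceloss(x1, label)              # tensor(100.) -- would be inf, but BCELoss
                                #                 clamps its log terms at -100
bce_with_logits_loss(x, label)  # tensor(100.) -- exact, via log-sum-exp
```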