Building neural network layers more simply with nn.Linear()
In the earlier multi-class classification example, every layer was written out by hand.
In fact, we can use nn.Linear instead, so we don't have to write the layers ourselves.
Its first argument is the input dimension (in) and the second is the output dimension (out), which matches the way we usually think about a layer.
An activation function is then applied after the linear layer.
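As a minimal sketch (the sizes 784 and 200 are just illustrative, matching the first layer of the network below), one linear layer followed by a ReLU activation looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(784, 200)   # first argument: input features, second: output features
x = torch.randn(4, 784)       # a dummy batch of 4 flattened 28*28 images
out = F.relu(layer(x))        # activation applied after the linear layer
print(out.shape)              # torch.Size([4, 200])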
If we want to implement a network structure of our own, we subclass nn.Module.
There is no need to implement backward(): nn.Module provides it automatically, and PyTorch's autograd package takes care of the backward (gradient) computation.
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200), nn.ReLU(inplace=True),
            nn.Linear(200, 200), nn.ReLU(inplace=True),
            nn.Linear(200, 10), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x
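As a quick usage sketch (with a random dummy batch, assuming torch is imported and the MLP class above is in scope): calling net(x) goes through nn.Module.__call__, which dispatches to forward().

import torch

net = MLP()
x = torch.randn(4, 784)   # dummy batch of 4 flattened images
logits = net(x)           # net(x) calls forward(x) via nn.Module.__call__
print(logits.shape)       # torch.Size([4, 10])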
In the training part of the code:
net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)
        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
In the hands-on example from "Deep learning and neural networks (4)", the optimizer was constructed by writing the w and b tensors into a parameter list by hand.
Now that our net inherits from nn.Module, the w and b parameters of every layer are registered automatically and exposed through net.parameters().
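A small sketch to verify this (assuming the MLP class defined above): the weights and biases of every nn.Linear inside the nn.Sequential show up in named_parameters() without any manual bookkeeping.

net = MLP()
for name, p in net.named_parameters():
    print(name, tuple(p.shape))
# model.0.weight (200, 784)
# model.0.bias (200,)
# model.2.weight (200, 200)
# model.2.bias (200,)
# model.4.weight (10, 200)
# model.4.bias (10,)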
Rewriting the previous multi-class classification problem:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

batch_size = 200
learning_rate = 0.01
epochs = 10

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))])),
    batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('dataset', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))])),
    batch_size=batch_size, shuffle=True)

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200), nn.ReLU(inplace=True),
            nn.Linear(200, 200), nn.ReLU(inplace=True),
            nn.Linear(200, 10), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x

net = MLP()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    # train
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)
        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

    # test
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = net(data)
        test_loss += criteon(logits, target).item()

        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
The training results are basically the same as before.
With the previous hand-written approach we could run into initialization problems; not here.
The w and b parameters are now managed by nn.Linear() and are not exposed to us, so we cannot initialize them directly.
When we use this interface, each layer comes with its own default initialization method, so we don't have to worry about it.
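If you do want control over the initialization, the parameters are still reachable. A minimal sketch using net.apply() together with torch.nn.init (standard PyTorch, not something specific to this example):

import torch.nn as nn

def init_weights(m):
    # re-initialize only the fully connected layers, leave everything else alone
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)
        nn.init.zeros_(m.bias)

net = MLP()
net.apply(init_weights)   # applies init_weights recursively to every sub-module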
Layers and structure of fully connected networks
Fully connected layers are also called linear layers.
When counting layers, the input layer is not counted but the output layer is, so this network has 4 layers.
If you are asked how many hidden layers there are, the answer is 3.
When we refer to a particular layer, we usually mean that layer's weights together with its output taken as one unit; that, for example, is what "the second layer" refers to.
This network is used to process a very simple dataset: the MNIST image dataset.
Each image in MNIST has 28*28 pixels, so the input size is 28*28; MNIST has 10 classes in total, so the output has 10 nodes.
In other words, the input is a 28*28 matrix. To make it easier for the fully connected layers to handle, we flatten it into a vector of 784 elements, and the intermediate layers all have 256 nodes.
Let's work out how many parameters such a network needs.
The number of parameters of a fully connected network is the number of connections (lines) between nodes:
784*256 + 256*256 + 256*256 + 256*10 ≈ 334K
So there are roughly 334K parameters (ignoring the bias terms).
Each parameter is stored as a 4-byte floating-point number, so 334K * 4 bytes ≈ 1.3 MB.
So we need about 1.3 MB of memory, or of video memory if we use a GPU.
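A quick way to check this count in code: the sketch below builds the 784 -> 256 -> 256 -> 256 -> 10 network discussed here, and the extra 778 parameters it reports come from the bias terms that the hand calculation above ignores.

import torch.nn as nn

fc_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
n_params = sum(p.numel() for p in fc_net.parameters())
print(n_params)                 # 335114 = 334336 weights + 778 biases
print(n_params * 4 / 2**20)     # ~1.28 MB at 4 bytes per float32 parameter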
That number looks small today.
However, MNIST was born back in the 1980s, in the days of the 386.
Machines of that time typically had only tens to hundreds of KB of memory.
So even such a simple network could not fit into memory.