[{"createTime":1735734952000,"id":1,"img":"hwy_ms_500_252.jpeg","link":"https://activity.huaweicloud.com/cps.html?fromacct=261f35b6-af54-4511-a2ca-910fa15905d1&utm_source=V1g3MDY4NTY=&utm_medium=cps&utm_campaign=201905","name":"华为云秒杀","status":9,"txt":"华为云38元秒杀","type":1,"updateTime":1735747411000,"userId":3},{"createTime":1736173885000,"id":2,"img":"txy_480_300.png","link":"https://cloud.tencent.com/act/cps/redirect?redirect=1077&cps_key=edb15096bfff75effaaa8c8bb66138bd&from=console","name":"腾讯云秒杀","status":9,"txt":"腾讯云限量秒杀","type":1,"updateTime":1736173885000,"userId":3},{"createTime":1736177492000,"id":3,"img":"aly_251_140.png","link":"https://www.aliyun.com/minisite/goods?userCode=pwp8kmv3","memo":"","name":"阿里云","status":9,"txt":"阿里云2折起","type":1,"updateTime":1736177492000,"userId":3},{"createTime":1735660800000,"id":4,"img":"vultr_560_300.png","link":"https://www.vultr.com/?ref=9603742-8H","name":"Vultr","status":9,"txt":"Vultr送$100","type":1,"updateTime":1735660800000,"userId":3},{"createTime":1735660800000,"id":5,"img":"jdy_663_320.jpg","link":"https://3.cn/2ay1-e5t","name":"京东云","status":9,"txt":"京东云特惠专区","type":1,"updateTime":1735660800000,"userId":3},{"createTime":1735660800000,"id":6,"img":"new_ads.png","link":"https://www.iodraw.com/ads","name":"发布广告","status":9,"txt":"发布广告","type":1,"updateTime":1735660800000,"userId":3},{"createTime":1735660800000,"id":7,"img":"yun_910_50.png","link":"https://activity.huaweicloud.com/discount_area_v5/index.html?fromacct=261f35b6-af54-4511-a2ca-910fa15905d1&utm_source=aXhpYW95YW5nOA===&utm_medium=cps&utm_campaign=201905","name":"底部","status":9,"txt":"高性能云服务器2折起","type":2,"updateTime":1735660800000,"userId":3}]
Example (corrected: the DataLoader parameter is num_workers, not num_worker):
from torch.utils.data import DataLoader
train_loader = DataLoader(dataset=train_data, batch_size=batch, shuffle=True, num_workers=4)
valid_loader = DataLoader(dataset=valid_data, batch_size=batch, num_workers=4)
1. num_workers is the number of worker processes used to load data (batches)
num_workers affects training speed through its effect on data-loading speed.
On each round of iteration, the DataLoader creates num_workers workers at once; each worker is an ordinary worker process. A batch_sampler assigns specific batches to specific workers, and each worker loads the batches it is responsible for into RAM. The DataLoader then looks in RAM for the batch needed in the current iteration; if it is found, it is used directly. If not, the num_workers workers keep loading batches into memory until the DataLoader finds the target batch in RAM.
num_workers workers --> RAM --> DataLoader
Setting num_workers high has the advantage of fast batch lookup,
because the batch for the next iteration has very likely already been loaded during the previous one. The downside is higher memory usage and a heavier CPU load. A common choice for num_workers is the number of CPU cores on your machine; if the CPU is powerful and RAM is plentiful, you can set it higher.
If num_workers is set to 0, the DataLoader no longer loads data into RAM on its own during each iteration (since there are no workers); it looks for the batch in RAM and, when the batch is not there, loads it on the spot in the main process. The drawback is that this is slower.
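The main-process behavior described above can be observed directly. The sketch below uses a made-up dataset whose items are just the PID of the process that loaded them; with num_workers=0 every sample comes from the main process, while with num_workers>0 the PIDs belong to separate worker processes.

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

class PidDataset(Dataset):
    """Illustrative dataset: each item is the PID of the loading process."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return os.getpid()

if __name__ == "__main__":
    main_pid = os.getpid()
    # num_workers=0: samples are fetched in the main process itself
    pids = [int(p) for b in DataLoader(PidDataset(), batch_size=4, num_workers=0) for p in b]
    print(all(p == main_pid for p in pids))
```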
2. Choosing num_workers
When the time to load a batch < the time to train on it:
after each training step the GPU can fetch the next batch from the CPU immediately,
with no extra waiting, so additional workers are unnecessary; adding more workers will not speed up training.
When the time to load a batch > the time to train on it:
after each training step the GPU has to wait for the CPU to finish loading data.
With more workers, even if worker_1's batch is not ready yet, the GPU can take worker_2's data and train on that instead.
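One practical way to pick a value is simply to time an epoch of pure data loading for a few candidates. The sketch below is illustrative: the dataset, sizes, and 1 ms sleep are made up to simulate disk-bound loading, while the DataLoader calls themselves are standard PyTorch.

```python
import time
import torch
from torch.utils.data import Dataset, DataLoader

class SlowDataset(Dataset):
    """Simulates slow, disk-bound sample loading with a short sleep."""
    def __init__(self, n=256):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        time.sleep(0.001)  # pretend each sample takes ~1 ms to load
        return torch.randn(8)

def epoch_time(num_workers):
    loader = DataLoader(SlowDataset(), batch_size=32, num_workers=num_workers)
    start = time.time()
    for batch in loader:
        pass  # a real training step would go here
    return time.time() - start

if __name__ == "__main__":
    for w in (0, 2, 4):
        print(f"num_workers={w}: {epoch_time(w):.3f}s")
```

If loading is the bottleneck, the timings should drop as workers are added, then flatten once loading keeps up with training; the smallest value on the flat part is usually the best choice.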
3. Problems you may run into
Problem 1:
You set num_workers to the number of CPU cores, but CPU utilization stays around 20%-30% instead of 100%, only occasionally spiking to 100%. This is normal: the bottleneck in data loading is usually disk speed, not CPU speed, so CPU utilization below 100% is expected.
Problem 2: with num_workers = 8, 4, or 2, each epoch is sometimes slower than with num_workers = 0, and 0 turns out to be the fastest. The workaround is to add
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True', and to wrap the DataLoader (and training loop) inside an if __name__ == "__main__": guard.
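The workaround above can be sketched as follows. The dataset and sizes are placeholders; the two fixes shown are setting the environment variable before other imports, and keeping DataLoader construction under the __main__ guard so worker processes can be spawned safely (required on Windows and macOS, which use the spawn start method).

```python
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"  # set before heavy imports

import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader():
    # Placeholder dataset: 64 samples of 4 features each
    data = TensorDataset(torch.randn(64, 4), torch.zeros(64))
    return DataLoader(data, batch_size=16, shuffle=True, num_workers=2)

if __name__ == "__main__":
    # Worker processes are only spawned when iteration starts,
    # so iteration must happen under the __main__ guard.
    loader = make_loader()
    for x, y in loader:
        pass  # training loop goes here
```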