PyTorch SubsetRandomSampler: usage and explanation

Official docs: https://pytorch.org/docs/stable/data.html?highlight=subsetrandomsampler#torch.utils.data.SubsetRandomSampler

Recommended reading: https://www.sohu.com/a/291959747_197042

https://www.jianshu.com/p/a32ae0294223

https://www.cnblogs.com/marsggbo/p/10496696.html

How it works:

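A DataLoader first draws indices from its sampler and then groups them into batches. For example, with 10 samples and a SubsetRandomSampler built from the indices 0 through 7, the sampler yields those 8 indices in random order (without replacement); the DataLoader fetches the corresponding 8 samples and groups them into batches of batch_size.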
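The two steps described above can be seen in a small sketch. The toy TensorDataset and the batch size of 3 are assumptions chosen for illustration; they are not part of the original example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Toy dataset of 10 samples; sample i holds the value i (illustrative only).
dataset = TensorDataset(torch.arange(10))

# Restrict sampling to the indices 0..7.
subset_sampler = SubsetRandomSampler(list(range(8)))

# Step 1: iterating the sampler yields the 8 indices in random order.
sampled_indices = list(subset_sampler)

# Step 2: the DataLoader groups the sampled indices into batches of batch_size.
loader = DataLoader(dataset, batch_size=3, sampler=subset_sampler)
batch_sizes = [len(batch[0]) for batch in loader]

print(sorted(sampled_indices))  # always the indices 0..7, whatever the order drawn
print(batch_sizes)              # [3, 3, 2]: the last batch is smaller
```

The order of the indices changes on every pass over the sampler, so each epoch sees the same 8 samples in a different order.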

In practice:

from torch.utils.data import DataLoader
from torch.utils.data import sampler

train_data = CriteoDataset('./data', train=True)  # user-defined Dataset
split_num = int(len(train_data) * 0.8)  # first 80% for training, the rest for validation
index_list = list(range(len(train_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]
tr_sampler = sampler.SubsetRandomSampler(train_idx)
val_sampler = sampler.SubsetRandomSampler(valid_idx)
loader_train = DataLoader(train_data, batch_size=100, sampler=tr_sampler)
# Both loaders wrap the same dataset; the two samplers keep their index sets disjoint.
loader_val = DataLoader(train_data, batch_size=100, sampler=val_sampler)
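CriteoDataset is user-defined and not shown here, so the following is only a sketch of the same split with a hypothetical stand-in: a TensorDataset of 1000 samples in which each sample stores its own index. It checks that the training and validation loaders never see the same sample.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Hypothetical stand-in for CriteoDataset: 1000 samples, each storing its own index.
full_data = TensorDataset(torch.arange(1000))

split_num = int(len(full_data) * 0.8)
index_list = list(range(len(full_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]

loader_train = DataLoader(full_data, batch_size=100, sampler=SubsetRandomSampler(train_idx))
loader_val = DataLoader(full_data, batch_size=100, sampler=SubsetRandomSampler(valid_idx))

# Collect the samples each loader actually visits in one epoch.
seen_train = {int(x) for (batch,) in loader_train for x in batch}
seen_val = {int(x) for (batch,) in loader_val for x in batch}

print(len(loader_train), len(loader_val))  # 8 2
print(seen_train.isdisjoint(seen_val))     # True
```

Because the split takes the first 80% of the indices, it follows the order in which the dataset stores its samples; only the order within each subset is randomized.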