Reposted from: CIFAR Data Reading
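1. Introduction to CIFAR
CIFAR (here, CIFAR-10) is an image classification dataset of 60,000 32×32 color images in ten classes (airplane, automobile, bird, and so on), with 6,000 images per class. The dataset is split into two parts: 50,000 images form the training set, used to fit the model, and 10,000 images form the test set, used to evaluate it.
2. Structure of the CIFAR dataset
The official source provides the dataset in several versions for different languages. If you download the Python version of CIFAR from the official source, the dataset is laid out like this: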
- batches.meta
- data_batch_1
- data_batch_2
- data_batch_3
- data_batch_4
- data_batch_5
- test_batch
- readme.html
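As a quick sanity check on the layout above, here is a minimal sketch that reports which of the listed files are missing from an extracted directory. The helper name `missing_cifar_files` is hypothetical (it is not part of the original post), and the directory you pass in is wherever you extracted the archive.

```python
import os

# File names as listed above for the Python version of the dataset.
EXPECTED_FILES = (
    ["batches.meta"]
    + ["data_batch_%d" % i for i in range(1, 6)]
    + ["test_batch", "readme.html"]
)


def missing_cifar_files(root):
    """Return the expected file names that are not present under root."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(root, name))]
```

For a complete download, `missing_cifar_files('./cifar-10-batches-py')` should return an empty list.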
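3. Reading images and labels from CIFAR
3.0 Data structure of the CIFAR dataset
The CIFAR dataset is stored with Python's pickle module, so to read it you first need to know how pickle works.
3.1 Serializing data with pickle and saving it to disk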
pickle.dump( obj, file, protocol )
3.2 Deserializing data with pickle and loading it into memory
obj = pickle.load(file)
The only required argument is file, a file object opened in binary mode that provides read and readline methods.
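To illustrate what "file object" means here, the following small sketch uses `io.BytesIO` from the standard library as an in-memory stand-in for a file on disk. The helper name `roundtrip` is made up for illustration.

```python
import io
import pickle


def roundtrip(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Pickle obj into an in-memory buffer and load it back."""
    buf = io.BytesIO()               # behaves like a file opened in binary mode
    pickle.dump(obj, buf, protocol)  # serialize into the buffer
    buf.seek(0)                      # rewind before reading
    return pickle.load(buf)          # deserialize from the buffer


restored = roundtrip(['2', 4, 'test'])
```

Anything that behaves like a binary file works the same way, which is why the dataset files below are opened with mode 'rb'.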
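pickle usage example

    import pickle

    test_l = ['2', 4, 'test']
    print(test_l)

    # pickle reads and writes bytes, so open the file in binary mode
    with open("test.txt", "wb") as f:
        pickle.dump(test_l, f, 0)  # protocol 0, as in the original call

    with open("test.txt", "rb") as f:
        test_l_p = pickle.load(f)
    print(test_l_p)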
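Code for reading the CIFAR dataset

    import pickle

    import numpy as np

    def LOAD_CIFAR_DATA(filename):
        with open(filename, 'rb') as f:
            # The batch files were pickled with Python 2;
            # encoding='latin1' lets Python 3 load them with str keys.
            dataset = pickle.load(f, encoding='latin1')
        X = dataset['data']    # one row of 3072 bytes per image
        Y = dataset['labels']  # list of class indices
        # (N, channel, row, column), channels in R, G, B order
        X = np.reshape(X, (10000, 3, 32, 32))
        Y = np.array(Y)
        return X, Y

    def LOAD_CIFAR_LABELS(filename):
        with open(filename, 'rb') as f:
            obj = pickle.load(f, encoding='latin1')
        return obj['label_names']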
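The reshape to (10000, 3, 32, 32) works because each row of `data` holds 3072 bytes: 1024 red values, then 1024 green, then 1024 blue, each channel in row-major order. Many image and plotting libraries expect height x width x channel instead, so a transpose is usually needed before display. Below is a minimal sketch on synthetic data; the helper `rows_to_images` is hypothetical, and the random array only stands in for a real batch.

```python
import numpy as np


def rows_to_images(rows):
    """Convert (N, 3072) CIFAR-style rows to (N, 32, 32, 3) RGB images."""
    rows = np.asarray(rows)
    chw = rows.reshape(-1, 3, 32, 32)  # (N, channel, row, column)
    return chw.transpose(0, 2, 3, 1)   # (N, row, column, channel)


# Synthetic stand-in for dataset['data']; these are not real CIFAR pixels.
fake_rows = np.random.randint(0, 256, size=(4, 3072), dtype=np.uint8)
images = rows_to_images(fake_rows)
```

Using `-1` for the first dimension avoids hard-coding the batch size of 10000.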
Complete example

    import pickle

    import cv2
    import numpy as np


    class ReadData():
        def __init__(self):
            self.label_names = None
            self.X = None
            self.Y = None

        def LOAD_CIFAR_DATA(self, filename):
            with open(filename, 'rb') as f:
                # encoding='latin1': the files were pickled with Python 2
                dataset = pickle.load(f, encoding='latin1')
            print(dataset.keys())
            # print(len(dataset['filenames']))
            X = dataset['data']
            Y = dataset['labels']
            self.X = np.reshape(X, (10000, 3, 32, 32))
            self.Y = np.array(Y)

        def LOAD_CIFAR_LABELS(self, filename):
            with open(filename, 'rb') as f:
                obj = pickle.load(f, encoding='latin1')
            print(obj)
            self.label_names = obj['label_names']

        def Read(self, imgFileName, labelFileName):
            self.LOAD_CIFAR_LABELS(filename=labelFileName)
            self.LOAD_CIFAR_DATA(filename=imgFileName)

        def Show(self):
            example_nums = self.X.shape[0]
            for i in range(example_nums):
                img = self.X[i]
                # CIFAR stores channels as R, G, B; OpenCV displays B, G, R,
                # so the channels are merged in reverse order.
                img_merged = cv2.merge([img[2], img[1], img[0]])
                cv2.imshow("Image", img_merged)
                print(self.label_names[self.Y[i]])
                cv2.waitKey(0)  # wait for a key press before the next image


    def main():
        readData = ReadData()
        imgFileName = './cifar-10-batches-py/data_batch_2'
        labelFileName = './cifar-10-batches-py/batches.meta'
        readData.Read(imgFileName, labelFileName)
        readData.Show()


    if __name__ == '__main__':
        main()
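The example above works on a single batch file. To assemble the full training set, the five data_batch_* files can be concatenated. The following is a minimal sketch, assuming Python 3 and the file names listed in section 2; the function names `load_batch` and `load_training_set` are illustrative and not part of the original code.

```python
import os
import pickle

import numpy as np


def load_batch(path):
    """Load one batch file and return (images, labels)."""
    with open(path, 'rb') as f:
        # encoding='latin1' is needed on Python 3 for files pickled by Python 2
        batch = pickle.load(f, encoding='latin1')
    X = np.asarray(batch['data']).reshape(-1, 3, 32, 32)
    Y = np.asarray(batch['labels'])
    return X, Y


def load_training_set(root):
    """Concatenate data_batch_1 .. data_batch_5 found under root."""
    parts = [load_batch(os.path.join(root, 'data_batch_%d' % i))
             for i in range(1, 6)]
    X = np.concatenate([x for x, _ in parts])
    Y = np.concatenate([y for _, y in parts])
    return X, Y
```

With the real dataset, `load_training_set('./cifar-10-batches-py')` should yield `X` of shape (50000, 3, 32, 32) and `Y` of shape (50000,).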
References
- Persisting objects with pickle in Python: oldj.net/article/python-pickle
- CS231n course