COCO数据集介绍_IT分享知识网

大家好，欢迎来到IT知识分享网。

文章目录

1、COCO数据集的介绍
2、COCO数据集标注格式
本文参考

本文主要是为了熟悉COCO数据集。

1、COCO数据集的介绍

#step1: 下载数据集 2017 Train images [118K/18GB] 2017 Val images [5K/1GB] 2017 Test images [41K/6GB] 2017 Train/Val annotations [241MB] #step2: 按照下面结构存放文件夹 coco ├── annotations │ ├── instances_train2014.json │ ├── instances_train2017.json │ ├── instances_val2014.json │ ├── instances_val2017.json │ | ... ├── train2017 │ ├── 000000000009.jpg │ ├── 000000.jpg │ | ... ├── val2017 │ ├── 000000000139.jpg │ ├── 000000000285.jpg │ | ... | ...

2、COCO数据集标注格式

object instance 目标实例
object keypoints 目标关键点
image captions 看图说话。
标注文件使用JSON文件进行存储。如下为COCO2017数据集中train,val的标注文件：

原文件是annotations_trainval2017.zip,解压后是annotations文件夹。可以看到一共有三种类型，每种类型包含训练和验证，共有6个JSON文件。

２.1实例分割Object Instance文件格式

以instance_val2017.json为例（验证集文件软小，打开较快），总体格式如下：

{ 
     "info": info, "licenses": [license], "images":[image], "annotations":[annotation], "categories":[category] }

images字段下是一个列表，列表长度等同于划入训练集（或验证集）的图片数量
annotatons字段下也是一个列表，列表长度等同地训练集（或验证集）中bounding box 的数量
categories字段下也是一个列表，列表长度等同于数据集类别的数，coco2017分类数是80,用VScode打开看：

2.1.1 info中的内容

"info": { 
     "description": "COCO 2017 Dataset", "url": "http://cocodataset.org", "version": "1.0", "year": 2017, "contributor": "COCO Consortium", "date_created": "2017/09/01" },

info中包括一些基本信息，时间，版本，贡献者等，没什么太大价值，可以忽略。

2.1.2 licenses中的内容

内容较少，这里全部列出：

"licenses": [ { 
     "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", "id": 1, "name": "Attribution-NonCommercial-ShareAlike License" }, { 
     "url": "http://creativecommons.org/licenses/by-nc/2.0/", "id": 2, "name": "Attribution-NonCommercial License" }, { 
     "url": "http://creativecommons.org/licenses/by-nc-nd/2.0/", "id": 3, "name": "Attribution-NonCommercial-NoDerivs License" }, { 
     "url": "http://creativecommons.org/licenses/by/2.0/", "id": 4, "name": "Attribution License" }, { 
     "url": "http://creativecommons.org/licenses/by-sa/2.0/", "id": 5, "name": "Attribution-ShareAlike License" }, { 
     "url": "http://creativecommons.org/licenses/by-nd/2.0/", "id": 6, "name": "Attribution-NoDerivs License" }, { 
     "url": "http://flickr.com/commons/usage/", "id": 7, "name": "No known copyright restrictions" }, { 
     "url": "http://www.usa.gov/copyright.shtml", "id": 8, "name": "United States Government Work" } ],

一共有8条，也没什么价值，可以忽略。

2.1.3 images中的内容

内容较多，列几条：

"images": [ { 
     "license": 4, "file_name": "000000.jpg", "coco_url": "http://images.cocodataset.org/val2017/000000.jpg", "height": 427, "width": 640, "date_captured": "2013-11-14 17:02:52", "flickr_url": "http://farm7.staticflickr.com/6116/_da26cf2c9e_z.jpg", "id":  }, { 
     "license": 1, "file_name": "000000037777.jpg", "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg", "height": 230, "width": 352, "date_captured": "2013-11-14 20:55:31", "flickr_url": "http://farm9.staticflickr.com/8429/_f6d48aa585_z.jpg", "id": 37777 }, { 
     "license": 4, "file_name": "000000.jpg", "coco_url": "http://images.cocodataset.org/val2017/000000.jpg", "height": 428, "width": 640, "date_captured": "2013-11-14 22:32:02", "flickr_url": "http://farm4.staticflickr.com/3446/_13d84bd0a1_z.jpg", "id":  },

jupyter中看的效果：

images是一个列表，列表中每一个元素是一个字典，存储一张图片中的信息。分别就图片信息做出说明：

license: 没用
file_name:图片文件名
coco_url:没用
height:图片高
width:图片宽
date_captured:没用
flickr_url没用
id:图片的身份ID，每个图片特有的
在以上信息中，height,width,file_name,id这四个值非常重要。

2.1.4 annotations中的内容

该内容较多，列几条：

"annotations": [ { 
     "segmentation": [ [ 510.66, 423.01, 511.72, ... 423.01, 510.45, 423.01 ] ], "area": 702.98, "iscrowd": 0, "image_id": , "bbox": [ 473.07, 395.93, 38.65, 28.67 ], "category_id": 18, "id": 1768 }, { 
     "segmentation": [ [ 289.74, 443.39, 302.29, ... 444.27, 291.88, 443.74 ] ], "area": 27718.5, "iscrowd": 0, "image_id": 61471, "bbox": [ 272.1, 200.23, 151.97, 279.77 ], "category_id": 18, "id": 1773 }, ...... "segmentation": { 
     "counts": [ 272, 2, 4, 4, ... 16, 228, 8, 10250 ], "size": [ 240, 320 ] }, "area": 18419, "iscrowd": 1, "image_id": , "bbox": [ 1, 0, 276, 122 ], "category_id": 1, "id": 3 },

jupyter中效果：

annotations是该JSON文件中最重要的。annotations是包含多个annotation实例的数组，annotation类型本身又包含一系列的字段：

segmentation:分割标签
area:面积
iscrowd: 是否多个目标
image_id:与images中的id对应
bbox:目标框
category_id:类别
id:标注框的一个序号
整体来说annotation的格式如下：

annotation{ 
     "segmentation": RLE or [polygon], "area" :float, "iscrowd": 0 or 1, "imgae_id": int, "bbox": [x,y,width,height], "category_id": int, "id": int

segmentation : { 
     'counts': [272, 2, 4, 4, 4, 4, 2, 9, 1, 2, 16, 43, 143, 24......], 'size': [240, 320] }

COCO数据集的RLE都是uncompressed RLE格式（与之相对的是compact RLE）。 RLE所占字节的大小和边界上的像素数量是正相关的。RLE格式带来的好处就是当基于RLE去计算目标区域的面积以及两个目标之间的unoin和intersection时会非常有效率。上面的segmentation中的counts数组和size数组共同组成了这幅图片中的分割 mask。其中size是这幅图片的宽高，然后在这幅图像中，每一个像素点要么在被分割（标注）的目标区域中，要么在背景中。很明显这是一个bool量：如果该像素在目标区域中为true那么在背景中就是False；如果该像素在目标区域中为1那么在背景中就是0。对于一个240×320的图片来说，一共有76800个像素点，根据每一个像素点在不在目标区域中，我们就有了76800个bit，比如像这样（随便写的例子，和上文的数组没关系）：000000…；但是这样写很明显浪费空间，我们直接写上0或者1的个数不就行了嘛（Run-length encoding)，于是就成了54251…，这就是上文中的counts数组。

area指向该segmentation的面积，iscrowd=0表示没有重叠，iscrowd=1表示有重叠；image_id就是前面images中存储的id.bbox指向的是物体的标注框；category_id指向的数字代表分类，共有80个分类；id不同于images中的id,这里的id只是每个框的身份编号。

2.1.4 categories中的内容

如下:

"categories": [ { 
     "supercategory": "person", "id": 1, "name": "person" }, { 
     "supercategory": "vehicle", "id": 2, "name": "bicycle" }, { 
     "supercategory": "vehicle", "id": 3, "name": "car" }, { 
     "supercategory": "vehicle", "id": 4, "name": "motorcycle" }, { 
     "supercategory": "vehicle", "id": 5, "name": "airplane" }, { 
     "supercategory": "vehicle", "id": 6, "name": "bus" }, { 
     "supercategory": "vehicle", "id": 7, "name": "train" }, { 
     "supercategory": "vehicle", "id": 8, "name": "truck" }, { 
     "supercategory": "vehicle", "id": 9, "name": "boat" }, { 
     "supercategory": "outdoor", "id": 10, "name": "traffic light" }, { 
     "supercategory": "outdoor", "id": 11, "name": "fire hydrant" }, { 
     "supercategory": "outdoor", "id": 13, "name": "stop sign" }, { 
     "supercategory": "outdoor", "id": 14, "name": "parking meter" }, { 
     "supercategory": "outdoor", "id": 15, "name": "bench" }, { 
     "supercategory": "animal", "id": 16, "name": "bird" }, { 
     "supercategory": "animal", "id": 17, "name": "cat" }, { 
     "supercategory": "animal", "id": 18, "name": "dog" }, { 
     "supercategory": "animal", "id": 19, "name": "horse" }, { 
     "supercategory": "animal", "id": 20, "name": "sheep" }, { 
     "supercategory": "animal", "id": 21, "name": "cow" }, { 
     "supercategory": "animal", "id": 22, "name": "elephant" }, { 
     "supercategory": "animal", "id": 23, "name": "bear" }, { 
     "supercategory": "animal", "id": 24, "name": "zebra" }, { 
     "supercategory": "animal", "id": 25, "name": "giraffe" }, { 
     "supercategory": "accessory", "id": 27, "name": "backpack" }, { 
     "supercategory": "accessory", "id": 28, "name": "umbrella" }, { 
     "supercategory": "accessory", "id": 31, "name": "handbag" }, { 
     "supercategory": "accessory", "id": 32, "name": "tie" }, { 
     "supercategory": "accessory", "id": 33, "name": "suitcase" }, { 
     "supercategory": "sports", "id": 34, "name": "frisbee" }, { 
     "supercategory": "sports", "id": 35, "name": "skis" }, { 
     "supercategory": "sports", "id": 36, "name": "snowboard" }, { 
     "supercategory": "sports", "id": 37, "name": "sports ball" }, { 
     "supercategory": "sports", "id": 38, "name": "kite" }, { 
     "supercategory": "sports", "id": 39, "name": "baseball bat" }, { 
     "supercategory": "sports", "id": 40, "name": "baseball glove" }, { 
     "supercategory": "sports", "id": 41, "name": "skateboard" }, { 
     "supercategory": "sports", "id": 42, "name": "surfboard" }, { 
     "supercategory": "sports", "id": 43, "name": "tennis racket" }, { 
     "supercategory": "kitchen", "id": 44, "name": "bottle" }, { 
     "supercategory": "kitchen", "id": 46, "name": "wine glass" }, { 
     "supercategory": "kitchen", "id": 47, "name": "cup" }, { 
     "supercategory": "kitchen", "id": 48, "name": "fork" }, { 
     "supercategory": "kitchen", "id": 49, "name": "knife" }, { 
     "supercategory": "kitchen", "id": 50, "name": "spoon" }, { 
     "supercategory": "kitchen", "id": 51, "name": "bowl" }, { 
     "supercategory": "food", "id": 52, "name": "banana" }, { 
     "supercategory": "food", "id": 53, "name": "apple" }, { 
     "supercategory": "food", "id": 54, "name": "sandwich" }, { 
     "supercategory": "food", "id": 55, "name": "orange" }, { 
     "supercategory": "food", "id": 56, "name": "broccoli" }, { 
     "supercategory": "food", "id": 57, "name": "carrot" }, { 
     "supercategory": "food", "id": 58, "name": "hot dog" }, { 
     "supercategory": "food", "id": 59, "name": "pizza" }, { 
     "supercategory": "food", "id": 60, "name": "donut" }, { 
     "supercategory": "food", "id": 61, "name": "cake" }, { 
     "supercategory": "furniture", "id": 62, "name": "chair" }, { 
     "supercategory": "furniture", "id": 63, "name": "couch" }, { 
     "supercategory": "furniture", "id": 64, "name": "potted plant" }, { 
     "supercategory": "furniture", "id": 65, "name": "bed" }, { 
     "supercategory": "furniture", "id": 67, "name": "dining table" }, { 
     "supercategory": "furniture", "id": 70, "name": "toilet" }, { 
     "supercategory": "electronic", "id": 72, "name": "tv" }, { 
     "supercategory": "electronic", "id": 73, "name": "laptop" }, { 
     "supercategory": "electronic", "id": 74, "name": "mouse" }, { 
     "supercategory": "electronic", "id": 75, "name": "remote" }, { 
     "supercategory": "electronic", "id": 76, "name": "keyboard" }, { 
     "supercategory": "electronic", "id": 77, "name": "cell phone" }, { 
     "supercategory": "appliance", "id": 78, "name": "microwave" }, { 
     "supercategory": "appliance", "id": 79, "name": "oven" }, { 
     "supercategory": "appliance", "id": 80, "name": "toaster" }, { 
     "supercategory": "appliance", "id": 81, "name": "sink" }, { 
     "supercategory": "appliance", "id": 82, "name": "refrigerator" }, { 
     "supercategory": "indoor", "id": 84, "name": "book" }, { 
     "supercategory": "indoor", "id": 85, "name": "clock" }, { 
     "supercategory": "indoor", "id": 86, "name": "vase" }, { 
     "supercategory": "indoor", "id": 87, "name": "scissors" }, { 
     "supercategory": "indoor", "id": 88, "name": "teddy bear" }, { 
     "supercategory": "indoor", "id": 89, "name": "hair drier" }, { 
     "supercategory": "indoor", "id": 90, "name": "toothbrush" } ]

2.2 关键点检测Object Keypoint文件格式

COCO数据集中person_keypoints_train2017.json、person_keypoints_val2017.json这两个文件就是这种格式。文件整体格式是：

{ 
     "info": info, "licenses": [license], "images": [image], "annotations": [annotation], "categories": [category] }

与instance_val2017.json相同。其中，info、licenses、images这三部分在不同的JSON文件中是相同的，定义是共享的，不共享的是annotations和category这两种在不同类型的JSON文件中是不一样的。

images字段下是一个列表，列表长度等同于划入训练集（或验证集）的图片数量
annotatons字段下也是一个列表，列表长度等同地训练集（或验证集）中bounding box 的数量，这里只有人这个类别的bounding box
categories字段下也是一个列表，列表长度等同于数据集类别的数，这里是1，只有person这一个类。
相同内容这里就不再列了，只列不同的。

2.2.1 annotations中的内容

这个类型中的annotation结构中包含 object instance中annotation所有的字段，再加上两个额外的字段。新增的keypoints是一个长度为3*k的数组，第一个和第二个元素分别是x和y坐标值，第三个是标志位v,v为0时表示这个关键点没有标注（这种情况下x=y=v=0），v为1时表示这个关键点标注了但是不可见（被遮挡了），v为2时表示这个关键点标注了同时也可见。num_keypoints表示这个目标上被标注的关键点的数量（v>0），比较小的目标上可能就无法标注关键点。

annotation{ 
     "segmentation": RLE or [polygon], "num_keypoints": int, "area": float, "iscrowd": 0 or 1, "keypoints": [x1,y1,v1,...], "image_id": int, "bbox": [x,y,width,height], "category_id": int, "id": int }

列举一个：

{ "segmentation": [ [ 492.38, 238.33, 491.91, 234.15, 494.47, 227.65, 495.17, 215.1, 497.02, 199.54, 503.53, 197.22, 503.3, 194.43, 503.3, 190.95, 506.08, 183.51, 511.89, 185.84, 514.21, 187, 514.21, 196.29, 521.88, 200.7, 526.76, 216.03, 520.25, 227.65, 519.56, 234.38, 519.09, 239.49, 519.09, 244.84, 519.56, 246.93, 518.16, 248.32, 516.3, 256.91, 510.03, 256.45, 513.28, 240.89 ] ], "num_keypoints": 13, "area": 1394.7431, "iscrowd": 0, "keypoints": [ 508, 192, 2, 510, 191, 2, 506, 191, 2, 512, 192, 2, 503, 192, 1, 515, 202, 2, 499, 202, 2, 524, 214, 2, 497, 215, 2, 516, 226, 2, 496, 224, 2, 511, 232, 2, 497, 230, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "image_id": , "bbox": [ 491.91, 183.51, 34.85, 73.4 ], "category_id": 1, "id":  }

2.2.2 categories中的内容

对于category,相比object instance中的category,新增了两个字段，keypoints是一个长度为k的数组，包含每个关键点的名称；skeleton定义各关键点的连接性（比如人的左手腕和左肘就是连接的，但是左手腕和右手腕就不是）。目前，COCO的keypoints只标注了person category （分类为人）。定义如下：

{ 
     "supercategory": str, "id": int, "name": str, "keypoints": [str], "skeleton": [edge] }

具体的：

"categories": [ { 
     "supercategory": "person", "id": 1, "name": "person", "keypoints": [ "nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist", "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle" ], "skeleton": [ [ 16, 14 ], [ 14, 12 ], [ 17, 15 ], [ 15, 13 ], [ 12, 13 ], [ 6, 12 ], [ 7, 13 ], [ 6, 7 ], [ 6, 8 ], [ 7, 9 ], [ 8, 10 ], [ 9, 11 ], [ 2, 3 ], [ 1, 2 ], [ 1, 3 ], [ 2, 4 ], [ 3, 5 ], [ 4, 6 ], [ 5, 7 ] ] } ]

2.3 看图说话Image Caption文件格式

captions_train2017.json、captions_val2017.json这两个文件就是这种格式。Image Caption这种格式的文件从头至尾按照顺序分为以下段落，看起来和Object Instance一样，不过没有最后的categories字段：

{ 
     "info": info, "licenses": [license], "images": [image], "annotations": [annotation] }

其中，info、licenses、images这三个结构体/类型，在不同的JSON文件中这三个类型是一样的，定义是共享的。不共享的是annotations这种结构体，它在不同类型的JSON文件中是不一样的。

annotations: 数量要多于图片的数量，这是因为一个图片可以有多个场景描述；

2.3.1 annotation中的内容

这个类型中的annotation用来存储描述图片的语句。每个语句描述了对应图片的内容，而每个图片至少有5个描述语句（有的图片更多）。annotation定义如下：

annotation{ 
     "image_id": int, "id": int, "caption": str }

取一个具体片段：

{ 
     "image_id": , "id": , "caption": "A large group is sitting together and eating at a restaurant." }, { 
     "image_id": , "id": , "caption": "The people are gathered at the table for dinner." }, { 
     "image_id": , "id": , "caption": "Two men standing near a bar drinking together" }, { 
     "image_id": , "id": , "caption": "A large group of people pose for a photo at dinner." }, { 
     "image_id": , "id": , "caption": "The diners are enjoying their various beverages with their meals.." }

本文参考

https://zhuanlan.zhihu.com/p/
https://blog.csdn.net/weixin_/article/details/

免责声明：本站所有文章内容,图片，视频等均是来源于用户投稿和互联网及文摘转载整编而成，不代表本站观点，不承担相关法律责任。其著作权各归其原作者或其出版社所有。如发现本站有涉嫌抄袭侵权/违法违规的内容,侵犯到您的权益，请在线联系站长,一经查实,本站将立刻删除。本文来自网络,若有侵权，请联系删除，如若转载，请注明出处：https://haidsoft.com/116605.html

COCO数据集介绍