Introduction to the COCO Dataset


This article is mainly a walkthrough for getting familiar with the COCO dataset.

1. Introduction to the COCO dataset

#step1: Download the dataset
2017 Train images [118K/18GB]
2017 Val images [5K/1GB]
2017 Test images [41K/6GB]
2017 Train/Val annotations [241MB]

#step2: Arrange the folders in the structure below
coco
├── annotations
│   ├── instances_train2014.json
│   ├── instances_train2017.json
│   ├── instances_val2014.json
│   ├── instances_val2017.json
│   |   ...
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000.jpg
│   |   ...
├── val2017
│   ├── 000000000139.jpg
│   ├── 000000000285.jpg
│   |   ...
|   ...
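Before going further it can help to confirm that the folders ended up where the tree above expects them. The following is a minimal sketch: the function name `missing_entries` and the choice of which entries to check are assumptions made for illustration, and only the 2017 files from the tree are listed.

```python
from pathlib import Path

# Entries relative to the dataset root, taken from the folder tree above.
# Only the 2017 files are checked here; extend the list as needed.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_entries(root):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries("coco")` from the directory that contains the `coco` folder returns an empty list when everything is in place.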

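2. COCO annotation formats

COCO provides three types of annotation:

  • object instances
  • object keypoints
  • image captions

Annotations are stored as JSON files. The original archive is annotations_trainval2017.zip, which unpacks into an annotations folder. Each of the three types has a training file and a validation file, six JSON files in total.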


2.1 Object Instance file format (instance segmentation)

Taking instances_val2017.json as an example (the validation file is smaller and opens faster), the overall format is:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

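  • images is a list whose length equals the number of images in the training (or validation) set.
  • annotations is also a list; its length equals the number of bounding boxes in the training (or validation) set.
  • categories is also a list; its length equals the number of classes in the dataset, which is 80 for COCO 2017.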
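The list lengths described above are easy to check after loading a file. This is a minimal sketch using only the standard `json` module; `load_annotation_file` and `summarize` are hypothetical helper names, and `sample` is a made-up stand-in with the same top-level layout, not real COCO data.

```python
import json


def load_annotation_file(path):
    """Read one COCO annotation JSON file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize(coco):
    """Return the number of entries under each list-valued top-level key."""
    return {key: len(value) for key, value in coco.items() if isinstance(value, list)}


# Made-up stand-in with the same top-level keys as an instances file.
sample = {
    "info": {"year": 2017},
    "licenses": [{"id": 1}],
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 2},
    ],
    "categories": [{"id": 1, "name": "person"}],
}

print(summarize(sample))
# {'licenses': 1, 'images': 2, 'annotations': 3, 'categories': 1}
```

On the real file, `summarize(load_annotation_file("coco/annotations/instances_val2017.json"))` should report 80 under categories, assuming the folder layout from section 1.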

2.1.1 Contents of info

"info": {
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
},

info holds basic metadata such as the date, version and contributor. It has little practical value and can be ignored.

2.1.2 Contents of licenses

This part is short, so it is listed in full:

"licenses": [
    {"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", "id": 1, "name": "Attribution-NonCommercial-ShareAlike License"},
    {"url": "http://creativecommons.org/licenses/by-nc/2.0/", "id": 2, "name": "Attribution-NonCommercial License"},
    {"url": "http://creativecommons.org/licenses/by-nc-nd/2.0/", "id": 3, "name": "Attribution-NonCommercial-NoDerivs License"},
    {"url": "http://creativecommons.org/licenses/by/2.0/", "id": 4, "name": "Attribution License"},
    {"url": "http://creativecommons.org/licenses/by-sa/2.0/", "id": 5, "name": "Attribution-ShareAlike License"},
    {"url": "http://creativecommons.org/licenses/by-nd/2.0/", "id": 6, "name": "Attribution-NoDerivs License"},
    {"url": "http://flickr.com/commons/usage/", "id": 7, "name": "No known copyright restrictions"},
    {"url": "http://www.usa.gov/copyright.shtml", "id": 8, "name": "United States Government Work"}
],

There are 8 entries. They also have little value for training and can be ignored.

2.1.3 Contents of images

This part is long, so only a few entries are listed:

"images": [
    {
        "license": 4,
        "file_name": "000000.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000.jpg",
        "height": 427,
        "width": 640,
        "date_captured": "2013-11-14 17:02:52",
        "flickr_url": "http://farm7.staticflickr.com/6116/_da26cf2c9e_z.jpg",
        "id":
    },
    {
        "license": 1,
        "file_name": "000000037777.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
        "height": 230,
        "width": 352,
        "date_captured": "2013-11-14 20:55:31",
        "flickr_url": "http://farm9.staticflickr.com/8429/_f6d48aa585_z.jpg",
        "id": 37777
    },
    {
        "license": 4,
        "file_name": "000000.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000.jpg",
        "height": 428,
        "width": 640,
        "date_captured": "2013-11-14 22:32:02",
        "flickr_url": "http://farm4.staticflickr.com/3446/_13d84bd0a1_z.jpg",
        "id":
    },

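images is a list. Each element is a dictionary that stores the information for one image. The fields are:

  • license: the id of an entry in licenses; not needed for training
  • file_name: the image file name
  • coco_url: not needed for training
  • height: image height
  • width: image width
  • date_captured: not needed for training
  • flickr_url: not needed for training
  • id: the image ID, unique to each image

Of these, height, width, file_name and id are the four values that matter.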
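Since only four of these fields are usually needed, a small lookup table keyed by id is often enough. The sketch below assumes a helper name `index_images` chosen for illustration; the sample record is the second image entry shown above, with the URL fields left out.

```python
def index_images(images):
    """Map each image id to (file_name, height, width)."""
    return {
        img["id"]: (img["file_name"], img["height"], img["width"])
        for img in images
    }


# The second entry from the listing above (URL fields omitted).
images = [
    {
        "license": 1,
        "file_name": "000000037777.jpg",
        "height": 230,
        "width": 352,
        "date_captured": "2013-11-14 20:55:31",
        "id": 37777,
    }
]

by_id = index_images(images)
file_name, height, width = by_id[37777]
print(file_name, height, width)  # 000000037777.jpg 230 352
```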

2.1.4 Contents of annotations

This part is long, so only a few entries are listed:

"annotations": [
    {
        "segmentation": [[510.66, 423.01, 511.72, ... 423.01, 510.45, 423.01]],
        "area": 702.98,
        "iscrowd": 0,
        "image_id": ,
        "bbox": [473.07, 395.93, 38.65, 28.67],
        "category_id": 18,
        "id": 1768
    },
    {
        "segmentation": [[289.74, 443.39, 302.29, ... 444.27, 291.88, 443.74]],
        "area": 27718.5,
        "iscrowd": 0,
        "image_id": 61471,
        "bbox": [272.1, 200.23, 151.97, 279.77],
        "category_id": 18,
        "id": 1773
    },
    ......
    {
        "segmentation": {
            "counts": [272, 2, 4, 4, ... 16, 228, 8, 10250],
            "size": [240, 320]
        },
        "area": 18419,
        "iscrowd": 1,
        "image_id": ,
        "bbox": [1, 0, 276, 122],
        "category_id": 1,
        "id": 3
    },

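annotations is the most important part of this JSON file. It is an array of annotation instances, and each annotation has the following fields:

  • segmentation: the segmentation label
  • area: the area of the region
  • iscrowd: whether the annotation covers a crowd of objects
  • image_id: matches the id in images
  • bbox: the bounding box
  • category_id: the class
  • id: the serial number of this annotation

Overall, the annotation format is:

annotation{
    "segmentation": RLE or [polygon],
    "area": float,
    "iscrowd": 0 or 1,
    "image_id": int,
    "bbox": [x,y,width,height],
    "category_id": int,
    "id": int
}

When iscrowd=1, segmentation is given in RLE form:

segmentation : {
    'counts': [272, 2, 4, 4, 4, 4, 2, 9, 1, 2, 16, 43, 143, 24......],
    'size': [240, 320]
}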

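The RLE in the COCO annotation files is always uncompressed RLE (as opposed to compressed RLE). The size of an RLE in bytes grows with the number of pixels on the mask boundary. The benefit of RLE is that the area of a region, and the union and intersection of two regions, can be computed very efficiently directly on the encoded form.

The counts and size arrays in the segmentation above together define the segmentation mask for the image. size holds the image height and width. Every pixel is either inside the annotated region (1) or in the background (0). A 240×320 image has 76,800 pixels, so the mask is 76,800 bits, for example (a made-up example, unrelated to the array above): 00000111100111110…. Writing out every bit wastes space, so the lengths of the runs of 0s and 1s are recorded instead (run-length encoding): 5, 4, 2, 5, 1, …. This is the counts array above. The pixels are read column by column, and the first count is always a run of 0s.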
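To make the encoding concrete, here is a minimal pure-Python sketch that expands an uncompressed RLE back into a mask. The function names are chosen for illustration, and the tiny 4×5 input is made up (it reuses the run lengths 5, 4, 2, 5, 1 from the example, padded with one more run so that the counts cover all 20 pixels). It assumes the column-by-column pixel order and the leading run of 0s used by the COCO mask format.

```python
def decode_uncompressed_rle(counts, size):
    """Expand run lengths into a height x width mask of 0/1 values."""
    height, width = size
    if sum(counts) != height * width:
        raise ValueError("run lengths do not cover the whole image")
    flat = []
    value = 0  # runs alternate, starting with background (0)
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    # Pixels are stored column by column, so (row, col) sits at col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


def mask_area(mask):
    """Count the foreground pixels, which is what the area field measures."""
    return sum(sum(row) for row in mask)


mask = decode_uncompressed_rle([5, 4, 2, 5, 1, 3], size=[4, 5])
for row in mask:
    print(row)
# [0, 0, 1, 1, 0]
# [0, 1, 0, 1, 1]
# [0, 1, 0, 1, 1]
# [0, 1, 1, 1, 1]
print(mask_area(mask))  # 12
```

The area is simply the sum of the runs of 1s (4 + 5 + 3 = 12), which is why area, union and intersection can be computed without ever expanding the mask.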

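area is the area of the segmented region in pixels. iscrowd=0 marks a single object, whose segmentation is given as polygons; iscrowd=1 marks a group of overlapping objects that cannot be separated, whose segmentation is given as RLE. image_id is the id stored in images. bbox is the object's bounding box, [x, y, width, height], where (x, y) is the top-left corner. category_id is the class number; there are 80 classes. The id here is different from the id in images: it is only the serial number of each annotation.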
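Two small operations come up constantly when working with these fields: converting bbox to corner coordinates, and collecting the annotations that belong to each image. The sketch below uses hypothetical helper names; the first sample record reuses values from annotation 1773 above, while the other two records are made up.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_by_image(annotations):
    """Collect annotations under the image_id they point to."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


annotations = [
    # Values from annotation 1773 in the listing above.
    {"id": 1773, "image_id": 61471, "category_id": 18,
     "bbox": [272.1, 200.23, 151.97, 279.77]},
    # Made-up records, for illustration only.
    {"id": 2, "image_id": 61471, "category_id": 1, "bbox": [10, 20, 30, 40]},
    {"id": 3, "image_id": 5, "category_id": 1, "bbox": [1, 0, 276, 122]},
]

grouped = group_by_image(annotations)
boxes = [xywh_to_xyxy(ann["bbox"]) for ann in grouped[61471]]
print(len(boxes), boxes[1])  # 2 [10, 20, 40, 60]
```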

2.1.5 Contents of categories

As follows:

"categories": [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
    {"supercategory": "vehicle", "id": 4, "name": "motorcycle"},
    {"supercategory": "vehicle", "id": 5, "name": "airplane"},
    {"supercategory": "vehicle", "id": 6, "name": "bus"},
    {"supercategory": "vehicle", "id": 7, "name": "train"},
    {"supercategory": "vehicle", "id": 8, "name": "truck"},
    {"supercategory": "vehicle", "id": 9, "name": "boat"},
    {"supercategory": "outdoor", "id": 10, "name": "traffic light"},
    {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
    {"supercategory": "outdoor", "id": 14, "name": "parking meter"},
    {"supercategory": "outdoor", "id": 15, "name": "bench"},
    {"supercategory": "animal", "id": 16, "name": "bird"},
    {"supercategory": "animal", "id": 17, "name": "cat"},
    {"supercategory": "animal", "id": 18, "name": "dog"},
    {"supercategory": "animal", "id": 19, "name": "horse"},
    {"supercategory": "animal", "id": 20, "name": "sheep"},
    {"supercategory": "animal", "id": 21, "name": "cow"},
    {"supercategory": "animal", "id": 22, "name": "elephant"},
    {"supercategory": "animal", "id": 23, "name": "bear"},
    {"supercategory": "animal", "id": 24, "name": "zebra"},
    {"supercategory": "animal", "id": 25, "name": "giraffe"},
    {"supercategory": "accessory", "id": 27, "name": "backpack"},
    {"supercategory": "accessory", "id": 28, "name": "umbrella"},
    {"supercategory": "accessory", "id": 31, "name": "handbag"},
    {"supercategory": "accessory", "id": 32, "name": "tie"},
    {"supercategory": "accessory", "id": 33, "name": "suitcase"},
    {"supercategory": "sports", "id": 34, "name": "frisbee"},
    {"supercategory": "sports", "id": 35, "name": "skis"},
    {"supercategory": "sports", "id": 36, "name": "snowboard"},
    {"supercategory": "sports", "id": 37, "name": "sports ball"},
    {"supercategory": "sports", "id": 38, "name": "kite"},
    {"supercategory": "sports", "id": 39, "name": "baseball bat"},
    {"supercategory": "sports", "id": 40, "name": "baseball glove"},
    {"supercategory": "sports", "id": 41, "name": "skateboard"},
    {"supercategory": "sports", "id": 42, "name": "surfboard"},
    {"supercategory": "sports", "id": 43, "name": "tennis racket"},
    {"supercategory": "kitchen", "id": 44, "name": "bottle"},
    {"supercategory": "kitchen", "id": 46, "name": "wine glass"},
    {"supercategory": "kitchen", "id": 47, "name": "cup"},
    {"supercategory": "kitchen", "id": 48, "name": "fork"},
    {"supercategory": "kitchen", "id": 49, "name": "knife"},
    {"supercategory": "kitchen", "id": 50, "name": "spoon"},
    {"supercategory": "kitchen", "id": 51, "name": "bowl"},
    {"supercategory": "food", "id": 52, "name": "banana"},
    {"supercategory": "food", "id": 53, "name": "apple"},
    {"supercategory": "food", "id": 54, "name": "sandwich"},
    {"supercategory": "food", "id": 55, "name": "orange"},
    {"supercategory": "food", "id": 56, "name": "broccoli"},
    {"supercategory": "food", "id": 57, "name": "carrot"},
    {"supercategory": "food", "id": 58, "name": "hot dog"},
    {"supercategory": "food", "id": 59, "name": "pizza"},
    {"supercategory": "food", "id": 60, "name": "donut"},
    {"supercategory": "food", "id": 61, "name": "cake"},
    {"supercategory": "furniture", "id": 62, "name": "chair"},
    {"supercategory": "furniture", "id": 63, "name": "couch"},
    {"supercategory": "furniture", "id": 64, "name": "potted plant"},
    {"supercategory": "furniture", "id": 65, "name": "bed"},
    {"supercategory": "furniture", "id": 67, "name": "dining table"},
    {"supercategory": "furniture", "id": 70, "name": "toilet"},
    {"supercategory": "electronic", "id": 72, "name": "tv"},
    {"supercategory": "electronic", "id": 73, "name": "laptop"},
    {"supercategory": "electronic", "id": 74, "name": "mouse"},
    {"supercategory": "electronic", "id": 75, "name": "remote"},
    {"supercategory": "electronic", "id": 76, "name": "keyboard"},
    {"supercategory": "electronic", "id": 77, "name": "cell phone"},
    {"supercategory": "appliance", "id": 78, "name": "microwave"},
    {"supercategory": "appliance", "id": 79, "name": "oven"},
    {"supercategory": "appliance", "id": 80, "name": "toaster"},
    {"supercategory": "appliance", "id": 81, "name": "sink"},
    {"supercategory": "appliance", "id": 82, "name": "refrigerator"},
    {"supercategory": "indoor", "id": 84, "name": "book"},
    {"supercategory": "indoor", "id": 85, "name": "clock"},
    {"supercategory": "indoor", "id": 86, "name": "vase"},
    {"supercategory": "indoor", "id": 87, "name": "scissors"},
    {"supercategory": "indoor", "id": 88, "name": "teddy bear"},
    {"supercategory": "indoor", "id": 89, "name": "hair drier"},
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"}
]

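Note that the category ids in the list above run from 1 to 90 with gaps (12, 26, 29 and others are absent), so they cannot be used directly as indices into an 80-way classifier output. A common workaround is to remap them to contiguous indices. This is a minimal sketch of that idea; the function name and the choice of 0-based indices are assumptions, and the sample uses three entries from the list.

```python
def contiguous_index(categories):
    """Map each category id to a 0-based index in ascending id order."""
    ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(ids)}


# Three entries from the categories list above; id 12 does not exist.
categories = [
    {"supercategory": "outdoor", "id": 10, "name": "traffic light"},
    {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
]

print(contiguous_index(categories))  # {10: 0, 11: 1, 13: 2}
```

Applied to the full categories list, the mapping should send id 1 to index 0 and id 90 to index 79.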
2.2 Object Keypoint file format (keypoint detection)

The files person_keypoints_train2017.json and person_keypoints_val2017.json use this format. The overall layout is:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

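This is the same layout as instances_val2017.json. The info, licenses and images parts share one definition across the different JSON files. What differs between file types is annotations and categories.

  • images is a list whose length equals the number of images in the training (or validation) set.
  • annotations is also a list; its length equals the number of bounding boxes in the training (or validation) set. Here there are only bounding boxes of the person class.
  • categories is also a list; its length equals the number of classes, which is 1 here because person is the only class.

The parts that are identical are not repeated; only the differences are listed below.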

2.2.1 Contents of annotations

An annotation of this type contains every field of the object instance annotation plus two extra fields, keypoints and num_keypoints. keypoints is an array of length 3*k, where k is the number of keypoints defined for the category. Each keypoint takes three consecutive values: the x and y coordinates, followed by a flag v. v=0 means the keypoint is not labeled (in which case x=y=v=0), v=1 means it is labeled but not visible (occluded), and v=2 means it is labeled and visible. num_keypoints is the number of labeled keypoints on the object (v>0). Small objects may have no labeled keypoints at all.

annotation{
    "segmentation": RLE or [polygon],
    "num_keypoints": int,
    "area": float,
    "iscrowd": 0 or 1,
    "keypoints": [x1,y1,v1,...],
    "image_id": int,
    "bbox": [x,y,width,height],
    "category_id": int,
    "id": int
}

An example:

{
    "segmentation": [[492.38, 238.33, 491.91, 234.15, 494.47, 227.65, 495.17, 215.1, 497.02, 199.54, 503.53, 197.22, 503.3, 194.43, 503.3, 190.95, 506.08, 183.51, 511.89, 185.84, 514.21, 187, 514.21, 196.29, 521.88, 200.7, 526.76, 216.03, 520.25, 227.65, 519.56, 234.38, 519.09, 239.49, 519.09, 244.84, 519.56, 246.93, 518.16, 248.32, 516.3, 256.91, 510.03, 256.45, 513.28, 240.89]],
    "num_keypoints": 13,
    "area": 1394.7431,
    "iscrowd": 0,
    "keypoints": [508, 192, 2, 510, 191, 2, 506, 191, 2, 512, 192, 2, 503, 192, 1, 515, 202, 2, 499, 202, 2, 524, 214, 2, 497, 215, 2, 516, 226, 2, 496, 224, 2, 511, 232, 2, 497, 230, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "image_id": ,
    "bbox": [491.91, 183.51, 34.85, 73.4],
    "category_id": 1,
    "id":
}
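The flat keypoints array is easier to work with once it is split into (x, y, v) triples. The following sketch uses hypothetical helper names and the keypoints array from the example above, and checks that the number of triples with v>0 matches the stored num_keypoints.

```python
def split_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints stores."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# The keypoints array from the example annotation above.
keypoints = [
    508, 192, 2, 510, 191, 2, 506, 191, 2, 512, 192, 2, 503, 192, 1,
    515, 202, 2, 499, 202, 2, 524, 214, 2, 497, 215, 2, 516, 226, 2,
    496, 224, 2, 511, 232, 2, 497, 230, 2, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0,
]

triples = split_keypoints(keypoints)
print(len(triples))              # 17
print(count_labeled(keypoints))  # 13, matching num_keypoints in the example
print(triples[4])                # (503, 192, 1): labeled but occluded
```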

2.2.2 Contents of categories

Compared with the category in object instance files, the category here has two extra fields. keypoints is an array of length k holding the name of each keypoint. skeleton defines which keypoints are connected (for example, the left wrist and the left elbow are connected, but the left wrist and the right wrist are not). At present, COCO keypoints are annotated only for the person category. The definition is:

{
    "supercategory": str,
    "id": int,
    "name": str,
    "keypoints": [str],
    "skeleton": [edge]
}

Concretely:

"categories": [
    {
        "supercategory": "person",
        "id": 1,
        "name": "person",
        "keypoints": ["nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist", "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle"],
        "skeleton": [[16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13], [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
    }
]
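The numbers in skeleton are 1-based positions in the keypoints list, which is easy to miss. As a minimal sketch (the function name `edges_by_name` is an assumption for illustration), the edges can be turned into pairs of names to check the connectivity described above:

```python
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

SKELETON = [
    [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13],
    [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3],
    [2, 4], [3, 5], [4, 6], [5, 7],
]


def edges_by_name(keypoints, skeleton):
    """Translate 1-based skeleton index pairs into pairs of keypoint names."""
    return [(keypoints[a - 1], keypoints[b - 1]) for a, b in skeleton]


edges = edges_by_name(KEYPOINTS, SKELETON)
print(edges[0])                                  # ('left_ankle', 'left_knee')
print(("left_elbow", "left_wrist") in edges)     # True
print(("left_wrist", "right_wrist") in edges)    # False
```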

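2.3 Image Caption file format

The files captions_train2017.json and captions_val2017.json use this format. An Image Caption file is laid out in the following sections, in order. It looks like the Object Instance format, except that there is no categories field at the end:

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation]
}

The info, licenses and images structures share one definition across the different JSON files. Only the annotations structure differs between file types.

  • annotations: there are more entries than images, because one image can have several captions.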

2.3.1 Contents of annotation

An annotation of this type stores a sentence describing an image. Each sentence describes the content of the corresponding image, and every image has at least 5 captions (some have more). The annotation is defined as:

annotation{
    "image_id": int,
    "id": int,
    "caption": str
}

A concrete fragment:

{"image_id": , "id": , "caption": "A large group is sitting together and eating at a restaurant."},
{"image_id": , "id": , "caption": "The people are gathered at the table for dinner."},
{"image_id": , "id": , "caption": "Two men standing near a bar drinking together"},
{"image_id": , "id": , "caption": "A large group of people pose for a photo at dinner."},
{"image_id": , "id": , "caption": "The diners are enjoying their various beverages with their meals.."}

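Because captions are stored as separate annotation entries, they usually need to be gathered per image before use. The sketch below shows one way to do that; the helper name is chosen for illustration, and the image_id and id values in the sample are made up, since the fragment above does not show them.

```python
from collections import defaultdict


def captions_by_image(annotations):
    """Collect the caption strings that belong to each image_id."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)


# Captions from the fragment above; the image_id and id values are made up.
annotations = [
    {"image_id": 1, "id": 101,
     "caption": "A large group is sitting together and eating at a restaurant."},
    {"image_id": 1, "id": 102,
     "caption": "The people are gathered at the table for dinner."},
    {"image_id": 2, "id": 103,
     "caption": "Two men standing near a bar drinking together"},
]

grouped = captions_by_image(annotations)
print(len(grouped[1]), len(grouped[2]))  # 2 1
```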

