The KITTI Vision Benchmark Suite - 3d object
The KITTI Vision Benchmark Suite
http://www.cvlibs.net/datasets/kitti/index.php
1. 3D Object Detection Evaluation 2017
The 3D object detection benchmark consists of 7481 training images and 7518 test images as well as the corresponding point clouds, comprising a total of 80,256 labeled objects. For evaluation, we compute precision-recall curves. To rank the methods we compute average precision. We require that all methods use the same parameter set for all test pairs. Our development kit provides details about the data format as well as MATLAB / C++ utility functions for reading and writing the label files.
comprise [kəm'praɪz]: vt. to include; to consist of
utility [juːˈtɪlɪtɪ]: n. usefulness; public utility; function. adj. practical; general-purpose; multi-purpose
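The ranking metric described above can be illustrated with a minimal sketch of PASCAL-style 11-point interpolated average precision. This is an assumption about the interpolation scheme, not a copy of the official evaluation code, which may sample recall at a different number of points. The function names and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> Tuple[List[float], List[float]]:
    """Sweep the score threshold from high to low and record (recall, precision)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def average_precision_11pt(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Mean of the interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for k in range(11):
        r = k / 10
        # Interpolated precision: best precision at any recall >= r (0 if unreachable).
        total += max((p for rec, p in zip(recalls, precisions) if rec >= r), default=0.0)
    return total / 11


# Toy example (made-up values): three detections, two ground-truth objects.
recalls, precisions = precision_recall([0.9, 0.8, 0.7], [True, False, True], 2)
ap = average_precision_11pt(recalls, precisions)
```

Each detection is assumed to have already been marked as a true or false positive by an overlap test against the ground truth.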
Download left color images of object data set (12 GB)
Download right color images, if you want to use stereo information (12 GB)
Download the 3 temporally preceding frames (left color) (36 GB)
Download the 3 temporally preceding frames (right color) (36 GB)
Download Velodyne point clouds, if you want to use laser information (29 GB)
Download camera calibration matrices of object data set (16 MB)
Download training labels of object data set (5 MB)
Download object development kit (1 MB) (including 3D object detection and bird’s eye view evaluation code)
Download pre-trained LSVM baseline models (5 MB) used in Joint 3D Estimation of Objects and Scene Layout (NIPS 2011). These models are referred to as LSVM-MDPM-sv (supervised version) and LSVM-MDPM-us (unsupervised version) in the tables below.
Download reference detections (L-SVM) for training and test set (800 MB)
Qianli Liao (NYU) has put together code to convert from KITTI to PASCAL VOC file format (documentation included, requires Emacs).
Karl Rosaen (U.Mich) has released code to convert between KITTI, KITTI tracking, Pascal VOC, Udacity, CrowdAI and AUTTI formats.
We thank David Stutz and Bo Li for developing the 3D object detection benchmark.
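The development kit ships MATLAB / C++ helpers for the label files. As a rough Python counterpart, the sketch below reads and writes one label line, assuming the whitespace-separated field order documented in the development kit: type, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y, and an optional score in result files. The class name, function names, and the sample line are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KittiObject:
    type: str                                  # e.g. "Car", "Pedestrian", "Cyclist", "DontCare"
    truncated: float                           # 0.0 (fully inside the image) to 1.0
    occluded: int                              # 0 visible, 1 partly, 2 largely occluded, 3 unknown
    alpha: float                               # observation angle (radians)
    bbox: Tuple[float, ...]                    # 2D box: left, top, right, bottom (pixels)
    dimensions: Tuple[float, ...]              # 3D size: height, width, length (meters)
    location: Tuple[float, ...]                # 3D position x, y, z in camera coordinates
    rotation_y: float                          # rotation around the vertical axis (radians)
    score: Optional[float] = None              # only present in detection result files


def parse_label_line(line: str) -> KittiObject:
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    v = [float(x) for x in fields[1:]]
    return KittiObject(
        type=fields[0], truncated=v[0], occluded=int(v[1]), alpha=v[2],
        bbox=tuple(v[3:7]), dimensions=tuple(v[7:10]), location=tuple(v[10:13]),
        rotation_y=v[13], score=v[14] if len(v) == 15 else None,
    )


def format_label_line(obj: KittiObject) -> str:
    fields = [obj.type, f"{obj.truncated:.2f}", str(obj.occluded), f"{obj.alpha:.2f}"]
    fields += [f"{x:.2f}" for x in (*obj.bbox, *obj.dimensions, *obj.location, obj.rotation_y)]
    if obj.score is not None:
        fields.append(f"{obj.score:.2f}")
    return " ".join(fields)


# Made-up sample line in the assumed format (not taken from the dataset).
SAMPLE = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
sample_object = parse_label_line(SAMPLE)
```

A label file is then parsed by applying `parse_label_line` to each non-empty line.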
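stereo [ˈsterɪəʊ]: n. stereo sound; stereo system; stereotype plate; stereoscopic photograph. adj. three-dimensional; stereophonic; stereoscopic
temporal [ˈtempərəl]: adj. worldly; secular; relating to time; of the temples of the head
precede [prɪ'siːd]: vt. to come before; to lead; to outrank. vi. to go before; to be ahead
velodyne ['vi:ləudain]: n. a tachometer generator; a speed-regulating generator
calibration [kælɪ'breɪʃ(ə)n]: n. calibration; graduation; scale marking
matrices (plural of matrix): n. matrix; truth table; mold
Velodyne: the manufacturer of the laser scanner (Chinese brand name: 威力登)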
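LSVM: in the KITTI baselines this most likely stands for Latent Support Vector Machine (the latent SVM used to train deformable part models), not Lagrangian Support Vector Machine.
Lagrangian [lə'grændʒiən]: adj. of the Lagrangian operator. n. the Lagrangian operator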
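For the Velodyne point clouds in the download list, here is a minimal reading sketch. It assumes each scan is a flat binary file of little-endian 32-bit floats, four per point (x, y, z, reflectance), which is how the development kit describes the format. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # x, y, z, reflectance

POINT_SIZE = struct.calcsize("<4f")  # 16 bytes per point


def parse_velodyne_bytes(data: bytes) -> List[Point]:
    """Decode a raw scan buffer into a list of (x, y, z, reflectance) tuples."""
    if len(data) % POINT_SIZE != 0:
        raise ValueError("buffer length is not a multiple of 16 bytes")
    return list(struct.iter_unpack("<4f", data))


def read_velodyne_points(path: str) -> List[Point]:
    """Read one scan file from disk (assumed layout: packed float32 quadruples)."""
    with open(path, "rb") as fh:
        return parse_velodyne_bytes(fh.read())


# Round trip on a tiny made-up buffer holding two points.
raw = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, -4.0, 0.25, 8.0, 0.0)
points = parse_velodyne_bytes(raw)
```

For full scans, loading the same layout into a NumPy array of shape (N, 4) would be considerably faster; the pure standard-library version is shown only to keep the example self-contained.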
We evaluate 3D object detection performance using the PASCAL criteria also used for 2D object detection. Far objects are thus filtered based on their bounding box height in the image plane. As only objects also appearing on the image plane are labeled, objects in don't care areas do not count as false positives. We note that the evaluation does not take care of ignoring detections that are not visible on the image plane - these detections might give rise to false positives. For cars we require a 3D bounding box overlap of 70%, while for pedestrians and cyclists we require a 3D bounding box overlap of 50%. Difficulties are defined as follows:
Easy: Min. bounding box height: 40 Px, Max. occlusion level: Fully visible, Max. truncation: 15 %
Moderate: Min. bounding box height: 25 Px, Max. occlusion level: Partly occluded, Max. truncation: 30 %
Hard: Min. bounding box height: 25 Px, Max. occlusion level: Difficult to see, Max. truncation: 50 %
All methods are ranked based on the moderately difficult results.
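The three difficulty levels can be written as a small lookup. The sketch assumes the integer occlusion codes used in the label files (0 fully visible, 1 partly occluded, 2 difficult to see) and truncation given as a fraction between 0 and 1. The function name is hypothetical, and it returns the strictest level an object satisfies.

```python
from typing import Optional

# (name, min. 2D box height in pixels, max. occlusion code, max. truncation fraction)
DIFFICULTY_LEVELS = [
    ("easy", 40.0, 0, 0.15),
    ("moderate", 25.0, 1, 0.30),
    ("hard", 25.0, 2, 0.50),
]


def difficulty(bbox_height_px: float, occlusion: int, truncation: float) -> Optional[str]:
    """Return the strictest level whose three limits the object meets, else None."""
    for name, min_height, max_occlusion, max_truncation in DIFFICULTY_LEVELS:
        if (
            bbox_height_px >= min_height
            and occlusion <= max_occlusion
            and truncation <= max_truncation
        ):
            return name
    return None  # too small, too occluded, or too truncated for any level
```

Because each level relaxes the previous one, an object labeled "easy" here also satisfies the moderate and hard limits.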
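criteria [kraɪ'tɪərɪə]: n. standards; conditions (plural of criterion)
cyclist ['saɪklɪst]: n. a person riding a bicycle
moderately ['mɒd(ə)rətlɪ]: adv. to a moderate degree; temperately; with restraint
ignore [ɪg'nɔː]: vt. to disregard; to pay no attention to; (law) to dismiss a case
moderate ['mɒd(ə)rət]: adj. steady; mild; reasonable; medium; restrained. vi. to ease; to weaken. vt. to restrain; to lessen
truncation [trʌŋ'keɪʃən]: n. cutting off; removal of the top or end
occlude [ə'kluːd]: vt. to block; to close off; to obstruct. vi. (of teeth) to bite together
occlusion [ə'kluːʒ(ə)n]: n. blockage; absorption; occluded front (meteorology)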
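To make the per-class overlap requirement concrete, the following sketch computes intersection over union for axis-aligned 3D boxes and compares it with the 70% / 50% thresholds. This is a deliberate simplification: KITTI boxes carry a rotation around the vertical axis, and the official evaluation intersects the rotated boxes, which this example ignores. The box layout and function names are assumptions.

```python
from typing import Sequence

# Minimum 3D overlap per class, taken from the text above.
MIN_OVERLAP = {"Car": 0.70, "Pedestrian": 0.50, "Cyclist": 0.50}


def volume(box: Sequence[float]) -> float:
    """Volume of a box given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d_axis_aligned(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two axis-aligned boxes (rotation ignored)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


def is_match(object_class: str, detection: Sequence[float], ground_truth: Sequence[float]) -> bool:
    """True if the detection overlaps the ground truth enough for its class."""
    return iou_3d_axis_aligned(detection, ground_truth) >= MIN_OVERLAP[object_class]
```

For example, a detection covering exactly half the volume of a ground-truth box it lies inside has an overlap of 0.5, which is enough for a pedestrian or cyclist but not for a car.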
Important Policy Update: As more and more non-published work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that are leading to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms or student research projects are not allowed. Such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 months is enough to determine if a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.
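novelty ['nɒv(ə)ltɪ]: n. newness; a novel thing; a small, inexpensive novelty item
peer review: evaluation of work by others in the same field
venue ['venjuː]: n. meeting place; location of an event; scene of a crime; place of a trial
regularly ['rɛɡjəlɚli]: adv. at regular intervals; in a regular pattern; neatly; evenly
bibliography [,bɪblɪ'ɒgrəfɪ]: n. list of references; list of literature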
Additional information used by the methods
Stereo: Method uses left and right (stereo) images
Flow: Method uses optical flow (2 temporally adjacent images)
Multiview: Method uses more than 2 temporally adjacent images
Laser Points: Method uses point clouds from Velodyne laser scanner
Additional training data: Use of additional data sources for training (see details)
adjacent [ə'dʒeɪs(ə)nt]: adj. neighboring; adjoining
temporally: adv. with respect to time (here: consecutive in time)
Car
Pedestrian
Cyclist
2. Related Datasets
CERV Vehicle Lights Dataset: Annotations of vehicle lights for a subset of the object detection benchmark.
https://cerv.aut.ac.nz/vehicle-lights-dataset/
PASCAL3D+: Augments 12 rigid object classes of PASCAL VOC 2012 with 3D annotations.
http://cvgl.stanford.edu/projects/pascal3d.html
The PASCAL Visual Object Classes Challenges: Dataset and benchmarks for object class recognition.
TME Motorway Dataset: 28 video sequences with vehicle annotations captured from VisLab’s BRAiVE vehicle.
http://cmp.felk.cvut.cz/data/motorway/
LabelMe: Online annotation tool to build image databases for computer vision research.
http://labelme.csail.mit.edu/Release3.0/
MIT Street Scenes: Street-side images with labels for 9 object categories (including cars, pedestrians, buildings, trees).
http://cbcl.mit.edu/software-datasets/streetscenes/
Daimler Pedestrian Datasets: Datasets focusing on pedestrian detection for autonomous driving.
http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/daimler_pedestrian_benchmark_d.html
Caltech Pedestrian Detection Benchmark: 10 hours of video with 350,000 annotated pedestrian bounding boxes.
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/index.html
Robust Multi-Person Tracking from Mobile Platforms: Videos with annotated pedestrians captured from a stroller.
https://data.vision.ee.ethz.ch/cvl/aess/
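rigid ['rɪdʒɪd]: adj. strict; stiff; inflexible; hard; exact
motorway ['məʊtəweɪ]: n. highway; expressway
Daimler ['daimlə]: n. Daimler (the German automobile manufacturer)
citation [saɪ'teɪʃ(ə)n]: n. quotation; reference; summons; commendation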
3. Citation
When using this dataset in your research, we will be happy if you cite us:
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite