A deep dive into the OOTB paper

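1 Introduction
2 Review of general video trackers
  2.1 Single object tracking based on discriminative correlation filters (DCF)
    2.1.1 Discriminative object representations
    2.1.2 Adaptive scale estimation
    2.1.3 Handling boundary effects
  2.2 Single object tracking based on Siamese neural networks (SNN)
    2.2.1 Discriminative object representation
    2.2.2 Adaptive scale estimation
    2.2.3 Balancing training data
  2.3 Transformer-based single object tracking
  2.4 Single object tracking based on recurrent neural networks (RNN)
  2.5 Single object tracking based on generative adversarial networks (GAN)
  2.6 Single object tracking with other CNN architectures
3 Review of satellite video trackers
  3.1 Tracker prototype
  3.2 Exploited features
  3.3 Recognition and treatment of full occlusion
  3.4 Rotation estimation
  3.5 Data source and tracked object
  3.6 Evaluation benchmark
4 Related datasets
  4.1 Generic benchmark datasets
  4.2 Specific benchmark datasets
5 Oriented object tracking benchmark (OOTB)
  5.1 Multi-platform data collection
  5.2 High-quality oriented bounding box (OBB) annotation
  5.3 Data statistics
    5.3.1 Scene types
    5.3.2 Object categories
    5.3.3 Size and aspect ratio
  5.4 Attributes
  5.5 Evaluation protocol
    5.5.1 Precision plot
    5.5.2 Normalized precision plot
    5.5.3 Success plot
  5.6 Selection of 33 state-of-the-art trackers for evaluation
6 Experiments and analysis
  6.1 Quantitative evaluation
    6.1.1 Overall results
    6.1.2 Category-based results
    6.1.3 Attribute-based results
  6.2 Qualitative evaluation
  6.3 Running speed analysis
7 Discussion and suggestions for future work
  7.1 Synergy of appearance information and motion cues
  7.2 Dense object tracking
  7.3 Motion estimation
  7.4 Precise object representation
  7.5 Suitable backbones and features
  7.6 Video enhancement
8 Conclusion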

II. Core translation

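Abstract

Single object tracking (SOT) in satellite video continuously acquires the position and extent of an arbitrary object, which makes it highly valuable in remote sensing applications. Existing trackers and datasets, however, rarely address the tracking of oriented objects in satellite video. To fill this gap, the paper comprehensively reviews tracking paradigms and frameworks in both general video and satellite video, and then proposes the oriented object tracking benchmark (OOTB) to advance visual tracking.

OOTB contains 110 video sequences with 29,890 frames, covering four object categories common in satellite video: car, ship, plane, and train. Every frame is manually annotated with an oriented bounding box (OBB), and every sequence is labeled with 12 fine-grained attributes. The paper also proposes a high-precision evaluation protocol for comprehensive and fair comparison of trackers.

The paper evaluates 33 state-of-the-art trackers (58 models in total) on OOTB, spanning different features, backbones, and tracking paradigms, and provides in-depth analysis and baseline results from extensive experiments as a reference for future research. OOTB is open source at https://github.com/YZCU/OOTB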

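Core contributions

A systematic review of single object tracking paradigms, frameworks, and benchmarks for general video and satellite video.
OOTB, the first oriented object tracking benchmark for satellite video, with high-quality OBB annotations and a dedicated evaluation protocol.
An evaluation of 33 state-of-the-art trackers that provides performance baselines and future research directions for satellite video tracking.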

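Key challenges

Satellite video lacks high-quality public datasets with OBB annotations, as well as evaluation protocols.
Objects are small and occupy few pixels, so spatial and spectral features are limited.
The satellite platform moves at high speed and the background is complex, so objects are easily disturbed by similar objects, occlusion, motion blur, and background clutter.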

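Evaluation conclusions

Siamese-network trackers (SiamCAR, SiamFC++, SiamDW) achieve the best overall performance.
Rotation, full occlusion, and background clutter are the central difficulties of satellite video tracking.
Trackers that fuse appearance and motion features (DF, RAMC, CFME) are more robust under challenging attributes.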

III. Full translation arranged side by side

Title: Satellite video single object tracking: A systematic review and an oriented object tracking benchmark
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, 2024
Open-source benchmark: OOTB, https://github.com/YZCU/OOTB

Side-by-side translation (Chinese | English)

Abstract

中文	English
卫星视频中的单目标跟踪（SOT）能够连续获取任意目标的位置与范围信息，在遥感应用中展现出巨大价值。	Single object tracking (SOT) in satellite video (SV) enables the continuous acquisition of position and range information of an arbitrary object, showing promising value in remote sensing applications.
然而，现有跟踪器与数据集很少关注卫星视频中旋转目标的单目标跟踪问题。	However, existing trackers and datasets rarely focus on the SOT of oriented objects in SV.
为弥补这一空白，本文全面综述了通用视频与卫星视频领域的各类跟踪范式与框架，进而提出旋转目标跟踪基准（OOTB），以推动视觉跟踪领域发展。	To bridge this gap, this article presents a comprehensive review of various tracking paradigms and frameworks covering both the general video and satellite video domains and subsequently proposes the oriented object tracking benchmark (OOTB) to advance the field of visual tracking.
OOTB 包含 110 个视频序列、共 29890 帧，覆盖卫星视频常见目标类别：车辆、船舶、飞机、火车。	OOTB contains 29,890 frames from 110 video sequences, covering common satellite video object categories including car, ship, plane, and train.
所有帧均采用旋转包围框（OBB）进行人工标注，每个序列标注 12 类细粒度属性。	All frames are manually annotated with oriented bounding boxes, and each sequence is labeled with 12 fine-grained attributes.
此外，本文提出一套高精度评测协议，用于跟踪器的全面、公平对比。	Additionally, a high-precision evaluation protocol is proposed for comprehensive and fair comparisons of trackers.
为验证现有跟踪器并探索适用于卫星视频跟踪的框架，本文评测了 33 种前沿跟踪器（共 58 个模型），涵盖不同特征、主干网络与跟踪类型。	To validate the existing trackers and explore frameworks suitable for SV tracking, we benchmark 33 state-of-the-art trackers totaling 58 models with different features, backbones, and tracker tags.
最后，本文提供大量实验结果与深刻见解，帮助理解其性能并为未来研究提供基线结果。	Finally, extensive experiments and insightful thoughts are also provided to help understand their performance and offer baseline results for future research.

1 Introduction

中文	English
单目标跟踪（SOT）是计算机视觉领域最基础的任务之一，可在视频序列中建立目标对应关系。给定初始状态，SOT 旨在确定任意目标在后续帧中的状态。	Single object tracking (SOT) is one of the most essential tasks in computer vision, which allows the establishment of object correspondences in video sequences. Given the initial state, SOT aims to determine subsequent states of an arbitrary object.
SOT 可应用于自动驾驶、智能监控、机器人、增强现实等诸多领域。	SOT can be applied to a variety of fields such as autonomous driving, intelligent surveillance, robotics, and augmented reality.
跟踪技术受到广泛关注，大量先进跟踪器被提出以解决尺度变化、形变、相似外观、光照变化等现实挑战。	Tracking technology has received a lot of attention, and many advanced trackers have been proposed to solve realistic challenges such as scale variation, deformation, similar appearance, and illumination changes.
随着跟踪器不断发展，评测基准在性能评估中发挥基础性作用。	With the advancement of trackers, the tracking benchmark plays a fundamental role in performance evaluation.
多个广泛使用的基准（如 LaSOT、TrackingNet、LasHeR）被发布，用于评估跟踪器并推动视觉跟踪发展。	Several widely used benchmarks such as LaSOT, TrackingNet, and LasHeR have been released for evaluating trackers and promoting the development of visual tracking.
卫星视频是一种宝贵的地表观测数据，能提供特定区域丰富的静态与动态信息。	Satellite video (SV) is a valuable surface observation data that provides a wealth of static and dynamic information on specific areas.
卫星视频数据的出现提升了遥感观测能力，也为视觉跟踪领域带来新机遇。	The emergence of SV data enhances remote sensing observation capabilities and facilitates the visual tracking community.
卫星视频单目标跟踪在智能交通监控与分析等领域具有广阔应用前景。	SOT in SV has promising applications in intelligent traffic surveillance and analysis.
相比之下，由于缺乏标注完善的基准数据集与评测协议，卫星视频目标跟踪进展远落后于通用视频。	In contrast, progress in SV object tracking still lags far behind that of GV due to the lack of well-annotated benchmark datasets and evaluation protocols.
卫星视频目标跟踪难以实现精准鲁棒跟踪，主要面临以下挑战：	It is also difficult to achieve accurate and robust tracking due to the following challenges:
1）高质量公开的卫星视频单目标跟踪数据集与基准不足，尤其缺少带旋转包围框标注的数据。	1) Insufficient high-quality public datasets and benchmarks for SV SOT, especially those with oriented bounding box (OBB) annotations.
2）卫星视频通常为 RGB 三波段，目标光谱特征有限；运动目标尺寸小、像素少，空间特征（上下文、纹理）受限。	2) SV typically has limited spectral features; moving objects are small with few pixels, leading to limited spatial features.
3）卫星平台高速运动，背景非平稳且复杂，小目标易受相似外观、部分遮挡、运动模糊、背景杂波等异常干扰。	3) The high-speed satellite platform causes non-stationary complex backgrounds; small objects suffer from similar appearance, occlusion, motion blur, and background clutters.
本文建立了首个公开的卫星视频旋转目标跟踪基准 OOTB。	This article establishes the first available oriented object tracking benchmark (OOTB) for SOT in SV.

2 Review of general video trackers

2.1 Single object tracking based on discriminative correlation filters (DCF)

中文	English
过去十年，DCF 在各类基准上展现出高性能与高效率。	Over the last decade, DCFs have proved their high performance and efficiency on various benchmarks.
DCF 通过最小化最小二乘误差学习滤波器，确定目标位置，并在跟踪过程中更新模型以适应目标变化。	The DCF learns a filter by minimizing a least-squares error to determine the object's position and updates the model to adapt to object changes during tracking.
2.1.1 判别式目标表征	2.1.1 Discriminative object representations
自 MOSSE 提出以来，基于 DCF 的跟踪器成为研究热点。	DCF-based trackers have been a highlight since the introduction of MOSSE.
CSK 引入循环矩阵与核技巧提升跟踪性能。	CSK modeled after MOSSE introduces the circular matrix and kernel trick to improve tracking performance.
后续工作融合颜色名、HOG 等手工特征，提升表征能力。	Later works integrate color name, HOG, and other hand-crafted features for enhanced representation.
深度学习兴起后，越来越多 DCF 跟踪器使用深度卷积特征。	Inspired by deep learning, an increasing number of DCF-based trackers utilize deep convolutional neural network (CNN) features.
2.1.2 自适应尺度估计	2.1.2 Adaptive scale estimation
标准 DCF 使用固定模板，无法处理尺度变化，易导致跟踪漂移。	Standard DCF-based trackers use a fixed-size template and are unable to handle scale changes, leading to severe tracking drifts.
DSST 先通过二维滤波器估计目标位置，再用一维滤波器估计尺度，高效有效。	DSST first estimates the object position using a 2D filter and then uses a 1D filter for scale estimation, efficient and effective.
前沿跟踪器采用深度框回归方法，无需手动设置尺度参数。	Recent SOTA trackers adopt deep bounding box regression without manual scale parameters.
2.1.3 边界效应处理	2.1.3 Handling boundary effects
训练样本周期性假设带来的边界效应，严重限制搜索区域并降低模型判别能力。	The boundary effect caused by the periodic assumption severely limits the search region and degrades discrimination capability.
CFLB、SRDCF、BACF、CSR-DCF 等方法通过空间约束、真实负样本等方式缓解边界效应。	Methods such as CFLB, SRDCF, BACF, CSR-DCF alleviate boundary effects via spatial constraints and real negative samples.
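
The least-squares filter learning described at the top of this table can be illustrated with a MOSSE-style closed form in the Fourier domain. The following is only a minimal single-channel sketch, not the implementation of any tracker evaluated in the paper: it assumes NumPy, a Gaussian desired response, and a regularization value `lam` chosen purely for illustration. The function names are hypothetical, and the online model update is omitted.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a 2D Gaussian peaked at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form regularized least-squares filter (MOSSE-style).

    Returns the conjugate filter H* = (G * conj(F)) / (F * conj(F) + lam),
    where F and G are the 2D FFTs of the patch and the desired response.
    `lam` is an illustrative value, not one taken from the paper.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(h_conj, patch):
    """Apply the filter to a new patch and return the response peak (row, col)."""
    response = np.real(np.fft.ifft2(h_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```

Because the filter is applied through the FFT, a shift of the patch is treated as a circular shift. This is the periodic assumption behind the boundary effect discussed in 2.1.3.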

2.2 Single object tracking based on Siamese neural networks (SNN)

中文	English
孪生网络跟踪器包含模板分支与候选分支，共享卷积神经网络权重。	Conventionally, SNN-based trackers consist of a template branch and a candidate branch, sharing CNN weights.
2.2.1 判别式目标表征	2.2.1 Discriminative object representation
主干网络决定特征判别能力，SiamFC 率先使用 AlexNet，性能超越同期 DCF 跟踪器。	The backbone determines discriminative power; SiamFC pioneers using AlexNet and outperforms DCF trackers.
SiamRPN 引入区域建议网络（RPN），大幅提升精度与速度。	SiamRPN introduces the region proposal network (RPN), greatly improving accuracy and speed.
SiamRPN++、SiamDW 采用更深更宽主干（ResNet、VGG、Inception），突破平移不变性限制。	SiamRPN++ and SiamDW use deeper and wider backbones, breaking translational invariance restrictions.
2.2.2 自适应尺度估计	2.2.2 Adaptive scale estimation
早期孪生跟踪器采用多尺度搜索，计算代价高。	Early SNN-based trackers use multi-scale search with high computation cost.
RPN 成为锚框回归主流方案，SiamRPN 系列广泛采用。	RPN becomes the mainstream anchor-based bounding box regression approach.
无锚框方法（Center-based、Keypoint-based）简化结构，不依赖超参，逐渐成为趋势。	Anchor-free methods simplify structure and are free of hyperparameters, becoming a trend.
2.2.3 训练数据平衡	2.2.3 Balancing training data
离线训练正负样本不均衡，影响模型判别能力。	Imbalanced positive/negative samples in offline training harm discriminative ability.
DaSiamRPN、C-RPN 采用难负样本挖掘，缓解样本失衡问题。	DaSiamRPN and C-RPN use hard negative sampling to alleviate imbalance.

2.3 Transformer-based single object tracking

中文	English
Transformer 基于注意力机制实现序列到序列转换，在跟踪领域取得显著进展。	Transformer transforms sequences using attention-based encoders and decoders, making remarkable progress in tracking.
分为 CNN-Transformer 跟踪器与全 Transformer 跟踪器。	They are classified into CNN-Transformer trackers and Fully-Transformer trackers.
CNN-Transformer 用 CNN 提取特征，Transformer 实现特征交互。	CNN-Transformer trackers use CNN for feature extraction and Transformer for feature interaction.
全 Transformer 分为双流两阶段与单流单阶段范式，结构更简洁、学习能力更强。	Fully-Transformer trackers include two-stream two-stage and one-stream one-stage paradigms, simpler and more powerful.

2.4 RNN-based single object tracking

中文	English
RNN 擅长处理时序数据，被用于建模时空信息与目标外观。	RNN is good at processing sequential data and is used to model spatio-temporal information and object appearance.
但训练复杂、参数量大，目前基于 RNN 的跟踪器较少。	However, complex training and large parameters result in few RNN-based trackers.

2.5 GAN-based single object tracking

中文	English
GAN 可捕捉数据分布、生成训练样本，用于解决样本不均衡问题。	GAN captures data distribution and generates training samples to address imbalance.
VITAL、TGGAN、ADT 等将 GAN 用于跟踪，取得稳健效果。	VITAL, TGGAN, ADT apply GAN to tracking and achieve robust performance.
但 GAN 可解释性差、训练难度高。	However, GAN is difficult to interpret and train.

2.6 Single object tracking with other CNN architectures

中文	English
基于图神经网络（GNN）、传统 CNN 的跟踪器也被提出。	Trackers based on graph neural network (GNN) and traditional CNN are also developed.
各类范式相互融合，取长补短，提升跟踪性能。	Tracking paradigms are integrated to draw on each other’s strengths and enhance performance.

3 Review of satellite video trackers

中文	English
多个卫星视频跟踪器被提出，在自建数据集上取得优异效果。	Several trackers have been developed for SOT in SV, achieving superior results on home-grown datasets.
3.1 跟踪器原型	3.1 Tracker prototype
多数卫星视频跟踪器继承通用视频范式：DCF、SNN、CNN、Transformer。	Most SV trackers inherit paradigms from GV: DCF, SNN, CNN, Transformer.
基于 DCF 的跟踪器速度快，基于孪生网络与 Transformer 的跟踪器精度更高。	DCF-based trackers run fast; SNN/Transformer-based trackers achieve higher accuracy.
3.2 所用特征	3.2 Exploited features
分为空间特征（手工特征、深度外观特征）与时间特征（光流、深度运动特征、物理运动特征）。	Features include spatial (hand-crafted, deep appearance) and temporal (optical flow, deep motion, physical motion).
融合空间与时间特征能有效应对复杂挑战。	Fusing spatial and temporal features effectively copes with challenging attributes.
3.3 完全遮挡的识别与处理	3.3 Recognition and treatment of full occlusion
完全遮挡是卫星视频典型难题，需解决遮挡感知、遮挡处理、遮挡结束感知三个子问题。	Full occlusion is challenging; three sub-problems: awareness, handling, end awareness.
常用 APCE、峰值、PSR 等指标判断遮挡，用卡尔曼滤波、轨迹拟合预测目标状态。	Indicators like APCE, peak value, PSR detect occlusion; KF, trajectory fitting predict states.
3.4 旋转估计	3.4 Rotation estimation
目标旋转会导致水平框（HBB）跟踪精度下降。	Object rotation degrades HBB-based tracking accuracy.
输出旋转框（OBB）的跟踪器通过角度池匹配实现旋转估计；输出 HBB 的跟踪器采用旋转不变特征。	OBB-output trackers use angle pools; HBB-output trackers use rotation-invariant features.
3.5 数据源与跟踪目标	3.5 Data source and tracked object
卫星视频主要来自吉林一号、SkySat、ISS、Carbonite-2。	SV data mainly comes from JL-1, SkySat, ISS, Carbonite-2.
跟踪目标以车辆、船舶、飞机为主，火车因长宽比大更具挑战性。	Tracked objects: car, ship, plane; train is more challenging due to large aspect ratio.
3.6 评测基准	3.6 Evaluation benchmark
多数跟踪器采用 OTB 的一次通过评估（OPE），少数采用 VOT。	Most trackers use the one-pass evaluation (OPE) of OTB; a few use VOT.
现有基准无法精准评估旋转框结果，精度分数易受目标尺寸影响。	Existing benchmarks cannot accurately evaluate OBB results; precision is sensitive to object size.
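
Section 3.3 above lists APCE (average peak-to-correlation energy) among the indicators used to detect occlusion. The sketch below uses the commonly cited definition, APCE = |F_max - F_min|^2 / mean((F - F_min)^2), computed over a response map. The occlusion test and its `ratio` threshold are hypothetical and serve only to show how such an indicator might be consumed; they are not taken from the paper.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of the map from its minimum value.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    return (f_max - f_min) ** 2 / energy if energy > 0 else 0.0

def is_occluded(current_apce, history, ratio=0.5):
    """Flag occlusion when APCE falls well below its historical mean.

    `ratio` is a hypothetical threshold chosen for illustration only.
    """
    if not history:
        return False
    return current_apce < ratio * (sum(history) / len(history))
```

A single sharp peak gives a high APCE, while a map with several comparable peaks gives a lower one. This is why a sudden drop is often read as a sign of occlusion or distraction.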

4 Related datasets

中文	English
评测基准数据集对跟踪器公平标准化评估至关重要，分为通用与专用基准。	Benchmark datasets are essential for fair evaluation; classified into generic and specific.
4.1 通用评测基准数据集	4.1 Generic benchmark datasets
OTB50/100、NFS、VOT2018、LaSOT、TrackingNet、GOT-10k 等覆盖自然场景各类目标。	OTB50/100, NFS, VOT2018, LaSOT, TrackingNet, GOT-10k cover various objects in natural scenes.
均采用水平框标注，用于通用目标跟踪评测。	All use horizontal bounding box (HBB) annotations for generic tracking evaluation.
4.2 专用评测基准数据集	4.2 Specific benchmark datasets
UAV123、LasHeR、TOTB、VISO、SatSOT、SV248S、XDU-BDSTU、ThickSiam_D 面向特定场景。	UAV123, LasHeR, TOTB, VISO, SatSOT, SV248S, XDU-BDSTU, ThickSiam_D target specific scenarios.
卫星视频专用数据集仍缺少高质量旋转框标注。	SV-specific datasets lack high-quality OBB annotations.
本文提出的 OOTB 是首个面向卫星视频单目标跟踪的旋转目标基准。	The proposed OOTB is the first OBB benchmark dedicated to SV SOT.

5 Oriented object tracking benchmark (OOTB)

5.1 Multi-platform data collection

中文	English
OOTB 数据来自吉林一号、SkySat、ISS 等多平台，保证数据集多样性。	OOTB data is sampled from JL-1, SkySat, ISS, etc., ensuring dataset diversity.

5.2 High-quality oriented bounding box (OBB) annotation

中文	English
旋转框（OBB）相比水平框（HBB）更紧凑，能抑制背景干扰，尤其适合大长宽比、有角度目标。	OBB is more compact and suppresses background interference than HBB, especially for large aspect ratio and oriented objects.
采用 roLabelImg 标注，放大 10 倍保证精度，标注格式：中心坐标、宽、高、旋转角。	Annotated with roLabelImg at 10× zoom; format: center (x,y), width, height, rotation angle θ.
多人多次校验，确保标注一致性与高质量。	Annotated and refined by multiple annotators to ensure consistency and high quality.
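
To make the annotation format above concrete, the following sketch converts an OBB given as (cx, cy, w, h, θ) into its four corner points and computes the enclosing HBB. It assumes θ is in radians and that a positive angle rotates from the x axis toward the y axis. The actual angle convention of the OOTB annotation files should be checked against the dataset, and the function names are hypothetical.

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an OBB (centre, width, height, angle in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset by theta, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def enclosing_hbb(corners):
    """Axis-aligned box (x_min, y_min, x_max, y_max) that encloses the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)
```

For an elongated 44 × 4 box rotated by 45°, the enclosing HBB is about 33.9 × 33.9 pixels, roughly 6.5 times the area of the OBB. This illustrates why the OBB suppresses background better for objects with a large aspect ratio.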

5.3 Data statistics

中文	English
共 110 段序列、29890 帧：车辆 45、船舶 30、飞机 25、火车 10。	110 sequences, 29,890 frames: 45 cars, 30 ships, 25 planes, 10 trains.
平均视频长度 271.7 帧，目标平均面积：车辆 109.7、船舶 238.7、飞机 2075.3、火车 1949.0 像素。	Average video length: 271.7 frames; average object area: car 109.7, ship 238.7, plane 2075.3, train 1949.0 pixels.
平均长宽比：车辆 2.0、船舶 2.0、飞机 1.2、火车 10.9。	Average aspect ratio: car 2.0, ship 2.0, plane 1.2, train 10.9.

5.4 Attribute definitions

中文	English
共 12 类细粒度属性，贴合卫星视频特点：	12 fine-grained attributes tailored for SV:
DEF 形变、IPR 平面内旋转、PO 部分遮挡、FO 完全遮挡、IV 光照变化、MB 运动模糊、BC 背景杂波、OON 超常长宽比、SA 相似外观、LT 纹理缺失、IM 同向运动、AM 异向运动。	DEF (Deformation), IPR (In-Plane Rotation), PO (Partial Occlusion), FO (Full Occlusion), IV (Illumination Variation), MB (Motion Blur), BC (Background Clutters), OON (Out-of-Normal), SA (Similar Appearance), LT (Less Textures), IM (Isotropic Motion), AM (Anisotropic Motion).

5.5 High-precision evaluation protocol

中文	English
精度图：中心误差（CLE）小于阈值的帧占比，卫星视频采用 5 像素阈值。	Precision plot: percentage of frames with center location error (CLE) < threshold; 5 pixels for SV.
归一化精度图：归一化中心误差占比，消除目标尺寸影响。	Normalized precision plot: normalized CLE, eliminating object size influence.
成功率图：交并比（IOU）大于阈值的帧占比。	Success plot: percentage of frames with intersection-over-union (IOU) > threshold.
同时支持 HBB 与 OBB 输出的公平评测。	Supports fair evaluation for both HBB and OBB outputs.
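
The three metrics in the table above can be sketched as follows, given per-frame centre location errors and IoU values that have already been computed. The normalization shown here divides the centre error by the ground-truth box size on each axis, which is a TrackingNet-style assumption; the exact normalization in the OOTB protocol may differ. The number of IoU thresholds (`steps`) is also an assumption.

```python
def precision(cle, threshold=5.0):
    """Fraction of frames whose centre location error is below `threshold` pixels."""
    return sum(e < threshold for e in cle) / len(cle)

def normalized_cle(pred_center, gt_center, gt_size):
    """Centre error with each axis divided by the ground-truth box size (assumed form)."""
    dx = (pred_center[0] - gt_center[0]) / gt_size[0]
    dy = (pred_center[1] - gt_center[1]) / gt_size[1]
    return (dx * dx + dy * dy) ** 0.5

def success_auc(ious, steps=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)
```

Normalizing the centre error matters for satellite video because a 5-pixel error is negligible for a plane but can exceed the whole width of a car.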

5.6 Selection of state-of-the-art trackers

中文	English
评测 33 种前沿跟踪器、58 个模型，覆盖所有主流范式：	Benchmark 33 SOTA trackers (58 models), covering all mainstream paradigms:
CSK、SAMF、DAT、KCF、SRDCF、Staple、DSST、BACF、SiamRPN、DaSiamRPN、ARCF、SiamRPN++、UpdateNet、SiamDW、SiamMask、SiamBAN、SiamFC++、AutoTrack、CFME、SiamGAT、LightTrack、Stark、SiamCAR、OSTrack、SimTrack、DF、RAMC、SBT、GRM、SeqTrack、ARTrack、ODTrack、SMAT。	CSK, SAMF, DAT, KCF, SRDCF, Staple, DSST, BACF, SiamRPN, DaSiamRPN, ARCF, SiamRPN++, UpdateNet, SiamDW, SiamMask, SiamBAN, SiamFC++, AutoTrack, CFME, SiamGAT, LightTrack, Stark, SiamCAR, OSTrack, SimTrack, DF, RAMC, SBT, GRM, SeqTrack, ARTrack, ODTrack, SMAT.

6 Experiments and analysis

6.1 Quantitative evaluation

中文	English
整体结果：SiamCAR、SiamFC++、SiamDW 性能最优；DSST、DF、RAMC 表现稳健。	Overall: SiamCAR, SiamFC++, SiamDW perform best; DSST, DF, RAMC are robust.
类别结果：飞机最易跟踪（尺寸大、特征明显）；车辆最难（尺寸小、背景复杂）；火车因形变与大长宽比极具挑战。	Category: Plane easiest (large size, clear features); car hardest (small, complex background); train very challenging (deformation, large aspect ratio).
属性结果：完全遮挡（FO）、平面内旋转（IPR）、背景杂波（BC）是最大挑战；DF、CFME、RAMC 在遮挡与旋转下更鲁棒。	Attribute: FO, IPR, BC are most challenging; DF, CFME, RAMC robust to occlusion and rotation.

6.2 Qualitative evaluation

中文	English
可视化结果表明：融合外观与运动特征的跟踪器应对复杂挑战更有效；深度特征跟踪器优于手工特征跟踪器。	Qualitative results show: fusing appearance and motion is effective; deep feature trackers outperform hand-crafted ones.
火车跟踪仍是巨大难题。	Train tracking remains extremely difficult.

6.3 Running speed analysis

中文	English
CPU 上 CSK、KCF、DAT 速度最快（>300 FPS）。	On CPU: CSK, KCF, DAT are fastest (>300 FPS).
GPU 上 SiamFC++、DaSiamRPN 速度领先（>150 FPS）。	On GPU: SiamFC++, DaSiamRPN lead (>150 FPS).
最新 Transformer 跟踪器速度较低，精度–速度权衡仍需研究。	Recent Transformer trackers have lower speed; accuracy-speed trade-off needs research.

7 Suggestions for future work

中文	English
7.1 外观信息与运动线索融合：光流、轨迹信息可提升相似目标、异向运动场景性能。	7.1 Synergy of appearance and motion: Optical flow and trajectory improve performance for similar objects and anisotropic motion.
7.2 密集目标跟踪：提取更判别性特征，分析邻近目标运动状态。	7.2 Dense object: Exploit discriminative features and motion of nearby objects.
7.3 运动估计：消除背景运动，融合多模态数据提升定位精度。	7.3 Motion estimation: Eliminate background motion; fuse multi-modal data for precise localization.
7.4 精准目标表征：输出旋转框、关键点、分割掩码，获取完整目标信息。	7.4 Precise representation: Output OBB, keypoints, masks for complete object information.
7.5 适配主干与特征：设计面向卫星视频的网络与特征，利用遥感数据预训练。	7.5 Suitable backbones and features: Design SV-specific networks; pre-train on remote sensing data.
7.6 视频增强：时空超分、去模糊、画质增强提升跟踪稳定性。	7.6 Video enhancement: Spatio-temporal super-resolution, deblurring improve stability.

8 Conclusion

中文	English
本文系统综述卫星视频单目标跟踪方法与数据集，提出首个卫星视频旋转目标跟踪基准 OOTB。	This paper systematically reviews SV SOT methods and datasets, and proposes the first OBB benchmark OOTB for SV.
OOTB 包含高质量旋转框标注与高精度评测协议，为该领域提供标准化评测平台。	OOTB provides high-quality OBB annotations and a precise protocol for standardized evaluation.
大量实验表明，卫星视频目标跟踪仍极具挑战，未来需在特征、运动建模、旋转估计、轻量化模型等方向深入研究。	Extensive experiments show SV tracking remains challenging; future research needs better features, motion modeling, rotation estimation, and lightweight models.

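IV. Definitions of the 12 attributes in the OOTB dataset

Definitions of the 12 OOTB attributes (ready to insert into a paper)

No.	Abbr.	English name	Chinese name	Definition
1	DEF	Deformation	形变	The object undergoes non-rigid deformation.
2	IPR	In-Plane Rotation	平面内旋转	The object rotates within the image plane.
3	PO	Partial Occlusion	部分遮挡	The object is partially occluded in the satellite video.
4	FO	Full Occlusion	完全遮挡	The object is fully occluded in the satellite video.
5	IV	Illumination Variation	光照变化	The illumination around the object changes significantly.
6	MB	Motion Blur	运动模糊	The object region is blurred by motion of the object or the satellite platform.
7	BC	Background Clutters	背景杂波	The background near the object has texture or color similar to the object.
8	OON	Out-of-Normal	超常长宽比	The aspect ratio of the bounding box falls outside the range [0.3, 3] in the video.
9	SA	Similar Appearance	相似外观	Objects with a similar appearance exist near the tracked object.
10	LT	Less Textures	纹理缺失	The object has very little texture information and is hard to distinguish.
11	IM	Isotropic Motion	同向运动	A nearby object moves with similar magnitude and similar direction.
12	AM	Anisotropic Motion	异向运动	A nearby object moves with similar magnitude but in the opposite direction.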
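
Of the attributes above, OON has a purely numeric definition, so it can be checked automatically. The sketch below assumes the aspect ratio is width divided by height and that a sequence receives the OON label if any frame violates the range. Both points are assumptions, since the table does not spell them out, and the function names are hypothetical.

```python
def is_out_of_normal(w, h, low=0.3, high=3.0):
    """True if the box aspect ratio w / h lies outside [low, high]."""
    ratio = w / h
    return ratio < low or ratio > high

def sequence_has_oon(boxes):
    """True if any (w, h) box in the sequence has an out-of-normal aspect ratio."""
    return any(is_out_of_normal(w, h) for w, h in boxes)
```

With the average aspect ratios reported in 5.3, a train (10.9) would trigger OON, while a typical car or ship (2.0) would not.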

V. Key figures of the paper, with bilingual (Chinese and English) captions ready to insert into a paper

Key figure 1: Sample satellite video frames

Fig. 1 Sample frames from video satellites.

English caption: Sample frames captured by (a) SkySat-1, (b) ISS, (c) Luojia-3–01, (d) OVS-1, and (e) Jilin-1 satellites.
Chinese caption: 不同卫星平台拍摄的卫星视频样例。(a) SkySat-1，(b) ISS，(c) 珞珈三号 - 01，(d) OVS-1，(e) 吉林一号。

Key figure 2: Small objects in satellite video

Fig. 2 Visual examples of SV objects.

English caption: Visualization of small objects in satellite videos: (a) original frame, (b) zoomed car, (c) zoomed plane. Objects are annotated with yellow boxes.
Chinese caption: 卫星视频小目标示例。(a) 原始帧，(b) 放大车辆，(c) 放大飞机。目标使用黄色框标注。

Key figure 3: Typical challenges in satellite video

Fig. 3 Visualization of several abnormal interferences.

English caption: Challenging attributes in satellite videos: (a) similar appearance, (b) partial occlusion, (c) motion blur, (d) background clutters.
Chinese caption: 卫星视频中典型干扰挑战。(a) 相似外观，(b) 部分遮挡，(c) 运动模糊，(d) 背景杂波。

Key figure 4: HBB versus OBB

Fig. 4 Visualization of the HBB (red) and OBB (green).

English caption: Comparison between horizontal bounding box (HBB, red) and oriented bounding box (OBB, green). OBB is more compact and suppresses background interference.
Chinese caption: 水平框（红色）与旋转框（绿色）对比。旋转框更紧凑，能有效抑制背景干扰。

Key figure 5: Object statistics of the OOTB dataset

Fig. 8 Overview of the OOTB dataset.

English caption: Statistics of OOTB: 45 cars, 30 ships, 25 planes, 10 trains, 29,890 frames in total. The blue bar denotes object size; the red line denotes frame count.
Chinese caption: OOTB 数据集统计：包含 45 段车辆、30 段船舶、25 段飞机、10 段火车序列，共 29890 帧。蓝色柱为目标尺寸，红色线为帧数量。

Key figure 6: Aspect ratio distribution

Fig. 9 The aspect ratios of the OOTB dataset.

English caption: Aspect ratio distribution of four object categories. Train has the largest aspect ratio ranging from 4.4 to 14.8.
Chinese caption: OOTB 四类目标长宽比分布。火车具有最大长宽比，范围 4.4–14.8。

Key figure 7: Distribution of the 12 attributes

Fig. 10 Attribute distribution for each type of object.

English caption: Distribution of 12 fine-grained attributes over car, ship, plane, and train sequences in OOTB.
Chinese caption: OOTB 数据集中车辆、船舶、飞机、火车的 12 类细粒度属性分布。

Key figure 8: Precision, normalized precision, and success plots

Fig. 14 Overall results for the top 30 trackers on OOTB.

English caption: (a) Precision plot, (b) normalized precision plot, (c) success plot of top 30 trackers on OOTB. Values in legend represent precision at 5px, normalized precision, and success AUC.
Chinese caption: OOTB 上排名前 30 跟踪器的 (a) 精度图、(b) 归一化精度图、(c) 成功率图。图例数值分别表示 5 像素精度、归一化精度、成功率 AUC。

Key figure 9: Per-category plots (car, ship, plane, train)

Fig. 15 Category-based results.

English caption: Precision, normalized precision, and success plots for car, ship, plane, and train categories.
Chinese caption: 车辆、船舶、飞机、火车四类目标的精度图、归一化精度图与成功率图。

Key figure 10: Success plots for the 12 attributes

Fig. 16 Success plots under 12 attributes.

English caption: Success plots of top trackers under 12 challenging attributes: DEF, IPR, PO, FO, IV, MB, BC, OON, SA, LT, IM, AM.
Chinese caption: 主流跟踪器在 12 类挑战属性下的成功率曲线：形变、平面内旋转、部分遮挡、完全遮挡、光照变化、运动模糊、背景杂波、超常长宽比、相似外观、纹理缺失、同向运动、异向运动。

Figure description template (general-purpose paragraph for papers)

English

Fig. X illustrates the qualitative tracking results on OOTB. Ground truth is marked in green, and predictions are marked in other colors. Better tracking performance corresponds to higher overlap and more accurate localization.