Paper Reading Notes: A Survey on Graph Neural Networks and Graph Transformers in Computer Vision (GNN Survey)
The survey mainly introduces GNNs and their applications across a range of domains.
2D NATURAL IMAGES
Image Classification
Multi-Label Classification
ML-GCN: builds a directed graph over the label space, where each node stands for an object label (represented by its word embedding) and the connections model the inter-dependencies among labels.
attention-driven GCN: models the label dependencies via a more elaborate GNN architecture.
hypergraph neural networks: model the label dependencies via a more elaborate GNN architecture.
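
The label-graph idea behind ML-GCN can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the co-occurrence threshold `tau`, the function names `build_label_adjacency` and `gcn_layer`, the use of node outputs as per-label classifiers, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def build_label_adjacency(labels, tau):
    """Directed label graph from a binary (num_images, num_labels) matrix.

    adj[i, j] = 1 when the conditional frequency P(label j | label i)
    reaches the (assumed) threshold tau.
    """
    labels = np.asarray(labels, dtype=float)
    cooc = labels.T @ labels                    # pairwise co-occurrence counts
    counts = np.maximum(np.diag(cooc), 1.0)     # how often each label occurs
    cond = cooc / counts[:, None]               # cond[i, j] = P(j | i)
    adj = (cond >= tau).astype(float)
    np.fill_diagonal(adj, 1.0)                  # keep self-loops
    return adj

def gcn_layer(adj, feats, weight):
    """One propagation step: row-normalise, aggregate neighbours, project, ReLU."""
    norm = adj / adj.sum(axis=1, keepdims=True)
    return np.maximum(norm @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
# Toy annotations: 4 images, 3 labels.
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
adj = build_label_adjacency(labels, tau=0.7)

word_emb = rng.normal(size=(3, 8))      # stand-in for label word embeddings
weight = rng.normal(size=(8, 16))       # stand-in for a learned GCN weight
classifiers = gcn_layer(adj, word_emb, weight)   # one 16-d vector per label

image_feat = rng.normal(size=(16,))     # stand-in for a CNN image feature
scores = classifiers @ image_feat       # one score per label
```

In this toy data label 1 always appears together with label 0, but not the other way round, so the resulting graph has an edge from label 1 to label 0 only, which is why the graph is directed.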
Few-Shot Learning
Paper | Venue | Main idea |
---|---|---|
Few-shot learning with graph neural networks | ICLR,2018 | formulate FSL as a supervised interpolation problem on a densely-connected graph, where the vertices stand for images in the collection and the adjacency is learnable with trainable similarity kernels. |
Learning to propagate labels: Transductive propagation network for few-shot learning | ICLR,2019 | constructs graphs on top of the embedding space to fully exploit the manifold structure of the novel classes. Label information is propagated from the support set to the query set based on the constructed graphs. |
Edge-labeling graph neural network for few-shot learning | CVPR,2019 | propose an edge-labeling GNN framework that learns to predict edge labels, explicitly constraining the intra- and inter-class similarities. |
Learning from the past: Continual meta-learning via bayesian graph modeling | AAAI,2020 | formulate meta-learning-based FSL as continual learning of a sequence of tasks and resort to Bayesian GNN to capture the intra- and inter-task correlations. |
Dpgn: Distribution propagation graph network for few-shot learning | CVPR,2020 | devise a dual complete graph network to model both distribution- and instance-level relations. |
Hierarchical graph neural networks for few-shot learning | TCSVT,2021 | exploit the hierarchical relationships among graph nodes via the bottom-up and top-down reasoning modules. |
Hybrid graph neural networks for few-shot learning | AAAI,2022 | introduce an instance GNN and a prototype GNN as feature embedding task adaptation modules for quickly adapting learned features to new tasks. |
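
The support-to-query label propagation described in the table can be illustrated with the classic closed-form solution F = (I - αS)^{-1} Y on a similarity graph. The sketch below is a simplified stand-in, not the published model: it assumes a fixed Gaussian kernel width `sigma` (rather than a learned one), a dense graph, and a hypothetical function name `propagate_labels`.

```python
import numpy as np

def propagate_labels(embeddings, support_labels, num_classes, sigma=1.0, alpha=0.5):
    """Closed-form label propagation over a Gaussian similarity graph.

    The first len(support_labels) rows of `embeddings` are the labelled
    support examples; the remaining rows are unlabelled queries.
    """
    x = np.asarray(embeddings, dtype=float)
    n = x.shape[0]
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))   # pairwise similarities
    np.fill_diagonal(w, 0.0)                    # no self-similarity
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]   # symmetric normalisation

    y = np.zeros((n, num_classes))              # one-hot labels, zeros for queries
    y[np.arange(len(support_labels)), support_labels] = 1.0

    f = np.linalg.solve(np.eye(n) - alpha * s, y)       # F = (I - alpha S)^-1 Y
    return f[len(support_labels):].argmax(axis=1)

support = [[0.0, 0.0], [5.0, 5.0]]              # one example per class
queries = [[0.2, -0.1], [4.8, 5.3], [0.1, 0.3]]
pred = propagate_labels(support + queries, [0, 1], num_classes=2)
print(pred)
```

Because the queries are classified jointly, each query also benefits from the other queries lying on the same manifold, which is the transductive aspect of this family of methods.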
Zero-Shot Learning (ZSL)
Paper | Venue | Main idea |
---|---|---|
Rethinking knowledge graph propagation for zero-shot learning | CVPR,2019 | propose a Dense Graph Propagation (DGP) module to exploit the hierarchical structure of the knowledge graph. It consists of two phases that iteratively propagate knowledge between a node and its ancestors and descendants. |
Region graph embedding network for zero-shot learning | ECCV,2020 | represent each input image as a region graph, where each node stands for an attended region in the image and the edges are appearance similarities among these region nodes. |
Attribute propagation network for graph zero-shot learning | AAAI,2020 | generates and updates attribute vectors with an attribute propagation network for optimizing the attribute space |
Isometric propagation network for generalized zero-shot learning | ICLR,2021 | introduce the visual and semantic prototype propagation on auto-generated graphs to enhance the inter-class relations and align the corresponding classwise dependencies in visual and semantic space |
Learning graph embeddings for open world compositional zero-shot learning | TPAMI, 2022 | introduce a Compositional Cosine Graph Embedding (Co-CGE) model to learn the relationship between primitives and compositions through a GCN. They quantitatively measure the feasibility scores of a state-object composition and incorporate the computed scores into Co-CGE in two ways. |
Gndan: Graph navigated dual attention network for zero-shot learning | IEEE TNNLS, 2022 | resort to GAT for exploiting the appearance relations between local regions and the cooperation between local and global features. |
Transfer Learning
Paper | Venue | Main idea |
---|---|---|
Gcan: Graph convolutional adversarial network for unsupervised domain adaptation | CVPR,2019 | propose a Graph Convolutional Adversarial Network (GCAN) for DA, where a GCN is developed on top of densely-connected instance graphs to encode data structure information. |
Heterogeneous graph attention network for unsupervised multiple-target domain adaptation | IEEE TPAMI, 2020 | build a heterogeneous relation graph and introduce GAT to propagate the semantic information and generate reliable pseudo-labels. |
Curriculum graph co-teaching for multi-target domain adaptation | CVPR,2021 | introduce a GCN to aggregate information from different domains along with a co-teaching and curriculum learning strategy to achieve progressive adaptation. |
Progressive graph learning for open-set domain adaptation | ICML,2020 | study the problem of open-set DA via a progressive graph learning framework to select pseudo-labels and thus avoid the negative transfer. |
Prototype-matching graph network for heterogeneous domain adaptation | ACM MM,2020 | attain cross-domain prototype alignment based on features learned from different stages of GNNs. |
Learning to combine: Knowledge aggregation for multi-source domain adaptation | ECCV,2020 | introduce a knowledge graph based on the prototypes of different domains to perform information propagation among semantically adjacent representations. |
Compound domain generalization via meta-knowledge encoding | CVPR,2022 | build global prototypical relation graphs and introduce a graph self-attention mechanism |
Current focus
Current work focuses on extracting ad hoc knowledge graphs from the data for a specific task, which is heuristic and relies on human priors.
Future directions
(1) develop general and automatic graph construction procedures,
(2) enhance the interactions between abstract graph structures and task-specific classifiers,
(3) excavate more fine-grained building blocks (nodes and edges) to increase the capability of the constructed graphs.
Object Detection
Paper | Venue | Main idea |
---|---|---|
Reasoning-rcnn: Unifying adaptive global reasoning into large-scale object detection | CVPR,2019 | presents an adaptive global reasoning network for large-scale object detection by incorporating commonsense knowledge (category-wise knowledge graph) and propagating visual information globally |
Spatial-aware graph relation network for large-scale object detection | CVPR,2019 | adaptively discover semantic and spatial relationships without requiring prior handcrafted linguistic knowledge |
Relation networks for object detection | CVPR,2018 | introduces an adapted attention module to detection head networks, explicitly learning information between objects through encoding the long-range dependencies. |
Relationnet++: Bridging visual representations for object detection via transformer decoder | NeurIPS,2020 | presents a self-attention-based decoder module to embrace the strengths of different object/part representations within a single detection framework. |
Gar: Graph assisted reasoning for object detection | WACV,2020 | introduce a heterogeneous graph to jointly model object-object and object-scene relations. |
Graphfpn: Graph feature pyramid network for object detection | ICCV,2021 | propose a graph feature pyramid network (GraphFPN), which explores the contextual and hierarchical structures of an input image based on a superpixel hierarchy |
Relation matters: Foreground-aware graph-based relational reasoning for domain adaptive object detection | IEEE TPAMI,2022 | first builds intra- and inter-domain relation graphs by virtue of cyclic between-domain consistency, without any prior knowledge about the target distribution. |
Sigma: Semantic-complete graph matching for domain adaptive object detection | CVPR,2022 | formulates DAOD as a graph matching problem by establishing cross-image graphs to model class-conditional distributions on both domains. |
Semantic relation reasoning for shot-stable few-shot object detection | CVPR,2021 | introduces a semantic relation reasoning module to integrate semantic information between base and novel classes for novel object detection. |
Note: DAOD stands for domain adaptive object detection.
Current focus
Current work exploits between-object, cross-scale, or cross-domain relationships, as well as relationships between base and novel classes.
Future directions
(1) design better region-to-node feature mapping methods,
(2) incorporate Transformer (or pure GNN) encoders to improve the expressive power of the initial node features,
(3) directly perform reasoning in the original feature space to better preserve the intrinsic structure of images.
Image Segmentation
General segmentation
Paper | Venue | Main idea |
---|---|---|
Dual graph convolutional network for semantic segmentation | BMVC,2019 | models the global context of input features via a dual GCN framework, where a coordinate-space GCN models spatial relationships between pixels in the image and a feature-space GCN models dependencies along the channel dimensions of the network's feature map. |
Graph-based global reasoning networks | CVPR,2019 | design the global reasoning unit by projecting features that are globally aggregated in coordinate space to the node domain and performing relational reasoning in a fully-connected graph. |
Dynamic graph message passing networks | CVPR,2020 | dynamically samples the neighborhood of a node and then predicts the node dependencies, filter weights, and affinity matrix to attain information propagation |
Representative graph neural network | ECCV,2020 | propose to dynamically sample some representative nodes for relational modeling. |
Spatial pyramid based graph reasoning for semantic segmentation | CVPR,2020 | propose an improved Laplacian formulation that enables graph reasoning in the original feature space, fully exploiting the contextual relations at different feature scales. |
Class-wise dynamic graph convolution for semantic segmentation | ECCV,2020 | introduce a class-wise dynamic graph convolution module to conduct graph reasoning over the pixels that belong to the same class. |
Bidirectional graph reasoning network for panoptic segmentation | CVPR,2020 | design a bidirectional graph reasoning network to bridge the things branch and the stuff branch for panoptic segmentation. |
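
The "project to nodes, reason on a graph, project back" pattern that several of the methods above share can be sketched as follows. This is only a schematic NumPy version under stated assumptions: the projection `w_proj`, the node-update weight `w_node`, and the uniform fully-connected adjacency are placeholders for parameters that would be learned, and `global_reasoning` is a hypothetical name rather than any paper's API.

```python
import numpy as np

def global_reasoning(feat, w_proj, adj, w_node):
    """Schematic global reasoning step on a (C, H, W) feature map.

    w_proj: (N, C) pixel-to-node projection, adj: (N, N) node graph,
    w_node: (C, C) node state update. All are assumed to be learned elsewhere.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                  # (C, L): pixels in coordinate space
    assign = w_proj @ x                         # (N, L): soft pixel-to-node weights
    nodes = assign @ x.T                        # (N, C): globally aggregated node features
    nodes = np.maximum(adj @ nodes @ w_node, 0.0)   # relational reasoning over nodes
    back = nodes.T @ assign                     # (C, L): re-project nodes onto pixels
    return feat + back.reshape(c, h, w)         # residual connection

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 3, 3))               # toy feature map: 4 channels, 3x3 pixels
w_proj = rng.normal(size=(2, 4))                # 2 graph nodes
adj = np.ones((2, 2)) / 2.0                     # fully-connected, uniformly weighted
out = global_reasoning(feat, w_proj, adj, rng.normal(size=(4, 4)))

# With a zero node update the residual path leaves the input untouched.
unchanged = global_reasoning(feat, w_proj, adj, np.zeros((4, 4)))
```

Reasoning over a handful of nodes instead of all pixel pairs keeps the cost of modeling global context low, which is the main appeal of this design.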
One-Shot Semantic Segmentation
Paper | Venue | Main idea |
---|---|---|
Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation | ICCV,2019 | introduce a pyramid graph attention module to model the connection between query and support feature maps. |
Few-Shot Semantic Segmentation
Paper | Venue | Main idea |
---|---|---|
Scale-aware graph neural network for few-shot semantic segmentation | CVPR,2021 | propose a scale-aware GNN to perform cross-scale relational reasoning among support-query images. A self-node collaboration mechanism is introduced to perceive different resolutions of the same object. |
Weakly Supervised Semantic Segmentation
Paper | Venue | Main idea |
---|---|---|
Affinity attention graph neural network for weakly supervised semantic segmentation | IEEE TPAMI,2021 | an image is first converted to a weighted graph via an affinity CNN, and then an affinity attention layer is devised to obtain long-range interactions from the constructed graph and propagate semantic information to the unlabeled pixels. |
Current focus
Current work explores contextual information at the local or global level with pyramid pooling, dilated convolutions, or the self-attention mechanism.
Scene Graph Generation (SGG)
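Task overview: SGG detects pairs of objects in an image, together with their relationships, to generate a visual scene graph. It provides a high-level understanding of the visual scene rather than treating individual objects in isolation.
Paper | Venue | Main idea |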
---|---|---|
Factorizable net: an efficient subgraph-based framework for scene graph generation | ECCV,2018 | a subgraph-based approach (each subgraph is regarded as a node) with a spatially weighted message passing structure that refines the features of objects and subgroups by passing messages among them with attention-like schemes. |
Graph r-cnn for scene graph generation | ECCV,2018 | first obtain a sparse candidate graph by pruning the densely-connected graph generated from RPN via a relation proposal network, then an attentional GCN is introduced to aggregate contextual information and update node features and edge relationships |
Attentive relational networks for mapping images to scene graphs | CVPR,2019 | propose attentive relational networks, which first transform label word embeddings and visual features into a shared semantic space, and then rely on GAT to perform feature aggregation for final relation inference |
Bipartite graph network with adaptive message passing for unbiased scene graph generation | CVPR,2021 | introduce bipartite GNN to estimate and propagate relation confidence in a multi-stage manner. |
Energy-based learning for scene graph generation | CVPR,2021 | propose an energy-based framework, which depends on a graph message passing algorithm for computing the energy of configurations. |
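
To make the output of the SGG task concrete, a scene graph can be represented as object nodes plus (subject, predicate, object) relations. The container below is a purely illustrative sketch: the class `SceneGraph`, its methods, and the example labels are hypothetical and do not come from any of the papers listed.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Toy scene graph: object labels as nodes, labelled directed edges as relations."""
    objects: list = field(default_factory=list)     # node labels
    relations: list = field(default_factory=list)   # (subject_idx, predicate, object_idx)

    def add_object(self, label):
        """Add a detected object and return its node index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subject_idx, predicate, object_idx):
        """Add a directed, labelled edge between two existing nodes."""
        self.relations.append((subject_idx, predicate, object_idx))

    def triples(self):
        """Return relations as human-readable (subject, predicate, object) triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]

graph = SceneGraph()
man = graph.add_object("man")
horse = graph.add_object("horse")
grass = graph.add_object("grass")
graph.add_relation(man, "riding", horse)
graph.add_relation(horse, "on", grass)
print(graph.triples())
```

An SGG model would fill such a structure from detector outputs; the GNN-based methods above differ mainly in how messages are passed between the object and relation nodes before the predicates are predicted.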
VIDEO UNDERSTANDING
Video Action Recognition
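Task introduction: human action recognition in video is one of the fundamental tasks of video processing and understanding. Its goal is to recognize and classify human actions in RGB/depth videos or skeleton data.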
Action Recognition
Paper | Venue | Main idea |
---|---|---|
| | | propose to capture the long-range temporal contexts via graph-based reasoning over human-object and object-object relationships. |
| | | construct an actor-centric object-level graph and apply GCNs to capture the contexts among objects in an actor-centric way. A relation-level graph is built to infer the contexts among relation nodes. |
| | | propose multi-scale reasoning in the temporal graph of a video, in which each node is a frame in the video and the pairwise relations between nodes are represented as a learnable adjacency matrix. |
| | | extend GCN-based relation modeling to zero-shot action recognition and leverage knowledge graphs to jointly model the relations among actions and attributes. |
| | | introduce a graph-based high-order relation modeling method for long-term action recognition. |
Skeleton-Based Action Recognition
Paper | Venue | Main idea |
---|---|---|
| | | propose the ST-GCN network, which first connects joints in a frame according to the natural connectivity of the human body and then connects the same joints in two consecutive frames to maintain temporal information. |
| | | introduce a fully-connected graph with learnable edge weights between joints and a data-dependent graph learned from the input skeleton. |
| | | connect physically-apart skeleton joints to capture the patterns of collaboratively moving joints. |
| | | improve the joints' connections in a single frame by adding edges between limbs and head. It uses GCNs to capture joints' relations in single frames and adopts an LSTM to capture the temporal dynamics. |
| | | maintain edge features and learn both node and edge feature representations via directed graph convolution. |
| | | first construct multiple dilated windows over the temporal dimension, then separately apply GCNs on multiple graphs with different scales, and finally aggregate the results of the GCNs on all the graphs in multiple windows to capture multi-scale and long-range dependencies. |
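
The spatio-temporal skeleton graph described in the first row (joints linked by body connectivity within a frame, and each joint linked to itself in the next frame) can be built as a plain adjacency matrix. The sketch below is illustrative only: the function name `spatio_temporal_adjacency` and the three-joint toy skeleton are assumptions, and real skeleton datasets define their own joint lists and bone pairs.

```python
import numpy as np

def spatio_temporal_adjacency(num_joints, bones, num_frames):
    """Adjacency matrix over (frame, joint) nodes.

    Spatial edges follow `bones` inside every frame; temporal edges link
    the same joint in two consecutive frames.
    """
    n = num_joints * num_frames
    adj = np.zeros((n, n), dtype=int)

    def node(frame, joint):
        return frame * num_joints + joint

    for t in range(num_frames):
        for a, b in bones:                          # spatial edges within frame t
            adj[node(t, a), node(t, b)] = 1
            adj[node(t, b), node(t, a)] = 1
        if t + 1 < num_frames:
            for j in range(num_joints):             # temporal edges t -> t + 1
                adj[node(t, j), node(t + 1, j)] = 1
                adj[node(t + 1, j), node(t, j)] = 1
    return adj

# Toy skeleton: a 3-joint chain (0 - 1 - 2) observed over 2 frames.
bones = [(0, 1), (1, 2)]
adj = spatio_temporal_adjacency(num_joints=3, bones=bones, num_frames=2)
print(adj)
```

A graph convolution over this matrix mixes information both across the body and across time, while the later rows in the table replace or augment these fixed edges with learned, long-range, or multi-scale ones.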
Temporal Action Localization