1 Title 

        VideoGPT: Video Generation using VQ-VAE and Transformers (Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas)

2 Conclusion

        This paper presents VideoGPT, a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings.

3 Good Sentences

        1. High-fidelity natural videos is one notable modality that has not seen the same level of progress in generative modeling as compared to images, audio, and text. This is reasonable since the complexity of natural videos requires modeling correlations across both space and time with much higher input dimensions. Video modeling is therefore a natural next challenge for current deep generative models. (The significance of this work)
        2. The above line of reasoning leads us to our proposed model: VideoGPT, a simple video generation architecture that is a minimal adaptation of VQ-VAE and GPT architectures for videos. (The reason for choosing VideoGPT)
        3. Although the VQ-VAE is trained unconditionally, we can generate conditional samples by training a conditional prior. We use two types of conditioning: Cross Attention and Conditional Norms. (How to turn unconditional generation into conditional generation)


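Background

        VQ-VAE

        VQ-VAE uses a codebook mechanism to encode an image into discrete latent codes: each encoder output vector is replaced by its nearest entry in a learned codebook.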
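
        The codebook lookup can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `quantize`, the toy 2-D codebook, and the example vectors are all assumptions made up for illustration, and the training-time parts of VQ-VAE (codebook loss, commitment loss, straight-through gradient) are left out.

```python
import numpy as np

def quantize(z_e, codebook):
    """Nearest-neighbor codebook lookup (illustrative helper, not from the paper's code).

    z_e:      (N, D) array of continuous encoder outputs
    codebook: (K, D) array of learned embedding vectors
    Returns the (N,) integer code indices and the (N, D) quantized vectors.
    """
    # Squared Euclidean distance between every encoder vector and every code: (N, K)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # discrete latent codes
    z_q = codebook[indices]          # quantized vectors passed on to the decoder
    return indices, z_q

# Toy values chosen only for this example
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
indices, z_q = quantize(z_e, codebook)
```

        For video, the same lookup would presumably be applied at every position of the downsampled spatio-temporal latent grid, so a clip becomes a 3D grid of integer codes.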

Method

        

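        As the paper's overview figure shows, the overall training procedure has two parts: training the VQ-VAE (left) and training an autoregressive Transformer in the latent space (right).
        The first stage is similar to the original VQ-VAE training procedure.
        In the second stage, the VQ-VAE encodes the video data into latent sequences, which serve as training data for the prior model. At generation time, a latent sequence is first sampled from the prior and then decoded into a video sample by the VQ-VAE decoder. (Conditioning is introduced in the Transformer prior, using either cross attention or Conditional Norms.)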
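
        The generation procedure (sample latent codes from the prior, then decode them) can be sketched as follows. This is a toy illustration under loud assumptions: `toy_prior` is a hypothetical stand-in for the trained Transformer, `toy_decode` is a plain codebook lookup standing in for the VQ-VAE decoder, and the latent grid size and vocabulary size are arbitrary.

```python
import numpy as np

def sample_latents(prior_logits_fn, seq_len, rng):
    """Autoregressively sample a sequence of discrete codes, one position at a time."""
    codes = []
    for _ in range(seq_len):
        logits = prior_logits_fn(codes)          # prior conditions on codes sampled so far
        probs = np.exp(logits - logits.max())    # numerically stable softmax
        probs /= probs.sum()
        codes.append(int(rng.choice(len(probs), p=probs)))
    return np.array(codes)

def toy_prior(prefix, vocab_size=4):
    """Hypothetical stand-in for the Transformer prior: favors (last code + 1) mod vocab_size."""
    logits = np.zeros(vocab_size)
    if prefix:
        logits[(prefix[-1] + 1) % vocab_size] = 2.0
    return logits

def toy_decode(codes, codebook, latent_shape):
    """Stand-in for the VQ-VAE decoder: look up embeddings and restore the (T, H, W) grid."""
    return codebook[codes].reshape(*latent_shape, codebook.shape[1])

rng = np.random.default_rng(0)
latent_shape = (2, 2, 2)                          # assumed (T, H, W) latent grid
codes = sample_latents(toy_prior, int(np.prod(latent_shape)), rng)
codebook = rng.normal(size=(4, 3))                # assumed 4 codes of dimension 3
latents = toy_decode(codes, codebook, latent_shape)
```

        In the real model the decoder would upsample the embedded latent grid back to pixel space with a learned network; here the lookup only shows how the flat code sequence maps back onto the spatio-temporal grid.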
