1. A Transformer Network processes sentences from left to right, one word at a time.
  • False
  • True
  2. Transformer Network methodology is taken from:
  • GRUs and LSTMs
  • Attention Mechanism and RNN style of processing.
  • Attention Mechanism and CNN style of processing.
  • RNN and LSTMs
  3. **What are the key inputs to computing the attention value for each word?**
  • The key inputs to computing the attention value for each word are called the query, knowledge, and vector.
  • The key inputs to computing the attention value for each word are called the query, key, and value.
  • The key inputs to computing the attention value for each word are called the quotation, key, and vector.
  • The key inputs to computing the attention value for each word are called the quotation, knowledge, and value.

Explanation: The key inputs to computing the attention value for each word are called the query, key, and value.

  4. Which of the following correctly represents Attention?
  • $Attention(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
  • $Attention(Q,K,V)=\mathrm{softmax}\left(\frac{QV^{T}}{\sqrt{d_k}}\right)K$
  • $Attention(Q,K,V)=\mathrm{min}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
  • $Attention(Q,K,V)=\mathrm{min}\left(\frac{QV^{T}}{\sqrt{d_k}}\right)K$
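The first option above is the scaled dot-product attention from “Attention Is All You Need”. A minimal NumPy sketch (illustrative only, not part of the quiz):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_q, seq_k) similarity scores
    return softmax(scores) @ V       # attention-weighted sum of the values
```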
  5. Are the following statements true regarding Query (Q), Key (K), and Value (V)?
    Q = interesting questions about the words in a sentence
    K = specific representations of words given a Q
    V = qualities of words given a Q
  • False
  • True

Explanation: Q = interesting questions about the words in a sentence, K = qualities of words given a Q, V = specific representations of words given a Q.
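In implementations, Q, K, and V are typically obtained by multiplying the same word embeddings by three learned projection matrices. A hypothetical NumPy sketch (the sizes and random weights are illustrative stand-ins for trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 8          # illustrative sizes

X = rng.normal(size=(seq_len, d_model))  # embeddings for one sentence

# Learned projection matrices (random stand-ins here).
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
print(Q.shape, K.shape, V.shape)         # one query/key/value row per word
```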

  6. [figure: multi-head attention]
     $i$ here represents the computed attention weight matrix associated with the $i^{th}$ “word” in a sentence.

  • False
  • True

Explanation: $i$ here represents the computed attention weight matrix associated with the $i^{th}$ “head” (sequence).
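A minimal multi-head sketch (the head count, sizes, and random weights are illustrative assumptions), showing that each head $i$ computes its own attention weight matrix:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads=2, d_k=4, seed=0):
    # Each head i applies its own projections W_Q, W_K, W_V and computes
    # its own attention weight matrix; head outputs are concatenated
    # (the final output projection W_O is omitted for brevity).
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    heads = []
    for i in range(num_heads):
        W_Q, W_K, W_V = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V
        A_i = softmax(Q @ K.T / np.sqrt(d_k))  # attention matrix of head i
        heads.append(A_i @ V)
    return np.concatenate(heads, axis=-1)
```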

  7. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    [figure: Transformer Network architecture]
    What is generated from the output of the Decoder’s first block of Multi-Head Attention?
  • Q
  • K
  • V

Explanation: This first block’s output is used to generate the Q matrix for the next Multi-Head Attention block.
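Schematically, the decoder’s second (cross-)attention block takes its Q from the first block’s output, while K and V come from the encoder output. A simplified sketch (the learned projections are omitted as a deliberate simplification):

```python
import numpy as np

def cross_attention(decoder_block1_out, encoder_out):
    # Q comes from the decoder's first Multi-Head Attention block;
    # K and V come from the encoder's output (learned projections omitted).
    Q = decoder_block1_out
    K = V = encoder_out
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```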

  8. The following is the architecture within a Transformer Network (without displaying positional encoding and output layer(s)).
    [figure: Transformer Network architecture]
    What is the output layer(s) of the Decoder? (Marked $Y$, pointed to by the independent arrow)
  • Softmax layer
  • Linear layer
  • Linear layer followed by a softmax layer.
  • Softmax layer followed by a linear layer.
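In the original Transformer, the decoder’s output head is a linear layer followed by a softmax over the vocabulary. A minimal sketch (the vocabulary size and weights are illustrative):

```python
import numpy as np

def decoder_output_head(decoder_out, W_vocab, b_vocab):
    # Linear layer projects to vocabulary size; softmax turns the
    # logits into a probability distribution over the next token.
    logits = decoder_out @ W_vocab + b_vocab
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

d_model, vocab_size = 8, 100   # illustrative sizes
rng = np.random.default_rng(0)
probs = decoder_output_head(rng.normal(size=(5, d_model)),
                            rng.normal(size=(d_model, vocab_size)),
                            np.zeros(vocab_size))
print(probs.sum(axis=-1))      # each row sums to 1
```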
  9. Which of the following statements are true about positional encoding? Select all that apply.
  • Positional encoding is important because position and word order are essential in sentence construction of any language.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding uses a combination of sine and cosine equations.

Explanation: This is a correct answer, but other options are also correct. To review the concept, watch the lecture Transformer Network.

  • Positional encoding is used in the transformer network and the attention model.
  • Positional encoding provides extra information to our model.
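The sine/cosine combination mentioned above is the sinusoidal encoding from the original Transformer paper. A minimal sketch (assumes an even d_model):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe  # added to the word embeddings before the first layer
```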
  10. Which of these is a good criterion for a good positional encoding algorithm?
  • The algorithm should be able to generalize to longer sentences.
  • Distance between any two time-steps should be inconsistent for all sentence lengths.
  • It must be nondeterministic.
  • It should output a common encoding for each time-step (word’s position in a sentence).