While running a deep learning experiment, I called the following on my model:

model.get_input_embeddings()

and got the error: NotImplementedError

How to approach the error:
If experience doesn't immediately reveal the cause and a quick search turns up no useful guidance, the next step is to dig into the code at the location where the error is raised.

The code at the error location:

def get_input_embeddings(self) -> nn.Module:
    """
    Returns the model's input embeddings.

    Returns:
        `nn.Module`: A torch module mapping vocabulary to hidden states.
    """
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    else:
        raise NotImplementedError

So the error occurs because base_model, obtained via getattr(self, self.base_model_prefix, self), is the model itself: the default get_input_embeddings only delegates when the model exposes a base model that is a different object from self. My guess is that, because the target model is a pairwise model I built myself on top of BERT rather than one of the official model classes, the function cannot locate the target model's base model (BERT) through base_model_prefix, and getattr falls back to returning self.
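A quick sanity check confirms this. The sketch below assumes model is an instance of the custom class shown later in this post, which subclasses BertPreTrainedModel (whose base_model_prefix is "bert"):

# Diagnostic sketch: why getattr falls back to self for the custom model.
print(model.base_model_prefix)                      # "bert" for BertPreTrainedModel subclasses
print(hasattr(model, model.base_model_prefix))      # False: the custom class has no attribute named "bert"
base_model = getattr(model, model.base_model_prefix, model)
print(base_model is model)                          # True, so the else branch raises NotImplementedError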

In short, NotImplementedError is raised because base_model and self end up being the same object. The first idea is to edit get_input_embeddings itself, but that function belongs to a third-party library, and patching library code directly is risky. Still, my first attempt was to change it to:

if base_model is not self:
    return base_model.get_input_embeddings()
else:
    return base_model.get_input_embeddings()
    raise NotImplementedError  # unreachable: the return above always executes first

But the error persisted, so editing third-party library code directly is not the best approach, even though it may occasionally work.
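The post does not record exactly which error the patched version produced, but one plausible failure mode (my inference, not something stated above) is infinite recursion: when base_model is self, the call in the else branch simply re-enters the same method.

# Sketch of the suspected failure mode of the patch above (an assumption, not taken from the post):
# when base_model resolves to self, delegating in the else branch calls this very method again,
# so Python eventually raises RecursionError.
class Broken:
    base_model_prefix = "bert"   # no attribute with this name exists on the instance

    def get_input_embeddings(self):
        base_model = getattr(self, self.base_model_prefix, self)   # resolves to self
        return base_model.get_input_embeddings()                   # re-enters this method

Broken().get_input_embeddings()   # RecursionError: maximum recursion depth exceeded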

When you hit a dead end like this, widen the search: start from the error location and trace backwards through the call chain, looking for anything that could influence the error.
For instance, returning base_model.get_input_embeddings() in get_input_embeddings regardless of whether the two objects are the same was meant to avoid the error, but that route doesn't work. So instead we can try to make base_model actually differ from self, or redefine get_input_embeddings for the target model (a sketch of the first alternative follows below).
Inspecting the structure of self (the target model) shows that its embeddings are BERT's embeddings, so the question becomes: can we expose BERT's embeddings without modifying the third-party code?
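For completeness, here is a sketch of the first alternative (making base_model differ from self). This is my own assumption rather than the route taken in this post: pointing base_model_prefix at the inner submodule lets the inherited get_input_embeddings delegate to it. Note that base_model_prefix also influences how from_pretrained maps checkpoint keys, which is one reason to prefer the explicit override shown later.

# Hypothetical variant (not the fix used below): make getattr(self, base_model_prefix, self)
# resolve to the inner pretrained model instead of falling back to self.
from transformers import AutoModel, BertPreTrainedModel

class PairwisePrefixVariant(BertPreTrainedModel):
    base_model_prefix = "miniLM"   # matches the attribute name defined in __init__

    def __init__(self, config):
        super().__init__(config)
        self.miniLM = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")

# With this class, the inherited get_input_embeddings() finds self.miniLM and
# delegates to it, so NotImplementedError is no longer raised.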
So I moved my analysis one layer out and looked at the code of the target model itself:

import torch.nn as nn
from transformers import AutoModel, BertPreTrainedModel
# label_smoothing is assumed to be a local module providing LabelSmoothingCrossEntropy

class MinitForPairwiseLearning(BertPreTrainedModel):
    def __init__(self, config, loss_function="label-smoothing-cross-entropy", smoothing=0.1):
        super().__init__(config)
        print("config:", config)

        # There should be at least relevant and non-relevant options.
        self.num_labels = config.num_labels + 1
        self.miniLM = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
        # print("mini_model:", self.miniLM)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)  # 0.1
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)  # hidden_size = 384

        if loss_function == "cross-entropy":
            # equivalent to the deprecated size_average=False, reduce=True combination
            self.loss_fct = nn.CrossEntropyLoss(reduction="sum")
        elif loss_function == "label-smoothing-cross-entropy":
            self.loss_fct = label_smoothing.LabelSmoothingCrossEntropy(smoothing)

        self.init_weights()

    def forward(
        self,
        input_ids_pos=None,
        attention_mask_pos=None,
        token_type_ids_pos=None,
        inputs_embeds_pos=None,
        input_ids_neg=None,
        attention_mask_neg=None,
        token_type_ids_neg=None,
        inputs_embeds_neg=None,
        labels=None
    ):
        # forward pass for positive instances
        outputs_pos = self.miniLM(
            input_ids=input_ids_pos,
            attention_mask=attention_mask_pos,
            token_type_ids=token_type_ids_pos,
            inputs_embeds=inputs_embeds_pos
        )
        # print("labels:", labels)
        pooled_output_pos = outputs_pos[1]
        pooled_output_pos = self.dropout(pooled_output_pos)
        # print(pooled_output_pos.shape)
        logits_pos = self.classifier(pooled_output_pos)
        # print("pos_logit:", logits_pos)  # 2-dim scores over {relevant, not relevant}

        # forward pass for negative instances
        outputs_neg = self.miniLM(
            input_ids=input_ids_neg,
            attention_mask=attention_mask_neg,
            token_type_ids=token_type_ids_neg,
            inputs_embeds=inputs_embeds_neg
        )
        pooled_output_neg = outputs_neg[1]
        pooled_output_neg = self.dropout(pooled_output_neg)
        logits_neg = self.classifier(pooled_output_neg)
        # print("neg_logit:", logits_neg)  # 2-dim scores over {relevant, not relevant}

        logits_diff = logits_pos - logits_neg  # design choice open to debate: concatenate or subtract?

        # Calculating cross-entropy loss for pairs <q, d1, d2>
        # based on "Learning to Rank using Gradient Descent", ICML 2005
        loss = None
        if labels is not None:
            loss = self.loss_fct(logits_diff.view(-1, self.num_labels), labels.view(-1))

        # for the label, we only consider the first part
        # output = (logits_pos,) + outputs_pos[2:]
        output = (logits_pos, logits_diff)
        # print("before:", output)
        return ((loss,) + output) if loss is not None else output
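Before changing anything, a quick structural check helps. The sketch below assumes model is an instance of the class above; it confirms that the inner miniLM submodule resolves its own base model correctly and can return an embedding module on our behalf:

# Sketch: the inner pretrained model exposes BERT-style input embeddings.
print(model.miniLM.base_model_prefix)        # "bert" for the underlying BERT-style model
print(model.miniLM.get_input_embeddings())   # an nn.Embedding mapping vocabulary ids to hidden states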

The target model contains an inner base model, miniLM, which is an official pretrained model, and miniLM's own base_model should be BERT, i.e. different from self. So I added a get_input_embeddings method to the target model class that returns the embeddings of this inner base model:

class MinitForPairwiseLearning(BertPreTrainedModel):
    # __init__ and forward are unchanged from the version shown above.

    def get_input_embeddings(self):
        # Delegate to the inner pretrained model, which knows how to return
        # its (BERT-style) input embedding module.
        return self.miniLM.get_input_embeddings()
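A quick check after this change (a sketch; the variable name and exact shapes are assumptions based on the MiniLM checkpoint used above):

# Hypothetical usage check: the standard call now returns the embedding layer
# instead of raising NotImplementedError.
embeddings = model.get_input_embeddings()
print(type(embeddings))          # torch.nn.Embedding
print(embeddings.weight.shape)   # roughly (vocab_size, 384) for this checkpoint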

Running the code again, the NotImplementedError is gone. Solving this kind of problem takes some coding experience: work from the inside out and from the error site backwards, analyzing each possible cause in turn, while keeping different code blocks, classes, and instances as loosely coupled as possible.
