While running a deep learning experiment, I called the following on my model:

model.get_input_embeddings()

and got the error: NotImplementedError

How to approach the error:
If experience doesn't immediately reveal the cause and a quick search turns up no useful guidance, the next step is to dig into the code at the location where the error is raised.

The code at the error location:

def get_input_embeddings(self) -> nn.Module:
    """
    Returns the model's input embeddings.

    Returns:
        `nn.Module`: A torch module mapping vocabulary to hidden states.
    """
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    else:
        raise NotImplementedError

So the error occurs because base_model, obtained via getattr(self, self.base_model_prefix, self), is the model itself: the default get_input_embeddings only delegates when the model exposes a base model that is a different object from self. My guess is that, because the target model is a pairwise model I built myself on top of BERT rather than one of the official model classes, the function cannot locate the target model's base model (BERT) through base_model_prefix, and getattr falls back to returning self.
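A quick sanity check confirms this. The sketch below assumes model is an instance of the custom class shown later in this post, which subclasses BertPreTrainedModel (whose base_model_prefix is "bert"):

# Diagnostic sketch: why getattr falls back to self for the custom model.
print(model.base_model_prefix)                      # "bert" for BertPreTrainedModel subclasses
print(hasattr(model, model.base_model_prefix))      # False: the custom class has no attribute named "bert"
base_model = getattr(model, model.base_model_prefix, model)
print(base_model is model)                          # True, so the else branch raises NotImplementedError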

In short, NotImplementedError is raised because base_model and self end up being the same object. The first idea is to edit get_input_embeddings itself, but that function belongs to a third-party library, and patching library code directly is risky. Still, my first attempt was to change it to:

if base_model is not self:
    return base_model.get_input_embeddings()
else:
    return base_model.get_input_embeddings()
    raise NotImplementedError  # unreachable: the return above always executes first

But the error persisted, so editing third-party library code directly is not the best approach, even though it may occasionally work.
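The post does not record exactly which error the patched version produced, but one plausible failure mode (my inference, not something stated above) is infinite recursion: when base_model is self, the call in the else branch simply re-enters the same method.

# Sketch of the suspected failure mode of the patch above (an assumption, not taken from the post):
# when base_model resolves to self, delegating in the else branch calls this very method again,
# so Python eventually raises RecursionError.
class Broken:
    base_model_prefix = "bert"   # no attribute with this name exists on the instance

    def get_input_embeddings(self):
        base_model = getattr(self, self.base_model_prefix, self)   # resolves to self
        return base_model.get_input_embeddings()                   # re-enters this method

Broken().get_input_embeddings()   # RecursionError: maximum recursion depth exceeded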

When you hit a dead end like this, widen the search: start from the error location and trace backwards through the call chain, looking for anything that could influence the error.
For instance, returning base_model.get_input_embeddings() in get_input_embeddings regardless of whether the two objects are the same was meant to avoid the error, but that route doesn't work. So instead we can try to make base_model actually differ from self, or redefine get_input_embeddings for the target model (a sketch of the first alternative follows below).
Inspecting the structure of self (the target model) shows that its embeddings are BERT's embeddings, so the question becomes: can we expose BERT's embeddings without modifying the third-party code?
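For completeness, here is a sketch of the first alternative (making base_model differ from self). This is my own assumption rather than the route taken in this post: pointing base_model_prefix at the inner submodule lets the inherited get_input_embeddings delegate to it. Note that base_model_prefix also influences how from_pretrained maps checkpoint keys, which is one reason to prefer the explicit override shown later.

# Hypothetical variant (not the fix used below): make getattr(self, base_model_prefix, self)
# resolve to the inner pretrained model instead of falling back to self.
from transformers import AutoModel, BertPreTrainedModel

class PairwisePrefixVariant(BertPreTrainedModel):
    base_model_prefix = "miniLM"   # matches the attribute name defined in __init__

    def __init__(self, config):
        super().__init__(config)
        self.miniLM = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")

# With this class, the inherited get_input_embeddings() finds self.miniLM and
# delegates to it, so NotImplementedError is no longer raised.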
So I moved my analysis one layer out and looked at the code of the target model itself:

import torch.nn as nn
from transformers import AutoModel, BertPreTrainedModel
# label_smoothing is assumed to be a local module providing LabelSmoothingCrossEntropy

class MinitForPairwiseLearning(BertPreTrainedModel):
    def __init__(self, config, loss_function="label-smoothing-cross-entropy", smoothing=0.1):
        super().__init__(config)
        print("config:", config)

        # There should be at least relevant and non-relevant options.
        self.num_labels = config.num_labels + 1
        self.miniLM = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
        # print("mini_model:", self.miniLM)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)  # 0.1
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)  # hidden_size = 384

        if loss_function == "cross-entropy":
            # equivalent to the deprecated size_average=False, reduce=True combination
            self.loss_fct = nn.CrossEntropyLoss(reduction="sum")
        elif loss_function == "label-smoothing-cross-entropy":
            self.loss_fct = label_smoothing.LabelSmoothingCrossEntropy(smoothing)

        self.init_weights()

    def forward(
        self,
        input_ids_pos=None,
        attention_mask_pos=None,
        token_type_ids_pos=None,
        inputs_embeds_pos=None,
        input_ids_neg=None,
        attention_mask_neg=None,
        token_type_ids_neg=None,
        inputs_embeds_neg=None,
        labels=None
    ):
        # forward pass for positive instances
        outputs_pos = self.miniLM(
            input_ids=input_ids_pos,
            attention_mask=attention_mask_pos,
            token_type_ids=token_type_ids_pos,
            inputs_embeds=inputs_embeds_pos
        )
        # print("labels:", labels)
        pooled_output_pos = outputs_pos[1]
        pooled_output_pos = self.dropout(pooled_output_pos)
        # print(pooled_output_pos.shape)
        logits_pos = self.classifier(pooled_output_pos)
        # print("pos_logit:", logits_pos)  # 2-dim scores over {relevant, not relevant}

        # forward pass for negative instances
        outputs_neg = self.miniLM(
            input_ids=input_ids_neg,
            attention_mask=attention_mask_neg,
            token_type_ids=token_type_ids_neg,
            inputs_embeds=inputs_embeds_neg
        )
        pooled_output_neg = outputs_neg[1]
        pooled_output_neg = self.dropout(pooled_output_neg)
        logits_neg = self.classifier(pooled_output_neg)
        # print("neg_logit:", logits_neg)  # 2-dim scores over {relevant, not relevant}

        logits_diff = logits_pos - logits_neg  # design choice open to debate: concatenate or subtract?

        # Calculating cross-entropy loss for pairs <q, d1, d2>
        # based on "Learning to Rank using Gradient Descent", ICML 2005
        loss = None
        if labels is not None:
            loss = self.loss_fct(logits_diff.view(-1, self.num_labels), labels.view(-1))

        # for the label, we only consider the first part
        # output = (logits_pos,) + outputs_pos[2:]
        output = (logits_pos, logits_diff)
        # print("before:", output)
        return ((loss,) + output) if loss is not None else output
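Before changing anything, a quick structural check helps. The sketch below assumes model is an instance of the class above; it confirms that the inner miniLM submodule resolves its own base model correctly and can return an embedding module on our behalf:

# Sketch: the inner pretrained model exposes BERT-style input embeddings.
print(model.miniLM.base_model_prefix)        # "bert" for the underlying BERT-style model
print(model.miniLM.get_input_embeddings())   # an nn.Embedding mapping vocabulary ids to hidden states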

The target model contains an inner base model, miniLM, which is an official pretrained model, and miniLM's own base_model should be BERT, i.e. different from self. So I added a get_input_embeddings method to the target model class that returns the embeddings of this inner base model:

class MinitForPairwiseLearning(BertPreTrainedModel):
    # __init__ and forward are unchanged from the version shown above.

    def get_input_embeddings(self):
        # Delegate to the inner pretrained model, which knows how to return
        # its (BERT-style) input embedding module.
        return self.miniLM.get_input_embeddings()
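A quick check after this change (a sketch; the variable name and exact shapes are assumptions based on the MiniLM checkpoint used above):

# Hypothetical usage check: the standard call now returns the embedding layer
# instead of raising NotImplementedError.
embeddings = model.get_input_embeddings()
print(type(embeddings))          # torch.nn.Embedding
print(embeddings.weight.shape)   # roughly (vocab_size, 384) for this checkpoint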

Running the code again, the NotImplementedError is gone. Solving this kind of problem takes some coding experience: work from the inside out and from the error site backwards, analyzing each possible cause in turn, while keeping different code blocks, classes, and instances as loosely coupled as possible.
