Fixing NotImplementedError: related to get_input_embeddings and similar functions of pretrained models in the transformers library
While running a deep learning experiment, I called the following on my model:
model.get_input_embeddings()
and it raised: NotImplementedError
How to approach the error:
If experience does not tell you the cause at a glance, and a quick search does not turn up useful guidance, dig into the place where the error is raised.
The code at the error location:
def get_input_embeddings(self) -> nn.Module:
    """
    Returns the model's input embeddings.

    Returns:
        `nn.Module`: A torch module mapping vocabulary to hidden states.
    """
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    else:
        raise NotImplementedError
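For context, every model class in transformers defines a class attribute base_model_prefix, which names the attribute under which its backbone encoder is stored. A minimal sketch (not from the original code) of how the getattr lookup behaves for an official model versus a custom one:

from transformers import BertForSequenceClassification

# Official task heads store their encoder under the name given by base_model_prefix:
# BertForSequenceClassification sets base_model_prefix = "bert" and self.bert = BertModel(config),
# so getattr(self, self.base_model_prefix, self) resolves to the inner BertModel and the
# default get_input_embeddings() simply delegates to it.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
print(model.base_model_prefix)                           # "bert"
print(getattr(model, model.base_model_prefix) is model)  # False, so no NotImplementedError

# In a custom subclass that stores its encoder under a different attribute name,
# getattr finds nothing and falls back to its third argument: base_model becomes self
# and the else branch raises NotImplementedError.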
So the cause of the error is that the base model obtained via getattr, base_model, is the very same object as the model itself, self (the default get_input_embeddings is written for the case where the target model's base_model differs from the model itself). My guess was that because the target model is a pairwise model I built myself on top of BERT, rather than an official model, the function could not locate the target model's base model (BERT) and could only fall back to treating self as base_model.
In short, the NotImplementedError occurs because base_model and self are the same object. The first idea is to modify the code of get_input_embeddings directly, but this function belongs to a third-party library, and editing it in place is risky. I initially tried changing it to:
if base_model is not self:
    return base_model.get_input_embeddings()
else:
    return base_model.get_input_embeddings()
    raise NotImplementedError
It still errored, which makes sense: when base_model is self, the else branch just calls the same method on itself again and recurses without end. So modifying third-party library code directly is not the best approach, even though in a few cases it may work.
When you hit this kind of dead end, broaden your thinking: start from the error location and keep tracing backwards, looking for the factors that could be feeding into the error.
For example, we could return base_model.get_input_embeddings() regardless of whether the two objects are the same, so that nothing is raised; but as shown above this does not work. Instead, we can try to make base_model different from self (a sketch of this option follows below), or to redefine get_input_embeddings for the target model.
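The first option can also be realized without touching the library: store the encoder under the attribute name that base_model_prefix expects, so that the getattr lookup resolves to it. A rough sketch, not the route taken in this post; only the relevant part of __init__ is shown, and the rest of the class would then need to reference self.bert instead of self.miniLM:

from transformers import AutoModel, BertPreTrainedModel

class MinitForPairwiseLearning(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        # BertPreTrainedModel sets base_model_prefix = "bert", so naming the encoder
        # attribute "bert" lets getattr(self, self.base_model_prefix, self) find it,
        # and the library's default get_input_embeddings() then works unchanged.
        self.bert = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
        self.init_weights()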
Inspecting the structure of self (the target model), we find that its embeddings are BERT's embeddings, so can we expose BERT's embeddings without modifying the third-party library code?
To do this, I moved my analysis one layer outward and looked at the code of the target model, which is as follows:
class MinitForPairwiseLearning(BertPreTrainedModel):
    def __init__(self, config, loss_function="label-smoothing-cross-entropy", smoothing=0.1):
        super().__init__(config)
        print("config:", config)
        # There should be at least relevant and non relevant options.
        self.num_labels = config.num_labels + 1
        self.miniLM = AutoModel.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
        # print("mini_model:", self.miniLM)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)  # 0.1
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)  # hidden_size=384
        if loss_function == "cross-entropy":
            self.loss_fct = nn.CrossEntropyLoss(size_average=False, reduce=True, reduction=None)
        elif loss_function == "label-smoothing-cross-entropy":
            self.loss_fct = label_smoothing.LabelSmoothingCrossEntropy(smoothing)
        self.init_weights()

    def forward(
        self,
        input_ids_pos=None,
        attention_mask_pos=None,
        token_type_ids_pos=None,
        inputs_embeds_pos=None,
        input_ids_neg=None,
        attention_mask_neg=None,
        token_type_ids_neg=None,
        inputs_embeds_neg=None,
        labels=None
    ):
        # forward pass for positive instances
        outputs_pos = self.miniLM(
            input_ids=input_ids_pos,
            attention_mask=attention_mask_pos,
            token_type_ids=token_type_ids_pos,
            inputs_embeds=inputs_embeds_pos
        )
        # print("labels:", labels)
        pooled_output_pos = outputs_pos[1]
        pooled_output_pos = self.dropout(pooled_output_pos)
        # print(pooled_output_pos.shape)
        logits_pos = self.classifier(pooled_output_pos)
        # print("pos_logit:", logits_pos)  # two scores, 0/1: relevant or not (pos or not)
        # forward pass for negative instances
        outputs_neg = self.miniLM(
            input_ids=input_ids_neg,
            attention_mask=attention_mask_neg,
            token_type_ids=token_type_ids_neg,
            inputs_embeds=inputs_embeds_neg
        )
        pooled_output_neg = outputs_neg[1]
        pooled_output_neg = self.dropout(pooled_output_neg)
        logits_neg = self.classifier(pooled_output_neg)
        # print("neg_logit:", logits_neg)  # two scores, 0/1: relevant or not (pos or not)
        logits_diff = logits_pos - logits_neg  # this design is debatable: concatenate or subtract?
        # Calculating Cross entropy loss for pairs <q,d1,d2>
        # based on "Learning to Rank using Gradient Descent" 2005 ICML
        loss = None
        if labels is not None:
            loss = self.loss_fct(logits_diff.view(-1, self.num_labels), labels.view(-1))
        # for label, we only consider the first part
        # output = (logits_pos,) + outputs_pos[2:]
        output = (logits_pos, logits_diff)
        # print("before:", output)
        return ((loss,) + output) if loss is not None else output
As we can see, the target model contains a base model, miniLM, which is an official model, and miniLM's own base model should be BERT, which is different from self. So I tried adding a get_input_embeddings method to the target model class that returns the embeddings of this inner base model, as follows:
class MinitForPairwiseLearning(BertPreTrainedModel):
    # __init__ and forward are exactly the same as in the listing above
    ...

    def get_input_embeddings(self):
        return self.miniLM.get_input_embeddings()
Running it again, the NotImplementedError is gone. Solving this kind of problem takes some coding experience: work outward from the error site and backwards through the call chain, analyzing each possible cause one by one, while keeping different code blocks, classes, and instances as loosely coupled as possible.
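As a quick sanity check (a sketch, not part of the original post; it assumes the class above is importable and will download the MiniLM checkpoint), calling get_input_embeddings on the fixed model should now return the word-embedding module of the inner MiniLM encoder:

import torch.nn as nn
from transformers import AutoConfig

# loss_function="cross-entropy" is passed only to avoid the external label_smoothing dependency
config = AutoConfig.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
model = MinitForPairwiseLearning(config, loss_function="cross-entropy")

emb = model.get_input_embeddings()
print(isinstance(emb, nn.Embedding))  # expected: True
print(emb.weight.shape)               # expected: (vocab_size, hidden_size)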