Creating Anime Stories with Deep Learning: Comparing LSTM and GPT2 for Text Generation (Part 2)

import numpy as np
import torch
import torch.nn as nn
from tqdm import tqdm

# LSTMModel, AverageMeter, create_batches, num_batches, vocab_size,
# input_tok, target_tok, w2i and i2w are used below but not defined in this snippet.
# config.model_path and config.tokenizer are also referenced but not shown here.

# Hyperparameters
class config:
    batch_size = 32    # batch size
    seq_len = 30       # maximum sequence length
    emb_dim = 100      # word-embedding dimension
    hidden_dim = 512   # hidden-layer size
    epochs = 15        # number of training epochs

def loss_fn(predicted, target):
    loss = nn.CrossEntropyLoss()
    return loss(predicted, target)

def train_fn(model, device, dataloader, optimizer):
    model.train()
    tk0 = tqdm(dataloader, position=0, leave=True, total=num_batches)
    train_loss = AverageMeter()
    hid_state, cell_state = model.zero_state(config.batch_size)
    hid_state = hid_state.to(device)
    cell_state = cell_state.to(device)
    losses = []
    for inp, target in tk0:
        inp = torch.tensor(inp, dtype=torch.long).to(device)
        target = torch.tensor(target, dtype=torch.long).to(device)
        optimizer.zero_grad()
        pred, (hid_state, cell_state) = model(inp, (hid_state, cell_state))
        # CrossEntropyLoss expects (batch, vocab, seq_len), so swap the last two dims
        loss = loss_fn(pred.transpose(1, 2), target)
        # detach the states so gradients do not flow back across batches
        hid_state = hid_state.detach()
        cell_state = cell_state.detach()
        loss.backward()
        _ = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2)  # avoid exploding gradients
        optimizer.step()
        train_loss.update(loss.detach().item())
        tk0.set_postfix(loss=train_loss.avg)
        losses.append(loss.detach().item())
    return np.mean(losses)

def run():
    device = 'cuda'
    model = LSTMModel(vocab_size=vocab_size,
                      emb_dim=config.emb_dim,
                      hid_dim=config.hidden_dim,
                      num_layers=3).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer=optimizer, mode='min', patience=2, verbose=True, factor=0.5)
    epochs = config.epochs
    best_loss = 999
    for i in range(1, epochs + 1):
        train_dataloader = create_batches(batch_size=config.batch_size,
                                          input_tok=input_tok,
                                          seq_len=config.seq_len,
                                          target_tok=target_tok)
        print('Epoch..', i)
        loss = train_fn(model, device, train_dataloader, optimizer)
        if loss < best_loss:
            best_loss = loss
            torch.save(model.state_dict(), config.model_path)  # keep the best checkpoint
        scheduler.step(loss)
        torch.cuda.empty_cache()
    return model

Generating anime text

In the text-generation step we feed the model some seed text, for example 'A young woman'. Our function first tokenizes it and passes it to the model; the function also takes the desired length of the synopsis to be generated. The model outputs a score for every token in the vocabulary. We apply a softmax to these scores to turn them into a probability distribution, and then use top-k sampling: out of the whole vocabulary we keep the k tokens with the highest probability, randomly sample one of them, and return it as the output. That output is concatenated to the input string, and the sampled token becomes the input for the next time step. Suppose the output is "capable"; the concatenated text is then "A young woman capable". We keep doing this until the end-of-sequence token is produced, and then print the result. Here is a good diagram of what the model is doing:
[Figure: the text-generation loop, where each sampled token is fed back as the next input]
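To make the sampling step concrete, here is a minimal, self-contained sketch of top-k sampling from a vector of vocabulary scores. The function name sample_top_k and the toy scores are illustrative only; this version weights the final pick by the softmax probabilities, whereas the article's own inference function, shown right after, simply picks uniformly among the k candidates.

import numpy as np
import torch
import torch.nn.functional as F

def sample_top_k(scores, k=5):
    # scores: 1-D tensor with one raw score per vocabulary token
    probs = F.softmax(scores, dim=-1)            # scores -> probability distribution
    top_probs, top_ix = torch.topk(probs, k=k)   # keep the k most likely tokens
    p = top_probs.double().numpy()
    p = p / p.sum()                              # renormalise over the kept tokens
    return int(np.random.choice(top_ix.numpy(), p=p))

# toy example: a 6-token vocabulary, sampling from the 3 most likely tokens
scores = torch.tensor([0.1, 2.0, 0.3, 1.5, 0.05, 0.9])
print(sample_top_k(scores, k=3))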
def inference(model, input_text, device, top_k=5, length=100):
    output = ''
    model.eval()
    tokens = config.tokenizer(input_text)
    h, c = model.zero_state(1)
    h = h.to(device)
    c = c.to(device)

    # feed the seed text token by token to build up the hidden state
    for t in tokens:
        output = output + t + ' '
        pred, (h, c) = model(torch.tensor(w2i[t.lower()]).view(1, -1).to(device), (h, c))

    # generate `length` new tokens with top-k sampling
    for i in range(length):
        _, top_ix = torch.topk(pred[0], k=top_k)
        choices = top_ix[0].tolist()
        choice = np.random.choice(choices)  # pick one of the k most likely tokens at random
        out = i2w[choice]
        output = output + out + ' '
        pred, (h, c) = model(torch.tensor(choice, dtype=torch.long).view(1, -1).to(device), (h, c))
    return output

device = 'cpu'
mod = LSTMModel(emb_dim=config.emb_dim,
                hid_dim=config.hidden_dim,
                vocab_size=vocab_size,
                num_layers=3).to(device)
mod.load_state_dict(torch.load(config.model_path))

print('AI generated Anime synopsis:')
print(inference(model=mod, input_text='In the ', top_k=30, length=100, device=device))

In the example above I set the maximum length to 100 and the seed text to 'In the', and this is the output we got:

In the days attempt it 's . although it has , however ! what they believe that humans of these problems . it seems and if will really make anything . as she must never overcome allowances with jousuke s , in order her home at him without it all in the world : in the hospital she makes him from himself by demons and carnage . a member and an idol team the power for to any means but the two come into its world for what if this remains was to wait in and is n't going ! on an

It looks grammatically plausible, but it is meaningless. LSTMs are better at capturing long-term dependencies than vanilla RNNs, but when capturing the context of the text they can only "see" a few steps (words) back, or a few steps ahead if we use bidirectional RNNs. That is why the output stops making sense as soon as we generate long passages.

The GPT2 way

A little theory

Transformers do a much better job of capturing the context of the text they are given. They use only attention layers (no RNNs), which lets them understand the context far better, because they can look back over as many time steps as they need (depending on the attention). There are different kinds of attention, but the attention used by GPT2, one of the best language models around, is called masked self-attention. Instead of using both the transformer encoder and decoder stacks, GPT2 uses only a tall stack of transformer decoders. Depending on how many decoders are stacked, GPT2 comes in four variants.
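To give a rough idea of what masked self-attention does, below is a minimal sketch (not GPT2's actual implementation) of a single scaled dot-product attention head with a causal mask: each position can attend to itself and to earlier positions, but never to future ones. The function name and the random toy weights are purely illustrative.

import math
import torch
import torch.nn.functional as F

def masked_self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) representations of one token sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])                 # (seq_len, seq_len) attention scores
    causal_mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(causal_mask, float('-inf'))   # hide future positions
    weights = F.softmax(scores, dim=-1)                       # each row: distribution over past tokens
    return weights @ v                                        # context-aware token representations

# toy example: 4 tokens with 8-dimensional embeddings and a single head
torch.manual_seed(0)
d = 8
x = torch.randn(4, d)
w_q, w_k, w_v = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
print(masked_self_attention(x, w_q, w_k, w_v).shape)   # torch.Size([4, 8])

In GPT2 this operation is repeated across many heads inside each decoder block, and the four variants differ mainly in how many of those decoder blocks are stacked.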