Create Your Own Harry Potter Short Story Using RNN and TensorFlow
Data Science
“Of course it is happening inside your head, Harry, but why on earth should that mean that it is not real?”¹
Still waiting for your Hogwarts letter? Want to enjoy the feast in the Great Hall? Explore the secret passages in Hogwarts? Buy your first wand from Ollivander’s? *sigh* You are not alone.
I have (after all this time?) always been obsessed with Harry Potter, and I recently started learning about neural networks. It’s fascinating to see how creative you can get with deep learning, so I thought: why not brew the two together?
So I built a simple text generation model with TensorFlow to create my own version of a Harry Potter short story (it can’t get as good as J.K. Rowling, duh!).
This article walks you through the entire code I wrote to implement it. But for all the Hermiones out there, you can directly find the GitHub code here and run it yourself!
So here’s something that will cast a Banishing Charm on your boredom while you’re quarantined.
Background
What is an RNN?
A Recurrent Neural Network differs from other neural networks in that it has a memory (its hidden state), which carries information about the elements of the sequence it has processed so far. It computes each next output on the basis of this memory. For a simple introduction to RNNs, you can refer to this.
GRU vs LSTM
Both of these are great for text generation, but GRUs are the newer concept, and there isn’t actually a way to determine which one is better in general. Tuning your hyper-parameters well is likely to improve your model’s performance more than choosing the ideal architecture.²
However, if the amount of data is not a problem, LSTMs may perform better. If you have less data, GRUs have fewer parameters, so they train faster and tend to generalize well from the smaller dataset.
Feel free to check out this article for a more detailed explanation.
Why character-based?
When working with a large dataset like this, the total number of unique words in the corpus is much higher than the number of unique characters. When we one-hot encode the labels over such a large vocabulary, the resulting matrices are huge and we’re likely to run into memory issues. For a big enough corpus, the labels alone can take up terabytes of RAM.
So the same principles you would use to predict words can be applied here, but now you’ll be working with a much smaller vocabulary.
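To put rough numbers on the memory argument above, here is a small back-of-the-envelope sketch. The token counts and vocabulary sizes are hypothetical values chosen only for illustration (they are not measured from the Harry Potter corpus), and the helper name one_hot_label_bytes is made up for this example.

```python
def one_hot_label_bytes(num_tokens, vocab_size, bytes_per_value=4):
    """Bytes needed to store one one-hot row per token (float32 by default)."""
    return num_tokens * vocab_size * bytes_per_value

# Hypothetical sizes, chosen only for illustration.
word_level = one_hot_label_bytes(num_tokens=1_000_000, vocab_size=50_000)
char_level = one_hot_label_bytes(num_tokens=6_000_000, vocab_size=100)

print(f"word-level labels: {word_level / 1e9:.1f} GB")
print(f"char-level labels: {char_level / 1e9:.1f} GB")
```

Even with more tokens, the character-level labels are far smaller because each row has only as many columns as there are distinct characters. The model in this article goes one step further and keeps the labels as plain integers via the sparse categorical cross-entropy loss, so the one-hot rows never need to be stored.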
The code
So let’s get started!
First, import the libraries you need
import tensorflow as tf
import numpy as np
import os
import time
Now, read the data
You can find and download the text of all the Harry Potter books from this Kaggle dataset. Here, I am combining all seven books into one text file named ‘harrypotter.txt’. You can also train your model on any single book if you like. Just experiment with it!
files = ['1SorcerersStone.txt', '2ChamberofSecrets.txt', '3ThePrisonerOfAzkaban.txt', '4TheGobletOfFire.txt', '5OrderofthePhoenix.txt', '6TheHalfBloodPrince.txt', '7DeathlyHollows.txt']
# Concatenate the seven books into a single file
with open('harrypotter.txt', 'w') as outfile:
    for file in files:
        with open(file) as infile:
            outfile.write(infile.read())
# Read the combined text back into memory
with open('harrypotter.txt') as f:
    text = f.read()
Looking at the data
print(text[:300])
“Harry Potter and the Sorcerer’s Stone
CHAPTER ONE
THE BOY WHO LIVED
Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they”³
Processing the data
We map all the unique characters in vocab to numbers by making two look-up tables:
- mapping the characters to numbers (char2index)
- mapping the numbers back to the characters (index2char)
Then we convert our text to numbers.
vocab = sorted(set(text))
char2index = {u: i for i, u in enumerate(vocab)}
index2char = np.array(vocab)
text_as_int = np.array([char2index[c] for c in text])
# how it looks:
print('{} -- characters mapped to int -- > {}'.format(repr(text[:13]), text_as_int[:13]))
'Harry Potter ' -- characters mapped to int -- > [39 64 81 81 88 3 47 78 83 83 68 81 3]
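The two look-up tables are inverses of each other, which is easy to check on a toy string. The sketch below repeats the same construction with plain Python containers. The sample_ names and the tiny corpus are stand-ins for illustration only.

```python
# Toy corpus standing in for the book text (an assumption for illustration).
sample_text = "Harry Potter"

# Same construction as above, but with plain Python containers.
sample_vocab = sorted(set(sample_text))
sample_char2index = {ch: i for i, ch in enumerate(sample_vocab)}
sample_index2char = list(sample_vocab)

# Encode the text as integers, then decode it back to characters.
encoded = [sample_char2index[ch] for ch in sample_text]
decoded = "".join(sample_index2char[i] for i in encoded)

print(sample_vocab)
print(encoded)
print(decoded)
```

Decoding the encoded list gives back the original string, which is exactly what the generation step relies on later when it turns predicted indices into characters.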
Each input sequence for our model will contain seq_length characters from the text, and its corresponding target sequence will be of the same length, with all characters shifted one place to the right. So we break the text into chunks of seq_length+1.⁴
tf.data.Dataset.from_tensor_slices converts the text vector into a stream of character indices, and the batch method lets us group these characters into sequences of the required length.
By using the map method to apply a simple function to each sequence, we create our inputs and targets.
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
def split_input_target(data):
    # Input drops the last character; target drops the first
    input_text = data[:-1]
    target_text = data[1:]
    return input_text, target_text
dataset = sequences.map(split_input_target)
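To see what split_input_target produces, here is the same slicing applied to a short Python string instead of a tensor. The five-character chunk is a toy example, corresponding to a seq_length of 4.

```python
def split_input_target(data):
    # Same slicing as above: drop the last item for the input,
    # drop the first item for the target.
    return data[:-1], data[1:]

chunk = "Harry"  # a chunk of seq_length + 1 characters, with seq_length = 4
input_text, target_text = split_input_target(chunk)

# Each input character is paired with the character that follows it.
for inp, tgt in zip(input_text, target_text):
    print(f"input {inp!r} -> expected next character {tgt!r}")
```

At every position the target is simply the character that follows the input character, which is the prediction task the model is trained on.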
Before feeding this data into the model, we shuffle the data and divide it into batches. tf.data maintains a buffer in which it shuffles elements.
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
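The idea of shuffling through a fixed-size buffer can be sketched in pure Python. This is only a rough approximation of the concept, not TensorFlow’s actual implementation, and the function name buffered_shuffle is made up for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

Because an element can only be emitted once it is in the buffer, a small buffer only mixes nearby elements. A buffer at least as large as the dataset gives a full shuffle, at the cost of holding everything in memory.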
Building the Model
Given all the characters seen up to this moment, what will the next character be? This is what we will be training our RNN model to predict.
I have used tf.keras.Sequential to define the model, since every layer in it has a single input and produces a single output. The layers used are:
- tf.keras.layers.Embedding: This is the input layer. An embedding maps each unique character to a vector with embedding_dim dimensions.
- tf.keras.layers.GRU: A type of RNN with rnn_units units; the model here stacks two of them. (You can also use an LSTM layer here to see what works best for your data.)
- tf.keras.layers.Dense: This is the output layer, with vocab_size outputs.
It is also useful to define all the hyper-parameters separately so that it’s easier to change them later without editing the model definition.
vocab_size = len(vocab)
embedding_dim = 300
# Number of RNN units in each GRU layer
rnn_units1 = 512
rnn_units2 = 256
rnn_units = [rnn_units1, rnn_units2]
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # rnn_units is a list with one entry per GRU layer
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units[0], return_sequences=True,
                            stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.GRU(rnn_units[1], return_sequences=True,
                            stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
Training the model
The standard tf.keras.losses.sparse_categorical_crossentropy loss function is a good fit for our model, since it is applied across the last dimension of the predictions. We set from_logits to True because the model returns logits. Then we choose the adam optimizer and compile our model.
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
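For intuition about what this loss measures, here is a pure-Python sketch of the computation for a single prediction: the negative log of the softmax probability assigned to the correct character. The function name and the example logits are hypothetical, and this is an illustration of the formula rather than TensorFlow’s implementation.

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Loss for one prediction: -log(softmax(logits)[label])."""
    # Subtract the max logit for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]

# Hypothetical logits over a 3-character vocabulary.
logits = [2.0, 0.5, -1.0]
print(sparse_crossentropy_from_logits(0, logits))  # smaller: class 0 is favoured
print(sparse_crossentropy_from_logits(2, logits))  # larger: class 2 is unlikely
```

The label is passed as an integer index rather than a one-hot vector, which is what "sparse" refers to. Working directly on logits is why from_logits is set to True in the model’s loss.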
You can configure checkpoints like this so that the model weights are saved during training.
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)
The training time of each epoch depends on your model’s layers and the hyper-parameters used. I have set epochs to 50 to see how accuracy and loss change over time, but you may not need to train for all 50 epochs. Make sure to stop training when you see your loss start to increase or remain constant for a few epochs. The path of the most recent checkpoint will be stored in latest_check. If you are using Google Colab, set the runtime to GPU to reduce training time.
EPOCHS = 50
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
latest_check = tf.train.latest_checkpoint(checkpoint_dir)
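The rule of thumb "stop when the loss rises or stays flat for a few epochs" can be written down as a small check. The sketch below is one possible formulation, with a hypothetical helper name and made-up loss values. Keras also ships a ready-made tf.keras.callbacks.EarlyStopping callback (with monitor and patience arguments) that can be passed to model.fit alongside the checkpoint callback.

```python
def should_stop(losses, patience=3, min_delta=0.0):
    """Return True if the loss has not improved in the last `patience` epochs."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    # No recent epoch improved on the earlier best by more than min_delta.
    return recent_best >= best_before - min_delta

# Hypothetical per-epoch losses, e.g. taken from history.history['loss'].
improving = [2.1, 1.7, 1.5, 1.4, 1.35]
stalled = [2.1, 1.7, 1.5, 1.5, 1.52, 1.51]

print(should_stop(improving))  # False
print(should_stop(stalled))    # True
```

A patience of a few epochs avoids stopping on a single noisy epoch, which matches the advice to wait until the loss has been flat or rising for a while.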
Text generation
If you wish to use a different batch size, you need to rebuild the model and reload the weights from the checkpoint before running it. I have used a batch_size of 1 to keep it simple.
(You can run model.summary() to get insights into the layers of your model and the output shape after each layer.)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(latest_check)
model.build(tf.TensorShape([1, None]))
model.summary()
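model.summary() also lists a parameter count for each layer. As a rough cross-check, the counts can be worked out by hand. The sketch below assumes a vocab_size of 100, which is a hypothetical value since the article does not state the real one. It also assumes the GRU layers use the TF 2.x Keras defaults (bias enabled, reset_after=True, which gives two bias vectors per weight set). The helper names are made up for this example.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding_dim-sized vector per character in the vocabulary.
    return vocab_size * embedding_dim

def gru_params(input_dim, units):
    # Three weight sets (update gate, reset gate, candidate state), each with
    # input weights, recurrent weights and two bias vectors
    # (assuming reset_after=True).
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output.
    return input_dim * output_dim + output_dim

vocab_size = 100  # hypothetical
embedding_dim = 300
rnn_units = [512, 256]

counts = {
    "embedding": embedding_params(vocab_size, embedding_dim),
    "gru_1": gru_params(embedding_dim, rnn_units[0]),
    "gru_2": gru_params(rnn_units[0], rnn_units[1]),
    "dense": dense_params(rnn_units[1], vocab_size),
}
for name, count in counts.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(counts.values()):,}")
```

Under these assumptions most of the parameters sit in the first GRU layer, so rnn_units is the hyper-parameter with the largest effect on model size and training time.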
The following function now generates the text:
- It accepts a start_string, initializes the RNN state and sets the number of output characters to num_generate.
- It gets the prediction distribution of the next character using start_string and the RNN state. Then it samples the index of the predicted character from that distribution, which becomes our next input to the model.
- The output state returned by the model is fed back into the model so that it now has more context. After predicting the next character, the cycle continues. This way the RNN builds up its memory from its previous outputs.⁴
A lower scaling results in more predictable text, whereas a higher scaling gives more surprising text.
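def generate_text(model, start_string):
    # Number of characters to generate (the original value was lost; 1000 is an assumed placeholder)
    num_generate = 1000
    # Convert the start string to numbers
    input_eval = [char2index[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    scaling = 0.5  # kept at a lower value here
    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)
        predictions = predictions / scaling
        # sample the next character from the prediction at the last time step
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(index2char[predicted_id])
    return start_string + ''.join(text_generated)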
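To see why scaling changes how predictable the text is, it helps to look at the probabilities that result from dividing the logits before the softmax. The sketch below uses pure Python and made-up logits for a four-character vocabulary. The function name is hypothetical.

```python
import math

def softmax_with_scaling(logits, scaling):
    """Softmax of logits / scaling, computed in a numerically stable way."""
    scaled = [z / scaling for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-character vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

for scaling in (0.5, 1.0, 2.0):
    probs = softmax_with_scaling(logits, scaling)
    print(scaling, [round(p, 3) for p in probs])
```

With a scaling below 1 the most likely character takes a larger share of the probability, so sampling picks it more often. With a scaling above 1 the distribution flattens and less likely characters are sampled more often.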
And you’re done!
Outputs
You can try giving it different start strings to get different outputs.
Here is a part of the output using my favorite character:
print(generate_text(model, start_string=u"Severus Snape"))
Severus Snape moved to the scarlet Hogwarts students. Hermione said, “Well, I think it’s all right, all right, a bit dead before. . . .”
“I think I’ll have to go to the other than you be to help him a question of the staff table and the doors opened and he stared at the clock to Harry. “I think it make the sword of Gryffindor, who was there too, he was on his pillows, and he and Ron stared at him. “I am sure we can bother the boy — “
“You should have been there,” said Ron, and he took a strange and color.
“I mean, he was a really good …
You can also try different sentences:
Voldemort died of coronavirus.”
“You didn’t know what to do,” said Harry, “it was a surrounding cloak, he was the one who sustain you to go to the way.”
“Yeah, well, I think you might have done that!” she said, striding up the steps, and the strength were so far as he was a pretty great tent that was the first time they might have realized I saw him to be devastated and screaming of the crowd through the darkness at the time shouts and silence.
“You see, Harry!”
“I don’t know, see you haven’t got anything to do with a prater of the Ministry of Magic …
Here is one example if you train the model using just the first book, Sorcerer’s Stone³:
Dumbledore in the Leaky Cauldron, now empty. Harry had never been to London before. Although Hagrid seemed very cold and green eyes. He was still shaking.
Harry sat down next to the bowl of peas. “What did you talk to Professor Dumbledore.”
She eyed him with a mixture of shock and suspicion.
“Who’s there?” he said suddenly as they climbed the street. He could just see the bundle of blankets on the step of number four.
Dudley’s favorite punching bag was Harry, but he couldn’t often catch him. Harry didn’t say anything …
You’ll see that the model knows when to capitalize words and when to start a new paragraph, and that it imitates a magical writing vocabulary!
Mischief Managed.
To make the sentences more coherent, you can improve the model by:
- changing parameter values like seq_length, rnn_units, embedding_dim and scaling to find the best settings
- training it for more epochs
- adding more GRU / LSTM layers
This model can be trained on any other series you like. Do share your own stories in the comments and have fun!
[1] J.K. Rowling, Harry Potter and the Deathly Hallows, 2007
[2] Denny Britz, Recurrent Neural Network Tutorial, Part 4: Implementing a GRU/LSTM RNN with Python and Theano, October 27, 2015
[3] J.K. Rowling, Harry Potter and the Sorcerer’s Stone, 1998
[4] Text generation with an RNN, TensorFlow
Translated from: https://medium.com/towards-artificial-intelligence/create-your-own-harry-potter-short-story-using-rnn-and-tensorflow-853b3ed1b8f3