Language Generation with Recurrent Models

LSTM, Sampling, Smart Code Completion Tool

How Do You Generate Sequence Data?

The general way is to train a machine learning model and then ask it to predict the next token, whether that token is a character, a word, or an n-gram. A model with this predictive capability is called a language model. The model is essentially learning the latent space, i.e. the statistical structure, of the given data.

This model spits out an output based on an input. We then append that output to the input, feed it back in for another round of text generation, and repeat the process.

More concretely, given a sequence like “Cat in the ha”, the language model would predict “t”, assuming the model was trained on a Dr. Seuss corpus.

The output unit for a character-level language model would be a softmax activation over all the possible characters.

Imagine the 26 letters of English: for a given sequence of text, the model produces a probability distribution over those 26 letters. For our “cat in the ha” sequence, the letter “t” would likely have the highest probability, say 0.25, whereas “r” might be 0.03, “m” might be 0.05, and so on.

So when we generate the next character in the sequence, we are sampling from a probability distribution. There are a few approaches to this.

Which Sampling Strategy To Pick?

If we always go with the highest-probability character (greedy sampling), our model will rarely mess up, but the text it generates will be stale, clichéd, and repetitive. This sampling has minimum entropy.

On the other hand, if we pick completely at random, we might as well generate meaningless sequences of characters like “wkrnj1lkm32l3kremflsdcm”. This sampling has maximum entropy.

However, if we sample probabilistically from the softmax output, we pick “t” 25% of the time, which gives the other, less likely characters a chance to appear at least some of the time. This method has an entropy somewhere between the minimum and the maximum, and what's even better is that we can control it with a knob.

The softmax temperature is a value we can use to adjust how randomly we sample from the probability distribution: a temperature near 0 (say 0.01) is nearly deterministic, while a temperature near 1 (say 0.99) is much more random.

The way we do this is by taking the original distribution and reweighting it according to our entropy preference.

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # Scale the log-probabilities by the temperature, then re-normalize
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)
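For example, with a made-up three-character distribution, a low temperature sharpens it toward the most likely character, while a temperature of 1 leaves it unchanged:

original = np.array([0.5, 0.3, 0.2])
print(reweight_distribution(original, temperature=0.1))  # roughly [0.99, 0.01, 0.00], nearly deterministic
print(reweight_distribution(original, temperature=1.0))  # [0.5, 0.3, 0.2], untouched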

How To Implement Character-Level LSTM Text Generation?

First we download a large corpus to train our network with:
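A minimal sketch of this step, assuming the Nietzsche text file from the standard Keras character-generation example; any large plain-text file works the same way:

# Download a text corpus and lowercase it (assumed corpus: Nietzsche, as in the Keras example)
from tensorflow import keras

path = keras.utils.get_file(
    "nietzsche.txt",
    origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path).read().lower()
print("Corpus length:", len(text))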

Then we vectorize the characters in the text:
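A sketch of the vectorization step, assuming 60-character input windows sampled every 3 characters and one-hot encoding; the exact window length is an assumption, not necessarily what the original run used:

import numpy as np

maxlen = 60   # length of each input sequence
step = 3      # sample a new sequence every 3 characters

chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

# One-hot encode the input windows and their target characters
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1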

And create a model and compile it:
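A sketch of a single-layer LSTM with a softmax output over all characters; the layer width of 128 and the rmsprop optimizer are assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.LSTM(128, input_shape=(maxlen, len(chars))),
    layers.Dense(len(chars), activation="softmax"),  # probability over every character
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")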

And finally adjust the temperature and give it a random prompt and let the trained model predict the next character:
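A sketch of the training and generation loop, with a sample() helper that applies the temperature reweighting from earlier; here the model is trained first and sampled once, whereas the logs below generate after every epoch:

import random

def sample(preds, temperature=0.5):
    # Reweight the softmax output by temperature, then draw one character index
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.argmax(np.random.multinomial(1, preds, 1))

model.fit(x, y, batch_size=128, epochs=5)

start = random.randint(0, len(text) - maxlen - 1)
seed = text[start: start + maxlen]
generated = seed
print('Generating with seed: "%s"' % seed)
for _ in range(200):
    x_pred = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(seed):
        x_pred[0, t, char_indices[char]] = 1
    preds = model.predict(x_pred, verbose=0)[0]
    next_char = chars[sample(preds, temperature=0.5)]
    generated += next_char
    seed = seed[1:] + next_char
print(generated)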

The output is a little nonsensical, but considering that it's a single-layer LSTM that takes a couple of minutes per epoch to train, it's pretty okay.

epoch 1 1565/1565 [==============================] - 168s 107ms/step - loss: 1.3974
Generating with seed: "a race which seeks to rise above its hereditary baseness and"
temperature: 0.5
a race which seeks to rise above its hereditary baseness and interermal and man was other things has been commands of the simplic of the senses and straight and weakness of the seemed and come the instincts of the whole person of the work and although the most

epoch 2 1565/1565 [==============================] - 171s 109ms/step - loss: 1.3848
Generating with seed: "admire and still readier to turn away. 36 =objection.=--o"
temperature: 0.5
admire and still readier to turn away. 36 =objection.=--one has at the spirit of the stronger of the our graditions of a being body one conscience and interpretation of the child are better of contemplate himself and consciences of the contemplate of the su

epoch 3 1565/1565 [==============================] - 167s 107ms/step - loss: 1.3746
Generating with seed: "taste when it is counter to our vanity. 177. with regard to"
temperature: 0.5
taste when it is counter to our vanity. 177. with regard to the world for consential men, this suffering and shours it makes called as the profound in the amplement of distinguish and solitude and there is not to the point of many perhaps the stronger and hav

epoch 4 1565/1565 [==============================] - 171s 109ms/step - loss: 1.3643
Generating with seed: " vital spot of truth when he warns all those endowed with re"
temperature: 0.5
vital spot of truth when he warns all those endowed with regard and the through and every him are whom will the world of a new and that in the same "nature and in the souls and as he artist and the possible the antilition of the grateful of the say still dece

epoch 5 1565/1565 [==============================] - 167s 107ms/step - loss: 1.3556
Generating with seed: " and profound enough to receive such belated fugitives. 256"
temperature: 0.5
and profound enough to receive such belated fugitives. 256. a man in all the spirit of the superficial of prosonded with a man and by the same spirit of the powerful of a purposition of the conditions of the spirit is a fanish of the superstition of the spir

What would happen if we replaced the corpus with some codebase?

I merged just a few random files and trained a model, which produced gibberish like this:

133/133 [==============================] - 14s 109ms/step - loss: 1.3357
Generating with seed: "if()  if(swift_built_standalone) project(swift c cxx asm) en"
temperature: 0.5
if() if(swift_built_standalone) project(swift c cxx asm) endif() option(swift_host_variant_arch}") set(swift_host_variant_arch_default "${swift_host_variant_arch_default "${cmake_march_sdk_default "acchos "") endif("${cmake_system_name}") set(swift_host_vari

To be fair, this corpus was only about 50k characters long.

Next, I used a single file containing most of the Three.js core library, about 1.3M characters long. Since code needs to be more structured than prose, I decided to reduce the temperature to 0.35:

epoch 1 3593/3593 [==============================] - 409s 113ms/step - loss: 1.8626
Generating with seed: " math.log(math.max(width, height)) * math.log2e; "
temperature: 0.35
math.log(math.max(width, height)) * math.log2e; this.startandition = new points(this.matrix);
this.matrixworld = shadow.color.prototype.color();
this._caches = new vector3();
}
var material = new vector3();
function = function

epoch 2 3593/3593 [==============================] - 399s 111ms/step - loss: 1.0805
Generating with seed: ".sqrt(this.distancetosquared(v));
};
_proto.distanc
"
temperature: 0.35
.sqrt(this.distancetosquared(v));
};
_proto.distanc
e = function () {
var vertex.settexture(array, array, offset);
return this;
};
_proto.getpointlined( vector3();
this.component = function (intensity, origin, color) {

This stuff gives pseudocode a whole new meaning. How can we make this more useful?

What If We Just Try to Make a Smart Code Completion Tool?

When we use code completion, we usually just want to complete the current line of code; it's rarely a multi-line affair. Although maybe a Smart Code Snippet Tool could be cool too.

A typical line is about 50 characters long. Furthermore, our code completion tool shouldn't really invent new code; we are just trying to save time by completing code we type very frequently. We almost want deterministic output, so we might use a near-deterministic temperature of 0.05.

Another consideration is how long the prompt string should be; most people will type a bit and then wait for code completion. I chose a prompt length of 10:
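A rough sketch of what that completion helper could look like, reusing the model, sample() and char_indices from the earlier sketches; the complete_line name and the zero-padding of short prompts are my own choices for illustration:

def complete_line(prompt, max_new_chars=50, temperature=0.05):
    # Extend the prompt one character at a time until a newline or the length cap
    window, completion = prompt, ""
    for _ in range(max_new_chars):
        snippet = window[-maxlen:]
        pad = maxlen - len(snippet)
        x_pred = np.zeros((1, maxlen, len(chars)))
        # Right-align the typed characters; the padded positions stay all-zero
        for t, char in enumerate(snippet):
            x_pred[0, pad + t, char_indices[char]] = 1
        preds = model.predict(x_pred, verbose=0)[0]
        next_char = chars[sample(preds, temperature)]
        if next_char == "\n":
            break
        completion += next_char
        window += next_char
    return prompt + completion

print(complete_line("vertice", temperature=0.05))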

Our first random prompt is “vertice”. In the original corpus this appears a lot; some of the original uses were:

var vertices = [];
vertices.push(x, -y, 0);
_this.setAttribute('position', new Float32BufferAttribute(vertices, 3));

Our smart code completion tool outputs this:

vertices = this.groups.prototype.color.clearcoatnormalmap

Which doesn’t make sense. However:

vertices = this.groups;

Would have made sense, as it appears several times throughout the codebase. We just need a lower temperature.

Now we use a temperature of 0.01 and our prompt is:

"ditherin"

And the real code has uses like:

this.dithering = source.dithering;
dithering_fragment: dithering_fragment,
parameters.dithering ? '#define DITHERING' : '',

And our code completion tool outputs:

dithering = new vector3();

Which looks better, but this particular line of code never appears in the original codebase.

It's very clear what's happening: our character-level code generation makes sense when you look at the words hyper-locally, but it's just not capturing the larger meaning of even a short line of code.

We could do word-level tokenization or stack LSTM layers.

So here I stacked two LSTM layers; make sure the preceding LSTM returns full sequences:

return_sequences=True
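A sketch of the stacked version, reusing the earlier setup; the layer widths are assumptions:

model = models.Sequential([
    # The first LSTM must return the full sequence so the second LSTM gets one timestep per character
    layers.LSTM(128, return_sequences=True, input_shape=(maxlen, len(chars))),
    layers.LSTM(128),
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")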

Our prompt is:

"py.call(th"

And the real code has uses like:

Light.prototype.copy.call(this, source);
_Object3D.prototype.copy.call(this, source, false);

And our code completion tool outputs:

py.call(this, context);

Which isn't bad, but it's also not quite capturing our intent.

To improve this in general we could add a couple more features, like doing code completion only from the start of a line and doing word- or n-gram-level code prediction.

Other Articles

This post is part of a series of stories that explores the fundamentals of deep learning:

1. Linear Algebra Data Structures and Operations
Objects and Operations
2. Computationally Efficient Matrices and Matrix Decompositions
Inverses, Linear Dependence, Eigen-decompositions, SVD
3. Probability Theory Ideas and Concepts
Definitions, Expectation, Variance
4. Useful Probability Distributions and Structured Probabilistic Models
Activation Functions, Measure and Information Theory
5. Numerical Method Considerations for Machine Learning
Overflow, Underflow, Gradients and Gradient Based Optimizations
6. Gradient Based Optimizations
Taylor Series, Constrained Optimization, Linear Least Squares
7. Machine Learning Background Necessary for Deep Learning I
Generalization, MLE, Kullback-Leibler Divergence
8. Machine Learning Background Necessary for Deep Learning II
Regularization, Capacity, Parameters, Hyper-parameters
9. Principal Component Analysis Breakdown
Motivation, Derivation
10. Feed-forward Neural Networks
Layers, definitions, Kernel Trick
11. Gradient Based Optimizations Under The Deep Learning Lens
Stochastic Gradient Descent, Cost Function, Maximum Likelihood
12. Output Units For Deep Learning
Stochastic Gradient Descent, Cost Function, Maximum Likelihood
13. Hidden Units For Deep Learning
Activation Functions, Performance, Architecture
14. The Common Approach to Binary Classification
The most generic way to setup your deep learning models to categorize movie reviews
15. General Architectural Design Considerations for Neural Networks
Universal Approximation Theorem, Depth, Connections
16. Classifying Text Data into Multiple Classes
Single-Label Multi-class Classification
17. Convolutional Models Overview
Convolutions, Kernels, Downsampling & Properties
18. Working Understanding of Convolutional Models
Creating, Preprocessing, Data Augmentation, Feature Extraction, Fine Tuning
19. Convolutional Models for Sequential Data
And easing into Recurrent Neural Networks
20. Recurrent Models Overview
Recurrent Layers: SimpleRNN, LSTM, GRU
21. Language Processing with Recurrent Models
Bidirectional RNNs, Encoding, Word Embeddings and Tips
22. Language Generation with Recurrent Models
LSTM, Sampling, Smart Code Completion Tool

Up Next…

Coming up next is probably more Computational Linguistics Theory. If you would like me to write another article explaining a topic in-depth, please leave a comment.

For the table of contents and more content click here.

