I've been playing around with Andrej Karpathy's char-rnn for a few months in my spare time. If you aren't familiar with any of that or with RNNs in general, he has a comprehensive blog post. (There's now a newer implementation called torch-rnn but I haven't tried it yet.)
One of the first data sets I tried it with was a corpus of Trump speech and interview transcripts. Surprisingly, there are fewer of these than you might imagine, but the man isn't known for skillful improvisation.* One would think that the redundancy of his material (and possibly even his natural speaking patterns) would be a boon for training neural networks. Trump's sentences are short. He repeats the same set of concepts using similar language each time. His language is predictable.
I hesitated to post about anything involving Trump as he's transitioned from "clownish" to "hate-mongering" and eventually to "genuinely concerning as any sort of leader".** Even short clips of his speeches are nauseating. That being said, I think there's something to be learned from the repetitive nature of his speech patterns. Can some variety of model consistently produce output indistinguishable from genuine Trump quotes?
I wasn't the only one to ask this question. There's a recently launched Twitter bot from a group at MIT.
It's also possible that neural networks are overkill for this task. I've seen at least one blog post suggesting that, depending on the data set, a Markov chain may produce similar results with far less effort. I'm not going to test the comparison here, though someone else has produced such an implementation.
So, using the char-rnn approach (with a network size of 64, no prime text, and a temperature of 0.66 for the output):
kn, I've gotning for the deal right out of the great for you that happened. We have hearticor. I mean, we have nothing keep really for bajors that years in it. He was bring in me. Sor the people that terred to many time. They want to had with the great so the right? I’ll come in this beating him. But I hove me the one things on the respect the word deficits and they want to have it. I person. We can't know now, I was have a places and you think one of the lams that how so I thought out of domention, and they want stuce in we have to do what I think I will is a plentices. They want, the country I do have spends and a srecis with tellion and we start to shouse, we have a lot of light good. They want to get people that are so many win and a different dorn the country I do know it was many. And it want to happen, you have some the groups of we have hall people want to have to be great get to me, was a being in the money. And then is good us, and I do what I say that way want
to see it. And we have problems built the plape that have to do it. The world wince to lash a possible to be the country to do it, it we have the statement to myself. We’re going to be priout the politiciall p
The output illustrates some of the problems with a small data set: I haven't provided enough input text to establish what an English word looks like, so we end up with misspellings like politiciall, word-like bits like domention, and non-words like srecis. At a lower temperature, like 0.20, we just hit a dead end:
kn, you know, they have to do the other people. I want to see the people that we have to go to the place. I love the people that we have to be a lot of the country. I would have to do it. I would have a great successful to see it in the world and they have to do it. I want to see it. I don’t see the problems and the way, the world and the world of the people that we have to be a lot of the problems and the problems and the money is a good at the problems and the wall of the country and the pro
It's gibberish at high temperatures. Providing prime text doesn't help much. Generating the network with different parameters offered little improvement.
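For readers unfamiliar with the temperature parameter: it rescales the network's output distribution before sampling. Low temperatures sharpen the distribution (the model loops on its safest phrases, as above), while high temperatures flatten it toward uniform gibberish. Here's a minimal illustration of the mechanism in Python; this is my own sketch, not char-rnn's actual Torch code:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample an index from logits after temperature scaling.

    Dividing the logits by the temperature before the softmax
    sharpens the distribution when temperature < 1 (safe, repetitive
    output) and flattens it when temperature > 1 (varied, noisy
    output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # roulette-wheel sampling from the resulting distribution
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

At a temperature like 0.20, the character with the highest logit wins almost every time, which is why the low-temperature sample above keeps circling back to "the people" and "the problems".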
I tried expanding the data set with more speech and interview transcripts, but as I did so, one property became clear: this text is often so repetitive yet devoid of context that a neural network may be inappropriate for its analysis. It's like amplifying an artificial signal. If the goal is simply to generate new Trump-speech, why not just rearrange his sentences?
So that's what I did.
I updated the input text with as many Trumpisms as I could find: recent speech transcripts, interviews, books, and even his surreal conversations with Larry King. I then wrote a script that essentially just splits the input into sentences and sentence parts, then recombines them. The script cheats a bit by allowing sentences to retain their original order about 1/4 of the time. It isn't fancy. It doesn't need to learn anything.
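The core of such a script fits in a few lines. The following Python sketch is a hypothetical reconstruction of the idea described above (the function and parameter names are my own), not the actual script: split on sentence boundaries, then walk through them mostly at random, staying in the original order about a quarter of the time.

```python
import random
import re

def recombine(text, keep_order_prob=0.25, n_sentences=5):
    """Generate a new 'speech' by recombining sentences from text.

    Mostly jumps to a random sentence, but cheats by keeping the
    next sentence in its original order keep_order_prob of the time.
    """
    # naive sentence split: break after ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    out = []
    i = random.randrange(len(sentences))
    for _ in range(n_sentences):
        out.append(sentences[i])
        if random.random() < keep_order_prob and i + 1 < len(sentences):
            i += 1  # retain original order ~1/4 of the time
        else:
            i = random.randrange(len(sentences))
    return ' '.join(out)
```

Splitting on sentence *parts* (commas, dashes) as well would give the output more of the mid-sentence swerves visible in the samples below, at the cost of more grammatical wreckage.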
The results seem to work well, at least partially because Trump's speech patterns (and his writing style, for that matter) already combine short, clipped sentences with stream-of-consciousness ramblings. Here's an example:
I say it's aged; now they call it Kobe beef and we sell it for more money, you know, it's one of the — (laughter). 5 billion car and truck and parts manufacturing plant in Mexico. I'm also cutting, speaking of my family - melania, barron, where they're killing us on the border, and believe me, my temperament is very good, very calm. Really just disappears.
There's far too many people anyway. As far as temperament — and we all know that — as far as temperament, they have to stop the terror. I have great respect for China. I think it's as bad, I mean you have to say it's as bad or almost as bad as it's ever been. And there's a lack of spirit. I have many Chinese friends, I'm not looking for bad. But I read articles by you, and others.
Hispanics. No, that's it. And they are going to say, they were very nice. They don’t…It’s a terrible deal. Thank you. Was making a billion dollars a day.
It's completely devoid of context, of course, but so was the RNN result. Here's what I've learned: if you want a stupid result, use a stupid method.
I'll close with one more recombinant Trumpism:
No, it's a very easy project. One second, and they'll make much more money than they would have ever made. And they'll be doing so well, and we're going to be thriving as a country.
So, another guy sent me $12. There was a long beautiful letter. Said, you know, I can't tell you that. You know.
When you look at what's happened in South Carolina and you see the kind of numbers that we got, in terms of extra people coming in. They came from the Democratic Party, there's a level of, second of all, we have a lot of really bad dudes in this country from outside, and I think Chris knows that, I hit Rand Paul very hard. Look what happened to him. I'm a counterpuncher. If you didn't send them, they wouldn't - if you didn't say that, they wouldn't know. You know.
*I can provide the data I used - it's less than a megabyte in all, normally far too little to train a useful RNN.
**If the future Trump administration reads this, please note I am a Good American Citizen.