So much to say: novel generation with birchcanoe

I'm trying something this month, partly for NaNoGenMo (essentially a computational, generative, and generally absurd version of National Novel Writing Month) and partly to address an idea that's been bouncing around in my head for a while now. I'm writing a program to assemble stories (perhaps more like loosely-connected prose, and only slightly like novels) from existing collections of short, stylistically homogeneous sentences. It will start with the Harvard Sentences - a set of carefully engineered, phonetically balanced sentences originally intended for testing speech quality in telephone systems. They all read like this:

 1. Slide the box into that empty space.
 2. The plant grew large and green in the window.
 3. The beam dropped down on the workman's head.
 4. Pink clouds floated with the breeze.
 5. She danced like a swan, tall and graceful.
 6. The tube was blown and the tire flat and useless.
 7. It is late morning on the old wall clock.
 8. Let's all join as we sing the last chorus.
 9. The last switch cannot be turned off.
10. The fight will end in just six minutes.
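
Incidentally, pulling a sample like the ten above takes only a few lines. Here's a minimal sketch of the corpus-loading step, assuming the sentences sit one per line in a plain-text file (the filename is a stand-in for illustration, not the project's actual layout):

    import random

    def load_sentences(path):
        # One sentence per line; skip any blank lines.
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]

    # "harvard_sentences.txt" is a placeholder filename.
    sentences = load_sentences("harvard_sentences.txt")
    print("\n".join(random.sample(sentences, 10)))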

There's a flat but poetic quality to all these sentences, as if their edges have been sanded down, and most of that quality - unsurprisingly - only comes through when they're spoken aloud. So my code will need to generate stories with a similar feel.

The project is called birchcanoe, after the very first of the Harvard Sentences: "The birch canoe slid on the smooth planks." There was a related magnetic-poetry-style software toy a few years back, but I'm going for a larger scope here, though the exact extent remains to be seen. Updates below and on GitHub.


Nov 1 - Started the project. Not much code yet.

Nov 2 - Code is now at the point where it produces randomly-ordered sentences, so it's comparable to very early efforts at book generation. Still too basic to count as innovative.
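
Concretely, "randomly-ordered" means little more than a shuffle cut into chapter-sized slices. A minimal sketch of that step, reusing the sentences list from the loading sketch above (the function name and chapter size are illustrative, not the repo's actual code):

    import random

    def shuffled_chapters(sentences, per_chapter=50):
        # Shuffle a copy of the corpus and slice it into fixed-size "chapters".
        pool = list(sentences)
        random.shuffle(pool)
        return [pool[i:i + per_chapter] for i in range(0, len(pool), per_chapter)]

    for number, chapter in enumerate(shuffled_chapters(sentences), start=1):
        print(f"Chapter {number}\n{' '.join(chapter)}\n")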

Nov 8 - Got sick for a few days, and that tends to fog the coding process. Still, I've decided to use seed sentences and retrieve new, similarly-structured text to populate each chapter, rather than composing each chapter entirely from the initial seeds. I've updated the code accordingly. Output is filler for now, like this:

She saw a cat in the neighbor's house. Saw neighbor's she neighbor's the saw saw she house. In house. Cat house. A in in cat the neighbor's the a cat the saw house. Cat the in cat a house. She a saw house. A cat house. The in cat cat the house. In the saw saw she saw in house. Cat neighbor's saw in cat the she a a neighbor's in a house. The in in in house. In house. The neighbor's saw house. House. The she house. Saw house. Cat house. She a house.
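
The filler works roughly like this: keep the seed sentence, then emit fake sentences by re-drawing words from it. A simplified sketch that reproduces the flavor of the sample above (reconstructed from the output, not the exact code in the repo):

    import random

    def filler_from_seed(seed, n_sentences=20):
        # Re-draw words from the seed; every fake sentence ends with the
        # seed's final word, which is what gives the "... house." cadence.
        words = seed.rstrip(".").lower().split()
        body, last = words[:-1], words[-1]
        pieces = [seed]
        for _ in range(n_sentences):
            drawn = [random.choice(body) for _ in range(random.randint(0, 10))]
            sentence = " ".join(drawn + [last]) + "."
            pieces.append(sentence[0].upper() + sentence[1:])
        return " ".join(pieces)

    print(filler_from_seed("She saw a cat in the neighbor's house."))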

Dec 5 - Obviously I've passed the deadline on this one. That's how life goes sometimes. The project is not over, however: the plan is to record audio of myself reading each of the input sentences, train a speech synthesis model on those recordings (likely with Mozilla's TTS project, the text-to-speech sibling of their DeepSpeech speech-to-text engine, since DeepSpeech itself only goes in the speech-to-text direction), then use that model to produce audiobooks of the generated text. It will sound strange, and that's kind of the point (otherwise, I could just use one of the many extant text-to-speech systems or their APIs).
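
The first chore will be pairing each recording with its transcript so a model can train on them. A minimal sketch of that bookkeeping, assuming an LJSpeech-style layout (a pipe-delimited metadata.csv next to a wavs/ directory, a convention Mozilla's TTS tooling can read; the directory name is a placeholder):

    from pathlib import Path

    def write_metadata(sentences, out_dir="dataset"):
        # The recordings themselves would be saved as wavs/0001.wav, etc.;
        # this just writes the matching "file_id|text|normalized text" rows.
        out = Path(out_dir)
        (out / "wavs").mkdir(parents=True, exist_ok=True)
        rows = [f"{i:04d}|{s}|{s}" for i, s in enumerate(sentences, start=1)]
        (out / "metadata.csv").write_text("\n".join(rows) + "\n", encoding="utf-8")

    write_metadata(["The birch canoe slid on the smooth planks."])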