Some Thoughts About Learning Procedural Music Generation

Disclaimer: this isn’t a tutorial of any kind.

I’ve been interested in procedural generation ever since I played games like Dwarf Fortress. This kind of generation has haunted me ever since, and when I learned that parts of the Streets of Rage 3 soundtrack were made with procedural generation tools written by Yuzo Koshiro (probably with help from Motohiro Kawashima), I wanted to try building this kind of tool on my own.

In the end, it took me 7 years to seriously dive into the rabbit hole of procedural music generation. And with it, the rabbit hole of music theory.

I’m an average musician. I did, and still occasionally do, a lot of sound experimentation, and while the HNW generator I’m building works nicely, it’s still just noise generation: generate the source, then apply a lot of different effects with FFmpeg and SoX. I knew that generating actual music would be another level of involvement, and first things first, I had to learn the basics of music theory, to at least understand what I have to do to get a basic working generation.

While searching, I came across this tutorial. I learned a lot from it, and while I already knew all the music theory in the post, the technical details really helped me kickstart the project. Once I completed the tutorial, I had a basic working generation and understood the whole process to get there.

After that, I was on my own, reading about music: chord progressions, intervals, melody construction, rhythmic patterns, you name it. The first version of the generation was good enough to progress to the next step. But there was one problem, which I didn’t see coming, and I wonder why.

The output of the tutorial, and therefore the output of my project, is still a pure sine wave (not really, because I implemented chords, but I’m simplifying things). There is no variation in it whatsoever: I get the note, create the sine wave, and continue until I need a new note. It’s not that bad, but if I want the result to be listenable, so I can brag about it, it lacks some stuff.
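To make the "get the note, create the sine wave" step concrete, here is a minimal Python sketch. It is only an illustration, not the tool's actual code: the language, the 44100 Hz sample rate, the A4 = 440 Hz equal-temperament tuning, and the names note_frequency, sine_note and chord are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed value)


def note_frequency(semitones_from_a4: int) -> float:
    """Equal-temperament frequency, counting semitones from A4 = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)


def sine_note(frequency: float, duration: float, volume: float = 1.0) -> list[float]:
    """Return `duration` seconds of a sine wave as floats in [-volume, volume]."""
    n_samples = int(SAMPLE_RATE * duration)
    return [
        volume * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def chord(frequencies: list[float], duration: float) -> list[float]:
    """Sum one sine wave per frequency, scaled so the mix stays in [-1, 1]."""
    waves = [sine_note(f, duration, 1.0 / len(frequencies)) for f in frequencies]
    return [sum(samples) for samples in zip(*waves)]
```

Every note produced this way has the same flat loudness from its first sample to its last, which is exactly the lack of variation described above.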

First, I implemented a simple envelope: a linear ADSR envelope, for Attack, Decay, Sustain, Release. I won’t go into the technical details here, but in short, it’s volume modulation over the length of the note. For example, with this envelope I can say that the note will start at volume 0 and reach volume 1 in 0.2s. That makes things a lot more fluid when listening.
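For the curious, a linear ADSR envelope can be sketched in a few lines of Python. This is one possible implementation under assumptions, not the one in the tool: only the 0.2s attack comes from the example above, while the decay time, sustain level, release time, sample rate and function names are made up for illustration.

```python
SAMPLE_RATE = 44100  # samples per second (assumed value)


def adsr_envelope(n_samples, attack=0.2, decay=0.1, sustain_level=0.7, release=0.2):
    """Return n_samples volume multipliers following a linear ADSR shape.

    attack, decay and release are durations in seconds; sustain_level is
    the volume held between the end of the decay and the start of the release.
    """
    a = round(attack * SAMPLE_RATE)
    d = round(decay * SAMPLE_RATE)
    r = round(release * SAMPLE_RATE)
    # If the note is too short for all three ramps, shrink them proportionally.
    total = a + d + r
    if total > n_samples:
        scale = n_samples / total
        a, d, r = int(a * scale), int(d * scale), int(r * scale)
    s = n_samples - a - d - r  # whatever is left is spent at the sustain level

    envelope = [i / a for i in range(a)]                                  # 0 up to 1
    envelope += [1.0 - (1.0 - sustain_level) * i / d for i in range(d)]   # 1 down to sustain
    envelope += [sustain_level] * s                                       # hold
    envelope += [sustain_level * (1.0 - i / r) for i in range(r)]         # sustain down toward 0
    return envelope


def apply_envelope(samples, envelope):
    """Multiply each sample by the matching envelope value."""
    return [sample * gain for sample, gain in zip(samples, envelope)]
```

Applying it is then just apply_envelope(samples, adsr_envelope(len(samples))), so the note fades in and out instead of starting and stopping at full volume.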

Another thing I have to do is note transitions. Right now, when I reach a new note, I cut the current sine wave and start the new one at 0. I have to find a way to make a smooth transition between two notes. I don’t know if it’s that hard, but I have no idea where to start for now.
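One possible starting point, offered as a sketch rather than as the right answer, is a short linear crossfade: overlap the end of the current note with the start of the next one, fading the first out while the second fades in. The function name crossfade and the overlap parameter are hypothetical, and the example is in Python only for illustration.

```python
def crossfade(first, second, overlap):
    """Join two lists of samples, blending `overlap` samples at the seam.

    The tail of `first` fades out linearly while the head of `second`
    fades in, so there is no abrupt jump in the waveform at the seam.
    """
    overlap = min(overlap, len(first), len(second))
    if overlap == 0:
        return first + second

    head = first[: len(first) - overlap]  # untouched part of the first note
    tail = second[overlap:]               # untouched part of the second note
    blended = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)       # weight of the second note, between 0 and 1
        blended.append(first[len(first) - overlap + i] * (1 - t) + second[i] * t)
    return head + blended + tail
```

Note that the result is `overlap` samples shorter than the two notes placed end to end, so a generator using this would have to account for that to keep its rhythm. An overlap of a few milliseconds (a couple of hundred samples at 44100 Hz) is a reasonable value to experiment with.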

In the future, I want to implement instrument waveforms, but for this, there is a brand new rabbit hole I have to dig into: harmonics and signal processing. If you want to scratch the surface, I found a blog post about this. The whole blog explains a lot about music theory and the technical side of sound, and I’m glad I found it. My journey into signal processing will probably bring me to synthesizers and such, can’t wait.
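To give a taste of what harmonics means here, this is a small additive-synthesis sketch in Python: instead of one sine wave, a note is built from several sine waves at integer multiples of the base frequency. The amplitude values are arbitrary numbers picked for the example, not a model of any real instrument, and the name harmonic_note and the 44100 Hz sample rate are assumptions.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed value)


def harmonic_note(frequency, duration, amplitudes=(1.0, 0.5, 0.25, 0.125)):
    """Build a note by summing harmonics of `frequency`.

    amplitudes[k] is the (positive) weight of harmonic k + 1, i.e. the sine
    wave at (k + 1) * frequency. The sum is divided by the total weight so
    the result stays within [-1, 1].
    """
    total_weight = sum(amplitudes)
    n_samples = int(SAMPLE_RATE * duration)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        value = sum(
            amp * math.sin(2 * math.pi * frequency * (k + 1) * t)
            for k, amp in enumerate(amplitudes)
        )
        samples.append(value / total_weight)
    return samples
```

Changing the weights changes the timbre while the pitch stays the same, which is the basic idea behind making a sine-based generator sound less "artificial".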

All that said, even if the generation is good enough to progress, it lacks some real randomness and chaos for my taste. I added randomness to the chord progression and the note values; next will be the chords themselves, etc. Break enough music theory to be fun. Also, mixing the output of this with that of the HNW generator is very fun too. In the end, even if it sounds a little “artificial” because of the sine wave thing, tweaking the parameters of the generation is very fun.

If you want to check out the tool I wrote, you’ll find it here.

I guess I wrote everything I wanted to write here. I hope you enjoyed reading this, or at least that I didn’t bore you too much.

Have a great day or night, take care everyone.