Haskell synthesizer: playing with functional sound generation

Esta postagem também está disponível em: Portuguese (Brazil)

Some time ago, in a cloudy friday afternoon, I had nothing to do. Me and my friends started talking about the Unix philosophy, and in particular, how everything in Unix is a file, even devices are represented as files. So I thought: the sound card is a file! What happens if I write to it? Then we started feeding lots of files (PDFs, /dev/urandom, some source code and some images) to the sound card and where amused :)

But then we decided to have more fun and feed some more “organized” bytes to the sound card – we would generate our audio samples programmatically. And what better tool to do some data generation than Haskel!? So I started writing my first audio synthesizer, and after some hours it was ready and working awesomely… I was really excited with the results, so I just NEEDED to share this nice piece of code with the world :)

My Haskell tone synthesizer takes a “score file” as input, that describes the tune you want the synthesizer to generate. This input file has a syntax very similar to the old Nokia Ringtone Composer syntax… The program reads the score from the file and outputs a lot of bytes to standard output. If you redirect the standard output to your sound card, you will hear your favorite tunes! So let me guide you through a tour of this program’s source code:

This tour will be in top-bottom style: we will start by looking at the Main module and its major processing steps, and then dwell a bit deeper into some of the most important functions of each module. So, to start, here is the entire Main module:

As you can see in the second line of “main”, we need 2 command-line parameters: “file” and “bpmStr” (which will be read into an Int called “bpm”). “file” is the name of the file containing our score, and “bpm_” is the tempo in which the song will be played (in beats per minute – BPM). We then proceed to parse the score from the file, using the “parseFromFile” function (from Parsec). This function takes a parser to apply and a filename from which to read an input stream. Our conveniently-named “score” parser is defined in a separate file (Melody.hs) and will not be further discussed… You can see it if you download the full source code at the end of the post.

The “parseFromFile” function has type Parser a → String → Either ParseError a, which means its result is Either our desired outcome (Right [Note]) or an error (Left ParseError). We then apply the “either” function over “parsedScore” to do case analysis: in case a perfect parse ocurred, the “id” function is applied and the result is kept as-is; in case a parse error has been found, we apply (error ∘ show) – which just displays the error message on standard output and terminates the program. The next “big step” in the program is to turn the music into concrete form.

The parsed score is a sequence of musical notes with relative durations (1, 1/2, 1/4, etc.) and abstract pitch names (C, D, E, etc.), so the the task of “concrete bpm” is to take each abstract note and make both pitch and duration “concrete”. Its type is:

It takes a Tempo and a Note and produces a pair representing a concrete sound, where the first element is the frequency of this sound (in Hz) and the second element is its duration (in seconds).

After we have our list of sound pairs to play (“concreteMusic”), we reach the last and most important step, which is to actually generate the samples and output them, using “concreteMusic” to guide the generation process. This is done by “produceStream”. For each pair of frequency and duration “(f,t)”, the following is done:

  1. Samples for a (infinite) square wave of frequency f are generated by a call to “signal f”.
  2. This signal is then “sliced” to obtain a piece of length t, by a call to “slice t”

Finally, this whole sequence of “signal slices” is concatenated together into a ByteString which goes to standard output… Now going deeper, we will dive to file “Signal.hs” and take a look at the definitions of “signal” and “slice”, along with some helper functions:

Let’s start explaining this module (“Signal.hs”) from the ground up: our signal is an infinite sound wave. The most characteristic feature of a wave is that it’s periodic, which means it’s just an infinite repetition of a pattern. Therefore we have the “period” function here, which, given the number of samples to generate and the desired waveform, generates a list of samples that follows this waveform. So that’s how “signal” uses “period” in order to make a wave: it takes the requested frequency and then calculates (according to the sample rate) how many samples are there in one period. It uses the “period” function to generate a “singlePeriod”, which it then cycles and packs into an infinite and repetitive ByteString. The “silence” function is only there to produce a “dummy” period when we need a pause. It doesn’t need to have more than one sample, however, because it’s going to be transformed into an infinite ByteString of 0′s anyway…

The “cycleAndPack” function takes one period ([Int]) and transforms it into a (lazy) infinite ByteString. Its type should be very self-evident, after all. The “(map fromIntegral)” part transforms our [Int] into [Word8], which then serves as input for “B.pack”, which actually creates the ByteString, then made infinite (repeated indefinitely) by “B.cycle”.

Now to the slicing! :) The task of the “slice” function is to transform an infinite wave (infinite ByteString) of a certain frequency into a finite wave (finite ByteString), with the requested duration in seconds. The type of slice is:

Which makes evident its role as a “ByteString-transforming” function. The definition of “slice” – in point-free style – also helps explaining what it does: to slice an infinite ByteString is just to take n of its first elements, where n is the product of the desired time in seconds by the sampling rate in Hz.

Before continuing to show you the code, allow me to take a slight detour and say how Haskell’s purely-functional features help us express this program in such an elegant (and still efficient) way:

  1. Lazyness: Due to the possibility of using lazy evaluation, we were able to represent sound waves as infinite streams of samples, separating stream generation from stream usage.
  2. Referential transparency: Haskell’s purity means that every function is transparent. Thus, the value of any expression needs only to be computed once, and can be substituted for all occurrences of that expression. This is important because – in a single song – several notes will have the same frequency (and therefore will result in identical signals). Thanks to referential transparency, it is guaranteed that we only need to compute “signal f” once for each f, and thus we will only have one stream for each frequency in our song.

Well, having understood how “signal” and “slice” work, the puzzle is now almost entirely solved. The only remaining mystery should be how the actual input file (in Nokia-Composer-style) is parsed, but – as I said before – the parser is mostly straightforward and uses just basic Parsec. The input parser is the largest single piece of code in the whole synthesizer (only relatively big, but still small), so I’ll only show you a summarized version here and won’t even explain it. You can grab the full code at the end of the post and take a more careful look… So here it goes, the core of our input parser:

And now, a bit of fun: to finish up the post nicely, I have included some sample song files in the synthesizer’s tarball; then I ran the synthesizer over them and encoded the output as MP3. Now you can listen to two catchy tunes generated by Haskell!

  • Europe – The Final Countdown:
    • Song score:
      p4, p8, b16, a16, b4, e4, p4, p8, c'16, b16, c'8, b8, a4, p4, p8, c'16, b16, c'4, e4, p4, p8, a16, g16, a8, g8, f#8, a8, g4, p8, f#16, g16, a4, p8, g16, a16, b8, a8, g8, f#8, e4, c'4, b2, p4, b16, c'16, b16, a16, b1
    • MP3:

      Audio clip: Adobe Flash Player (version 9 or above) is required to play this audio clip. Download the latest version here. You also need to have JavaScript enabled in your browser.

  • Zelda Main Theme:
    • Song score:
      a#4, f4, f8, f16, a#16, a#16, c'16, d'16, d#'16, f'2, p8, f'8, f'8, f#'16, g#'16, a#'2, p8, a#'8, a#'8, a#'8, g#'16, f#'16, g#'8, g#'16,  f#'16, f'2
    • MP3:

      Audio clip: Adobe Flash Player (version 9 or above) is required to play this audio clip. Download the latest version here. You also need to have JavaScript enabled in your browser.

Finally, as always, there comes what REALLY matters: The Source Code. The code of this synthesizer, along with several of my other “toy programs”, lives in the “Katas” repository on GitHub. You can check it out here: https://github.com/joaopizani/katas . Download the code as a .zip file or even clone the repository and then play freely!… I strongly recommend that you use cabal-dev to build all your Haskell projects, so in case you want to follow my recommendation, the steps to build and run (in Unix) the synthesizer are something like this:

cd ToneSynthesizer
cabal-dev install
./cabal-dev/bin/tonesynthesizer songs/  | aplay -t raw -f U8 -r 16000

I wish you all a whole lot of excitement and fun while coding in Haskell, and would certainly appreciate suggestions and critique regarding the code I just shared with you. That’s all, folks! :)

This entry was posted in Haskell, Tech. Bookmark the permalink.

One Response to Haskell synthesizer: playing with functional sound generation

  1. Derek Elkins says:

    Look up the Karplus-Strong algorithm. Also many wind instruments are well approximated by a static weighted mixture of (decaying) harmonics which you can look up or even record and analyze yourself. In general, I recommend Julius Orion Smith’s pages and references from there. These are things that are extremely easy to implement, particularly in Haskell, and will go a long way toward producing music that’s listenable. Anyway, physically-based sound synthesis and audio processing in general is a rather fun field.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" cssfile="">