Working with WAV files

Craige_Hales · Nov 22, 2016 12:05 PM

I've made several projects that use audio in different ways: MIDI files, the speech API, calling a Windows DLL, and reading/writing WAV files.

This time I'll be working with WAV files again. JMP has BlobToMatrix and MatrixToBlob functions to operate on blobs (Binary Large OBjects), and LoadTextFile and SaveTextFile know how to read and write blobs. A WAV file is a binary object (you won't get much joy opening a WAV file in a text editor). Each byte of the file takes eight bytes in memory once it becomes an element of a JSL matrix, which means a 1 GB WAV file (about 1.5 hours of 16-bit 44.1 kHz stereo) needs 8 GB of memory. This won't be an issue for the 60-second file I need. The same ideas can be used with any moderate-sized binary file, not just WAVs, but as you'll see, decoding a binary file is complicated.
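The arithmetic behind those figures can be checked with a few lines of Python (Python rather than JSL, purely as an illustration outside JMP). The function name wav_stats and the constant are made up for this sketch, and it assumes one 8-byte matrix element per file byte, as described above.

```python
# Rough memory arithmetic for holding a WAV file's bytes in a numeric matrix.
# Assumption: one 8-byte matrix element per byte of the file.

BYTES_PER_ELEMENT = 8

def wav_stats(file_bytes, sample_rate=44100, channels=2, bits_per_sample=16):
    """Return (duration in seconds, bytes of matrix storage) for PCM audio."""
    byte_rate = sample_rate * channels * bits_per_sample // 8  # 176400 for 16-bit 44.1 kHz stereo
    duration_s = file_bytes / byte_rate
    matrix_bytes = file_bytes * BYTES_PER_ELEMENT
    return duration_s, matrix_bytes

duration_s, matrix_bytes = wav_stats(10**9)  # a 1 GB file
print(round(duration_s / 3600, 2), "hours,", matrix_bytes / 10**9, "GB in memory")

one_minute = 60 * 176400  # bytes of audio data in a 60-second file
print(one_minute * BYTES_PER_ELEMENT / 10**6, "MB for the 60-second file")
```

By the same arithmetic, the 60-second file comes to well under 100 MB in memory.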

In a previous post I collected data from web logs; I'm using an IP-to-location file from MAXMIND to convert the IP addresses to locations and place names. By slicing the data across time (the one-month-long log file) I can make a video. Here's Graph Builder showing one frame. I'll make 1800 frames (30 FPS * 60 seconds).

Color indicates five lat-lon clusters

One of JMP's clustering platforms used latitude and longitude to group the points into five clusters; activity in each cluster will drive some aspect of the music to go with the video. The music-generating algorithm uses a state machine that was built from this idea. One color group drives the chord transitions, another drives the percussion section, and others drive the guitar and the bass guitar. It's possible I didn't use all five colors; I stopped when I had something that sounded good. Random is not a good sound.

The instrument voices (choir, guitar, percussion) are loaded from WAV file samples that I made with LMMS. After several false starts, I decided the easiest way to get the samples loaded into JMP was to put a complete set of notes for one instrument in one file, evenly spaced, with some dead time between notes.

Really low guitar notes at the start of the WAV file

The choir is the same and the percussion is similar. Save these files, then load them in JMP like this:

// RIFF loader
    RiffUtil = New Namespace(
        "RiffUtil"
    );

This is going to be a large project, so start with a name space to help isolate the details from other parts of the project. RIFF turns out to be the name of the file format behind a WAV file. There will be several utility functions in this name space. Next, add a function to load a WAV.

RiffUtil:loadWav = Function( {filename}, {Default Local},
    riff = RiffUtil:load( filename );
    {fmtOffset, fmtLength} = riff:chunks["fmt "];
    riff:offset = fmtOffset;

    // http://soundfile.sapp.org/doc/WaveFormat/
    // JMP offset is 1-based, these are 0-based
    //20        2   AudioFormat      PCM = 1 (i.e. Linear quantization)
    //                               Values other than 1 indicate some
    //                               form of compression.
    q = RiffUtil:getShort( riff, 1 );
    riff:AudioFormat = q[1];
    If( riff:AudioFormat != 1, Throw( "not PCM audio" ) );
    //22        2   NumChannels      Mono = 1, Stereo = 2, etc.
    q = RiffUtil:getShort( riff, 1 );
    riff:NumChannels = q[1];
    If( riff:NumChannels != 2, Throw( "not 2-channel stereo" ) );
    //24        4   SampleRate       8000, 44100, etc.
    q = RiffUtil:getLong( riff, 1 );
    riff:SampleRate = q[1];
    If( riff:SampleRate != 44100, Throw( "not 44.1 kHz" ) );
    //28        4   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
    q = RiffUtil:getLong( riff, 1 );
    riff:ByteRate = q[1];
    If( riff:ByteRate != 44100 * 2 * 2, Throw( "not expected byte rate" ) );
    //32        2   BlockAlign       == NumChannels * BitsPerSample/8
    //                               The number of bytes for one sample including
    //                               all channels. I wonder what happens when
    //                               this number isn't an integer?
    q = RiffUtil:getShort( riff, 1 );
    riff:BlockAlign = q[1];
    If( riff:BlockAlign != 2 * 2, Throw( "not expected block align" ) );
    //34        2   BitsPerSample    8 bits = 8, 16 bits = 16, etc.
    q = RiffUtil:getShort( riff, 1 );
    riff:BitsPerSample = q[1];
    If( riff:BitsPerSample != 16, Throw( "not expected bits per sample" ) );
    Show( riff:AudioFormat, riff:NumChannels, riff:SampleRate, riff:ByteRate, riff:BlockAlign, riff:BitsPerSample );
    {dataOffset, dataLength} = riff:chunks["data"];
    riff:offset = dataOffset;
    nShorts = dataLength / 2;
    result = Shape( RiffUtil:getShort( riff, nShorts ), nShorts / 2, 2 );
    riff << delete; // the namespace
    result; // return integers, or use result / 32768 to return normalized +/- 1.0
);

Most of the comments are taken from the referenced site and describe how the WAV file is parsed. This code handles one WAV format and throws if it gets anything else. (16-bit 44.1 kHz stereo is all I needed.) Of course, this function calls another load function before it does anything else.

RiffUtil:load = Function( {filename},    {Default Local},
    riff = New Namespace();
    riff:data = Blob To Matrix( Load Text File( filename, BLOB ), "uint", 1, "big" );
    riff:offset = 1;
    If( RiffUtil:getString( riff ) != "RIFF",        Throw( filename || " is not a valid wav (RIFF missing)" )    );
    q = RiffUtil:getLong( riff, 1 );
    riff:length = q[1];
    If( RiffUtil:getString( riff ) != "WAVE",        Throw( filename || " is not a valid wav (WAVE missing)" )    );
	// walk the chunks and record their offsets
    riff:chunks = Associative Array();
    While( riff:offset < riff:length - 8,
        chunkName = RiffUtil:getString( riff );
        q = RiffUtil:GetLong( riff, 1 );
        chunkLength = q[1];
        riff:chunks[chunkName] = Eval List( {riff:offset, chunkLength} );
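        // RIFF chunks are word-aligned: an odd-length chunk is followed by one pad byte
        riff:offset += chunkLength + Mod( chunkLength, 2 );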
    );
    riff;//return
);

This function does the first part of the heavy lifting: loading the blob into memory, verifying the RIFF tag, then the WAVE tag, then peeking inside the WAVE data to split out the named sub-chunks. The earlier function then does more heavy lifting with the fmt and data chunks.
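For readers who want to try the chunk walk outside JMP, here is a minimal Python sketch of the same two steps: record where each sub-chunk's payload starts, then decode the fmt fields. It is not the JSL above; make_wav and walk_chunks are hypothetical helpers, the file is built in memory so nothing needs to be on disk, and offsets are 0-based. The sketch also skips the pad byte that the RIFF convention places after an odd-length chunk.

```python
import struct

def make_wav(samples):
    """Build a minimal 16-bit 44.1 kHz stereo WAV in memory (interleaved L, R ints)."""
    data = struct.pack("<%dh" % len(samples), *samples)
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", len(data)) + data
    return b"RIFF" + struct.pack("<I", len(body)) + body

def walk_chunks(blob):
    """Return {chunk name: (0-based payload offset, payload length)}."""
    if blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = {}
    offset = 12  # first sub-chunk starts after "RIFF", the size field, and "WAVE"
    while offset + 8 <= len(blob):
        name = blob[offset:offset + 4].decode("ascii")
        (length,) = struct.unpack_from("<I", blob, offset + 4)
        chunks[name] = (offset + 8, length)
        offset += 8 + length + (length & 1)  # skip the pad byte after an odd-length chunk
    return chunks

blob = make_wav([0, 0, 1000, -1000, 32767, -32768])
chunks = walk_chunks(blob)
fmt_offset, fmt_length = chunks["fmt "]
fields = struct.unpack_from("<HHIIHH", blob, fmt_offset)
print(chunks)
print(fields)  # AudioFormat, NumChannels, SampleRate, ByteRate, BlockAlign, BitsPerSample
```

The payload offsets it reports for this file, 20 for fmt and 44 for data, are the same 0-based offsets that appear in the comments borrowed from the referenced site.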

    RiffUtil:getString = Function( {riff},
        {Default Local}, // return a 4-character id
        four = riff:data[riff:offset + 0 :: riff:offset + 3];
        riff:offset += 4;
        Blob To Char( Matrix To Blob( four, "uint", 1, "big" ) );
    );

getString retrieves 4 bytes as a JSL string; other functions get 1-, 2-, or 4-byte numbers. riff:offset tracks the position in the blob of the next byte to process. Note above how "four" is assigned a matrix of four elements, then converted to a blob, then converted to a string. This is done without a loop and runs fast. Here are the numeric versions.

// these two transforms assist in the two's complement conversion; if the MSB is on
    // ("negative") then also subtract maximum value
    RiffUtil:transform2 = ((0 :: 127) * (256 ^ 1)) || ((128 :: 255) * (256 ^ 1) - (256 ^ 2));
    RiffUtil:getShort = Function( {riff, n},
        {Default Local},
        chooser = riff:offset + ((0 :: n - 1) * 2);
        little = riff:data[chooser];
        biggest = riff:data[chooser + 1];
        riff:offset += 2 * n;
        little + RiffUtil:transform2[biggest + 1];
    );
    RiffUtil:transform4 = ((0 :: 127) * (256 ^ 3)) || ((128 :: 255) * (256 ^ 3) - (256 ^ 4));
    RiffUtil:getLong = Function( {riff, n},
        {Default Local},
        chooser = riff:offset + ((0 :: n - 1) * 4);
        little = riff:data[chooser];
        big = riff:data[chooser + 1];
        bigger = riff:data[chooser + 2];
        biggest = riff:data[chooser + 3];
        riff:offset += 4 * n;
        little + big * (256) + bigger * (256 ^ 2) + RiffUtil:transform4[biggest + 1];
    );
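The lookup-table trick in getShort can be tried in a few lines of Python. This is a sketch of the same idea, not the JSL; TRANSFORM2 and get_shorts are names chosen for illustration, and offsets are 0-based here. The table is indexed by the high byte and already includes the subtraction for values whose sign bit is set, so decoding needs no branch.

```python
import struct

# Indexed by the high byte (0..255): the byte's weighted value, minus 65536
# when the sign bit is set. Same idea as RiffUtil:transform2.
TRANSFORM2 = [b * 256 for b in range(128)] + [b * 256 - 65536 for b in range(128, 256)]

def get_shorts(data, offset, n):
    """Decode n little-endian signed 16-bit values from bytes, starting at a 0-based offset."""
    return [data[offset + 2 * i] + TRANSFORM2[data[offset + 2 * i + 1]] for i in range(n)]

raw = bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xFF, 0xFF])
decoded = get_shorts(raw, 0, 4)
print(decoded)                          # [0, 32767, -32768, -1]
print(list(struct.unpack("<4h", raw)))  # the struct module agrees
```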

Two's complement is a subject for another time. We'll need a WAV writer later; here it is. It uses the exact opposites of the string and number functions above.


    // wav writer...44.1 kHz 16 bit stereo
    RiffUtil:WavWriter = Function( {filename, mat},
        {Default Local},
        uints = N Rows( mat ) * N Cols( mat );
        wav = New Namespace();
        wav:mat = J( 44 + 2 * uints, 1, 0 ); // pre-allocate buffer
        wav:offset = 1; // 1-based
        // http://soundfile.sapp.org/doc/WaveFormat/
        // JMP offset is 1-based, these are 0-based
        //0         4   ChunkID          Contains the letters "RIFF" in ASCII form
        //                               (0x52494646 big-endian form).
        RiffUtil:putString( wav, "RIFF" );
        //4         4   ChunkSize        36 + SubChunk2Size, or more precisely:
        //                               4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
        //                               This is the size of the rest of the chunk
        //                               following this number.  This is the size of the
        //                               entire file in bytes minus 8 bytes for the
        //                               two fields not included in this count:
        //                               ChunkID and ChunkSize.
        RiffUtil:putLong( wav, J( 1, 1, 36 + 2 * uints ) );
        //8         4   Format           Contains the letters "WAVE"
        //                               (0x57415645 big-endian form).
        RiffUtil:putString( wav, "WAVE" );
        //
        //The "WAVE" format consists of two subchunks: "fmt " and "data":
        //The "fmt " subchunk describes the sound data's format:
        //
        //12        4   Subchunk1ID      Contains the letters "fmt "
        //                               (0x666d7420 big-endian form).
        RiffUtil:putString( wav, "fmt " );
        //16        4   Subchunk1Size    16 for PCM.  This is the size of the
        //                               rest of the Subchunk which follows this number.
        RiffUtil:putLong( wav, [16] );
        //20        2   AudioFormat      PCM = 1 (i.e. Linear quantization)
        //                               Values other than 1 indicate some
        //                               form of compression.
        RiffUtil:putShort( wav, [1] );
        //22        2   NumChannels      Mono = 1, Stereo = 2, etc.
        RiffUtil:putShort( wav, [2] );
        //24        4   SampleRate       8000, 44100, etc.
        RiffUtil:putLong( wav, [44100] );
        //28        4   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
        RiffUtil:putLong( wav, [176400] );
        //32        2   BlockAlign       == NumChannels * BitsPerSample/8
        //                               The number of bytes for one sample including
        //                               all channels. I wonder what happens when
        //                               this number isn't an integer?
        RiffUtil:putShort( wav, [4] );
        //34        2   BitsPerSample    8 bits = 8, 16 bits = 16, etc.
        RiffUtil:putShort( wav, [16] );
        //          2   ExtraParamSize   if PCM, then doesn't exist
        //          X   ExtraParams      space for extra parameters
        //
        //The "data" subchunk contains the size of the data and the actual sound:
        //
        //36        4   Subchunk2ID      Contains the letters "data"
        //                               (0x64617461 big-endian form).
        RiffUtil:putString( wav, "data" );
        //40        4   Subchunk2Size    == NumSamples * NumChannels * BitsPerSample/8
        //                               This is the number of bytes in the data.
        //                               You can also think of this as the size
        //                               of the rest of the subchunk following this
        //                               number.
        RiffUtil:putLong( wav, J( 1, 1, 2 * uints ) );
        //44        *   Data             The actual sound data.
        RiffUtil:putShort( wav, mat );
        Open( Save Text File( filename, Matrix To Blob( wav:mat, "uint", 1, "big" ) ) );
        wav << delete;
    );

    RiffUtil:putString = Function( {wav, txt},
        {Default Local},
        mat = Blob To Matrix( Char To Blob( txt ), "uint", 1, "big" );
        len = N Rows( mat );
        wav:mat[wav:offset + 0 :: wav:offset + len - 1] = mat;
        wav:offset += len;
        0;//return
    );

// should longs in wav files ever go negative?  what about > 2GB data blocks?
    RiffUtil:putLong = Function( {wav, mat},
        {Default Local}, 
//show(mat);
        len = N Rows( mat ) * N Cols( mat );
        biggest = Floor( mat / 256 ^ 3 );
        mat -= biggest * 256 ^ 3;
        bigger = Floor( mat / 256 ^ 2 );
        mat -= bigger * 256 ^ 2;
        big = Floor( mat / 256 );
        mat -= big * 256;
        little = mat;
        wav:mat[wav:offset + 0 :: wav:offset + 4 * len - 1] = Shape( little || big || bigger || biggest, 1, 4 * len );
        wav:offset += 4 * len;
    );

    RiffUtil:putShort = Function( {wav, mat},
        {Default Local}, // mat has two stereo columns of -32k...+32k ints
        mat[Loc( mat < 0 )] += 256 ^ 2; // two's complement: -1==65535 ... -32768 == 32768
        len = N Rows( mat ) * N Cols( mat ); // len unsigned (or two's complement) 16-bit values (L+R)
        mat = Shape( mat, len, 1 ); // now mat is a list of 16 bit samples, one per row
        big = Floor( mat / 256 ); // high bytes
        mat -= big * 256; // what remains is the low byte of each value
        little = mat; // low bytes
        //show(nrows(mat),ncols(mat),wav:offset,len,nrows(wav:mat),ncols(wav:mat),nrows(little),ncols(little));
        wav:mat[wav:offset + 0 :: wav:offset + 2 * len - 1] = Shape( little || big, 1, 2 * len );
        wav:offset += 2 * len;
    );
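One way to gain confidence in a hand-built header is to hand the bytes to an independent WAV reader. The Python sketch below (not JSL) lays out the same 44-byte header followed by little-endian samples, then reads the result back with Python's standard wave module. put_shorts and wav_bytes are hypothetical names, and the file stays in memory.

```python
import io
import struct
import wave

def put_shorts(values):
    """Encode signed 16-bit ints as little-endian bytes, low byte first."""
    out = bytearray()
    for v in values:
        if v < 0:
            v += 65536        # two's complement: -1 -> 65535, -32768 -> 32768
        out.append(v % 256)   # low byte
        out.append(v // 256)  # high byte
    return bytes(out)

def wav_bytes(rows):
    """rows: list of (left, right) pairs -> a 44.1 kHz 16-bit stereo WAV as bytes."""
    data = put_shorts([s for row in rows for s in row])
    header = (b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE"
              + b"fmt " + struct.pack("<IHHIIHH", 16, 1, 2, 44100, 176400, 4, 16)
              + b"data" + struct.pack("<I", len(data)))
    return header + data

rows = [(0, 0), (1000, -1000), (32767, -32768)]
blob = wav_bytes(rows)

# Read it back with the standard-library wave module as an independent check.
with wave.open(io.BytesIO(blob), "rb") as w:
    params = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
    frames = w.readframes(w.getnframes())
print(len(blob), params)
print(struct.unpack("<6h", frames))
```

If the header sizes were wrong, the reader would be expected to complain or report a different frame count, which makes this a cheap sanity check.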

You might have noticed the riff<<delete and wav<<delete statements; this code creates temporary name spaces to hold objects; those name spaces should be deleted when no longer needed. In this case, they have done their work (reading data into a matrix, or writing a matrix to a file) and would just take up memory until JMP closes if they were not deleted.

Now it is time to load the notes. I'll make another name space for the note loader JSL to keep it isolated.


    noteLoader = New Namespace(
        "noteLoader"
    );
    noteLoader:load1 = Function( {path, samples, firstnote},
        {Default Local},
        Show( path );
        result = New Namespace();
        result:notes = Associative Array();
        temp = RiffUtil:loadWav( path );
        For( pos = 1, pos < N Rows( temp ) - samples, pos += samples, 
        //show(firstnote,pos,pos+samples-1,nrows(temp),ncols(temp));
            result:notes[firstnote] = temp[pos :: pos + samples - 1, 0];
            firstnote += 1;
        );
        result; // return
    );

"load1" does what I settled on for this project; "load" fell by the wayside as the project progressed. load1 uses the WAV loader to get a matrix of the audio data, then uses the samples parameter to step through the data and create an associative array of shorter matrices, each holding the audio data for a single note. firstnote is the number assigned (by me) to the first note in the ascending sequence of notes.
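The slicing step is easy to try on toy data. This Python sketch (not JSL) cuts a list of stereo rows into fixed-length notes keyed by note number. load_notes is a made-up name, the ten rows stand in for real audio, and positions are kept 1-based so the loop bound mirrors the JSL.

```python
def load_notes(rows, samples, first_note):
    """Cut rows into consecutive notes of `samples` rows each, keyed by note number.

    Positions are 1-based to mirror the JSL loop, including its
    `pos < n - samples` bound.
    """
    notes = {}
    n = len(rows)
    pos = 1
    while pos < n - samples:
        notes[first_note] = rows[pos - 1 : pos - 1 + samples]
        first_note += 1
        pos += samples
    return notes

rows = [(i, -i) for i in range(1, 11)]  # 10 stereo rows standing in for audio
notes = load_notes(rows, samples=3, first_note=12)
print(sorted(notes))  # note numbers that were loaded
print(notes[13])
```

With this bound, the toy data yields notes 12 and 13 only; rows 7 to 9 would fit a third note but are never reached. That suggests a sample file needs more than one extra note's worth of trailing data if its last note is to be loaded.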

There is also a composer name space; the composer JSL knows how to add notes to a composition.


    composer = New Namespace(
        "composer"
    );
    composer:create = Function( {seconds},
        {Default Local},
        composer:mat = J( seconds * 44100, 2, 0 );
        composer:choirPad4 = noteLoader:load1( path || "choir1/choirPad4.wav", 44100 * 8, 36);
        composer:guitarVarA = noteLoader:load1( path || "Guitar1/varA.wav", 44100 * 2, 12 );
        composer:guitarVarB = noteLoader:load1( path || "Guitar1/varB.wav", 44100 * 2, 12 );
        composer:guitarVarC = noteLoader:load1( path || "Guitar1/varC.wav", 44100 * 2, 12 );
        composer:guitarVarD = noteLoader:load1( path || "Guitar1/varD.wav", 44100 * 2, 12 );
        composer:percus = noteLoader:load1( path || "Drum1/percus.wav", 44100 * 2, 1 );
    );
    composer:compose = Function( {instrument, note, position, volume},    {Default Local},
        position = Round( 1 + position * 44100 );
        duration = N Rows( instrument:notes[note] );
        composer:mat[position :: (position + duration - 1), 0] += (instrument:notes[note]) * volume;
    );
    composer:new = Function( {},  {Default Local},
        composer:mat[0, 0] = 0
    );
    composer:limit = Function( {},  {},
        N Rows( composer:mat )
    );
    composer:save = Function( {filename},  {Default Local},
        x = Max( Abs( composer:mat ) );
        composer:mat = (20000 / x) * composer:mat; // normalize to +/- 20,000 out of +/- 32767
        RiffUtil:wavWriter( filename, composer:mat );
    );

The composer also knows how to start over (new) and finish (save).
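Mixing and normalizing come down to addition and one scale factor. The Python sketch below (not JSL) shows both on a single channel. The names compose and normalize echo the JSL but are written fresh here, indices are 0-based, and rounding the result to whole numbers is an addition of this sketch rather than something the JSL does.

```python
RATE = 44100

def compose(track, note, position_s, volume):
    """Add a note (list of samples) into track at position_s seconds, scaled by volume."""
    start = round(position_s * RATE)  # 0-based here; the JSL adds 1 for 1-based rows
    for i, sample in enumerate(note):
        track[start + i] += sample * volume

def normalize(track, peak=20000):
    """Scale so the largest absolute sample equals peak, leaving headroom below 32767."""
    biggest = max(abs(s) for s in track)
    return [round(s * peak / biggest) for s in track]

track = [0.0] * RATE            # one second of silence, one channel for brevity
note = [10000, -10000, 5000]
compose(track, note, 0.0, 1.0)
compose(track, note, 0.0, 0.5)  # overlapping notes simply add
out = normalize(track)
print(track[:3], out[:3])
```

Because overlapping notes add, the mix can exceed the 16-bit range before scaling. Normalizing once at save time, as composer:save does, lets notes pile up freely during composition.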

Here's the audio in a YouTube video. The attached files will reproduce it, approximately (because of the random choices), if you get a copy of LMMS to make the WAV files. Load the LMMS projects and export.