or FFT and DTMF, which is close to what you describe. So is FFT Video.
Are you opening a file or looking at a never-ending stream, maybe over a socket connection?
I've been through several iterations of reading WAV files with JMP; WAV is a complicated format. If you constrain the WAV file input to 44.1 kHz stereo, for example, you can simplify some things. You might even get away with skipping several hundred bytes and treating the rest of the file as samples. You could also use Python to read the samples and pass them back to JMP.
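If you go the Python route, something like this rough sketch might be a starting point. It uses the standard-library wave module to parse the header instead of skipping a fixed number of bytes. The function name read_wav_samples is made up for illustration, and the sketch assumes uncompressed 16-bit PCM; other sample widths or compressed files would need more work. Passing the result back to JMP is not shown.

```python
import struct
import wave


def read_wav_samples(path):
    """Return (sample_rate, channels); channels is one list of ints per channel.

    Assumption: the file is uncompressed 16-bit PCM. Anything else raises.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    # 16-bit WAV samples are little-endian signed integers,
    # interleaved by channel (L, R, L, R, ... for stereo).
    count = len(raw) // 2
    interleaved = struct.unpack("<%dh" % count, raw)

    # De-interleave: every n_channels-th sample belongs to the same channel.
    channels = [list(interleaved[c::n_channels]) for c in range(n_channels)]
    return sample_rate, channels
```

For a 44.1 kHz stereo file this should return 44100 and two lists, left channel first.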
I've never gotten serious about training models, but I imagine using the FFT to move the signal from the time domain to the frequency domain would be a good first step. The FFT lets you think about the frequencies that were present during a time slice.
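To make the time-slice idea concrete, here is a minimal sketch in plain Python (no JMP, no NumPy). It runs a textbook radix-2 FFT over one slice of samples and reports the strongest frequency. The names fft and dominant_frequency are just illustrative, and the slice length is assumed to be a power of two; a real project would more likely use a library FFT and a window function.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin below Nyquist, ignoring DC (bin 0)."""
    spectrum = fft(samples)
    half = len(samples) // 2
    k = max(range(1, half), key=lambda i: abs(spectrum[i]))
    # Bin k corresponds to k * sample_rate / N Hz.
    return k * sample_rate / len(samples)


# One 256-sample slice of a 1000 Hz sine sampled at 8000 Hz.
slice_ = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(256)]
peak_hz = dominant_frequency(slice_, 8000)
```

With these numbers each bin is 31.25 Hz wide, so the 1000 Hz tone lands exactly on bin 32 and peak_hz comes out as 1000.0. Tones that fall between bins will smear across neighbors, which is where windowing helps.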
Sounds like a fun project; it would be great to hear more about it!
Craige