Wednesday, February 22, 2012

Real-time spectrogram (and other audio-related data) visualization using Marsyas and OpenCV

Visualizing the output of Marsyas networks can be tricky, because the data is streamed in real time. I have found that a fast way to do it is to use Python's OpenCV bindings, so that we can watch, for example, a spectrogram as it is being streamed. For this to work, you basically need to get the data from a mrs_realvec into a numpy array. To do that, after you have created the network, use:
 
net.tick()
out = net.getControl("mrs_realvec/processedData").to_realvec()
out = numpy.array(out)

This requires import numpy at the top of your script. You will also have to pre-define a 2-dimensional numpy array that stores past values of your spectrogram. In our solution, we want the spectrogram to flow from right to left, hence we need a 2-dimensional array where the columns represent time and the rows represent frequencies. The array may be initialized as:
 
Int_Buff = numpy.zeros([DFT_size, nTime])

where DFT_size is the size of your DFT (it must match the length of out) and nTime is the number of time samples you want to store. After getting the out array, you should remove the first column of Int_Buff and append the new data as the last column. Before doing so, it is necessary to add a dimension to out and transpose it (this is due to the way numpy works: a one-dimensional array cannot be horizontally stacked onto a two-dimensional array, and the conversion yields a row array, which is not what we want at this point). So, we will have:
 
if numpy.ndim(out) == 1:      # If out is a 1-dimensional array,
    out = numpy.array([out])  # convert it to a 2-dimensional (1 x N) row array
Int_Buff = Int_Buff[:, 1:]    # Remove the first (oldest) column of Int_Buff
Int_Buff = numpy.hstack([Int_Buff, numpy.transpose(out)])  # Transpose to a column / horizontal stack
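
The buffer update above can also be wrapped in a small helper. What follows is only a minimal sketch, not the code from spectral_analysis.py: the function name push_column and the demonstration sizes are made up for illustration, and reshape(-1, 1) is used as a shortcut for the "add a dimension, then transpose" step.

```python
import numpy as np

def push_column(buffer, frame):
    """Drop the oldest column of `buffer` and append `frame` on the right.

    buffer: 2-D array of shape (n_bins, n_time)
    frame:  one spectrum, either 1-D (n_bins,) or a row array (1, n_bins)
    (push_column is a hypothetical helper name, not a Marsyas function.)
    """
    # Turn the frame into a column of shape (n_bins, 1)
    column = np.asarray(frame, dtype=buffer.dtype).reshape(-1, 1)
    # Keep every column except the first, then stack the new one last
    return np.hstack([buffer[:, 1:], column])

# Demonstration with made-up sizes: 4 frequency bins, 3 time steps
buf = np.zeros((4, 3))
buf = push_column(buf, [1.0, 2.0, 3.0, 4.0])
buf = push_column(buf, [5.0, 6.0, 7.0, 8.0])
```

The buffer keeps its shape on every call, so the image shown on screen always has the same size while its contents scroll to the left.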

From that, you may use the function array2cv(), defined in http://opencv.willowgarage.com/wiki/PythonInterface, to convert from a numpy array to cv's image format. Of course, for that you will need to have:
 
import cv
# array2cv() is assumed to be copied into your script from the wiki page above
im = array2cv(Int_Buff)

Remember that before dealing with images you will need to create a window where things will be displayed. For that, use:
 
cv.NamedWindow("Marsyas Spectral Analysis", cv.CV_WINDOW_AUTOSIZE)

So, the following lines tell OpenCV to show your data:
 
cv.ShowImage("Marsyas Spectral Analysis", im)
cv.WaitKey(10)

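If you only do the steps above, you will probably get a black screen. You will want to normalize your output array before stacking it into the buffer (keep in mind that a completely silent frame has a maximum of zero, so the division yields invalid values), using: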
 
out = out/numpy.max(out)

Also, you may notice that, so far, the bass frequencies are at the top of the screen while the trebles are at the bottom. That is the reverse of what we usually want. We need to reverse the order of the output array while it is still one-dimensional (that is, before the extra dimension is added), using:
 
out = out[::-1]
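
The normalization and the flip can be combined into one step applied to each new frame before it is stacked. The following is a minimal sketch under my own assumptions: the name prepare_frame and the eps threshold are hypothetical, and the guard against silent frames is an addition that is not described above.

```python
import numpy as np

def prepare_frame(out, eps=1e-12):
    """Normalize one spectrum to a peak of 1 and put treble first.

    prepare_frame and eps are illustrative names, not part of Marsyas.
    """
    out = np.asarray(out, dtype=float).ravel()  # force a 1-D array
    peak = np.max(out)
    if peak > eps:          # skip the division on (near-)silent frames
        out = out / peak
    return out[::-1]        # reverse so high frequencies come first

frame = prepare_frame([0.0, 2.0, 4.0, 1.0])   # loudest bin becomes 1.0
silent = prepare_frame([0.0, 0.0, 0.0])       # stays all zeros, no NaN
```

Reversing while the array is 1-D matters: on a (1, N) row array, [::-1] would reverse the single row index and leave the frequency order unchanged.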

All of these ideas are coded in the spectral_analysis.py utility, which is already in the Marsyas repository. The actual implementation adds some other features, for example the possibility of trimming the spectrogram so that only a certain frequency range is shown. The current program is an example implementation and may be expanded for other uses if necessary. If you just want to see what your voice's spectrogram looks like, try it:
 
python spectral_analysis.py