The Visual Microphone: Passive Recovery of Sound from Video

Reviews


Information

Paper topic: Images
Software type: Code
Able to run a replicability test: True
Replicability score: 3
Software language: Matlab / Mathematica / ..
License: unspecified
Build mechanism: Not applicable (python, Matlab..)
Dependencies: matlab / pyrTools
Documentation score {0,1,2}: 1
Reviewer: Nicolas Bonneel <nicolas.bonneel@liris.cnrs.fr>
Time spent for the test (build->first run, timeout at 100min): 40min

Source code information

Code URL: http://people.csail.mit.edu/abedavis/research/VisMic/VMSlim.zip
MD5 hash of the code archive: 05EF484CB2D16F8886F3DCE13187991B

Comments

The code only partially implements the paper: it does not support recovering sound from low-frame-rate videos by exploiting the rolling shutter.
Among the remaining high-fps videos, some did not work at all. They failed with errors that appeared seemingly at random, either "Unable to read the file." or "Dot indexing is not supported for variables of this type." (line 275 of VideoReader/read). I could not debug these errors; they may be caused by a codec issue. This was the case for Chips2-2200Hz-Mary_MIDI-input.avi, Chips1-2200Hz-Mary_Had-input.avi and Plant-2200Hz-Mary_MIDI-input.avi.
I successfully ran the code on Chips1-20000Hz-Mary_Had-input.avi. The script loads a file 'crabchipsRamp.avi' that I could not find, so it has to be adapted. Set dsamplefactor = 1 instead of 0.1, because with 0.1 the result is almost pure noise. Also set samplingrate = 20000, to match the video. **Beware** that the default is nscales = 1, whereas the paper's results were produced with nscales = 4 (page 4 of the paper). However, I did not hear much difference in the result.
With these settings, I recovered a sound in about 1.5 hours on a good laptop. The sound is much noisier than the result shown on the accompanying webpage, though still impressive. The resulting spectrogram can be found here: https://pasteboard.co/ILOq404.png
and the corresponding sound here: https://voca.ro/3qdSKf1zGkX
The webpage states that the outputs were further processed with "speech enhancement audio denoising" (the paper cites [Loizou 2005]). I could not find code for that algorithm.
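Without that code, plain magnitude spectral subtraction is one rough substitute for post-processing the recovered sound. The sketch below is *not* the [Loizou 2005] algorithm. It is a swapped-in technique that is much simpler, and it assumes that the first few analysis frames of the recovered signal contain noise only. The function name and its default parameters (`frame_len`, `hop`, `noise_frames`, `floor`) are hypothetical choices. They do not come from the paper or from VMSlim.

```python
import numpy as np


def spectral_subtraction(x, frame_len=512, hop=256, noise_frames=10, floor=0.02):
    """Denoise `x` by subtracting an average noise magnitude spectrum.

    Hypothetical helper; assumes the first `noise_frames` STFT frames are
    noise only. The noisy phase is kept, and magnitudes are floored at
    `floor` times their original value instead of going negative.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = [i * hop for i in range(n_frames)]

    # Short-time Fourier transform of Hann-windowed frames.
    frames = np.stack([x[s:s + frame_len] * window for s in starts])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Noise estimate: mean magnitude over the assumed noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Back to the time domain with weighted overlap-add.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros_like(x)
    norm = np.zeros_like(x)
    for i, s in enumerate(starts):
        out[s:s + frame_len] += clean[i] * window
        norm[s:s + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

The spectral floor keeps a small fraction of each bin rather than zeroing it, which limits the "musical noise" typical of this method. On a real recovery, `noise_frames` should cover a stretch where the object is known to be silent.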
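Since MATLAB R2015, wavwrite has been replaced by audiowrite. Note that the argument order differs: wavwrite(y, Fs, filename) becomes audiowrite(filename, y, Fs).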
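Some readers may want to write the result outside MATLAB. The sketch below is a minimal Python stand-in for that final write step and uses only the standard-library `wave` module. It is not part of VMSlim, and `write_recovered_wav` and `nyquist_hz` are hypothetical helpers. In the paper's formulation, each video frame contributes one sample of the recovered signal. The WAV rate must therefore equal the video frame rate (the samplingrate setting), and the highest recoverable frequency is half of it. That limit is 1100 Hz for the 2200Hz videos and 10 kHz for the 20000Hz one.

```python
import io
import math
import struct
import wave


def nyquist_hz(frame_rate_fps):
    """Highest frequency recoverable when each video frame yields one audio sample."""
    return frame_rate_fps / 2.0


def write_recovered_wav(samples, frame_rate_fps, fileobj):
    """Write a recovered signal as 16-bit mono PCM at the video frame rate.

    Samples are peak-normalized first so that quantizing to 16-bit
    integers cannot overflow (an all-zero signal is left as zeros).
    """
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    pcm = b"".join(struct.pack("<h", round(32767 * s / peak)) for s in samples)
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)                    # mono
        w.setsampwidth(2)                    # 2 bytes = 16-bit samples
        w.setframerate(int(frame_rate_fps))  # one audio sample per video frame
        w.writeframes(pcm)


if __name__ == "__main__":
    fps = 20000  # frame rate of Chips1-20000Hz-Mary_Had-input.avi
    # Hypothetical stand-in for a recovered signal: 1 s of a 440 Hz tone.
    recovered = [0.3 * math.sin(2 * math.pi * 440 * n / fps) for n in range(fps)]
    buf = io.BytesIO()
    write_recovered_wav(recovered, fps, buf)
    print(nyquist_hz(fps), len(buf.getvalue()))
```

If a different rate is written into the file header, the same samples play back at the wrong speed and pitch. This is why samplingrate has to be changed together with the input video.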