VisemeNet: Audio-Driven Animator-Centric Speech Animation




  • Paper topic: Animation and Simulation
  • Software type: Code
  • Able to run a replicability test: True
  • Replicability score: 4
  • Software language: Python
  • License: unspecified
  • Build mechanism: Not applicable (interpreted: Python, Matlab, etc.)
  • Dependencies: tensorflow / cudnn / numpy / scipy / matplotlib / python_speech_features / Maya
  • Documentation score {0,1,2}: 1
  • Reviewer: Nicolas Bonneel <>
  • Time spent for the test (build->first run, timeout at 100min): 40min

Source code information


Note that the VisemeNet code has strict requirements on software and library versions: it runs on Python 3.5 but not on Python 3.6.5. In addition, a fresh Python 3.5 environment now installs scipy 1.4.0 by default, which is not suitable; the older scipy 1.1.0 works.
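Given how version-sensitive the code is, a small sanity check at startup can save time. The sketch below is not part of VisemeNet; it is a minimal, hypothetical helper that compares the running interpreter and a library's dotted version string against the combination that worked in this test (Python 3.5 with scipy 1.1.0):

```python
import sys

# Versions observed to work in this test (per the notes above);
# Python 3.6.5 and scipy 1.4.0 did not work.
REQUIRED_PYTHON = (3, 5)  # major, minor
KNOWN_GOOD_SCIPY = "1.1.0"

def parse_version(v):
    """Turn a dotted version string like '1.4.0' into a comparable int tuple."""
    return tuple(int(p) for p in v.split(".")[:3])

def python_matches(info=sys.version_info, required=REQUIRED_PYTHON):
    """True when the interpreter's major.minor equals the tested version."""
    return (info[0], info[1]) == required

# Example usage: compare a version string such as scipy.__version__
# (not imported here) against the known-good release.
assert parse_version(KNOWN_GOOD_SCIPY) < parse_version("1.4.0")
```

A check like this only warns about mismatches; pinning the exact versions in the environment (e.g. a conda environment file) remains the reliable fix.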
I was able to test prediction with the provided trained network on the single provided audio file, as well as the Maya script that applies the results to a public face rig. This worked nicely.
I did not test training, since the data are only accessible upon (non-anonymous) request and no training instructions or scripts are provided (a non-standalone training file is present, but it does not run on its own).

-- Alternative test on Linux --
I could not get the tensorflow package installed through Anaconda to work.
