The authors nevertheless provide a facial rig for Maya, which was used in the paper "VisemeNet: Audio-Driven Animator-Centric Speech Animation", whose code is available. Beware: the Maya scene depends on RenderMan for rendering, which Nvidia has abandoned since 2018 ( https://www.nvidia.com/en-us/design-visualization/solutions/rendering/product-updates/ ). The rig can still be used without the fancy rendering.