ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Paper: arXiv:2207.06389
Model type: Diffusion-based text-to-speech generation model
Language(s): English
Model Description: A conditional diffusion probabilistic model capable of efficiently generating high-fidelity speech.
Resources for more information: FastDiff GitHub Repository, FastDiff Paper, ProDiff GitHub Repository, ProDiff Paper.
Cite as:
@inproceedings{huang2022prodiff,
  title={ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech},
  author={Huang, Rongjie and Zhao, Zhou and Liu, Huadai and Liu, Jinglin and Cui, Chenye and Ren, Yi},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

@inproceedings{huang2022fastdiff,
  title={FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis},
  author={Huang, Rongjie and Lam, Max WY and Wang, Jun and Su, Dan and Yu, Dong and Ren, Yi and Zhao, Zhou},
  booktitle={Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
  year={2022}
}
This model card is based on the DALL-E Mini model card.