Datasets
Audio Players
Speech Toolkits
- espnet
- Merlin
- Silero Models
- aeneas: automagically synchronize audio and text (aka forced alignment)
- awesome-speech-recognition-speech-synthesis-papers
Speech Synthesis
- Resemble.ai
- Mozilla TTS
- W3 Speak
- Text to speech: SSML by Google
- tacotron
- coqui-ai TTS
- Coqui: Freeing Speech
- TensorFlowTTS
- gTTS
- larynx
- Parallel WaveGAN
- MockingBird
Persian
- Lilak project
- Tihu
- AlisterTA: Persian-text-to-speech
- Persian pronunciation
- Persian_Tacotron2
- Ariana
- Gata
- Amer Andish
Voice assistants
NVIDIA
- NeMo - Text to Speech
- NVIDIA NeMo
- NVIDIA NeMo example scripts
- NeMo TTS models
- NVIDIA NeMo Jupyter Notebooks
- I AM AI
- Voice swap sample
- Generate Natural Sounding Speech from Text in Real-Time
- All the Feels: NVIDIA Shares Expressive Speech Synthesis Research at Interspeech
- Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Text to Speech
- NVIDIA Blog: Speech Synthesis
- Building a Text-to-Speech Service that Sounds like You, Using NVIDIA NGC and NVIDIA A100 GPUs
- Text to speech: Isaac SDK