[Speech synthesis] TensorFlowTTS Chinese text-to-speech

2022-07-24 00:09:00 【Wang Xiaoxi WW】

Introduction

This project is a Chinese speech synthesis demo built on TensorFlowTTS. TensorFlowTTS is an offline, open-source speech synthesis (text-to-speech) library. It supports a range of recent model architectures and produces state-of-the-art results.

The source project is available at: https://gitee.com/sherlocking_755/tts-demo

Reference material for this project: the article "An article teaches you how to get started with speech synthesis, train a Chinese voice TTS"

Environment configuration

1. Windows (attempt failed)

First, run pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple to set up the TensorFlowTTS environment.

Running the TensorFlowTTS project on Windows then fails with this error:

Traceback (most recent call last):
  File "D:\programSoftware\python\anaconda\envs\temp_env\lib\tempfile.py", line 258, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
PermissionError: [Errno 13] Permission denied: 'D:\\programSoftware\\python\\anaconda\\envs\\temp_env\\lib\\site-packages\\librosa\\util\\__pycache__\\tmpyrv1bpb4'

During handling of the above exception, another exception occurred:

Advice found online is to change the temporary-file call to:

f = tempfile.NamedTemporaryFile(mode='w+', delete=False)

I made that change, but it did not fix the error.
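The traceback shows the failure at the point where a temporary file is created inside the librosa `__pycache__` directory. Testing whether the current user can create files in that directory at all helps narrow this down. Below is a minimal sketch using only the standard library. The helper names are hypothetical and are not part of librosa or TensorFlowTTS.

```python
import tempfile


def can_create_temp_file(directory: str) -> bool:
    """Return True if a temporary file can be created inside `directory`."""
    try:
        # Create a temp file in a specific directory, as in the traceback.
        # The file is removed again as soon as the with-block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # PermissionError and FileNotFoundError are subclasses of OSError.
        return False


def demo_named_temporary_file() -> str:
    """Create a file with the arguments quoted above and return its path."""
    f = tempfile.NamedTemporaryFile(mode="w+", delete=False)
    try:
        f.write("probe")
    finally:
        f.close()  # with delete=False the file stays on disk after close()
    return f.name
```

Calling `can_create_temp_file(...)` with the `__pycache__` path from the error message shows whether the problem is a plain permission issue on that folder. If it returns `False`, changing the `NamedTemporaryFile` arguments alone is unlikely to help.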

2. Ubuntu (works)

The environment here was set up under WSL2 + Docker Desktop: pull the Anaconda image with Docker and start an Anaconda container:

docker run -it --name="anaconda" -p 8888:8888 continuumio/anaconda3 /bin/bash

Next, set up the TensorFlowTTS environment inside this container. Configuring it directly on a Linux system should be much the same.

Simply run pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple.

The following problems may come up during setup. If you hit one, see the linked posts:

llvmlite fails to install: see https://blog.csdn.net/qq_41977618/article/details/119572879 and https://www.cnblogs.com/kele-dad/p/12955804.html
pyaudio fails to install: see https://www.csdn.net/tags/MtjaYgysNjcxODQtYmxvZwO0O0OO0O0O.html

Running the program

1. Load data

On Windows, put the extracted nltk_data folder in C:\Users\<user name>\AppData\Roaming

On Ubuntu, put the extracted nltk_data folder in /root.
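The copy step can also be scripted. The sketch below assumes the extracted folder is simply copied as-is and that `%APPDATA%` points to `C:\Users\<user name>\AppData\Roaming`, which is the usual default. Both function names are made up for illustration.

```python
import os
import shutil
import sys


def default_nltk_data_dir(platform: str = sys.platform) -> str:
    """Return the per-user nltk_data location described above (assumed layout)."""
    if platform.startswith("win"):
        # %APPDATA% normally resolves to C:\Users\<user name>\AppData\Roaming.
        appdata = os.environ.get("APPDATA", os.path.expanduser("~"))
        return os.path.join(appdata, "nltk_data")
    # On Linux the home directory is used; for the root user this is /root.
    return os.path.join(os.path.expanduser("~"), "nltk_data")


def install_nltk_data(source_dir: str, target_dir: str) -> str:
    """Copy an extracted nltk_data folder to target_dir and return target_dir."""
    if not os.path.isdir(source_dir):
        raise FileNotFoundError(f"nltk_data folder not found: {source_dir}")
    # dirs_exist_ok (Python 3.8+) merges into an already existing folder.
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)
    return target_dir
```

From the project root, `install_nltk_data("nltk_data", default_nltk_data_dir())` would place the data where the text above expects it.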

2. Load model

First, extract the model files from tacotron2.part1.rar into the project root directory. These files are used to load the tacotron2 model.

The MelGAN model file is already in the project root directory.

Then review the configuration files for the tacotron2 and MelGAN models, tacotron2.baker.v1.yaml and TensorFlowTTS/examples/multiband_melgan/conf/multiband_melgan.baker.v1.yaml respectively. The default configuration works here.

Finally, run python tts-demo.py from the project directory.
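Before launching the demo it can be convenient to confirm that the two configuration files named above are in place. The following is a small sketch, not part of the project. It assumes `tacotron2.baker.v1.yaml` sits in the project root, which the text does not state explicitly, and the function name is hypothetical.

```python
import os

# Config paths as named in the text above, relative to the project root.
# The location of tacotron2.baker.v1.yaml is an assumption.
CONFIG_FILES = [
    "tacotron2.baker.v1.yaml",
    os.path.join("TensorFlowTTS", "examples", "multiband_melgan", "conf",
                 "multiband_melgan.baker.v1.yaml"),
]


def missing_files(project_root: str, relative_paths=CONFIG_FILES) -> list:
    """Return the relative paths that do not exist under project_root."""
    return [p for p in relative_paths
            if not os.path.isfile(os.path.join(project_root, p))]
```

An empty list from `missing_files(".")` means both configuration files were found. Any entries returned are the files to look for before running `python tts-demo.py`.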

3. Possible problems

You may run into the following problems when running the project. See the links for details.

numba raises the error below at runtime; see https://blog.csdn.net/qq_41590635/article/details/112499219
```
TypeError: create_target_machine() got an unexpected keyword argument 'jitdebug'
```
Installing numba from the Tsinghua mirror is not recommended, because the package it provides is the wrong one. Install numba from the default pip index instead.
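An error of this kind usually points to numba and llvmlite versions that do not match each other, so the first step is to see which versions are actually installed. The sketch below reads them with the standard library's `importlib.metadata` (Python 3.8+). The function name is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("numba", "llvmlite")) -> dict:
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions
```

Printing `installed_versions()` before and after reinstalling numba from the default index makes it easy to see whether the pair of versions actually changed.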

Another runtime error reports that nltk_data was not found under /root (message below). Copying the project's nltk_data folder to /root resolves it:

Resource cmudict not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('cmudict')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/cmudict

  Searched in:
    - '/root/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
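The check can be reproduced outside NLTK by looking for the resource in each directory from the "Searched in" list. This is a rough sketch that does not call NLTK itself. It assumes a resource may be stored either as a folder or as a `.zip` archive of the same name, and the function name is hypothetical.

```python
import os

# Directories copied from the "Searched in" list of the error message above.
SEARCHED_DIRS = [
    "/root/nltk_data",
    "/opt/conda/nltk_data",
    "/opt/conda/share/nltk_data",
    "/opt/conda/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_resource(resource="corpora/cmudict", search_dirs=SEARCHED_DIRS):
    """Return the first existing path for the resource, or None if not found."""
    for base in search_dirs:
        candidate = os.path.join(base, *resource.split("/"))
        # Assumption: the resource is either a folder or a .zip archive.
        for path in (candidate, candidate + ".zip"):
            if os.path.exists(path):
                return path
    return None
```

If `find_resource()` returns `None` inside the container, the nltk_data folder has not been copied to any of the searched locations yet.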

Test results

The following 4 Chinese sentences (shown here in English translation) were used for testing:

"Old trees and crows, under a small bridge near a cottage a stream flows, old road, west wind, thin horse. Westward declines the sun, heartbroken people at the end of the world",
"This is an open-source end-to-end Chinese speech synthesis system",
"Harden voluntarily took a pay cut, staying with the 76ers for 2 years and 14.5 million dollars",
"Wu Aping still hasn't been found, but the monk of Xuanzang Temple left me stunned"

Synthesis time for each sentence, followed by the mean:

phoneme seq: sil k u1 #0 t eng2 #0 l ao3 #0 sh u4 #0 h uen1 #0 ^ ia1 #0 x iao3 #0 q iao2 #0 l iou2 #0 sh uei3 #0 r en2 #0 j ia1 #0 g u3 #0 d ao4 #0 x i1 #0 f eng1 #0 sh ou4 #0 m a3 #0 x i1 #0 ^ iang2 #0 x i1 #0 x ia4 #0 d uan4 #0 ch ang2 #0 r en2 #0 z ai4 #0 t ian1 #0 ^ ia2 sil
index = 0, cost = 2.2670693397521973
phoneme seq: sil zh e4 #0 sh iii4 #0 ^ i2 #0 g e4 #0 k ai1 #0 ^ van2 #0 d e5 #0 d uan1 #0 d ao4 #0 d uan1 #0 zh ong1 #0 ^ uen2 #0 ^ v3 #0 ^ in1 #0 h e2 #0 ch eng2 #0 x i4 #0 t ong3 sil
index = 1, cost = 1.7127163410186768
phoneme seq: sil h a1 #0 d eng1 #0 zh u3 #0 d ong4 #0 j iang4 #0 x in1 #0 n ian2 #0 ^ uan4 #0 m ei3 #0 j in1 #0 x v4 #0 l iou2 #0 r en2 #0 d uei4 sil
index = 2, cost = 1.4303524494171143
phoneme seq: sil ^ u2 #0 ^ a5 #0 p ing2 #0 h ai2 #0 m ei2 #0 zh ao3 #0 d ao4 #0 x van2 #0 z ang4 #0 s ii4 #0 ch uan2 #0 zh en1 #0 h e2 #0 sh ang4 #0 q ve4 #0 r ang4 #0 ^ uo3 #0 j ing1 #0 d ai1 #0 l e5 sil
index = 3, cost = 1.7882094383239746
mean cost = 1.7995868921279907
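The reported mean can be checked directly against the per-sentence values in the log. The sketch below parses the `index = ..., cost = ...` lines and averages them. The parsing helper is written for this post and is not part of tts-demo.py.

```python
import re

# The four per-sentence lines copied from the log above.
LOG = """\
index = 0, cost = 2.2670693397521973
index = 1, cost = 1.7127163410186768
index = 2, cost = 1.4303524494171143
index = 3, cost = 1.7882094383239746
"""

COST_LINE = re.compile(r"index = (\d+), cost = ([0-9.]+)")


def parse_costs(log_text: str) -> list:
    """Extract the cost value from every 'index = N, cost = X' line."""
    return [float(m.group(2)) for m in COST_LINE.finditer(log_text)]


def mean_cost(costs: list) -> float:
    """Arithmetic mean of the costs; raises ValueError on an empty list."""
    if not costs:
        raise ValueError("no cost values found")
    return sum(costs) / len(costs)
```

`mean_cost(parse_costs(LOG))` agrees with the logged `mean cost` value up to floating-point rounding.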