Coqui TTS: Installation and Testing

Haulyn5 · Published 2022-09-09 10:27:19

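Preface

This post documents the installation of Coqui TTS. The main author of Coqui TTS is based in Germany, and the library appears to be closely tied to Mozilla's TTS (https://github.com/mozilla/TTS). Mozilla's TTS is no longer updated, whereas Coqui TTS has kept up a steady release pace, which makes it one of the few open-source speech libraries that are still reliably maintained (others include ESPnet and SpeechBrain).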

GitHub repository:

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production: https://github.com/coqui-ai/TTS

Main Text

First, jump straight to the installation section of the README.

pip install TTS

or

git clone https://github.com/coqui-ai/TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras

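Plain pip felt a bit too ordinary, so of course I wanted to git clone the repository, the geekier way. Then I got stuck on the second line: what is pip install -e, and what are the square brackets after it? (-e requests an editable install: pip links to the source directory instead of copying it, so changes in that directory take effect immediately. The square brackets select optional extras, that is, additional dependency groups, and can be left out. Since `.` means the current directory, the command has to be run from inside the cloned TTS directory.)

So in the end I went with pip install TTS after all.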

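Most users can simply start the long wait at this point. Note that TTS depends on torch, and because torch is a very large package, the installation may take a long time. In my case, though, environment constraints mean that only one specific torch version works; with any other version the GPU is unusable. So I had to install torch first and TTS afterwards, to keep pip from automatically pulling in a torch build that does not work here. TTS only requires torch 1.7 or newer.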
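To check that a pre-installed torch satisfies this requirement before running pip install TTS, something like the following minimal sketch can help. The helper names (parse_release, meets_minimum, installed_torch_ok) are made up for illustration, and the 1.7 threshold is simply the one mentioned above; the authoritative constraint is whatever the TTS release you install declares.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.12.1+cu113' -> (1, 12, 1)."""
    public = version.split("+", 1)[0]  # drop a local build tag such as '+cu113'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum=(1, 7)) -> bool:
    """True if the version is at least the given minimum (assumed here: 1.7)."""
    return parse_release(version) >= tuple(minimum)


def installed_torch_ok(minimum=(1, 7)):
    """Return True/False for the installed torch, or None if torch is absent."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return meets_minimum(version, minimum)


print(installed_torch_ok())
```

If this prints True, pip should normally leave the existing torch alone when it resolves the dependencies of TTS.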

Here is the conda command for cloning an environment:

conda create --name coquiTTS --clone espnet
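The --clone option copies an existing environment, which should be somewhat faster. In practice, however, conda downloaded torch all over again, and I do not know why. Being impatient, I simply copied the environment by hand with cp, haha (~~so far there seem to be no bad effects; I had expected some metadata files to be overwritten and cause errors~~).

Well, the effect was that every package I installed with pip ended up back in the original environment's path. Facepalm. The whole point was to leave the original environment untouched. ~~The correct fix is to add --offline~~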

conda create --name coquiTTS --clone original_env --offline

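For some reason, even though torch is already present in my local environment, conda still wants to download it. In offline mode this fails with an error saying that a remote connection was attempted while offline. So the only option left is: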

conda create --name coquiTTS --clone original_env

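After cloning the environment, install TTS with pip. It pulls in a lot of assorted dependencies, so remember to configure a mirror first, otherwise the download will be very slow: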

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install TTS

The installation prints a very long output. (As it turned out, I had installed it into the original environment.)
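A plausible explanation for packages landing in the original environment, stated here as an assumption rather than something verified in this post, is that scripts in a hand-copied environment (such as bin/pip) still name the original environment's Python in their first line. The sketch below lists such scripts; the function names and the example paths are hypothetical, and it assumes a POSIX layout with a bin directory.

```python
from pathlib import Path


def shebang_interpreter(script: Path):
    """Return the interpreter path from a script's '#!' line, or None."""
    with open(script, "rb") as handle:
        first = handle.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None
    parts = first[2:].split()
    return parts[0] if parts else None


def foreign_scripts(env_prefix):
    """List (script name, interpreter) pairs in <env_prefix>/bin whose
    Python interpreter lives outside env_prefix (pass an absolute path)."""
    env_prefix = Path(env_prefix)
    bin_dir = env_prefix / "bin"
    found = []
    if not bin_dir.is_dir():
        return found
    for script in sorted(bin_dir.iterdir()):
        if not script.is_file():
            continue
        interpreter = shebang_interpreter(script)
        if interpreter is None or not interpreter.startswith("/"):
            continue
        if not Path(interpreter).name.startswith("python"):
            continue  # ignore /bin/sh, /usr/bin/env and similar
        if env_prefix not in Path(interpreter).parents:
            found.append((script.name, interpreter))
    return found


# Example call with a hypothetical location of the copied environment:
# print(foreign_scripts("/root/miniconda3/envs/coquiTTS"))
```

Running python -m pip install inside the activated environment, instead of calling the pip script directly, avoids relying on that first line at all.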

Testing

First, the official test.

tts --list_models

The output here (TTS version 0.8.0) is:

Name format: type/language/dataset/model
1: tts_models/multilingual/multi-dataset/your_tts
2: tts_models/en/ek1/tacotron2
3: tts_models/en/ljspeech/tacotron2-DDC
4: tts_models/en/ljspeech/tacotron2-DDC_ph
5: tts_models/en/ljspeech/glow-tts
6: tts_models/en/ljspeech/speedy-speech
7: tts_models/en/ljspeech/tacotron2-DCA
8: tts_models/en/ljspeech/vits
9: tts_models/en/ljspeech/fast_pitch
10: tts_models/en/vctk/vits
11: tts_models/en/vctk/fast_pitch
12: tts_models/en/sam/tacotron-DDC
13: tts_models/en/blizzard2013/capacitron-t2-c50
14: tts_models/en/blizzard2013/capacitron-t2-c150
15: tts_models/es/mai/tacotron2-DDC
16: tts_models/fr/mai/tacotron2-DDC
17: tts_models/uk/mai/glow-tts
18: tts_models/zh-CN/baker/tacotron2-DDC-GST
19: tts_models/nl/mai/tacotron2-DDC
20: tts_models/de/thorsten/tacotron2-DCA
21: tts_models/de/thorsten/vits
22: tts_models/de/thorsten/tacotron2-DDC
23: tts_models/ja/kokoro/tacotron2-DDC
24: tts_models/tr/common-voice/glow-tts
25: tts_models/it/mai_female/glow-tts
26: tts_models/it/mai_female/vits
27: tts_models/it/mai_male/glow-tts
28: tts_models/it/mai_male/vits
29: tts_models/ewe/openbible/vits
30: tts_models/hau/openbible/vits
31: tts_models/lin/openbible/vits
32: tts_models/tw_akuapem/openbible/vits
33: tts_models/tw_asante/openbible/vits
34: tts_models/yor/openbible/vits
1: vocoder_models/universal/libri-tts/wavegrad
2: vocoder_models/universal/libri-tts/fullband-melgan
3: vocoder_models/en/ek1/wavegrad
4: vocoder_models/en/ljspeech/multiband-melgan
5: vocoder_models/en/ljspeech/hifigan_v2
6: vocoder_models/en/ljspeech/univnet
7: vocoder_models/en/blizzard2013/hifigan_v2
8: vocoder_models/en/vctk/hifigan_v2
9: vocoder_models/en/sam/hifigan_v2
10: vocoder_models/nl/mai/parallel-wavegan
11: vocoder_models/de/thorsten/wavegrad
12: vocoder_models/de/thorsten/fullband-melgan
13: vocoder_models/de/thorsten/hifigan_v1
14: vocoder_models/ja/kokoro/hifigan_v1
15: vocoder_models/uk/mai/multiband-melgan
16: vocoder_models/tr/common-voice/hifigan
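Because every entry follows the type/language/dataset/model format announced in the first line of the listing, the output is easy to filter with a few lines of Python. The following is a small sketch that works on the printed text only; ModelName, parse_model_name and parse_listing are illustrative names, not part of the TTS package.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    kind: str      # "tts_models" or "vocoder_models"
    language: str
    dataset: str
    model: str


def parse_model_name(name: str) -> ModelName:
    """Split 'type/language/dataset/model' into its four fields."""
    parts = name.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"expected type/language/dataset/model, got {name!r}")
    return ModelName(*parts)


def parse_listing(text: str):
    """Parse lines such as '21: tts_models/de/thorsten/vits'; skip the rest."""
    models = []
    for line in text.splitlines():
        index, separator, rest = line.partition(": ")
        if separator and index.strip().isdigit():
            models.append(parse_model_name(rest))
    return models


sample = """Name format: type/language/dataset/model
18: tts_models/zh-CN/baker/tacotron2-DDC-GST
21: tts_models/de/thorsten/vits
13: vocoder_models/de/thorsten/hifigan_v1
"""

german_tts = [m.model for m in parse_listing(sample)
              if m.kind == "tts_models" and m.language == "de"]
print(german_tts)  # ['vits']
```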

Next, look up the information for a model:

tts --model_info_by_name tts_models/tr/common-voice/glow-tts

> model type : tts_models
> language supported : tr
> dataset used : common-voice
> model name : glow-tts
> description : Turkish GlowTTS model using an unknown speaker from the Common-Voice dataset.
> default_vocoder : vocoder_models/tr/common-voice/hifigan

tts --text "text for TTS" --out_path ./test_speech.wav

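At last, the exciting moment: synthesizing speech. After I ran the command, it began downloading the pretrained model automatically. Three minutes in came the classic connection-dropped error. My network environment, sigh.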

 > Downloading model to /root/.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC
Traceback (most recent call last):
  File "/root/miniconda3/envs/coqui/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/miniconda3/envs/coqui/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/root/miniconda3/envs/coqui/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/root/miniconda3/envs/coqui/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/root/miniconda3/envs/coqui/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/root/miniconda3/envs/coqui/lib/python3.8/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

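Note! When a model download fails, Coqui does not check whether the download actually completed, so the next synthesis attempt raises an error. You have to delete the downloaded directory by hand and download again, otherwise it keeps failing. Reference: https://github.com/coqui-ai/TTS/issues/927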

cd ~/.local/share/tts/
rm -r tts_models--en--ljspeech--tacotron2-DDC/
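The same cleanup can be scripted. The sketch below assumes two things that are inferred only from the log lines in this post, not from the library's documentation: the cache lives under ~/.local/share/tts, and the directory name is the model name with each "/" replaced by "--". The function names are illustrative.

```python
import shutil
from pathlib import Path

# Assumed cache location, taken from the "Downloading model to ..." log line.
DEFAULT_CACHE = Path.home() / ".local" / "share" / "tts"


def cache_dir_name(model_name: str) -> str:
    """Map 'tts_models/en/ljspeech/tacotron2-DDC' to the directory name seen
    in the log: 'tts_models--en--ljspeech--tacotron2-DDC' (an inference)."""
    return model_name.replace("/", "--")


def remove_cached_model(model_name: str, cache_root=DEFAULT_CACHE) -> bool:
    """Delete the cached directory for one model; return True if it existed."""
    target = Path(cache_root) / cache_dir_name(model_name)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Example (deletes files, so it is left commented out):
# remove_cached_model("tts_models/en/ljspeech/tacotron2-DDC")
```

Only the directory of the one broken model is removed, so other models that downloaded correctly stay in the cache.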

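Then something rather mysterious happened: when I tried the download again, without changing anything, it succeeded. So, folks, it is worth trying a few times.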
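"Try a few times" can be automated with a small retry loop. This is a generic sketch rather than anything the TTS package provides; the retry helper, the attempt count and the delay are arbitrary choices for illustration.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Returns the result of the first successful call and re-raises the last
    error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # a flaky download can fail in many ways
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Possible use with the command from this post (left commented out):
# import subprocess
# retry(lambda: subprocess.run(
#     ["tts", "--text", "text for TTS", "--out_path", "./test_speech.wav"],
#     check=True))
```

Given the note above about failed downloads, the partially downloaded model directory may still need to be deleted between attempts.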

tts --text "text for TTS" --out_path ./test_speech.wav

100%|██████████| 113M/113M [05:58<00:00, 315kiB/s]
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Downloading model to /root/.local/share/tts/vocoder_models--en--ljspeech--hifigan_v2
100%|█| 3.80M/3.80M [00:01<00:00,
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 1
 > Vocoder Model: hifigan
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Generator Model: hifigan_generator
 > Discriminator Model: hifigan_discriminator
Removing weight norm...
 > Text: text for TTS
 > Text splitted to sentences.
['text for TTS']
 > Processing time: 0.7857599258422852
 > Real-time factor: 0.4602105388021246
 > Saving output to ./test_speech.wav
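The last lines of the log can be read as follows, assuming the usual definition of the real-time factor as processing time divided by the duration of the generated audio (a value below 1 means synthesis runs faster than playback). Under that assumption, the two logged numbers imply the length of the output file. The variable and function names below are only for illustration.

```python
# Values copied from the log above.
processing_time_s = 0.7857599258422852   # "Processing time"
real_time_factor = 0.4602105388021246    # "Real-time factor"
sample_rate_hz = 22050                   # "sample_rate" of the audio processor


def audio_duration(processing_time: float, rtf: float) -> float:
    """Invert RTF = processing_time / audio_duration (assumed definition)."""
    return processing_time / rtf


duration_s = audio_duration(processing_time_s, real_time_factor)
print(f"implied audio duration: {duration_s:.2f} s")
print(f"implied sample count:  {round(duration_s * sample_rate_hz)}")
```

So the short test sentence produced roughly 1.7 seconds of audio in under 0.8 seconds of compute.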

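Beyond this, the library also seems to provide a tts server that opens an HTTP port for an online demo, but because of my environment that will probably have to wait for a later update.

Postscript

Update 2023-02-17: I found that the TTS library pins librosa==0.8.0, which left me with the wrong librosa version. Plotting spectrograms then ran into an API incompatibility with Matplotlib and raised errors. So I uninstalled TTS from my default environment.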
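To spot this kind of hard pin before it breaks another environment, the declared dependencies of an installed package can be inspected with the standard library. The sketch below is only an approximation: the regular expression handles simple name==version requirement strings, not the full requirement grammar, and pinned_requirements is an illustrative name.

```python
import re
from importlib import metadata

# Matches e.g. 'librosa==0.8.0' or 'numba (==0.55.1) ; python_version >= "3.8"'.
PIN_PATTERN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*\(?==\s*([^\s;,)]+)")


def pinned_requirements(requirements):
    """Return {package: version} for requirements pinned with '=='."""
    pins = {}
    for requirement in requirements or []:
        match = PIN_PATTERN.match(requirement)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


try:
    declared = metadata.requires("TTS")  # package name as installed from pip
except metadata.PackageNotFoundError:
    declared = None

print(pinned_requirements(declared))
```

Installing TTS into a dedicated environment keeps such pins from overriding versions that other projects rely on.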
