https://github.com/SparkAudio/Spark-TTS
https://huggingface.co/models?sort=downloads&search=TTS
Installation
Clone and install
Clone the repository:
git clone https://github.com/SparkAudio/Spark-TTS.git
cd Spark-TTS
Create a Conda environment:
conda create -n sparktts -y python=3.12
conda activate sparktts
pip install -r requirements.txt
# If you are in mainland China, you can install from a PyPI mirror instead:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
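After installing, it can help to confirm that the active interpreter is the one created above. The following is a minimal sketch, not part of the Spark-TTS repository: the helper names `python_matches` and `missing_modules` are hypothetical, and the only facts taken from this document are the Python version (3.12) and the `huggingface_hub` package used in the download step.

```python
import importlib.util
import sys

# Version requested in the `conda create` command above.
EXPECTED_PYTHON = (3, 12)


def python_matches(expected=EXPECTED_PYTHON):
    """Return True if the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == tuple(expected)


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Only top-level names are supported here; dotted names are out of scope
    for this sketch.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `python_matches()` inside the activated `sparktts` environment should return True, and `missing_modules(["huggingface_hub"])` should return an empty list once the requirements are installed, assuming that package is pulled in by requirements.txt.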
Model download
Download via Python:
from huggingface_hub import snapshot_download
snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")
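If the download runs from a setup script, you may want to skip it when the model directory is already populated. The sketch below wraps the same `snapshot_download` call; `model_is_present` and `ensure_model` are hypothetical helper names. The check is deliberately crude (any non-empty directory counts as present), so it will not detect a partial download.

```python
from pathlib import Path

REPO_ID = "SparkAudio/Spark-TTS-0.5B"
LOCAL_DIR = "pretrained_models/Spark-TTS-0.5B"


def model_is_present(local_dir=LOCAL_DIR):
    """Return True if local_dir exists and contains at least one entry.

    Assumption: a non-empty directory means the model was downloaded.
    This does not verify that the download is complete.
    """
    path = Path(local_dir)
    return path.is_dir() and next(path.iterdir(), None) is not None


def ensure_model(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the model only when the target directory is missing or empty."""
    if not model_is_present(local_dir):
        # Imported lazily so the presence check works without the package.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id, local_dir=local_dir)
    return Path(local_dir)
```

Calling `ensure_model()` from the repository root returns the model directory, downloading it first if needed.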
Download via git clone:
mkdir -p pretrained_models
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B
Basic usage
You can run the demo with the following commands:
cd example
bash infer.sh
Alternatively, run inference directly from the command line:
python -m cli.inference \
--text "text to synthesize." \
--device 0 \
--save_dir "path/to/save/audio" \
--model_dir pretrained_models/Spark-TTS-0.5B \
--prompt_text "transcript of the prompt audio" \
--prompt_speech_path "path/to/prompt_audio"
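To synthesize several sentences with the same prompt audio, the command above can be driven from a small script. This is a sketch only: `build_inference_command` and `synthesize_all` are hypothetical helpers, they use only the flags shown in the command above, and they assume the script runs from the repository root so that `cli.inference` is importable.

```python
import subprocess
import sys


def build_inference_command(text, save_dir, prompt_text, prompt_speech_path,
                            model_dir="pretrained_models/Spark-TTS-0.5B",
                            device=0):
    """Build the argument list for `python -m cli.inference`.

    Passing a list (rather than a shell string) avoids quoting problems
    when the text contains spaces or quotes.
    """
    return [
        sys.executable, "-m", "cli.inference",
        "--text", text,
        "--device", str(device),
        "--save_dir", save_dir,
        "--model_dir", model_dir,
        "--prompt_text", prompt_text,
        "--prompt_speech_path", prompt_speech_path,
    ]


def synthesize_all(texts, **kwargs):
    """Run one inference process per text; raise if any process fails."""
    for text in texts:
        subprocess.run(build_inference_command(text, **kwargs), check=True)
```

Each call starts a new process and therefore reloads the model, which is simple but slow for large batches.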
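Web UI usage
You can launch the web UI by running:
python webui.py --device 0
The UI supports voice cloning and voice creation. For voice cloning, you can either upload a reference audio file or record audio directly.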
Build and run with a Dockerfile
Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
RUN echo 'from huggingface_hub import snapshot_download' > download_models.py
RUN echo 'snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")' >> download_models.py
RUN python download_models.py
CMD ["python", "webui.py", "--device", "0", "--server_port", "5006"]
Build
docker build -t spark-tts:v1 .
Run
docker run -itd -p 5006:5006 spark-tts:v1
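Because the container starts in the background, the web UI may take a while to become reachable. The sketch below polls the published port over HTTP until it answers. The helper names `ui_responds` and `wait_for_ui` are hypothetical, and the URL assumes the UI is served at the root path of port 5006 on the local machine, which this document does not state explicitly.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumption: the UI is reachable at the root path of the published port.
DEFAULT_URL = "http://127.0.0.1:5006/"

# Bypass any configured HTTP proxy, since the target is on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_responds(url=DEFAULT_URL, timeout=2.0):
    """Return True if an HTTP GET to `url` succeeds with a 2xx or 3xx status."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_ui(url=DEFAULT_URL, attempts=30, delay=2.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if ui_responds(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

An HTTP request is used instead of a bare TCP connect because Docker can accept connections on a published port before the application inside the container is ready.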