https://github.com/SparkAudio/Spark-TTS
https://huggingface.co/models?sort=downloads&search=TTS

Installation

Clone and install

Clone the repository:

git clone https://github.com/SparkAudio/Spark-TTS.git
cd Spark-TTS

Create a Conda environment and install the dependencies:

conda create -n sparktts -y python=3.12
conda activate sparktts
pip install -r requirements.txt
# If you are in mainland China, you can install from a mirror instead:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

Model download

Download via Python:

from huggingface_hub import snapshot_download

snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")

Download via git clone:

mkdir -p pretrained_models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B
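
Before moving on to inference, it can help to confirm that the download actually put files in place. The following is only a minimal sketch: the helper name model_dir_ready is hypothetical, and it only checks that the directory from the commands above exists and contains at least one non-hidden file. It does not validate the files themselves, so it will not notice a clone made without git-lfs, which leaves small pointer files instead of real weights.

```python
from pathlib import Path


def model_dir_ready(model_dir="pretrained_models/Spark-TTS-0.5B"):
    """Return True if model_dir exists and holds at least one non-hidden file.

    Hypothetical helper for illustration; it does not verify file contents.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return False
    for path in root.rglob("*"):
        # Ignore anything under hidden directories such as .git or .cache.
        relative_parts = path.relative_to(root).parts
        if any(part.startswith(".") for part in relative_parts):
            continue
        if path.is_file():
            return True
    return False
```

Calling model_dir_ready() from the repository root with no arguments checks the default path used in this article.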

Basic usage

You can run the demo with the following commands:

cd example
bash infer.sh

Alternatively, you can run inference directly from the command line:

python -m cli.inference \
    --text "text to synthesize." \
    --device 0 \
    --save_dir "path/to/save/audio" \
    --model_dir pretrained_models/Spark-TTS-0.5B \
    --prompt_text "transcript of the prompt audio" \
    --prompt_speech_path "path/to/prompt_audio"
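
If you want to drive the same command from a script, one option is to assemble the argument list in Python and hand it to subprocess. This is a sketch under assumptions: build_inference_command and run_inference are hypothetical helper names, and the only flags used are the ones shown in the command above. It has to be run from the Spark-TTS repository root so that the cli package can be found.

```python
import subprocess
import sys


def build_inference_command(text, save_dir, prompt_text, prompt_speech_path,
                            model_dir="pretrained_models/Spark-TTS-0.5B",
                            device=0):
    """Build the argv list for the cli.inference command shown above.

    Hypothetical helper; it only mirrors the flags used in this article.
    """
    return [
        sys.executable, "-m", "cli.inference",
        "--text", text,
        "--device", str(device),
        "--save_dir", save_dir,
        "--model_dir", model_dir,
        "--prompt_text", prompt_text,
        "--prompt_speech_path", prompt_speech_path,
    ]


def run_inference(**kwargs):
    """Run inference; raises CalledProcessError on a non-zero exit code."""
    # Passing a list (not a shell string) avoids quoting problems in the text.
    subprocess.run(build_inference_command(**kwargs), check=True)
```

Passing the arguments as a list rather than a single shell string means the text to synthesize can contain quotes or spaces without extra escaping.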

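Web UI usage
You can start the UI by running python webui.py --device 0. It supports voice cloning and voice creation. For voice cloning, you can either upload a reference audio file or record audio directly.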

Build and run with a Dockerfile

Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
RUN echo 'from huggingface_hub import snapshot_download' > download_models.py
RUN echo 'snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")' >> download_models.py
RUN python download_models.py
CMD ["python", "webui.py", "--device", "0", "--server_port", "5006"]

Build

docker build -t spark-tts:v1 .

Run

docker run -itd -p 5006:5006 spark-tts:v1
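
Because the container starts in the background, the web UI may take a while to begin accepting connections. The sketch below polls the published port until a TCP connection succeeds. The helper name wait_for_port and its default timeout and interval are assumptions for illustration; the port 5006 comes from the docker run command above. A successful connection only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5006, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Hypothetical helper; returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The connection is closed again as soon as it is established.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry.
            time.sleep(interval)
    return False
```

Once wait_for_port() returns True, the UI should be reachable in a browser at http://localhost:5006.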
Last modified: March 26, 2025