fumiama/Retrieval-based-Voice-Conversion-WebUI

mirror of https://github.com/fumiama/Retrieval-based-Voice-Conversion-WebUI.git synced 2026-06-05 01:10:22 +08:00


Alex Murkoff 1e22d468ea feat(audio): use PyAV instead of ffmpeg (#31)

* feat(audio): use PyAV instead of ffmpeg

replaced usage of ffmpeg in favor of PyAV (`av`)

* refactor(audio): store all of the audio related functions in the `infer.lib.audio`

refactors previous commit to have singular functions for each task, all located in `infer.lib.audio`

* fix(audio): remove downsample_audio from mdxnet.py

it is no longer needed, since it's imported from infer.lib.audio

* docs: remove every ffmpeg mention in the documentation to avoid confusion

* chore(requirements): remove ffmpeg-python and ffmpy from all requirements

* fix(audio): fix loading for UVR

wrapped gathering of META info from the stream into a function

fixes loading for UVR

* fix(audio): use np.frombuffer() instead of direct conversion of the resampled frames

this fixes traceback on preprocessing

* feat(audio): pre-allocate decoded_audio array in the load_audio function

this should improve performance, even if just a little

* Revert "docs: remove every ffmpeg mention in the documentation to avoid confusion"

This reverts commit 1e05bbce03.

* chore(format): run black on dev

* fix(requirements): revert removal of ffmpeg in unitest.yml and Dockerfile

* Revert "fix(requirements): revert removal of ffmpeg in unitest.yml and Dockerfile"

This reverts commit e28a0eebb2.

* feat(audio): pre-allocate numpy array to store the AudioFrame data in ndarray of dtype float32

* chore(format): run black on dev

* fix(audio): fix the decoded_audio size estimation

in estimated_total_samples we multiply by `sr` instead of `container.streams.audio[0].rate` since we want to estimate size of the OUTPUT file, not the input one. - Added dynamic resizing, in case something goes wrong and the size of decoded_audio is estimated incorrectly

Fixed function `load_audio` when the input audio's samplerate does not match the desired samplerate (`sr`)

* chore(format): run black on dev

* refactor(audio): remove `clean_path()` function as it serves no purpose anymore

* docs: remove everything related to ffmpeg

this includes everything except for formats support specification in the training_tips docs, since it has nothing to do with what ffmpeg does/did but rather what audio formats are supported (all the ones that ffmpeg supports!)

* docs: fix order of the steps in preparation in the READMEs

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

2024-06-12 20:13:26 +09:00

8.8 KiB


Retrieval-based-Voice-Conversion-WebUI

An easy-to-use voice conversion framework based on VITS.

Changelog | FAQ (Frequently Asked Questions)

The base model is trained on nearly 50 hours of the high-quality, open-source VCTK dataset, so there are no copyright concerns. Please feel free to use it.

Look forward to the RVCv3 base model, which will have more parameters, a larger training dataset, better results, roughly the same inference speed, and a lower training-data requirement.

There is a one-click downloader for models, integration packages, and tools. You are welcome to try it.

| Training and inference Webui | Real-time voice changing GUI |
| :---: | :---: |
| go-web.bat | go-realtime-gui.bat |
| You can freely choose the action you want to perform. | We have achieved an end-to-end latency of 170ms. With the use of ASIO input and output devices, we have managed to achieve an end-to-end latency of 90ms, but it is highly dependent on hardware driver support. |

Features:

Reduce tone leakage by replacing the source feature with the training-set feature using top1 retrieval;
Easy and fast training, even on low-end graphics cards;
Training with a small amount of data (>=10min of low-noise speech recommended);
Model fusion to change timbres (using ckpt processing tab->ckpt merge);
Easy-to-use WebUI;
UVR5 model to quickly separate vocals and instruments;
High-pitch voice extraction algorithm InterSpeech2023-RMVPE to prevent the muted-sound problem. It provides the best results by a significant margin and is faster, with lower resource consumption, than Crepe_full;
Acceleration on AMD/Intel graphics cards supported;
Acceleration on Intel ARC graphics cards with IPEX supported.

Check out our Demo Video here!

Environment Configuration

Python Version Limitation

It is recommended to use conda to manage the Python environment.

For the reason behind the version limitation, please refer to this bug.

python --version # 3.8 <= Python < 3.11
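
The version range in the comment above can also be checked from inside Python before installing anything. The following is a minimal sketch, not part of RVC itself; the helper name `is_supported` is hypothetical, and the bounds are copied from the `3.8 <= Python < 3.11` comment.

```python
import sys

# Bounds taken from the comment above: 3.8 <= Python < 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported(version=None):
    """Return True if (major, minor, ...) falls inside the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {found} is {status}")
```

Comparing only the `(major, minor)` pair keeps patch releases such as 3.10.13 inside the range while excluding every 3.11.x release.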

Linux/MacOS One-click Dependency Installation & Startup Script

By executing run.sh in the project root directory, you can configure the venv virtual environment, automatically install the required dependencies, and start the main program with one click.

sh ./run.sh

Manual Installation of Dependencies

Install PyTorch and its core dependencies (skip this step if they are already installed). Refer to: https://pytorch.org/get-started/locally/
```
pip install torch torchvision torchaudio
```
If you are using an Nvidia Ampere architecture GPU (RTX30xx) on Windows, then according to the experience reported in #21 you need to specify the CUDA version that corresponds to your PyTorch build.
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```
Install the dependencies that match your graphics card.

Nvidia GPU
```
pip install -r requirements/main.txt
```
AMD/Intel GPU
```
pip install -r requirements/dml.txt
```
AMD ROCM (Linux)
```
pip install -r requirements/amd.txt
```
Intel IPEX (Linux)
```
pip install -r requirements/ipex.txt
```
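
The choice between the four requirements files above can be written down as a small lookup. This is a sketch for illustration only: the short backend keys (`nvidia`, `dml`, `rocm`, `ipex`) and the function name are hypothetical labels, while the file paths are the ones listed above.

```python
# Requirements files exactly as listed above. The dictionary keys are
# hypothetical short labels chosen for this sketch, not RVC options.
REQUIREMENTS_BY_BACKEND = {
    "nvidia": "requirements/main.txt",  # Nvidia GPU
    "dml": "requirements/dml.txt",      # AMD/Intel GPU
    "rocm": "requirements/amd.txt",     # AMD ROCM (Linux)
    "ipex": "requirements/ipex.txt",    # Intel IPEX (Linux)
}


def pip_install_command(backend):
    """Return the pip argument list for the given backend label."""
    try:
        requirements = REQUIREMENTS_BY_BACKEND[backend]
    except KeyError:
        known = ", ".join(sorted(REQUIREMENTS_BY_BACKEND))
        raise ValueError(f"unknown backend {backend!r}; expected one of: {known}") from None
    return ["pip", "install", "-r", requirements]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pip_install_command("nvidia")))
```

Returning an argument list rather than a string makes it straightforward to hand the result to `subprocess.run` if you want to automate the install.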

Preparation of Other Files

1. Assets

RVC requires some models located in the assets folder for inference and training.

Check/Download Automatically (Default)

By default, RVC can automatically check the integrity of the required resources when the main program starts.

Even if the resources are not complete, the program will continue to start.

If you want to download all resources, please add the --update parameter.
If you want to skip the resource integrity check at startup, please add the --nocheck parameter.
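
If you start RVC from a script, the two parameters can be appended to the launch command. The sketch below assumes they are passed to `web.py`, the WebUI start command given under Getting Started; the function name is hypothetical, and the sketch does not say anything about how RVC handles the flags internally.

```python
import sys


def build_launch_command(update=False, nocheck=False, script="web.py"):
    """Build the argument list for starting the main program.

    Assumption: --update and --nocheck are given to web.py on the command line.
    """
    command = [sys.executable, script]
    if update:
        command.append("--update")    # download all resources
    if nocheck:
        command.append("--nocheck")   # skip the integrity check at startup
    return command


if __name__ == "__main__":
    # Print rather than execute, so running the sketch does not start RVC.
    print(build_launch_command(update=True))
```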

Download Manually

All resource files are located in the Hugging Face space.

You can find some scripts to download them in the tools folder.

You can also use the one-click downloader for models/integration packages/tools.

Below is a list of all the pretrained models and other files required by RVC.

./assets/hubert/hubert_base.pt

rvcmd assets/hubert # RVC-Models-Downloader command

./assets/pretrained

rvcmd assets/v1 # RVC-Models-Downloader command

./assets/uvr5_weights

rvcmd assets/uvr5 # RVC-Models-Downloader command

If you want to use the v2 version of the model, you need to download additional resources in

./assets/pretrained_v2

rvcmd assets/v2 # RVC-Models-Downloader command

2. Download the required files for the rmvpe vocal pitch extraction algorithm

If you want to use the latest RMVPE vocal pitch extraction algorithm, you need to download the pitch extraction model parameters and place them in assets/rmvpe.

rmvpe.pt

rvcmd assets/rmvpe # RVC-Models-Downloader command

Download the RMVPE model for the DML environment (optional, for AMD/Intel GPU users)

rmvpe.onnx

rvcmd assets/rmvpe # RVC-Models-Downloader command
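
After a manual download, you can confirm that the paths listed above exist. The following is a minimal sketch that only tests for presence; it is not RVC's built-in integrity check and does not verify file contents. The function and constant names are hypothetical, and the location `assets/rmvpe/rmvpe.pt` follows from the instruction to place `rmvpe.pt` in `assets/rmvpe`.

```python
from pathlib import Path

# Paths copied from the list above, relative to the project root.
REQUIRED_ASSETS = [
    "assets/hubert/hubert_base.pt",
    "assets/pretrained",
    "assets/uvr5_weights",
]
V2_ASSETS = ["assets/pretrained_v2"]      # only needed for v2 models
RMVPE_ASSETS = ["assets/rmvpe/rmvpe.pt"]  # only needed for RMVPE pitch extraction


def missing_assets(root=".", include_v2=False, include_rmvpe=False):
    """Return the listed paths that do not exist under root (presence only)."""
    wanted = list(REQUIRED_ASSETS)
    if include_v2:
        wanted += V2_ASSETS
    if include_rmvpe:
        wanted += RMVPE_ASSETS
    root = Path(root)
    return [relative for relative in wanted if not (root / relative).exists()]


if __name__ == "__main__":
    for relative in missing_assets(include_v2=True, include_rmvpe=True):
        print(f"missing: {relative}")
```

Run it from the project root; an empty output means every listed path was found.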

3. AMD ROCM (optional, Linux only)

If you want to run RVC on a Linux system based on AMD's ROCM technology, please first install the required drivers here.

If you are using Arch Linux, you can use pacman to install the required drivers.

pacman -S rocm-hip-sdk rocm-opencl-sdk

For some graphics card models (for example, the RX6700XT), you may need to set the following environment variables.

export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.3.0
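
If you launch RVC from a Python wrapper rather than a shell, the same two variables can be set on the child process environment. This is a sketch under that assumption; the function name is hypothetical, and the values are the ones shown above, which may not suit every card.

```python
import os

# Values copied from the export lines above (given there for cards such as
# the RX6700XT); other cards may need different values or none at all.
ROCM_OVERRIDES = {
    "ROCM_PATH": "/opt/rocm",
    "HSA_OVERRIDE_GFX_VERSION": "10.3.0",
}


def rocm_environment(base=None):
    """Return a copy of the environment with the ROCM overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(ROCM_OVERRIDES)
    return env


if __name__ == "__main__":
    env = rocm_environment()
    for name in ROCM_OVERRIDES:
        print(f"{name}={env[name]}")
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, which leaves the environment of the calling process unchanged.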

Also, make sure your current user is in the render and video user groups.

sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME

Getting Started

Direct Launch

Use the following command to start the WebUI.

python web.py

Linux/MacOS

./run.sh

For Intel GPU users who need to use IPEX (Linux only)

source /opt/intel/oneapi/setvars.sh
./run.sh

Using the Integration Package (Windows Users)

Download and unzip RVC-beta.7z. After unzipping, double-click go-web.bat to start the program with one click.

rvcmd packs/general/latest # RVC-Models-Downloader command

Credits

ContentVec
VITS
HIFIGAN
Gradio
Ultimate Vocal Remover
audio-slicer
Vocal pitch extraction: RMVPE
- The pretrained model is trained and tested by yxlllc and RVC-Boss.
