
Takes too much VRAM to transcribe audios #13

Open
PS-AI opened this issue Nov 21, 2024 · 4 comments

Comments

@PS-AI

PS-AI commented Nov 21, 2024

Hi,

Thank you for your work.

I tested CrisperWhisper today with a 2-minute audio on an NVIDIA A100 GPU. The model's VRAM footprint is only 3.5 GB, which is great. However, when processing the 2-minute audio, I get a CUDA out-of-memory error as GPU usage goes above 40 GB.

Is this something that will be fixed soon? If not, what would be the best way to handle long audios?

@LaurinmyReha
Contributor

How are you running this exactly? 40 GB should definitely be more than sufficient VRAM :)

Have you tried running the example code from the repo like this, replacing 'your_audio_path.mp3' with your actual audio path?

```python
import os
import sys
import torch

from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# from utils import adjust_pauses_for_hf_pipeline_output

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nyrahealth/CrisperWhisper"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('your_audio_path.mp3')
print(hf_pipeline_output)
```

If this does not solve your issue, please send me some code so I can reproduce it :)

@PS-AI
Author

PS-AI commented Nov 21, 2024

Thank you for your response. I have included my code below.

I am running it in Google Colab on an A100 GPU. I am using the same code that you sent, after installing the required libraries and logging into Hugging Face. I get a CUDA OOM error when transcribing a 2-minute audio.

```python
!pip install torch torchaudio
!pip install transformers
!pip install accelerate
!huggingface-cli login

# From Laurin
import os
import sys
import torch

# from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# from utils import adjust_pauses_for_hf_pipeline_output

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nyrahealth/CrisperWhisper"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('/content/2min.wav')
print(hf_pipeline_output)
```

@smoothdvd

smoothdvd commented Nov 22, 2024

Same on an A100 80G:

```
python transcribe.py --f audio.aac

An error occurred while transcribing the audio: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 79.14 GiB of which 164.75 MiB is free. Including non-PyTorch memory, this process has 78.97 GiB memory in use. Of the allocated memory 73.51 GiB is allocated by PyTorch, and 4.96 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
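As a side note, the allocator setting mentioned in the error can be tried as a quick mitigation; a minimal sketch is below. It only reduces fragmentation of already-reserved memory and does not lower the actual memory demand of batched decoding, so it is not a fix on its own.

```python
# Assumption: set before the first CUDA allocation (i.e. before the model is
# moved to the GPU), otherwise the allocator setting has no effect.
# This only mitigates fragmentation; it does not reduce peak activation memory.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```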

@LaurinmyReha
Contributor

Okay, lowering the batch size (for example to 1 in the extreme case) to fit your GPU and/or adjusting the beam size should resolve your issue. Could you try this out and let me know how it went?

You can adjust the number of beams via the generate_kwargs argument:
`hf_pipeline_output = pipe('/content/2min.wav', generate_kwargs={"num_beams": 1})`
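Putting both suggestions together, a minimal sketch (assuming the `model`, `processor`, `torch_dtype`, and `device` objects from the earlier snippet are already set up) would look like this:

```python
# Minimal sketch: same pipeline as above, but with memory-saving settings.
# batch_size=1 keeps only one 30 s chunk in flight at a time, and
# num_beams=1 uses greedy decoding, so fewer decoder states are held in VRAM.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=1,          # was 16; the main driver of peak VRAM usage
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('/content/2min.wav', generate_kwargs={"num_beams": 1})
print(hf_pipeline_output)
```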
