
Takes too much VRAM to transcribe audios #13

Open
PS-AI opened this issue Nov 21, 2024 · 4 comments

Comments

@PS-AI

PS-AI commented Nov 21, 2024

Hi,

Thank you for your work.

I tested CrisperWhisper today with a 2-minute audio on an NVIDIA A100 GPU. The model's VRAM footprint is only 3.5 GB, which is great. However, when processing the 2-minute audio, I get a CUDA out-of-memory error as GPU usage goes above 40 GB.

Is this something that will be fixed soon? If not, what would be the best way to handle long audios?

@LaurinmyReha
Contributor

How are you running this exactly? 40 GB should definitely be more than sufficient VRAM :)

Have you tried running the example code from the repo like this, replacing 'your_audio_path.mp3' with your actual audio path?

```python
import os
import sys
import torch

from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# from utils import adjust_pauses_for_hf_pipeline_output

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nyrahealth/CrisperWhisper"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('your_audio_path.mp3')
print(hf_pipeline_output)
```

If this does not solve your issue, please send me some code so I can reproduce it :)

@PS-AI
Author

PS-AI commented Nov 21, 2024

Thank you for your response. I have included my code below.

I am running it in Google Colab on an A100 GPU. I am using the same code that you sent, after installing the required libraries and logging into Hugging Face. I get a CUDA OOM error when transcribing a 2-minute audio.

```python
!pip install torch torchaudio
!pip install transformers
!pip install accelerate
!huggingface-cli login

# From Laurin
import os
import sys
import torch

# from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
# from utils import adjust_pauses_for_hf_pipeline_output

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nyrahealth/CrisperWhisper"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('/content/2min.wav')
print(hf_pipeline_output)
```

@smoothdvd

smoothdvd commented Nov 22, 2024

Same on an A100 80G:

```
python transcribe.py --f audio.aac

An error occurred while transcribing the audio: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 79.14 GiB of which 164.75 MiB is free. Including non-PyTorch memory, this process has 78.97 GiB memory in use. Of the allocated memory 73.51 GiB is allocated by PyTorch, and 4.96 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
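As a side note, the allocator setting mentioned in the error can be tried as a quick mitigation; a minimal sketch is below. It only reduces fragmentation of already-reserved memory and does not lower the actual memory demand of batched decoding, so it is not a fix on its own.

```python
# Assumption: set before the first CUDA allocation (i.e. before the model is
# moved to the GPU), otherwise the allocator setting has no effect.
# This only mitigates fragmentation; it does not reduce peak activation memory.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```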

@LaurinmyReha
Contributor

Okay, lowering the batch size (for example to 1 in the extreme case) to fit your GPU and/or adjusting the beam size should resolve your issue. Could you try this out and let me know how it went?

You can adjust the number of beams via the generate_kwargs argument:
`hf_pipeline_output = pipe('/content/2min.wav', generate_kwargs={"num_beams": 1})`
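Putting both suggestions together, a minimal sketch (assuming the `model`, `processor`, `torch_dtype`, and `device` objects from the earlier snippet are already set up) would look like this:

```python
# Minimal sketch: same pipeline as above, but with memory-saving settings.
# batch_size=1 keeps only one 30 s chunk in flight at a time, and
# num_beams=1 uses greedy decoding, so fewer decoder states are held in VRAM.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=1,          # was 16; the main driver of peak VRAM usage
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

hf_pipeline_output = pipe('/content/2min.wav', generate_kwargs={"num_beams": 1})
print(hf_pipeline_output)
```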
