So I have been following this project with anticipation, and finally decided to give it a go.
$ ./main -m models/bark_v0/
bark_model_load: loading model from 'models/bark_v0/'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1701.69 MB
bark_model_load: reading bark vocab
bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1250.69 MB
bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 7
gpt_model_load: n_wtes = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1218.26 MB
bark_model_load: reading bark codec model
encodec_model_load: model size = 44.32 MB
bark_model_load: total model size = 4170.64 MB
bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................
bark_forward_text_encoder: mem per token = 4.80 MB
bark_forward_text_encoder: sample time = 17.30 ms
bark_forward_text_encoder: predict time = 6746.21 ms / 18.48 ms per token
bark_forward_text_encoder: total time = 6825.61 ms
bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................
bark_forward_coarse_encoder: mem per token = 8.51 MB
bark_forward_coarse_encoder: sample time = 4.79 ms
bark_forward_coarse_encoder: predict time = 30730.57 ms / 94.85 ms per token
bark_forward_coarse_encoder: total time = 30784.73 ms
fine_gpt_eval: failed to allocate 50200313856 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 47874.75 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
Aborted (core dumped)