I was running an LLM in Google Colaboratory and got the following error.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
It says "Please provide an `offload_folder`" (i.e., specify an `offload_folder`), but there is a trap: this advice is not appropriate when the situation is "it worked fine earlier, but stopped working after some trial and error."

Providing an `offload_folder` is for the case where GPU memory and RAM are genuinely insufficient, so the disk is used to force the model to run even though it is slow. In other words, it is the remedy for "I really don't have enough GPU memory and RAM."
The current `device_map` had weights offloaded to the disk.
Before this error occurs, silent offloading to CPU memory is already happening.
`model.hf_device_map` becomes `Counter({'cpu': 41, 0: 2})` when it should be `Counter({0: 28, 'cpu': 15})`. This is hard to notice. Then `model.generate` may fail with `OutOfMemoryError: CUDA out of memory.`

Why does this happen when Python has GC?
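To spot this silent offloading, it helps to tally where the modules of `model.hf_device_map` ended up. A minimal sketch, where the `summarize_device_map` helper and the example mapping are hypothetical (only `model.hf_device_map` itself is the transformers attribute):

```python
from collections import Counter

def summarize_device_map(device_map):
    """Tally how many modules were placed on each device.

    `device_map` is a dict like transformers' `model.hf_device_map`,
    mapping module names to a device (GPU index, 'cpu', or 'disk').
    """
    return Counter(device_map.values())

# Hypothetical mapping for illustration; in practice pass model.hf_device_map.
device_map = {
    "model.layers.0": 0,      # on GPU 0
    "model.layers.1": 0,
    "model.layers.2": "cpu",  # silently offloaded to CPU RAM
}
print(summarize_device_map(device_map))  # Counter({0: 2, 'cpu': 1})
```

If the `'cpu'` (or `'disk'`) count dominates, the model was quietly offloaded and will run slowly.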
In the case of `model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')`, the `device_map='auto'` setting "automatically determines resource allocation by looking at the current resource status" at the moment the object on the right-hand side is created. At that moment the old model is still bound to `model` and has not yet been released by the GC, so the loader sees its memory as consumed and decides to offload to disk.
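This evaluation-order pitfall can be reproduced without any GPU. The sketch below uses a hypothetical `FakeModel` class as a stand-in for a large model; in CPython, dropping the old reference first (as with `model = None`) frees it before the new "load" begins:

```python
class FakeModel:
    """Stand-in for a large model; tracks live instances."""
    alive = 0
    def __init__(self):
        FakeModel.alive += 1
    def __del__(self):
        FakeModel.alive -= 1

def load_model():
    # Mimics from_pretrained: it "checks resources" here,
    # before the new model object exists.
    seen = FakeModel.alive
    return FakeModel(), seen

# Rebinding directly: the old model is still alive while the RHS runs.
model = FakeModel()
model, seen = load_model()
print(seen)  # 1 -- the old model still occupied "memory" during loading

# Dropping the reference first: CPython frees the old model immediately,
# so the loader no longer sees it.
model = None
model, seen = load_model()
print(seen)  # 0
```

The same reasoning is why assigning `model = None` before reloading (as in the snippet further down) changes what `device_map='auto'` sees.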
In addition, even after the Python side has garbage-collected, cached allocations may remain on the CUDA side (hence `torch.cuda.empty_cache()`), so since chasing this down is a pain, the easiest fix is to restart the runtime.
Trying this now:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop references to the old objects so the GC can free them,
# then release PyTorch's cached CUDA memory.
model = None
tokenizer = None
torch.cuda.empty_cache()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='auto')
```
This page is auto-translated from [/nishio/Please provide an offload_folder](https://scrapbox.io/nishio/Please provide an offload_folder) using DeepL. If you see something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.