Dataset not caching

I have a Gradio app in a Space. Its first action is:

self.dataset = load_dataset(dataset_name)
self.dataset = self.dataset.cast_column("audio", Audio(sampling_rate=16000))

I’ve added persistent storage to my Space, but every time I restart the app with a git push, it spends several minutes reloading the dataset. It doesn’t seem to be caching it at all. What am I missing?

Thanks!

Hi @danavery,

To use the persistent storage as the dataset cache, you have to point the Hugging Face cache folder at /data, which is where the persistent storage is mounted.

Go to: https://huggingface.co/spaces/...../settings
and set the variable

HF_HOME to /data/.huggingface
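
If it helps, here is a rough sketch of an in-code fallback for the same setting. It is only an illustration, not an official recipe: the helper names resolve_hf_home and load_cached_dataset are made up for this example, and it assumes the persistent storage is mounted at /data. Setting the variable in the Space settings is still the simpler route, because HF_HOME is read when the Hugging Face libraries are imported.

```python
import os
from pathlib import Path


def resolve_hf_home(persistent_root="/data"):
    """Pick a cache root (hypothetical helper, not part of any library).

    Order: an HF_HOME that is already set, then the persistent storage
    if it is mounted and writable, then the usual default location.
    """
    explicit = os.environ.get("HF_HOME")
    if explicit:
        return Path(explicit)
    root = Path(persistent_root)
    if root.is_dir() and os.access(root, os.W_OK):
        return root / ".huggingface"
    return Path.home() / ".cache" / "huggingface"


def load_cached_dataset(dataset_name):
    """Load the dataset with the cache pointed at persistent storage."""
    # Must run before `datasets` is imported anywhere in the process;
    # if it was imported earlier, this assignment has no effect.
    os.environ.setdefault("HF_HOME", str(resolve_hf_home()))

    from datasets import Audio, load_dataset  # imported late on purpose

    dataset = load_dataset(dataset_name)
    return dataset.cast_column("audio", Audio(sampling_rate=16000))
```

In the app, self.dataset = load_cached_dataset(dataset_name) would replace the two original lines. After the first start, later restarts should find the files under /data/.huggingface instead of downloading them again.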

Thank you!
I had wrongly assumed it would use the default cache path somehow, but it’s helpful to know that it doesn’t.