pszemraj/flan-subsets-deduped
FLAN-tuned variant of a tFINE (t5) model with GQA.
Install the transformers fork with GQA updates for t5 (⚠️WIP🚧):
pip install -U git+https://github.com/pszemraj/transformers.git@t5-gqa
Then load the model and generate:
# pip install -U git+https://github.com/pszemraj/transformers.git@t5-gqa
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan"
)
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64, no_repeat_ngram_size=3)
print(
    tokenizer.batch_decode(
        generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
    )[0]
)
Quick eval for: BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
hf (pretrained=BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan,trust_remote_code=True,dtype=bfloat16,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
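The settings line above looks like the header printed by lm-evaluation-harness. As a rough sketch, the equivalent `lm_eval` command line could be assembled as below. The task names are copied from the results table, the helper `build_lm_eval_command` is a hypothetical name introduced here, and the exact invocation the author used is not stated in the card.

```python
import shlex

MODEL_ID = "BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan"

# Task names as they appear in the results table of this card.
TASKS = [
    "boolq", "openbookqa", "piqa", "social_iqa",
    "tinyArc", "tinyHellaswag", "tinyMMLU", "winogrande",
]


def build_lm_eval_command(model_id=MODEL_ID, tasks=TASKS, batch_size=8):
    """Return an argv list for the lm_eval CLI (hypothetical helper)."""
    # Mirrors the model arguments shown in the settings line.
    model_args = f"pretrained={model_id},trust_remote_code=True,dtype=bfloat16"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command; nothing is executed here.
    print(shlex.join(build_lm_eval_command()))
```

Leaving the few-shot count unset matches `num_fewshot: None` in the settings line, so each task falls back to its own default.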
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| boolq | 2 | none | 0 | acc | ↑ | 0.7040 | ± | 0.0080 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.1580 | ± | 0.0163 |
| | | none | 0 | acc_norm | ↑ | 0.2420 | ± | 0.0192 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6132 | ± | 0.0114 |
| | | none | 0 | acc_norm | ↑ | 0.6159 | ± | 0.0113 |
| social_iqa | 0 | none | 0 | acc | ↑ | 0.4319 | ± | 0.0112 |
| tinyArc | 0 | none | 25 | acc_norm | ↑ | 0.2898 | ± | N/A |
| tinyHellaswag | 0 | none | 10 | acc_norm | ↑ | 0.3295 | ± | N/A |
| tinyMMLU | 0 | none | 0 | acc_norm | ↑ | 0.2980 | ± | N/A |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5020 | ± | 0.0141 |
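To help read the table, here is a minimal sketch that turns a reported value and standard error into an approximate 95% interval and checks whether it sits above random guessing. The normal approximation (z = 1.96) and the chance levels (uniform guessing over each task's number of answer options) are assumptions added here, not part of the original evaluation. The tiny* tasks are skipped because their standard error is N/A.

```python
from dataclasses import dataclass


@dataclass
class Result:
    task: str
    metric: str
    value: float
    stderr: float
    chance: float  # assumed accuracy of uniform random guessing


# Values and standard errors copied from the table above.
RESULTS = [
    Result("boolq", "acc", 0.7040, 0.0080, 1 / 2),
    Result("openbookqa", "acc_norm", 0.2420, 0.0192, 1 / 4),
    Result("piqa", "acc_norm", 0.6159, 0.0113, 1 / 2),
    Result("social_iqa", "acc", 0.4319, 0.0112, 1 / 3),
    Result("winogrande", "acc", 0.5020, 0.0141, 1 / 2),
]

Z95 = 1.96  # two-sided 95% quantile of the standard normal


def confidence_interval(value, stderr, z=Z95):
    """Approximate interval: value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


def above_chance(result):
    """True if the lower interval bound exceeds the assumed chance level."""
    low, _ = confidence_interval(result.value, result.stderr)
    return low > result.chance


if __name__ == "__main__":
    for r in RESULTS:
        low, high = confidence_interval(r.value, r.stderr)
        print(f"{r.task:<12} {r.metric:<9} [{low:.4f}, {high:.4f}] "
              f"above chance: {above_chance(r)}")
```

Under these assumptions, boolq, piqa, and social_iqa clear their chance levels, while openbookqa and winogrande do not.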
Dataset config used: 'all'.
The following hyperparameters were used during training:
Base model
BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024