26f0520163fd96ad81335a1b4cc1bc8b

This model is a fine-tuned version of albert/albert-large-v2 on the nyu-mll/glue dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
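As a rough sketch, the hyperparameters listed above could be expressed as keyword arguments for the Hugging Face `TrainingArguments` class. The keys are existing `TrainingArguments` fields, but the mapping itself is an assumption, since the original training script is not shown; `training_kwargs` and `num_devices` are illustrative names. The final lines check that the per-device batch size and the device count reproduce the reported totals, assuming no gradient accumulation.

```python
# Hypothetical reconstruction of the reported hyperparameters as
# TrainingArguments keyword arguments (the real script is not shown).
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# Reported as "distributed_type: multi-GPU, num_devices: 4".
num_devices = 4

# Effective batch size = per-device batch size * number of devices
# (assumes gradient accumulation is not used).
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * num_devices
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

With the `transformers` library installed, these values could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a placeholder.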

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.6805	0	0.6236	0.5469	0.3828	0.5469	0.5469	0.5469
No log	1	19	0.8507	0.0078	1.2106	0.4375	0.3043	0.4375	0.4375	0.4375
No log	2	38	0.7282	0.0156	0.8184	0.4375	0.3043	0.4375	0.4375	0.4375
No log	3	57	0.6877	0.0312	0.7378	0.5625	0.36	0.5625	0.5625	0.5625
No log	4	76	0.7622	0.0625	0.7900	0.4375	0.3043	0.4375	0.4375	0.4375
No log	5	95	0.7397	0.125	0.8675	0.4219	0.3361	0.4219	0.4219	0.4219
0.0838	6	114	0.7155	0.25	1.0477	0.4375	0.3621	0.4375	0.4375	0.4375
0.0838	7	133	0.7113	0.5	1.3004	0.4531	0.3871	0.4531	0.4531	0.4531
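To make the table easier to read, the following minimal sketch picks out the rows with the lowest validation loss, the highest accuracy, and the highest macro F1. The values are copied from the table above; the `Row` type and the variable names are illustrative, not part of the original training code.

```python
from collections import namedtuple

# Subset of the table's columns, copied row by row.
Row = namedtuple("Row", ["epoch", "step", "val_loss", "accuracy", "f1_macro"])

rows = [
    Row(0, 0, 0.6805, 0.5469, 0.3828),
    Row(1, 19, 0.8507, 0.4375, 0.3043),
    Row(2, 38, 0.7282, 0.4375, 0.3043),
    Row(3, 57, 0.6877, 0.5625, 0.36),
    Row(4, 76, 0.7622, 0.4375, 0.3043),
    Row(5, 95, 0.7397, 0.4219, 0.3361),
    Row(6, 114, 0.7155, 0.4375, 0.3621),
    Row(7, 133, 0.7113, 0.4531, 0.3871),
]

# Select the best row per metric (lower is better for loss only).
best_loss = min(rows, key=lambda r: r.val_loss)
best_acc = max(rows, key=lambda r: r.accuracy)
best_f1 = max(rows, key=lambda r: r.f1_macro)

print("lowest validation loss:", best_loss)
print("highest accuracy:", best_acc)
print("highest macro F1:", best_f1)
```

On these rows, the lowest validation loss is the pre-training evaluation at epoch 0, the highest accuracy is at epoch 3, and the highest macro F1 is at epoch 7, so the three metrics do not agree on a single best checkpoint.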

Safetensors

Model size: 17.7M params
Tensor type: F32
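As a back-of-the-envelope check, the parameter count and tensor type give an approximate size for the stored weights. This sketch assumes every parameter is stored as a 4-byte F32 value and ignores file metadata; since "17.7M" is a rounded figure, the result is only approximate.

```python
# Approximate weight size from the reported parameter count and dtype.
num_params = 17_700_000   # "17.7M params" (rounded figure)
bytes_per_param = 4       # F32 = 32 bits = 4 bytes

size_bytes = num_params * bytes_per_param
size_mb = size_bytes / 1e6  # decimal megabytes

print(f"approx. {size_mb:.1f} MB of weights")
```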
