# VideoMAE_BdSLW401_20_epochs_p5_SR_10
This model is a fine-tuned version of MCG-NJU/videomae-base-finetuned-kinetics on the BdSLW401 dataset. It achieves the following results on the evaluation (validation) set:
- Loss: 0.0473
- Accuracy: 0.9920
- Precision: 0.9928
- Recall: 0.9920
- F1: 0.9920
## Model description
This model can recognize the 401 most commonly used word-level Bangla Sign Language glosses used in this paper (https://arxiv.org/abs/2503.02360v1).
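A minimal inference sketch, assuming the checkpoint stores an `id2label` mapping for the 401 glosses; the random 16-frame clip below is only a placeholder for frames sampled from a real sign video:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, VideoMAEForVideoClassification

ckpt = "Shawon16/VideoMAE_BdSLW401_20_epochs_p5_SR_10"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

# 16 RGB frames in (height, width, channel) layout; replace with frames from a real sign video.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(-1).item()
print(model.config.id2label[pred_id])  # predicted gloss, assuming id2label is set in the config
```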
## Intended uses & limitations
Use this model for further fine-tuning or for cross-sign-language word-level fine-tuning; a loading sketch for the latter follows the citation below.
Cite: https://arxiv.org/abs/2506.04367v1

```bibtex
@article{shawon2025fine,
  title={Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks},
  author={Shawon, Jubayer Ahmed Bhuiyan and Mahmud, Hasan and Hasan, Kamrul},
  journal={arXiv preprint arXiv:2506.04367},
  year={2025}
}
```
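For the cross-sign-language case, one common recipe (a sketch under assumptions, not necessarily the exact procedure from the paper) is to reload the checkpoint with a freshly initialized classification head sized for the target vocabulary; `num_target_labels` below is a hypothetical value:

```python
from transformers import AutoImageProcessor, VideoMAEForVideoClassification

ckpt = "Shawon16/VideoMAE_BdSLW401_20_epochs_p5_SR_10"
num_target_labels = 60  # hypothetical vocabulary size of the target sign language dataset

processor = AutoImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(
    ckpt,
    num_labels=num_target_labels,
    ignore_mismatched_sizes=True,  # drop the 401-way BdSLW401 head and initialize a new classifier
)
# The VideoMAE backbone keeps its sign-language features; only the new head starts from scratch,
# e.g. trained with transformers.Trainer on the target-language word-level clips.
```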
## Training and evaluation data
The model was fine-tuned and evaluated on the BdSLW401 word-level Bangla Sign Language dataset (https://arxiv.org/abs/2503.02360v1); see the papers linked above for details.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 97180
- mixed_precision_training: Native AMP
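For reference, a minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the `output_dir` value and the `fp16` flag (standing in for "Native AMP") are assumptions, and the remaining values are taken from the list above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="VideoMAE_BdSLW401_20_epochs_p5_SR_10",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size = 2 * 4 = 8
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_steps=97180,
    fp16=True,  # mixed-precision training with native AMP (assumed fp16 rather than bf16)
)
```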
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 10.3074 | 0.05 | 4859 | 2.3824 | 0.6847 | 0.7288 | 0.6847 | 0.6538 |
| 1.7678 | 1.0500 | 9719 | 0.4062 | 0.9052 | 0.9190 | 0.9052 | 0.9014 |
| 0.63 | 2.05 | 14578 | 0.1821 | 0.9506 | 0.9603 | 0.9506 | 0.9492 |
| 0.5045 | 3.0500 | 19438 | 0.1665 | 0.9544 | 0.9614 | 0.9544 | 0.9537 |
| 0.391 | 4.05 | 24297 | 0.1415 | 0.9647 | 0.9704 | 0.9647 | 0.9639 |
| 0.3131 | 5.0500 | 29157 | 0.1286 | 0.9713 | 0.9758 | 0.9713 | 0.9704 |
| 0.2343 | 6.05 | 34016 | 0.1306 | 0.9745 | 0.9789 | 0.9745 | 0.9744 |
| 0.1352 | 7.0500 | 38876 | 0.0948 | 0.9772 | 0.9804 | 0.9772 | 0.9772 |
| 0.1432 | 8.05 | 43735 | 0.1018 | 0.9774 | 0.9806 | 0.9774 | 0.9774 |
| 0.0935 | 9.0500 | 48595 | 0.1065 | 0.9779 | 0.9801 | 0.9779 | 0.9777 |
| 0.0278 | 10.05 | 53454 | 0.0846 | 0.9850 | 0.9869 | 0.9850 | 0.9849 |
| 0.1197 | 11.0500 | 58314 | 0.1027 | 0.9804 | 0.9833 | 0.9804 | 0.9803 |
| 0.0607 | 12.05 | 63173 | 0.0727 | 0.9868 | 0.9881 | 0.9868 | 0.9868 |
| 0.0004 | 13.0500 | 68033 | 0.0760 | 0.9856 | 0.9872 | 0.9856 | 0.9856 |
| 0.0155 | 14.05 | 72892 | 0.0709 | 0.9886 | 0.9898 | 0.9886 | 0.9886 |
| 0.0043 | 15.0500 | 77752 | 0.0628 | 0.9888 | 0.9899 | 0.9888 | 0.9888 |
| 0.0 | 16.05 | 82611 | 0.0685 | 0.9875 | 0.9889 | 0.9875 | 0.9874 |
| 0.0002 | 17.0500 | 87471 | 0.0537 | 0.9904 | 0.9914 | 0.9904 | 0.9904 |
| 0.0013 | 18.05 | 92330 | 0.0481 | 0.9920 | 0.9929 | 0.9920 | 0.9920 |
| 0.0 | 19.0499 | 97180 | 0.0473 | 0.9920 | 0.9928 | 0.9920 | 0.9920 |
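The four metric columns can be reproduced with a `compute_metrics` callback of the kind usually passed to `transformers.Trainer`. The sketch below is an assumption about how they were computed rather than the authors' exact code; weighted averaging is inferred from recall matching accuracy in every row:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging across the 401 gloss classes (assumed).
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```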
### Framework versions
- Transformers 4.46.1
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.1