# VideoMAE_BdSLW401_20_epochs_p5_SR_10
This model is a fine-tuned version of MCG-NJU/videomae-base-finetuned-kinetics on the BdSLW401 dataset. It achieves the following results on the evaluation (validation) set:
- Loss: 0.0473
- Accuracy: 0.9920
- Precision: 0.9928
- Recall: 0.9920
- F1: 0.9920
## Model description
This model can recognize the 401 most commonly used word-level Bangla Sign Language glosses used in this paper (https://arxiv.org/abs/2503.02360v1).
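A minimal inference sketch, assuming the checkpoint stores an `id2label` mapping for the 401 glosses; the random 16-frame clip below is only a placeholder for frames sampled from a real sign video:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, VideoMAEForVideoClassification

ckpt = "Shawon16/VideoMAE_BdSLW401_20_epochs_p5_SR_10"
processor = AutoImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

# 16 RGB frames in (height, width, channel) layout; replace with frames from a real sign video.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(-1).item()
print(model.config.id2label[pred_id])  # predicted gloss, assuming id2label is set in the config
```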
## Intended uses & limitations
Use this model for further fine-tuning or for cross-sign-language word-level fine-tuning; a loading sketch for the latter follows the citation below.
Cite: https://arxiv.org/abs/2506.04367v1

```bibtex
@article{shawon2025fine,
  title={Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks},
  author={Shawon, Jubayer Ahmed Bhuiyan and Mahmud, Hasan and Hasan, Kamrul},
  journal={arXiv preprint arXiv:2506.04367},
  year={2025}
}
```
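For the cross-sign-language case, one common recipe (a sketch under assumptions, not necessarily the exact procedure from the paper) is to reload the checkpoint with a freshly initialized classification head sized for the target vocabulary; `num_target_labels` below is a hypothetical value:

```python
from transformers import AutoImageProcessor, VideoMAEForVideoClassification

ckpt = "Shawon16/VideoMAE_BdSLW401_20_epochs_p5_SR_10"
num_target_labels = 60  # hypothetical vocabulary size of the target sign language dataset

processor = AutoImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(
    ckpt,
    num_labels=num_target_labels,
    ignore_mismatched_sizes=True,  # drop the 401-way BdSLW401 head and initialize a new classifier
)
# The VideoMAE backbone keeps its sign-language features; only the new head starts from scratch,
# e.g. trained with transformers.Trainer on the target-language word-level clips.
```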
## Training and evaluation data
The model was fine-tuned and evaluated on the BdSLW401 word-level Bangla Sign Language dataset (https://arxiv.org/abs/2503.02360v1); see the papers linked above for details.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 97180
- mixed_precision_training: Native AMP
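For reference, a minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the `output_dir` value and the `fp16` flag (standing in for "Native AMP") are assumptions, and the remaining values are taken from the list above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="VideoMAE_BdSLW401_20_epochs_p5_SR_10",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size = 2 * 4 = 8
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_steps=97180,
    fp16=True,  # mixed-precision training with native AMP (assumed fp16 rather than bf16)
)
```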
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 10.3074 | 0.05 | 4859 | 2.3824 | 0.6847 | 0.7288 | 0.6847 | 0.6538 |
| 1.7678 | 1.0500 | 9719 | 0.4062 | 0.9052 | 0.9190 | 0.9052 | 0.9014 |
| 0.63 | 2.05 | 14578 | 0.1821 | 0.9506 | 0.9603 | 0.9506 | 0.9492 |
| 0.5045 | 3.0500 | 19438 | 0.1665 | 0.9544 | 0.9614 | 0.9544 | 0.9537 |
| 0.391 | 4.05 | 24297 | 0.1415 | 0.9647 | 0.9704 | 0.9647 | 0.9639 |
| 0.3131 | 5.0500 | 29157 | 0.1286 | 0.9713 | 0.9758 | 0.9713 | 0.9704 |
| 0.2343 | 6.05 | 34016 | 0.1306 | 0.9745 | 0.9789 | 0.9745 | 0.9744 |
| 0.1352 | 7.0500 | 38876 | 0.0948 | 0.9772 | 0.9804 | 0.9772 | 0.9772 |
| 0.1432 | 8.05 | 43735 | 0.1018 | 0.9774 | 0.9806 | 0.9774 | 0.9774 |
| 0.0935 | 9.0500 | 48595 | 0.1065 | 0.9779 | 0.9801 | 0.9779 | 0.9777 |
| 0.0278 | 10.05 | 53454 | 0.0846 | 0.9850 | 0.9869 | 0.9850 | 0.9849 |
| 0.1197 | 11.0500 | 58314 | 0.1027 | 0.9804 | 0.9833 | 0.9804 | 0.9803 |
| 0.0607 | 12.05 | 63173 | 0.0727 | 0.9868 | 0.9881 | 0.9868 | 0.9868 |
| 0.0004 | 13.0500 | 68033 | 0.0760 | 0.9856 | 0.9872 | 0.9856 | 0.9856 |
| 0.0155 | 14.05 | 72892 | 0.0709 | 0.9886 | 0.9898 | 0.9886 | 0.9886 |
| 0.0043 | 15.0500 | 77752 | 0.0628 | 0.9888 | 0.9899 | 0.9888 | 0.9888 |
| 0.0 | 16.05 | 82611 | 0.0685 | 0.9875 | 0.9889 | 0.9875 | 0.9874 |
| 0.0002 | 17.0500 | 87471 | 0.0537 | 0.9904 | 0.9914 | 0.9904 | 0.9904 |
| 0.0013 | 18.05 | 92330 | 0.0481 | 0.9920 | 0.9929 | 0.9920 | 0.9920 |
| 0.0 | 19.0499 | 97180 | 0.0473 | 0.9920 | 0.9928 | 0.9920 | 0.9920 |
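The four metric columns can be reproduced with a `compute_metrics` callback of the kind usually passed to `transformers.Trainer`. The sketch below is an assumption about how they were computed rather than the authors' exact code; weighted averaging is inferred from recall matching accuracy in every row:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging across the 401 gloss classes (assumed).
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```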
### Framework versions
- Transformers 4.46.1
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.1