

Potato Tuber Disease Captioning Model (BLIP)

This model is a fine-tuned version of Salesforce/blip-image-captioning-base that generates descriptive captions for potato tuber diseases.

The model analyzes an input image of a potato tuber and produces a natural language description of the visible disease symptoms.

Example output:

A cut-section of a potato tuber showing moderate dry rot within the internal tissue.

Dataset

Training was performed using the Sandesh-Lav/potato-tuber-caption-dataset.

Dataset characteristics:

| Property         | Value |
|------------------|-------|
| Total images     | ~1846 |
| Train split      | 1661  |
| Validation split | 92    |
| Test split       | 93    |

Each image is paired with a scientific description of potato tuber diseases, including:

  • bacterial soft rot
  • dry rot
  • greening disorder
  • internal tissue damage
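The splits above can be loaded with the `datasets` library. This is a minimal sketch: the split names are assumed from the table, and the actual column schema should be checked on the dataset card.

```python
def load_splits():
    # Requires `pip install datasets`; downloads the dataset from the Hub.
    from datasets import load_dataset

    ds = load_dataset("Sandesh-Lav/potato-tuber-caption-dataset")
    # Split names assumed from the table above.
    return ds["train"], ds["validation"], ds["test"]

# Reported split sizes (see table above); they sum to the stated ~1846 total.
SPLITS = {"train": 1661, "validation": 92, "test": 93}
print(sum(SPLITS.values()))  # → 1846
```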

Training

The model was fine-tuned using the Hugging Face Transformers Trainer.

Training configuration:

| Parameter     | Value                 |
|---------------|-----------------------|
| Base model    | BLIP image captioning |
| Epochs        | 5                     |
| Batch size    | 4                     |
| Learning rate | 5e-5                  |
| Optimizer     | AdamW                 |

Training loss after fine-tuning:

Training loss ≈ 0.19
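The setup above can be sketched with the Transformers `Trainer` as follows. This is an illustrative sketch, not the exact training script: the card does not document the preprocessing or data collator, so the function signature and collator argument are assumptions; the hyperparameters follow the table, and AdamW is the `Trainer` default optimizer.

```python
def build_trainer(train_ds, eval_ds, data_collator=None):
    """Sketch of the fine-tuning setup; hyperparameters follow the table above."""
    from transformers import (BlipForConditionalGeneration, Trainer,
                              TrainingArguments)

    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    args = TrainingArguments(
        output_dir="potato-caption-blip",
        num_train_epochs=5,              # Epochs
        per_device_train_batch_size=4,   # Batch size
        learning_rate=5e-5,              # Learning rate
        # Optimizer: AdamW is the Trainer default.
    )
    return Trainer(model=model, args=args, train_dataset=train_ds,
                   eval_dataset=eval_ds, data_collator=data_collator)

# trainer = build_trainer(train_ds, eval_ds)
# trainer.train()
```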

Evaluation

Evaluation was performed on the held-out test set.

Metrics used:

| Metric  | Score |
|---------|-------|
| BLEU-4  | 0.74  |
| ROUGE-L | 0.95  |
| METEOR  | 0.92  |
| CIDEr   | 5.43  |

The high scores indicate that the generated captions closely match the reference disease descriptions.

Note: The dataset contains structured scientific captions, which may lead to higher similarity scores compared to open-domain caption datasets.
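As a rough illustration of how the headline BLEU-4 metric works, here is a self-contained, pure-Python sentence-level BLEU-4 (with brevity penalty, no smoothing). Real evaluation would normally use a library such as `sacrebleu` or `evaluate`; this sketch is only for intuition.

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-4 with brevity penalty, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = _ngrams(cand, n), _ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / 4)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

ref = "a potato tuber showing severe bacterial soft rot with tissue degradation"
print(bleu4(ref, ref))  # → 1.0
```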


Usage

Load the model using Hugging Face Transformers:

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

processor = BlipProcessor.from_pretrained("Sandesh-Lav/potato-caption-blip")
model = BlipForConditionalGeneration.from_pretrained("Sandesh-Lav/potato-caption-blip")

# Load an input image and preprocess it into model tensors.
image = Image.open("potato.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate and decode the caption.
out = model.generate(**inputs, max_new_tokens=40)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
```

Example output:

A potato tuber showing severe bacterial soft rot with tissue degradation.

Applications

This model can be used for:

  • agricultural disease documentation
  • automated crop disease reporting
  • smart farming systems
  • agricultural AI research

Limitations

  • The dataset contains structured captions describing potato tuber diseases.
  • The model may struggle with images outside this domain.
  • Performance may degrade under different lighting conditions or on unfamiliar potato varieties.

Citation

If you use this model, please cite the dataset and BLIP architecture.


Author

Developed as part of an academic mini-project.

