💜 Github   |   🤗 Hugging Face   |   📚 Cookbooks  
🖥️ Demo  

# 🏆 sherif1313/Arabic-GLM-OCR-v2

A powerful Arabic OCR model (proficient learner)

📌 Overview

This model is an advanced Arabic OCR system designed to combine deep linguistic understanding with high accuracy in visual text extraction.

The model was trained using a unique strategy focused on:

- Reducing the model's active capacity during training
- Maintaining the stability of visual features
- Promoting genuine language understanding rather than rote memorization

🚀 Key Features

- 🔹 Model size: Approximately 2 GB
- 🔹 Performance: Outperforms much larger models in most tasks
- 🔹 Type: Robust learning model (requires carefully tuned inference settings)

- ✅ Deep understanding of Arabic language context
- ✅ Intelligent spelling correction
- ✅ High visual accuracy in text extraction
- ✅ Noise reduction
- ✅ Highly stable training behavior
- ✅ Strong generalization on unseen data

🧪 Evaluation Results

| Metric | Value |
|---|---|
| Evaluation loss | 0.1041 |
| Training-evaluation gap | 0% - 2.5% |

Excellent stability.

📌 This indicates a near-perfect training balance with minimal overfitting.
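
To make the gap figure concrete, here is a minimal sketch of one common way to compute a relative training-evaluation gap. The card does not say how the gap was measured, so the formula is an assumption, and the training loss below is a hypothetical value chosen only for illustration. The evaluation loss is the one reported above.

```python
def relative_gap(train_loss, eval_loss):
    """Relative train/eval gap, expressed as a fraction of the training loss.

    Assumed definition: the model card does not state its formula.
    """
    return (eval_loss - train_loss) / train_loss


eval_loss = 0.1041   # reported in the table above
train_loss = 0.1016  # hypothetical value, NOT reported in this card

gap = relative_gap(train_loss, eval_loss)
print(f"relative gap = {gap:.1%}")
```

A small positive gap means the evaluation loss sits only slightly above the training loss, which is the usual sign of limited overfitting.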

🧠 Training Philosophy

  1. Reduce Training Capacity

The model was trained using only half its capacity in order to:

- Preserve visual representations
- Prevent image deterioration
- Improve overall stability

  2. From "Memorizing Shapes" to "Learning Rules"

Instead of:

Memorizing word shapes

The model now learns:

Grammar rules and image-text relationships

  3. Controlling Inference

The training included:

- Reducing excessive inference
- Limiting the linking of complex ideas
- Reverting processed information to its original size before output

🎯 Objective:

Forcing the model to accurately copy text instead of paraphrasing it

  4. Multilevel Reasoning Capability

The model was given internal inference capabilities during:

- Reading the page
- Analyzing the text
- Generating output

This leads to:

- Better understanding of unseen data
- Stronger real-world performance

⚙️ Inference Settings (Very Important)

⚠️ This is a powerful learner, so it requires precise control during inference.

🎯 Use Cases

- 📄 OCR for Arabic books
- 📰 Text extraction from images
- 📚 Manuscript digitization
- 🧾 Document processing
- 🔍 Text enhancement after OCR

⚠️ Important Notes

The model may attempt to autocorrect the text if it is not properly constrained. To copy text accurately, use a directive such as: "Extract the text exactly as it is, without correction or paraphrasing."

📦 Why is the model small?

Despite its small size (approximately 2 GB), the model performs well thanks to:

- Effective training methodology
- Minimized cognitive noise
- Focus on significant patterns
- Highly efficient representation learning

🏁 Conclusion

This model achieves a rare balance between:

- Visual Accuracy 👁️
- Language Comprehension 🧠
- Training Stability ⚖️

💡 It can be considered a sophisticated model for Arabic OCR, competing with larger systems.

| License | Model Size | Python |
|---|---|---|
| Apache-2.0 | 2.2GB | 3.12 |

⚠️ Important Notes

In some cases, the model may attempt to correct the text if it is not properly configured. For exact copying, use a clear prompt such as: "Extract the text as is, without modification".

- ❌ Do not use high temperature settings, as they cause hallucinations.
- ✅ Use "Restricted" settings for optimal accuracy.
- ✅ Best suited for OCR tasks, not creative writing.
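
The prompt advice in these notes can be sketched as a small message builder. `build_ocr_messages` and `STRICT_INSTRUCTION` are hypothetical names introduced only for illustration. The message layout mirrors the usage example in this card, and the instruction string is the directive quoted in the notes. The card's own usage code sends the prompt "Text Recognition:", so how the model responds to a free-form directive is an assumption worth checking on your own images.

```python
# Directive quoted from the notes in this card.
STRICT_INSTRUCTION = (
    "Extract the text exactly as it is, without correction or paraphrasing."
)


def build_ocr_messages(image_path, instruction=STRICT_INSTRUCTION):
    """Return a single-turn chat message list for one image (hypothetical helper)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


messages = build_ocr_messages("test_image.png")
print(messages[0]["content"][1]["text"])
```

The resulting `messages` list has the same shape as the one passed to `processor.apply_chat_template` in the usage section, so it can be swapped in there.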

Recommended settings:

    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,  # cap the output length to limit repetition loops
            do_sample=True,
            temperature=0.4,
            top_p=0.9,
            repetition_penalty=1.1,
        )
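
One way to keep these settings in a single place is a small helper that returns the keyword arguments for `model.generate`. This is a sketch, not part of the model's API: `restricted_generation_kwargs` is a hypothetical name, and treating greedy decoding as the stricter alternative (the web demo in this card calls `generate` with `do_sample=False`) is an assumption about what "Restricted" means.

```python
def restricted_generation_kwargs(max_new_tokens=512, greedy=False):
    """Build keyword arguments for model.generate (hypothetical helper)."""
    kwargs = {"max_new_tokens": max_new_tokens, "repetition_penalty": 1.1}
    if greedy:
        # Greedy decoding: no sampling, so temperature and top_p are omitted.
        kwargs["do_sample"] = False
    else:
        # Low-temperature sampling with the values recommended in this card.
        kwargs.update(do_sample=True, temperature=0.4, top_p=0.9)
    return kwargs


sampling = restricted_generation_kwargs()
greedy = restricted_generation_kwargs(greedy=True)
print(sampling)
print(greedy)
```

With a loaded model and prepared inputs, the call would look like `model.generate(**inputs, **restricted_generation_kwargs())`.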

🖼️ Visualizations

🛠️ How to use it

git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "test_image.png"
            },
            {
                "type": "text",
                "text": "Text Recognition:"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)
generated_ids = model.generate(**inputs, max_new_tokens=2018)
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)

🛠️ How to use it (web demo)


import gradio as gr
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image
import re  

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"🚀 OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("⏳ Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)

    print("⏳ Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
    print("✅ Model is ready!")
except Exception as e:
    print(f"❌ Failed to load model: {e}")
    raise  # Stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files are in the same folder as the script) ---
EXAMPLE_IMAGES = [
    
]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "⚠️ Please upload an image first."

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]

    try:
        # --- Process the image and generate text ---
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)
        # Drop token_type_ids if present, as in the basic example above
        inputs.pop("token_type_ids", None)

        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False
            )

        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)

        # ----------------------------------------------------------------
        # --- Advanced cleanup (remove HTML tags and repeated phrases) ---
        # ----------------------------------------------------------------

        # 1. Strip HTML tags (such as <html>, <td>, etc.)
        teks_final = re.sub(r'<[^>]+>', '', teks_final)

        # 2. Collapse consecutive repeated sentences.
        # This looks for any sentence or group of words that appears two or
        # more times in a row and replaces it with a single occurrence.
        # (.{10,}?) captures text of 10 or more characters (so short repeats
        #           of a few letters are left alone)
        # (\s+\1)+  matches whitespace followed by the same text, repeated
        teks_final = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', teks_final)

        # ----------------------------------------------------------------

        return teks_final

    except Exception as e:
        return f"🚨 An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")

        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload Image", height=450)
                scan_btn = gr.Button("🚀 START SCAN", variant="primary", size="lg")

            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Extracted Text", lines=24)

        # Add clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # Set to True to speed things up (needs disk space)
            label="Example Images (click to load)"
        )

    # Connect the button to the function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.launch()
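
The two clean-up steps used inside `proses_intelijen` can be tried on their own without loading the model. The sketch below reuses the same two regular expressions from the demo. The function name `clean_ocr_text` and the sample strings are made up for illustration.

```python
import re


def clean_ocr_text(text):
    """Strip HTML-like tags, then collapse immediately repeated phrases."""
    # 1. Remove anything that looks like an HTML tag, e.g. <td> or </html>.
    text = re.sub(r"<[^>]+>", "", text)
    # 2. Collapse a phrase of 10+ characters that repeats back to back,
    #    separated only by whitespace, into a single occurrence.
    text = re.sub(r"(\b.{10,}?)(\s+\1)+", r"\1", text)
    return text


print(clean_ocr_text("<td>hello world foo</td> hello world foo"))
print(clean_ocr_text("abc abc"))  # too short to count as a repeated phrase
```

Note that the second step also collapses text that is legitimately repeated in the source image (for example a refrain printed twice in a row), so it trades exact copying for protection against generation loops.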

Model size: 1B params (Safetensors, tensor type F16)

Base model: zai-org/GLM-OCR (this model is a fine-tune)