Link model to paper and update citation
#10
opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,15 +1,17 @@
 ---
-pipeline_tag: image-text-to-text
 language:
 - multilingual
+library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+arxiv: 2601.20552
 tags:
 - deepseek
 - vision-language
 - ocr
 - custom_code
-license: apache-2.0
-library_name: transformers
 ---
+
 <div align="center">
 <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" />
 </div>
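Once merged, this front matter is what the Hub indexes for search and for the paper page. A quick way to sanity-check the result with the official `huggingface_hub` client; the exact derived tag strings (e.g. an `arxiv:2601.20552` tag) are an assumption about how the Hub surfaces the new `arxiv:` field:

```python
# Sanity-check the model-card metadata after the PR is merged.
from huggingface_hub import HfApi

info = HfApi().model_info("deepseek-ai/DeepSeek-OCR-2")
print(info.pipeline_tag)  # expected: "image-text-to-text"
print(info.tags)          # expected to include "ocr", "custom_code", and an arxiv tag
```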
@@ -40,14 +42,19 @@ library_name: transformers
 <p align="center">
 <a href="https://github.com/deepseek-ai/DeepSeek-OCR-2"><b>🌟 Github</b></a> |
 <a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR-2"><b>📥 Model Download</b></a> |
-<a href="https://
-<a href="https://github.com/deepseek-ai/DeepSeek-OCR-2/blob/main/DeepSeek_OCR2_paper.pdf"><b>📄 Arxiv Paper Link</b></a> |
+<a href="https://huggingface.co/papers/2601.20552"><b>📄 Paper Link</b></a> |
 </p>
 <h2>
 <p align="center">
-<a href="">DeepSeek-OCR 2: Visual Causal Flow</a>
+<a href="https://huggingface.co/papers/2601.20552">DeepSeek-OCR 2: Visual Causal Flow</a>
 </p>
 </h2>
+
+DeepSeek-OCR 2 introduces **DeepEncoder V2**, a novel vision encoder capable of dynamically reordering visual tokens based on image semantics. Unlike conventional vision-language models (VLMs) that process visual tokens in a rigid raster-scan order, DeepEncoder V2 mimics human visual perception by employing a causally informed sequential processing mechanism. This architecture enables the model to achieve genuine 2D reasoning through cascaded 1D causal reasoning structures.
+
+- **Authors:** Haoran Wei, Yaofeng Sun, Yukun Li
+- **Paper:** [DeepSeek-OCR 2: Visual Causal Flow](https://huggingface.co/papers/2601.20552)
+
 <p align="center">
 <img src="assets/fig1.png" style="width: 900px" align=center>
 </p>
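The added paragraph is the diff's only description of DeepEncoder V2, so the following is purely an illustrative sketch of the contrast it draws, raster-scan order versus content-driven token ordering; the scoring head is a stand-in, not DeepSeek's actual mechanism:

```python
# Illustrative only: fixed raster-scan ordering vs. reordering patch tokens
# by a predicted per-token score, as the "visual causal flow" idea suggests.
import torch

patches = torch.randn(1, 64, 256)        # (batch, 8x8 grid of patch tokens, dim)

raster_order = torch.arange(64)          # conventional VLM: always visit 0..63

# Hypothetical stand-in for DeepEncoder V2's policy: score each token and
# visit semantically salient regions first.
score_head = torch.nn.Linear(256, 1)
scores = score_head(patches).squeeze(-1)               # (1, 64)
flow_order = scores.argsort(dim=-1, descending=True)   # per-image ordering

idx = flow_order.unsqueeze(-1).expand(-1, -1, patches.size(-1))
reordered = patches.gather(1, idx)       # tokens now in content-driven order
print(reordered.shape)                   # torch.Size([1, 64, 256])
```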
@@ -80,8 +87,10 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
 model = model.eval().cuda().to(torch.bfloat16)
 
-# prompt = "<image>\nFree OCR. "
-prompt = "<image>\n<|grounding|>Convert the document to markdown. "
+# prompt = "<image>
+Free OCR. "
+prompt = "<image>
+<|grounding|>Convert the document to markdown. "
 image_file = 'your_image.jpg'
 output_path = 'your/output/dir'
 
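As rendered here, the `+` lines split the `\n` escape inside each prompt string across two physical lines, which would not be valid Python if copied verbatim; each prompt is a single-line string. For context, a hedged end-to-end sketch around the snippet in this hunk; the `infer` call and its keyword arguments are assumed to carry over from DeepSeek-OCR v1's custom code, which this diff does not show:

```python
# Minimal usage sketch, assuming DeepSeek-OCR-2 keeps a v1-style infer() API.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR-2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# One string per prompt; "\n" is an escape, not a literal line break.
prompt = "<image>\n<|grounding|>Convert the document to markdown. "

# Assumed signature, mirroring DeepSeek-OCR v1's remote code.
res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="your_image.jpg",
    output_path="your/output/dir",
    save_results=True,
)
```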
@@ -100,8 +109,10 @@ Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR-2/) for guidance
 
 ## Main Prompts
 ```python
-# document: <image>\n<|grounding|>Convert the document to markdown.
-# without layouts: <image>\nFree OCR.
+# document: <image>
+<|grounding|>Convert the document to markdown.
+# without layouts: <image>
+Free OCR.
 ```
 
 
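The same line-splitting affects this block; in runnable form each documented mode is a single-line string. A small convenience mapping (the mode names are my own labels, not part of the README):

```python
# The two prompt modes documented above, kept as single-line strings.
PROMPTS = {
    "document": "<image>\n<|grounding|>Convert the document to markdown. ",  # layout-aware
    "free_ocr": "<image>\nFree OCR. ",                                       # plain text only
}
prompt = PROMPTS["document"]
```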
@@ -115,4 +126,13 @@ We also appreciate the benchmark [OmniDocBench](https://github.com/opendatalab/OmniDocBench)
 ## Citation
 
 ```bibtex
-
+@misc{wei2025deepseekocr2,
+  title={DeepSeek-OCR 2: Visual Causal Flow},
+  author={Haoran Wei and Yaofeng Sun and Yukun Li},
+  year={2025},
+  eprint={2601.20552},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2601.20552}
+}
+```