Link model to paper and update citation

#10
Opened by nielsr (HF Staff)
Files changed (1)
  1. README.md (+31 -11)
README.md CHANGED
@@ -1,15 +1,17 @@
  ---
- pipeline_tag: image-text-to-text
  language:
  - multilingual
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ arxiv: 2601.20552
  tags:
  - deepseek
  - vision-language
  - ocr
  - custom_code
- license: apache-2.0
- library_name: transformers
  ---
+
  <div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" />
  </div>
@@ -40,14 +42,19 @@ library_name: transformers
  <p align="center">
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR-2"><b>🌟 Github</b></a> |
  <a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR-2"><b>📥 Model Download</b></a> |
- <a href="https://github.com/deepseek-ai/DeepSeek-OCR-2/blob/main/DeepSeek_OCR2_paper.pdf"><b>📄 Paper Link</b></a> |
- <a href="https://github.com/deepseek-ai/DeepSeek-OCR-2/blob/main/DeepSeek_OCR2_paper.pdf"><b>📄 Arxiv Paper Link</b></a> |
+ <a href="https://huggingface.co/papers/2601.20552"><b>📄 Paper Link</b></a> |
  </p>
  <h2>
  <p align="center">
- <a href="">DeepSeek-OCR 2: Visual Causal Flow</a>
+ <a href="https://huggingface.co/papers/2601.20552">DeepSeek-OCR 2: Visual Causal Flow</a>
  </p>
  </h2>
+
+ DeepSeek-OCR 2 introduces **DeepEncoder V2**, a novel vision encoder capable of dynamically reordering visual tokens based on image semantics. Unlike conventional vision-language models (VLMs) that process visual tokens in a rigid raster-scan order, DeepEncoder V2 mimics human visual perception by employing a causally-informed sequential processing mechanism. This architecture enables the model to achieve genuine 2D reasoning through cascaded 1D causal reasoning structures.
+
+ - **Authors:** Haoran Wei, Yaofeng Sun, Yukun Li
+ - **Paper:** [DeepSeek-OCR 2: Visual Causal Flow](https://huggingface.co/papers/2601.20552)
+
  <p align="center">
  <img src="assets/fig1.png" style="width: 900px" align=center>
  </p>
@@ -80,8 +87,10 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
  model = model.eval().cuda().to(torch.bfloat16)
  
- # prompt = "<image>\nFree OCR. "
- prompt = "<image>\n<|grounding|>Convert the document to markdown. "
+ # prompt = "<image>\nFree OCR. "
+ prompt = "<image>\n<|grounding|>Convert the document to markdown. "
  image_file = 'your_image.jpg'
  output_path = 'your/output/dir'
  
@@ -100,8 +109,10 @@ Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR-2/) for guidan
  
  ## Main Prompts
  ```python
- # document: <image>\n<|grounding|>Convert the document to markdown.
- # without layouts: <image>\nFree OCR.
+ # document: <image>\n<|grounding|>Convert the document to markdown.
+ # without layouts: <image>\nFree OCR.
  ```
  
  
@@ -115,4 +126,13 @@ We also appreciate the benchmark [OmniDocBench](https://github.com/opendatalab/O
  ## Citation
  
  ```bibtex
- coming soon~
+ @misc{wei2025deepseekocr2,
+ title={DeepSeek-OCR 2: Visual Causal Flow},
+ author={Haoran Wei and Yaofeng Sun and Yukun Li},
+ year={2025},
+ eprint={2601.20552},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2601.20552}
+ }
+ ```
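
The prompts touched by this diff are single Python strings in which `\n` is a two-character escape, not a source-level line break, so each prompt must stay on one line of code. A minimal sketch of how they can be defined and inspected; the `PROMPTS` dictionary and its keys are illustrative names, not part of the model's API:

```python
# The two prompt modes from the README. Inside each string, "\n" becomes
# a single newline character, separating the image placeholder from the
# instruction.
PROMPTS = {
    "document": "<image>\n<|grounding|>Convert the document to markdown. ",  # with layouts
    "free_ocr": "<image>\nFree OCR. ",                                       # without layouts
}

prompt = PROMPTS["document"]
# Each prompt is one string holding two logical lines.
print(prompt.splitlines())  # ['<image>', '<|grounding|>Convert the document to markdown. ']
```

In the README's full example, the chosen string is then passed along with `image_file` and `output_path` to the model's inference call.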