Hi @nielsr,
Thanks in advance for implementing this model in the Hugging Face library.
I annotated several images using the Label Studio ML Backend Tesseract example: label-studio-ml-backend/label_studio_ml/examples/tesseract (master branch of heartexlabs/label-studio-ml-backend on GitHub).

With this tool, you draw a box with the selected label and it extracts the text inside the box for you. You can see this in the above gif.
After that, I exported the annotations and created a dataset using the bbox format expected by the model (I saw this here).
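A minimal sketch of that kind of conversion, assuming the exported boxes are in absolute pixel coordinates as (x0, y0, x1, y1) and that the model expects them scaled to a 0-1000 range relative to the image size. The function name `normalize_bbox` and the example values are illustrative, not taken from the library or from my data:

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range.

    Assumption: the box uses absolute pixel coordinates and
    width/height are the pixel dimensions of the source image.
    """
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Hypothetical box on an 800x600 image.
print(normalize_bbox((80, 60, 400, 300), width=800, height=600))
# -> [100, 100, 500, 500]
```

If the export uses a different convention (for example percentages, or x/y/width/height instead of two corners), the arithmetic would need to change accordingly.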
Finally, I trained the model for Token Classification.
However, the model is not working well at inference time. There, I set the processor to apply OCR:
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
And I just pass an image:
encoding = processor(image, truncation=True, return_tensors="pt")
The model doesn't classify the tokens well. However, if I pass the bboxes and text from my annotations, it works properly.
How is this model supposed to be used for inference? Do you need to pass the hand-drawn bboxes and text?
I want to use this model to extract information automatically, so having to pass these annotations manually would defeat the purpose.
Maybe I did something wrong when labelling? Should I run the image through Tesseract and then label all the bboxes it returns instead of drawing them by hand?
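To make that last option concrete: one possible way to label Tesseract's word boxes without redrawing everything would be to copy the label of each hand-drawn region onto the OCR words that fall inside it. This is only a rough sketch under my own assumptions: all boxes are (x0, y0, x1, y1) in the same coordinate system, the helper names `overlap_ratio` and `assign_labels` are hypothetical, the 0.5 threshold is arbitrary, and the "TOTAL" label is a made-up example.

```python
def overlap_ratio(word_box, region_box):
    """Fraction of the word box area that lies inside the region box."""
    wx0, wy0, wx1, wy1 = word_box
    rx0, ry0, rx1, ry1 = region_box
    inter_w = max(0, min(wx1, rx1) - max(wx0, rx0))
    inter_h = max(0, min(wy1, ry1) - max(wy0, ry0))
    word_area = (wx1 - wx0) * (wy1 - wy0)
    return (inter_w * inter_h) / word_area if word_area > 0 else 0.0


def assign_labels(word_boxes, regions, min_overlap=0.5, default="O"):
    """Give each OCR word the label of the drawn region that covers it best.

    word_boxes: list of (x0, y0, x1, y1) boxes, e.g. from an OCR engine.
    regions: list of ((x0, y0, x1, y1), label) pairs drawn by hand.
    Words not sufficiently covered by any region get the default label.
    """
    labels = []
    for word_box in word_boxes:
        best_label, best_ratio = default, 0.0
        for region_box, label in regions:
            ratio = overlap_ratio(word_box, region_box)
            if ratio >= min_overlap and ratio > best_ratio:
                best_label, best_ratio = label, ratio
        labels.append(best_label)
    return labels


# Hypothetical data: two words inside a drawn region, one far below it.
word_boxes = [(10, 10, 50, 20), (60, 10, 100, 20), (10, 200, 50, 210)]
regions = [((0, 0, 120, 30), "TOTAL")]
print(assign_labels(word_boxes, regions))
# -> ['TOTAL', 'TOTAL', 'O']
```

The idea would be that training data built this way uses the same word-level boxes the processor produces when `apply_ocr=True`, but I don't know whether this is the intended workflow.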