In a Training Loop 🔄
Stefan Schweter (stefan-it) · PRO
3,708 followers · 391 following
https://schweter.bayern · stefan-it
AI & ML interests
Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨
Recent Activity
upvoted a collection · 1 day ago: 🤏 Smol-Data
reacted to hannayukhymenko's post with 🔥 · 1 day ago:
Do you translate your benchmarks from English correctly? 🤔 Turns out, for many languages it is much harder than you might imagine! Introducing Recovered in Translation 🌍 together with @aalexandrov: ritranslation.insait.ai

Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive translated benchmarks already exist, but they still contain simple (and sometimes silly) bugs that can hurt evaluations :( We present a novel automated translation framework to help with that!

Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We discovered potential answer leakage, as well as cases where the grammatical structure of the questions misleads models. Some benchmarks are also simply outdated and need to be retranslated with newer, better models.

Our framework includes novel test-time scaling methods that let you control time and cost investments while mitigating the need for human-in-the-loop verification. While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for professional translators. With our pipeline we were able to do it in 3 days 🏎️

We hope our findings will help enable stronger multilingual evaluations and development. We release all produced benchmarks on Hugging Face together with the source code and arXiv paper 🤗

Paper: https://huggingface.co/papers/2602.22207
Code: https://github.com/insait-institute/ritranslation
Benchmarks: https://huggingface.co/collections/INSAIT-Institute/multilingual-benchmarks
reacted to hannayukhymenko's post with ❤️ · 1 day ago
stefan-it's activity
liked 2 datasets · 6 days ago
- windprak/steuerllm_instruct_dataset · Preview · Updated 19 days ago · 41 · 1
- castorini/NanoKnow-Fineweb-Edu-Index · Updated 6 days ago · 1.37k · 2

liked a dataset · 7 days ago
- BabyLM-community/babylm-deu · Viewer · Updated Oct 15, 2025 · 36.6k · 53 · 2

liked a dataset · 11 days ago
- Eurolingua/HPLT3_DE_0.9_Quantile_Adult_Filtered · Viewer · Updated 11 days ago · 9.99M · 28 · 1

liked a dataset · 12 days ago
- turkish-nlp-suite/BellaTurca · Viewer · Updated 12 days ago · 53.9M · 1.04k · 10

liked 3 datasets · 13 days ago
- sentence-transformers/s2orc · Viewer · Updated May 6, 2024 · 132M · 1.52k · 16
- openeurollm/propella-annotations · Viewer · Updated about 2 hours ago · 5.85B · 9.59k · 13
- scrapegraphai/scrapegraphai-100k · Viewer · Updated Dec 21, 2025 · 93.7k · 71 · 23

liked a model · 19 days ago
- windprak/open_steuerllm · Text Generation · 28B · Updated 15 days ago · 33 · 2

liked a dataset · 25 days ago
- utter-project/EuroBlocks-SFT-2512 · Viewer · Updated 26 days ago · 1.09M · 706 · 17

liked a dataset · 26 days ago
- fineinstructions/fineinstructions_nemotron · Viewer · Updated Jan 30 · 1.23B · 2.71k · 4

liked a dataset · about 1 month ago
- fineinstructions/finetemplates · Viewer · Updated Jan 30 · 18.6M · 266 · 2

liked a Space · about 1 month ago
- OCR Dataset Generator 📝 · Running · Generate synthetic OCR datasets for low-resource languages

liked a model · about 2 months ago
- nvidia/Nemotron-Orchestrator-8B · Text Generation · Updated Dec 2, 2025 · 16k · 558

liked a dataset · about 2 months ago
- HuggingFaceFW/finetranslations · Viewer · Updated Jan 9 · 3.33B · 33.1k · 272

liked a dataset · 2 months ago
- bltlab/open-ner-standardized · Viewer · Updated Dec 19, 2025 · 831k · 357 · 2

liked a dataset · 3 months ago
- minilingua-ai/mcqa-minilingua-sft · Viewer · Updated Jul 27, 2025 · 17.2k · 86 · 1

liked 3 models · 3 months ago
- minilingua-ai/MiniLingua-1b · Updated Dec 27, 2025 · 108 · 2
- Cognitive-Lab/NetraEmbed · Visual Document Retrieval · 4B · Updated Dec 10, 2025 · 383 · 24
- Cognitive-Lab/ColNetraEmbed · Visual Document Retrieval · Updated Dec 10, 2025 · 384 · 4