stereoplegic's Collections: Byte-level
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • arXiv:2105.13626

Beyond Language Models: Byte Models are Digital World Simulators
Paper • arXiv:2402.19155

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • arXiv:2305.07185

Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • arXiv:1802.01817

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • arXiv:2403.09622

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
Paper • arXiv:1811.09021

Neural Machine Translation with Byte-Level Subwords
Paper • arXiv:1909.03341

Neural Machine Translation without Embeddings
Paper • arXiv:2008.09396

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5
Paper • arXiv:2110.15248

MonoByte: A Pool of Monolingual Byte-level Language Models
Paper • arXiv:2209.11035

Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
Paper • arXiv:2302.14220

Bilingual End-to-End ASR with Byte-Level Subwords
Paper • arXiv:2205.00485

MambaByte: Token-free Selective State Space Model
Paper • arXiv:2401.13660

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Paper • arXiv:2103.06874

SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Paper • arXiv:2404.14408

Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation
Paper • arXiv:2405.19290

Word-Level Representation From Bytes For Language Modeling
Paper • arXiv:2211.12677

byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings
Paper • arXiv:2106.13302