fancyzhx's Collections
Audio Datasets
Robotic Datasets
Video Datasets
Image Datasets
Text Datasets
Text Datasets
Updated Jun 20, 2025
📖 TxT360: Trillion Extracted Text • Running • 132
Explore the TxT360 LLM pre-training dataset
CASIA-LM/ChineseWebText2.0 • Viewer • Updated Dec 2, 2024 • 2k • 3.3k • 28
HPLT/HPLT2.0_cleaned • Viewer • Updated Nov 13, 2025 • 9.03B • 18.4k • 36
TrevorDohm/Pile_Tokenized • Viewer • Updated Feb 20, 2024 • 134M • 19