Drop-Upcycling
llm-jp/Dense-btx-code-expert-152M • 0.2B
llm-jp/Dense-btx-english-expert-1.5B • 2B
llm-jp/Dense-btx-code-expert-1.5B • 2B
llm-jp/Dense-btx-japanese-expert-1.5B • 2B
llm-jp/Dense-btx-english-expert-152M • 0.2B
llm-jp/Dense-btx-japanese-expert-152M • 0.2B
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Paper • 2502.19261
llm-jp/llm-jp-3-8x13b-instruct3 • Text Generation • 73B
llm-jp/llm-jp-3-8x1.8b-instruct3 • Text Generation • 9B
llm-jp/llm-jp-3-8x13b-instruct2 • Text Generation • 73B
llm-jp/llm-jp-3-8x1.8b-instruct2 • Text Generation • 9B
llm-jp/llm-jp-3.1-8x13b-instruct4 • Text Generation • 73B