sail/longspec-Llama-3-8B-Instruct-262k
Text Generation • 0.3B • Updated
• 1
TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size
Rethinking the Trust Region in LLM Reinforcement Learning