Trending Papers

by AK and the research community

TradingAgents: Multi-Agents LLM Financial Trading Framework

A multi-agent framework using large language models for stock trading simulates real-world trading firms, improving performance metrics like cumulative returns and Sharpe ratio.

  • 4 authors
· Dec 28, 2024

AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

AI-Trader presents the first fully automated live benchmark for evaluating large language models in financial decision-making across multiple markets with autonomous information processing.

  • 6 authors
· Dec 1, 2025
Submitted by
thuzhaowang

Pixal3D: Pixel-Aligned 3D Generation from Images

Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.

TencentARC ARC Lab, Tencent PCG · May 11, 2026
Submitted by
liangjiaqing

GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)

GenericAgent is a self-evolving large language model agent system that maximizes context information density through hierarchical memory, reusable SOPs, and efficient compression to overcome long-horizon limitations.

Fudan-University Fudan University · Apr 18, 2026
Submitted by
jianchen0311

DFlash: Block Diffusion for Flash Speculative Decoding

DFlash is a speculative decoding framework that uses a lightweight block diffusion model for parallel token drafting, achieving significant speedup over existing autoregressive methods while maintaining high-quality outputs.

z-lab Z Lab · Feb 5, 2026
Submitted by
RuofengYang

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

ARIS is an open-source research harness that uses cross-model adversarial collaboration to ensure reliable long-term research outcomes through coordinated execution, orchestration, and assurance layers.

Kronos: A Foundation Model for the Language of Financial Markets

Kronos, a specialized pre-training framework for financial K-line data, outperforms existing models in forecasting and synthetic data generation through a unique tokenizer and autoregressive pre-training on a large dataset.

  • 7 authors
· Aug 2, 2025
Submitted by
Paranioar

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Unified vision-language models treat understanding and generation as integrated processes rather than separate tasks, demonstrating strong performance across multiple multimodal capabilities including image synthesis and action reasoning.

sensenova SenseNova · May 12, 2026
Submitted by
taesiri

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

MinerU2.5, a 1.2B-parameter document parsing vision-language model, achieves state-of-the-art recognition accuracy with computational efficiency through a coarse-to-fine parsing strategy.

  • 61 authors
· Sep 26, 2025
Submitted by
akhaliq

Efficient Memory Management for Large Language Model Serving with PagedAttention

The PagedAttention algorithm and the vLLM system increase the throughput of large language model serving by managing memory efficiently and reducing waste in the key-value cache.

  • 9 authors
· Sep 12, 2023
Submitted by
akhaliq

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.

  • 5 authors
· Apr 28, 2025
Submitted by
akhaliq

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

OpenDevin is a platform for developing AI agents that interact with the world by writing code, using command lines, and browsing the web, with support for multiple agents and evaluation benchmarks.

  • 24 authors
· Jul 23, 2024
Submitted by
hongyyyyy

Adam's Law: Textual Frequency Law on Large Language Models

A novel framework for improving large language model performance through textual frequency analysis, including laws, distillation, and curriculum training approaches.

FaceMind · Apr 2, 2026
Submitted by
Osilly

Flow-OPD: On-Policy Distillation for Flow Matching Models

Flow-OPD addresses limitations in Flow Matching text-to-image models through a two-stage alignment approach combining on-policy distillation and manifold anchor regularization, achieving significant improvements in generation quality and alignment metrics.

  • 11 authors
· May 8, 2026
Submitted by
Wenxuan123

RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

RoboMemArena presents a large-scale robotic memory benchmark with diverse tasks and real-world evaluation, while PrediMem demonstrates improved memory management through a dual-system vision-language architecture with predictive coding.

  • 13 authors
· May 11, 2026
Submitted by
taesiri

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

PaddleOCR-VL, a vision-language model combining NaViT-style dynamic resolution and ERNIE, achieves state-of-the-art performance in document parsing and element recognition with high efficiency.

PaddlePaddle · Oct 16, 2025
Submitted by
taesiri

HumanNet: Scaling Human-centric Video Learning to One Million Hours

HumanNet presents a large-scale human-centric video dataset with rich annotations for embodied intelligence, demonstrating that egocentric human video can effectively replace robot data for training vision-language-action models.

  • 2 authors
· May 7, 2026
Submitted by
unilm

VibeVoice Technical Report

VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.

MicrosoftResearch Microsoft Research · Aug 26, 2025
Submitted by
taesiri

AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

AgentScope enhances agentic applications by providing flexible tool-based interactions, unified interfaces, and advanced infrastructure based on the ReAct paradigm, supporting efficient and safe development and deployment.

  • 23 authors
· Aug 22, 2025
Submitted by
akhaliq

Very Large-Scale Multi-Agent Simulation in AgentScope

Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.

  • 8 authors
· Jul 25, 2024
Submitted by
Yirany

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

MiniCPM-o 4.5 enables real-time full-duplex multimodal interaction through Omni-Flow, a unified streaming framework that aligns inputs and outputs temporally for simultaneous perception and response.

openbmb OpenBMB · Apr 30, 2026
Submitted by
taesiri

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

MiniCPM-V 4.5, an 8B-parameter multimodal large language model, achieves high performance and efficiency through a unified 3D-Resampler architecture, a unified learning paradigm, and a hybrid reinforcement learning strategy.

  • 34 authors
· Sep 16, 2025

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

A novel GPT-based model, OmniFlatten, enables real-time natural full-duplex spoken dialogue through a multi-stage post-training technique that integrates speech and text without altering the original model's architecture.

  • 9 authors
· Oct 23, 2024
Submitted by
andito

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

SmolDocling is a compact vision-language model that performs end-to-end document conversion with robust performance across various document types using 256M parameters and a new markup format.

ibm-granite IBM Granite · Mar 14, 2025
Submitted by
Sicong

World Model for Robot Learning: A Comprehensive Survey

World models as predictive representations of environmental dynamics have become essential for robot learning, supporting policy learning, planning, and simulation across various embodied applications.

  • 18 authors
· Apr 30, 2026
Submitted by
taesiri

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

NanoResearch is a multi-agent framework that enhances research automation through personalized assistance by accumulating reusable skills, maintaining user-specific experience, and internalizing implicit preferences through co-evolving components.

  • 15 authors
· May 11, 2026
Submitted by
nielsr

Geometric Context Transformer for Streaming 3D Reconstruction

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from video streams using a geometric context transformer architecture with specialized attention mechanisms for coordinate grounding, dense geometric cues, and long-range drift correction, achieving stable real-time performance at 20 FPS.

robbyant Robbyant · Apr 15, 2026
Submitted by
Jiashuz

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation

MACE-Dance is a music-driven dance video generation framework that combines cascaded Mixture-of-Experts with diffusion models and specialized training strategies to achieve high-quality visual appearance and realistic human motion.

GD-ML AMAP-ML · May 7, 2026
Submitted by
lovesnowbest

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

UI-TARS-2, a native GUI-centered agent model, addresses challenges in data scalability, multi-turn reinforcement learning, and environment stability, achieving significant improvements over its predecessor and strong baselines across various benchmarks.

ByteDance-Seed ByteDance Seed · Sep 2, 2025

LightRAG: Simple and Fast Retrieval-Augmented Generation

LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.

  • 5 authors
· Oct 8, 2024
Submitted by
Wenxuan123

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

A novel approach decouples auxiliary training objectives from standard supervised finetuning to enhance model capabilities while reducing computational overhead through capability vector merging and orthogonal regularization.

OpenHelix-Team · May 11, 2026

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

EverMemOS presents a self-organizing memory system for large language models that processes dialogue streams into structured memory cells and scenes to enhance long-term interaction capabilities.

  • 11 authors
· Jan 5, 2026
Submitted by
Rbin

RAG-Anything: All-in-One RAG Framework

RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.

Submitted by
taesiri

δ-mem: Efficient Online Memory for Large Language Models

A lightweight memory mechanism called δ-mem enhances large language models by augmenting a frozen attention backbone with a compact associative memory state that provides low-rank corrections to attention computations.

mindlab-research Mind Lab · May 12, 2026

A decoder-only foundation model for time-series forecasting

A large language model adapted for time-series forecasting achieves near-optimal zero-shot performance on diverse datasets across different time scales and granularities.

  • 4 authors
· Oct 14, 2023

AutoDev: Automated AI-Driven Development

AutoDev is an AI-driven software development framework that automates complex engineering tasks within a secure Docker environment, achieving high performance in code and test generation.

  • 5 authors
· Mar 13, 2024

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Zep, a memory layer service, outperforms MemGPT in the DMR benchmark and LongMemEval by excelling in dynamic knowledge integration and temporal reasoning, critical for enterprise use cases.

  • 5 authors
· Jan 20, 2025
Submitted by
akhaliq

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

rStar enhances small language models' reasoning capabilities through a self-play mutual generation-discrimination process without fine-tuning, improving accuracy across various reasoning tasks.

  • 6 authors
· Aug 12, 2024
Submitted by
taesiri

DeepCode: Open Agentic Coding

DeepCode, a fully autonomous framework, addresses the challenges of document-to-codebase synthesis by optimizing information flow through source compression, structured indexing, knowledge injection, and error correction, achieving state-of-the-art performance and surpassing human experts.

  • 5 authors
· Dec 8, 2025
Submitted by
hao-li

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.

  • 11 authors
· Nov 17, 2025
Submitted by
akhaliq

3D Gaussian Splatting for Real-Time Radiance Field Rendering

A method using 3D Gaussians for scene representation and optimized rendering allows high-quality, real-time novel-view synthesis at 1080p resolution.

  • 4 authors
· Aug 8, 2023
Submitted by
youganglyu

EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

EvoScientist is an adaptive multi-agent framework that enhances scientific discovery by continuously learning from past interactions through persistent memory modules.

  • 12 authors
· Mar 9, 2026
Submitted by
hkuzxc

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

HyperEyes is a parallel multimodal search agent that enables concurrent entity searches while optimizing inference efficiency through dual-grained reinforcement learning and a specialized benchmark for evaluating both accuracy and efficiency.

XiaohongshuAI Xiaohongshu · May 8, 2026
Submitted by
Jiafei1224

MolmoAct2: Action Reasoning Models for Real-world Deployment

MolmoAct2 presents an open-action reasoning model for robotics that improves upon previous systems through specialized vision-language-model backbones, new datasets, open-weight action tokenizers, architectural redesign for continuous-action prediction, and adaptive reasoning for reduced latency.

allenai Ai2 · May 4, 2026
Submitted by
akhaliq

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

LlamaFactory is a unified framework enabling efficient fine-tuning of large language models across various tasks using a web-based user interface.

  • 5 authors
· Mar 20, 2024
Submitted by
ChengsongHuang

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

AutoTTS automates test-time scaling strategy discovery by formulating it as controller synthesis over reasoning trajectories and probe signals, achieving improved accuracy-cost tradeoffs with minimal computational overhead.

google Google · May 8, 2026
Submitted by
csuhan

OpenGame: Open Agentic Coding for Games

OpenGame is an open-source agentic framework for end-to-end web game creation that uses specialized code models and evaluation benchmarks to overcome challenges in interactive application development.

  • 11 authors
· Apr 20, 2026
Submitted by
yuezhengrong

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

This work investigates which latent manifold properties matter for diffusion models and proposes a Prior-Aligned AutoEncoder that explicitly optimizes latent space structure for improved generative modeling.

alibaba-inc · May 8, 2026

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

LeWorldModel presents a stable end-to-end JEPA framework that trains efficiently from raw pixels using minimal loss terms while maintaining competitive performance in control tasks and encoding meaningful physical structures.

galilai-group · Mar 13, 2026
Submitted by
ZhuofengLi

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Direct corpus interaction enables more effective agentic search by allowing agents to query raw text directly, outperforming traditional retrieval methods in complex tasks.

TIGER-Lab · May 3, 2026