TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
Abstract
TingIS is an enterprise-grade incident discovery system that uses multi-stage event linking with LLMs, cascaded routing, and noise reduction to efficiently identify critical issues from high-volume, noisy customer reports.
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionable intelligence from this data remains challenging due to extreme noise, high throughput, and the semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that combines efficient indexing techniques with Large Language Models (LLMs) to make informed decisions on event merging, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. This engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment handling a peak throughput of over 2,000 messages per minute and 300,000 messages per day, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data demonstrate that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and Signal-to-Noise Ratio.
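To make the idea of multi-stage event linking concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-stage design: a cheap lexical filter (Jaccard overlap on whitespace tokens, standing in for the unspecified indexing technique) shortlists candidate events, and a pluggable `judge` callable (standing in for the LLM merge decision) confirms or rejects each candidate. All names here (`EventLinker`, `keyword_judge`, `min_overlap`, `top_k`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer; a stand-in for a real index or embedding."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Event:
    event_id: int
    messages: List[str] = field(default_factory=list)
    tokens: Set[str] = field(default_factory=set)

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.tokens |= tokenize(message)


class EventLinker:
    """Two-stage linker: cheap candidate retrieval, then a costlier merge decision."""

    def __init__(
        self,
        judge: Callable[[str, Event], bool],
        min_overlap: float = 0.2,  # assumed threshold, not from the paper
        top_k: int = 3,            # assumed candidate budget, not from the paper
    ) -> None:
        self.judge = judge
        self.min_overlap = min_overlap
        self.top_k = top_k
        self.events: List[Event] = []

    def candidates(self, message: str) -> List[Event]:
        """Stage 1: shortlist existing events by lexical overlap."""
        toks = tokenize(message)
        scored = [(jaccard(toks, e.tokens), e) for e in self.events]
        scored = [pair for pair in scored if pair[0] >= self.min_overlap]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[: self.top_k]]

    def link(self, message: str) -> Event:
        """Stage 2: ask the judge about each candidate; open a new event if none match."""
        for event in self.candidates(message):
            if self.judge(message, event):
                event.add(message)
                return event
        event = Event(event_id=len(self.events))
        event.add(message)
        self.events.append(event)
        return event


def keyword_judge(message: str, event: Event) -> bool:
    """Hypothetical stand-in for an LLM merge decision: require two shared tokens."""
    return len(tokenize(message) & event.tokens) >= 2
```

With `EventLinker(judge=keyword_judge)`, two reports about a payment page failing to load would be merged into one event, while an unrelated refund complaint would open a second one. The point of the split is that the expensive judge is only consulted for the few candidates that survive the cheap filter.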
Community
We present TingIS, an enterprise-grade end-to-end risk discovery system that processes 300,000 customer incidents per day, with peaks of over 2,000 incidents per minute. By combining efficient indexing techniques with Large Language Models, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents in real-world deployment. This paper has been accepted for publication at ACL 2026 Industry Track.
Interesting breakdown of this paper on arXivLens: https://arxivlens.com/PaperView/Details/tingis-real-time-risk-event-discovery-from-noisy-customer-incidents-at-enterprise-scale-6408-3945b77d
Covers the executive summary, detailed methodology, and practical applications.
Get this paper in your agent:
hf papers read 2604.21889
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash