How Slack Manages Context in Long-running Multi-agent Systems

Infrastructure Reporter

Slack engineers have developed a sophisticated context management system using structured memory, validation, and distilled truth to maintain coherence in long-running multi-agent applications, addressing fundamental limitations of traditional approaches that accumulate unbounded message history.

Maintaining context in long-running multi-agent systems is becoming critical as AI applications grow more complex and persistent. Short-lived LLM sessions rarely need explicit context management, but long-running agent systems face a fundamental problem of information accumulation. As Slack engineers discovered, their multi-agent applications can span hundreds of requests and generate megabytes of output, which makes the traditional approach of carrying the full message history between API calls impractical and inefficient.

Technical Announcement: Moving Beyond Simple Context Accumulation

Rather than simply accumulating chat logs, Slack's system uses structured memory, validation, and distilled truth to maintain coherence and accuracy across a long-running session. This addresses a limitation of traditional agent frameworks, which fill the agent's context window with accumulated message history. That imposes a hard limit on how much information can be handled and can degrade response quality as the limit is approached.

"One of Slack's multi-agent applications can span over hundreds of requests and generate megabytes of output," explains Dominic Marks, a staff software engineer at Slack. "This creates significant challenges for maintaining context coherence and accuracy throughout the interaction."

Specifications: The Three-Channel Context System

Slack's approach is built on three complementary context channels that work together to maintain coherence across multiple agent interactions:

1. Director's Journal

  • Stores the director's structured working memory
  • Contains findings, observations, decisions, questions, and hypotheses
  • Provides the common narrative that keeps other agents on track
  • Acts as the strategic memory for the coordinator agent
  • Maintains the overall direction and goals of the multi-agent system
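
One way to picture the journal is as a typed, append-only record that is rendered into a compact block for each agent prompt. The following is a minimal sketch, not Slack's implementation: the class names, the `goal` field, and the rendering format are all assumptions made for illustration. Only the five entry kinds come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryKind(Enum):
    # The five kinds of entry the article lists for the journal.
    FINDING = "finding"
    OBSERVATION = "observation"
    DECISION = "decision"
    QUESTION = "question"
    HYPOTHESIS = "hypothesis"


@dataclass
class JournalEntry:
    kind: EntryKind
    text: str


@dataclass
class DirectorJournal:
    """Hypothetical structured working memory for the director agent."""
    goal: str
    entries: list[JournalEntry] = field(default_factory=list)

    def record(self, kind: EntryKind, text: str) -> None:
        self.entries.append(JournalEntry(kind, text))

    def render(self) -> str:
        """Render the journal grouped by entry kind, for inclusion in a prompt."""
        lines = [f"GOAL: {self.goal}"]
        for kind in EntryKind:  # iterates in definition order
            items = [e.text for e in self.entries if e.kind is kind]
            if items:
                lines.append(f"{kind.value.upper()}:")
                lines.extend(f"- {text}" for text in items)
        return "\n".join(lines)
```

Grouping by kind rather than by arrival order keeps the rendered block stable as entries are added, which is one plausible way to provide a "common narrative" without replaying every message.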

2. Critic's Review

  • Stores an annotated findings report with credibility scores
  • Acts as a truth filter using evidence inspection tools
  • Builds a credibility-weighted list of findings
  • Narrowly instructed to "only make a judgement on the submitted findings"
  • Evaluates expert findings to identify potential hallucinations or misinterpretations
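
The review step can be sketched as a function that scores each submitted finding by how much of its cited evidence can be verified. This is an illustrative assumption rather than Slack's scoring method: the `Finding` shape, the `evidence_exists` callback (standing in for the evidence inspection tools), and the "fraction of references verified" formula are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    claim: str
    evidence_refs: tuple[str, ...]  # identifiers cited by the expert (hypothetical)


@dataclass(frozen=True)
class ReviewedFinding:
    finding: Finding
    credibility: float  # 0.0 = nothing verified, 1.0 = every reference verified
    note: str


def review(findings: list[Finding],
           evidence_exists: Callable[[str], bool]) -> list[ReviewedFinding]:
    """Judge only the submitted findings; return them sorted by credibility."""
    reviewed = []
    for f in findings:
        if not f.evidence_refs:
            reviewed.append(ReviewedFinding(f, 0.0, "no evidence cited"))
            continue
        verified = sum(1 for ref in f.evidence_refs if evidence_exists(ref))
        total = len(f.evidence_refs)
        reviewed.append(
            ReviewedFinding(f, verified / total, f"{verified}/{total} references verified")
        )
    # Stable sort: ties keep their submission order.
    return sorted(reviewed, key=lambda r: r.credibility, reverse=True)
```

Note that the function never adds findings of its own, mirroring the narrow instruction to "only make a judgement on the submitted findings".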

3. Critic's Timeline

  • Stores chronological findings with credibility scores
  • Builds a coherent narrative from three sources: director's journal, latest critic's review, and previous timeline
  • Retains only credible evidence and removes duplicates
  • Resolves conflicts by preferring the strongest sources
  • Provides the historical context for the system
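
The timeline rules above (keep credible evidence, drop duplicates, prefer the strongest source on conflict) can be sketched as a merge over event lists. This is a sketch under stated assumptions: the `TimelineEvent` fields, the use of `(timestamp, subject)` as the duplicate and conflict key, and the `min_credibility` threshold of 0.5 are hypothetical, and the sources are assumed to have already been converted into scored events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEvent:
    timestamp: str      # ISO 8601 string, so lexical order is chronological order
    subject: str        # what the event concerns; part of the dedup key (assumption)
    claim: str
    credibility: float


def merge_timeline(*sources: list[TimelineEvent],
                   min_credibility: float = 0.5) -> list[TimelineEvent]:
    """Merge event lists (e.g. previous timeline, latest review) into one timeline."""
    best: dict[tuple[str, str], TimelineEvent] = {}
    for source in sources:
        for event in source:
            if event.credibility < min_credibility:
                continue  # retain only credible evidence
            key = (event.timestamp, event.subject)
            current = best.get(key)
            # Duplicates and conflicts share a key; keep the strongest entry.
            if current is None or event.credibility > current.credibility:
                best[key] = event
    return sorted(best.values(), key=lambda e: (e.timestamp, e.subject))
```

Because the output of one merge is a valid input to the next, the timeline can be rebuilt each round from the previous timeline plus new material without growing with the raw message history.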

Implementation Details: Coordinator/Dispatcher Architecture

Slack's approach follows a coordinator/dispatcher multi-agent design where:

  • A central coordinator acts as the decision maker
  • The coordinator receives requests and dispatches them to specialized agents
  • Expert agents handle specific domains or tasks
  • Critic agents evaluate expert work to ensure accuracy

The critics play a crucial role in the system, as a portion of expert findings "could either be invented or grossly misinterpret the data." Critics receive summary reports from experts and assess the evidence they contain. This evaluation forms the basis for a scoring system used to identify findings corroborated by multiple sources.
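
As a rough illustration of identifying findings corroborated by multiple sources, the sketch below counts how many distinct experts reported the same claim. Matching claims by normalized text is a simplifying assumption made here. The article does not say how Slack matches or scores findings, and a real system would likely need semantic matching.

```python
from collections import defaultdict


def corroboration(findings):
    """Map each normalized claim to the number of distinct sources reporting it.

    `findings` is an iterable of (source, claim) pairs; both names are hypothetical.
    """
    sources_by_claim = defaultdict(set)
    for source, claim in findings:
        # Naive normalization: trim whitespace and ignore case.
        sources_by_claim[claim.strip().lower()].add(source)
    return {claim: len(sources) for claim, sources in sources_by_claim.items()}
```

A repeat from the same source does not raise the count, so an expert cannot corroborate itself.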

The director, acting as the central coordinator, uses the information from all three channels to make informed strategic decisions. Experts can build on previous understanding through the structured summaries, while critics maintain objectivity by evaluating findings against established criteria.
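
The interaction between the roles can be sketched as a single round in which the context is rebuilt from the three channels instead of being appended to a growing message log. Everything here is a hypothetical simplification: agents are plain callables rather than LLM calls, channels are lists of strings, and the 0.5 credibility threshold is an arbitrary choice.

```python
from typing import Callable

Expert = Callable[[str], str]                             # context -> report
Critic = Callable[[list[str]], list[tuple[str, float]]]   # reports -> (finding, score)


def build_context(journal: list[str],
                  review: list[tuple[str, float]],
                  timeline: list[str]) -> str:
    """Assemble the prompt context from the three channels only."""
    parts = ["# Journal", *journal, "# Review"]
    parts += [f"{score:.2f} {finding}" for finding, score in review]
    parts += ["# Timeline", *timeline]
    return "\n".join(parts)


def run_round(journal: list[str],
              review: list[tuple[str, float]],
              timeline: list[str],
              experts: dict[str, Expert],
              critic: Critic,
              threshold: float = 0.5):
    """Dispatch to every expert, have the critic score the reports, update the timeline."""
    context = build_context(journal, review, timeline)
    reports = [expert(context) for expert in experts.values()]
    new_review = critic(reports)
    # Only credible findings not already recorded reach the timeline.
    new_timeline = timeline + [
        finding for finding, score in new_review
        if score >= threshold and finding not in timeline
    ]
    return new_review, new_timeline
```

Raw expert reports never enter the next round's context. Only the critic's scored review and the filtered timeline do, which is the property that keeps the context from growing with every request.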

Benchmarks and Performance

While specific benchmarks weren't provided in the source material, the approach addresses several critical performance issues in long-running agent systems:

  1. Context Window Management: By using distilled summaries rather than full message history, Slack avoids hitting context window limits that would degrade performance. Approaches that accumulate history degrade as the context approaches the model's window limit, whereas distilled summaries keep the context roughly bounded however long the session runs.

  2. Hallucination Reduction: The critic's review acts as a truth filter, reducing the risk that hallucinated claims propagate by requiring evidence validation before a finding is retained.

  3. Coherence Maintenance: The three-channel system ensures narrative consistency across hundreds of requests, which would be challenging with traditional approaches. The timeline component specifically maintains chronological consistency that would otherwise degrade over extended interactions.

  4. Efficiency: The structured summaries reduce the amount of information processed with each request, which should lower per-request latency and cost compared with processing the full accumulated history.
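
The difference in growth between the two strategies can be made concrete with simple arithmetic. The sketch below compares the context size carried into each request when history is accumulated with the size when only a capped summary is carried. The function names and the cap are assumptions for illustration, and sizes are in arbitrary units such as characters, not measured values from Slack.

```python
def accumulated_sizes(output_sizes):
    """Context size per request if every prior output is kept in the history."""
    total, sizes = 0, []
    for size in output_sizes:
        total += size
        sizes.append(total)
    return sizes


def distilled_sizes(output_sizes, summary_cap):
    """Context size per request if only a summary capped at `summary_cap` is kept."""
    summary, sizes = 0, []
    for size in output_sizes:
        # The summary absorbs new output but is distilled back under the cap.
        summary = min(summary + size, summary_cap)
        sizes.append(summary)
    return sizes
```

For example, with 300 requests each producing 10,000 units of output, the accumulated history ends at 3,000,000 units while a summary capped at 20,000 units stays at 20,000. This is the "hundreds of requests, megabytes of output" regime the article describes.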

Real-world Implications

Slack's approach illustrates a broader principle applicable to many complex AI systems: rather than passing all information at every step, build structured summaries that agents can reliably build upon. This approach offers several advantages for production systems:

Scalability

The system can handle long-running processes without accumulating unbounded context. This is particularly valuable for enterprise applications where AI systems need to operate continuously over days, weeks, or even months.

Reliability

The validation mechanisms increase the accuracy of agent outputs, making the system more trustworthy for critical applications. By implementing evidence-based credibility scoring, Slack's system can identify and filter out unreliable information before it propagates through the system.

Specialization

The clear separation of roles allows for specialized agents to focus on specific tasks while maintaining overall system coherence. This specialization improves performance as each agent can be optimized for its specific function without being burdened by unrelated context.

Auditability

The structured memory provides a clear record of the system's decision-making process, which is valuable for debugging and compliance. Each channel maintains a different aspect of the system's state, making it easier to trace how decisions were made and identify potential issues.

The pattern suits enterprise applications in which AI systems must preserve context over extended periods without sacrificing accuracy or reliability. It provides a blueprint for multi-agent systems that operate continuously rather than in isolated sessions.

Deployment Considerations

Organizations looking to implement similar context management systems should consider several factors:

  1. Agent Specialization: Design agents with specific, well-defined roles to maximize the effectiveness of the specialized channels.

  2. Evidence Validation: Implement robust validation mechanisms to ensure the credibility scoring system works effectively.

  3. Conflict Resolution: Establish clear protocols for resolving conflicts between different sources of information.

  4. Performance Optimization: Balance the level of detail in summaries with computational efficiency, as overly detailed summaries can negate some of the performance benefits.

  5. Monitoring and Maintenance: Implement systems to monitor the health of each channel and detect when context management may be breaking down.

For organizations interested in exploring these concepts further, the Slack Engineering Blog provides additional insights into their approach, while resources on multi-agent system architectures offer broader context on these design patterns.

Conclusion

Slack's context management system demonstrates that effective long-running multi-agent applications require more than just accumulating message history. By implementing structured memory with validation and distilled truth through three complementary channels, Slack has created a system that maintains coherence and accuracy across hundreds of requests. This approach addresses fundamental limitations of traditional agent frameworks and provides a model for building more sophisticated and reliable AI systems.

The key insight is that context management in complex AI systems requires intentional design rather than simple accumulation. By creating specialized channels for different types of information and implementing validation mechanisms, Slack has developed a context management system that scales with the complexity of the application while maintaining accuracy and coherence.

As AI systems become more prevalent in enterprise environments, the ability to maintain context over extended periods while ensuring accuracy will be a critical differentiator. Slack's approach offers a practical model for achieving this balance and a foundation for building more reliable AI applications that operate effectively in real-world scenarios.

About the Author: Sergio De Simone is a software engineer with over twenty-five years of experience across various projects and companies, including Siemens, HP, and small startups. For the last 10+ years, his focus has been on development for mobile platforms and related technologies. He is currently working for BigML, Inc., where he leads iOS and macOS development.
