SIA: Intelligence System for Attention: architecture, decisions, and lessons learned

The Problem That Brought Us Here

The rental operation faces three central pains:

Queues and limited human service capacity: repetitive requests consume the time of specialized teams.
Conversion loss due to long, high-friction journeys: the more steps there are between intent and booking, the greater the abandonment.
Operational risk in critical stages: price, availability, and booking without deterministic control generate financial errors and rework.

SIA tackles these three points in parallel: it uses AI to speed up understanding and communication, and deterministic APIs to ensure safe execution of decisions that affect revenue, contracts, and customer experience.

What SIA Delivers in Practice

For the business:

Conversion: increases the closing rate by reducing the time between intention and action.
Efficiency: lowers the cost per interaction by automating repetitive requests.
Scalability: allows multi-brand operation (white-label) with tenant isolation.
Risk control: reduces financial errors by separating language generation from transactional execution.

For the end customer:

Speed: a shorter journey to answer questions, get a quote, and book in the same flow.
Clarity: objective, contextualized answers with explicit confirmation before sensitive actions.
Reliability: prices and availability come from the real system, not model estimates.
Continuity: exceptions are resolved with human handoff without loss of context.

Scope and Architectural Thesis

SIA combines natural conversation with deterministic execution to cover four main journeys: FAQ with RAG, real-time quoting, creation/consultation/cancellation of bookings, and handoff to a human agent.

The central thesis is simple: AI decides the conversation path, but price, availability, and booking are only confirmed through deterministic APIs integrated into the official system.

Mandatory requirements from the start: regulatory compliance for data privacy, tenant isolation, and end-to-end auditability.

Layered Architecture

The solution is organized into four layers with well-defined responsibilities:

| Layer | Primary Role | Reference Components |
|---|---|---|
| Web Frontend | Conversational experience, journey control, and visual fallback | ChatShell, MessageTimeline, QuoteComposer, HandoffPanel |
| Backend API/Services | Contracts, idempotence, orchestration, and domain rules | FastAPI, ConversationService, RecoveryService |
| AI and Orchestration | Triage, intention-based routing, tool calls, and governance | LangGraph, SupervisorAgent, RAG/Pricing/Booking/Guardian |
| Data and Observability | Operational persistence, audit tracing, metrics, and tracing | PostgreSQL + pgvector, Redis, audit store, OpenTelemetry |

Multi-Agent Architecture

Instead of a single generic agent, the SIA uses multi-agent orchestration where each agent acts in a specific domain. This improves governance, reduces cross-regression, and makes the system's behavior explainable and auditable.

Specialized Agents

SIA Supervisor: central orchestrator. Receives the normalized message, classifies intention and risk, and routes to the correct agent. Does not respond directly: its function is to decide who should act.

SIA RAG: answers frequent questions, policies, and terms based on documentary evidence. Uses retrieval + rerank + anchored generation. If confidence is low, it asks for clarification or escalates to a human; never responds with assumptions.

SIA Pricing: generates quotes with real-time price and availability through deterministic integration with the ERP. Validates commercial parameters (dates, category, location) before any call.

SIA Booking: executes creation, consultation, and cancellation of bookings with explicit user confirmation, mandatory idempotence, and validation of business preconditions.

SIA Guardian: security and governance agent. Validates the candidate response before it reaches the customer: checks privacy/PII, institutional tone, content restrictions, and tenant rules. Can approve, sanitize, block, or require reformulation.
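
As an illustration of the Guardian's approve/sanitize/block outcomes, the following is a minimal sketch and not the production agent. The `Verdict` enum, the `review` function, the e-mail regex, and the `BLOCKED_TERMS` tuple are all hypothetical names introduced here; the "require reformulation" outcome is omitted for brevity.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    SANITIZE = "sanitize"
    BLOCK = "block"


# Assumption: a simple e-mail pattern stands in for a real PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Assumption: hypothetical tenant-level content restrictions.
BLOCKED_TERMS = ("internal margin",)


@dataclass(frozen=True)
class GuardianResult:
    verdict: Verdict
    text: str


def review(candidate: str) -> GuardianResult:
    """Validate a candidate response before it reaches the customer."""
    lowered = candidate.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        # Restricted content: nothing is released to the customer.
        return GuardianResult(Verdict.BLOCK, "")
    redacted = EMAIL.sub("[REDACTED_EMAIL]", candidate)
    if redacted != candidate:
        # PII was found and masked; release the sanitized text.
        return GuardianResult(Verdict.SANITIZE, redacted)
    return GuardianResult(Verdict.APPROVE, candidate)
```

Keeping the verdict explicit, rather than returning only text, lets the caller log why a response was altered or withheld.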

Simplified Algorithmic Flow

1. Normalize input with mandatory metadata (tenant_id, session_id, request_id, channel)
2. Apply entry guardrail: detect PII, prompt injection, and sensitive content
3. Classify intention and risk
4. Route to the specialized agent
5. Validate preconditions and execute within the domain
6. Pass through the Guardian for output validation
7. Return response with audited tracing
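
The seven steps above can be sketched as a single request handler. This is an illustrative outline only: `Envelope`, `normalize`, `entry_guardrail`, `classify`, and `handle` are hypothetical names, and the keyword rules stand in for whatever classifier the Supervisor actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict

REQUIRED_METADATA = ("tenant_id", "session_id", "request_id", "channel")

# Assumption: keyword rules standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "booking": ("book", "reserve", "cancel"),
    "pricing": ("price", "quote", "cost"),
}


@dataclass(frozen=True)
class Envelope:
    tenant_id: str
    session_id: str
    request_id: str
    channel: str
    text: str


def normalize(raw: dict) -> Envelope:
    """Step 1: reject input that lacks the mandatory metadata."""
    missing = [key for key in REQUIRED_METADATA if not raw.get(key)]
    if missing:
        raise ValueError(f"missing metadata: {missing}")
    metadata = {key: raw[key] for key in REQUIRED_METADATA}
    return Envelope(text=str(raw.get("text", "")).strip(), **metadata)


def entry_guardrail(env: Envelope) -> bool:
    """Step 2: toy prompt-injection check; True means the input may proceed."""
    return "ignore previous instructions" not in env.text.lower()


def classify(env: Envelope) -> str:
    """Step 3: keyword-based intent classification (illustrative only)."""
    lowered = env.text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return intent
    return "faq"


def handle(raw: dict,
           agents: Dict[str, Callable[[Envelope], str]],
           guardian: Callable[[str], str]) -> dict:
    env = normalize(raw)
    if not entry_guardrail(env):
        return {"request_id": env.request_id, "agent": None, "reply": "blocked"}
    intent = classify(env)
    candidate = agents[intent](env)   # steps 4-5: the agent validates and executes
    reply = guardian(candidate)       # step 6: output validation
    # Step 7: the request_id travels with the response for audit correlation.
    return {"request_id": env.request_id, "agent": intent, "reply": reply}
```

Passing the agents and the guardian in as callables keeps the routing logic testable without a model or an ERP connection.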

RAG: Responding with Evidence, Not Assumption

The RAG flow was designed around one central rule: first retrieve reliable context, then generate the response.

The stages are:

Documentary ingestion: official base versioned by tenant, domain, language, validity, and source.
Semantic chunking: division into fragments with controlled size and overlap to preserve context.
Embedding: each fragment is converted into a vector and stored with metadata.
Retrieval with top_k: fetches the most relevant fragments, always under mandatory scope filters.
Rerank: reorders the candidates by relevance to the question, reducing noise.
Anchored generation: the model responds based on the constructed context, with explicit instruction not to invent facts.
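
The retrieval and rerank stages above can be illustrated with a small in-memory sketch. It assumes toy two-dimensional vectors, a word-overlap reranker in place of a real reranking model, and a hypothetical `min_score` threshold; in the system described here the vectors would live in pgvector.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    language: str
    text: str
    vector: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, tenant_id, language, top_k=3) -> List[Chunk]:
    # Scope filters are applied before similarity, so another tenant's
    # documents can never enter the candidate set.
    scoped = [c for c in chunks
              if c.tenant_id == tenant_id and c.language == language]
    scoped.sort(key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return scoped[:top_k]


def rerank(question: str, candidates: List[Chunk]) -> List[Chunk]:
    # Assumption: word overlap stands in for a real reranking model.
    q_words = set(question.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q_words & set(c.text.lower().split())),
                  reverse=True)


def build_context(question, query_vec, chunks, tenant_id, language,
                  min_score=0.5) -> Optional[List[Chunk]]:
    hits = retrieve(query_vec, chunks, tenant_id, language)
    if not hits or cosine(query_vec, hits[0].vector) < min_score:
        return None  # low confidence: ask for clarification or escalate
    return rerank(question, hits)
```

Returning `None` instead of an empty context makes the low-confidence path explicit, so the caller cannot accidentally generate an answer without evidence.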

Mandatory rule: transactional decisions (booking/cancellation) depend on a deterministic API with idempotence, never on the LLM alone.

Backend: Deterministic Layer of the Solution

The FastAPI backend plays two complementary roles: as a BFF (Backend for Frontend), adapting the experience for each channel (web, app, WhatsApp); and as the application and domain orchestration layer, executing critical business rules.

It is not just a frontend proxy. It is an intelligent BFF with direct responsibility for operational consistency and business risk.

Main Modules

Conversation Service: maintains conversational state per session, tracks the journey stage, and enforces valid state transitions.
Pricing Service: queries the ERP for deterministic quotes, with parameter validation and a traceable response.
Booking Service: executes transactional operations with business invariants and idempotence guarantees.
Security & Governance Service: centralizes privacy policies, PII redaction, and content guardrails.
Escalation Service: detects handoff triggers and transfers complete context to a human agent.
Recovery Service: manages retry with backoff, circuit breaker, timeout budgets, and reconciliation with the ERP.
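
To make the Recovery Service's retry and circuit-breaker behavior concrete, here is a minimal sketch. `TransientError`, `CircuitOpenError`, `CircuitBreaker`, and `call_with_retry` are hypothetical names, the thresholds and delays are placeholder values, and timeout budgets and ERP reconciliation are left out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as ERP timeouts."""


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker,
                    max_attempts: int = 3, base_delay: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open: skip the call and escalate")
        try:
            result = fn()
        except TransientError:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff
        else:
            breaker.record_success()
            return result
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` and `clock` parameters are injectable so the behavior can be tested without real waiting. A caller that receives `CircuitOpenError` would typically fall back to human handoff.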

Adopted Technical Patterns

Clean/Hexagonal Architecture: separation between input, application, domain, and infrastructure adapters.
Idempotent Consumer: operations with financial effect carry an idempotency key, and their state is verified before any reprocessing.
Outbox/Inbox: domain events persisted with the local transaction for eventual consistency without loss.
Operation-oriented observability: structured JSON logs, distributed tracing (OpenTelemetry), and metrics by SLO.
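
The Idempotent Consumer pattern can be sketched with an in-memory store. This is an assumption-laden illustration: `BookingStore` and `create_booking` are hypothetical, and the dictionary stands in for a database table with a unique constraint on the tenant and idempotency key pair.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BookingStore:
    """In-memory stand-in for a table keyed by (tenant_id, idempotency_key)."""
    by_key: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def create_booking(self, tenant_id: str, idempotency_key: str,
                       payload: dict) -> dict:
        key = (tenant_id, idempotency_key)
        existing = self.by_key.get(key)
        if existing is not None:
            if existing["payload"] != payload:
                # Same key, different request: refuse instead of guessing.
                raise ValueError("idempotency key reused with a different payload")
            return existing  # replay: return the original result, no new booking
        record = {"booking_id": str(uuid.uuid4()), "payload": payload}
        self.by_key[key] = record
        return record
```

For example, a client that times out and resends the same request with the same key receives the original `booking_id` instead of creating a duplicate.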

Database: Operational Simplicity with Semantic Capability

The data architecture separates two responsibilities: transactional persistence and vector search for AI.

Current standard: PostgreSQL + pgvector. PostgreSQL is the transactional source of truth, with ACID guarantees, constraints, auditing, and tenant isolation. pgvector stores embeddings in the same operational stack, reducing infrastructure overhead.

Planned evolution: migration to Qdrant when objective triggers occur: P95 retrieval latency above the target, sustained growth of the vector base, or a need for advanced tuning. The decision is guided by data, not technological preference.
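
The P95 trigger can be checked with a few lines of code. The sketch below assumes a nearest-rank percentile over a window of retrieval latencies in milliseconds; `p95` and `p95_trigger_fired` are hypothetical names, and the target value is whatever the team has agreed on.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_trigger_fired(latencies_ms: Sequence[float], target_ms: float) -> bool:
    """True when the observed P95 retrieval latency exceeds the target."""
    return p95(latencies_ms) > target_ms
```

A single window above target is a signal to investigate; the other triggers (vector base growth, tuning needs) would be evaluated separately.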

Security and Governance

Security and governance are part of the primary design, not an add-on. The adopted principles:

Security by default: all data is created with access rules and a minimum classification level.
Least privilege: each service accesses only what is necessary for its function.
Multi-tenant isolation: each brand's data and context remain segregated throughout the journey.
Mandatory traceability: critical events are logged for technical and regulatory audits.

Implemented controls: strong authentication, RBAC by tenant, semantic guardrails, PII redaction in input and output, encryption in transit and at rest, an audit trail, and policy-based retention aligned with current privacy regulations.

Guardrails in Layers

The risk of each operation is mitigated with specific controls:

| Category | Primary Risk | Mandatory Control |
|---|---|---|
| RAG | Response without evidence | Retrieval with filters + rerank + minimum groundedness |
| Pricing | Incorrect quote due to free interpretation | Schema validation + factual return from the API |
| Booking | Transactional duplication/inconsistency | Idempotence + reconciliation + audited tracing |
| Security and Governance | Data leakage or inappropriate response | Redaction, tenant policies, and blocking/sanitization |

Professional Observability

In AI systems, monitoring infrastructure alone is not enough. SIA monitors four dimensions:

1. Technical: latency by stage, errors, timeouts, retries, and service availability.

2. LLM: prompt used, prompt version, chosen model, input/output tokens, and estimated cost per interaction. The prompt is not loose text: it is a versioned and auditable artifact.

3. Quality: useful response rate, hallucination rate, guardrail blocks, and frequency of human intervention.

4. Business: completed bookings, conversions, time saved, reduction in human-handled interactions, and avoided financial errors. Without this layer, technology is monitored, but not value.

Reference tools: OpenTelemetry + GenAI platform (LangSmith, Langfuse, or Arize Phoenix) + operational dashboard.
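
The LLM dimension can be captured as one structured record per model call. The following is a sketch under stated assumptions: `LLMCallRecord`, `estimated_cost`, and `to_log_line` are hypothetical names, and the per-1,000-token prices are placeholders to be replaced with the provider's actual rates.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LLMCallRecord:
    request_id: str
    tenant_id: str
    prompt_id: str
    prompt_version: str
    model: str
    input_tokens: int
    output_tokens: int


def estimated_cost(record: LLMCallRecord,
                   price_per_1k_input: float,
                   price_per_1k_output: float) -> float:
    """Estimated cost of one interaction; the prices are caller-supplied."""
    return (record.input_tokens / 1000 * price_per_1k_input
            + record.output_tokens / 1000 * price_per_1k_output)


def to_log_line(record: LLMCallRecord, cost: float) -> str:
    """Serialize the record as a structured JSON log line."""
    body = {**asdict(record), "estimated_cost": round(cost, 6)}
    return json.dumps(body, sort_keys=True)
```

Logging `prompt_id` and `prompt_version` on every call is what makes the prompt an auditable artifact: a quality regression can be tied to the exact prompt revision that produced it.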

Executive KPIs

The outcome indicators monitored are:

Conversation conversion rate: percentage of sessions that end in a booking.
Cost per interaction: average cost per automated session vs. human handoff.
Full journey success rate: FAQ → quote → booking without interruption.
Customer satisfaction: service rating and perceived clarity and reliability.

Governance criterion: a business KPI is only considered positive when accompanied by technical stability and ongoing governance.

How to Interpret KPIs Without Short-Term Illusion

High conversion with high error is not success: it can hide operational risk.
Low cost with excessive handoff is not efficiency: it may mean transferring the load to human operation.
High satisfaction without regulatory compliance is not sustainable: regulatory risk invalidates momentary gain.
Healthy goal: growth with preserved technical stability and governance.

Challenges and Lessons Learned

Main Development Challenges

Coordination between layers: keeping frontend, backend, AI, and ERP synchronized in a single journey.
AI accuracy: reducing hallucination without losing conversational fluency.
Transactional reliability: avoiding booking/cancellation duplication with idempotence and explicit confirmation.
Multi-channel integration: maintaining continuity between app and WhatsApp without context loss.
Governance and compliance: applying data privacy, redaction, and audited tracing throughout the journey.

Specific Challenges of the AI Layer

Intent ambiguity: distinguishing an informational question from a transactional action with a low margin of error.
Hallucination risk: avoiding responses that are not grounded in FAQ/policy documents, especially on commercial topics.
Multi-agent coordination: maintaining consistency between Supervisor, RAG, Pricing, Booking, and Guardian without cross-regression.

Adopted Architectural Mitigations

Clear separation between non-deterministic (interpretation) and deterministic (business execution) layers.
Guardrails in layers: input, context, tools, transactional, output, telemetry, and post-execution.
Mandatory use of deterministic APIs for price, availability, booking, and cancellation.
Recovery with retry/backoff, circuit breaker, and human handoff with complete session snapshot.
Complete observability with request_id/trace_id and quality scorecards.

Lessons Carried to Future Versions

Separation of boundaries reduces incidents: when language and transaction are separated, risk consistently falls.
Observability is not optional: without end-to-end correlation, diagnosis becomes guesswork.
Well-executed handoff protects the brand: context-complete transfer maintains continuity and customer trust.
Ongoing governance avoids risk