Fundamentals of Observability and Operation for Beginners
Building a system is only half the work. The other half is ensuring it continues to function in production.
Blog
Explore articles on software engineering, artificial intelligence, data, and systems architecture
When we talk about modern systems, we usually think of APIs, interfaces, cloud, and Artificial Intelligence.
After understanding the business problem, the product objectives, and the software architecture fundamentals, it's time to take the next step: learning System Design.
Generative Artificial Intelligence is no longer an experimental technology and has become part of products used daily by millions of people.
Many developers believe that software architecture means drawing complex diagrams, choosing between AWS or Azure, or deciding whether an application should use microservices.
When I started using AI coding agents in my daily routine, I noticed a pattern that bothered me: the agent made mistakes that no dev on my team would make. It used the wrong formatter, ignored project naming conventions, and ran tests with the wrong command. It took me a while to understand that the problem wasn't the model. The problem was that the agent simply didn't know my project.
Language models generate. Vector databases retrieve.
Over the last three years of studying and reviewing search pipelines and RAG, I've learned that most of the problems that appear in AI systems aren't in the language model. They're in the retrieval layer, which nobody examines closely enough. This guide is what I wish I had when I started.
You type a few words into Google and, in under a second, you receive an ordered, relevant, and surprisingly accurate list of results. It seems trivial. But behind this experience lies an entire area of computer science dedicated to making it possible: Information Retrieval.
Summary: This document dissects the architecture of a scientific reproduction agent system, examining its design trade-offs, consistency invariants, resilience patterns, and interface contracts between subsystems. The level of analysis assumes familiarity with distributed systems, LLM internals, and production software engineering.
Executive summary: the SIA was designed to increase conversion, reduce operating cost, and decrease transactional risk by separating the conversational decision (AI) from the critical execution (deterministic ERP APIs). This article navigates through management, product, and engineering to explain what we built, why, and what we learned.
Traditional RAG worked well for simple questions. But the real world is complex, and the new generation of retrieval systems needed to match that complexity.
This article is not about how to use LLMs. It is about how to architect systems that incorporate them with the same discipline required of any other critical component.
This article is not about how to use LLMs. It is about how to design systems that incorporate them with the same discipline required of any other critical component.