Skip to content
Updated: 43 min read

AI Agents in Enterprise - What Your Team Needs to Learn

AI Agents are transforming how IT teams work in enterprise. Discover key competencies, architectures and skills needed to implement AI agents in your...

Łukasz Szymański Author: Łukasz Szymański

AI Agents are no longer an experiment. In 2026, we’re seeing deployments in companies such as Salesforce, Microsoft, and Shopify. These aren’t just chatbots - they’re autonomous systems that execute real business tasks: analysing code, automating processes, responding to customers, generating reports.

For IT teams, this represents a fundamental paradigm shift. It’s no longer enough to know web frameworks or databases. You need to understand agentic system architecture, prompt engineering, orchestration, RAG (Retrieval-Augmented Generation), tool use and guardrails. These are new skills that 95% of IT teams in Poland don’t have in their portfolio today.

If you lead a technical team or are responsible for competency development in your organisation - this article will show you specifically what your people need to learn, which frameworks are worth knowing, and how to approach implementing AI agents in a practical and secure manner.

What are AI Agents and why does enterprise need them?

An AI Agent is an autonomous system based on a language model (LLM) that can:

  • Plan - break down complex tasks into steps
  • Use tools - call APIs, search databases, run scripts
  • Make decisions - choose the next action based on context
  • Iterate - repeat the cycle until the goal is achieved
  • Report - inform the user about progress and results

Example: You ask an agent “Analyse the last 100 commits in the repository and find potential security issues”. The agent autonomously:

  1. Connects to GitHub API
  2. Fetches commits
  3. Analyses code from a security perspective
  4. Saves results to the database
  5. Generates a report with priorities
  6. Creates tickets in Jira for high-priority findings

This isn’t a chatbot that answers questions. This is a system that does work.

Why does enterprise need AI Agents right now?

1. Rising specialist labour costs The median salary for a senior developer in Poland is PLN 180-220k annually (2026 data). An AI Agent costs 5-10% of that budget and works 24/7.

2. Talent shortage Polish IT companies report 40,000 unfilled positions (PARP 2026 report). Agents fill gaps in repetitive tasks, allowing people to focus on work requiring creativity.

3. Pressure on efficiency CFOs demand ROI from every IT investment. An agentic system in customer support reduces response time by 80% and costs by 60% (Zendesk AI Benchmark 2026 data).

4. Technology availability APIs from OpenAI, Anthropic, Google are stable, scalable and increasingly cheaper. Claude 3.7 Sonnet today costs $3 per 1M input tokens - 10x cheaper than a year ago.

5. Competitive pressure Your competitors are already testing this. The first wave of adopters will gain an advantage in 2026-2027. Those who start later will be playing catch-up.

What technical competencies are needed to build AI agents?

Implementing AI Agents requires completely new skills from the team. Here are 7 key competency areas you need:

1. Prompt Engineering - the foundation of everything

What it is: The art of designing instructions for language models to get predictable and valuable responses.

Why it’s crucial: An agent only works as well as it’s been “programmed” with prompts. A poorly written prompt = an agent that hallucinates, wastes tokens and gives unstable results.

What to learn:

  • System prompts - how to define the agent’s role, context and constraints
  • Few-shot learning - technique of showing examples in the prompt
  • Chain of Thought (CoT) - forcing step-by-step thinking
  • Structured outputs - enforcing JSON/XML formats in responses
  • Safety and guardrails - how to prevent unwanted behaviour

Example from practice: In a project for a bank, we had to build an agent to analyse credit documentation. The first prompt was 200 words. The agent hallucinated in 15% of cases. After redesigning to a structural prompt with explicit CoT and examples - hallucinations dropped to <2%.

Learning tools:

  • Anthropic Prompt Engineering Guide
  • OpenAI Cookbook
  • LangSmith (for debugging prompts)

2. Retrieval-Augmented Generation (RAG) - the agent’s memory

What it is: A technique combining a language model with a knowledge base. Before responding, the agent searches for relevant documents and uses them as context.

Why it’s crucial: LLMs don’t know your documentation, your code, your company policy. RAG allows the agent to operate on current, company knowledge without fine-tuning the model.

What to learn:

  • Embeddings - how to transform texts into vectors
  • Vector databases - Pinecone, Weaviate, pgvector, Qdrant
  • Chunking strategies - how to divide documents into fragments (chunk size, overlap)
  • Hybrid search - combining semantic search (embedding) with keyword search (BM25)
  • Reranking - how to improve search accuracy (Cohere Rerank, Cross-Encoder)
  • Metadata filtering - filtering by date, author, document type

Example from practice: Agent for customer support in e-commerce. We had 5000 articles from the knowledge base. Naive RAG (500-token chunk, top 3 results) gave 65% accuracy. After introducing:

  • Hybrid search (semantic + keyword)
  • Hierarchical chunking (summary + details)
  • Metadata filtering (publication date, category)
  • Reranking before submitting to LLM

Accuracy increased to 89%.

Tools:

  • LangChain (RAG chains)
  • LlamaIndex (data connectors)
  • ChromaDB, Pinecone (vector storage)

3. Tool Use (Function Calling) - the agent’s hands and feet

What it is: The agent’s ability to call external functions and APIs. The agent “knows” what tools are available and autonomously decides when to use which one.

Why it’s crucial: Without tool use, an agent is just a chatbot. Tool use gives the agent the ability to act - sending emails, saving to database, calling APIs, running scripts.

What to learn:

  • Function schemas - how to describe a function in JSON schema format
  • Parameter extraction - how the model extracts parameters from user input
  • Error handling - what happens when API returns an error
  • Tool selection strategies - how the agent chooses the right tool
  • Safety boundaries - which tools the agent should NOT have (delete operations, payment APIs without approval)

Example from practice: Agent for code review in a fintech company. Available tools:

  • get_file_content(path) - reading file from repo
  • run_static_analysis(file) - running linter/SAST
  • search_codebase(query) - semantic search in code
  • create_github_comment(pr_id, comment) - comment in PR
  • get_test_coverage(file) - checking test coverage

The agent autonomously:

  1. Fetches changed files from PR
  2. Runs static analysis for each
  3. If it detects an issue - searches codebase for similar problems
  4. Checks test coverage
  5. Generates detailed review with examples and links
  6. Adds comment to PR

Frameworks with native tool support:

  • Anthropic Claude (tool use API)
  • OpenAI (function calling)
  • LangChain (tools abstraction)

4. Multi-Agent Orchestration - teamwork of agents

What it is: Systems where multiple agents collaborate, each with a specialisation. One agent plans, another does research, third writes code, fourth tests.

Why it’s crucial: Complex tasks require different “roles”. One agent-for-everything doesn’t scale in enterprise use cases.

What to learn:

  • Agent roles - how to define roles and responsibilities
  • Communication protocols - how agents pass information to each other
  • Coordination patterns - hierarchical (boss-worker), sequential (pipeline), parallel (fork-join)
  • Conflict resolution - what happens when agents disagree
  • State management - how to manage shared context between agents

Example from practice: System for content marketing automation:

Agent 1 - Strategist:

  • Analyses trends (Google Trends API)
  • Chooses topic
  • Defines target persona

Agent 2 - Researcher:

  • Searches for sources (Perplexity API, arXiv, PubMed)
  • Extracts key insights
  • Validates facts

Agent 3 - Writer:

  • Writes draft based on research
  • Applies brand voice guidelines
  • Optimises for SEO

Agent 4 - Editor:

  • Reviews for facts
  • Checks grammatical correctness
  • Final polish

Each agent is a specialist. Coordination through orchestrator (LangGraph).

Frameworks:

  • CrewAI (role-based agents)
  • AutoGen (conversational agents)
  • LangGraph (state-based orchestration)

5. Guardrails and Safety - how not to lose control

What it is: Control and security mechanisms preventing unwanted agent behaviour.

Why it’s crucial: An autonomous agent without guardrails is a legal, financial and reputational risk. It can hallucinate, leak data, execute a destructive operation.

What to learn:

  • Input validation - filtering prompt injection
  • Output validation - checking if response is safe
  • PII detection - detecting personal data (GDPR compliance)
  • Fact-checking - validating agent assertions
  • Rate limiting - protection against runaway agents (agent in a loop)
  • Human-in-the-loop (HITL) - when agent must wait for approval
  • Audit logging - tracking all decisions and actions

Example from practice: Agent for handling complaints in e-commerce:

Guardrails:

  • Cannot propose refund >PLN 500 without approval
  • Must detect PII in transcripts and mask (GDPR)
  • Output filtered through toxicity detector
  • Every decision logged to audit trail
  • Timeout after 10 LLM calls (protection against loop)

Tools:

  • Guardrails AI (framework)
  • NeMo Guardrails (NVIDIA)
  • LangSmith (monitoring)
  • Presidio (PII detection)

6. LLM APIs and Model Selection - choosing the right model

What it is: The ability to choose the right LLM model for the task and work effectively with APIs.

Why it’s crucial: Different models have different strengths. GPT-4o is expensive but powerful. Claude 3.5 Haiku is fast and cheap. Wrong choice = wasting budget or poor results.

What to learn:

  • Model capabilities - what GPT-4o, Claude 3.7, Gemini 1.5, Llama 3.3 can do
  • Cost optimisation - when to use a small model and when a large one
  • Latency requirements - real-time vs batch processing
  • Context windows - how much data the model accepts (128k, 200k, 1M tokens)
  • Structured outputs - which models give the best JSON
  • Multi-modal capabilities - vision, audio (if needed)

Example from practice:

TaskModelWhy
Code reviewClaude 3.7 SonnetExcellent reasoning, 200k context
Customer support (triage)GPT-4o miniCheap, fast, sufficient
Legal document analysisGPT-4oBest for complex documents
Data extraction from PDFClaude 3.5 HaikuVision + speed + cost
Summarisation (batch)Llama 3.3 70B (self-hosted)Zero API cost

Key skill: Routing - agent decides which model to use for which sub-task.

7. Monitoring and Observability - seeing what the agent does

What it is: Tools and practices allowing you to track, debug and optimise agent operation in production.

Why it’s crucial: An agent in production is a black box. Without observability you don’t know why the agent failed, how much it costs, where the bottlenecks are.

What to learn:

  • Distributed tracing - tracking entire flow request → response
  • Token usage tracking - how much individual operations cost
  • Latency metrics - where delays are
  • Error tracking - error classification (LLM error vs API error vs logic error)
  • Quality metrics - accuracy, hallucination rate, user satisfaction
  • A/B testing - comparing different prompts/models

Example from practice: Dashboard for agent in production (5000 req/day):

Metrics:

  • Avg latency: 3.2s (target <5s) ✅
  • Avg cost per request: $0.08 (target <$0.10) ✅
  • Error rate: 2.1% (target <5%) ✅
  • User satisfaction (thumbs up): 87% (target >85%) ✅

Alert: Spike in hallucination rate (8% → 15%) at weekend. Root cause: Wikipedia API was down, agent didn’t have current data and hallucinated. Fix: adding fallback source.

Tools:

  • LangSmith (LangChain)
  • Weights & Biases (W&B)
  • Helicone (LLM observability)
  • Arize AI (ML observability)

Which frameworks and tools are worth knowing?

The AI Agents ecosystem is developing rapidly. Here are the most mature tools and frameworks worth knowing in 2026:

LangChain - the Swiss Army knife of agents

What it is: The most popular framework for building LLM applications and agents. Python and TypeScript.

When to use:

  • RAG applications (document Q&A)
  • Simple to medium complexity agents
  • Prototyping and MVP
  • When you need a large integration ecosystem (200+ integrations out-of-the-box)

Strengths:

  • Huge community
  • Plenty of examples and tutorials
  • LangSmith (monitoring)
  • LangGraph (orchestration)

Weaknesses:

  • API changes frequently (breaking changes)
  • Abstractions sometimes too heavy
  • Performance overhead

Use case: Agent for analysing legal documents in a law firm. RAG on 10k+ documents, semantic search, citation tracking.

CrewAI - multi-agent for teams

What it is: Framework focused on multi-agent collaboration. Agents as “crew members” with different roles.

When to use:

  • Multi-agent systems (3+ agents)
  • Complex workflows requiring specialisation
  • Projects where you need “teamwork” of agents

Strengths:

  • Role-based design (intuitive)
  • Built-in collaboration patterns
  • Task delegation
  • Memory between agents

Weaknesses:

  • Younger ecosystem
  • Fewer integrations than LangChain

Use case: Content marketing system (strategist + researcher + writer + editor). Each agent = role in team.

AutoGen - conversational agents from Microsoft

What it is: Framework from Microsoft Research. Agents converse with each other to achieve goal.

When to use:

  • Agents that must negotiate / debate
  • Code generation (agent writes code → second tests → first corrects)
  • Research tasks (multiple perspectives)

Strengths:

  • Excellent for code generation
  • Human-in-the-loop patterns
  • Mature research backing (Microsoft Research)

Weaknesses:

  • Documentation could be better
  • Sometimes agents talk “too much” (cost overhead)

Use case: Agent for automatically fixing bugs. Agent 1 analyses bug, Agent 2 proposes fix, Agent 3 writes test, Agent 4 reviews.

LangGraph - state machines for agents

What it is: Part of LangChain, but deserves separate mention. Graph-based orchestration.

When to use:

  • Complex agent workflows with branching logic
  • When you need full control over flow
  • State management between steps
  • Production-grade reliability

Strengths:

  • Deterministic flow (not like conversational agents)
  • Debugging is easier (you see the graph)
  • Conditional routing
  • Checkpointing (you can resume agent from any step)

Weaknesses:

  • Steep learning curve
  • Requires thinking in terms of graphs

Use case: Agent for employee onboarding. Multi-step workflow: data collection → account creation → access configuration → equipment dispatch → progress tracking. Each step = node in graph. Conditional routing if something fails.

Anthropic Claude API - direct approach

What it is: Native API from Anthropic. No framework. Direct HTTP calls.

When to use:

  • Maximum control
  • Performance-critical applications
  • Don’t need abstractions
  • Already have your own orchestration layer

Strengths:

  • Zero overhead
  • Latest features first (tool use, prompt caching, extended thinking)
  • Best-in-class documentation
  • Excellent reasoning

Weaknesses:

  • Must build more yourself (retry logic, error handling, observability)

Use case: High-frequency trading bot. Latency critical, you need control over every millisecond.

OpenAI Assistants API - managed agent

What it is: Fully managed agent service from OpenAI. You create agent through API, OpenAI manages execution.

When to use:

  • Want to deploy quickly without building infrastructure
  • Need Code Interpreter (agent executes Python code in sandbox)
  • File retrieval out-of-the-box
  • Don’t want to manage state

Strengths:

  • Managed service (less DevOps)
  • Code Interpreter
  • Built-in file handling
  • Persistent threads

Weaknesses:

  • Less control (black box)
  • Vendor lock-in
  • No self-hosting

Use case: Internal tool for data analysts. Agent analyses CSV/Excel, generates charts, answers questions. Code Interpreter does heavy lifting.

Summary: What to choose?

You needFramework
Prototype in 2 daysLangChain + Claude API
Production RAGLangChain + Pinecone + LangSmith
Multi-agent workflowCrewAI or LangGraph
Complex state machineLangGraph
Code generationAutoGen
Maximum controlDirect API (Claude/GPT-4o)
Quick MVP without infraOpenAI Assistants API

Pro tip: Start with a framework (LangChain), learn concepts, build MVP. As project matures and you need more control - migrate to direct API or your own orchestration layer. 80% of projects never need this.

What does agentic system architecture look like in enterprise?

Implementing AI agents in enterprise isn’t “adding a chatbot to the website”. It’s a distributed architecture with many components. Here’s a reference design based on real deployments:

Single-Agent Architecture - for simpler use cases

Components:

User Request

API Gateway (FastAPI/Express)

Agent Orchestrator

┌─────────────────┐
│   LLM Agent     │
│  (Claude 3.7)   │
└────────┬────────┘

    ┌────┴─────┬──────────┬─────────┐
    ↓          ↓          ↓         ↓
Vector DB   Tools     Memory    Guardrails
(Pinecone) (APIs)   (Redis)   (validation)
    ↓          ↓          ↓         ↓
    └──────────┴──────────┴─────────┘

            Response + Logs

            Observability
        (LangSmith/Helicone)

Example: Customer support agent.

Flow:

  1. User asks question
  2. Agent searches knowledge base (Vector DB)
  3. If doesn’t find - uses Tool: search_past_tickets(query)
  4. Generates response
  5. Guardrails check for PII leaks
  6. Response returns to user
  7. Feedback (thumbs up/down) saved to Memory

Infrastructure:

  • API: FastAPI on AWS Lambda
  • Vector DB: Pinecone (managed)
  • Memory: Redis (ElastiCache)
  • LLM: Claude 3.7 Sonnet (Anthropic API)
  • Monitoring: LangSmith

SLA: 95% requests <3s, 99.9% uptime, cost <$0.10 per request.

Multi-Agent Architecture - for complex workflows

Components:

User Request

Orchestrator (LangGraph)

┌─────────┬──────────┬──────────┬─────────┐
│ Agent 1 │ Agent 2  │ Agent 3  │ Agent N │
│(Planner)│(Executor)│(Reviewer)│  (...)  │
└────┬────┴────┬─────┴────┬─────┴────┬────┘
     │         │          │          │
     └─────────┴──────────┴──────────┘

            Shared Context
          (State Management)

          Message Queue (RabbitMQ)

    ┌──────────────┴──────────────┐
    ↓                              ↓
Tools & Services          Vector Stores
(APIs, DBs, etc.)         (Knowledge)

Example: Automated incident response in DevOps.

Agents:

1. Detector Agent:

  • Monitors metrics (Prometheus)
  • Detects anomalies
  • Creates initial incident report

2. Diagnostic Agent:

  • Analyses logs (Elasticsearch)
  • Correlates with previous incidents
  • Generates hypothesis

3. Remediation Agent:

  • Proposes fix (restart service, scale up, rollback deploy)
  • Executes if confidence >80%
  • Escalates to human if confidence <80%

4. Communication Agent:

  • Updates status page
  • Notifies Slack/PagerDuty
  • Generates post-mortem draft

Orchestration (LangGraph):

Detector → Diagnostic → Remediation → Communication
             ↓ (if high severity)
        Human Approval

Infrastructure:

  • Orchestrator: Kubernetes Job
  • Agents: Separate containers
  • Message Queue: RabbitMQ
  • State: PostgreSQL
  • Observability: Grafana + LangSmith

Human-in-the-Loop (HITL) - critical for enterprise

Why: Enterprise can’t afford full autonomy in all cases. Regulations, risk management, legal require human oversight.

When needed:

  • High-stakes decisions (e.g. financial transactions >X PLN)
  • Regulated industries (banking, healthcare, legal)
  • Low confidence predictions (agent isn’t sure)
  • Novel situations (agent never saw this)

Implementation:

Pattern 1: Approval Gate Agent prepares action → waits for human approval → executes.

Pattern 2: Review Loop Agent executes → human review after the fact → feedback for learning.

Pattern 3: Confidence Threshold If confidence >90% → auto-execute. If <90% → human review.

Example: Agent for approving refunds in e-commerce.

  • Refund <PLN 100 → auto-approve (confidence usually >95%)
  • Refund PLN 100-500 → human review if confidence <90%
  • Refund >PLN 500 → ALWAYS human approval

Tools: Retool, Internal Admin Panel, Slack bot with approve/reject buttons.

Security and Compliance - cannot be omitted

Enterprise requirements:

1. Data privacy (GDPR):

  • PII must be masked before sending to LLM API
  • Or: self-hosted model (Llama 3.3 on own infra)
  • Audit log who when what data accessed

2. Access control:

  • Agent cannot have access to all data
  • Role-based access (RBAC)
  • Agent for HR shouldn’t see financial data

3. Rate limiting:

  • Protection against runaway agents (agent in loop calling API 1000x/sec)
  • Circuit breakers
  • Timeouts

4. Audit trail:

  • Every agent decision logged
  • Immutable log (append-only)
  • Retention compliant with regulations (7 years in financial services)

5. Disaster recovery:

  • What if LLM API is down? (fallback: human queue)
  • What if agent did something destructive wrong? (rollback mechanisms)

Reference architecture for highly regulated industry (banking):

  • Self-hosted Llama 3.3 70B (on-premise, zero data leakage)
  • All PII encrypted at rest and in transit
  • Agent has read-only access to production DB (queries through read replica)
  • Write operations ONLY through approval queue
  • 100% audit coverage
  • Incident response playbook

What roles in the team are needed?

Implementing AI Agents requires new roles or expanding existing ones. Here’s a typical agentic team composition in enterprise:

1. AI/ML Engineer - builder

Responsibility:

  • Agent implementation
  • Integration with LLM APIs
  • RAG pipelines
  • Tool development

Tech stack: Python, LangChain, FastAPI, Vector DBs, LLM APIs

Profile: Senior developer with ML/AI experience, understands how LLMs work, can debug non-deterministic systems.

How many you need: 2-3 on team of 10.

2. Prompt Engineer - programming with natural language

Responsibility:

  • Designing system prompts
  • Cost optimisation (token usage)
  • Testing different prompting strategies
  • Documenting prompt libraries

Tech stack: LLM APIs, prompt evaluation frameworks, spreadsheets

Profile: Technical writer + data mindset + understanding of LLM capabilities. Doesn’t have to code, but should understand code.

How many you need: 1 for 3-4 agents.

Note: In many teams role overlaps with AI Engineer. In initial phase you can go without dedicated Prompt Engineer. When you have 10+ agents in production - you need one.

3. Data Engineer - knowledge infrastructure

Responsibility:

  • Building data pipelines for RAG (ETL from various sources to vector DB)
  • Data quality for training data
  • Embeddings infrastructure
  • Vector DB management

Tech stack: Airflow, DBT, SQL, Python, Vector DBs

Profile: Classic Data Engineer, but understands AI specifics (embeddings, chunking strategies).

How many you need: 1-2 for larger project (if RAG is core).

4. MLOps Engineer - production and monitoring

Responsibility:

  • Agent deployment
  • CI/CD for ML
  • Monitoring (observability)
  • Scaling, performance optimisation
  • Incident response

Tech stack: Kubernetes, Docker, Terraform, Prometheus, Grafana, LangSmith

Profile: DevOps Engineer with ML systems experience. Understands that ML != traditional software (non-deterministic, data dependencies).

How many you need: 1 MLOps for 5-10 AI Engineers (ratio like in traditional DevOps).

5. Product Manager - value owner

Responsibility:

  • Defining use cases
  • Feature prioritisation
  • Success metrics (what does it mean that agent works well?)
  • User research (how people use agent)

Tech stack: Analytics (Mixpanel), user research tools, basic understanding of AI capabilities

Profile: Product Manager with technical curiosity. Must understand what’s possible with LLM (and what isn’t). Doesn’t have to code.

How many you need: 1 PM can handle 2-3 agent projects in parallel.

6. Domain Expert - the one who knows HOW

Responsibility:

  • Defining business logic
  • Validating agent outputs
  • Training data labelling
  • Feedback loops

Tech stack: Domain knowledge, spreadsheets, annotation tools

Profile: Expert in the field (e.g. if agent does legal research → lawyer, if customer support → support lead). Doesn’t have to know AI, but must want to collaborate with AI.

How many you need: 1 Domain Expert per agent (part-time, 20-40% time).

Example: Agent for code review. Domain Expert = senior developer who defines what “good code” means and validates whether agent reviews correctly.

7. Security/Compliance Officer - risk guardian

Responsibility:

  • Security review of agents
  • Compliance (GDPR, NIS2, industry regulations)
  • Risk assessment
  • Incident response for security issues

Tech stack: Security tools, audit logs, compliance frameworks

Profile: Security Engineer or Compliance Specialist who understands AI specifics (data leakage via prompts, adversarial attacks, jailbreaking).

How many you need: 1 for organisation (if you have several agent projects). Involvement grows with industry regulation.

Typical team for medium-sized agent project (3-6 months):

  • 1 Product Manager (50% time)
  • 2 AI/ML Engineers (full-time)
  • 1 Data Engineer (50% time, if RAG)
  • 1 MLOps Engineer (30% time, shared resource)
  • 1 Domain Expert (30% time)
  • 1 Security review (10% time, at start and before go-live)

Total headcount: ~3.5 FTE.

Use cases of AI agents in companies - specific examples

AI Agents aren’t theory. In 2026 we have hundreds of production deployments. Here are real use cases from Polish and global companies:

1. Customer Support Automation - classic that works

Company: E-commerce scale-up (10M+ users)

Problem:

  • 5000 tickets daily
  • 50% tickets are repetitive questions (where’s my order? how to return? change details)
  • Support team (40 people) can’t keep up

Solution - Agent:

  • Integration with Zendesk (reads ticket)
  • RAG on knowledge base (1000+ articles)
  • Access to systems (order management, CRM, logistics APIs)
  • Responds autonomously to Level 1 queries
  • Escalates to human if doesn’t know

Results (after 6 months):

  • 65% tickets resolved by agent (zero human touch)
  • Avg resolution time: 2 minutes (was: 4 hours)
  • CSAT: 4.2/5 (agent) vs 4.5/5 (human) - small difference
  • Savings: 25 FTE (~PLN 2.5M annually)
  • Support team now focuses on complex issues and proactive outreach

Tech stack: LangChain, GPT-4o mini, Pinecone, Zendesk API.

2. Code Review Assistant - Dev Productivity development

Company: Software house (200 devs)

Problem:

  • Code review bottleneck (PRs wait 2-3 days)
  • Juniors don’t know what to check in review
  • Repetitive comments (naming, style, security basics)

Solution - Agent:

  • Triggered by GitHub webhook (new PR)
  • Analyses diff
  • Runs SAST (Semgrep)
  • Checks test coverage
  • Semantic search in codebase (did someone solve similar problem)
  • Leaves comments in PR with suggestions

Results:

  • 90% PRs get initial review in <5 minutes (was: 2 days)
  • 40% PRs agent approves without human review (low-risk changes)
  • Human reviewers focus on architecture and business logic
  • Onboarding juniors faster (agent teaches through feedback)

Interesting insight: Agent didn’t replace human review. Changed their role from “find typo and security bug” to “is this good design?”.

Tech stack: AutoGen, Claude 3.7 Sonnet, GitHub API, Semgrep.

3. Financial Document Analysis - AI in banking

Company: Bank (top 5 in Poland)

Problem:

  • Credit documentation analysis (income statements, tax returns) takes analysts 2-4 hours per case
  • 10k+ credit applications monthly
  • Manual errors (20% cases require re-review)

Solution - Agent (highly regulated):

  • Self-hosted Llama 3.3 70B (on-premise, GDPR compliance)
  • Data extraction from PDF (OCR + LLM)
  • Cross-verification (does income from different documents match)
  • Risk scoring
  • Draft recommendation (approve / reject / needs human review)

Results:

  • 70% cases agent does initial analysis in <10 minutes
  • Human analyst review takes 30 minutes (was: 2-4h)
  • Error rate: 5% (was: 20%)
  • Throughput increased 3x without increasing team size

Compliance:

  • 100% audit trail
  • Human always makes final decision (agent = recommendation only)
  • Quarterly review of agent decisions by Compliance

Tech stack: Llama 3.3 70B (self-hosted), LangChain, PostgreSQL, custom OCR.

4. HR Onboarding Automation - employee experience

Company: Tech corp (2000 employees)

Problem:

  • Onboarding new employee = 40 tasks (creating accounts, equipment dispatch, training, compliance)
  • HR coordinators spend 5h per new hire on tasks
  • New employees wait weeks for full access

Solution - Multi-Agent System:

Agent 1 - Coordinator: Manages workflow (LangGraph state machine).

Agent 2 - Account Creator: Creates accounts (Okta, Google Workspace, GitHub, Slack).

Agent 3 - Equipment Manager: Orders laptop, monitor, accessories (integration with procurement system).

Agent 4 - Trainer: Sends training materials, tracks completion, reminds.

Agent 5 - Compliance: Collects signatures on NDAs, GDPR consents, security policies.

Results:

  • Time-to-productivity: 3 days (was: 14 days)
  • HR coordinators time: 30 minutes per hire (was: 5h)
  • Employee satisfaction +40%
  • Zero missed compliance docs (was: 15% onboardings had gaps)

Human-in-the-loop: HR coordinator approves plan at start. Then agent executes autonomously.

Tech stack: LangGraph, CrewAI, Workday API, Okta API, custom integrations.

5. Sales Intelligence - AI for B2B sales

Company: B2B SaaS (ARR $50M)

Problem:

  • SDRs spend 60% time on research (LinkedIn, company websites, news)
  • Outreach is generic (low response rate: 3%)
  • Hard to scale sales without 2x team size

Solution - Agent:

  • Daily scan target accounts (Crunchbase, LinkedIn, Google News)
  • Identifies trigger events (funding, hiring, product launch, leadership change)
  • Generates personalised outreach (email + LinkedIn message)
  • Enriches CRM (Salesforce) with research notes
  • Proposes next steps for SDRs

Results:

  • Response rate: 3% → 12% (4x improvement!)
  • SDRs focus on conversations, not research
  • Pipeline increased 2.5x without increasing SDR headcount
  • Deal size higher (better targeted accounts)

Interesting insight: Agent doesn’t send emails autonomously. SDR always reviews and approves (HITL). But draft quality is so good that in 80% SDR clicks “send” without editing.

Tech stack: LangChain, GPT-4o, Apollo.io API, Salesforce API, LinkedIn Sales Navigator.

6. IT Ops - automated incident response

Company: Fintech (payment processing)

Problem:

  • Incident response: detection → diagnosis → fix takes 30-90 minutes
  • SLA: 99.95% uptime (max 4h downtime annually)
  • On-call engineers burned out (alerts at night)

Solution - Agent (see: multi-agent architecture above):

Agents: Detector → Diagnostic → Remediation → Communication.

Autonomous actions:

  • Restart unhealthy pod
  • Scale up if high load
  • Clear cache if memory issue
  • Rollback deploy if post-deploy spike in errors

Human escalation:

  • Unknown issue
  • Fix requires access to production DB
  • Severity = critical (payment processing down)

Results:

  • 60% incidents resolved autonomously in <5 minutes
  • MTTR (Mean Time To Resolution): 10 minutes (was: 45 minutes)
  • On-call alerts down 70% (fewer night wake-ups)
  • SLA achievement: 99.98% (was: 99.94%)

Trust building: Agent operated for 3 months in “recommendation mode” (suggested fix, human executed). After demonstrating 95% accuracy - enabled auto-remediation for low-risk issues.

Tech stack: LangGraph, Claude 3.5 Sonnet, Kubernetes API, Prometheus, PagerDuty.

How to start - roadmap for implementing AI agents in organisation

Implementing AI Agents in enterprise is 6-12 months from POC to production. Here’s a proven roadmap:

Phase 1: Discovery and Education (month 1-2)

Goals:

  • Education of team leads and decision makers
  • Identification of high-impact use cases
  • Assessment of current competencies
  • Selection of first pilot project

Activities:

1.1 Leadership workshops (1 day):

  • What are AI Agents (vs chatbots, vs RPA)
  • Use cases from industry
  • Costs, timeline, ROI expectations
  • Risk assessment

1.2 Technical assessment (1 week):

  • Audit of current stack (do you have data infrastructure?)
  • Inventory of available data (knowledge bases, DBs, APIs)
  • Security/compliance requirements
  • Budget for LLM APIs and tools

1.3 Use case brainstorming (2 weeks): Collect use cases from different departments. Score according to:

  • Business impact (how much does it cost today? how much will we save?)
  • Technical feasibility (do we have data? do we have APIs?)
  • Risk (what happens if agent makes mistake?)
  • Timeline (MVP in 3 months possible?)

Scoring framework:

Score = (Impact × Feasibility) / Risk

Impact: 1-10 (ROI, user satisfaction)
Feasibility: 1-10 (technical complexity)
Risk: 1-10 (failure consequences)

Select top 3 use cases.

1.4 First project selection:

Ideal first project characteristics:

  • Medium impact (valuable enough to get budget)
  • High feasibility (you want quick win)
  • Low risk (failure won’t sink project)
  • Repeatability (success can scale)

Examples of good first projects:

  • Internal IT helpdesk (low risk, easy to measure ROI)
  • Content summarisation (low risk, immediate value)
  • Data extraction from documents (clear success metric)

Bad first projects:

  • Customer-facing chatbot (high visibility, high risk if fails)
  • Financial decision automation (regulatory concerns)
  • Anything “strategic” (too many expectations, too many stakeholders)

Phase 1 Deliverable: 2-3 page project brief for pilot: use case, team, timeline (3 months), budget, success metrics.

Phase 2: Team Upskilling (month 2-3, parallel with Phase 1)

Goals:

  • Team has AI Agents fundamentals (prompt engineering, RAG, tool use)
  • Dev environment setup
  • Tech stack selection

Activities:

2.1 Team training (3-5 days intensive):

Day 1: LLM Fundamentals

  • How LLMs work (transformers, tokens, context window)
  • Capabilities and limitations
  • Prompt engineering basics
  • Hands-on: first prompts in Anthropic/OpenAI playground

Day 2: RAG (Retrieval-Augmented Generation)

  • Embeddings and vector search
  • Chunking strategies
  • Vector databases (Pinecone, Weaviate)
  • Hands-on: build simple RAG Q&A

Day 3: Agents and Tool Use

  • What is agent (vs model)
  • Tool use / function calling
  • Orchestration patterns
  • Hands-on: agent that calls APIs

Day 4: Frameworks (LangChain / CrewAI / LangGraph)

  • Framework overview
  • When to use which
  • Hands-on: rebuild Day 2-3 examples in LangChain

Day 5: Production (Monitoring, Safety, HITL)

  • Observability (LangSmith, Helicone)
  • Guardrails
  • Human-in-the-loop patterns
  • Hands-on: deploy agent with monitoring

Format: Hands-on, code-first. Zero slides. Trainer who built agents in production.

2.2 Dev environment setup:

  • Accounts in Anthropic/OpenAI (API keys)
  • Vector DB trial (Pinecone free tier)
  • LangSmith for monitoring
  • GitHub repo + CI/CD

2.3 Tech stack decision: Based on use case choose:

  • LLM provider (Claude, GPT, Gemini, self-hosted)
  • Framework (LangChain, CrewAI, LangGraph, direct API)
  • Vector DB (if RAG)
  • Monitoring (LangSmith, Helicone, W&B)
  • Deployment (Lambda, Kubernetes, modal.com)

Phase 2 Deliverable: Team knows basics, has working dev environment, first version of tech stack selected.

Phase 3: MVP Development (month 3-4)

Goals:

  • Working prototype of first use case
  • Validate technical feasibility
  • Early user feedback

Activities:

3.1 Sprint 1 (2 weeks): Core Agent

  • Core logic implementation
  • Integration with LLM API
  • Basic tool use (1-2 tools)
  • Happy path works

3.2 Sprint 2 (2 weeks): RAG (if needed)

  • Data preparation (chunking)
  • Vector DB setup
  • Retrieval pipeline
  • Testing retrieval quality

3.3 Sprint 3 (2 weeks): Error Handling + Monitoring

  • Retry logic, timeouts
  • Guardrails (basic)
  • Logging, observability
  • Error handling for unhappy paths

3.4 Sprint 4 (2 weeks): User Testing

  • Deploy to staging
  • 5-10 internal users test
  • Collect feedback
  • Iterate based on feedback

MVP success metrics:

  • Agent solves 50%+ test cases correctly
  • Latency <10s (target for production: <5s)
  • Zero critical bugs
  • User feedback >3.5/5

Phase 3 Deliverable: Working MVP, user tested, ready for limited beta.

Phase 4: Beta Deployment (month 5-6)

Goals:

  • Deploy to production (limited users)
  • Monitoring in real-world conditions
  • Collect metrics for ROI

Activities:

4.1 Production Deployment:

  • Infrastructure setup (if Kubernetes - helm charts, if serverless - Lambda/Cloud Run)
  • Security review (penetration testing, compliance check)
  • HITL setup (approval workflows if needed)
  • Runbook for on-call (what to do if agent fails)

4.2 Beta Programme (4-6 weeks):

  • 20-30% users (or 1 department)
  • Daily monitoring dashboards
  • Weekly review meetings
  • Rapid iteration based on production data

Key metrics to track:

  • Usage (requests per day)
  • Success rate (% resolved without escalation)
  • Latency (p50, p95, p99)
  • Cost per request
  • User satisfaction (CSAT, NPS)
  • Error rate, error types

4.3 Incident Response:

  • On-call rotation (team available if agent fails)
  • Rollback plan (if something very wrong - how to turn off agent and return to manual process)

Success criteria for moving to Full Deployment:

  • Success rate >70% (or other target metric for use case)
  • CSAT >4/5
  • Zero critical bugs in last 2 weeks
  • Cost per request in budget
  • Stakeholder approval

Phase 4 Deliverable: Agent in production (beta), metrics tracked, stakeholder confidence it works.

Phase 5: Scale & Optimise (month 7-12)

Goals:

  • 100% rollout
  • Cost optimisation
  • Next use cases

Activities:

5.1 Full Rollout (month 7-8):

  • Gradual traffic increase (20% → 50% → 100%)
  • Communication to users (what’s changing, how to use)
  • Training materials (if HITL)

5.2 Cost Optimisation (month 8-9):

  • Profiling: where are we spending most tokens?
  • Prompt optimisation (shorter = cheaper)
  • Model selection (is GPT-4o mini sufficient instead of GPT-4o?)
  • Caching (prompt caching, Anthropic extended thinking)
  • Batch processing (where real-time not required)

Target: Reduce cost per request by 30-50%.

5.3 Feature Expansion (month 9-10):

  • Additional tools
  • Multi-lingual support
  • Advanced guardrails
  • Better UX

5.4 Next Use Cases (month 10-12):

  • Apply learnings from first project
  • Pilot 2-3 next use cases
  • Build internal “AI Agent Platform” (shared infrastructure for next agents)

Phase 5 Deliverable: First agent fully deployed, cost-optimised, production-grade. 2-3 next use cases in pipeline.

Mistakes to avoid

1. Starting with too ambitious use case First project = proof of concept for organisation. Choose something realistically achievable in 3-4 months.

2. Lack of Domain Expert in team AI Engineer doesn’t know how your process should work. You need domain expert who knows.

3. Ignoring LLM APIs costs Prototype: $50/month. Production (10k requests/day): $5k/month. Plan budget.

4. Lack of HITL for high-risk cases Full autonomy sounds nice, but enterprise can’t afford risk. Start with HITL, gradually increase autonomy.

5. Premature multi-agent architecture Start with single agent. Multi-agent only when you really need it.

6. Lack of observability from Day 1 You can’t debug what you don’t see. Setup LangSmith/Helicone immediately.

How EITT prepares teams to work with AI Agents

At EITT we understand that AI Agents aren’t just new technology - they’re a new way of thinking about IT systems. Since 2024 we’ve been running dedicated training programmes preparing teams to implement AI agents in organisations.

Our training programmes for AI Agents

1. AI Agents Fundamentals (3 days, hands-on)

For whom: Developers, Tech Leads, Architects who want to start with AI Agents.

Programme:

  • Day 1: LLM fundamentals + Prompt Engineering
  • Day 2: RAG (Retrieval-Augmented Generation)
  • Day 3: Agents, Tool Use, Orchestration

Format: 70% practice, 30% theory. Each participant builds working agent from scratch.

After training you can:

  • Design and implement simple agent
  • Build RAG system for company knowledge
  • Integrate agent with external APIs (tool use)
  • Choose appropriate framework (LangChain vs CrewAI vs direct API)

2. Production AI Agents (2 days, advanced)

For whom: Teams that already have prototype and want to deploy to production.

Programme:

  • Monitoring and observability (LangSmith, Helicone)
  • Guardrails and safety
  • Cost optimisation
  • Human-in-the-loop patterns
  • Multi-agent architectures (LangGraph, CrewAI)
  • Security and compliance

Format: Workshop based on your use case. Bring your code - leave with production deployment plan.

3. LLM and Agent Architectures (2 days)

For whom: Architects, Senior Engineers designing agentic systems.

Programme:

  • Single-agent vs multi-agent architectures
  • State management and orchestration
  • Distributed tracing
  • Scaling patterns (how to handle 100k requests/day)
  • Self-hosted vs API-based LLMs (trade-offs)
  • Reference architectures from enterprise deployments

Format: Design workshops + case studies from production.

4. Prompt Engineering Masterclass (1 day)

For whom: Technical writers, Product Managers, AI Engineers who want to master prompt engineering.

Programme:

  • System prompts design
  • Few-shot learning
  • Chain of Thought (CoT)
  • ReAct pattern
  • Structured outputs
  • Prompt evaluation and testing

Format: Hands-on in playground (Anthropic Console, OpenAI Playground). Each participant leaves with library of own prompt templates.

5. Python for AI/ML (5 days, bootcamp)

For whom: Developers from other languages (Java, C#, JS) who want to jump to AI/ML ecosystem (Python).

Programme:

  • Python basics (quick refresher/learning)
  • Key libraries: NumPy, Pandas, Requests
  • Working with APIs
  • Async Python (for agent systems)
  • LangChain hands-on

Format: Bootcamp. Intensive 5 days, lots of code.

Why EITT?

500+ experts, 2500+ trainings - we know how to teach IT.

Our AI/ML trainings aren’t theory from slides. They’re hands-on workshops led by practitioners who implement AI Agents in production companies.

Trainers with experience:

  • Built production AI agents for e-commerce, fintechs, enterprise
  • Publish in industry blogs and at conferences (PyData, AI Devs)
  • Know not only “how” but “why” (and when NOT to use agents)

Adaptation to your stack: Don’t have stack yet? We’ll help choose (Claude vs GPT-4o, LangChain vs CrewAI). Already have stack? We’ll adapt training to your choice.

Closed programmes for companies: Need training for whole team (5-15 people)? We do closed trainings:

  • Customised programme (your use case, your stack)
  • At your office or online
  • Possibility of code review of your prototype
  • Follow-up consultations (after training, when you implement)

Post-training support: After training we don’t leave you alone. You have access to:

  • Slack community (alumni of our trainings + trainers)
  • Office hours (2x month, Q&A with trainer)
  • Updated materials (because AI changes every month)

Rating 4.8/5 from 2500+ participants.

Don’t believe? Check references from ING, mBank, PKO BP, Allegro - companies that trained their AI/ML teams with us.

FAQ - frequently asked questions about AI Agents

1. Can AI Agents replace human workers?

Short answer: Won’t replace, but will change nature of work.

Long answer: AI Agents are excellent at:

  • Repetitive tasks (answering same questions 1000 times)
  • Data processing (document analysis, information extraction)
  • First-level triage (customer support L1, incident classification)

People are better at:

  • Complex reasoning requiring deep business context
  • Empathy and emotional intelligence (customer success for high-value clients)
  • Creative problem solving (new problems agent never saw)
  • Strategic thinking

Reality: 50-person team with agents can deliver output like 80-person team without agents. But you won’t reduce team to 10 people. Agents increase leverage, don’t eliminate people.

Example from practice: Bank implemented agent for credit analysis. Didn’t fire analysts. But analyst who analysed 5 cases daily now analyses 15 (agent does initial pass, analyst does final review). Bank 3x increased throughput without 3x increasing team.

2. How much does implementing AI Agent in company cost?

Depends on complexity, but here are ballpark numbers:

Small project (simple single-agent, e.g. internal FAQ bot):

  • Development: 3 months × 2 FTE = ~PLN 300k
  • LLM API costs: $500-2000/month (depends on traffic)
  • Infrastructure: $200-500/month
  • Total first year: ~PLN 400k

Medium project (RAG-based agent with integrations, e.g. customer support):

  • Development: 6 months × 3-4 FTE = ~PLN 900k
  • LLM API costs: $3k-10k/month
  • Infrastructure: $1k-2k/month
  • Total first year: ~PLN 1.2M

Large project (multi-agent system, enterprise-grade):

  • Development: 12 months × 5-7 FTE = ~PLN 2.5M
  • LLM API costs: $10k-50k/month (depends on scale)
  • Infrastructure: $5k-15k/month
  • Total first year: ~PLN 3.5-4M

ROI: Customer support agent (medium project, PLN 1.2M):

  • Savings: 20 FTE support × PLN 100k = PLN 2M annually
  • ROI: PLN 2M - PLN 1.2M = PLN 800k net in first year
  • Payback period: 7-8 months

Note: These are ballparks. Real cost depends on: use case complexity, existing infrastructure (do you already have vector DB?), team velocity, vendor choices (self-hosted Llama = $0 API cost, but requires infra).

3. Do we need our own model (fine-tuning) or are public APIs sufficient (OpenAI, Anthropic)?

For 90% use cases: public APIs sufficient.

When public API (Claude, GPT-4o, Gemini):

  • Don’t have millions of training data examples
  • Use case isn’t super niche
  • Can send data to API (compliance allows)
  • Want to start quickly (setup in hours, not months)
  • Don’t want to manage ML infrastructure

When fine-tuning:

  • Have very niche domain (e.g. medical imaging, legal for specific market)
  • Have 10k-100k labelled examples
  • Public model doesn’t give accuracy you need (even after good prompt engineering)
  • ROI justifies cost (fine-tuning = $50k-500k depending on scale)

When self-hosted model (Llama 3.3, Mistral):

  • Strict compliance (e.g. healthcare, banking in some countries - data cannot leave infrastructure)
  • Very high volume (millions of requests/day - self-hosting can be cheaper)
  • Need full control over model (custom modifications)

Recommendation: Start with public API (Claude 3.7 Sonnet or GPT-4o). As project matures and you have data - consider fine-tuning. Self-hosting only if compliance requires or volume justifies.

Case study: Fintech wanted fine-tuned model for fraud detection. Started with GPT-4o + good prompt engineering. Accuracy: 82%. After 6 months had 50k labelled examples. Fine-tuned GPT-4o. Accuracy: 89%. Fine-tuning cost: $80k. Benefit: $2M annually reduced fraud. ROI clear.

4. How long does implementing first agent take?

Quick answer: 3-6 months from decision to production.

Breakdown:

Month 1-2: Discovery + Team Upskilling

  • Use case identification
  • Team training (AI Agents fundamentals)
  • Tech stack decision

Month 3-4: MVP Development

  • Core agent implementation
  • Integration with systems
  • Initial testing

Month 5-6: Beta + Production Deployment

  • Deploy to limited users
  • Monitoring, iteration
  • Full rollout

Fast track (if already have AI-savvy team): 2-3 months possible for simple use case.

Realistic for enterprise: 6-9 months (more stakeholders, security reviews, compliance).

Factors that extend:

  • Lack of competencies in team (add +2 months for upskilling)
  • Complex integrations (legacy systems, no APIs)
  • High compliance requirements (healthcare, finance)
  • Consensus paralysis (too many stakeholders, decision-making drags)

Factors that accelerate:

  • Team with ML/AI experience
  • Modern stack (cloud-native, APIs everywhere)
  • Clear decision-making (empowered Product Manager)
  • Choice of simple first use case

5. What are biggest risks of implementing AI Agents?

Top 5 risks and how to mitigate them:

1. Hallucinations (agent makes up facts)

Risk: Agent sounds confident but provides incorrect information.

Mitigation:

  • RAG (agent responds based on retrieved docs, not from memory)
  • Citations (agent always provides source: “According to Policy Doc v2.3…”)
  • Fact-checking layer (second model checks output of first)
  • Human review for high-stakes decisions

2. Data leakage / Privacy violations (GDPR)

Risk: Agent accidentally reveals PII (personal data), trade secrets, or data from another user context.

Mitigation:

  • PII detection and masking (Presidio)
  • Access control (agent sees only data user has right to see)
  • Audit logs (who when what asked)
  • Data retention policies (automatic deletion after N days)
  • On-premise deployment if compliance requires

3. Cost runaway (agent costs more than budget)

Risk: Agent in loop calls LLM 1000x, prompt is too long, using expensive model where cheap suffices.

Mitigation:

  • Rate limiting (max X requests per user per minute)
  • Timeouts (agent must finish in Y seconds)
  • Budget alerts (AWS/GCP billing alerts)
  • Cost monitoring per request
  • Prompt optimisation (shorter prompts = cheaper)
  • Model routing (small model for easy tasks, large for hard tasks)

4. Agent does something destructive (deletes data, sends 1000 emails)

Risk: Bug in tool use logic, agent gets hallucination and calls delete API.

Mitigation:

  • Read-only tools for prototype (agent can only read, not write)
  • HITL for write operations (agent proposes, human approves)
  • Dry-run mode (agent simulates action, doesn’t execute)
  • Idempotency (if agent calls API 2x through retry - doesn’t do damage)
  • Undo mechanisms (like rollback in DB)

5. User trust / adoption (people don’t use agent)

Risk: Agent is technologically excellent, but users don’t like it / don’t trust it / prefer old way.

Mitigation:

  • User research BEFORE building (what do users need?)
  • Transparency (agent clearly says what it’s doing and why)
  • Escape hatch (user can always escalate to human)
  • Gradual rollout (start with early adopters, don’t force on everyone)
  • Show the value (dashboards showing how much time agent saved)

6. Does AI Agent require continuous monitoring and maintenance?

Yes, but less than you think.

What requires monitoring:

1. Quality metrics (ongoing):

  • Success rate (% queries resolved correctly)
  • User satisfaction (thumbs up/down)
  • Escalation rate (how often agent must escalate to human)

Frequency: Daily dashboard review (5 minutes). Deep dive weekly (1h).

2. Cost monitoring:

  • Token usage per request
  • Total monthly spend (LLM API + infrastructure)

Frequency: Weekly review. Alert if exceed budget threshold.

3. Errors and anomalies:

  • Error rate (% failed requests)
  • Latency spikes
  • New error types

Frequency: Real-time alerts (PagerDuty). Review daily.

4. Model updates:

  • Is new model version (GPT-4.5, Claude 4.0) better?
  • Does prompt need updating (model behaviour changed)?

Frequency: Quarterly (when providers release major updates).

5. Data drift:

  • Is agent responding based on outdated knowledge?
  • Are new products / policies in RAG knowledge base?

Frequency: Monthly refresh knowledge base.

Total time commitment:

  • Initial 3 months after launch: 20-30% of one FTE (tight monitoring, rapid iteration).
  • After stabilisation: 5-10% FTE (monitoring, occasional updates).

Comparison: Traditional software also requires maintenance (bug fixes, dependency updates). Agent maintenance is comparable, maybe slightly more due to non-deterministic nature.

7. Can small and medium companies afford AI Agents?

Absolutely yes.

Myth: AI Agents are only for FAANG and enterprise with million budgets.

Reality: In 2026 barrier to entry is low:

Costs for SME:

Setup:

  • LangChain / OpenAI API: Free tier sufficient for prototype
  • Vector DB: Pinecone free tier (1M vectors)
  • Deployment: Vercel / Railway / Render (free tier or $20-50/m)

Total prototype cost: $0-100.

Production (small scale - 100-500 users):

  • LLM API: $200-800/month
  • Vector DB: $100/month (Pinecone starter)
  • Hosting: $50-200/month
  • Total: $350-1100/month

Development:

  • Hire 1 freelance AI engineer (part-time, 3 months): PLN 60-100k
  • Or upskill own developer (training PLN 10k + 2 months learning on job)

Total investment: PLN 60-100k.

ROI examples for SME:

Example 1: Accounting office (20 people)

  • Agent for answering client questions (accounting, taxes)
  • Savings: 10h/week accountants time × PLN 150/h = PLN 1500/week = PLN 78k annually
  • Agent cost: PLN 70k (development) + PLN 12k (API costs first year) = PLN 82k
  • Break-even: 13 months. Then PLN 78k saving annually.

Example 2: E-commerce (PLN 5M GMV annually, 10 person team)

  • Agent for customer support (where’s package? how to return?)
  • Savings: 1 FTE customer support (PLN 100k annually)
  • Agent cost: PLN 80k (development) + PLN 15k (API + infra)
  • Break-even: 11 months.

Bottom line: If you see clear ROI (agent will save X hours of work), SME can afford it. Start small, measure ROI, scale gradually.

Next steps: Begin your journey with AI Agents

AI Agents aren’t science fiction. They’re tools that work in production in 2026 - in banks, e-commerce, software houses, fintechs. The question isn’t “whether”, but “when” your organisation will start using them.

If you’re Tech Lead or Engineering Manager: Your team needs these skills. Competitive pressure is growing. Companies that master AI Agents in 2026-2027 will have advantage for next 5 years.

If you’re CTO or IT Director: ROI is real. 60-80% savings in repetitive tasks, 3x productivity increase in many use cases. But requires investment in competencies and infrastructure. Start now to have first agent in production by end of 2026.

If you’re Developer: Learn these skills. Job market in 2-3 years will require knowledge of AI Agents like today it requires knowledge of APIs and cloud. This is your chance to upskill.

Ready to start?

Reserve place on training:

Check our AI Agents training programmes:

First dates already in March 2026. Places limited (max 12 people per group - hands-on requires small groups).

Need closed training for team?

Contact us: /en/contact

Tell us about your use case, we’ll select programme, prepare offer. 500+ experts, 2500+ conducted trainings, 4.8/5 rating. Trusted by ING, mBank, PKO BP, Allegro.

Want to learn more about AI Agents?

Subscribe to EITT newsletter - weekly latest insights from AI/ML, case studies, practical tips. Zero spam, only valuable content.

AI Agents are changing the rules of the game. Time to join the game.

Read Also

Develop Your Skills

This article is related to the training Introduction to Enterprise Architect. Check the program and sign up to develop your skills with EITT experts.

Read also

Request a quote

Develop Your Competencies

Check out our training and workshop offerings.

Request Training
Call us +48 22 487 84 90