AI Agents are no longer an experiment. In 2026, we’re seeing deployments in companies such as Salesforce, Microsoft, and Shopify. These aren’t just chatbots - they’re autonomous systems that execute real business tasks: analysing code, automating processes, responding to customers, generating reports.
For IT teams, this represents a fundamental paradigm shift. It’s no longer enough to know web frameworks or databases. You need to understand agentic system architecture, prompt engineering, orchestration, RAG (Retrieval-Augmented Generation), tool use and guardrails. These are new skills that 95% of IT teams in Poland don’t have in their portfolio today.
If you lead a technical team or are responsible for competency development in your organisation - this article will show you specifically what your people need to learn, which frameworks are worth knowing, and how to approach implementing AI agents in a practical and secure manner.
What are AI Agents and why do enterprises need them?
An AI Agent is an autonomous system based on a language model (LLM) that can:
- Plan - break down complex tasks into steps
- Use tools - call APIs, search databases, run scripts
- Make decisions - choose the next action based on context
- Iterate - repeat the cycle until the goal is achieved
- Report - inform the user about progress and results
Example: You ask an agent “Analyse the last 100 commits in the repository and find potential security issues”. The agent autonomously:
- Connects to GitHub API
- Fetches commits
- Analyses code from a security perspective
- Saves results to the database
- Generates a report with priorities
- Creates tickets in Jira for high-priority findings
This isn’t a chatbot that answers questions. This is a system that does work.
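The plan, act, iterate cycle described above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a production agent: `fake_model`, `fetch_commits` and the step limit are hypothetical stand-ins for a real LLM call, a real GitHub integration and a real budget.

```python
from typing import Callable

# Hypothetical tool: a real agent would call the GitHub API here.
def fetch_commits(limit: int) -> list[str]:
    return [f"commit-{i}" for i in range(limit)]

TOOLS: dict[str, Callable[..., object]] = {"fetch_commits": fetch_commits}

def fake_model(history: list[dict]) -> dict:
    """Stand-in for an LLM: picks the next action from what has happened so far."""
    if not any(step["action"] == "fetch_commits" for step in history):
        return {"action": "fetch_commits", "args": {"limit": 3}}
    return {"action": "finish", "args": {"report": "3 commits analysed"}}

def run_agent(max_steps: int = 5) -> tuple[str, list[dict]]:
    history: list[dict] = []
    for _ in range(max_steps):            # iterate until done or budget exhausted
        decision = fake_model(history)    # decide the next action from context
        if decision["action"] == "finish":
            return decision["args"]["report"], history  # report the result
        result = TOOLS[decision["action"]](**decision["args"])  # use a tool
        history.append({"action": decision["action"], "result": result})
    return "stopped: step limit reached", history

report, history = run_agent()
```

The step limit matters as much as the loop itself: without it, a model that never returns `finish` would run forever.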
Why do enterprises need AI Agents right now?
1. Rising specialist labour costs
The median salary for a senior developer in Poland is PLN 180-220k annually (2026 data). An AI Agent costs 5-10% of that budget and works 24/7.
2. Talent shortage
Polish IT companies report 40,000 unfilled positions (PARP 2026 report). Agents fill gaps in repetitive tasks, allowing people to focus on work requiring creativity.
3. Pressure on efficiency
CFOs demand ROI from every IT investment. An agentic system in customer support reduces response time by 80% and costs by 60% (Zendesk AI Benchmark 2026 data).
4. Technology availability
APIs from OpenAI, Anthropic and Google are stable, scalable and increasingly cheap. Claude 3.7 Sonnet costs $3 per 1M input tokens - roughly 10x cheaper than GPT-4 was at its 2023 launch.
5. Competitive pressure
Your competitors are already testing this. The first wave of adopters will gain an advantage in 2026-2027. Those who start later will be playing catch-up.
What technical competencies are needed to build AI agents?
Implementing AI Agents requires completely new skills from the team. Here are 7 key competency areas you need:
1. Prompt Engineering - the foundation of everything
What it is: The art of designing instructions for language models to get predictable and valuable responses.
Why it’s crucial: An agent only works as well as it’s been “programmed” with prompts. A poorly written prompt = an agent that hallucinates, wastes tokens and gives unstable results.
What to learn:
- System prompts - how to define the agent’s role, context and constraints
- Few-shot learning - technique of showing examples in the prompt
- Chain of Thought (CoT) - forcing step-by-step thinking
- Structured outputs - enforcing JSON/XML formats in responses
- Safety and guardrails - how to prevent unwanted behaviour
Example from practice: In a project for a bank, we had to build an agent to analyse credit documentation. The first prompt was 200 words. The agent hallucinated in 15% of cases. After redesigning it as a structured prompt with explicit CoT and examples, hallucinations dropped to <2%.
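A structured prompt is easiest to keep honest when the output is validated in code. The sketch below is illustrative only: the field names, the allowed risk levels and the few-shot example are assumptions, not the prompt used in the bank project.

```python
import json

# Hypothetical output contract for a credit-document agent.
REQUIRED_FIELDS = {"applicant_income": (int, float), "risk_level": str, "reasoning": str}
ALLOWED_RISK = {"low", "medium", "high"}

FEW_SHOT_EXAMPLE = json.dumps(
    {"applicant_income": 8500, "risk_level": "low", "reasoning": "Stable income in all documents."}
)

def build_system_prompt() -> str:
    """Role, constraints, step-by-step instruction and one example, in that order."""
    return "\n".join([
        "You are a credit documentation analyst.",
        "Use only facts present in the supplied documents.",
        "Think step by step, then answer with a single JSON object.",
        f"Required keys: {', '.join(REQUIRED_FIELDS)}.",
        f"Example answer: {FEW_SHOT_EXAMPLE}",
    ])

def parse_model_output(raw: str) -> dict:
    """Reject anything that is not valid JSON in the agreed shape."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    if data["risk_level"] not in ALLOWED_RISK:
        raise ValueError(f"unexpected risk_level: {data['risk_level']}")
    return data
```

A response that fails validation can be retried or escalated instead of being passed downstream.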
Learning tools:
- Anthropic Prompt Engineering Guide
- OpenAI Cookbook
- LangSmith (for debugging prompts)
2. Retrieval-Augmented Generation (RAG) - the agent’s memory
What it is: A technique combining a language model with a knowledge base. Before responding, the agent searches for relevant documents and uses them as context.
Why it’s crucial: LLMs don’t know your documentation, your code, your company policy. RAG allows the agent to operate on current company knowledge without fine-tuning the model.
What to learn:
- Embeddings - how to transform texts into vectors
- Vector databases - Pinecone, Weaviate, pgvector, Qdrant
- Chunking strategies - how to divide documents into fragments (chunk size, overlap)
- Hybrid search - combining semantic search (embedding) with keyword search (BM25)
- Reranking - how to improve search accuracy (Cohere Rerank, Cross-Encoder)
- Metadata filtering - filtering by date, author, document type
Example from practice: Agent for customer support in e-commerce. We had 5000 articles from the knowledge base. Naive RAG (500-token chunk, top 3 results) gave 65% accuracy. After introducing:
- Hybrid search (semantic + keyword)
- Hierarchical chunking (summary + details)
- Metadata filtering (publication date, category)
- Reranking before submitting to LLM
Accuracy increased to 89%.
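Two of the steps above, chunking with overlap and merging semantic with keyword results, fit in a few lines. This is a sketch under assumptions: chunks are measured in words rather than tokens, and the two result lists are merged with reciprocal rank fusion (RRF), which is one common way to combine rankings and not necessarily what the project above used. The document IDs are made up.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

chunks = chunk_words("a b c d e f g h i j", size=4, overlap=1)
semantic_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical embedding search result
keyword_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25 result
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents found by both searches rise to the top, which is the point of hybrid search; a reranker would then reorder the fused list before it reaches the LLM.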
Tools:
- LangChain (RAG chains)
- LlamaIndex (data connectors)
- ChromaDB, Pinecone (vector storage)
3. Tool Use (Function Calling) - the agent’s hands and feet
What it is: The agent’s ability to call external functions and APIs. The agent “knows” what tools are available and autonomously decides when to use which one.
Why it’s crucial: Without tool use, an agent is just a chatbot. Tool use gives the agent the ability to act - sending emails, saving to database, calling APIs, running scripts.
What to learn:
- Function schemas - how to describe a function in JSON schema format
- Parameter extraction - how the model extracts parameters from user input
- Error handling - what happens when API returns an error
- Tool selection strategies - how the agent chooses the right tool
- Safety boundaries - which tools the agent should NOT have (delete operations, payment APIs without approval)
Example from practice: Agent for code review in a fintech company. Available tools:
- get_file_content(path) - reading file from repo
- run_static_analysis(file) - running linter/SAST
- search_codebase(query) - semantic search in code
- create_github_comment(pr_id, comment) - comment in PR
- get_test_coverage(file) - checking test coverage
The agent autonomously:
- Fetches changed files from PR
- Runs static analysis for each
- If it detects an issue - searches codebase for similar problems
- Checks test coverage
- Generates detailed review with examples and links
- Adds comment to PR
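To make the function-schema and error-handling points concrete, here is a minimal sketch of one tool from the list above with a dispatcher around it. The schema layout (name, description, JSON Schema for inputs) follows the general shape used by LLM tool-use APIs, but the exact keys, the stubbed coverage data and the `dispatch` helper are assumptions for illustration.

```python
# Hypothetical stub: a real implementation would read a coverage report.
def get_test_coverage(file: str) -> float:
    coverage = {"src/payments.py": 0.82}
    if file not in coverage:
        raise FileNotFoundError(f"no coverage data for {file}")
    return coverage[file]

TOOL_SCHEMAS = {
    "get_test_coverage": {
        "description": "Return test coverage (0.0-1.0) for a file in the repo.",
        "input_schema": {
            "type": "object",
            "properties": {"file": {"type": "string"}},
            "required": ["file"],
        },
    }
}
TOOL_IMPLS = {"get_test_coverage": get_test_coverage}

def dispatch(tool_call: dict) -> dict:
    """Run a tool call requested by the model; never raise, always report back."""
    name = tool_call.get("name")
    if name not in TOOL_IMPLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    args = tool_call.get("input", {})
    required = TOOL_SCHEMAS[name]["input_schema"]["required"]
    missing = [param for param in required if param not in args]
    if missing:
        return {"ok": False, "error": f"missing parameters: {missing}"}
    try:
        return {"ok": True, "result": TOOL_IMPLS[name](**args)}
    except Exception as exc:  # the error text goes back to the model as context
        return {"ok": False, "error": str(exc)}
```

Returning errors as data lets the agent decide whether to retry, pick another tool or escalate. Tools the agent should not have are simply left out of the registry.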
Frameworks with native tool support:
- Anthropic Claude (tool use API)
- OpenAI (function calling)
- LangChain (tools abstraction)
4. Multi-Agent Orchestration - teamwork of agents
What it is: Systems where multiple agents collaborate, each with a specialisation. One agent plans, another does research, a third writes code, a fourth tests.
Why it’s crucial: Complex tasks require different “roles”. One agent-for-everything doesn’t scale in enterprise use cases.
What to learn:
- Agent roles - how to define roles and responsibilities
- Communication protocols - how agents pass information to each other
- Coordination patterns - hierarchical (boss-worker), sequential (pipeline), parallel (fork-join)
- Conflict resolution - what happens when agents disagree
- State management - how to manage shared context between agents
Example from practice: System for content marketing automation:
Agent 1 - Strategist:
- Analyses trends (Google Trends API)
- Chooses topic
- Defines target persona
Agent 2 - Researcher:
- Searches for sources (Perplexity API, arXiv, PubMed)
- Extracts key insights
- Validates facts
Agent 3 - Writer:
- Writes draft based on research
- Applies brand voice guidelines
- Optimises for SEO
Agent 4 - Editor:
- Reviews for facts
- Checks grammatical correctness
- Final polish
Each agent is a specialist. Coordination through orchestrator (LangGraph).
Frameworks:
- CrewAI (role-based agents)
- AutoGen (conversational agents)
- LangGraph (state-based orchestration)
5. Guardrails and Safety - how not to lose control
What it is: Control and security mechanisms preventing unwanted agent behaviour.
Why it’s crucial: An autonomous agent without guardrails is a legal, financial and reputational risk. It can hallucinate, leak data, execute a destructive operation.
What to learn:
- Input validation - filtering prompt injection
- Output validation - checking if response is safe
- PII detection - detecting personal data (GDPR compliance)
- Fact-checking - validating agent assertions
- Rate limiting - protection against runaway agents (agent in a loop)
- Human-in-the-loop (HITL) - when agent must wait for approval
- Audit logging - tracking all decisions and actions
Example from practice: Agent for handling complaints in e-commerce:
Guardrails:
- Cannot propose refund >PLN 500 without approval
- Must detect PII in transcripts and mask (GDPR)
- Output filtered through toxicity detector
- Every decision logged to audit trail
- Timeout after 10 LLM calls (protection against loop)
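Three of these guardrails are simple enough to sketch directly. The regular expressions below are deliberately naive and are an assumption for illustration; a real deployment would use a dedicated PII detector rather than two patterns. The limits mirror the list above.

```python
import re

REFUND_APPROVAL_LIMIT_PLN = 500
MAX_LLM_CALLS = 10

# Naive illustrative patterns, not a complete PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def mask_pii(text: str) -> str:
    """Mask e-mail addresses and phone-like numbers before text leaves the system."""
    return PHONE_RE.sub("[PHONE]", EMAIL_RE.sub("[EMAIL]", text))

def check_refund(amount_pln: float) -> str:
    """Refunds above the limit are never proposed without human approval."""
    return "needs_approval" if amount_pln > REFUND_APPROVAL_LIMIT_PLN else "allowed"

class CallBudget:
    """Stops a looping agent after a fixed number of LLM calls."""
    def __init__(self, limit: int = MAX_LLM_CALLS) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise RuntimeError("LLM call budget exhausted")
        self.used += 1

masked = mask_pii("Contact jan.kowalski@example.com or +48 600 700 800")
```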
Tools:
- Guardrails AI (framework)
- NeMo Guardrails (NVIDIA)
- LangSmith (monitoring)
- Presidio (PII detection)
6. LLM APIs and Model Selection - choosing the right model
What it is: The ability to choose the right LLM model for the task and work effectively with APIs.
Why it’s crucial: Different models have different strengths. GPT-4o is expensive but powerful. Claude 3.5 Haiku is fast and cheap. Wrong choice = wasting budget or poor results.
What to learn:
- Model capabilities - what GPT-4o, Claude 3.7, Gemini 1.5, Llama 3.3 can do
- Cost optimisation - when to use a small model and when a large one
- Latency requirements - real-time vs batch processing
- Context windows - how much data the model accepts (128k, 200k, 1M tokens)
- Structured outputs - which models give the best JSON
- Multi-modal capabilities - vision, audio (if needed)
Example from practice:
| Task | Model | Why |
|---|---|---|
| Code review | Claude 3.7 Sonnet | Excellent reasoning, 200k context |
| Customer support (triage) | GPT-4o mini | Cheap, fast, sufficient |
| Legal document analysis | GPT-4o | Best for complex documents |
| Data extraction from PDF | Claude 3.5 Haiku | Vision + speed + cost |
| Summarisation (batch) | Llama 3.3 70B (self-hosted) | Zero API cost |
Key skill: Routing - agent decides which model to use for which sub-task.
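In its simplest form, routing is a lookup with a default. The sketch below reuses the model names from the table; the task-type keys, the default model and the rule that PII-bearing requests go to the self-hosted model are assumptions added for illustration.

```python
# Task types are hypothetical labels; model names come from the table above.
ROUTES = {
    "code_review": "Claude 3.7 Sonnet",
    "support_triage": "GPT-4o mini",
    "legal_analysis": "GPT-4o",
    "pdf_extraction": "Claude 3.5 Haiku",
    "batch_summarisation": "Llama 3.3 70B (self-hosted)",
}
DEFAULT_MODEL = "GPT-4o mini"                # assumption: cheapest model as fallback
SELF_HOSTED = "Llama 3.3 70B (self-hosted)"

def route(task_type: str, contains_pii: bool = False) -> str:
    """Pick a model for a sub-task; data that must stay in-house overrides the table."""
    if contains_pii:
        return SELF_HOSTED
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

In a real agent the `task_type` label would itself come from a cheap classifier or from the orchestrator's plan.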
7. Monitoring and Observability - seeing what the agent does
What it is: Tools and practices allowing you to track, debug and optimise agent operation in production.
Why it’s crucial: An agent in production is a black box. Without observability you don’t know why the agent failed, how much it costs, where the bottlenecks are.
What to learn:
- Distributed tracing - tracking entire flow request → response
- Token usage tracking - how much individual operations cost
- Latency metrics - where delays are
- Error tracking - error classification (LLM error vs API error vs logic error)
- Quality metrics - accuracy, hallucination rate, user satisfaction
- A/B testing - comparing different prompts/models
Example from practice: Dashboard for agent in production (5000 req/day):
Metrics:
- Avg latency: 3.2s (target <5s) ✅
- Avg cost per request: $0.08 (target <$0.10) ✅
- Error rate: 2.1% (target <5%) ✅
- User satisfaction (thumbs up): 87% (target >85%) ✅
Alert: Spike in hallucination rate (8% → 15%) at weekend. Root cause: Wikipedia API was down, agent didn’t have current data and hallucinated. Fix: adding fallback source.
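The dashboard numbers and the alert above come down to aggregating per-request logs. A minimal sketch follows, assuming a hypothetical `RequestLog` record and an alert rule that fires when the hallucination rate exceeds 1.5x its baseline; both are illustrative choices, not the setup of any specific tool.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RequestLog:
    latency_s: float
    cost_usd: float
    error: bool
    hallucinated: bool   # e.g. labelled by an evaluator or user feedback

def summarise(logs: list[RequestLog]) -> dict[str, float]:
    """Aggregate per-request logs into dashboard metrics."""
    return {
        "avg_latency_s": mean([log.latency_s for log in logs]),
        "avg_cost_usd": mean([log.cost_usd for log in logs]),
        "error_rate": sum(log.error for log in logs) / len(logs),
        "hallucination_rate": sum(log.hallucinated for log in logs) / len(logs),
    }

def hallucination_alert(baseline_rate: float, current_rate: float, max_ratio: float = 1.5) -> bool:
    """Fire when the current rate exceeds the baseline by more than max_ratio."""
    return current_rate > baseline_rate * max_ratio

logs = [RequestLog(3.0, 0.05, False, False), RequestLog(4.0, 0.15, True, True)]
summary = summarise(logs)
```

With the weekend figures from the example, `hallucination_alert(0.08, 0.15)` fires, while a drift from 8% to 9% does not.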
Tools:
- LangSmith (LangChain)
- Weights & Biases (W&B)
- Helicone (LLM observability)
- Arize AI (ML observability)
Which frameworks and tools are worth knowing?
The AI Agents ecosystem is developing rapidly. Here are the most mature tools and frameworks worth knowing in 2026:
LangChain - the Swiss Army knife of agents
What it is: The most popular framework for building LLM applications and agents. Python and TypeScript.
When to use:
- RAG applications (document Q&A)
- Simple to medium complexity agents
- Prototyping and MVP
- When you need a large integration ecosystem (200+ integrations out-of-the-box)
Strengths:
- Huge community
- Plenty of examples and tutorials
- LangSmith (monitoring)
- LangGraph (orchestration)
Weaknesses:
- API changes frequently (breaking changes)
- Abstractions sometimes too heavy
- Performance overhead
Use case: Agent for analysing legal documents in a law firm. RAG on 10k+ documents, semantic search, citation tracking.
CrewAI - multi-agent for teams
What it is: Framework focused on multi-agent collaboration. Agents as “crew members” with different roles.
When to use:
- Multi-agent systems (3+ agents)
- Complex workflows requiring specialisation
- Projects where you need “teamwork” of agents
Strengths:
- Role-based design (intuitive)
- Built-in collaboration patterns
- Task delegation
- Memory between agents
Weaknesses:
- Younger ecosystem
- Fewer integrations than LangChain
Use case: Content marketing system (strategist + researcher + writer + editor). Each agent = role in team.
AutoGen - conversational agents from Microsoft
What it is: Framework from Microsoft Research. Agents converse with each other to achieve goal.
When to use:
- Agents that must negotiate / debate
- Code generation (agent writes code → second tests → first corrects)
- Research tasks (multiple perspectives)
Strengths:
- Excellent for code generation
- Human-in-the-loop patterns
- Mature research backing (Microsoft Research)
Weaknesses:
- Documentation could be better
- Sometimes agents talk “too much” (cost overhead)
Use case: Agent for automatically fixing bugs. Agent 1 analyses bug, Agent 2 proposes fix, Agent 3 writes test, Agent 4 reviews.
LangGraph - state machines for agents
What it is: Part of the LangChain ecosystem, but deserves separate mention. Graph-based orchestration.
When to use:
- Complex agent workflows with branching logic
- When you need full control over flow
- State management between steps
- Production-grade reliability
Strengths:
- Deterministic flow (not like conversational agents)
- Debugging is easier (you see the graph)
- Conditional routing
- Checkpointing (you can resume agent from any step)
Weaknesses:
- Steep learning curve
- Requires thinking in terms of graphs
Use case: Agent for employee onboarding. Multi-step workflow: data collection → account creation → access configuration → equipment dispatch → progress tracking. Each step = node in graph. Conditional routing if something fails.
Anthropic Claude API - direct approach
What it is: Native API from Anthropic. No framework. Direct HTTP calls.
When to use:
- Maximum control
- Performance-critical applications
- Don’t need abstractions
- Already have your own orchestration layer
Strengths:
- Zero overhead
- Latest features first (tool use, prompt caching, extended thinking)
- Best-in-class documentation
- Excellent reasoning
Weaknesses:
- Must build more yourself (retry logic, error handling, observability)
Use case: High-frequency trading bot. Latency critical, you need control over every millisecond.
OpenAI Assistants API - managed agent
What it is: Fully managed agent service from OpenAI. You create agent through API, OpenAI manages execution.
When to use:
- Want to deploy quickly without building infrastructure
- Need Code Interpreter (agent executes Python code in sandbox)
- File retrieval out-of-the-box
- Don’t want to manage state
Strengths:
- Managed service (less DevOps)
- Code Interpreter
- Built-in file handling
- Persistent threads
Weaknesses:
- Less control (black box)
- Vendor lock-in
- No self-hosting
Use case: Internal tool for data analysts. Agent analyses CSV/Excel, generates charts, answers questions. Code Interpreter does heavy lifting.
Summary: What to choose?
| You need | Framework |
|---|---|
| Prototype in 2 days | LangChain + Claude API |
| Production RAG | LangChain + Pinecone + LangSmith |
| Multi-agent workflow | CrewAI or LangGraph |
| Complex state machine | LangGraph |
| Code generation | AutoGen |
| Maximum control | Direct API (Claude/GPT-4o) |
| Quick MVP without infra | OpenAI Assistants API |
Pro tip: Start with a framework (LangChain), learn concepts, build MVP. As project matures and you need more control - migrate to direct API or your own orchestration layer. 80% of projects never need this.
What does agentic system architecture look like in enterprise?
Implementing AI agents in enterprise isn’t “adding a chatbot to the website”. It’s a distributed architecture with many components. Here’s a reference design based on real deployments:
Single-Agent Architecture - for simpler use cases
Components:
     User Request
         ↓
API Gateway (FastAPI/Express)
         ↓
 Agent Orchestrator
         ↓
┌─────────────────┐
│    LLM Agent    │
│  (Claude 3.7)   │
└────────┬────────┘
         │
    ┌────┴─────┬──────────┬─────────┐
    ↓          ↓          ↓         ↓
Vector DB    Tools      Memory  Guardrails
(Pinecone)   (APIs)     (Redis) (validation)
    ↓          ↓          ↓         ↓
    └──────────┴──────────┴─────────┘
                    ↓
             Response + Logs
                    ↓
              Observability
           (LangSmith/Helicone)
Example: Customer support agent.
Flow:
- User asks question
- Agent searches knowledge base (Vector DB)
- If doesn’t find - uses Tool: search_past_tickets(query)
- Generates response
- Guardrails check for PII leaks
- Response returns to user
- Feedback (thumbs up/down) saved to Memory
Infrastructure:
- API: FastAPI on AWS Lambda
- Vector DB: Pinecone (managed)
- Memory: Redis (ElastiCache)
- LLM: Claude 3.7 Sonnet (Anthropic API)
- Monitoring: LangSmith
SLA: 95% requests <3s, 99.9% uptime, cost <$0.10 per request.
Multi-Agent Architecture - for complex workflows
Components:
               User Request
                     ↓
          Orchestrator (LangGraph)
                     ↓
┌─────────┬──────────┬──────────┬─────────┐
│ Agent 1 │ Agent 2  │ Agent 3  │ Agent N │
│(Planner)│(Executor)│(Reviewer)│  (...)  │
└────┬────┴────┬─────┴────┬─────┴────┬────┘
     │         │          │          │
     └─────────┴──────────┴──────────┘
                     ↓
              Shared Context
            (State Management)
                     ↓
          Message Queue (RabbitMQ)
                     ↓
      ┌──────────────┴──────────────┐
      ↓                             ↓
Tools & Services              Vector Stores
(APIs, DBs, etc.)              (Knowledge)
Example: Automated incident response in DevOps.
Agents:
1. Detector Agent:
- Monitors metrics (Prometheus)
- Detects anomalies
- Creates initial incident report
2. Diagnostic Agent:
- Analyses logs (Elasticsearch)
- Correlates with previous incidents
- Generates hypothesis
3. Remediation Agent:
- Proposes fix (restart service, scale up, rollback deploy)
- Executes if confidence ≥80%
- Escalates to human if confidence <80%
4. Communication Agent:
- Updates status page
- Notifies Slack/PagerDuty
- Generates post-mortem draft
Orchestration (LangGraph):
Detector → Diagnostic → Remediation → Communication
↓ (if high severity)
Human Approval
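The flow above is a small state machine with one conditional edge. The sketch below uses plain Python rather than the LangGraph API, so every name in it is hypothetical. It also assumes that the approval branch sits between diagnosis and remediation, which the diagram does not state explicitly.

```python
from typing import Callable, Optional

State = dict
Node = Callable[[State], Optional[str]]  # each node returns the next node's name

def detector(state: State) -> Optional[str]:
    state["incident"] = "error-rate spike"
    return "diagnostic"

def diagnostic(state: State) -> Optional[str]:
    state["hypothesis"] = "bad deploy"
    # Conditional routing: high severity goes through a human first.
    return "human_approval" if state["severity"] == "high" else "remediation"

def human_approval(state: State) -> Optional[str]:
    # Stand-in for a real approval step (Slack button, admin panel, ...).
    return "remediation" if state.get("approved") else "communication"

def remediation(state: State) -> Optional[str]:
    state["action"] = "rollback deploy"
    return "communication"

def communication(state: State) -> Optional[str]:
    state["notified"] = True
    return None  # end of the graph

NODES: dict[str, Node] = {
    "detector": detector, "diagnostic": diagnostic, "human_approval": human_approval,
    "remediation": remediation, "communication": communication,
}

def run_graph(state: State, start: str = "detector", max_steps: int = 20) -> list[str]:
    """Walk the graph from `start`, recording the path taken."""
    path: list[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("step limit reached")
        path.append(current)
        current = NODES[current](state)
    return path

low_path = run_graph({"severity": "low"})
```

Because the path is recorded, every run leaves a trace that can be logged or replayed, which is what makes graph-based orchestration easier to debug than free-form agent conversation.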
Infrastructure:
- Orchestrator: Kubernetes Job
- Agents: Separate containers
- Message Queue: RabbitMQ
- State: PostgreSQL
- Observability: Grafana + LangSmith
Human-in-the-Loop (HITL) - critical for enterprise
Why: Enterprise can’t afford full autonomy in all cases. Regulations, risk management, legal require human oversight.
When needed:
- High-stakes decisions (e.g. financial transactions >X PLN)
- Regulated industries (banking, healthcare, legal)
- Low confidence predictions (agent isn’t sure)
- Novel situations (agent never saw this)
Implementation:
Pattern 1: Approval Gate - Agent prepares action → waits for human approval → executes.
Pattern 2: Review Loop - Agent executes → human review after the fact → feedback for learning.
Pattern 3: Confidence Threshold - If confidence ≥90% → auto-execute. Otherwise → human review.
Example: Agent for approving refunds in e-commerce.
- Refund <PLN 100 → auto-approve (confidence usually >95%)
- Refund PLN 100-500 → human review if confidence <90%
- Refund >PLN 500 → ALWAYS human approval
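The three refund rules translate directly into a decision function. A minimal sketch, assuming confidence is expressed as a number between 0 and 1 and that the function and label names are free to choose:

```python
def refund_decision(amount_pln: float, confidence: float) -> str:
    """Apply the refund policy: amount bands first, confidence second."""
    if amount_pln > 500:
        return "human_approval"          # always, regardless of confidence
    if amount_pln >= 100:
        return "human_review" if confidence < 0.90 else "auto_approve"
    return "auto_approve"                # small refunds are approved automatically
```

Checking the amount before the confidence means a very confident agent still cannot bypass the approval gate for large refunds.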
Tools: Retool, Internal Admin Panel, Slack bot with approve/reject buttons.
Security and Compliance - cannot be omitted
Enterprise requirements:
1. Data privacy (GDPR):
- PII must be masked before sending to LLM API
- Or: self-hosted model (Llama 3.3 on own infra)
- Audit log of who accessed what data and when
2. Access control:
- Agent cannot have access to all data
- Role-based access (RBAC)
- Agent for HR shouldn’t see financial data
3. Rate limiting:
- Protection against runaway agents (agent in loop calling API 1000x/sec)
- Circuit breakers
- Timeouts
4. Audit trail:
- Every agent decision logged
- Immutable log (append-only)
- Retention compliant with regulations (7 years in financial services)
5. Disaster recovery:
- What if LLM API is down? (fallback: human queue)
- What if agent does something destructive? (rollback mechanisms)
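Circuit breakers and the human-queue fallback mentioned above combine naturally. The sketch below is a minimal, single-threaded illustration; the class, its thresholds and the injectable clock are assumptions, and a production system would more likely rely on an existing resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitBreaker:
    """After `max_failures` consecutive errors, skip the call for `reset_after_s` seconds."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()            # open: do not touch the failing API
            self.opened_at = None            # cool-down over: allow a new attempt
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

Usage would look like `breaker.call(call_llm, send_to_human_queue)`, where both callables are hypothetical: while the LLM API is down, requests go straight to people instead of piling up retries.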
Reference architecture for highly regulated industry (banking):
- Self-hosted Llama 3.3 70B (on-premise, zero data leakage)
- All PII encrypted at rest and in transit
- Agent has read-only access to production DB (queries through read replica)
- Write operations ONLY through approval queue
- 100% audit coverage
- Incident response playbook
What roles in the team are needed?
Implementing AI Agents requires new roles or expanding existing ones. Here’s a typical agentic team composition in enterprise:
1. AI/ML Engineer - builder
Responsibility:
- Agent implementation
- Integration with LLM APIs
- RAG pipelines
- Tool development
Tech stack: Python, LangChain, FastAPI, Vector DBs, LLM APIs
Profile: Senior developer with ML/AI experience, understands how LLMs work, can debug non-deterministic systems.
How many you need: 2-3 on team of 10.
2. Prompt Engineer - programming with natural language
Responsibility:
- Designing system prompts
- Cost optimisation (token usage)
- Testing different prompting strategies
- Documenting prompt libraries
Tech stack: LLM APIs, prompt evaluation frameworks, spreadsheets
Profile: Technical writer + data mindset + understanding of LLM capabilities. Doesn’t have to code, but should understand code.
How many you need: 1 for 3-4 agents.
Note: In many teams role overlaps with AI Engineer. In initial phase you can go without dedicated Prompt Engineer. When you have 10+ agents in production - you need one.
3. Data Engineer - knowledge infrastructure
Responsibility:
- Building data pipelines for RAG (ETL from various sources to vector DB)
- Data quality for training data
- Embeddings infrastructure
- Vector DB management
Tech stack: Airflow, DBT, SQL, Python, Vector DBs
Profile: Classic Data Engineer, but understands AI specifics (embeddings, chunking strategies).
How many you need: 1-2 for larger project (if RAG is core).
4. MLOps Engineer - production and monitoring
Responsibility:
- Agent deployment
- CI/CD for ML
- Monitoring (observability)
- Scaling, performance optimisation
- Incident response
Tech stack: Kubernetes, Docker, Terraform, Prometheus, Grafana, LangSmith
Profile: DevOps Engineer with ML systems experience. Understands that ML != traditional software (non-deterministic, data dependencies).
How many you need: 1 MLOps for 5-10 AI Engineers (ratio like in traditional DevOps).
5. Product Manager - value owner
Responsibility:
- Defining use cases
- Feature prioritisation
- Success metrics (what does it mean that agent works well?)
- User research (how people use agent)
Tech stack: Analytics (Mixpanel), user research tools, basic understanding of AI capabilities
Profile: Product Manager with technical curiosity. Must understand what’s possible with LLM (and what isn’t). Doesn’t have to code.
How many you need: 1 PM can handle 2-3 agent projects in parallel.
6. Domain Expert - the one who knows HOW
Responsibility:
- Defining business logic
- Validating agent outputs
- Training data labelling
- Feedback loops
Tech stack: Domain knowledge, spreadsheets, annotation tools
Profile: Expert in the field (e.g. if agent does legal research → lawyer, if customer support → support lead). Doesn’t have to know AI, but must want to collaborate with AI.
How many you need: 1 Domain Expert per agent (part-time, 20-40% time).
Example: Agent for code review. Domain Expert = senior developer who defines what “good code” means and validates whether agent reviews correctly.
7. Security/Compliance Officer - risk guardian
Responsibility:
- Security review of agents
- Compliance (GDPR, NIS2, industry regulations)
- Risk assessment
- Incident response for security issues
Tech stack: Security tools, audit logs, compliance frameworks
Profile: Security Engineer or Compliance Specialist who understands AI specifics (data leakage via prompts, adversarial attacks, jailbreaking).
How many you need: 1 for organisation (if you have several agent projects). Involvement grows with industry regulation.
Typical team for medium-sized agent project (3-6 months):
- 1 Product Manager (50% time)
- 2 AI/ML Engineers (full-time)
- 1 Data Engineer (50% time, if RAG)
- 1 MLOps Engineer (30% time, shared resource)
- 1 Domain Expert (30% time)
- 1 Security review (10% time, at start and before go-live)
Total headcount: ~3.7 FTE.
Use cases of AI agents in companies - specific examples
AI Agents aren’t theory. In 2026 we have hundreds of production deployments. Here are real use cases from Polish and global companies:
1. Customer Support Automation - classic that works
Company: E-commerce scale-up (10M+ users)
Problem:
- 5000 tickets daily
- 50% tickets are repetitive questions (where’s my order? how to return? change details)
- Support team (40 people) can’t keep up
Solution - Agent:
- Integration with Zendesk (reads ticket)
- RAG on knowledge base (1000+ articles)
- Access to systems (order management, CRM, logistics APIs)
- Responds autonomously to Level 1 queries
- Escalates to human if doesn’t know
Results (after 6 months):
- 65% tickets resolved by agent (zero human touch)
- Avg resolution time: 2 minutes (was: 4 hours)
- CSAT: 4.2/5 (agent) vs 4.5/5 (human) - small difference
- Savings: 25 FTE (~PLN 2.5M annually)
- Support team now focuses on complex issues and proactive outreach
Tech stack: LangChain, GPT-4o mini, Pinecone, Zendesk API.
2. Code Review Assistant - developer productivity
Company: Software house (200 devs)
Problem:
- Code review bottleneck (PRs wait 2-3 days)
- Juniors don’t know what to check in review
- Repetitive comments (naming, style, security basics)
Solution - Agent:
- Triggered by GitHub webhook (new PR)
- Analyses diff
- Runs SAST (Semgrep)
- Checks test coverage
- Semantic search in codebase (did someone solve similar problem)
- Leaves comments in PR with suggestions
Results:
- 90% PRs get initial review in <5 minutes (was: 2 days)
- 40% PRs agent approves without human review (low-risk changes)
- Human reviewers focus on architecture and business logic
- Onboarding juniors faster (agent teaches through feedback)
Interesting insight: Agent didn’t replace human review. It changed the reviewers’ role from “find typo and security bug” to “is this good design?”.
Tech stack: AutoGen, Claude 3.7 Sonnet, GitHub API, Semgrep.
3. Financial Document Analysis - AI in banking
Company: Bank (top 5 in Poland)
Problem:
- Credit documentation analysis (income statements, tax returns) takes analysts 2-4 hours per case
- 10k+ credit applications monthly
- Manual errors (20% cases require re-review)
Solution - Agent (highly regulated):
- Self-hosted Llama 3.3 70B (on-premise, GDPR compliance)
- Data extraction from PDF (OCR + LLM)
- Cross-verification (does income from different documents match)
- Risk scoring
- Draft recommendation (approve / reject / needs human review)
Results:
- 70% cases agent does initial analysis in <10 minutes
- Human analyst review takes 30 minutes (was: 2-4h)
- Error rate: 5% (was: 20%)
- Throughput increased 3x without increasing team size
Compliance:
- 100% audit trail
- Human always makes final decision (agent = recommendation only)
- Quarterly review of agent decisions by Compliance
Tech stack: Llama 3.3 70B (self-hosted), LangChain, PostgreSQL, custom OCR.
4. HR Onboarding Automation - employee experience
Company: Tech corp (2000 employees)
Problem:
- Onboarding new employee = 40 tasks (creating accounts, equipment dispatch, training, compliance)
- HR coordinators spend 5h per new hire on tasks
- New employees wait weeks for full access
Solution - Multi-Agent System:
Agent 1 - Coordinator: Manages workflow (LangGraph state machine).
Agent 2 - Account Creator: Creates accounts (Okta, Google Workspace, GitHub, Slack).
Agent 3 - Equipment Manager: Orders laptop, monitor, accessories (integration with procurement system).
Agent 4 - Trainer: Sends training materials, tracks completion, reminds.
Agent 5 - Compliance: Collects signatures on NDAs, GDPR consents, security policies.
Results:
- Time-to-productivity: 3 days (was: 14 days)
- HR coordinators time: 30 minutes per hire (was: 5h)
- Employee satisfaction +40%
- Zero missed compliance docs (was: 15% onboardings had gaps)
Human-in-the-loop: HR coordinator approves plan at start. Then agent executes autonomously.
Tech stack: LangGraph, CrewAI, Workday API, Okta API, custom integrations.
5. Sales Intelligence - AI for B2B sales
Company: B2B SaaS (ARR $50M)
Problem:
- SDRs spend 60% time on research (LinkedIn, company websites, news)
- Outreach is generic (low response rate: 3%)
- Hard to scale sales without 2x team size
Solution - Agent:
- Daily scan target accounts (Crunchbase, LinkedIn, Google News)
- Identifies trigger events (funding, hiring, product launch, leadership change)
- Generates personalised outreach (email + LinkedIn message)
- Enriches CRM (Salesforce) with research notes
- Proposes next steps for SDRs
Results:
- Response rate: 3% → 12% (4x improvement!)
- SDRs focus on conversations, not research
- Pipeline increased 2.5x without increasing SDR headcount
- Deal size higher (better targeted accounts)
Interesting insight: Agent doesn’t send emails autonomously. SDR always reviews and approves (HITL). But draft quality is so good that in 80% SDR clicks “send” without editing.
Tech stack: LangChain, GPT-4o, Apollo.io API, Salesforce API, LinkedIn Sales Navigator.
6. IT Ops - automated incident response
Company: Fintech (payment processing)
Problem:
- Incident response: detection → diagnosis → fix takes 30-90 minutes
- SLA: 99.95% uptime (max 4h downtime annually)
- On-call engineers burned out (alerts at night)
Solution - Agent (see: multi-agent architecture above):
Agents: Detector → Diagnostic → Remediation → Communication.
Autonomous actions:
- Restart unhealthy pod
- Scale up if high load
- Clear cache if memory issue
- Rollback deploy if post-deploy spike in errors
Human escalation:
- Unknown issue
- Fix requires access to production DB
- Severity = critical (payment processing down)
Results:
- 60% incidents resolved autonomously in <5 minutes
- MTTR (Mean Time To Resolution): 10 minutes (was: 45 minutes)
- On-call alerts down 70% (fewer night wake-ups)
- SLA achievement: 99.98% (was: 99.94%)
Trust building: Agent operated for 3 months in “recommendation mode” (suggested fix, human executed). After demonstrating 95% accuracy - enabled auto-remediation for low-risk issues.
Tech stack: LangGraph, Claude 3.5 Sonnet, Kubernetes API, Prometheus, PagerDuty.
How to start - roadmap for implementing AI agents in organisation
Implementing AI Agents in enterprise is 6-12 months from POC to production. Here’s a proven roadmap:
Phase 1: Discovery and Education (month 1-2)
Goals:
- Education of team leads and decision makers
- Identification of high-impact use cases
- Assessment of current competencies
- Selection of first pilot project
Activities:
1.1 Leadership workshops (1 day):
- What are AI Agents (vs chatbots, vs RPA)
- Use cases from industry
- Costs, timeline, ROI expectations
- Risk assessment
1.2 Technical assessment (1 week):
- Audit of current stack (do you have data infrastructure?)
- Inventory of available data (knowledge bases, DBs, APIs)
- Security/compliance requirements
- Budget for LLM APIs and tools
1.3 Use case brainstorming (2 weeks): Collect use cases from different departments. Score according to:
- Business impact (how much does it cost today? how much will we save?)
- Technical feasibility (do we have data? do we have APIs?)
- Risk (what happens if agent makes mistake?)
- Timeline (MVP in 3 months possible?)
Scoring framework:
Score = (Impact × Feasibility) / Risk
Impact: 1-10 (ROI, user satisfaction; 10 = highest impact)
Feasibility: 1-10 (10 = easiest to build: data and APIs available, low technical complexity)
Risk: 1-10 (10 = most severe failure consequences)
Select top 3 use cases.
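The scoring framework is easy to automate once the ratings are collected. A minimal sketch follows; the candidate use cases and their ratings are made-up illustrations, and `UseCase` and `top_use_cases` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int       # 1-10
    feasibility: int  # 1-10
    risk: int         # 1-10

    @property
    def score(self) -> float:
        # Score = (Impact x Feasibility) / Risk
        return (self.impact * self.feasibility) / self.risk


def top_use_cases(candidates: list[UseCase], n: int = 3) -> list[UseCase]:
    """Return the n highest-scoring use cases, best first."""
    return sorted(candidates, key=lambda uc: uc.score, reverse=True)[:n]


# Illustrative candidates with made-up ratings.
candidates = [
    UseCase("Internal IT helpdesk", impact=6, feasibility=8, risk=2),
    UseCase("Customer-facing chatbot", impact=9, feasibility=6, risk=8),
    UseCase("Document data extraction", impact=7, feasibility=7, risk=3),
    UseCase("Financial decision automation", impact=9, feasibility=4, risk=9),
]

for uc in top_use_cases(candidates):
    print(f"{uc.name}: {uc.score:.2f}")
```

Because risk sits in the denominator, a high-impact but high-risk idea such as the customer-facing chatbot ranks below a modest, low-risk helpdesk agent.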
1.4 First project selection:
Ideal first project characteristics:
- Medium impact (valuable enough to get budget)
- High feasibility (you want a quick win)
- Low risk (a failure won’t sink the whole initiative)
- Repeatability (success can be scaled to other teams)
Examples of good first projects:
- Internal IT helpdesk (low risk, easy to measure ROI)
- Content summarisation (low risk, immediate value)
- Data extraction from documents (clear success metric)
Bad first projects:
- Customer-facing chatbot (high visibility, high risk if fails)
- Financial decision automation (regulatory concerns)
- Anything “strategic” (too many expectations, too many stakeholders)
Phase 1 Deliverable: 2-3 page project brief for pilot: use case, team, timeline (3 months), budget, success metrics.
Phase 2: Team Upskilling (month 2-3, parallel with Phase 1)
Goals:
- Team has AI Agents fundamentals (prompt engineering, RAG, tool use)
- Dev environment setup
- Tech stack selection
Activities:
2.1 Team training (3-5 days intensive):
Day 1: LLM Fundamentals
- How LLMs work (transformers, tokens, context window)
- Capabilities and limitations
- Prompt engineering basics
- Hands-on: first prompts in Anthropic/OpenAI playground
Day 2: RAG (Retrieval-Augmented Generation)
- Embeddings and vector search
- Chunking strategies
- Vector databases (Pinecone, Weaviate)
- Hands-on: build simple RAG Q&A
Day 3: Agents and Tool Use
- What an agent is (vs a model)
- Tool use / function calling
- Orchestration patterns
- Hands-on: agent that calls APIs
Day 4: Frameworks (LangChain / CrewAI / LangGraph)
- Framework overview
- When to use which
- Hands-on: rebuild Day 2-3 examples in LangChain
Day 5: Production (Monitoring, Safety, HITL)
- Observability (LangSmith, Helicone)
- Guardrails
- Human-in-the-loop patterns
- Hands-on: deploy agent with monitoring
Format: Hands-on, code-first. Zero slides. Led by a trainer who has built agents in production.
2.2 Dev environment setup:
- Accounts with Anthropic/OpenAI (API keys)
- Vector DB trial (Pinecone free tier)
- LangSmith for monitoring
- GitHub repo + CI/CD
2.3 Tech stack decision: Based on use case choose:
- LLM provider (Claude, GPT, Gemini, self-hosted)
- Framework (LangChain, CrewAI, LangGraph, direct API)
- Vector DB (if RAG)
- Monitoring (LangSmith, Helicone, W&B)
- Deployment (Lambda, Kubernetes, modal.com)
Phase 2 Deliverable: Team knows basics, has working dev environment, first version of tech stack selected.
Phase 3: MVP Development (month 3-4)
Goals:
- Working prototype of first use case
- Validate technical feasibility
- Early user feedback
Activities:
3.1 Sprint 1 (2 weeks): Core Agent
- Core logic implementation
- Integration with LLM API
- Basic tool use (1-2 tools)
- Happy path works
3.2 Sprint 2 (2 weeks): RAG (if needed)
- Data preparation (chunking)
- Vector DB setup
- Retrieval pipeline
- Testing retrieval quality
3.3 Sprint 3 (2 weeks): Error Handling + Monitoring
- Retry logic, timeouts
- Guardrails (basic)
- Logging, observability
- Error handling for unhappy paths
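Retry logic for flaky LLM or tool calls usually means retrying only on transient errors and waiting a little longer each time. A minimal sketch, assuming the underlying client enforces its own per-call timeout and raises `TimeoutError` or `ConnectionError`; `call_with_retry` is a hypothetical helper, not part of any framework.

```python
import time


def call_with_retry(fn, *, max_attempts=3, base_delay=0.5,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting. Errors outside `retry_on` propagate immediately, which keeps genuine bugs from being hidden behind retries.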
3.4 Sprint 4 (2 weeks): User Testing
- Deploy to staging
- 5-10 internal users test
- Collect feedback
- Iterate based on feedback
MVP success metrics:
- Agent solves 50%+ of test cases correctly
- Latency <10s (target for production: <5s)
- Zero critical bugs
- User feedback >3.5/5
Phase 3 Deliverable: Working MVP, user tested, ready for limited beta.
Phase 4: Beta Deployment (month 5-6)
Goals:
- Deploy to production (limited users)
- Monitoring in real-world conditions
- Collect metrics for ROI
Activities:
4.1 Production Deployment:
- Infrastructure setup (if Kubernetes - helm charts, if serverless - Lambda/Cloud Run)
- Security review (penetration testing, compliance check)
- HITL setup (approval workflows if needed)
- Runbook for on-call (what to do if agent fails)
4.2 Beta Programme (4-6 weeks):
- 20-30% of users (or 1 department)
- Daily monitoring dashboards
- Weekly review meetings
- Rapid iteration based on production data
Key metrics to track:
- Usage (requests per day)
- Success rate (% resolved without escalation)
- Latency (p50, p95, p99)
- Cost per request
- User satisfaction (CSAT, NPS)
- Error rate, error types
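Most of these metrics can be derived from a simple per-request log. The sketch below assumes a hypothetical `RequestLog` record and uses the nearest-rank method for percentiles; an observability tool would normally compute this for you.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestLog:
    latency_s: float
    cost_usd: float
    escalated: bool
    failed: bool


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(logs: list[RequestLog]) -> dict[str, float]:
    """Aggregate a non-empty list of request logs into the beta-phase metrics."""
    n = len(logs)
    latencies = [r.latency_s for r in logs]
    resolved = sum(1 for r in logs if not r.escalated and not r.failed)
    return {
        "requests": n,
        "success_rate": resolved / n,
        "error_rate": sum(1 for r in logs if r.failed) / n,
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "p99_s": percentile(latencies, 99),
        "cost_per_request_usd": sum(r.cost_usd for r in logs) / n,
    }
```

Tracking p95 and p99 alongside p50 matters because a healthy median can hide a slow tail that users experience as timeouts.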
4.3 Incident Response:
- On-call rotation (team available if agent fails)
- Rollback plan (if something goes badly wrong: how to turn off the agent and return to the manual process)
Success criteria for moving to Full Deployment:
- Success rate >70% (or other target metric for use case)
- CSAT >4/5
- Zero critical bugs in last 2 weeks
- Cost per request in budget
- Stakeholder approval
Phase 4 Deliverable: Agent in production (beta), metrics tracked, stakeholders confident that it works.
Phase 5: Scale & Optimise (month 7-12)
Goals:
- 100% rollout
- Cost optimisation
- Next use cases
Activities:
5.1 Full Rollout (month 7-8):
- Gradual traffic increase (20% → 50% → 100%)
- Communication to users (what’s changing, how to use)
- Training materials (if HITL)
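One common way to implement a gradual traffic increase is deterministic bucketing: each user is hashed into one of 100 buckets, and only buckets below the current percentage are routed to the agent. This is a sketch of that general technique, not something the roadmap prescribes; `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets; admit buckets below `percent`."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# The same user gets the same answer on every request, and anyone
# included at 20% stays included at 50% and 100%.
for stage in (20, 50, 100):
    print(stage, in_rollout("user-123", stage))
```

Because the bucket depends only on the user ID, nobody flips back and forth between the agent and the manual process as the percentage grows.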
5.2 Cost Optimisation (month 8-9):
- Profiling: where are we spending most tokens?
- Prompt optimisation (shorter = cheaper)
- Model selection (is GPT-4o mini sufficient instead of GPT-4o?)
- Caching (prompt caching for repeated context, e.g. Anthropic prompt caching)
- Batch processing (where real-time not required)
Target: Reduce cost per request by 30-50%.
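To see where the money goes, it helps to compute cost per request directly from token counts. The sketch below uses placeholder prices per million tokens (an assumption, not a quote from any provider) and shows how halving the prompt affects the bill; `request_cost_usd` is a hypothetical helper.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one LLM call given token counts and prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices in USD per million tokens; check your provider's current price list.
PRICES = {"input_price_per_mtok": 3.00, "output_price_per_mtok": 15.00}

baseline = request_cost_usd(4000, 500, **PRICES)   # long prompt
optimised = request_cost_usd(2000, 500, **PRICES)  # prompt trimmed by half
saving = 1 - optimised / baseline

print(f"baseline ${baseline:.4f}, optimised ${optimised:.4f}, saving {saving:.0%}")
```

With these assumed numbers, halving the prompt cuts cost per request by roughly 31%, because output tokens are unchanged and priced higher. Hitting the upper end of the 30-50% target therefore usually requires combining several of the levers above.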
5.3 Feature Expansion (month 9-10):
- Additional tools
- Multi-lingual support
- Advanced guardrails
- Better UX
5.4 Next Use Cases (month 10-12):
- Apply learnings from first project
- Pilot 2-3 next use cases
- Build internal “AI Agent Platform” (shared infrastructure for next agents)
Phase 5 Deliverable: First agent fully deployed, cost-optimised, production-grade. 2-3 next use cases in pipeline.
Mistakes to avoid
1. Starting with too ambitious a use case: The first project is a proof of concept for the organisation. Choose something realistically achievable in 3-4 months.
2. Lack of a domain expert in the team: An AI Engineer doesn’t know how your process should work. You need a domain expert who does.
3. Ignoring LLM API costs: Prototype: $50/month. Production (10k requests/day): $5k/month. Plan the budget.
4. Lack of HITL for high-risk cases: Full autonomy sounds nice, but an enterprise can’t afford the risk. Start with HITL and gradually increase autonomy.
5. Premature multi-agent architecture: Start with a single agent. Go multi-agent only when you really need it.
6. Lack of observability from Day 1: You can’t debug what you can’t see. Set up LangSmith/Helicone immediately.
How EITT prepares teams to work with AI Agents
At EITT we understand that AI Agents aren’t just new technology - they’re a new way of thinking about IT systems. Since 2024 we’ve been running dedicated training programmes preparing teams to implement AI agents in organisations.
Our training programmes for AI Agents
1. AI Agents Fundamentals (3 days, hands-on)
For whom: Developers, Tech Leads, Architects who want to start with AI Agents.
Programme:
- Day 1: LLM fundamentals + Prompt Engineering
- Day 2: RAG (Retrieval-Augmented Generation)
- Day 3: Agents, Tool Use, Orchestration
Format: 70% practice, 30% theory. Each participant builds working agent from scratch.
After training you can:
- Design and implement simple agent
- Build RAG system for company knowledge
- Integrate agent with external APIs (tool use)
- Choose appropriate framework (LangChain vs CrewAI vs direct API)
2. Production AI Agents (2 days, advanced)
For whom: Teams that already have prototype and want to deploy to production.
Programme:
- Monitoring and observability (LangSmith, Helicone)
- Guardrails and safety
- Cost optimisation
- Human-in-the-loop patterns
- Multi-agent architectures (LangGraph, CrewAI)
- Security and compliance
Format: Workshop based on your use case. Bring your code - leave with production deployment plan.
3. LLM and Agent Architectures (2 days)
For whom: Architects, Senior Engineers designing agentic systems.
Programme:
- Single-agent vs multi-agent architectures
- State management and orchestration
- Distributed tracing
- Scaling patterns (how to handle 100k requests/day)
- Self-hosted vs API-based LLMs (trade-offs)
- Reference architectures from enterprise deployments
Format: Design workshops + case studies from production.
4. Prompt Engineering Masterclass (1 day)
For whom: Technical writers, Product Managers, AI Engineers who want to master prompt engineering.
Programme:
- System prompts design
- Few-shot learning
- Chain of Thought (CoT)
- ReAct pattern
- Structured outputs
- Prompt evaluation and testing
Format: Hands-on in playground (Anthropic Console, OpenAI Playground). Each participant leaves with library of own prompt templates.
5. Python for AI/ML (5 days, bootcamp)
For whom: Developers from other languages (Java, C#, JS) who want to jump to AI/ML ecosystem (Python).
Programme:
- Python basics (quick refresher/learning)
- Key libraries: NumPy, Pandas, Requests
- Working with APIs
- Async Python (for agent systems)
- LangChain hands-on
Format: Bootcamp. Intensive 5 days, lots of code.
Why EITT?
500+ experts, 2500+ trainings - we know how to teach IT.
Our AI/ML trainings aren’t theory from slides. They’re hands-on workshops led by practitioners who implement AI Agents in production companies.
Trainers with experience:
- Built production AI agents for e-commerce, fintechs, enterprise
- Publish in industry blogs and at conferences (PyData, AI Devs)
- Know not only “how” but “why” (and when NOT to use agents)
Adaptation to your stack: Don’t have stack yet? We’ll help choose (Claude vs GPT-4o, LangChain vs CrewAI). Already have stack? We’ll adapt training to your choice.
Closed programmes for companies: Need training for whole team (5-15 people)? We do closed trainings:
- Customised programme (your use case, your stack)
- At your office or online
- Possibility of code review of your prototype
- Follow-up consultations (after training, when you implement)
Post-training support: After training we don’t leave you alone. You have access to:
- Slack community (alumni of our trainings + trainers)
- Office hours (2x month, Q&A with trainer)
- Updated materials (because AI changes every month)
Rating 4.8/5 from 2500+ participants.
Don’t believe us? Check references from ING, mBank, PKO BP, Allegro - companies that trained their AI/ML teams with us.
FAQ - frequently asked questions about AI Agents
1. Can AI Agents replace human workers?
Short answer: They won’t replace people, but they will change the nature of work.
Long answer: AI Agents are excellent at:
- Repetitive tasks (answering same questions 1000 times)
- Data processing (document analysis, information extraction)
- First-level triage (customer support L1, incident classification)
People are better at:
- Complex reasoning requiring deep business context
- Empathy and emotional intelligence (customer success for high-value clients)
- Creative problem solving (new problems agent never saw)
- Strategic thinking
Reality: A 50-person team with agents can deliver the output of an 80-person team without them. But you won’t reduce the team to 10 people. Agents increase leverage; they don’t eliminate people.
Example from practice: A bank implemented an agent for credit analysis. It didn’t fire analysts. But an analyst who used to handle 5 cases daily now handles 15 (the agent does the initial pass, the analyst does the final review). The bank tripled throughput without tripling the team.
2. How much does implementing AI Agent in company cost?
Depends on complexity, but here are ballpark numbers:
Small project (simple single-agent, e.g. internal FAQ bot):
- Development: 3 months × 2 FTE = ~PLN 300k
- LLM API costs: $500-2000/month (depends on traffic)
- Infrastructure: $200-500/month
- Total first year: ~PLN 400k
Medium project (RAG-based agent with integrations, e.g. customer support):
- Development: 6 months × 3-4 FTE = ~PLN 900k
- LLM API costs: $3k-10k/month
- Infrastructure: $1k-2k/month
- Total first year: ~PLN 1.2M
Large project (multi-agent system, enterprise-grade):
- Development: 12 months × 5-7 FTE = ~PLN 2.5M
- LLM API costs: $10k-50k/month (depends on scale)
- Infrastructure: $5k-15k/month
- Total first year: ~PLN 3.5-4M
ROI: Customer support agent (medium project, PLN 1.2M):
- Savings: 20 FTE support × PLN 100k = PLN 2M annually
- ROI: PLN 2M - PLN 1.2M = PLN 800k net in first year
- Payback period: 7-8 months
Note: These are ballparks. Real cost depends on: use case complexity, existing infrastructure (do you already have vector DB?), team velocity, vendor choices (self-hosted Llama = $0 API cost, but requires infra).
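The ROI arithmetic for the customer support example can be checked in a few lines. This sketch assumes savings accrue evenly from the first month, which is optimistic given a 6-month development phase; `payback_months` is a hypothetical helper.

```python
def payback_months(first_year_cost: float, annual_savings: float) -> float:
    """Months until cumulative savings cover the cost, assuming savings accrue evenly."""
    return first_year_cost / (annual_savings / 12)


cost = 1_200_000         # PLN, medium project, total first year
savings = 20 * 100_000   # PLN, 20 support FTE x PLN 100k

net_first_year = savings - cost
months = payback_months(cost, savings)

print(f"net first year: PLN {net_first_year:,}")
print(f"payback: {months:.1f} months")
```

The result lands at just over 7 months, consistent with the 7-8 month figure. If savings only start after go-live, add the development time on top.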
3. Do we need our own model (fine-tuning) or are public APIs sufficient (OpenAI, Anthropic)?
For 90% of use cases, public APIs are sufficient.
When to use a public API (Claude, GPT-4o, Gemini):
- Don’t have millions of training data examples
- Use case isn’t super niche
- Can send data to API (compliance allows)
- Want to start quickly (setup in hours, not months)
- Don’t want to manage ML infrastructure
When to consider fine-tuning:
- Have very niche domain (e.g. medical imaging, legal for specific market)
- Have 10k-100k labelled examples
- Public model doesn’t give accuracy you need (even after good prompt engineering)
- ROI justifies cost (fine-tuning = $50k-500k depending on scale)
When to use a self-hosted model (Llama 3.3, Mistral):
- Strict compliance (e.g. healthcare, banking in some countries - data cannot leave infrastructure)
- Very high volume (millions of requests/day - self-hosting can be cheaper)
- Need full control over model (custom modifications)
Recommendation: Start with public API (Claude 3.7 Sonnet or GPT-4o). As project matures and you have data - consider fine-tuning. Self-hosting only if compliance requires or volume justifies.
Case study: A fintech wanted a fine-tuned model for fraud detection. It started with GPT-4o + good prompt engineering. Accuracy: 82%. After 6 months it had 50k labelled examples and fine-tuned GPT-4o. Accuracy: 89%. Fine-tuning cost: $80k. Benefit: fraud reduced by $2M annually. ROI clear.
4. How long does implementing first agent take?
Quick answer: 3-6 months from decision to production.
Breakdown:
Month 1-2: Discovery + Team Upskilling
- Use case identification
- Team training (AI Agents fundamentals)
- Tech stack decision
Month 3-4: MVP Development
- Core agent implementation
- Integration with systems
- Initial testing
Month 5-6: Beta + Production Deployment
- Deploy to limited users
- Monitoring, iteration
- Full rollout
Fast track (if already have AI-savvy team): 2-3 months possible for simple use case.
Realistic for enterprise: 6-9 months (more stakeholders, security reviews, compliance).
Factors that extend the timeline:
- Lack of competencies in team (add +2 months for upskilling)
- Complex integrations (legacy systems, no APIs)
- High compliance requirements (healthcare, finance)
- Consensus paralysis (too many stakeholders, decision-making drags)
Factors that accelerate the timeline:
- Team with ML/AI experience
- Modern stack (cloud-native, APIs everywhere)
- Clear decision-making (empowered Product Manager)
- Choice of simple first use case
5. What are biggest risks of implementing AI Agents?
Top 5 risks and how to mitigate them:
1. Hallucinations (agent makes up facts)
Risk: Agent sounds confident but provides incorrect information.
Mitigation:
- RAG (agent responds based on retrieved docs, not from memory)
- Citations (agent always provides source: “According to Policy Doc v2.3…”)
- Fact-checking layer (second model checks output of first)
- Human review for high-stakes decisions
2. Data leakage / Privacy violations (GDPR)
Risk: Agent accidentally reveals PII (personal data), trade secrets, or data from another user context.
Mitigation:
- PII detection and masking (Presidio)
- Access control (agent sees only data user has right to see)
- Audit logs (who asked what, and when)
- Data retention policies (automatic deletion after N days)
- On-premise deployment if compliance requires
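PII masking means replacing sensitive values with placeholders before text reaches the LLM or the logs. Presidio is a dedicated library for this; the sketch below deliberately does not use its API and instead shows the underlying idea with two simplified regular expressions. The patterns and the `mask_pii` helper are illustrative assumptions and far from complete coverage.

```python
import re

# Simplified patterns for illustration only; real PII detection needs much broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask_pii(text: str) -> str:
    """Replace each match with a placeholder such as <EMAIL> before the text leaves your system."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(mask_pii("Contact jan.kowalski@example.com or +48 600 700 800."))
```

In production you would run a step like this on both the user input and the retrieved documents, since leaked PII often comes from the knowledge base rather than from the prompt.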
3. Cost runaway (agent costs more than budget)
Risk: An agent stuck in a loop calls the LLM 1000 times, prompts are too long, or an expensive model is used where a cheap one suffices.
Mitigation:
- Rate limiting (max X requests per user per minute)
- Timeouts (agent must finish in Y seconds)
- Budget alerts (AWS/GCP billing alerts)
- Cost monitoring per request
- Prompt optimisation (shorter prompts = cheaper)
- Model routing (small model for easy tasks, large for hard tasks)
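A hard cap per agent run is the simplest protection against a runaway loop. The sketch below assumes the agent loop is expressed as a `step` function returning `(done, cost_usd)` for each LLM call; `RunBudget`, `BudgetExceeded` and `run_agent` are hypothetical names, not part of any framework.

```python
class BudgetExceeded(Exception):
    """Raised when an agent run hits its call or cost cap."""


class RunBudget:
    """Tracks LLM calls and spend for a single agent run."""

    def __init__(self, max_calls: int, max_cost_usd: float):
        self.max_calls = max_calls
        self.max_cost_usd = max_cost_usd
        self.calls = 0
        self.cost_usd = 0.0

    def before_call(self) -> None:
        # Check the caps before spending anything on the next call.
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"call limit of {self.max_calls} reached")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of ${self.max_cost_usd} reached")
        self.calls += 1

    def after_call(self, cost_usd: float) -> None:
        self.cost_usd += cost_usd


def run_agent(step, budget: RunBudget) -> RunBudget:
    """Call step() until it reports done; each step is one LLM call."""
    while True:
        budget.before_call()
        done, cost_usd = step()
        budget.after_call(cost_usd)
        if done:
            return budget
```

When `BudgetExceeded` is raised, the caller can return a partial answer or escalate to a human, so a stuck agent costs a bounded amount instead of running until someone notices the bill.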
4. Agent does something destructive (deletes data, sends 1000 emails)
Risk: A bug in the tool use logic, or the agent hallucinates and calls a delete API.
Mitigation:
- Read-only tools for prototype (agent can only read, not write)
- HITL for write operations (agent proposes, human approves)
- Dry-run mode (agent simulates action, doesn’t execute)
- Idempotency (if the agent calls an API twice because of a retry, no damage is done)
- Undo mechanisms (like rollback in DB)
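Several of these mitigations can live in one wrapper around each tool: read-only tools run freely, while write tools either simulate the action or wait for approval. A minimal sketch, in which `guarded_tool` and the `approve` callback are hypothetical and approval is denied by default:

```python
from typing import Callable


def guarded_tool(action: Callable[..., str], *, writes: bool, dry_run: bool = False,
                 approve: Callable[[str], bool] = lambda description: False) -> Callable[..., str]:
    """Wrap a tool so that write operations need approval and can be simulated."""

    def wrapper(*args, **kwargs) -> str:
        description = f"{action.__name__}(args={args}, kwargs={kwargs})"
        if not writes:
            return action(*args, **kwargs)   # read-only tools run freely
        if dry_run:
            return f"DRY RUN: would call {description}"
        if not approve(description):         # human-in-the-loop gate
            return f"REJECTED: {description}"
        return action(*args, **kwargs)

    return wrapper
```

In a real system `approve` would post the description to a ticket or chat channel and wait for a human decision. Defaulting it to "deny" means a misconfigured tool fails closed rather than open.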
5. User trust / adoption (people don’t use agent)
Risk: Agent is technologically excellent, but users don’t like it / don’t trust it / prefer old way.
Mitigation:
- User research BEFORE building (what do users need?)
- Transparency (agent clearly says what it’s doing and why)
- Escape hatch (user can always escalate to human)
- Gradual rollout (start with early adopters, don’t force on everyone)
- Show the value (dashboards showing how much time agent saved)
6. Does AI Agent require continuous monitoring and maintenance?
Yes, but less than you think.
What requires monitoring:
1. Quality metrics (ongoing):
- Success rate (% queries resolved correctly)
- User satisfaction (thumbs up/down)
- Escalation rate (how often agent must escalate to human)
Frequency: Daily dashboard review (5 minutes). Deep dive weekly (1h).
2. Cost monitoring:
- Token usage per request
- Total monthly spend (LLM API + infrastructure)
Frequency: Weekly review. Alert if spend exceeds the budget threshold.
3. Errors and anomalies:
- Error rate (% failed requests)
- Latency spikes
- New error types
Frequency: Real-time alerts (PagerDuty). Review daily.
4. Model updates:
- Is new model version (GPT-4.5, Claude 4.0) better?
- Does prompt need updating (model behaviour changed)?
Frequency: Quarterly (when providers release major updates).
5. Data drift:
- Is agent responding based on outdated knowledge?
- Are new products / policies in RAG knowledge base?
Frequency: Refresh the knowledge base monthly.
Total time commitment:
- Initial 3 months after launch: 20-30% of one FTE (tight monitoring, rapid iteration).
- After stabilisation: 5-10% FTE (monitoring, occasional updates).
Comparison: Traditional software also requires maintenance (bug fixes, dependency updates). Agent maintenance is comparable, maybe slightly more due to non-deterministic nature.
7. Can small and medium companies afford AI Agents?
Absolutely yes.
Myth: AI Agents are only for FAANG and enterprises with multi-million budgets.
Reality: In 2026 barrier to entry is low:
Costs for SME:
Setup:
- LangChain / OpenAI API: Free tier sufficient for prototype
- Vector DB: Pinecone free tier (1M vectors)
- Deployment: Vercel / Railway / Render (free tier or $20-50/month)
Total prototype cost: $0-100.
Production (small scale - 100-500 users):
- LLM API: $200-800/month
- Vector DB: $100/month (Pinecone starter)
- Hosting: $50-200/month
- Total: $350-1100/month
Development:
- Hire 1 freelance AI engineer (part-time, 3 months): PLN 60-100k
- Or upskill own developer (training PLN 10k + 2 months learning on job)
Total investment: PLN 60-100k.
ROI examples for SME:
Example 1: Accounting office (20 people)
- Agent for answering client questions (accounting, taxes)
- Savings: 10h/week of accountants’ time × PLN 150/h = PLN 1500/week = PLN 78k annually
- Agent cost: PLN 70k (development) + PLN 12k (API costs first year) = PLN 82k
- Break-even: about 13 months, then PLN 78k in savings annually.
Example 2: E-commerce (PLN 5M GMV annually, 10 person team)
- Agent for customer support (where’s package? how to return?)
- Savings: 1 FTE customer support (PLN 100k annually)
- Agent cost: PLN 80k (development) + PLN 15k (API + infra)
- Break-even: 11 months.
Bottom line: If you see clear ROI (agent will save X hours of work), SME can afford it. Start small, measure ROI, scale gradually.
Next steps: Begin your journey with AI Agents
AI Agents aren’t science fiction. They’re tools that work in production in 2026 - in banks, e-commerce, software houses, fintechs. The question isn’t “whether”, but “when” your organisation will start using them.
If you’re a Tech Lead or Engineering Manager: Your team needs these skills. Competitive pressure is growing. Companies that master AI Agents in 2026-2027 will have an advantage for the next 5 years.
If you’re a CTO or IT Director: ROI is real. 60-80% savings in repetitive tasks, 3x productivity increase in many use cases. But it requires investment in competencies and infrastructure. Start now to have your first agent in production by the end of 2026.
If you’re a Developer: Learn these skills. In 2-3 years the job market will require knowledge of AI Agents the way it requires knowledge of APIs and cloud today. This is your chance to upskill.
Ready to start?
Reserve place on training:
Check our AI Agents training programmes:
- AI Agents Fundamentals (3 days)
- Production AI Agents (2 days, advanced)
- LLM and Agent Architectures (2 days)
First dates already in March 2026. Places limited (max 12 people per group - hands-on requires small groups).
Need closed training for team?
Contact us: /en/contact
Tell us about your use case, we’ll select programme, prepare offer. 500+ experts, 2500+ conducted trainings, 4.8/5 rating. Trusted by ING, mBank, PKO BP, Allegro.
Want to learn more about AI Agents?
Subscribe to EITT newsletter - weekly latest insights from AI/ML, case studies, practical tips. Zero spam, only valuable content.
AI Agents are changing the rules of the game. Time to join the game.