Fine-Tuning LLMs — LoRA, QLoRA and DPO in Practice
Advanced training on fine-tuning language models. Covers when to fine-tune vs RAG vs prompt engineering, training data preparation, Supervised Fine-Tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT) with LoRA and QLoRA, RLHF and DPO (Direct Preference Optimization), quantization (GPTQ, AWQ, GGUF), fine-tuned model evaluation, and deployment and serving.
Why choose this training?
This training combines theoretical knowledge with intensive hands-on exercises, enabling participants to immediately apply their skills in daily work. The program is designed and delivered by practitioners with real-world experience.
What you will learn
You will gain comprehensive knowledge and practical skills in fine-tuning LLMs. The program moves from foundational concepts through advanced techniques to real-world implementation patterns.
Through hands-on exercises and realistic scenarios, you will develop the ability to apply learned concepts in your organization. After completing the training, you will have actionable knowledge that translates directly into improved capabilities for your team and organization.
Fine-tuning vs RAG vs Prompt Engineering — when to use which
A critical decision in any LLM project is choosing the right approach:
| Approach | When to use | Upfront cost | Added latency | Accuracy gain |
|---|---|---|---|---|
| Prompt Engineering | Quick PoC, simple tasks | $0 | None | Limited |
| Few-shot prompting | Small examples, structured output | $0 | None | Moderate |
| RAG (Retrieval Augmented Generation) | Domain knowledge, frequently changing data | Low ($) | +500ms | High for facts |
| Fine-tuning (LoRA) | Task-specific behavior, brand voice, format | Medium ($$) | None | High for style |
| Full fine-tuning | Massive domain shift, performance critical | High ($$$$) | None | Highest |
Decision tree: Start with prompt engineering. If insufficient, add RAG (cheaper than fine-tuning). Fine-tune only when style/format/behavior must change permanently or when running at scale (>1M requests/month) where prompt token costs add up.
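The decision tree above can be sketched as a small helper. This is only an illustration: the function name `choose_approach`, its parameters, and the returned labels are hypothetical, and the 1M requests/month threshold is taken from the rule of thumb above.

```python
from typing import List

# Threshold from the rule of thumb above; treat it as a tunable assumption.
SCALE_THRESHOLD = 1_000_000


def choose_approach(
    prompt_engineering_sufficient: bool,
    needs_permanent_behavior_change: bool,
    requests_per_month: int,
) -> List[str]:
    """Return the approaches to try, in the order the decision tree suggests."""
    plan = ["prompt engineering"]  # always the starting point
    if not prompt_engineering_sufficient:
        plan.append("RAG")  # cheaper than fine-tuning, so try it next
    if needs_permanent_behavior_change or requests_per_month > SCALE_THRESHOLD:
        plan.append("fine-tuning")  # style/format/behavior change, or scale
    return plan


print(choose_approach(False, False, 50_000))
print(choose_approach(True, False, 2_000_000))
```

A real project would weigh more factors (data availability, latency budget, compliance), so the output is a starting checklist rather than a verdict.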
LoRA, QLoRA, and PEFT — efficient fine-tuning
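Full fine-tuning of a 70B-parameter model requires 280+ GB of GPU memory for the FP32 weights alone (70B parameters × 4 bytes), before gradients and optimizer states are counted ($10-50k cluster). That is prohibitive for most organizations. PEFT (Parameter-Efficient Fine-Tuning) changed this: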
LoRA (Low-Rank Adaptation)
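Instead of updating all model weights, freeze the base model and add small low-rank “adapter” matrices (~0.1% of the original parameters). Train only the adapters. Result: about 99% fewer trainable parameters, much less memory for gradients and optimizer states, and performance close to full fine-tuning on many tasks.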
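To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style linear layer, assuming the common formulation in which the frozen weight W is augmented with a scaled low-rank product B·A and B starts at zero. The class `LoRALinear` and the helper names are illustrative, not the API of any library, and the 4096×4096 layer size and rank 8 are example values.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_param_counts(d_out, d_in, rank):
    """Return (frozen weight count, trainable adapter count) for one layer."""
    return d_out * d_in, rank * (d_in + d_out)


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weights, never updated
        self.scale = alpha / rank
        # A: rank x d_in, small random init; B: d_out x rank, zero init,
        # so the adapter contributes nothing before training starts.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


full, adapter = lora_param_counts(4096, 4096, rank=8)
print(f"{adapter:,} trainable vs {full:,} frozen ({100 * adapter / full:.2f}%)")
# prints: 65,536 trainable vs 16,777,216 frozen (0.39%)
```

The exact percentage depends on the rank and on which layers receive adapters, so the figure for a whole model can be lower or higher than this single-layer example.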
QLoRA (Quantized LoRA)
Combine LoRA with 4-bit quantization of the frozen base model. This makes it possible to fine-tune a 70B model on a single A100 (80GB), which is practical for individual practitioners.
Hardware requirements 2026:
- 7B model + LoRA: single RTX 4090 (24GB) — $1500
- 13B + QLoRA: single A6000 (48GB) — $5000
- 70B + QLoRA: single A100 (80GB) — $15000
- 70B + full fine-tune: 8x H100 (640GB) — $300k+
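A rough back-of-the-envelope check of the list above: the sketch below estimates the memory taken by model weights alone at a given precision. The function name `weight_memory_gb` is hypothetical, 1 GB is taken as 10^9 bytes, and the estimate deliberately ignores activations, gradients, optimizer states, adapter weights, and quantization overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights only, in GB (10**9 bytes)."""
    return params_billion * bits_per_weight / 8


# Example configurations; the bit widths are assumptions for illustration.
configs = [
    ("7B, FP16 base for LoRA", 7, 16),
    ("13B, 4-bit base for QLoRA", 13, 4),
    ("70B, 4-bit base for QLoRA", 70, 4),
    ("70B, FP16 weights", 70, 16),
    ("70B, FP32 weights", 70, 32),
]

for label, params_billion, bits in configs:
    print(f"{label}: {weight_memory_gb(params_billion, bits):.1f} GB")
```

The gap between these numbers and the GPU sizes listed is the headroom left for activations, adapters, and optimizer state during training.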
RLHF and DPO — preference optimization
Beyond supervised fine-tuning (SFT), preference optimization aligns models with human preferences:
- RLHF (Reinforcement Learning from Human Feedback) — used to train ChatGPT-style models. Complex and expensive, because it needs a separately trained reward model plus PPO.
- DPO (Direct Preference Optimization, 2023) — a simpler alternative that skips the separate reward model and trains directly on preference pairs (chosen vs rejected). It often gets close to RLHF results with a much simpler training pipeline.
DPO has become the de facto standard for preference tuning in 2024-2026. We’ll implement both in hands-on labs.
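As a sketch of what DPO optimizes, the snippet below computes the standard DPO loss for a single preference pair from sequence log-probabilities under the policy being trained and under a frozen reference model. The function and argument names are illustrative, and `beta=0.1` is just a commonly used example value; a real lab would use a training library rather than this hand-rolled version.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


# Policy identical to the reference: loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy has shifted toward the chosen response: loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

No reward model appears anywhere in the computation, which is the simplification over RLHF described above.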
Quantization for deployment
Production inference cost scales with model size. Quantization (reducing weight precision from FP16 to INT8 or INT4) shrinks the model's memory footprint by roughly 2-4x, which lowers serving cost, usually with only a small accuracy loss:
- GPTQ — accurate post-training quantization
- AWQ — Activation-aware Weight Quantization, faster
- GGUF — model file format for llama.cpp with built-in quantized types, runs on CPUs and consumer GPUs
We’ll quantize fine-tuned models and benchmark accuracy/speed tradeoffs.
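The accuracy/size tradeoff can be illustrated with the simplest possible scheme: symmetric round-to-nearest quantization with a single absmax scale. This is a teaching sketch only, not GPTQ, AWQ, or any GGUF quantization type, all of which use more sophisticated methods; the function names and the sample weights are made up for illustration.

```python
def quantize_absmax(weights, bits):
    """Symmetric quantization to signed integers using one absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


weights = [0.5, -1.0, 0.25, 0.0, 0.8, -0.33]
for bits in (8, 4):
    q, scale = quantize_absmax(weights, bits)
    error = mean_abs_error(weights, dequantize(q, scale))
    print(f"INT{bits}: values={q} mean abs error={error:.4f}")
```

Fewer bits mean a coarser grid and a larger reconstruction error, which is why benchmarking the quantized model on the actual task matters.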
Real-world fine-tuning use cases
Hands-on projects we’ll complete:
- Customer support chatbot — fine-tune Mistral-7B on company FAQ + tone of voice
- Code generation — fine-tune CodeLlama on internal codebase patterns
- Document summarization — DPO-tune Llama-3 on company-style executive summaries
- SQL generation — fine-tune for specific database schema (text-to-SQL)
- Multilingual support — fine-tune for Polish/Swedish business contexts
Benefits
- Understand and apply LLM fine-tuning techniques
- Design and implement solutions based on best practices
- Evaluate tools and approaches
- Build implementation roadmap
- Integrate with existing processes
- Measure and optimize outcomes
Who is this training for?
Prerequisites
- IT experience or domain expertise
- Basic understanding of AI concepts is helpful
- Willingness to learn and experiment
Training program
Module 1: When to fine-tune vs RAG vs prompt engineering
- Overview and key concepts — when to fine-tune vs RAG vs prompt engineering
- Architecture and design decisions
- Hands-on implementation — workshop
- Configuration and optimization
- Best practices and common pitfalls
- Real-world case studies
Module 2: Training data preparation
- Overview and key concepts — training data preparation
- Architecture and design decisions
- Hands-on implementation — workshop
- Configuration and optimization
- Best practices and common pitfalls
- Real-world case studies
Module 3: Supervised Fine-Tuning (SFT)
- Overview and key concepts — Supervised Fine-Tuning (SFT)
- Architecture and design decisions
- Hands-on implementation — workshop
- Configuration and optimization
- Best practices and common pitfalls
- Real-world case studies
Module 4: Parameter-Efficient Fine-Tuning (PEFT) with LoRA and QLoRA
- Overview and key concepts — Parameter-Efficient Fine-Tuning (PEFT) with LoRA and QLoRA
- Architecture and design decisions
- Hands-on implementation — workshop
- Configuration and optimization
- Best practices and common pitfalls
- Real-world case studies
Module 5: RLHF and DPO (Direct Preference Optimization)
- Overview and key concepts — RLHF and DPO (Direct Preference Optimization)
- Architecture and design decisions
- Hands-on implementation — workshop
- Configuration and optimization
- Best practices and common pitfalls
- Real-world case studies
Delivery Methods
Online
- Convenience of participating from anywhere
- Interactive live sessions with trainer
- Materials available for 30 days
- No travel costs
On-site
- Direct contact with trainer and group
- Intensive hands-on workshops
- Networking with other participants
- Full focus on learning
Frequently asked questions
Is this training suitable for my level?
This training is at an advanced level. Check the prerequisites above.
What practical exercises are included?
Hands-on exercises in a prepared lab environment with realistic scenarios.
Will I receive a certificate?
Yes — EITT certificate of completion plus comprehensive materials.
Why choose EITT?
500+ experts, 2500+ trainings, 4.8/5 rating. Practitioners with real-world experience.
Request a quote
Funding Options
Check funding options for your company
Development Services Database
Up to 80% funding for SMEs from EU funds
National Training Fund
Up to 100% funding for employers
Trusted by
We train teams at Poland's largest companies
Interested in this training?
Contact us - we'll prepare an offer tailored to your organization's needs.