Skip to content
5 min read

Data warehouse — what is it and how does it work?

What is a data warehouse? Learn the architecture, differences vs data lake, popular solutions (Snowflake, BigQuery, Redshift) and applications in business analytics.

Author: EITT Team

A data warehouse is the foundation of modern business analytics. If your organization makes decisions based on data — and it should — sooner or later you will need a data warehouse.

What is a data warehouse?

A data warehouse is a specialized database system designed for analytics and reporting. Unlike operational databases (OLTP), which handle ongoing transactions (orders, payments, registrations), a data warehouse stores historical data optimized for analytical queries (OLAP).

Key characteristics of a data warehouse:

  • Subject-oriented — data is organized around key business areas (sales, customers, products), not operational processes
  • Integrated — data from multiple sources (CRM, ERP, web systems, IoT) unified in a single format
  • Non-volatile — historical data is not overwritten, enabling trend analysis
  • Time-variant — every record contains a timestamp, allowing you to track changes over time

Data warehouse architecture

Classic three-tier architecture

  1. Data source layer — operational systems (CRM, ERP, logs, APIs), files (CSV, JSON), data streams (Kafka)
  2. Integration layer (ETL/ELT) — processes for extracting, transforming, and loading data
  3. Presentation layer — BI tools (Tableau, Power BI, Looker), reports, dashboards

ETL vs ELT

ETL (Extract, Transform, Load) — the classic approach. Data is transformed BEFORE loading into the warehouse. Requires separate infrastructure for transformation. Popular tools: Informatica, Talend, SSIS.

ELT (Extract, Load, Transform) — the modern approach. Data is loaded into the warehouse in raw form, and transformation happens inside the warehouse (leveraging its computing power). Popular tools: dbt, Fivetran, Airbyte.

In 2026, ELT dominates new projects — especially in the cloud, where computing power is elastically scalable.

Data models

  • Star schema — a central fact table surrounded by dimension tables. Simple, fast for queries. The standard in data warehousing.
  • Snowflake schema — an extension of the star schema with normalized dimension tables. Saves space but results in slower queries.
  • Data vault — a data modeling approach with separation into Hub, Link, and Satellite. Flexible, auditable, popular in enterprise environments.

Data warehouse vs data lake vs data lakehouse

FeatureData WarehouseData LakeData Lakehouse
DataStructuredRaw (any format)Both types
SchemaSchema-on-writeSchema-on-readHybrid
UsersBI analysts, managementData scientists, ML engineersEveryone
CostHigh (storage + compute)Low (cheap storage)Medium
QueriesSQL, fastSpark, slowerSQL + Spark
ExamplesSnowflake, BigQuery, RedshiftS3, ADLS, GCSDatabricks, Delta Lake, Apache Iceberg
Data qualityHigh (ETL guarantees)Low (raw data)High (governance layer)

Data lakehouse is the latest trend (2024-2026) — combining the cheap storage of a data lake with the transactional consistency and performance of a data warehouse. Key technologies: Delta Lake (Databricks), Apache Iceberg, Apache Hudi.

Snowflake

  • Architecture: separation of storage and compute (independent scaling)
  • Advantages: multi-cloud (AWS, Azure, GCP), elastic scaling, data sharing
  • Pricing: pay-per-query (credits), storage billed separately
  • Best for: companies needing flexibility and multi-cloud support

Google BigQuery

  • Architecture: serverless, columnar database
  • Advantages: zero infrastructure management, fast queries on petabytes, built-in ML (BigQuery ML)
  • Pricing: pay-per-query or flat-rate
  • Best for: companies in the Google Cloud ecosystem, fast ad-hoc analyses

Amazon Redshift

  • Architecture: MPP (Massively Parallel Processing) cluster
  • Advantages: deep integration with AWS (S3, Glue, SageMaker), Redshift Serverless
  • Pricing: per-node or serverless (per-query)
  • Best for: companies with existing AWS infrastructure

Azure Synapse Analytics

  • Architecture: unified analytics platform (SQL + Spark)
  • Advantages: integration with Microsoft 365, Power BI, Azure ML
  • Pricing: per-compute-unit
  • Best for: companies in the Microsoft ecosystem

Cost comparison (approximate, 1 TB of data)

SolutionEstimated monthly cost
BigQuery (on-demand)$200–500
Snowflake (standard)$300–800
Redshift (dc2.large)$250–600
Azure Synapse$300–700

When does an organization need a data warehouse?

A data warehouse is justified when:

  1. Data from multiple sources — CRM, ERP, web systems, marketing automation — you need a single source of truth
  2. Reports take too long — analytical queries are slowing down operational databases
  3. Data inconsistency — different departments report different numbers for the same metrics
  4. Need for historical analysis — trends, forecasts, year-over-year comparisons
  5. Compliance regulations — requirement for data auditing and archiving

How to get started?

Step 1: Define business objectives

Do not build a warehouse “because everyone has one.” Determine: what business questions do you want to answer? What reports do you need?

Step 2: Inventory your data sources

Where will data flow from? CRM (Salesforce, HubSpot), ERP (SAP, Dynamics), web systems (Google Analytics), Excel files?

Step 3: Choose a platform

For new projects in 2026: Snowflake or BigQuery. If you are on AWS: Redshift. On Microsoft: Azure Synapse.

Step 4: Design the data model

Start with a star schema — simple, fast, understandable. Use data vault for enterprise with multiple sources.

Step 5: Build the ELT pipeline

dbt + Fivetran/Airbyte is the most popular stack in 2026. Open-source alternative: dbt + Airbyte + Airflow.

Step 6: Connect a BI tool

Tableau, Power BI, Looker, or Metabase (open-source). Business users need dashboards, not SQL.

Data warehouse and analytics training

At EITT, we offer training in database and analytics technologies — from SQL fundamentals to advanced data warehousing and BI tools. Explore our database training catalog.

If you are planning a data warehouse implementation in your organization, our training programs will help your team gain the practical skills needed to design, build, and maintain a data warehouse.

Request a quote

Develop Your Competencies

Check out our training and workshop offerings.

Request Training
Call us +48 22 487 84 90