What Are Always On Availability Groups?

Introduced in SQL Server 2012, Always On Availability Groups (AG) is a high availability and disaster recovery feature that allows a set of databases to fail over together as a unit. It builds on Windows Server Failover Clustering (WSFC) but extends far beyond traditional failover clustering — you don't need shared storage, and secondaries can serve as readable replicas, offloading read workloads from the primary.

Always On AG — Architecture Overview
  ┌─────────────────────────────────────────────────────────┐
  │              Windows Server Failover Cluster             │
  │                                                          │
  │  ┌──────────────────┐      ┌──────────────────────────┐ │
  │  │   PRIMARY NODE   │      │    SECONDARY NODE(S)     │ │
  │  │                  │      │                          │ │
  │  │  SQL Server      │ ───► │  SQL Server              │ │
  │  │  (Read/Write)    │      │  (Read-Only / Standby)   │ │
  │  │                  │ ◄─── │                          │ │
  │  │  ┌────────────┐  │      │  ┌────────────────────┐  │ │
  │  │  │  DB_1      │  │      │  │  DB_1  (replica)   │  │ │
  │  │  │  DB_2      │  │      │  │  DB_2  (replica)   │  │ │
  │  │  │  DB_3      │  │      │  │  DB_3  (replica)   │  │ │
  │  │  └────────────┘  │      │  └────────────────────┘  │ │
  │  └──────────────────┘      └──────────────────────────┘ │
  │           │                            │                 │
  │           └──────────┬─────────────────┘                 │
  │                      │                                   │
  │            AG LISTENER (Virtual IP)                      │
  │         Applications connect here →                      │
  └─────────────────────────────────────────────────────────┘
        

Prerequisites

Before configuring Always On AG, your environment must meet these requirements:

SQL Server 2022 introduced Contained Availability Groups, which include their own system databases (logins, SQL Agent jobs, etc.), eliminating the need to manually synchronise these objects across replicas.

Step 1 — Configure the Windows Failover Cluster

Always On AG requires an underlying WSFC even if you don't use shared storage. All SQL Server nodes must be cluster members.

PowerShell — Install Failover Clustering feature on each node
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Validate cluster configuration (run before creating)
Test-Cluster -Node "SQL-NODE1", "SQL-NODE2" -Include "Inventory", "Network", "System Configuration"

# Create the cluster (no shared storage needed for AG)
New-Cluster -Name "SQL-CLUSTER" `
            -Node "SQL-NODE1", "SQL-NODE2" `
            -StaticAddress "10.0.0.50" `
            -NoStorage

Step 2 — Enable Always On on Each SQL Instance

PowerShell — Enable Always On via SQL Server Configuration Manager
Enable-SqlAlwaysOn -ServerInstance "SQL-NODE1" -Force
Enable-SqlAlwaysOn -ServerInstance "SQL-NODE2" -Force

# Restart SQL Server service after enabling
Restart-Service -Name "MSSQLSERVER" -Force

Step 3 — Create Database Mirroring Endpoints

Each SQL Server instance needs a dedicated endpoint for AG traffic. This is the communication channel between replicas.

T-SQL — Run on each replica node
-- Create the endpoint (run on SQL-NODE1)
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = WINDOWS NEGOTIATE,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );

-- Grant CONNECT permission to the SQL Server service account
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint]
    TO [DOMAIN\SQLServiceAccount];

Synchronisation Modes

Understanding synchronisation modes is critical — the wrong choice can mean data loss or performance degradation.

Synchronous

Synchronous Commit

  • Transaction commits only after log is hardened on secondary
  • Zero data loss (RPO = 0)
  • Automatic failover supported
  • Higher latency on primary writes
  • Best for: same datacenter or low-latency links (<5ms RTT)
Asynchronous

Asynchronous Commit

  • Transaction commits without waiting for secondary
  • Possible data loss (RPO > 0)
  • Only manual failover (forced)
  • No latency impact on primary
  • Best for: DR sites, high-latency WAN links

A common mistake is configuring synchronous commit to a geographically distant secondary over a high-latency WAN link. Every write transaction pays the RTT penalty. If latency to the secondary is >5ms, use asynchronous commit for that replica.

Step 4 — Create the Availability Group

T-SQL — Run on the primary (SQL-NODE1)
-- Take full backup first (required)
BACKUP DATABASE [YourDB]
    TO DISK = '\\backup-share\YourDB_full.bak'
    WITH FORMAT, COMPRESSION;

BACKUP LOG [YourDB]
    TO DISK = '\\backup-share\YourDB_log.bak';

-- Create the Availability Group
CREATE AVAILABILITY GROUP [AG_Production]
    WITH (
        AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
        DB_FAILOVER = ON,
        FAILURE_CONDITION_LEVEL = 3
    )
    FOR DATABASE [YourDB], [YourDB2]
    REPLICA ON
        N'SQL-NODE1' WITH (
            ENDPOINT_URL = N'TCP://SQL-NODE1.domain.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = AUTOMATIC,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = NO)
        ),
        N'SQL-NODE2' WITH (
            ENDPOINT_URL = N'TCP://SQL-NODE2.domain.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = AUTOMATIC,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
        );

-- Join secondary to the AG (run on SQL-NODE2)
ALTER AVAILABILITY GROUP [AG_Production] JOIN;
ALTER AVAILABILITY GROUP [AG_Production] GRANT CREATE ANY DATABASE;

Step 5 — Configure the AG Listener

The listener is a virtual network name (VNN) and IP address that applications connect to. It abstracts the primary replica — after a failover, applications reconnect to the same listener and are automatically routed to the new primary.

T-SQL — Create the AG Listener
ALTER AVAILABILITY GROUP [AG_Production]
ADD LISTENER N'AG-LISTENER' (
    WITH IP (
        (N'10.0.0.60', N'255.255.255.0')  -- listener IP and subnet mask
    ),
    PORT = 1433
);

-- Verify listener
SELECT dns_name, port, ip_configuration_string_from_cluster
FROM sys.availability_group_listeners;

Applications should connect to the listener name, not the individual server names. Include ApplicationIntent=ReadOnly in the connection string to route read workloads to the secondary replica automatically.

Step 6 — Monitor AG Health

T-SQL — Key monitoring queries
-- Overall AG health
SELECT
    ag.name                         AS ag_name,
    ar.replica_server_name,
    ars.role_desc,
    ars.operational_state_desc,
    ars.synchronization_health_desc,
    ars.connected_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar       ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;

-- Redo queue and send queue (lag indicators)
SELECT
    db_name(drs.database_id)        AS database_name,
    ar.replica_server_name,
    drs.synchronization_state_desc,
    drs.log_send_queue_size         AS send_queue_kb,
    drs.redo_queue_size             AS redo_queue_kb,
    drs.log_send_rate               AS send_rate_kb_s,
    drs.redo_rate                   AS redo_rate_kb_s,
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
ORDER BY drs.redo_queue_size DESC;

Failover Types

Automatic Failover

Occurs when the primary becomes unavailable and a synchronous secondary detects the failure via the WSFC health check. No data loss. Requires synchronous commit mode and automatic failover mode configured on both replicas.

Manual Planned Failover

Used for maintenance windows — patch the primary, failover to secondary, patch the old primary. Zero data loss.

T-SQL — Planned manual failover (run on target secondary)
-- Verify secondary is synchronised first
SELECT synchronization_state_desc FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;

-- Perform planned failover (run on the secondary you want to promote)
ALTER AVAILABILITY GROUP [AG_Production] FAILOVER;

Forced Failover (DR)

Used when the primary is unrecoverable and the secondary is asynchronous. May result in data loss. Use only when no other option is available.

T-SQL — Forced failover with potential data loss
-- Only use when primary is permanently lost
ALTER AVAILABILITY GROUP [AG_Production]
    FORCE_FAILOVER_ALLOW_DATA_LOSS;

After a forced failover, the old primary (if it comes back online) will be in a SUSPENDED state. Do not bring it back into the AG until you've verified data consistency and resolved any divergence between the two nodes.

Common Pitfalls to Avoid


Summary

Always On Availability Groups is the most robust HA/DR solution for SQL Server, but it requires careful planning — from WSFC configuration to choosing the right synchronisation mode for each replica. With the architecture correctly in place, you gain automatic failover, readable secondaries, and a solid foundation for zero-downtime maintenance.

Need help designing or implementing an Always On cluster for your environment? Contact our SQL Server team — we've designed and deployed AG clusters from 2-node setups to multi-site enterprise configurations.

JR

João Rocha

Database Administrator and Software Consultant at Bytedivision with 20+ years of experience in SQL Server, PostgreSQL, MongoDB and Azure cloud solutions.