If your SQL Server is slow and you do not know where to start, start here.
Wait statistics are the most direct signal SQL Server gives you about what it is actually waiting on. Not what you think it is waiting on. Not what your monitoring dashboard says. What SQL Server itself is tracking, every millisecond, about where time is going.
The problem is the raw output. Run SELECT * FROM sys.dm_os_wait_stats and you get 900+ rows. Most of them are noise — internal system waits that are completely normal and have nothing to do with your performance problem. The signal is buried in there, but it takes method to find it.
This post gives you that method. It also gives you a reference list of the 15 wait types that actually matter in production environments, with what each one tells you and where to look first.
The Method: Five Steps
Step 1 — Clear the Baseline
sys.dm_os_wait_stats is cumulative from the last SQL Server service start. If your server has been running for three years, that data includes everything that has ever happened — good days, bad days, that batch job someone ran once in 2023. To get a useful picture, clear the counters and let them accumulate fresh.
-- Clear wait statistics (requires ALTER SERVER STATE or sysadmin)
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
Then wait. The accumulation period depends on what you are diagnosing. For an actively slow server, 15–30 minutes is usually enough. For a problem that only happens overnight, run the clear before you leave and query it in the morning.
Important: clearing wait stats on a production server is safe — it does not change any SQL Server configuration or affect query execution — it simply resets the diagnostic counters. Safe to run, but follow your normal change process if your environment requires it.
Step 2 — Query the Top Waits
After accumulation, query what is at the top:
SELECT TOP 20
wait_type,
waiting_tasks_count,
wait_time_ms,
max_wait_time_ms,
signal_wait_time_ms,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms,
CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(5,2)) AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
-- Benign system waits — filter these out
'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SLEEP_DBSTARTUP', 'SLEEP_DBTASK',
'SLEEP_TEMPDBSTARTUP', 'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY',
'SLEEP_MASTERUPGRADED', 'SLEEP_MSDBSTARTUP', 'SLEEP_TEMPDBSTARTUP',
'SLEEP_REPLICATION_MONITOR', 'SLEEP_TRANS_ROLLED_BACK',
'SLEEP_WORKER_POOL', 'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE',
'SERVER_IDLE_CHECK', 'SLEEP_DBSTARTUP', 'SLEEP_DBTASK',
'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY', 'SLEEP_MASTERUPGRADED',
'SLEEP_MSDBSTARTUP', 'SLEEP_SYSTEMTASK', 'SLEEP_TEMPDBSTARTUP',
'SNI_HTTP_ACCEPT', 'SP_SERVER_DIAGNOSTICS_SLEEP',
'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'XE_DISPATCHER_WAIT',
'XE_TIMER_EVENT', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP',
'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT', 'DBMIRROR_EVENTS_QUEUE',
'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'HADR_WORK_QUEUE',
'ONDEMAND_TASK_MANAGER', 'REQUEST_DISPATCHER_QUEUE_SEMAPHORE',
'RESOURCE_QUEUE', 'SERVER_IDLE_CHECK', 'WAIT_XTP_HOST_WAIT',
'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', 'KSOURCE_WAKEUP', 'LAZYWRITER_SLEEP'
)
ORDER BY wait_time_ms DESC;
The pct_of_total column tells you how dominant each wait type is. A single wait type sitting at 60%+ is a strong signal. Three wait types each sitting at 15–25% suggests multiple contributing factors.
Step 3 — Note the Signal vs Resource Split
The query above separates signal_wait_time_ms from resource_wait_time_ms. This distinction matters.
- Resource wait time is time spent waiting for an actual resource — a lock, a disk I/O, memory, network.
- Signal wait time is time spent in the runnable queue after being signalled but before actually getting a CPU.
High signal wait time relative to total wait time means CPU pressure — the queries are ready to run but cannot get a turn. High resource wait time means the resource itself is the bottleneck.
Step 4 — Correlate with Current Sessions
The DMV shows you aggregate waits. To see what is waiting right now, query the session-level waits:
SELECT
r.session_id,
r.status,
r.wait_type,
r.wait_time / 1000.0 AS wait_time_sec,
r.blocking_session_id,
DB_NAME(r.database_id) AS database_name,
SUBSTRING(st.text, (r.statement_start_offset/2)+1,
((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
ELSE r.statement_end_offset END - r.statement_start_offset)/2)+1) AS current_statement
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st
WHERE r.session_id <> @@SPID
AND r.wait_type IS NOT NULL
ORDER BY r.wait_time DESC;
Step 5 — Map to Root Cause
This is where the reference section below comes in. Once you know your dominant wait types, you map them to their known causes and work from there.
The 15 Wait Types That Matter
PAGEIOLATCH_SH / PAGEIOLATCH_EX
What it is: SQL Server is waiting for a page to be read from (SH) or written to (EX) disk. These are physical I/O waits.
What it tells you: Your buffer pool is not large enough to keep the working set in memory. SQL Server is hitting disk for data pages. This is the single most common performance problem on production servers with insufficient RAM.
First thing to check:
-- Check current I/O pressure
SELECT
DB_NAME(vfs.database_id) AS database_name,
mf.physical_name,
vfs.io_stall_read_ms,
vfs.io_stall_write_ms,
vfs.num_of_reads,
vfs.num_of_writes,
CASE WHEN vfs.num_of_reads > 0
THEN vfs.io_stall_read_ms / vfs.num_of_reads
ELSE 0 END AS avg_read_ms,
CASE WHEN vfs.num_of_writes > 0
THEN vfs.io_stall_write_ms / vfs.num_of_writes
ELSE 0 END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id
AND vfs.file_id = mf.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;
Average read times above 20ms on data files are slow. Above 50ms is a storage problem. Correlate with PLE — if both are showing pressure, you need more memory before anything else changes.
CXPACKET / CXCONSUMER
What it is: Parallelism waits. CXPACKET means threads in a parallel query are waiting for each other (the coordinator waiting for worker threads, or skewed work distribution). CXCONSUMER (introduced in SQL Server 2016 SP2+) is a benign subset — threads waiting on the consumer side of a parallel exchange.
What it tells you: CXPACKET by itself does not mean parallelism is bad. It means parallelism exists. If CXPACKET is very high, it usually indicates skewed parallel work distribution (one thread doing most of the work while others wait), or MAXDOP is unconstrained and large parallel queries are dominating the workload.
First thing to check: Your MAXDOP and Cost Threshold for Parallelism settings.
SELECT name, value_in_use, description
FROM sys.configurations
WHERE name IN ('max degree of parallelism', 'cost threshold for parallelism');
MAXDOP 0 means unlimited parallelism — every large query can use every core. Cost Threshold for Parallelism at the default value of 5 means almost every query gets a parallel plan. Raise cost threshold to 25–50 first. Set MAXDOP based on your NUMA node core count, capped at 8.
LCKM* (Lock waits)
What it is: Blocking. A session is waiting to acquire a lock that another session holds. LCK_M_S is a shared lock wait, LCK_M_X is exclusive, LCK_M_U is update.
What it tells you: Either a long-running transaction is holding locks too long, queries are taking a long time due to poor indexing (larger lock scope), or the application is doing unnecessary lock escalation.
First thing to check: Who is blocking whom right now:
SELECT
blocking.session_id AS blocking_session,
blocked.session_id AS blocked_session,
blocked.wait_time / 1000.0 AS blocked_seconds,
blocked.wait_type,
DB_NAME(blocked.database_id) AS database_name,
SUBSTRING(bt.text, (blocked.statement_start_offset/2)+1,
((CASE blocked.statement_end_offset WHEN -1 THEN DATALENGTH(bt.text)
ELSE blocked.statement_end_offset END - blocked.statement_start_offset)/2)+1) AS blocked_statement,
SUBSTRING(bkt.text, (blocking.statement_start_offset/2)+1,
((CASE blocking.statement_end_offset WHEN -1 THEN DATALENGTH(bkt.text)
ELSE blocking.statement_end_offset END - blocking.statement_start_offset)/2)+1) AS blocking_statement
FROM sys.dm_exec_requests blocked
JOIN sys.dm_exec_requests blocking
ON blocked.blocking_session_id = blocking.session_id
CROSS APPLY sys.dm_exec_sql_text(blocked.sql_handle) bt
CROSS APPLY sys.dm_exec_sql_text(blocking.sql_handle) bkt
WHERE blocked.blocking_session_id > 0;
WRITELOG
What it is: Waiting for the transaction log write to complete and be hardened to disk.
What it tells you: Your log disk I/O is the bottleneck. Every committed transaction must wait for the log record to be written to disk — this is by design (it guarantees durability). If your log is on a slow disk, or shared with other heavy-write workloads, this shows up as WRITELOG.
First thing to check: Where is your log file physically located? Is it on a dedicated LUN/drive, or sharing with data files or tempdb? Log writes are sequential — they benefit enormously from fast, dedicated storage. A log file on a spinning disk shared with data files is a common root cause.
SELECT
DB_NAME(database_id) AS database_name,
physical_name,
type_desc
FROM sys.master_files
WHERE type = 1 -- log files only
ORDER BY DB_NAME(database_id);
SOS_SCHEDULER_YIELD
What it is: A thread voluntarily yielded the scheduler to allow other work to run, then waited to be rescheduled.
What it tells you: CPU pressure. Queries are getting a lot of work done on the CPU but are competing for scheduler time. High SOS_SCHEDULER_YIELD combined with high signal waits (from Step 3) is a strong indicator your server is CPU-bound, not I/O-bound.
First thing to check: Is the CPU running hot? Which queries are burning the most CPU?
SELECT TOP 20
qs.total_worker_time / qs.execution_count AS avg_cpu_time_us,
qs.total_worker_time,
qs.execution_count,
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;
RESOURCE_SEMAPHORE
What it is: A query is waiting to be granted a memory allocation for its execution grant. SQL Server sets aside memory for sort operations, hash joins, and other memory-intensive operations — if too many queries need this memory simultaneously, they queue.
What it tells you: Query memory grants are being refused or queued. This often means a small number of poorly-tuned queries are requesting very large memory grants — either because they have stale statistics (causing overestimates), or because they genuinely need a lot of memory and your max server memory is too low or other allocations are crowding them out.
First thing to check:
SELECT
session_id,
requested_memory_kb,
granted_memory_kb,
used_memory_kb,
queue_id,
wait_order,
wait_time_ms,
is_next_candidate
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;
Queries with granted_memory_kb = NULL are waiting. Queries that have been granted but used_memory_kb is much lower than granted_memory_kb are over-estimating their needs — that usually means stale statistics.
ASYNC_NETWORK_IO
What it is: SQL Server has returned results to the client but the client has not consumed them yet. SQL Server is waiting for the network buffer to clear.
What it tells you: The client application is not reading results fast enough, or there is a network bottleneck between the application and the database server. This is often a symptom of application code doing something slow while iterating a result set — processing each row in a loop while holding the query open, for example.
First thing to check: Which sessions are showing ASYNC_NETWORK_IO and what are they running? If it is the same application query repeatedly, look at the application code — particularly any pattern that fetches a large result set and processes it row by row.
IO_COMPLETION
What it is: Waiting for a non-data-page I/O to complete — typically backup reads/writes, or restore operations.
What it tells you: Usually benign during backups or restores. If IO_COMPLETION is high during normal operations (not a backup window), investigate whether there is unusual file I/O happening — DBCC CHECKDB, log shipping, mirroring data movement, or a VLF fragmentation problem on the log file.
HADR_SYNC_COMMIT
What it is: Waiting for a synchronous secondary replica in an Always On Availability Group to acknowledge a log hardening. The primary cannot commit the transaction until the secondary confirms it has hardened the log record.
What it tells you: Your synchronous secondary replica is introducing commit latency on the primary. This is by design — synchronous commit guarantees zero data loss, which means the primary waits. If HADR_SYNC_COMMIT is dominant, your secondary is lagging.
First thing to check:
SELECT
ag.name AS ag_name,
ars.role_desc,
DB_NAME(drs.database_id) AS database_name,
drs.log_send_queue_size,
drs.redo_queue_size,
drs.synchronization_state_desc,
drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ars.replica_id = ar.replica_id;
High log_send_queue_size means the primary is generating log faster than it can get to the secondary. Either the network between primary and secondary is undersized, or the secondary server is too slow to redo the log.
PAGELATCH_UP
What it is: In-memory latch contention on allocation pages (PFS, GAM, SGAM pages) in the buffer pool — specifically on update operations.
What it tells you: Often a sign of allocation contention, frequently on tempdb. If PAGELATCH_UP is high and involves tempdb files, check your tempdb data file count. Best practice is to have one data file per logical CPU up to 8 files — on a server with 16 cores running a single tempdb data file, allocation contention can be severe.
LATCH_* (non-page latches)
What it is: Internal synchronisation waits on non-page structures — plan cache, allocation bitmaps, etc.
What it tells you: Depends on the specific latch class. High LATCH_EX on ACCESS_METHODS_DATASET_PARENT often indicates scanning activity. High LATCH_EX on ACCESS_METHODS_HOBT_VIRTUAL_ROOT suggests B-tree root page contention — usually a symptom of a heavily-inserted narrow table with a sequential key and many concurrent inserts.
CMEMTHREAD
What it is: Threads competing to allocate memory from a shared memory object.
What it tells you: Memory allocation contention — usually on tempdb or a specific memory clerk. Often seen alongside PAGELATCH_UP on tempdb allocation pages. If CMEMTHREAD is significant, correlate with the tempdb contention indicators.
THREADPOOL
What it is: No worker thread was available to pick up the request — the server’s thread pool is exhausted.
What it tells you: You have more concurrent sessions than worker threads, or worker threads are being held by long-running transactions. This is a serious condition — it means some connections cannot even start executing. Check max worker threads configuration and look for long-running blocking chains.
SLEEP_* waits (the exceptions)
Most SLEEP_ waits are pure noise and belong in the filter list. The one exception is SLEEP_DBSTARTUP seen outside of a startup event, or any SLEEP wait showing unusually high accumulated time — these can occasionally indicate a configuration or recovery issue worth investigating.
Common Traps
Trap 1: Acting on point-in-time data. A single snapshot of wait stats tells you what was happening when you looked. If the problem is intermittent, you need to capture before and after a known slow period — clear the baseline, let it run through the slow window, then query. A snapshot taken mid-afternoon when the server is idle tells you nothing about the 8am peak.
Trap 2: Filtering too aggressively. The benign wait filter above is appropriate for most environments. But on some servers — particularly those with specific workloads or features enabled — wait types that are usually benign can be non-trivial. If you have filtered everything and your total wait time seems very low, re-check what you filtered.
Trap 3: Chasing the wrong wait type. CXPACKET is the classic example. It is almost always in the top 5 by total wait time on any server running parallel queries. It is not a problem by itself — it is a symptom. Chasing CXPACKET by blindly lowering MAXDOP to 1 often makes performance worse, not better. Read the wait type before acting on it.
Trap 4: Ignoring the ratio. A wait type at 80% of total wait time on a server that processes 500 queries per second is very different from the same wait type on a server processing 10 queries per second. The percentage tells you relative priority, not absolute severity. Always factor in workload volume.
The Practical Takeaway
Wait statistics are not a silver bullet, but they are the closest thing SQL Server has to telling you directly what is wrong. The methodology is simple:
- Clear the baseline.
- Let it accumulate through the problem period.
- Filter the noise.
- Find your dominant wait type.
- Map it to the reference above and look in the right place.
The common mistake is skipping step 1 (looking at historical accumulation) and step 2 (accumulating through the actual problem). Point-in-time snapshots on a server you just started investigating are nearly worthless. Baseline, accumulate, query. In that order.
AIDA Diagnostics analyses wait statistics as part of its 150+ check assessment — identifying dominant waits, correlating them with configuration settings, and providing ranked remediation guidance. $495 per server instance. Try AIDA Recon free for a no-cost baseline of your environment.
