SQL Server index vs. statistics

What are the differences between CREATE INDEX and CREATE STATISTICS, and when should I use each?

sql-server index statistics

— Scott

Indexes store actual data (data pages or index pages, depending on the type of index we are talking about), and statistics store the distribution of the data. So CREATE INDEX is the DDL to create an index (clustered, nonclustered, etc.), and CREATE STATISTICS is the DDL to create statistics on columns within the table.

I recommend you read up on these aspects of relational data. Below are a few introductory articles for beginners. These are very broad topics, so the information on them can go very wide and very deep. Read up on the general idea of them below, and ask more specific questions when they arise.

BOL reference on Table and Index Organization
BOL reference on Clustered Index Structures
BOL reference on Nonclustered Index Structures
SQL Server Central on Introduction to Indexes
BOL reference on Statistics

Here is a working example to see these two parts in action (commented to explain):

use testdb;
go

create table MyTable1
(
    id int identity(1, 1) not null,
    my_int_col int not null
);
go

insert into MyTable1(my_int_col)
values(1);
go 100

-- this statement will create a clustered index
-- on MyTable1.  The index key is the id field
-- but due to the nature of a clustered index
-- it will contain all of the table data
create clustered index MyTable1_CI
on MyTable1(id);
go


-- by default, SQL Server will create a statistics object
-- for this index.  Here is proof.  We see a stat created
-- with the name of the index, and its stat column
-- is the index key column
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- here is a standalone statistics on a single column
create statistics MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- now look at the statistics that exist on the table.
-- we have an additional statistics object that does not
-- correspond to an index
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- what does a stat look like?  run DBCC SHOW_STATISTICS
-- to get a better idea of what is stored
dbcc show_statistics('MyTable1', 'MyTable1_CI');
go

Here is what a sample test statistic looks like:

[image: output of DBCC SHOW_STATISTICS]

Notice that statistics hold the distribution of the data. They help SQL Server determine an optimal plan. A good analogy: imagine you are about to lift a heavy object. If you knew how much it weighed because it had a weight marking on it, you would work out the best way to lift it and which muscles to use. That is roughly what SQL Server does with statistics.
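To look at that "weight marking" more directly, here is a minimal sketch that reads the summary properties of each statistics object on the table and then the histogram of the standalone one. It assumes MyTable1 and MyTable1_MyIntCol from the example above already exist, and that your build exposes sys.dm_db_stats_properties (it was added in later service packs of SQL Server 2008 R2 and 2012, so older instances may not have it).

```sql
-- assumption: MyTable1 and its statistics were created as in the example above
-- one row per statistics object: when it was last updated, how many rows
-- were sampled, and how many histogram steps it holds
select
    s.name as stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.steps,
    sp.modification_counter
from sys.stats s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
where s.object_id = object_id('MyTable1');

-- only the histogram portion of the standalone statistics
dbcc show_statistics('MyTable1', 'MyTable1_MyIntCol') with histogram;
go
```

Because every row in this test table holds the same value in my_int_col, the histogram should be very short; a table with more varied data would show more steps.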

-- create a nonclustered index
-- with the key column as my_int_col
create index IX_MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- let's look at this index
select
    object_name(object_id) as object_name,
    name as index_name,
    index_id,
    type_desc,
    is_unique,
    fill_factor
from sys.indexes
where name = 'IX_MyTable1_MyIntCol';

-- now let's see some physical aspects
-- of this particular index
-- (I retrieved index_id from the above query)
select *
from sys.dm_db_index_physical_stats
(
    db_id('TestDB'),
    object_id('MyTable1'),
    4,
    null,
    'detailed'
);

We can see from the example above that the index does in fact contain data (depending on the type of index, the leaf pages will differ).
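The index_id passed to sys.dm_db_index_physical_stats above was copied by hand from the previous query, and it may differ on your system. As a hedged alternative, this sketch looks the id up into a variable first. It assumes you are connected to the database that holds MyTable1 (so db_id() with no argument is enough) and that the index IX_MyTable1_MyIntCol exists.

```sql
-- assumption: run in the database that contains MyTable1
declare @index_id int;

-- look up the id of the nonclustered index by name
select @index_id = index_id
from sys.indexes
where object_id = object_id('MyTable1')
and name = 'IX_MyTable1_MyIntCol';

-- one row per level of the index b-tree, with page and record counts
select
    index_id,
    index_type_desc,
    index_level,
    page_count,
    record_count,
    avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats
(
    db_id(),
    object_id('MyTable1'),
    @index_id,
    null,
    'detailed'
);
go
```

The page_count and record_count columns are what show that the index physically stores data, which a statistics object does not do.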

This post has shown only a very, very brief overview of these two large aspects of SQL Server. Both could fill chapters and books. Read some of the references and you will have a better grasp.

— Thomas Stringer

I know this is an old post, but I think it's worth noting that creating an index will (in most cases) automatically generate statistics for that index. The same can't be said the other way around: creating statistics does not create an index.
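One way to check this on a table, sketched under the assumption that the MyTable1 example from the answer above has been run: list every statistics object and left join it to sys.indexes. Statistics that belong to an index share their stats_id with the index's index_id, so a NULL index_name marks statistics that exist on their own.

```sql
-- assumption: MyTable1, its indexes and its statistics exist as in the example above
select
    s.name as stats_name,
    s.auto_created,
    s.user_created,
    i.name as index_name,       -- NULL when no index backs the statistics
    i.type_desc as index_type
from sys.stats s
left join sys.indexes i
on s.object_id = i.object_id
and s.stats_id = i.index_id
where s.object_id = object_id('MyTable1');
```

In the example table, MyTable1_CI and IX_MyTable1_MyIntCol should each appear with a matching index, while MyTable1_MyIntCol should appear with no index beside it.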

— 21414 Steve Jobs