Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
523 views
in Technique[技术] by (71.8m points)

select - How does sql server sort your data?

I was wondering how sql server sorts it's data. I noticed that if I have a table that doesn't contain the column "Id" and you select data without "ORDER BY" sql server doesn't automatically sort on the primary column.

Does anyone know what rule sql server follows to sort it's data?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Although it's good to wonder about how it might be explained that you often see the same order, I'd like to point out that it never a good idea to rely on implicit order caused by the particular implementation of the underlying database engine. In other words, its nice to know why, but you should never ever rely on it. For MS SQL, the only thing that reliably delivers the rows in a certain order, is an explicit ORDER BY clause.

Not only do different RDMBS-es behave differently, one particular instance may behave differently due to an update (patch). Not only that, even the state of the RDBMS software may have an impact: a "warm" database behaves differently than a "cold" one, a small table behaves differently than a large one.

Even if you have background information about the implementation (ex: "there is a clustered index, thus it is likely the data will be returned by order of the clustered index"), there is always a possibility that there is another mechanism you don't know about that causes the rows to be returned in a different order (ex1: "if another session just did a full table scan with an explicit ORDER BY the resultset may have been cached; a subsequent full scan will attempt to return the rows from the cache"; ex2: "a GROUP BY may be implemented by sorting the data, thus impacting the order the rows are returned"; ex3: "If the selected columns are all in a secondary index that is already cached in memory, the engine may scan the secondary index instead of the table, most likely returning the rows by order of the secondary index").

Here's a very simple test that illustrates some of my points.

First, startup SQL server (I'm using 2008). Create this table:

create table test_order (
    id int not null identity(1,1) primary key
,   name varchar(10) not null 
)

Examine the table and witness that a clusted index was created to support the primary key on the id column. For example, in sql server management studio, you can use the tree view and navigate to the indexes folder beneath your table. There you should see one index, with a name like: PK__test_ord__3213E83F03317E3D (Clustered)

Insert the first row with this statement:

insert into test_order(name)
select RAND()

Insert more rows by repeating this statement 16 times:

insert into test_order(name)
select RAND()
from   test_order

You should now have 65536 rows:

select COUNT(*) 
from   test_order

Now, select all rows without using an order by:

select *
from   test_order

Most likely, the results will be returned by order of the primary key (although there is no guarantee). Here's the result I got (which is indeed by order of primary key):

#      id    name
1      1     0.605831
2      2     0.517251
3      3     0.52326
.      .     .......
65536  65536 0.902214

(the # is not a column but the ordinal position of the row in the result)

Now, create a secondary index on the name column:

create index idx_name on test_order(name)

Select all rows, but retrieve only the name column:

select name
from   test_order

Most likely the results will be returned by order of the secondary index idx_name, since the query can be resolved by only scanning the index (i.o.w. idx_name is a covering index). Here's the result I got, which is indeed by order of name.

#      name
1      0.0185732
2      0.0185732
.      .........
65536  0.981894

Now, select all columns and all rows again:

select * 
from test_order

Here's the result I got:

#      id    name
1      17    0.0185732
2      18    0.0185732
3      19    0.0185732
...    ..    .........

as you can see, quite different from the first time we ran this query. (It looks like the rows are ordered by the secondary index, but I don't have an explanation why that should be so).

Anyway, the bottom line is - don't rely on implicit order. You can think of explanations why a particular order can be observed, but even then you can't always predict it (like in the latter case) without having intimate knowledge of implementation and runtime state.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...