Sunday, December 2, 2012

Understanding the SQL Server 2012 columnstore index


In the past few months, I spent some time playing with the SQL Server 2012 ColumnStore (CS) index. One of my tests applied a CS index to tables with 50 million and 500 million rows. On average, query execution time dropped from 10 - 15 minutes to less than 10 seconds. I realized later that even with CS, you may still need traditional row-based indexes to get the best query plans. One simple approach is to let the query optimizer decide when to use which. In general, CS improves performance on large tables with joins and aggregations; for look-ups, a row-based index doing seeks is still faster in many cases.
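As a minimal sketch of the setup above (the table and index names are hypothetical), this creates a nonclustered columnstore index on SQL Server 2012 and shows how to steer one query back to the row-based index when a seek is faster:

```sql
-- Hypothetical fact table and index names; illustrative only.
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_cs_FactSales
ON dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, SalesAmount);

-- Large aggregation: the optimizer will normally pick the CS index.
SELECT DateKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY DateKey;

-- Single-row look-up: force the row-based plan when CS is slower here.
SELECT *
FROM dbo.FactSales
WHERE SalesKey = 12345
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);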

CS is fast largely because of its in-memory, column-oriented technology. It compresses the table column by column and keeps the compressed data in memory as a whole. Columns with many repeating values compress best, particularly integers, date-times, and numerics with precision less than 18. If you include many high-cardinality text fields, the CS compression won't help much; those text columns also increase storage space and consume a lot of memory. One interesting observation: when I opened a tabular project in Visual Studio 2010, the tabular model loaded into memory almost instantly, consuming about 2 GB of memory for about 1.5 GB of data.
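To illustrate the point about data types (the table is hypothetical), the integer, date, and low-precision numeric columns below are the kind that compress well, while the wide text column compresses poorly and is better left out of the index:

```sql
CREATE TABLE dbo.FactOrders (                -- hypothetical table
    OrderDateKey  int            NOT NULL,   -- compresses well
    CustomerKey   int            NOT NULL,   -- compresses well
    OrderDate     date           NOT NULL,   -- compresses well
    Amount        numeric(17, 2) NOT NULL,   -- precision < 18: compresses well
    OrderComment  nvarchar(2000)     NULL    -- high-cardinality text: poor compression
);

-- Keep the text column out of the columnstore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_cs_FactOrders
ON dbo.FactOrders (OrderDateKey, CustomerKey, OrderDate, Amount);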

SQL Server 2012 SSAS Tabular and PowerPivot have built-in CS indexes. Tabular runs on a server, while PowerPivot runs on a client PC. If you import millions and millions of rows into a PowerPivot model in Excel, you need a minimum of 4 GB of RAM according to Microsoft, and 6 to 8 GB based on my testing. Your PC must also run a 64-bit OS.

Some other basics I learned about CS:

- CS cannot be applied to views.

- After you apply CS to a table, the table becomes read-only.

- CS cannot be combined with page / row compression.
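Because the table becomes read-only, loading new data means disabling the CS index, writing, and then rebuilding it (partition switching is another option on partitioned tables). A minimal sketch, with hypothetical table, index, and staging names:

```sql
-- Disable the CS index to make the table writable again.
ALTER INDEX ix_cs_FactSales ON dbo.FactSales DISABLE;

-- Load the new data.
INSERT INTO dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, SalesAmount)
SELECT DateKey, ProductKey, StoreKey, Quantity, SalesAmount
FROM staging.FactSales;

-- Rebuild to re-enable the CS index.
ALTER INDEX ix_cs_FactSales ON dbo.FactSales REBUILD;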

reference - http://msdn.microsoft.com/en-us/library/gg492088.aspx
