Intersystems Cache Indexing Shocker - How to Fix Sluggish Queries

Photo: Markus Spiske / www.temporausch.com / CC-BY

When going through a performance review of a grid on the website, I was struggling to see why the query was taking so long. It was a single table query with a single where clause which was on an indexed field. This indexed field could only contain 5 different values and the query was return fewer than 100 entries but was taking over 1 second.

On other lower powered servers the query was taking less than a third of a second and often returning more data albeit on a smaller total data set.

Checking the query plan revealed something interesting, the query was not using the index but was instead performing a whole table scan. Further testing revealed that only "=" queries use indexes. "!=" ignore indexes as do any queries with "OR" or "IN".

After further investigation I found that if you only return the ID then NOT, OR , IN queries do use indexes, however, if you return any other field including the field you are performing your where clause on you activate a full table scan

Uses index

SELECT ID FROM TABLE WHERE STATUS != 'Complete'
SELECT count(*) FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Complete'

Full table scan

SELECT STATUS FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Active' OR STATUS = 'Error'  OR STATUS = 'Cancelled'

As far as I am concerned this is unbelievable behaviour, which is still in the product as of the latest 2014 version of Intersystems Cache database!

At least it had revealed a 3 different ways of substantially improving query performance

1. Union Query 

If you know all of the data that can be contained in the field then you can perform a UNION query

SELECT * FROM TABLE WHERE STATUS = 'Active'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Error'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Cancelled'

This of course means more maintenance when adding a new status.

2. ID Only Select 

If you do not know the data then you can just return the ID and loop through the returned IDs to grab the data.
e.g.
SELECT ID FROM TABLE WHERE STATUS != 'Complete'
Loop...
SELECT * FROM TABLE WHERE ID=...

3. Create New Property

e.g.
SELECT * FROM TABLE WHERE Active = '1'

This requires a new property which is probably easiest to create in the OnUpdate method, again maintenance of this would be an issue.

I have not seen any documentation talking about these issues with queries. It does strike me this should be an easy thing for Intersystems to fix at the database level, but if they haven't considered it by now then I won't be holding my breath for it.

Comments

RNInterSystems said…
Andrew

First of all, please contact wrc.intersystems.com whenever the behavior of Caché surprises you. You definitely will be surprised by the helpfulness of our support advisors and we'll all do our best for you to have the predictably best behavior from your system.

Secondly, Caché is not unique in having behavior that, at first glance, is surprising with (for example) indices and NOT EQUAL where clauses. (Try googling "ORACLE NOT EQUAL INDEX")

Thirdly (and I feel sort of lame saying this) running Tune Table http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing#GSQL_optimizing_tunetable will improve SOME of the behaviors you're seeing. [I feel lame because we pride ourselves on minimizing the tuning needed for Caché systems and yet we don't (re-)initialize some of the tuning parameters automatically. Shame on us but stay tuned (ha, ha !!)]

I'd prefer not to get into a lengthy explanation of how you can improve the plan used (and why) for the different cases you cite. There are relatively easy steps to take for most and easy-to-understand explanations. Give us a call.

RN
Unknown said…
In my case the !='Complete' was about 1% of the data in the table so either an ID OR or UNION based query was around 100+ times faster than a NOT select.

Running tune table sets the selectivity to 20% as there are 5 possible values for the state and it ignores the actual spread of data. Manually changing the tuning value of the table does not change how the query works it still always avoids the index.

Setting the index to contain all of the data returned from the query also failed to use the index on a NOT query.

I found that technically the fastest in every test case I made using SQL was to change a NOT query to an ID only OR query and then iterate through the results selecting by ID the required data.

1,000,000 record table with the state evenly spread between 5 states gave me the following performance

SELECT ID, Subject, State FROM Test WHERE State != 'Complete' - 3.356 seconds

SELECT ID FROM Test WHERE State='Active' OR State='Paused' OR State='Pending' OR State='Waiting' , then iterate by ID with SELECT ID, Subject, State FROM Test WHERE ID='id' - 2.221 seconds

SELECT ID FROM Test WHERE State='Active' OR State='Paused' OR State='Pending' OR State='Waiting' , then iterate through global storage using the returned ID - 1.635 seconds

Iterate through Global storage - 1.100 seconds

Iterate through Index Storage and obtain data from Global Storage - 0.794 seconds

Iterate through Index storage which contains data 0.612 seconds
Mack said…
RNInterSystems, I just wanted to note that Intersystems is not always so helpful. In most cases, we have been referred to our reseller, which has often resulted with us finding our own solutions...after several hours of research online or trial and error.
4W Tech said…
Awesome blog! I am really impressed by this blog! The informations are really nice and cool

InterSystems Technologies

Popular posts from this blog

IE9 Intranet compatibility mode in Intranet websites

User Interface Usability Checklist Part 2

Procedural VS Object Oriented