Wednesday, 25 September 2013

Intersystems Caché Performance - Part 3

While my investigations have still left me sad to find that there is no real way to match the read performance of an SQL database I have found out some interesting little pieces of information on Caché.

1. Bitmap indexes always appear to be faster than standard indexes no matter how many unique values there are.

2. Traversing standard indexes manually is the fastest way to return a query first time (second time bitmap index SQL is the fastest).

3. Single Insert performance is good

4. Reversing through an on disk array (e.g. $ORDER(^global,-1) ) is slower than going through it forwards putting the data into an in memory array and then reserving through it! To my surprise all my tests showed that it was at least a 50% performance improvement and as the data grows it can double. The only problem is if the memory array grows too big then you will get a store error as the memory runs out.

I am considering re-writing my queries to only reverse $ORDER in memory and checking the $STORAGE to make sure it is not running out, then writing it to disk if it becomes too large.

It appears to be the case that using SQL with bitmap indexes and potentially using triggers to make sure no values become calculated or require significant levels of joins is the only way to achieve reasonable read performance.

Hopefully this information will be helpful in to you in getting some vague level of performance out of the system.

Sunday, 22 September 2013

Intersystems Caché performance testing part 2

Time For Write Performance to Shine?
Having struggled to achieve good read query performance I thought I would give the write performance a try.

Caché has suggested that its write performance is its real strength and so I thought I would have a quick comparison. I used the same simple table with 10,000,000 records. It only has a first and lastname value, I inserted 10,000 rows 3 times in each and took the average performance figure.

A Win for Caché?

I tried both SQL INSERT and creating objects using %Save() to see which was faster. I found SQL INSERT to be about 15% faster than using objects, so the figures below are from SQL INSERTS.

Caché 10,000 inserts 0.42 seconds, this went down to 0.38 on subsequent tests

MySQL 10,000 inserts 0.152 seconds. 
MSSQL 10,000 inserts 0.502 seconds.

Finally Caché performing well. What is an additional bonus is that Caché has good performance with sequential inserts which are significantly slower in MySQL.

Overall the write performance is not exactly astonishing, but certainly comparable with other databases.

From my basic testing the consistency of MSSQL probably makes it my database of choice, but MySQL's query cache performance and its good group insert performance certainly can be considered better in certain situations.

Back in 2005 I noticed the performance of MySQL was significantly better on a linux box of equivalent spec to a windows box. I do not know if the Windows version has improved or if this performance difference does exist, but it certainly would be interesting to find out.

Tuesday, 17 September 2013

Intersystems Caché performance, woe is me...

I admit I am no Intersystems Caché guru, I am looking at 3 years working with the software. I have attended the sparse available training and I have been left to explore this weird universe known as Caché.

Now if you read their website you will read the tag line "the World's fastest, most scalable Object oriented database". More over they make even greater claims of "...it can run SQL faster than relational databases. "

Every training session seemed to pad every sentence with "so fast" and "world's fastest" although I was never sure if they were trying to convince me or themselves of this "fact".

When using SQL databases for reporting data I have never really encountered issues, my SQL has normally been suboptimal fire and forget code, I make the query get the data and if the full page response including the query is less than 1 second it is not worth optimising.

Now while 99% of my MSSQL and MySQL queries have always been so quick I have never needed to revisit them with Intersystems I have had to go back again and again to optimise everything, completely re-writing a ton of code, turning a single line of SQL in to an intricate For loop absolutely minimising memory use. Yet the makers continue to bang on about performance, but to my knowledge only write performance is reasonable, read performance is SLOW.

I thought it would only be fair to benchmark the process to see if it was possible to match the performance of two of the most common SQL databases.

The following query is based on a "real world" query and one of the first that I experienced slow Cache performance. I wrote a summary page which contained several aggregate queries, and to my surprise it was very sluggish. The page in question ran between 10-30 aggregate queries, and the tables containing the data ranged from 100-100,000 rows.

Query
SELECT LastName, count(*) AS Total
FROM Users
ORDER BY DESC Total 

I consider that a reasonably simple, but a reasonable example of a commonly run query.

Test Data
10,000,000 records

I chose this number as it should help highlight real performance differences and helps establish if it truly was more scalable.

Results
MSSQL 0.061 seconds
MySQL first query in 10.4 seconds and subsequent queries in 0.003 seconds.
Caché *as SQL(/w bitmap indexes) first query in 62.83 seconds and subsequent queries in 33.27 seconds.
Caché *as $Order through the index global 48.17 seconds first query and subsequent queries 48.17 seconds

MSSQL unindexed 4.4 seconds

I hope you will agree the performance figures are WAY off SQL performance.

Next test to line up is write performance, as Cache does not need to be "ACID" I suspect it has a chance at winning, but nothing is guaranteed...