Query Optimization 101

In this post, I will show what I usually do to optimize queries. My database of choice is PostgreSQL, but the same principles can be applied to most databases.

Laying down the terrain

Pretty much, I have 3 tables… A, B and C (yeah, great naming).

A is the one that contains my most relevant data. B is a mandatory join and contains some data we need. C is an optional join and also contains some data we need; C also knows B, and that relationship is mandatory.
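Here is a rough sketch of the schema; the column types (and anything else not visible in the plans further down) are assumptions on my part:

create table B (
    id text primary key
    -- ...plus the B columns we need
);

create table C (
    id text primary key,
    B_id text not null references B(id)  -- C also knows B, and it's mandatory
    -- ...plus the C columns we need
);

create table A (
    id text primary key,
    B_id text not null references B(id),  -- mandatory join to B
    C_id text references C(id),           -- optional join to C
    "order" int not null                  -- order is a reserved word, hence the quotes
    -- ...plus the data I actually care about
);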

My original query
     select A.*, B.*, C.*
     from
         A A
     join B B on A.B_id=B.id
     left join C C on A.C_id=C.id
     where
         A.B_id='my_cool_id'
     order by
         A.order asc
     limit
         100

In simple terms, select all from A, B and C filtered by B_id.

This query was taking 122 ms to return 100 rows… Long ago someone told me that if a query takes more than 5 ms, it is worth revisiting and optimizing it.

Asking Postgres where the problem is

Doesn't hurt to ask, right? Maybe Postgres knows where the problem lies. Let's use the explain statement to request the execution plan.
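The plans in this post include actual timings and buffer counts, so the command looks something like this (analyze actually runs the query, buffers adds block I/O stats):

     explain (analyze, buffers)
     select A.*, B.*, C.*
     from A A
     join B B on A.B_id=B.id
     left join C C on A.C_id=C.id
     where A.B_id='my_cool_id'
     order by A.order asc
     limit 100;

And here is what came back: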

Limit  (cost=20499.31..20499.56 rows=100 width=4505) (actual time=121.947..121.956 rows=100 loops=1)
  Buffers: shared hit=23034 read=10916
  ->  Sort  (cost=20499.31..20519.32 rows=8003 width=4505) (actual time=121.946..121.947 rows=100 loops=1)
        Sort Key: A.order
        Sort Method: top-N heapsort  Memory: 126kB
        Buffers: shared hit=23034 read=10916
        ->  Nested Loop Left Join  (cost=0.28..20193.44 rows=8003 width=4505) (actual time=58.003..117.868 rows=7590 loops=1)
              Buffers: shared hit=23034 read=10916
              ->  Nested Loop  (cost=0.00..17364.52 rows=8003 width=3323) (actual time=57.960..89.738 rows=7590 loops=1)
                    Buffers: shared hit=264 read=10916
                    ->  Seq Scan on B B  (cost=0.00..40.58 rows=1 width=1614) (actual time=0.013..0.061 rows=1 loops=1)
                          Filter: ((id)::text = 'my_cool_id'::text)
                          Rows Removed by Filter: 325
                          Buffers: shared hit=36
                    ->  Seq Scan on A A  (cost=0.00..17243.91 rows=8003 width=1709) (actual time=57.942..88.880 rows=7590 loops=1)
                          Filter: ((B_id)::text = 'my_cool_id'::text)
                          Rows Removed by Filter: 480403
                          Buffers: shared hit=228 read=10916
              ->  Index Scan using C_pkey on C C  (cost=0.28..0.34 rows=1 width=1182) (actual time=0.003..0.003 rows=1 loops=7590)
                    Index Cond: ((A.C_id)::text = (id)::text)
                    Buffers: shared hit=22770
Total runtime: 122.064 ms

Before I start optimizing based on what I can see in the explain output, remember when I said C also "knows" B? That is where I start: I included the B filter in all the joins.

     select A.*, B.*, C.*
     from A A
     join B B on A.B_id=B.id and B.id='my_cool_id'
     left join C C on A.C_id=C.id and C.B_id='my_cool_id'
     where
         A.B_id='my_cool_id'
     order by
         A.order asc
     limit
         100

This simple action reduces the cost of the query a little bit:

Limit  (cost=17803.87..17804.12 rows=100 width=4505) (actual time=101.080..101.090 rows=100 loops=1)
  Buffers: shared hit=297 read=10948
  ->  Sort  (cost=17803.87..17823.88 rows=8003 width=4505) (actual time=101.079..101.082 rows=100 loops=1)
        Sort Key: A.order
        Sort Method: top-N heapsort  Memory: 126kB
        Buffers: shared hit=297 read=10948
        ->  Hash Left Join  (cost=103.10..17498.00 rows=8003 width=4505) (actual time=62.598..97.085 rows=7590 loops=1)
              Hash Cond: ((A.C_id)::text = (C.id)::text)
              Buffers: shared hit=297 read=10948
              ->  Nested Loop  (cost=0.00..17364.52 rows=8003 width=3323) (actual time=62.185..94.505 rows=7590 loops=1)
                    Buffers: shared hit=232 read=10948
                    ->  Seq Scan on B B  (cost=0.00..40.58 rows=1 width=1614) (actual time=0.015..0.065 rows=1 loops=1)
                          Filter: ((id)::text = 'my_cool_id'::text)
                          Rows Removed by Filter: 325
                          Buffers: shared hit=36
                    ->  Seq Scan on A A  (cost=0.00..17243.91 rows=8003 width=1709) (actual time=62.165..93.658 rows=7590 loops=1)
                          Filter: ((B_id)::text = 'my_cool_id'::text)
                          Rows Removed by Filter: 480403
                          Buffers: shared hit=196 read=10948
              ->  Hash  (cost=102.22..102.22 rows=70 width=1182) (actual time=0.395..0.395 rows=66 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 13kB
                    Buffers: shared hit=65
                    ->  Seq Scan on C C  (cost=0.00..102.22 rows=70 width=1182) (actual time=0.028..0.385 rows=66 loops=1)
                          Filter: ((B_id)::text = 'my_cool_id'::text)
                          Rows Removed by Filter: 2883
                          Buffers: shared hit=65
Total runtime: 101.213 ms

That alone shaves off 20 ms. Nothing special, but it's a start. The query still takes 101 ms, which is better, but not THAT much better.

1st problem: Seq Scan

In general, a Seq Scan is a bad thing. In this case, Seq Scans are happening on both FK columns (A.B_id and C.B_id).

So I will introduce indexes here:

create index on C(B_id);
create index on A(B_id);
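Side note: a plain create index blocks writes on the table while it builds. On a busy production table, the concurrent variant is usually the safer bet, at the cost of a slower build and not being usable inside a transaction; either way, you end up with the same indexes:

create index concurrently on C(B_id);
create index concurrently on A(B_id);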

Resulting in:

Limit  (cost=14258.70..14258.95 rows=100 width=4505) (actual time=32.903..32.908 rows=100 loops=1)
  Buffers: shared hit=23004
  ->  Sort  (cost=14258.70..14278.70 rows=8003 width=4505) (actual time=32.902..32.904 rows=100 loops=1)
        Sort Key: A.order
        Sort Method: top-N heapsort  Memory: 126kB
        Buffers: shared hit=23004
        ->  Nested Loop Left Join  (cost=223.00..13952.83 rows=8003 width=4505) (actual time=0.509..28.865 rows=7590 loops=1)
              Buffers: shared hit=23004
              ->  Nested Loop  (cost=222.72..11123.91 rows=8003 width=3323) (actual time=0.501..2.082 rows=7590 loops=1)
                    Buffers: shared hit=234
                    ->  Index Scan using B_pkey on B B  (cost=0.27..8.29 rows=1 width=1614) (actual time=0.009..0.009 rows=1 loops=1)
                          Index Cond: ((id)::text = 'my_cool_id'::text)
                          Buffers: shared hit=3
                    ->  Bitmap Heap Scan on A A  (cost=222.45..11035.59 rows=8003 width=1709) (actual time=0.488..1.042 rows=7590 loops=1)
                          Recheck Cond: ((B_id)::text = 'my_cool_id'::text)
                          Buffers: shared hit=231
                          ->  Bitmap Index Scan on A_B_id_idx  (cost=0.00..220.45 rows=8003 width=0) (actual time=0.471..0.471 rows=7590 loops=1)
                                Index Cond: ((B_id)::text = 'my_cool_id'::text)
                                Buffers: shared hit=40
              ->  Index Scan using C_pkey on C C  (cost=0.28..0.34 rows=1 width=1182) (actual time=0.003..0.003 rows=1 loops=7590)
                    Index Cond: ((A.C_id)::text = (id)::text)
                    Buffers: shared hit=22770
Total runtime: 33.009 ms

This resulted in a more significant cost reduction and a huge time reduction: from 101 ms to 33 ms.

2nd problem: Sort

To sort, PostgreSQL needs to "load" all the matching records, and that takes time. Since my query is ordered only by order, I can create an index on that column and shave off most of the sort time.

I also realized that A.order + A.B_id needed a unique index to meet my business rules, so I needed a new index to handle that:

CREATE UNIQUE INDEX ON A ("order", B_id);

With that in place, the plan becomes:
Limit  (cost=0.98..525.06 rows=100 width=4505) (actual time=0.062..1.150 rows=100 loops=1)
  Buffers: shared hit=307 read=77
  ->  Nested Loop Left Join  (cost=0.98..41943.12 rows=8003 width=4505) (actual time=0.062..1.146 rows=100 loops=1)
        Buffers: shared hit=307 read=77
        ->  Nested Loop  (cost=0.70..39114.20 rows=8003 width=3323) (actual time=0.053..0.752 rows=100 loops=1)
              Buffers: shared hit=7 read=77
              ->  Index Scan using A_order_B_id_idx on A A  (cost=0.42..39005.87 rows=8003 width=1709) (actual time=0.042..0.699 rows=100 loops=1)
                    Index Cond: ((B_id)::text = 'my_cool_id'::text)
                    Buffers: shared hit=4 read=77
              ->  Materialize  (cost=0.27..8.30 rows=1 width=1614) (actual time=0.000..0.000 rows=1 loops=100)
                    Buffers: shared hit=3
                    ->  Index Scan using B_pkey on B B  (cost=0.27..8.29 rows=1 width=1614) (actual time=0.008..0.008 rows=1 loops=1)
                          Index Cond: ((id)::text = 'my_cool_id'::text)
                          Buffers: shared hit=3
        ->  Index Scan using C_pkey on C C  (cost=0.28..0.34 rows=1 width=1182) (actual time=0.003..0.004 rows=1 loops=100)
              Index Cond: ((A.C_id)::text = (id)::text)
              Buffers: shared hit=300
Total runtime: 1.220 ms

Now we are talking business. It's taking only 1/100 of the original time to grab the data I need. This is the point where I consider my work done, pack my things, and head home. And so I did.

But, while I was writing this, I noticed that I built this index "out of order". In general, it's a good idea to put the low-cardinality column first and the highest-cardinality one later. So I created a new index with the columns in the correct order:

CREATE UNIQUE INDEX ON A (B_id, "order");

And the plan now reads:
Limit  (cost=0.98..331.64 rows=100 width=4505) (actual time=0.056..0.468 rows=100 loops=1)
  Buffers: shared hit=307 read=4
  ->  Nested Loop Left Join  (cost=0.98..26464.27 rows=8003 width=4505) (actual time=0.055..0.463 rows=100 loops=1)
        Buffers: shared hit=307 read=4
        ->  Nested Loop  (cost=0.70..23635.35 rows=8003 width=3323) (actual time=0.044..0.100 rows=100 loops=1)
              Buffers: shared hit=7 read=4
              ->  Index Scan using A_B_id_order_idx on A A  (cost=0.42..23527.02 rows=8003 width=1709) (actual time=0.035..0.061 rows=100 loops=1)
                    Index Cond: ((B_id)::text = 'my_cool_id'::text)
                    Buffers: shared hit=4 read=4
              ->  Materialize  (cost=0.27..8.30 rows=1 width=1614) (actual time=0.000..0.000 rows=1 loops=100)
                    Buffers: shared hit=3
                    ->  Index Scan using B_pkey on B B  (cost=0.27..8.29 rows=1 width=1614) (actual time=0.006..0.006 rows=1 loops=1)
                          Index Cond: ((id)::text = 'my_cool_id'::text)
                          Buffers: shared hit=3
        ->  Index Scan using C_pkey on C C  (cost=0.28..0.34 rows=1 width=1182) (actual time=0.003..0.003 rows=1 loops=100)
              Index Cond: ((A.C_id)::text = (id)::text)
              Buffers: shared hit=300
Total runtime: 0.530 ms

The original index is now ignored and only the new one is used, dropping the time a little bit more. This plunges the query time down to half a millisecond =)
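And since the new index enforces the exact same uniqueness over the same pair of columns, the old one can presumably be dropped so writes don't have to keep maintaining a dead index (the name below is the one Postgres generated, as seen in the plans above):

drop index A_order_B_id_idx;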

In the end, I beat the 5 ms rule, and not only that, I beat it by a factor of 10.
