When the Oracle Optimizer needs to generate an execution plan for a query it goes to through the already taken statistics for the tables and indexes involved.
Histograms are one of those statistics taken from tables. Basically they show how the values of a column are distributed. Oracle will use that to determine how many number of rows will be return by a certain filter on the column. If there are few rows, then it will choose to use an index (if any on that column). If there are a lot of rows then Oracle could choose to do a full table scan.
Doing these calculations requires resources (CPU) so we use Bind Variables to avoid re-parsing of the SQL. So once the plan has been calculated, Oracle will use it again and again until it is removed from the shared pool. The execution plan is calculated only once. But when using Bind Variables, what values are used to calculate the plan? Yes, the values of the first query it processes.
Do you start to see the problem here?
Lets say your table has a column called callerid. This column stores the caller phone number, and "0" if no callerid was registered. Now let's say you store millions of calls per month. Your table is now 60 million rows.
Each phone number does an average of 5 calls per month. Top call registered from one number is 2000. Those are very good numbers for creating a btree-index on "callerid".
But what happens with the "0" value? You'll have million of rows and the index is not a good choice (since it will be cheaper to do a full scan than to use the index with so many rows).
That's exactly the problem: if the first query Oracle recieves has the "0" value in the callerid column, then the stored plan will not use an index, and all future queries will be slow even with different values on callerid!!!
Let's get the histogram information:
select num_rows from dba_tables where owner = 'OWNER' and table_name = 'CALLS';
NUM_ROWS
----------
63627796
select HISTOGRAM from dba_tab_col_statistics where owner = 'OWNER' and table_name = 'CALLS' and column_name = 'CALLERID';
HISTOGRAM
---------------
HEIGHT BALANCED
select callerid, count(1)
from calls
group by callerid
order by 2 desc
CALLERID COUNT(1)
--------------------------
0 3127825
310596574 1808
293584543 1405
915089527 1229
915144244 694
...
And now let's take a look at the histograms and see how even is this data distributed on the table:
SQL> select ENDPOINT_NUMBER, ENDPOINT_VALUE
2 from dba_histograms
3 where table_name = 'CALLS' and column_name = 'CALLERID' and owner = 'OWNER'
4 order by 1;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
8 0
9 269304330
10 283241184
11 289850129
12 295923724
13 296092273
14 296248610
15 296416759
16 296585656
17 296757192
...
As you can see the table is HEIGHT BALANCED according to the statistics taken previously, that is their values are evenly distributed, except for the first 8 buckets of the histogram. Each bucket in the histogram represents (63627796/254) ~= 250503 rows. So in our case the "0" value is repeated (8*250503)~= 2 million rows.
Remember we said that the Optimizer builds the execution path based on the values of the first query it receives? What happens if the first query it receives involves a CALLERID = 0?
select *
from calls
where callerid = :b1
:b1 bind variable is 0, so oracle receives the query and determine that it does not need to use the index on callerid. This is the best thing to do with a "0" value, but next time the application will do the same select with callerid = 269304330 and what will happen? Since the execution plan is already on the Shared Pool, it will be reused, WITHOUT THE INDEX!!!!
This is called "Variable Peeking", and is a know problem. It can happen at any time. Imagine your gather_stats_job run on the CALLS table at Sunday night. The execution plan becomes invalidated. On Monday a user opens the application and do the query with callerid = 0 and BAM! Your new execution plan is created without getting the index.
For this situation you can do several options. The best thing would be to HINT your SQL to use the right index (or not to use it when the value is 0, if you can know it before running it).
But sometimes you can't touch the application code. In those cases you can create a Store Outline for the query with the DBMS_OUTLINE package. See Metalink note 463288.1 for more details.
Jose