3.2.1 Performance Characteristics
At first glance, the plan bouquet approach, as described above, may appear to be utterly absurd and self-defeating because: (a) At compile time, considerable preprocessing may be required to identify the POSP plan set and the associated PIC; and (b) At run-time, the overheads may be hugely expensive since there are multiple plan executions for a single query – in the worst scenario, as many plans as are present in the bouquet!
However, we will now make the case that it is indeed possible, through careful design, to have plan bouquets efficiently provide robustness profiles that are markedly superior to the native optimizer’s profile. Specifically, the bouquet mechanism delivers substantial improvements with regard to MSO, the robustness metric defined in Section 2.
For instance, the runtime performance of the bouquet technique on the example query is profiled in Figure 8 (dark blue curve). We observe that its performance is much closer to the PIC (dark green) as compared to the worst case profile for the native optimizer (dark red), which is comprised of the supremum of the individual plan profiles. In fact, the MSO for the bouquet is only 3.6 (at 6.5%), whereas the native optimizer suffers a sub-optimality of around 100 when P5, which is optimal for large selectivities is mistakenly chosen to execute a query with a small selectivity of only 0.01%.
3.2.2 MSO Guarantees
Our motivation for the cost-based discretization of the PIC is that it leads to guaranteed bounds on the MSO value. For instance, we now prove that the cost-doubling strategy used in the 1D example results in an MSO upper-bound of 4 – this bound is inclusive of all exploratory overheads incurred by the partial executions, and is irrespective of the query’s actual selectivity. In fact, we can go further to show that 4 is the best guarantee achievable by any deterministic algorithm.
By virtue of our assumptions on plan cost behavior, the PIC turns out to be a monotonically increasing and continuous function throughout the ESS; its minimum and maximum costs are denoted by Cmin and Cmax , respectively. This PIC is discretized by projecting a graded progression of cost steps onto the curve. Specifically, consider the case wherein the steps are organized in a geometric progression with initial value a (a > 0) and common ratio r (r > 1), such that the
Figure 9: ID selectivity space
For 1 ≤ k ≤ m, denote the selectivity location where the kth cost step (ICk) intersects the PIC by qk, and the corresponding bouquet plan as Pk. All the qk locations are unique, by definition, due to the monotonicity and continuity features of the PIC. Finally, for mathematical convenience, assign q0 to be 0.
With this framework, the bouquet execution algorithm, outlined in Algorithm 1, operates as follows in the most general case, where a different plan is associated with each step: We start with plan P1 and budget cost(IC1), progressively working our way up through the successive bouquet plans P2 , P3, … until we reach the first plan Pk that is able to fully execute the query within its assigned budget cost(ICk). It is easy to see that the following lemma holds:
Lemma 3.1. If qa resides in the range (qk-1 , qk], 1 ≤ k ≤ m, then plan Pk executes it to completion in the bouquet algorithm.
Proof. We prove by contradiction: If qa was located in the region (qk, qk+1], then Pk could not have completed the query due to the PCM restriction. Conversely, if qa was located in (qk-2, qk-1], Pk-1 itself would have successfully executed the query to completion. With similar reasoning, we can prove the same for the remaining regions that are beyond qk+1 or before qk-2.
Performance Bounds Consider the generic case where qa lies in the range (qk-1, qk]. Based on Lemma 1, the associated worst case cost of the bouquet execution algorithm is given by the following expression:
where the “*” in the qe location indicates that, unlike conventional optimizers, the PlanBouquet algorithm does not make any such estimations. Further, note that the final expression is independent of k, and hence of the specific location of qa. Therefore, we can state, for the entire selectivity space, that: