Home > Database > Robust Query Processing in Database Systems*

## Robust Query Processing in Database Systems*

Theorem 3.2. Given a query Q with a one-dimensional ESS, and the associated PIC discretized with a
geometric progression having common ratio r, the PlanBouquet bouquet execution algorithm ensures that MSO ≤ r2/r-1

Further, the choice of r can be optimized to minimize this value – the RHS reaches its minima at r = 2, for which the value of MSO is 4. That is, it is important to note here that a MSO guarantee of 4 is impressive, given that conventional database systems are incapable of providing any guarantee at all! Moreover, the following theorem shows that this guarantee is the best performance achievable by any deterministic online algorithm – leading us to conclude that the doubling-based discretization is the ideal solution.

Theorem 3.3. Given a universe of cost-limited executions of POSP plans, no deterministic online algorithm can ensure MSO lower than 4 in the one-dimensional scenario.

Proof. We prove by contradiction, assuming there exists an optimal online robust algorithm, R*, with an MSOof f, f < 4. The proof is divided into two parts: First, we show that R* must be a monotonically increasing sequence of plan execution costs, [a1, a2, . . . , am]; and second, we demonstrate that achieving an MSOof less than 4 requires the ratio of cumulative costs for consecutive steps in the sequence to be strictly decreasing – however, this is fundamentally impossible and hence the contradiction.

(a) Assume that R* has cost sequence [a1, . . . , ai, aj , . . . ,am+1] which is sorted in increasing order except for the inversion caused by aj < ai.

Now, let us define a plan execution to be useful if its execution covers a hitherto uncovered region of the selectivity space. With this definition, an execution of aj after ai is clearly useless since no fresh selectivity ground is covered by this cheaper execution. A sample instance with reference to Figure 5, is executing P2, which covers the selectivity region (0, q2), after P3 which covers the region (0, q3) – this does not add any value since the latter subsumes the former.

In summary, an out-of-order execution sequence cannot provide any improvement over an ordered sequence, which is why aj can be safely discarded to give a completely sorted sequence [a1, . . . , ai, . . . , am].

(b) For the sorted execution sequence R*, denote the cumulative cost at each step with Aj = ∑j i=1 ai,
and the ratio between the cumulative costs for consecutive steps as Yj = Aj+1/Aj.
Note that, by definition Aj+1 > Aj.

Now, since R* has MSOg of f , the sub-optimality caused by each and every step should be at most f , that is,

and therefore

After dividing both sides with Aj , we get

Through elementary algebra, it is known that ∀z > 0, ( 1-1/z ) ≤ z/4. Therefore, we get

Since f < 4, it implies that the sequence Yj is strictly decreasing with multiplicative factor < 1. With repeated applications of the same inequality, we obtain

For sufficiently large j, this results in

which is a contradiction to our earlier observation that Aj+1 > Aj.

### 3.3 Multi-dimensional ESS

When the above 1D approach is generalized to a multi-dimensional selectivity environment, the IC steps and the PIC curve become surfaces, and their intersections represent selectivity surfaces on which multiple bouquet plans may be present. For example, in the 2D case, the IC steps are horizontal planes cutting through a hollow three-dimensional PIC surface, typically resulting in hyperbolic intersection contours featuring a multitude of plans covering disjoint segments of the contours. A sample 2D scenario is shown in Figure 10a, wherein the isosurfaces ICk are contours that represent a continuous sequence of selectivity locations (in contrast to the single location in the 1D case). Further, multiple bouquet plans may be present on each contour, as shown for ICk, wherein four plans,$$P^K_1 , P^K_2 ,P^K_3 , P^K_4$$ are the optimizer’s choices over disjoint (x, y) selectivity ranges on the contour.

Notwithstanding these changes, the basic mechanics of the bouquet algorithm remain virtually identical. The primary difference is that we jump from one isosurface to the next only after it is determined that none of the bouquet plans present on the current isosurface can completely execute the given query within the associated cost budget. This is because, in order to decide whether qa lies below or beyond ICk, in principle every plan on the ICk contour has to be executed – only if none complete, do we know that the actual location definitely lies beyond the contour.

Pages ( 7 of 9 ): « Previous1 ... 56 7 89Next »