3 The PlanBouquet Approach to RQP [17]

In this section, we present a recently developed and radically different approach to the cardinality modeling problem. In our new approach, called PlanBouquet, the classical compile-time estimation process is completely eschewed for error-prone cardinalities. Instead, these cardinalities are systematically discovered at run-time through a calibrated sequence of cost-limited plan executions. That is, the cardinality estimation problem is side-stepped, rather than addressed head-on, by adopting a “seeing is believing” perspective on these values.

To enable uniform treatment of all predicates, we will hereafter substitute cardinality values with normalized equivalents called selectivities – that is, selectivity is defined as the ratio of an operator's estimated output row count to the maximum feasible number of output rows. This formulation causes all selectivities to range over the [0, 1] interval, resulting in the ESS becoming an n-dimensional unit hypercube. Further, for ease of presentation, we will use the percentage form of selectivity in our discussion, i.e., selectivities range over [0%, 100%].
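
To make the normalization concrete, here is a minimal sketch in Python (the function and the example numbers are ours, purely illustrative):

    def selectivity(estimated_rows: float, max_feasible_rows: float) -> float:
        """Normalize an operator's estimated output cardinality to [0, 1]."""
        return estimated_rows / max_feasible_rows

    # e.g., a filter expected to pass 50K of 1M rows has selectivity 0.05 (5%)
    print(selectivity(50_000, 1_000_000))  # 0.05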

3.1 Plan Cost Behavior

An assumption that fundamentally underlies the entire PlanBouquet mechanism is that of Plan Cost Monotonicity (PCM) – that is, the costs of plans increase monotonically with increasing selectivity values. It captures the intuitive observation that when more data is processed by a plan, signified by larger selectivities, the cost of processing also increases. It has been empirically observed that this assumption generally holds for the plans generated by current database systems on decision support queries [22]. Apart from monotonicity, we also assume the cost functions to be continuous (smooth) throughout the ESS, again a commonplace feature in practice.
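
By way of illustration, PCM can be sanity-checked along a single selectivity dimension as sketched below; in the full ESS the property must hold along every selectivity axis. The cost model here is a toy stand-in, not taken from any actual optimizer:

    def is_monotone(cost, sel_samples):
        """Check that cost(s) is non-decreasing over increasing
        selectivity samples, a necessary condition for PCM."""
        costs = [cost(s) for s in sorted(sel_samples)]
        return all(a <= b for a, b in zip(costs, costs[1:]))

    # e.g., a toy cost model that grows linearly with selectivity
    assert is_monotone(lambda s: 1e4 + 5e5 * s, [0.001, 0.01, 0.1, 1.0])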

3.2 One-dimensional ESS

We introduce the PlanBouquet approach through a restricted one-dimensional version of the earlier example query (Figure 4), wherein only the p_retailprice < 1000 filter predicate is assumed to be error-prone. For this scenario, we first identify, through repeated invocations of the optimizer, the “parametric optimal set of plans” (POSP) [27, 28] that covers the entire selectivity range of the predicate. A sample outcome of this process is shown in Figure 6, wherein the POSP set comprises plans P1 through P5. Further, each plan is annotated with the selectivity range over which it is optimal – for instance, plan P3 is optimal in the (1.0%, 7.5%] interval. (In Figure 6, P = Part, L = Lineitem, O = Order, NL = Nested Loops Join, MJ = Sort Merge Join, and HJ = Hash Join.)
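
The POSP identification can be viewed as a sweep over a selectivity grid, as in the following sketch (here, optimizer is an assumed black box that returns the identifier of the optimal plan at a given selectivity):

    def identify_posp(optimizer, sel_grid):
        """Sweep the selectivity axis, invoking the optimizer at each
        grid point; the distinct plans found constitute the POSP set."""
        posp = {}  # plan id -> (lowest, highest) selectivity where optimal
        for s in sorted(sel_grid):
            plan = optimizer(s)
            lo, hi = posp.get(plan, (s, s))
            posp[plan] = (min(lo, s), max(hi, s))
        return posp

    # e.g., a geometric grid spanning 0.01% to 100% selectivity
    sel_grid = [10 ** (-4 + 0.1 * i) for i in range(41)]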

The optimizer-computed costs of these POSP plans over the selectivity range are shown (on a log-log scale) in Figure 7. On this figure, we first construct the “POSP infimum curve” (PIC), defined as the trajectory of the minimum cost from among the POSP plans – this curve represents the ideal performance. The next step, which is a distinctive feature of our approach, is to discretize the PIC by projecting a graded progression of isocost (IC) steps onto the curve. For example, in Figure 7, the dotted horizontal lines represent a geometric progression of isocost steps, IC1 through IC7, with each step being double the preceding value. The intersection of each IC with the PIC (indicated by ■) provides an associated selectivity, along with the identity of the best POSP plan for this selectivity. For example, in Figure 7, the intersection of IC5 with the PIC corresponds to a selectivity of 0.65% with associated POSP plan P2. We term the subset of POSP plans that are associated with the intersections as the “plan bouquet” for the given query – in Figure 7, the bouquet consists of {P1, P2, P3, P5}.
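
This compile-time construction can be summarized in code as follows: a minimal sketch in which optimizer costs are abstracted as per-plan cost functions (cost_fns), rather than obtained from actual optimizer calls:

    def build_bouquet(cost_fns, sel_grid, ratio=2.0):
        """Discretize the PIC with a geometric progression of isocost
        steps, recording at each step the cheapest POSP plan.
        cost_fns maps plan id -> cost function over selectivity."""
        grid = sorted(sel_grid)
        # PIC: minimum POSP cost at each grid selectivity
        pic = [(s, min(f(s) for f in cost_fns.values())) for s in grid]
        budget, max_cost = pic[0][1], pic[-1][1]  # PCM: extremes at grid ends
        bouquet = []
        while budget < max_cost * ratio:  # final step covers 100% selectivity
            # step selectivity: largest s whose PIC cost fits the budget
            s_step = max(s for s, c in pic if c <= budget)
            plan = min(cost_fns, key=lambda p: cost_fns[p](s_step))
            bouquet.append((budget, plan, s_step))
            budget *= ratio
        return bouquet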

The above exercise is carried out at query compilation time. Subsequently, at run-time, the correct query selectivities are implicitly discovered through a sequence of cost-limited executions of bouquet plans. Specifically, beginning with the cheapest cost step, we iteratively execute the bouquet plan assigned to each step (a sketch of this loop follows the list) until either:

1. The partial execution overheads exceed the step’s cost value – in this case, we know that the actual selectivity location lies beyond the current step, motivating a switch to the next step in the sequence; or
2. The current plan completes execution within the budget – in this case, we know that the actual selectivity location has been reached, and a plan that is at least 2-optimal w.r.t. the ideal choice was used for the final execution (the factor of 2 is a consequence of the doubling between successive cost steps).
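
The run-time discovery loop sketched below mirrors these two outcomes; run_under_budget is an assumed executor hook that runs (or, for an unchanged plan, resumes) the given plan and reports whether it completed within the budget:

    def bouquet_execute(bouquet, run_under_budget):
        """Execute each step's plan under its cost budget, in
        increasing budget order, until some execution completes."""
        for budget, plan, _ in bouquet:
            if run_under_budget(plan, budget):
                return plan  # actual selectivity lies within this step
            # budget exhausted: actual selectivity lies beyond this step;
            # partial results are discarded before the next, doubled, budget
        raise RuntimeError("actual selectivity beyond the last cost step")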

Example: To make the above process concrete, consider the case where the selectivity of p_retailprice is 5%. Here, we begin by partially executing plan P1 until the execution overheads reach IC1 (1.2E4 | 0.015%). Then, we extend our cost horizon to IC2, and continue executing P1 until the overheads reach IC2 (2.4E4 | 0.03%), and so on until the overheads reach IC4 (9.6E4 | 0.2%). At this juncture, there is a change of plan to P2 as we look ahead to IC5 (1.9E5 | 0.65%), and during this switching all the intermediate results (if any) produced thus far by P1 are discarded. The new plan P2 is executed until the associated overhead limit (1.9E5) is reached. The cost horizon is now extended to IC6 (3.8E5 | 6.5%), in the process discarding P2’s intermediate results and executing P3 instead. The execution in this case will complete before the cost limit is reached since the actual location, 5%, is less than the selectivity limit of IC6. Viewed in toto, the net sub-optimality turns out to be 1.78 since the exploratory overheads are 0.78 times the optimal cost, and the optimal plan itself was (coincidentally) employed for the final execution.
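
The quoted figure is easily verified by back-of-the-envelope arithmetic; note that the optimal cost below is inferred from the stated 0.78 ratio rather than read off Figure 7:

    # Exploratory overheads: P1 abandoned at IC4, P2 abandoned at IC5
    exploratory = 9.6e4 + 1.9e5       # = 2.86E5
    # The stated 0.78 ratio implies P3's optimal cost at 5% selectivity
    optimal = exploratory / 0.78      # ~ 3.67E5
    # The final execution (coincidentally) used the optimal plan P3
    total = exploratory + optimal
    print(round(total / optimal, 2))  # 1.78 -> net sub-optimality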
