### 2.3 Characterization of Robustness

As discussed above, the current performance of database systems is extremely sensitive to the accuracy of the underlying estimation models. Therefore, it is highly desirable to design robust solutions that provide performance stability. However, the definition of robustness itself has been a subject of intense debate, especially at the Dagstuhl seminars, and a consensus has been difficult to achieve. For instance, if worst-case performance is improved at the expense of average-case performance, is that an acceptable notion of robustness? Rather than get sidetracked into this vexed semantic tangle, we instead advocate that the database engine should have a *multiplicity* of components that separately but cooperatively cater to the various environments. For instance, if the cardinality estimates are expected to be reasonably accurate, then the traditional query optimizer is a reasonable choice for arriving at the right plan. On the other hand, if the estimates are expected to be brittle, then an alternative mechanism kicks in – as a case in point, the Plan Bouquet technique [17] discussed in Section 3, which does not use cardinality estimates at all.

Further, there are different levels at which notions of robustness can be introduced, ranging from individual operators (e.g. [7]) to entire query *plans* (e.g. [12]), or even end-to-end query *executions* (e.g. [17]). Moreover, one can take recourse to *algorithmic* (e.g. [44]), *statistical* (e.g. [47]) or *learning-based* (e.g. [36]) approaches to incorporate the robustness features at these various levels. The big picture is that a rich variety of possibilities is currently available, and a judicious choice could potentially lead to the desired robustness. Most importantly, with the impending advent of the so-called Big Data world, wherein data will be the engine driving virtually all aspects of human endeavour, the role of robust query processing (RQP) will soon assume critical proportions.

### 2.4 RQP Metric

For the purpose of this article, we will define robustness to be “the worst-case sub-optimality in plan performance that can arise due to modeling errors resulting in inaccurate estimates”. We denote this metric as Maximum Sub-Optimality, or **MSO**, since it captures the worst-case ratio, over the entire modeling space, of the execution time taken by the chosen plan with respect to the optimal time incurred by an oracular system that magically knows all the correct parameter values. The precise mathematical definition of MSO is given below.

Assume that we have an *n*-dimensional estimation space, referred to as the ESS. Then, the optimizer-estimated location of the query in the ESS is denoted by *q _{e}*, whereas the actual location is denoted by *q _{a}*. The optimal plan at *q _{e}*, as determined by the native optimizer, is denoted by *P _{opt}(q _{e})*, and similarly the optimal plan at *q _{a}* by *P _{opt}(q _{a})*. Further, assume that the query locations and the associated estimation errors range over the *entire estimation space*, that is, all (*q _{e}, q _{a}*) combinations are possible. Finally, the cost of a generic execution plan *P _{i}* at an arbitrary query location *q* in the ESS is denoted by *cost(P _{i}, q)*.

With the above notation, the sub-optimality incurred due to using plan *P _{opt}(q _{e})* at location *q _{a}* is simply defined as the ratio:

*SubOpt(q _{e}, q _{a}) = cost(P _{opt}(q _{e}), q _{a}) / cost(P _{opt}(q _{a}), q _{a})*

The worst-case *SubOpt* for a given *q _{a}* is defined to be with respect to the *q _{e}* that results in the maximum sub-optimality, that is, where modeling inaccuracies have the maximum adverse performance impact:

*SubOpt(\*, q _{a}) = max _{q _{e} ∈ ESS} SubOpt(q _{e}, q _{a})*

Now, the *global* worst-case situation is simply defined as the (*q _{e}, q _{a}*) error combination that results in the maximum value of *SubOpt* over the entire ESS, that is:

*MSO = max _{q _{a} ∈ ESS} SubOpt(\*, q _{a})*

As per this formulation, MSO values range over the interval [1, ∞).
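To make the metric concrete, the following is a minimal Python sketch that evaluates MSO over a small discretized one-dimensional ESS. The cost table is purely hypothetical, standing in for the optimizer's *cost(P, q)* function; it is not drawn from any real system.

```python
# Hypothetical cost table for a 1-D ESS discretized into three locations.
# cost[p][q] is the cost of executing plan P_opt(p) when the actual query
# location is q; since plan p is optimal at location q == p, the diagonal
# entries are the oracle-optimal costs.
cost = {
    0: {0: 10, 1: 50, 2: 400},
    1: {0: 20, 1: 30, 2: 100},
    2: {0: 90, 1: 60, 2: 40},
}
locations = list(cost)

def sub_opt(q_e, q_a):
    """SubOpt(q_e, q_a) = cost(P_opt(q_e), q_a) / cost(P_opt(q_a), q_a)."""
    return cost[q_e][q_a] / cost[q_a][q_a]

# MSO: worst case over all (q_e, q_a) error combinations in the ESS.
mso = max(sub_opt(q_e, q_a) for q_e in locations for q_a in locations)
print(mso)  # 10.0 here: estimating q_e = 0 when q_a = 2 runs 10x slower than optimal
```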

### 2.5 RQP Problem Definition

Given the above framework, the problem of robust query processing is defined as follows: *For a given input SQL query Q with its associated ESS, and the search space consisting of tuples <q, P _{opt}(q), cost(P _{opt}(q), q)> covering all locations q ∈ ESS, develop a query processing approach that minimizes the MSO value for Q.*
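As a toy illustration of this search space (and not of any particular RQP algorithm), the sketch below computes, for each candidate plan, the MSO it would incur if used as a single static plan across the entire ESS, and picks the plan minimizing that worst case. The cost table is hypothetical, as before.

```python
# Hypothetical <q, P_opt(q), cost(P_opt(q), q)> search space for a 1-D ESS
# with three locations; cost[p][q] is the cost of plan P_opt(p) at actual
# location q, so the diagonal holds the oracle-optimal costs.
cost = {
    0: {0: 10, 1: 50, 2: 400},
    1: {0: 20, 1: 30, 2: 100},
    2: {0: 90, 1: 60, 2: 40},
}
locations = list(cost)

def plan_mso(p):
    """Worst-case sub-optimality if plan P_opt(p) is used at every location."""
    return max(cost[p][q] / cost[q][q] for q in locations)

# Naive baseline: commit up front to the single plan with the smallest MSO.
best_plan = min(locations, key=plan_mso)
print(best_plan, plan_mso(best_plan))  # plan 1 with MSO 2.5
```

Even this naive static choice caps the sub-optimality at 2.5x in the toy example, versus 10x for trusting a bad estimate; approaches such as Plan Bouquet, discussed in Section 3, go further by not relying on cardinality estimates at all.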

Unfortunately, the current situation is that, even with industrial-strength database systems, there are no inherent bounds on the MSO values. In fact, it has been empirically observed in [17] that MSO values can reach very large magnitudes in practice, often in the several tens to hundreds, and sometimes even as high as a million! However, as demonstrated in the following section, approaching classical problems with a fresh perspective can yield surprisingly potent results, and it is indeed possible to provide attractive MSO guarantees.