4 Future Research Directions
In the previous section, we demonstrated the possibility of achieving progress in robust query processing by re-examining the fundamentals of the exercise. We now move on to outlining, in the final stage of this article, a sample set of future research directions. These open problems require leveraging of ideas across diverse branches of computer science, and their solutions will represent significant steps towards the ultimate quest for robust query processing.
Handling Database Scaleup: While PlanBouquet is inherently robust to changes in data distribution,
since these changes only shift the location of qa in the ESS, the same is not true with regard to database scale-up. That is, if the database size increases significantly, then the original ESS no longer covers the entire error space. An obvious solution to handle this problem is to recompute the bouquet from scratch, but most of the processing may turn out to be redundant. Therefore, the development of incremental and efficient techniques for bouquet maintenance is essential for catering to dynamic databases.
Query Graph-sensitive Robustness: The robustness techniques that have been developed in the
literature thus far are largely agnostic to the specifics of the join graph of the query under consideration. That is, whether the graph is a chain, cycle, star, clique, etc. with regard to its structure. It appears quite logical to expect that this information could be gainfully used to improve the robustness of the resulting execution – for instance, it should be simpler to assure good performance for chain queries, where the optimization choices are limited, as opposed to star queries.
Geometries of Plan Cost Functions: Most of the prior work has only assumed that plan cost functions are monotonic with regard to selectivities. However, in practice, the functions often exhibit greater regularity in their behavior – for example, the Bounded Cost Growth property, recently leveraged in,  to ensure bounded suboptimalities of the Parametric Query Optimization choices. In BCG, the relative increase of plan costs is modeled as a low-order polynomial function of the relative increase in plan selectivities. A stronger constraint that has also been found to be generally true in practice is to assume concavity in cost functions, wherein they exhibit monotonically non-increasing slopes. Such constraints can be utilized to improve the robustness of the query processing solutions.
Dimensionality Reduction of the ESS: An important question is what predicates constitute the ESSin the PlanBouquet framework. A simple conservative option would be to assign all predicates (filter, join, projection) to be uncertain, but this would needlessly increase the MSO guarantee. Therefore, an interesting exercise would be to use a combination of domain knowledge, query logs, cost behavior and machine learning techniques to reduce the number of dimensions to the bare minimum required in the ESS.
Portable MSO Guarantees: The PlanBouquet formulation, while breaking new ground, suffers from a systemic drawback – the specific value of p, and therefore the bound, is a function of not only the query, but also the optimizer’s behavioral profile over the underlying database platform (including data contents, physical schema, hardware configuration, etc. ). As a result, there are adverse consequences: (i) The bound value becomes highly variable, depending on the specifics of the current operating environment (ii) It becomes infeasible to compute the value without substantial investments in preprocessing overheads; and (iii) Ensuring a bound that is small enough to be of practical value, is contingent on the heuristic of “anorexic reduction”  holding true.
In a recent work  , we have made initial progress on addressing the above limitations. Specifically, we have presented an algorithm called SpillBound, which achieves portable MSO guarantees. That is, its MSO bound is only a function of D, the dimensionality of the ESS. However, the function is quadratic in D, which means that MSO values rapidly increase with increasing dimensionality. Therefore, an interesting future problem is to investigate whether it would be feasible to provide portable MSO bounds that are linear in D.
Machine Learning Techniques for Component Selection: We advocate that the database engine should have a multiplicity of components that separately but cooperatively cater to the various query processing environments. An obvious question that arises with such an architecture is how to determine the specific environment that is currently operational, and hence the associated component. We are of the view that machine learning techniques could be used to judiciously make this choice, similar to the exercise recently carried out in  in the context of analytical data flows.
Graceful Performance Degradation: A major problem faced in real deployments is the presence of “performance cliffs”, where the performance suddenly degrades precipitously although there has only been a minor change in the operational environment. This is particularly true with regard to hardware resources, such as memory. So, an important future challenge is to design algorithms that provably degrade gracefully with regard to all their performance related parameters.
In closing, we would like to use this platform to reiterate that there are a host of conceptually challenging and eminently practical problems to be addressed in the database query processing domain – it is our earnest hope that this article would encourage at least a few readers, especially young students, to investigate further and take the field forward.