Principles of Factorised Databases

State-of-the-art relational data representation and processing are highly redundant. This project investigates the sources of this redundancy and proposes alternatives that avoid it.

For relational join processing, the culprit behind this redundancy is the Cartesian product lurking within every database join. By avoiding the listing (enumeration) of the output of these products, we can lower the space and computational costs for joins. See first this simple example and then the second more insightful example .

This redundancy can be avoided via factorisation

We developed three distinct lossless factorised representations for join queries and even the more general functional aggregate queries:

  • The f-representations (see ICDT'12 paper ) are to the listing representation what nested formulas are to formulas in disjunctive normal form. They are relational algebra expressions with operators union and product over singleton relations (relations with one column and one row). They exploit the distributivity of the product over union to factor out common subexpressions.
  • The d-representations (see TODS'15 article ) are f-representations with definitions: A recurring subexpression can be given a name and referenced several times by name instead of content. They correspond to circuits.
  • The covers (see ICDT'18 paper ) of a query result is a minimal subset of this result from which we can losslessly reconstruct it entirely.

Compression by factorisation can also be applied to query provenance or lineage ( ICDT'12 and TAPP'11 papers).

Analytics can be computed directly over factorised data

The factorised representations of a join result allow for the enumeration of the result tuples with constant delay, which is as efficient as enumerating them from the listing representation of the result. This is a prerequisite for the computation of aggregates (and more complex analytics) in one pass over factorised data.
In case the materialisation of the factorised join result is not desirable, aggregates can be computed on the fly. In this case, the trace of the aggregate-join computation is that of the underlying factorisation and corresponds to (partially) pushing aggregates past joins.
To explore more on: constant-delay enumeration and worst-case optimal join factorisation see the TODS'15 article ; factorised join plans see the VLDB'12 paper; aggregates over factorised joins see the SIGMOD Record'16 and VLDB'13 papers; machine learning over factorised joins see the SIGMOD'16 and PODS'18 papers.

Demystifying Factorisation and Related Work

Publications

Overviews

  • Factorized Databases . [ preliminary version ] Journal
    Dan Olteanu and Maximilian Schleich.
    In SIGMOD Record (Database Principles Column), vol. 45, no. 2, June 2016.
  • Factorized Databases: A Knowledge Compilation Perspective . [ pdf ] Workshop
    Dan Olteanu.
    In BeyondNP, AAAI workshop, Phoenix, April 2016.

Factorised Representations for Data and Provenance

  • On Functional Aggregate Queries with Additive Inequalities . [ arxiv , slides , poster , video ] Conference
    M. Abo Khamis, R. Curtin, B. Moseley, H. Ngo, X. Nguyen, D. Olteanu and M. Schleich.
    In ACM Principles of Database Systems (PODS), Amsterdam, July 2019.
    Extended version in arXiv report 1812.09526 , April 2019.
  • Covers of Query Results . [ arxiv ] Conference
    Ahmet Kara and Dan Olteanu.
    In Int Conf on Database Theory (ICDT), Vienna, March 2018.
    arXiv report 1709.01600, March 2017.
  • Worst-Case Optimal Join At A Time . Tech Report
    Radu Ciucanu and Dan Olteanu.
    Technical report, November 2015.
  • Size Bounds for Factorised Representations of Query Results . [ preliminary version ] Journal
    Dan Olteanu and Jakub Závodný.
    ACM Transactions on Database Systems (TODS) 40(1):2, 2015.
  • Factorised Representations of Query Results: Size Bounds and Readability . [ pdf , slides ] Conference
    Dan Olteanu and Jakub Závodný.
    In Int Conf on Database Theory (ICDT), Berlin, 2012.
    Prior version (strict subset of results) in arXiv technical report 1104.0867 , April 2011.
  • On Factorisation of Provenance Polynomials . [ pdf , poster ] Workshop
    Dan Olteanu and Jakub Závodný.
    In 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2011, Heraklion, Crete.

Factorised Query Processing

  • Boolean Tensor Decomposition for Conjunctive Queries with Negation . [ arXiv ] Tech Report
    Mahmoud Abo Khamis, Hung Ngo, Dan Olteanu, and Dan Suciu.
    arXiv report 1712.07445, Dec 2017.
  • Worst-Case Optimal Join At A Time . Tech Report
    Radu Ciucanu and Dan Olteanu.
    Technical report, November 2015.
  • Size Bounds for Factorised Representations of Query Results . [ preliminary version ] Journal
    Dan Olteanu and Jakub Závodný.
    ACM Transactions on Database Systems (TODS) 40(1):2, 2015.
  • Aggregation and Ordering in Factorised Databases . [ pdf ] Conference
    Nurzhan Bakibayev, Tomáš Kočiský , Dan Olteanu, and Jakub Závodný.
    March 2013.
    In Very Large Data Bases (PVLDB), vol 6, 2013.
    Complementary information available in Tomas'thesis
  • Demonstration of the FDB Query Engine for Factorised Databases . [ pdf ; poster ] Demo
    Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
    System demonstration. In Very Large Data Bases (PVLDB), 5(12), 2012. Istanbul, 2012.
  • FDB: A Query Engine for Factorised Relational Databases . [ pdf ] Journal
    Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
    In Very Large Data Bases (PVLDB), 5(12), 2012. Istanbul, 2012.
    Prior version in arXiv technical report 1203.2672.

Code