This repository contains an implementation of factorised databases focused on computing the size of a join result and the resulting compression factor.
The aim of factorised databases is to reduce the redundancy appearing in query results of conjunctive queries. By distributing the products over unions it is often possible to avoid enumerating Cartesian products that appear in a join result. This is achieved using analysis of the query given an order of the variables used (d-tree). This representation is amenable for further computation and can be used o achieve significant performance improvements when training certain machine learning models directly on the database. See the TODS 2015 paper Size Bounds for Factorised Representations of Query Results for more details on this method.
The implementation in this repository focuses on computing the size of a factorised representation given a database and a query/variable order. It returns a compression factor computed as C = (A*T)/V where
- A is the number of attributes in the query result (i.e., the number of variables),
- T is the number of tuples in the query result, and
- V is the number of singleton values in the factorised representation of the result.
More information on the principles of factorised databases can be found on the main page.
FBENCH 1.0 is now released to the public.