Grouping on distinct values of attributes A, B, ... is specified with "grouping A, B, ..." completely analogous to the "group by" clause in SQL. Grouping on equivalence classes is described with the construct "SV", short for "Substitutable Values". This is because the Grouping-SV functionality behaves similar to the SV relations on preferences. The equivalence classes forming the groups are from now on called (grouping) SV classes . The syntactic schema of an SV classes specification is as follows (where the [...] braces indicate optional elements):
GROUPING
A SV ((A_1, A_2, ...) [AS 'class1_a'] ,
(A_3, A_4, ...) [AS 'class2_a'] , ...,
[OTHERS [AS 'others_class_a'] ] ) [AS sv_a],
B SV ((B_1, B_2, ...) [AS 'class1_b'] , ...,
[OTHERS [AS 'others_class_b'] ] ) [AS sv_b], ...
Thereby A_i and B_i are domain values according to the attributes A and B. The domain values (A_1, A_2, ...) are considered equal w.r.t. the given SV relation for attribute A. The names "class..." are aliases for the SV classes which occur in the projection. If not specified, (A_1, A_2, ...) is the default name for an SV class with attributes A_1, A_2, .... The keyword "others" puts all elements which were not be mentioned before, in one "default" SV class. If "others" is not given, all tuples which are not in one of the given SV classes form a SV class of its own.
The name of an SV-attribute (e.g. "sv_a") is also optional, but should be given to reference the SV grouping attributes in the projection. Note that the original attributes A, B, ... stay unchanged; they can still be referenced in the projection.
The schema of an entire Preference SQL query containing the grouping construct is depicted below, where agg(...) symbolizes an aggregation function, which is optional in the projection.
SELECT <projection, aggregation>
FROM <table_reference>
WHERE <hard_conditions>
PREFERRING <soft_conditions>
GROUPING <attribute_list>
TOP <number>
BUT ONLY <but_only_condition>
HAVING <aggregating_hard_conditions>
ORDER BY <attribute_list>
Such a query is evaluated as follows:
Note that top, but only, having and the use of aggregation is optional. Even the "preferring" keyword is optional, which allows us to use the Grouping-SV syntax for conventional aggregations. Thus we are able to simplify "group by" queries with the lengthy "case ... then ... end" statements. These can be mostly be replaced by our Grouping-SV-syntax which we will illustrate in the following Example.
The order of execution is as follows, wherein all those steps are skipped which are not specified in the query:
Assume the following sample dataset with some cars:
id | make | power | price |
---|---|---|---|
1 | Smart | 60 | 15000 |
2 | Mercedes | 200 | 38000 |
3 | Audi | 180 | 29000 |
4 | VW | 110 | 25000 |
5 | Bugatti | 1000 | 500000 |
We give an example to calculate the average price of the two cheapest cars of every car manufacturer. We assume that only an attribute "make" is given for every tuple in the relation, hence we have to retrieve the manufacturer by an appropriate SV grouping on "make".
SELECT manufacturer, avg(price)
FROM car
PREFERRING price LOWEST
TOP 2
GROUPING make SV ( ('Mercedes', 'Smart', 'AMG') AS 'Daimler', ('VW', 'Audi', 'Bugatti') AS 'Volkswagen' ) AS manufacturer
The result of this query on the given sample dataset is:
manufacturer | avg (price) |
---|---|
Daimler | 26500 |
Volkswagen | 27000 |
Note that the Bugatti is not in the Top-2 set of the manufacturer "Volkswagen", hence not included in the average price calculation.
If we omit the preference, i.e., just consider the aggregating query:
SELECT manufacturer, avg(price)
FROM car
GROUPING make SV ( ('Mercedes', 'Smart', 'AMG') AS 'Daimler', ('VW', 'Audi', 'Bugatti') AS 'Volkswagen' ) AS manufacturer
we could express this in standard SQL in a more lengthy way:
SELECT manufacturer, avg(price)
FROM (
SELECT (CASE when make IN ('Mercedes', 'Smart', 'AMG') then 'Daimler' when make IN ('VW', 'Audi', 'Bugatti') then 'Volkswagen' ELSE make end) AS manufacturer, price
FROM car) tmp_car
GROUP BY manufacturer
Hence the Grouping-construct allows us to denote group-by-case statements in a more concise way and without subqueries.