Library Articles
Aggregation over Query Tables and Materialized Windows
Authors: Larry Schumacher, Richard Tibbetts
Contributor: Dr. John Lifter
StreamBase Systems
26-February-2007
Applicable To: StreamBase 3.7, 5.0
Topics:
Prior to StreamBase 3.7, a StreamSQL EventFlow Query operator or a StreamSQL SELECT statement run against a Query Table Data Construct would emit one or more tuples, one for each row retrieved from the table. Field values in these emitted tuples were restricted to values derived from the fields in the incoming tuple and the tuples retrieved from the table. Unlike a query against a relational database table, it was not possible to include aggregate values, such as max, min, or average, as return fields.
If the objective of your application was to return a collection of tuples for further processing, this default behavior was appropriate. However, if your objective was to extract aggregate values from the collection of tuples, managing the collection could become quite involved as the number of tuples in the collection was generally variable. Your application would need several additional operators or StreamSQL statements in order to demarcate the beginning and end of the tuple collection returned by each query.
While the ability to retrieve a collection of tuples from a table is still available and will continue to be used in many applications, StreamBase v.7 adds the capability of replacing the collection of tuples with a single tuple containing values derived from aggregate functions applied to the collection of tuples. This functionality will reduce the complexity of your applications as it will no longer be necessary to use an EventFlow Aggregate operator or StreamSQL windowed stream to manipulate the collection and obtain aggregate values.
In addition, StreamBase 3.7 includes the Materialized Window Data Construct, which, like the Query Table Data Construct, is accessed through the Query operator or a StreamSQL SELECT statement. When using either of these data constructs, you have the option of returning either a collection of tuples or a single tuple containing aggregate values derived from the collection.
When using the Query operator in conjunction with the Query Table or Materialized Window Data Constructs, you make the decision to return a collection of tuples, versus a tuple containing values from aggregate functions, within the Properties view, Output Settings tab. The Output: group now includes three options: all input and tuple fields, explicitly selected fields, and expressions. The first two options are the same as in earlier versions of StreamBase and return a collection of tuples containing fields from the input tuple and each row selected from the Query table or Materialized Window. If you select the third option, you must list the fields in the emitted tuple in the Output Expressions: control, as shown in the following example, which assumes that the table or materialized view contains rows defined by the schema stock string(4), volume int, price double, and that the input tuple contains a single field named symbol.

When configured in this way, each incoming tuple will trigger a query and the emitted tuple will contain four fields:
- The field
totalShares, which contains the sum of the value in the volume field from all rows retrieved from the table.
- The field
highestPrice, which contains the largest value from the price field from all rows retrieved from the table.
- The field
averagePrice, which contains a value calculated from the return values of two aggregate functions applied to all rows retrieved from the table.
- The field
stockSymbol, which contains the value in the symbol field from the tuple that triggered the query.
Settings that specify what rows are retrieved from the table or materialized window are entered on the Properties view, Query Settings tab. From the Operation: dropdown control select Read and from the Where: dropdown control select All Rows, one of the available indexes, or Expression. Depending on the type of index (unordered hash or ordered btree), you complete the specification by making entries into the Matches: or Range Specification: control. The Expression option is a new choice that allows you to directly enter the predicate that will be used to select the tuples retrieved from the table or materialized window, as illustrated in the following figure.

In the preceding figure, the entry in the Lookup expression: text box is the predicate used to select rows from the table or materialized window. Note how the names of fields obtained from the incoming tuple must be prefaced with the identifier input even if there is not a similarly named field in the table or materialized window. You may optionally preface the names of fields in the table or materialized window with the identifier current. More involved predicates using multiple fields from the tuple, table or materialized window, and/or the &&, | |, <, =, >, =, !=, and/or arithmetic operators are also supported. The fields from the table or materialized window used in the predicate expression do not need to be indexed.
Using StreamSQL to query a table or materialized window is less involved than the EventFlow Query operator. The target list simply includes calls to aggregate functions and the predicate follows the WHERE keyword. Omitting the WHERE clause selects all rows from the table or materialized window. In writing a WHERE clause predicate, prefacing field names with the name of the tuple, table, or materialized window is optional as long as there is no ambiguity, for example, when a tuple field has the same name as a table field.
The following code fragment illustrates the content of a StreamSQL statement that uses aggregate function calls when applied against a Query table or Materialized Window. Note that in StreamSQL the prefixes input and current have no meaning and are replaced, if necessary, by the names of the stream and table or materialized window.
SELECT sum(volume) AS totalShares,
max(price) AS highestPrice,
sum(price)/sum(volume) AS averagePrice,
symbol AS stockSymbol
FROM StreamNameIn, QueryTableOrWindowName
WHERE
QueryTableOrWindowName.stock ==
StreamName.symbol
INTO StreamNameOut;
The WHERE clause is optional and if omitted the collection will include all rows in the table or materialized window. Other than the use of aggregate functions in the target list, the syntax of this SELECT statement is identical to the syntax used when a tuple is emitted for each entry in the collection returned by the query.
When the expressions option is selected, a tuple will be emitted even if the table or materialized window does not include rows that satisfy the selection criterion (the WHERE clause). Consequently, a comma, rather than the OUTER JOIN keywords, is used in the FROM clause. When the all input and table fields or explicitly selected fields options are chosen, the FROM clause should include the OUTER JOIN keywords unless you are certain that all queries will retrieve rows from the table or materialized window.
The Materialized Window Data Construct