Library Articles
The Materialized Window Data Construct
Authors: Eddie Galvez, Kimberley Burchett, Dr. John Lifter
StreamBase Systems
26-February-2007
Revised: 22-March-2007
Applicable To: StreamBase 3.7, 5.0
Topics:
A materialized window is a managed view of tuples passing on a stream. It is conceptually equivalent to a dimensioned Query Table — a store of tuples whose content changes as tuples move along the stream. The dimensioning of the table can be based on a fixed number of tuples, on a time interval measured on the computer running the StreamBase application, or on a field value in the tuples contained in the stream. The contents of a Materialized Window Data Construct are accessed through the StreamSQL EventFlow Query operator or a StreamSQL SELECT statement.
The Materialized Window Data Construct is available in StreamBase v3.7 and subsequent releases.
Tuple Based Materialized Windows
The window is configured to maintain a fixed number of tuples — for example, n.
When a query is executed against the materialized window, the window identifies the most recently stored tuple and tests it and the preceding n-1 tuples against the selection criteria. The query retrieves from the materialized window a collection of tuples, selected from the most recently stored n tuples, which meet the selection criteria.
The number of tuples contained in the collection cannot be larger than n and depending on the selection criteria it is possible that the collection could be empty.
Time Based Materialized Windows
The window is configured to store tuples that arrive over a specified duration of time — for example, s seconds.
When a query is executed against the materialized window in StreamBase 3.7.0, the window determines the arrival time of the most recently stored tuple and tests the tuples that arrived during the preceding s seconds against the selection criteria.
When a query is executed against the materialized window in StreamBase 3.7.1 and later releases, the window obtains the current time from operating system and tests the tuples that arrived during the preceding s seconds against the selection criteria.
The number of tuples contained in the collection cannot be predicted from the configuration of the materialized window since it is unknown how many tuples will arrive during the s second period. Depending on the selection criteria it is possible that the collection could be empty.
Field Based Materialized Windows
The window, which is configured to store tuples whose value in a specified field falls within a proscribed range – for example, r – uses the most recently arrived tuple as the anchor point for interpreting dimension specifications. To use this approach, the values in the specified field must be ordered, that is, increasing with each arriving tuple. While the tuple field may be of type integer or double, the range is of type double, which is the type used by the materialized window in evaluating its dimension.
When a query is executed against the materialized window, the window determines the value – v – in the specified field for the most recently stored tuple and tests previously arrived tuples whose field values are greater than v-r against the select criteria. The query retrieves from the materialized window a collection of tuples, selected from the tuples whose field value was within the specified range, which meet the selection criteria.
For example, if tuples with the following field values were submitted to the materialized window – 10, 11, 12, 13, 14, 15, and 16 – and the specified range is 5.0, then the only tuples that could be included in the collection would have field values 12, 13, 14, 15, and 16. If the range were set to 5.1, then the tuple with field value 11 would also be evaluated for inclusion in the collection
The number of tuples contained in the collection cannot be predicted from the configuration of the materialized window since it is unknown how many tuples will fall within the target range. Depending on the selection criteria it is possible that the collection could be empty.
A simple example application is shown in the following figure.

The schema on InputStream1 contains two fields — stock string(4) and value int, and the schema on InputStream2 contains the field symbol string(4).
To create the application, drag the Materialized Window Data Construct icon onto the application diagram. You cannot begin configuring this component until you connect a stream to its input port. The materialized window discovers its schema from the schema of the incoming stream, so you do not need to define a schema as is required when using the Query Table Data Construct.
In the data construct's Properties view, Window Settings tab, select the dimension type (tuple, time, or field), enter a size value (which is interpreted as the number of tuples, period of time, or range of values), and whether the contents of the window will be stored in memory or on disk. If you specify a field based dimension, the associated dropdown list control will include only the integer and double type fields in the tuple being stored.
The following figure shows configuration of a disk based materialized window for which the dimension is based on the contents of the tuple field value with a range setting of 5.1.

If desired, use the controls on the Partition Options tab to partition the stored tuples into separate windows. For example, in a financial services application, a separate window could be created for each stock represented in the incoming stream.
Use the Secondary Index tab to define unordered (hash) or ordered (btree) indexes on any of the fields in the stored tuple. Indexing allows more expressive selection criteria to be written, but has no effect on whether, and for how long, a tuple is stored in the materialized window.

Your application uses a Query operator to read from a materialized window. When the Query operator is connected to a Materialized Window Data Construct, it fine-tunes itself for the task, specifically on its Properties view, Query Settings tab the Operation: dropdown list is disabled and the Read operation is permanently selected. From the Where: dropdown control, select All Rows, one of the available indexes, or Expression. Depending on the type of index (unordered hash or ordered btree), you complete the specification by making entries into the Matches: or Range Specification: control. The Expression option is a new choice that allows you to directly enter the predicate that will be used to select the tuples retrieved from the materialized window, as illustrated in the following figure.

In the preceding figure, the entry in the Lookup expression: text box is the predicate used to select tuples from the materialized window. Note how the names of fields obtained from the incoming tuple must be prefaced with the identifier input even if there is not a similarly named field in the materialized window. You may optionally preface the names of fields in the materialized window with the identifier current. More involved predicates using multiple fields from the tuple or materialized window, and/or the &&, | |, <, =, >, =, and/or != operators are also supported. The fields from the materialized window used in the predicate expression do not need to be indexed.
On the Query operator's Properties view, Output Settings tab, the Output: group now includes three options: all input and tuple fields, explicitly selected fields, and expressions. The first two options are the same as in earlier versions of StreamBase and a collection of tuples containing fields from the input tuple and each tuple selected from the materialized window will be emitted. If you select the third option, you must list the fields in the emitted tuple in the Output Expressions: control, as shown in the following example, where the materialized window contains tuples defined by the schema stock string(4)and value int.

When configured in this way, each incoming tuple will trigger a query and the emitted tuple will contain one field, total, which contains the sum of the values in the value field from all tuples retrieved from the materialized window. The emitted tuple may also include fields from the input tuple.

Two StreamSQL statements are used to define a materialized window. Use the CREATE MATERIALIZED WINDOW statement to describe a materialized window.
CREATE [MEMORY | DISK] MATERIALIZED WINDOW
window_identifier AS
stream_identifier '[' window_specification']';
The single quotation marks indicate that the square braces are a required part of the statement syntax. Note that the statement does not define the schema associated with the window; the window discovers its schema from the schema defined on its incoming stream. The window_specification contains three entries: SIZE; one of the entries TIME, TUPLES, or ON field_identifier; and an optional PARTITION BY field_identifier entry. For the example discussed above, the statement would have the following content.
CREATE MEMORY MATERIALIZED WINDOW
MaterializedWindow1 AS
stream_identifier [SIZE 5.1 ON value];
Indexes are defined through the CREATE INDEX statement.
CREATE INDEX index_identifier ON
window_identifier [USING {HASH | BTREE}]
'('column_idnetifier[,...]')';
The single quotation marks indicate that the parentheses are a required part of the statement syntax. By default, ordered btree indexing is utilized. For the example discussed above, the statement would have the following content.
CREATE INDEX index_identifier ON
Materializedwindow1 USING HASH
(value);
The StreamSQL version of the entire application includes the following statements.
CREATE INPUT STREAM InputStream1 (
stock string(4),
value int
);
CREATE INPUT STREAM InputStream2 (
symbol string(4)
);
CREATE OUTPUT STREAM OutputStream1;
CREATE MEMORY MATERIALIZED WINDOW MaterializedWindow1
AS InputStream1 [SIZE 5.1 ON value];
CREATE INDEX i1 ON MaterializedWindow1
USING HASH (value);
SELECT sum(value) AS total, symbol AS selectedStock
FROM InputStream2 OUTER JOIN MaterializedWindow1
WHERE MaterializedWindow1.stock==InputStream2.symbol
INTO OutputStream1;
The prefixes used in the WHERE clause are not needed if there is no ambiguity between the tuple and window field names, although the WHERE clause is optional and if omitted the collection will include all tuples in the materialized window. Other than the use of aggregate functions in the target list, the syntax of this SELECT statement is identical to the syntax when a tuple is emitted for each entry in the collection returned by the query.
In the FROM clause, the OUTER JOIN keywords are required if it is possible that the query will retrieve an empty collection from the materialized window. This ensures behavior similar to an EventFlow Query operator, which guarantees that at least one tuple will be emitted into the target stream and that downstream processing can continue. If you are certain that a query will never return an empty collection, you may substitute a comma for the OUTER JOIN keywords.
Back to Top ^