Using the Materialized Window Data Construct

A materialized window is a managed view of tuples passing through an input stream. The view can be based on a fixed number of tuples, a time interval, or a field value. It can also be partitioned into multiple windows. You can then directly query the data using one or more (read-only) Query operators.

How a Materialized Window Data Construct Interprets Dimension Specifications

A materialized window uses the most recently arrived tuple as the anchor point for interpreting dimension specifications. From the most recently arrived tuple, the materialized window examines previously stored tuples to determine if they fall within the specified dimension.

Tuple Based Materialized Windows

The window is configured to maintain a fixed number of tuples.

When a query is executed against the materialized window, the window identifies the most recently stored tuple and tests it and the preceding n-1 tuples against the selection criteria. The query retrieves from the materialized window a collection of tuples, selected from the most recently stored n tuples, which meet the selection criteria.

The collection can never contain more than n tuples and, depending on the selection criteria, it may be empty.
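The behavior described above can be pictured with a small Python model. This is only an illustrative sketch, not StreamBase code: the class name `TupleWindow`, the `price` field, and the predicate-style query are hypothetical stand-ins for a materialized window and a Query operator's selection criteria.

```python
from collections import deque


class TupleWindow:
    """Toy model of a tuple-based window that keeps at most n tuples."""

    def __init__(self, n):
        # A deque with maxlen=n discards the oldest tuple once n is reached.
        self.rows = deque(maxlen=n)

    def insert(self, row):
        self.rows.append(row)

    def query(self, predicate):
        # Test the most recent tuple and the preceding n-1 tuples.
        return [row for row in self.rows if predicate(row)]


window = TupleWindow(n=3)
for price in [10, 20, 30, 40]:
    window.insert({"price": price})

over_25 = window.query(lambda row: row["price"] > 25)
over_100 = window.query(lambda row: row["price"] > 100)
```

Here `over_25` holds at most three tuples because only the last three are stored, and `over_100` is empty because no stored tuple meets the criteria.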

Time Based Materialized Windows

The window is configured to store tuples that arrive over a specified duration of time.

When a query is executed against the materialized window, the window determines the arrival time of the most recently stored tuple and tests it and all tuples that arrived during the preceding s seconds against the selection criteria. The query retrieves from the materialized window a collection of tuples, selected from the most recently arrived tuple and the tuples arriving during the preceding s seconds, which meet the selection criteria.

The number of tuples contained in the collection cannot be predicted from the configuration of the materialized window since it is unknown how many tuples will arrive during the s second period. Depending on the selection criteria it is possible that the collection could be empty.
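As a rough illustration of the time-based case, the following Python sketch anchors the window at the arrival time of the most recent tuple. It is not StreamBase code: `TimeWindow` and its methods are hypothetical, arrival times are passed in explicitly as seconds, and treating the lower boundary as inclusive is an assumption, since the text above does not say how a tuple exactly s seconds old is handled.

```python
class TimeWindow:
    """Toy model of a time-based window covering the last s seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.rows = []  # (arrival_time, row) pairs in arrival order

    def insert(self, arrival_time, row):
        self.rows.append((arrival_time, row))

    def query(self, predicate):
        if not self.rows:
            return []
        # Anchor on the most recently stored tuple, then look back s seconds.
        latest = self.rows[-1][0]
        cutoff = latest - self.seconds  # assumption: boundary is inclusive
        return [row for t, row in self.rows if t >= cutoff and predicate(row)]


window = TimeWindow(seconds=5)
for t, name in [(0.0, "a"), (4.0, "b"), (9.0, "c"), (10.0, "d")]:
    window.insert(t, {"id": name})

recent = window.query(lambda row: True)
only_a = window.query(lambda row: row["id"] == "a")
```

How many tuples land in `recent` depends entirely on the arrival times, which is why the size of the result cannot be predicted from the window configuration.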

Field Based Materialized Windows

The window is configured to store tuples whose value in a specified field falls within a certain range. To use this approach, the values in the specified field must be ordered such that they increase with each arriving tuple. While the tuple field may be of type integer or double, the range is of type double, which is the type used by the materialized window in evaluating its dimension.

When a query is executed against the materialized window, the window determines the value v in the specified field for the most recently stored tuple. Given a range r, it then tests the tuples whose field values are greater than v - r against the selection criteria. The query retrieves from the materialized window a collection of tuples, selected from the tuples whose field value was within the specified range, which meet the selection criteria.

For example, if tuples with the following field values were submitted to the materialized window: 10, 11, 12, 13, 14, 15, and 16, and the specified range is 5.0, then the only tuples that could be included in the collection would have field values 12, 13, 14, 15, and 16. If the range were set to 5.1, then the tuple with field value 11 would also be evaluated for inclusion in the collection.

The number of tuples contained in the collection cannot be predicted from the configuration of the materialized window since it is unknown how many tuples will fall within the target range. Depending on the selection criteria it is possible that the collection could be empty.
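The worked example above can be checked with a few lines of Python. This is a sketch only: the function name `field_window_candidates` is hypothetical, and the strict "greater than" lower bound (latest value minus range) is inferred from the example, in which a range of 5.0 excludes the value 11.

```python
def field_window_candidates(values, size):
    """Return the field values that fall inside the window's range.

    values: field values in arrival order (assumed to be increasing).
    size:   the range, converted to float as the window evaluates it.
    """
    if not values:
        return []
    latest = values[-1]
    lower_bound = latest - float(size)
    # Strictly greater than the lower bound, matching the example above.
    return [v for v in values if v > lower_bound]


values = [10, 11, 12, 13, 14, 15, 16]
with_range_5_0 = field_window_candidates(values, 5.0)
with_range_5_1 = field_window_candidates(values, 5.1)
```

With a range of 5.0 the lower bound is 11.0, so 11 itself is excluded. With 5.1 the lower bound drops just below 11, so 11 becomes eligible.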

Using a Materialized Window in an EventFlow Application

The rest of this topic describes how to use a materialized window in an EventFlow application. To use materialized windows with a StreamSQL application, see the CREATE MATERIALIZED WINDOW Statement in the StreamSQL Guide.

In an EventFlow application, an input stream feeds a Materialized Window data construct, which in turn is connected to one or more Query operators. Each Query operator reads the data in the materialized window and can pass through or manipulate the data in different ways, merging the result with tuples from its own input stream. Note that a materialized window can be associated with multiple Query operators, but each Query operator can be associated with only one materialized window.

To use a materialized window in an EventFlow application:

  1. In an EventFlow, drag a Materialized Window icon from the Palette view to your canvas, creating a new data construct.

  2. Open the Properties view of your Materialized Window data construct.

  3. Name the component by editing the General Tab.

  4. Choose the window type by editing the Window Settings Tab.

  5. Optionally, partition the data by editing the Partitions Options Tab.

  6. Optionally, index the data by editing the Secondary Indices Tab.

  7. Create the Query operator or operators that will be associated with the materialized window. For each one, drag a Query operator icon from the Palette view to the canvas.

  8. Connect the Query operator or operators to the Materialized Window data construct.

  9. Edit each Query operator's properties as described in Using the Query Operator.

General Tab

Name: Every application component must have a unique name. Use this field to specify or change the component's name. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Description: Optionally, enter a description to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Window Settings Tab

In the Window Settings tab:

  1. Choose the window type and specify its boundary value:

    • Tuple: The window (or each partition in the window) will contain a fixed number of tuples. (For more information about partitions, read the section on the Partitions Options Tab.)

      Enter the number of tuples in the Size field. The window will contain up to the specified number of tuples. When an arriving tuple would cause the window to exceed the size, the window closes. A new window opens with the arriving tuple as its first member.

    • Time: The window will include tuples that arrive during a specified time.

      Enter the interval in seconds. After that interval, the next arriving tuple causes the window to close. A new window opens with the arriving tuple as its first member.

    • Field: The window will include tuples whose key fields have values within a certain range of each other. Note that this option is only available if the input stream schema contains a field of numeric data type: int, long, double, or timestamp (where the units are seconds).

      Click the drop-down control and choose the field on which you want to base the window. Then in the Size field, specify the size of the range of values.

      As each tuple arrives, its key field is compared to the same field in existing tuples. If the values of all the key fields are within the specified range, the tuple is added to the materialized window. If the new tuple causes the values in all the key fields to exceed the range, the new tuple is added but one or more existing tuples are flushed from the window, so that the key fields of the remaining tuples do not exceed the range.

      For example, suppose you define a materialized window with Field set to Shares and Size set to 20:

      1. Two tuples arrive with Shares values of 5 and 15. At this point the window includes both tuples because their key field values are within 20 of each other.

      2. Next, suppose a tuple arrives whose Shares field is 25. The window stores the latest tuple. Now the key value 5 is out of the specified range, so its tuple is flushed. The remaining tuples have Shares of 15 and 25, which are within the range.

      3. Finally, a tuple arrives with a Shares value of 57. Now, after storing this tuple, the window flushes all the others, because none of their key fields are within the specified range from the new tuple.

  2. Specify the storage type: either In memory or On disk.
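The Shares example for the Field window type can be traced with a short Python model. This is an illustrative sketch, not StreamBase code: the `FieldWindow` class is hypothetical, it assumes key values arrive in increasing order, and it assumes a tuple is flushed when its key is not strictly greater than the newest key minus the Size, which is what makes the value 5 drop out when 25 arrives.

```python
class FieldWindow:
    """Toy model of a field-based window that flushes out-of-range tuples."""

    def __init__(self, field, size):
        self.field = field
        self.size = float(size)
        self.rows = []

    def insert(self, row):
        # The new tuple is always stored; older tuples may be flushed.
        self.rows.append(row)
        lower_bound = row[self.field] - self.size
        self.rows = [r for r in self.rows if r[self.field] > lower_bound]

    def keys(self):
        return [r[self.field] for r in self.rows]


window = FieldWindow(field="Shares", size=20)
history = []
for shares in [5, 15, 25, 57]:
    window.insert({"Shares": shares})
    history.append(window.keys())
```

After the four inserts, `history` records the window contents at each step: both of the first two tuples are kept, 5 is flushed when 25 arrives, and only 57 remains at the end.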

Partitions Options Tab

For tuple-based materialized windows only, you have the option of creating partitions based on key fields. A partition is created for each distinct value of the field that you specify. Each partition can contain up to the number of tuples set in the Size field of the Window Settings tab.

For example, if you partition based on a field named Symbol, a partition is created for each value of Symbol that arrives on the input stream: thus, you might have separate partitions IBM, CTXS, and INTC. If the window size was set to 20, each partition can contain up to 20 tuples.

Partitions are optional. Without partitions, all tuples are treated as a single group. Note that you cannot set partitions (and the Partitions Options tab is not usable) if you set the Time or Field type in the Window Settings tab.

To specify a partition, use the controls to move one or more fields from the Available Fields column to the Selected Fields column. If you add multiple fields, the partitions are based on combinations of those fields. For example, if you add two fields, ID and Class, a partition is created for each combination of ID and Class that arrives on the input stream.
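To make the partitioning rules concrete, here is a small Python sketch. It is not StreamBase code: `PartitionedTupleWindow` is a hypothetical class, and dropping the oldest tuple when a partition is full is an assumption about how each partition stays within the configured Size.

```python
from collections import defaultdict, deque


class PartitionedTupleWindow:
    """Toy model: one bounded tuple window per partition key."""

    def __init__(self, size, partition_fields):
        self.partition_fields = tuple(partition_fields)
        # Each new key combination gets its own window of up to `size` tuples.
        self.partitions = defaultdict(lambda: deque(maxlen=size))

    def insert(self, row):
        key = tuple(row[f] for f in self.partition_fields)
        self.partitions[key].append(row)


# Single partition field: one partition per Symbol value.
by_symbol = PartitionedTupleWindow(size=2, partition_fields=["Symbol"])
for symbol, price in [("IBM", 1), ("IBM", 2), ("CTXS", 3), ("IBM", 4)]:
    by_symbol.insert({"Symbol": symbol, "Price": price})

# Two partition fields: one partition per (ID, Class) combination.
by_id_class = PartitionedTupleWindow(size=2, partition_fields=["ID", "Class"])
for id_, cls in [(1, "A"), (1, "B"), (2, "A"), (1, "A")]:
    by_id_class.insert({"ID": id_, "Class": cls})

ibm_prices = [r["Price"] for r in by_symbol.partitions[("IBM",)]]
combinations = sorted(by_id_class.partitions)
```

The IBM partition holds only its two most recent tuples, independently of CTXS, and the two-field window ends up with three partitions, one for each distinct combination seen.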

Secondary Indices Tab

A materialized window has a primary index that is created and managed automatically by StreamBase. You can optionally list one or more secondary indices in the Secondary Indices tab, specifying key fields to use when looking up values. Note that if you have partitioned the materialized window, indexing occurs within each partition.

Secondary indices can speed up queries. For example, if you have defined a partition, you can also create a secondary index based on the partition field. Later, when you configure a Query operator, you can set your query to read the secondary index instead of scanning all the rows. To define a secondary index:

  1. Click Add to display the Edit Secondary Index dialog.

  2. In the Available Fields list, double-click each field that you want to add to the index. (Alternatively, use the arrow buttons.)

  3. Click the Add to Index List button.

    You can also choose how keys are indexed for table read operations by using the Index Type control:

    Unordered, no ranges (hash)

    Keys are unsorted and are evenly distributed (hashed) across the index. A hash index is used for accessing keys based on equality, and is generally best for simple lookups.

    Ordered, with ranges (btree)

    Keys are sorted. A btree index is used when output ordering and range queries are desired. Note that the sort depends on the order of the fields in the index keys.

    The relative performance of hash and btree methods depends on many factors, including the distribution of keys in your dataset: we recommend trying both methods if you are in doubt which to use. Also consider that StreamBase Studio allows you to specify a key range and sort order using btrees, but not using hash access.
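The difference between the two index types can be sketched in Python. This is an analogy rather than a description of StreamBase internals: a `dict` stands in for the hash index, and a sorted list searched with `bisect` stands in for the btree. The sample rows and the names `hash_index`, `ordered_index`, and `range_lookup` are hypothetical.

```python
import bisect

rows = [
    {"Symbol": "IBM", "Price": 120.0},
    {"Symbol": "INTC", "Price": 35.0},
    {"Symbol": "IBM", "Price": 125.0},
    {"Symbol": "CTXS", "Price": 90.0},
]

# Hash-style index: fast equality lookups, no ordering, no ranges.
hash_index = {}
for row in rows:
    hash_index.setdefault(row["Symbol"], []).append(row)

# Ordered index: keys kept sorted, so range queries and ordered
# output are possible.
ordered_index = sorted((row["Price"], i) for i, row in enumerate(rows))
ordered_keys = [key for key, _ in ordered_index]


def range_lookup(low, high):
    """Return rows with low <= Price <= high, in key order."""
    start = bisect.bisect_left(ordered_keys, low)
    stop = bisect.bisect_right(ordered_keys, high)
    return [rows[i] for _, i in ordered_index[start:stop]]


ibm_rows = hash_index.get("IBM", [])
mid_priced = range_lookup(80, 121)
```

The equality lookup on Symbol needs no ordering at all, while the price range query relies on the keys being sorted, which mirrors why key ranges and sort order are available only with the btree option.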