Using the Query Operator

This topic explains how the Query operator interacts with shared data in a Query Table, Materialized Window, Chronicle Connection, or JDBC Table data construct, and describes the Query operator's options on its Properties View.

Introduction

The Query operator allows you to read data from, write data to, or delete data from, a Query Table, Materialized Window, or an external data source that is connected through a Chronicle Connection or JDBC Table data construct.

A Query operator can merge data from its associated data construct and one input stream in the application. The operator enables you to query the associated table or data source and emit the result on the operator's output stream. The output can also pass through fields from the input stream. When accessing a Query Table or Materialized Window, a query operation can also be grouped on specific selections, so that separate windows are maintained for each column instead of a single result.

Query Tables may or may not be stored on-disk, depending on how you define them. For a memory-resident table, any data stored at runtime is not saved after the StreamBase Server is shut down. For disk-based Query Tables (Enterprise Edition only) and Materialized Windows, and for all JDBC Tables and Chronicle Connections, the data can persist indefinitely. For example, you could store data for a StreamBase application that runs only during a particular time each day, and requires data from the previous run during initialization. Another possible use of persistent tables is to maintain state information for a StreamBase application.

Associating a Query Operator with a Data Construct

The Query operator displays different settings and tabs, depending on which kind of data construct you connect it to. Most differences stem from the fact that Query Tables are defined and managed within StreamBase, while external data sources are not. So with a Query Table, you use StreamBase Studio controls, tables, and commands to perform read, write, and delete operations. By contrast, with an external data source, you use SQL statements like SELECT, INSERT, UPDATE, and DELETE. The exact SQL syntax required depends on the type of data source that you use.

To associate a Query Operator with a data construct in StreamBase Studio, create an arc connecting the bottom port of the operator and the port on the top of the data construct. Note that each Query operator can be associated with only one data construct. Conversely, a Query Table, Materialized Window, or JDBC Table can be associated with multiple Query operators.

Before you can define Query operator properties, the associated data construct must be set up. For setup information for each type of data construct, see the appropriate topic in the StreamBase Help Authoring Guide.

The remainder of this topic describes the actions you can take on each tab of the Query operator's Properties View.

General Tab

Name: Every application component must have a unique name. Use this field to specify or change the component's name. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Enable Error Output Port: Check this box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.

Description: Optionally, enter a description to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Query Settings Tab

Use the Query Settings tab to set the kind of query operation to be performed, and to select the data that you want to query in the associated data construct. This tab can be edited only after you have connected the Query operator to a valid Chronicle Connection, Query Table, or JDBC Table. The associated table name is shown at the top. In all other respects, the tab's appearance and function are different depending on the associated table's type. For details on using this tab, refer to one of these subsections:

Query Settings with an Associated Query Table

When the operator is connected to a Query Table, use the Operation control to specify the type of query to perform on the associated table: Read, Write, or Delete.

In the Where control, select the data that is to be queried by choosing one of the available options:

Primary Index or Secondary Index

Select data using one of the indices.

  • If the index has been set (in the associated Query Table's Properties view) to use the unordered, no ranges (hash) index method, the Matches table is displayed. The name, data type, and size of each key field from that index are displayed in read-only columns. Enter an expression in the last column to specify what to match on. The result of the expression must be compatible with the key's data type and size. For example, to match on all values, specify the name of the field; to match only on a specific value, specify the value.

  • If the index has been set to use the ordered, with ranges (btree) index type, you can set a range on each field within those primary and secondary keys. See Setting Query Range Operations for details.

All Rows

All data in the table is selected, ignoring primary and secondary keys.

Expression

Enter a lookup expression to select data. For example, stock==input.symbol returns all rows that match input.symbol.

Limit number of output rows

For read and delete operations only, if you chose a Where setting that returns more than one row, you can specify a maximum number of rows that can be returned by your query. To do so, enable the Limit number of output rows option and enter a value. You can use this option to retrieve a list of the top n rows, or to maintain a query table by deleting n top or bottom values. With this limit set, the query ends after the specified number of rows have been returned. If fewer than the specified number of rows can be returned, the query ends without returning values for the missing rows.

The Limit number of output rows option is not available for write operations, or operations where the query returns only one row (for example, using a primary index with only one field).

The contents and use of the other tabs varies depending on your selection (Read, Write, or Delete) in in the Operation field.

Query Settings with an Associated Materialized Window

When the operator is connected to a Materialized Window, the Operation is set to Read. Use Where to specify whether to perform the lookup using All rows or using a Secondary index. The second option is available only if you defined on or more secondary indices in the associated Materialized Window. If you select a secondary index, the fields of that index are displayed at the bottom of the tab. If you select All Rows, a full-table lookup will occur, ignoring any secondary keys.

If you select a secondary index that was set (in the Materialized Window's Properties view) to use the btree index type, you can set a range on each field within the secondary keys. See Setting Query Range Operations for details.

Finally, you can set the Limit number of output rows option. Note that this option has no effect if the query returns only one row.

Query Settings with an Associated Chronicle Connection or JDBC Table

This section describes the query settings available when the Query operator is connected to a Chronicle Connection or JDBC Table.

Associated Connection

A read-only field that identifies the data source connected to the associated data construct.

Options

By default, if a SELECT statement in the SQL statement returns no values, no tuple is output. To change this behavior, enable the option to Output null tuple on empty Result set. That is, the output tuple will contain the reserved value null.

Note

This option is currently only available for EventFlow applications. There is no analogous option in StreamSQL to change the default behavior.

SQL statement.

Query operations on the associated table are performed using the external database's native SQL variant, not StreamBase functions. Use this field to specify the operations you need to perform, including selecting, inserting, deleting, updating values, and calling stored procedures. Enter a single SQL statement in the SQL statement field.

Notes

  • By default, the Query operator includes the result of your SQL statement in its output. To omit some or all JDBC table query results, use the Result Settings tab.

  • The syntax of your SQL query in the SQL Statement field depends on the type of external data source you use, and is not documented in StreamBase Help. For example, if your external data source is an Oracle server, use Oracle PL/SQL syntax.

  • If your external data source supports stored procedures, you can create, modify, and call stored procedures in the SQL Statement field. Again, use the syntax of your external data source when dealing with stored procedures.

  • Use braces to delimit StreamBase expressions in a SQL statement. For example, the following statement inserts three fields from the Query operator's schema into the associated table:

    INSERT INTO Bikes values ({id}, {make},{model})

  • StreamBase Studio provides some assistance as you work. Any time StreamBase Studio performs a typecheck, the contents of your SQL Statement field is sent to the external data source. If your code contains SQL syntax errors, the data source server may return an error to StreamBase. If automatic typechecking is enabled, such errors are passed through to the Typecheck Errors view. On the EventFlow canvas, the background color of the Query operator also indicates a typecheck error.

Setting Query Range Operations

This section explains how to set up query ranges for a Query operator.

When you choose a primary or secondary index in the Query Settings tab of a Query operator's Properties View, the index fields are displayed in the tab. If the index has been set (in the Properties view of the associated Query Table) to use the btree index type, the display is titled Range Specification, and you can set a range on the displayed fields. The range specification is evaluated using the input tuple against the data stored in the query table. Note that the result of the query is the and-ing of all the field range specifications. There is no support for or-ing fields.

Set a range for each field. You must enter an operation and an expression for each field in the index, as follows:

  1. Choose a lower bound operation:

    = (equality)
    > (greater than)
    (greater than or equal)
    begin (unbounded lower range)
  2. Enter a lower bound expression. Range expressions can be constants, input tuple fields, or dynamic variables.

    • If your lower bound operation is equality (=), an upper bound is not needed, and its column is not editable.

    • If your lower bound operation is begin or your upper bound operation is end, the associated expression column is not editable.

  3. Choose an upper bound operation:

    < (less than)
    (less than or equal)
    end (unbounded upper range)
  4. Enter an upper bound expression.

The following figure illustrates the types of query ranges you can set in StreamBase Studio. Each index field is displayed in a row in the Range Specification grid, and uses a different combination of operations and expressions.

Operation Settings Tab

The Operation Settings tab is displayed only when the operator is connected to a Query Table or Materialized Window; it is not shown when connected to a Chronicle Connection or JDBC Table. The tab specifies operations to perform during the query. The operations available depend on the type of query you selected in the Query Settings tab:

Read and Delete Operations

For read and delete operations, use this tab to specify what the operator should output if no rows are returned by the query. Choose one of the following options:

output nulls

A tuple is output, but all fields in the tuple are null. This is set by default.

output nothing

No output tuple is emitted when the query returns no rows.

output the following

Allows you to specify what should be output. For each table field, enter an expression in the Expression column, or accept the default null() value that may have been autofilled.

Notes

This setting does not affect the output when you limit the number of output rows (in the Query Settings tab) and the query returns fewer than the specified number of rows. In that case, no values are returned for the missing rows.

If you specify an aggregate expression in the Output Settings tab, the expression is applied to the "no match" value before the output is generated.

Write Operations

For write operations, use the Operation Settings tab to define what is written to the associate table,

  1. Choose the Type of write: either Insert or Update.

    The Values to Insert or Values to Update grid is displayed.

    The grid is preloaded with all the rows in the associated table (except the field selected as the lookup key in the Query Settings tab) similar to this:

    Field Name Type Size Expression
    Name string 30 Name
    Price double 8 Price

    Notice that the default expressions are set to use the value of each input stream field. If this is what you intend, you do not need to modify the grid.

  2. You can modify the expressions in one or both of these ways:

    • Manually edit Expressions to change or manipulate input field values. For example:

      Field Name Type Size Expression
      Name string 30 Name+"_contrib"
      Price double 8 Price*2
    • Click the Smart Fill button to add a prefix to all the default expressions. The prefix denotes the operator's input stream, and can avoid ambiguity when input stream and associated table fields have the same names[1]. For example, the preceding grid would be changed to something like the following. The references to the input.Name and input.Price fields cannot be confused with the Name and Price fields in the table:

      Field Name Type Size Expression
      Name string 30 input.Name
      Price double 8 input.Price
  3. Optionally, change the default behavior that occurs if the write operation fails: Normally, a row that you want to insert should not already exist in the table. Conversely, a row that you want to update should exist. By default, no value is returned in these cases.

    The label for this option depends on the type of write operation:

    • If insert fails because an existing row was found

    • If update fails because no row was found

    The following options are available:

    Ignore

    No value is returned for the row. This is the default behavior.

    Insert (or Update) new row using values above

    The Values to Insert table is used.

    Insert (or Update) new row using new values

    Displays the Values to Insert (or Update) grid. Specify expressions (manually or using the Smart Fill button) to update each field with.

    Output these values instead

    Displays the Values to Output table. Instead of modifying values from the associated Query Table, this table allows you to specify values directly. For example, you might want to output a message string, a specific numeric value, or a null value.

Output Settings Tab

Use the Output Settings tab to select and name the fields to be output from the Query operator. Output fields can be based on data from both the input schema and the associated Query Table. (This tab is not available when the operator is connected to a JDBC Table or Chronicle Connection.)

  1. If you are performing a read operation (set in the Query Settings tab) with a primary or secondary key that uses btree indexing, you can use the Order control to set the order of the read operations.

    If no btree index is used, this control is labeled, "Not applicable": go on to the next step.

    To specify the read order:

    • If you do not want the read operation to sort tuples, click the None control. This causes their output order to be undefined.

    • To set an order, click the By control. Then:

      1. Choose the index whose order you want to set. If more than one index is available, use the drop-down control to choose one. If only one index is available, no choice is required.

      2. Choose either ascending (the default) or descending order in the drop-down list.

  2. Specify output fields using one of these Output options:

    all input and table fields

    Automatically passes all fields from the input stream and the associated Query Table to the output stream. See Using the All input and table fields Option for a description of the tab in this mode.

    explicitly specified fields

    Requires you to specify output fields manually (see Using the Explicitly selected fields Option for a description of the tab in this mode):

    1. Select each field that you want to include in the output; deselect fields that you want to omit. You can also use the Check/Uncheck All button to change your selections.

    2. Optionally, modify the output field names.

    expressions

    Available only for read queries. Does not specify input fields directly. Instead, specifies only expressions for output fields. An expression can consist of field names and simple or aggregate functions. See Using the Expressions Option for a description of the tab in this mode.

    The subsections that follow document each of the Output options in detail.

Using the All input and table fields Option

When you choose the all input and table fields option, the tab displays editable fields indicating the prefixes that will be added to the input and table fields to derive the output field names. The prefixes are designed to avoid duplicate names among fields from different sources. You can change the prefixes by editing the fields in this tab.

The following example shows the prefix fields available for an update:

The prefixes are all editable. The options and default prefix values vary slightly for each of the query types:

Include input, prefixing names with

Check this field to pass all fields from the input stream through the query operator. The prefix in the box is automatically added to these field names in the output stream.

Output table fields

This group of boxes controls the prefixes that are added, in output tuples, to the names of all fields in the result that are from the associated table:

Include {new | old | current} fields from table, prefixing names with

Check this option to mark fields in the table that your operation selects. The default prefix (and the text in the label) varies with the type of query.

Include old fields from table, prefixing names with

For update operations only, check this option to mark fields in the table that are not selected. This enables you to distinguish those fields from updated fields (if you used the preceding prefix). This option is not available for other query types.

Additional prefix for all table fields

The prefix in this box is added to all table fields, in addition to any other prefixes you chose in the preceding options.

Using the Explicitly selected fields Option

When you choose the explicitly selected fields option, the view lists each available field from the input stream and Query Table, and supplies an editable output field name for each one.

Quickly change prefixes for all input field names below

The prefix in this field is automatically added (in the output) to the names of all field names from the input stream. You can modify the default input_ prefix.

Fields Available from Input Stream

Use this grid to determine which fields from the input stream to pass through the Query operator on the output stream. Each row in the Field column is preloaded with a field from the input stream, while the Output Field Name column shows the name of the corresponding field in the output. Check or uncheck the box in each row to include or omit the field in the output, or the click Check/Uncheck All button to toggle your selection for all rows.

In the Field column, each field name from the table is qualified by default with the conventional input. prefix that StreamBase components use internally to denote input streams. For example, a field named Symbol in the input stream schema will be identified as input.Symbol here. However, values in this column are editable, so you can change the name if you need to.

Quickly change prefixes for all table field names

This box provides a prefix that will be automatically added to the names of every table field that is included in the output. The prefix is editable here.

Change all { Current | New | Old } field names

This box provides an additional prefix that will be added to all table fields that are included in the output. The label and default prefix that appears in the tab varies according to the type of query operation you are performing — one of the following:

Query Operation Type Prefix Label Default value
Read Current Table Prefix current_
Delete New Table Prefix new_
Write Old Table Prefix old_
Fields Available from Data Construct

Use this grid to control which fields from the associated Query table are passed through on the output stream. As in the other grid on this tab, this grid can help you avoid name collisions between field names.

  • In the Field column, each field name from the table is qualified with a prefix that indicates the query operation type: The following prefix conventions are used:

    Query Operation Type Field Column Format
    Read current.field
    Delete old.field
    Write new.field
  • The Output Field Name column shows the names of corresponding fields in the output. Here, the names are qualified with the combination of prefixes specified in the two Prefix fields above the grid. Some sample values are shown below for different query types (assumng the default Prefix fields):

    Query Operation Type Input Field Expression Output Field Name
    Read current.Id table_current.Id
    Delete old.Symbol table_old.Symbol
    Write new.Price table_new.Price

Using the Expressions Option

Choose the expressions option to define output fields using expressions. The view initially displays an empty table. Use the Add or Pass All button to add a row for each field. In each row enter:

Output Field Name

The name you want to appear in the output stream of the operator for this field.

Expression

An expression that resolves to the value of the output field. Expressions can include fields from the input stream or the associated table, constants, and simple functions. In some cases they can also contain aggregate functions.

Aggregate functions are used on sets of data to return a single result, and the set of data returned from the query is joined with the fields from the input stream. For aggregate expressions, this means that the size of the result that is passed to the aggregate function is always 1 or greater.

If there is no match from the table and the aggregate result set size is exactly 1, the input stream fields are joined with the "no match" values specified in the Operation Settings Tab tab.

The example below includes several aggregate expressions. Let us assume that this operator is also grouped by volume. This means that the first expression calculates an average of prices in each group; the second sums the number of tuples in each group; and the third calculates a cost in each group.

Note

We can use volume in the aggregate expression because the output is grouped by volume, which will therefore be constant for all values within a group.

For details about aggregate expressions and available functions, see the StreamBase Expression Language and Functions topic.

Pass Input Tab

When the Query operator is connected to a Chronicle Connection or JDBC Table, you can configure this tab to include fields from the input schema in the output. The fields will be added after the Chronicle or JDBC data in the output tuples. This tab is not available when the operator is connected to a Query Table.

Specify fields from the input stream using one of the two Output options:

all input fields

Automatically pass all fields from the input stream to the output stream. An additional option, Prefix for input values, adds a prefix to every input field name in the generated output fields. This can be useful for avoiding name collisions between fields from the input stream and the Chronicle or JDBC data.

explicitly specified input fields
  1. Click the Add button for each output field that you want (click the Remove button to delete one).

  2. In each row, specify a value in the Output Field Name column.

  3. Specify an expression for each field. The expression can be as simple as the name of an input field, or a more complex, computed expression involving StreamBase functions or custom functions.

To output no fields from the input stream, choose explicitly specified fields, but do not add any table rows.

Result Settings Tab

This tab is available only when the Query operator is connected to a JDBC Table. Use it to control which fields in the result of your SQL query (in the Query Settings tab) is included in the output. Choose one of these options:

Result Set From Query

Includes all fields in the result set of your SQL query. This is the default behavior.

Ignore Any Results From Query

No fields from the result set of your SQL query are included in the output.

Explicitly Specify Result Set Schema

Includes only the result set fields you specify. When you select this option, the Fields table is displayed. Edit the table to specify the result set fields.

For example, you might click the Copy Schema button, load all fields from your JDBC Table schema, and then delete fields you want to omit from the table.

Group Options Tab

This tab is only displayed when the Query operator is connected to a Query Table or Materialized Window; it is absent for a JDBC Table or Chronicle Connection. You can only use it for read operations.

Use this tab to optionally create one or more groups for a read query: each group is keyed to a specific field. For example, if you create a single group on a field named id, a separate window will be created for each unique value of id).

Click the Add button to add a row end enter the fields you want to group by. In each row:

  • In the Output Field Name column, enter the name you want to appear in the output stream of the operator for this field.

  • In the Expression column, enter the input field that you want to group on.

In the following example, the output will be subdivided into groups by unique combinations of volume and stock.

Notes

  • If you set a group, you must also select the expression option in the Query Output tab.

  • A group that you define here can also be referenced in an expression in the Output Settings Tab tab.

  • The Group Options Tab cannot be used when you select write or delete query operations in the Query Settings tab.

Dynamic Variables Tab

The Dynamic Variables tab allows you to define variables for this operator that can then be used in one of its expressions. A dynamic variable can be updated by any input stream or output stream in your application. For more information, see Using Dynamic Variables.

Concurrency Tab for Chronicle Connection and JDBC Table Query

In the Query operator, the Concurrency tab appears only when the operator is connected to a JDBC Table or Chronicle Connection data construct. The tab is absent with other connections.

Run this component in a separate thread

This option causes the server to process the component's requests concurrently with other processing in the application. You can distribute the processing of the threads automatically across multiple processors on an SMP machine.

If this is a compute-intensive component and you know that it can run without data dependencies on other components in the StreamBase application, you may be able to improve performance by enabling this option.

Caution

These features are not suitable for every application. For details, see Execution Order, Concurrency, and Parallelism. It includes important guidelines for the use of these features.

Run in parallel threads

If you checked the first option, you can also choose this option, which causes the server to run multiple instances of this component. That is, each instance runs in its own thread. At run time, tuples are dispatched to particular instances based on the Key Expression value (which must evaluate to an int).

If you have multiple Query operators connected to the same JDBC Table, parallel operations may also enable you to avoid database deadlocks. Please read the details in the Using JDBC Data Sources in the Authoring Guide.

Null Values

If the key field for a tuple record being written to a Query Table is null, the tuple is stored. In a Query Table with a sorted (btree) index, the null-keyed stored records are evaluated as less (in value) than other non-null records. On a subsequent read (lookup) operation, the null-keyed tuples can be located. For more information, see the topic, Using Nulls in the StreamBase Help.



[1] For details about Smart Fill, see the Button Reference section in Studio Tips and Tricks.