This topic describes the actions you can take on each tab of the Union operator's Properties view.
A Union operator accepts two or more input streams and produces one output stream, emitting tuples in the order in which they arrive. The Union operator is not order sensitive: its input tuples do not need to arrive in any particular order. If you need the output ordered on a field value, such as a timestamp, consider using a Merge operator instead.
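To make the difference concrete, here is a plain-Java sketch (not the StreamBase API) that contrasts the two behaviors. The `Event` record and its `time` field are hypothetical. The union side simply keeps arrival order. The merge side is a classic two-way merge that assumes each input is already sorted on `time`, which is offered only as an analogy for ordering on a field.

```java
import java.util.ArrayList;
import java.util.List;

class UnionVsMerge {
    // Union-style: output is exactly the arrival order; field values are ignored.
    static List<Event> union(List<Event> arrivals) {
        return new ArrayList<>(arrivals);
    }

    // Merge-style: assuming each input is already sorted on `time`,
    // repeatedly emit the smaller head of the two inputs (a two-way merge).
    static List<Event> mergeByTime(List<Event> a, List<Event> b) {
        List<Event> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).time() <= b.get(j).time()) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> s1 = List.of(new Event("s1", 1), new Event("s1", 4));
        List<Event> s2 = List.of(new Event("s2", 2), new Event("s2", 3));
        // Suppose both s1 tuples happen to arrive before either s2 tuple.
        List<Event> arrivals = List.of(s1.get(0), s1.get(1), s2.get(0), s2.get(1));
        System.out.println("union: " + union(arrivals));     // times 1, 4, 2, 3
        System.out.println("merge: " + mergeByTime(s1, s2)); // times 1, 2, 3, 4
    }
}

// Hypothetical tuple: `time` stands in for whatever field a Merge would order on.
record Event(String stream, int time) {}
```

The union output depends only on when tuples arrive, while the merge output depends only on the field values.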
Suppose that you have installed StreamBase in a store with 20 checkout lines. After each sale, each cash register emits a tuple onto its own stream containing the sale amount, the number of items, and the cashier's name. Your StreamBase application must take the data from all of these streams and sum the sales so that the store can track total sales for every hour of each day. A Union operator combines all of the cash register streams into a single stream, so that one Aggregate operator can sum the sales from every register.
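The union-then-aggregate pattern can be sketched in plain Java (not the StreamBase API). The `Sale` record mirrors the fields named above, plus a hypothetical `hour` field standing in for whatever time information the application uses to bucket sales by hour. Because summation does not depend on order, concatenating the register streams is enough for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CheckoutTotals {
    // Union step: funnel every register's stream into one stream.
    // Summation ignores order, so simple concatenation is enough for this sketch.
    static List<Sale> union(List<List<Sale>> registerStreams) {
        List<Sale> combined = new ArrayList<>();
        for (List<Sale> stream : registerStreams) {
            combined.addAll(stream);
        }
        return combined;
    }

    // Aggregate step: one aggregation over the unioned stream sums amounts per hour.
    static Map<Integer, Double> totalsByHour(List<Sale> sales) {
        Map<Integer, Double> totals = new TreeMap<>();
        for (Sale s : sales) {
            totals.merge(s.hour(), s.amount(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> register1 = List.of(new Sale("Ana", 10.0, 2, 9), new Sale("Ana", 4.5, 1, 10));
        List<Sale> register2 = List.of(new Sale("Ben", 5.5, 3, 9));
        System.out.println(totalsByHour(union(List.of(register1, register2)))); // {9=15.5, 10=4.5}
    }
}

// Tuple emitted after each sale. `hour` is a hypothetical field standing in for
// the time information used to bucket sales by hour.
record Sale(String cashier, double amount, int items, int hour) {}
```

Without the union step, each of the 20 register streams would need its own aggregation, and the partial totals would have to be combined afterward.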
In an EventFlow, you can choose between two levels of schema compatibility, which produce the following two types of union:
- Loose union
By default, schemas can have different fields, as long as all fields with the same name have the same type. When an input tuple on one stream is missing fields that exist in another stream, the union fills those fields with null values.
For example, you can perform a loose union of two input streams whose schemas both contain a string field named Symbol, where only one of the two schemas also contains a double field named Price.
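The output schema contains both Symbol and Price. When a tuple arrives on the port whose schema lacks Price, the corresponding output tuple carries that tuple's Symbol value and a null Price.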
- Strict union
In a strict union, the schemas must have the same fields, string fields must have the same lengths, and the fields must be in the same order. Typechecking fails if the input streams do not have equivalent schemas. Strict unions are available only in EventFlows; in the StreamSQL language, all unions are loose.
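The loose-union rules can be sketched in plain Java (not the StreamBase API). Schemas are modeled as hypothetical name-to-type maps and tuples as name-to-value maps. The output schema collects every field, a name reused with a different type is rejected, and tuples are widened with nulls. Placing Price on the second stream follows the Symbol/Price example above but is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LooseUnion {
    // Output schema: every field from every input, in first-seen order.
    // The same name with a different type is rejected, mirroring the typecheck rule.
    static LinkedHashMap<String, String> outputSchema(List<Map<String, String>> inputs) {
        LinkedHashMap<String, String> out = new LinkedHashMap<>();
        for (Map<String, String> schema : inputs) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                String previous = out.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    throw new IllegalArgumentException("type conflict on field " + field.getKey());
                }
            }
        }
        return out;
    }

    // Widen one input tuple to the output schema; absent fields become null.
    static Map<String, Object> widen(Map<String, ?> tuple, Map<String, String> schema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : schema.keySet()) {
            out.put(name, tuple.get(name)); // get() yields null for a missing field
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> stream1 = new LinkedHashMap<>();
        stream1.put("Symbol", "string");
        Map<String, String> stream2 = new LinkedHashMap<>();
        stream2.put("Symbol", "string");
        stream2.put("Price", "double");
        LinkedHashMap<String, String> schema = outputSchema(List.of(stream1, stream2));
        System.out.println(schema);                                 // {Symbol=string, Price=double}
        System.out.println(widen(Map.of("Symbol", "IBM"), schema)); // {Symbol=IBM, Price=null}
    }
}
```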
Tip
One way to make schemas equivalent is to place a Map operator before the Union operator to reorder fields or change string lengths.
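The strict-union check and the Map-based fix can be sketched in plain Java (not the StreamBase API). This follows the strict-union definition as stated above, which requires the same fields, the same types and string lengths, and the same order. The `Field` record is hypothetical and folds the string length into the type name, such as `string(8)`. The `reorderLike` helper is a hypothetical stand-in for what an upstream Map operator would do.

```java
import java.util.ArrayList;
import java.util.List;

class StrictUnionCheck {
    // Strict rule as described: same fields, same types (and string lengths), same order.
    static boolean equivalent(List<Field> a, List<Field> b) {
        return a.equals(b);
    }

    // Stand-in for an upstream Map operator: emit the input's fields in the target's order.
    static List<Field> reorderLike(List<Field> input, List<Field> target) {
        List<Field> out = new ArrayList<>();
        for (Field wanted : target) {
            for (Field f : input) {
                if (f.name().equals(wanted.name())) {
                    out.add(f);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> s1 = List.of(new Field("Symbol", "string(8)"), new Field("Price", "double"));
        List<Field> s2 = List.of(new Field("Price", "double"), new Field("Symbol", "string(8)"));
        System.out.println(equivalent(s1, s2));                  // false: same fields, different order
        System.out.println(equivalent(s1, reorderLike(s2, s1))); // true after reordering
    }
}

// Hypothetical schema model: the string length is folded into the type name.
record Field(String name, String type) {}
```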
Because the Union operator is order-agnostic, its output directly reflects the sequence in which tuples arrive on the input streams. For example, consider the union of two streams that share a schema consisting of a single int field. The output follows no particular order, either in the int values themselves or in which input stream supplied each tuple.
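Arrival-order behavior can be imitated in plain Java (not the StreamBase API) by letting two producer threads, each standing in for one int input stream, add to a shared FIFO queue. The interleaving can change from run to run. Only the relative order of tuples from the same stream is stable, because each producer adds its values sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ArrivalOrderUnion {
    // Each producer thread stands in for one input stream. The shared FIFO queue plays
    // the Union: its order is whatever order the add() calls happen to land in.
    static List<Integer> run(List<Integer> stream1, List<Integer> stream2) {
        BlockingQueue<Integer> union = new LinkedBlockingQueue<>();
        Thread producer1 = new Thread(() -> stream1.forEach(union::add));
        Thread producer2 = new Thread(() -> stream2.forEach(union::add));
        producer1.start();
        producer2.start();
        try {
            producer1.join();
            producer2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        List<Integer> out = new ArrayList<>();
        union.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        // The interleaving can differ between runs; only per-stream order is stable.
        System.out.println(run(List.of(3, 1, 4), List.of(5, 9, 2)));
    }
}
```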
The remainder of this topic describes the actions you can take on each tab of the Union operator's Properties view.
Name: Every application component must have a unique name. Use this field to specify or change the component's name. The name must contain only alphabetic characters, digits, and underscores; hyphens and other special characters are not allowed. The first character must be alphabetic or an underscore.
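The naming rule translates directly into a regular expression. This Java sketch assumes that "alphabetic" means ASCII letters; the `ComponentNames` class and its `register` helper are hypothetical and only illustrate the uniqueness requirement within one application.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class ComponentNames {
    // Assumes ASCII letters: first character a letter or underscore,
    // remaining characters letters, digits, or underscores.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final Set<String> used = new HashSet<>();

    static boolean isValid(String name) {
        return VALID.matcher(name).matches();
    }

    // Hypothetical helper: accept a name only if it is well formed and not yet taken.
    boolean register(String name) {
        return isValid(name) && used.add(name);
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"Union1", "_sales_union", "1union", "sales-union"}) {
            System.out.println(candidate + " -> " + isValid(candidate));
        }
    }
}
```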
Enable Error Output Port: Check this box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.
Description: Optionally, enter a brief description of the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
In the Union Settings tab, you must specify the number of streams that will participate in the union operation.
Optionally, select the Force strictly matching schemas option. By default, unions are loose: the output schema is the union of all fields in the input streams, and when an input tuple lacks one of those fields, a null is filled in. Even in a loose union, it is a typecheck error to input more than one stream that uses the same field name with different types. Note that strictly matching schemas are not field-order sensitive.
The Dynamic Variables tab allows you to define variables for this operator that can then be used in one of its expressions. A dynamic variable can be updated by any input stream or output stream in your application. For more information, see Using Dynamic Variables.
- Run this component in a separate thread
This option causes the server to process the component's requests concurrently with other processing in the application. On a multiprocessor (SMP) machine, the threads can be distributed automatically across the available processors.
If this is a compute-intensive component and you know that it can run without data dependencies on other components in the StreamBase application, you may be able to improve performance by enabling this option.
Caution
These features are not suitable for every application. For details, see Execution Order, Concurrency, and Parallelism, which includes important guidelines for using these features.
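As a rough analogy only (StreamBase's own threading is configured through this option, not through code), the effect of running a component in a separate thread resembles handing its work to a dedicated single-thread executor. The caller returns immediately, and tuples are still processed in submission order. The `SeparateThreadComponent` class and its doubling "work" are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SeparateThreadComponent {
    // One dedicated worker thread: tuples are processed in submission order,
    // but concurrently with whatever thread is submitting them.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final List<Integer> processed = new CopyOnWriteArrayList<>();

    void submit(int tuple) {
        worker.execute(() -> processed.add(tuple * 2)); // stand-in for real per-tuple work
    }

    // Stop accepting work and wait for queued tuples to finish.
    void shutdown() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                worker.shutdownNow();
            }
        } catch (InterruptedException e) {
            worker.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SeparateThreadComponent component = new SeparateThreadComponent();
        for (int tuple = 1; tuple <= 3; tuple++) {
            component.submit(tuple); // returns immediately; work happens on the worker thread
        }
        component.shutdown();
        System.out.println(component.processed); // [2, 4, 6]
    }
}
```

The gain only materializes when the component's work is substantial and independent of other components, which matches the guidance above.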
- Run in parallel threads
If you selected the first option, you can also select this option, which causes the server to run multiple instances of this component, each in its own thread. At run time, tuples are dispatched to particular instances based on the value of the Key Expression, which must evaluate to an int.
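Key-based dispatch to parallel instances can be sketched in plain Java with one single-thread executor per instance. The routing rule here, `Math.floorMod(key, instanceCount)`, is an assumption chosen for illustration; the actual StreamBase dispatch function is not described in this topic. What the sketch does preserve is the essential property that tuples with the same int key always reach the same instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class KeyedParallelComponent {
    private final List<ExecutorService> instances = new ArrayList<>();
    // Which instance handled each key; recorded only to make the routing visible.
    final Map<Integer, Integer> instanceForKey = new ConcurrentHashMap<>();

    KeyedParallelComponent(int instanceCount) {
        for (int i = 0; i < instanceCount; i++) {
            instances.add(Executors.newSingleThreadExecutor()); // one thread per instance
        }
    }

    // Hypothetical routing rule: floorMod keeps negative keys in range, and equal
    // keys always map to the same instance.
    int route(int key) {
        return Math.floorMod(key, instances.size());
    }

    void submit(int key) {
        int index = route(key);
        instances.get(index).execute(() -> instanceForKey.put(key, index));
    }

    void shutdown() {
        for (ExecutorService instance : instances) {
            instance.shutdown();
        }
        try {
            for (ExecutorService instance : instances) {
                instance.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        KeyedParallelComponent component = new KeyedParallelComponent(4);
        for (int key = 0; key < 8; key++) {
            component.submit(key);
        }
        component.shutdown();
        System.out.println(component.instanceForKey);
    }
}
```

Routing on the key keeps all tuples that share a key on one thread, so any per-key state in an instance is never touched concurrently.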
