Contents
This sample demonstrates one use of the Aggregate operator. The time-based aggregate uses elapsed time to manage windows. This example uses two-second windows to compute the average price per share of symbols.
Consider the following problem: You are interested in the average price per share of a stock over some number of trades. You also want to know if the stock is active or not. If you get fewer than the requisite number of trades during some time period, then you conclude that the stock is relatively inactive. If you see more than that number of trades in the time period, the stock is very active.
This problem can be solved using an Aggregate with two window dimensions, one for the number of trades (as tuples), and another for time period. The time period is computed as a field-based aggregate using a timestamp field. In the following example, the first tuple emitted from Aggregate2Dimensions shows the average of five tuples. The second emitted tuple shows the average of only two tuples because only those two tuples fall within the time window as defined by the second dimension. The third tuple is emitted because five tuples had been received by Aggregate2Dimensions since the last five tuple group. However, because the first two of those input tuples had been calculated into the second emitted tuple, there are only three tuples available to be used to calculate the average. The numberShares, firstSeqnum, and lastSeqnum fields reflect this fact.
In StreamBase Studio, import this sample with the following steps:
-
From the top menu, click → .
-
Select operator from the Applications list.
-
Click OK.
StreamBase Studio creates a single project for all the operator samples.
By default, the sample files are installed in:
-
On Windows:
C:\Program Files\StreamBase Systems\StreamBase.n.m\sample\operator -
On UNIX:
/opt/streambase/sample/operator
When you load the sample into StreamBase Studio, Studio copies the
sample project's files to your Studio workspace. StreamBase Systems
recommends that you use the workspace copy of the sample, especially on UNIX, where
you may not have write access to /opt/streambase. In
the default installation, the path to this sample in your Studio workspace is:
UNIX: ~/streambase-studio-n.m-workspace/sample_operator Windows XP: C:\Documents and Settings\username\My Documents\StreamBase Studion.mWorkspace\ sample_operator Windows Vista: C:\Users\username\Documents\StreamBase Studion.mWorkspace\ sample_operator
-
In the Package Explorer, double-click to open the
AggregateByDim.sbappapplication. Make sure the application is the currently active tab in the EventFlow Editor. -
Click the
Run button. This opens the SB
Test/Debug perspective and starts the application.
-
In the Application Output view, select the AvgPricePSOut output stream. No output is displayed at this point, but the dequeuer is prepared to receive output. This view will eventually show the output of the application: the first tuple received will open a window that will close after receiving either five tuples or a tuple with time greater than or equal to
60. -
In the Manual Input view, enter
1,AMAT,20, and1in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
2,AMAT,21, and11in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
3,AMAT,22, and21in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
4,AMAT,23, and31in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
5,AMAT,24, and41in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe this line in the Application Output view:
symbol=AMAT, numberShares=5, averagePricePerShare=22.0,
lowerBoundTimeWindow=0.0, upperBoundTimeWindow=60.0, firstSeqnum=1, lastSeqnum=5This input causes the Aggregate operator to close the first window, which triggers the release of the output tuple.
Tip
If output data is too long to easily see in the Application Output table, click a row to display its field data in the Display Fields pane below the table.
-
Enter
6,AMAT,25, and61in the seqnum, symbol, price, and time fields, respectively.This input causes a new window to open. Like the first window, it will close after receiving either five tuples or a tuple with time greater than or equal to 60.
-
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
7,AMAT,26, and119in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
8,AMAT,27, and121in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe this line in the Application Output view:
symbol=AMAT, numberShares=3, averagePricePerShare=25.5,
lowerBoundTimeWindow=60.0, upperBoundTimeWindow=120.0, firstSeqnum=6, lastSeqnum=7 -
Enter
9,AMAT,26, and150in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe that no output is displayed yet in the Application Output view.
-
Enter
10,AMAT,26, and151in the seqnum, symbol, price, and time fields, respectively. -
Click , and observe this line in the Application Output view:
symbol=AMAT, numberShares=3, averagePricePerShare=28,
lowerBoundTimeWindow=120.0, upperBoundTimeWindow=180.0, firstSeqnum=8, lastSeqnum=10 -
When done, press F9 or click the
Stop Running Application button.
This section describes how to run the sample in UNIX terminal windows or Windows command prompt windows. On Windows, be sure to use the StreamBase Command Prompt from the Start menu as described in the Test/Debug Guide, not the default command prompt.
-
Open three terminal windows on UNIX, or three StreamBase Command Prompts on Windows. In each window, navigate to the directory where the sample is installed, or to your workspace copy of the sample, as described above.
-
In window 1, type:
sbd AggregateByDim.sbappThe window shows
notice[StreamBaseServer] listening on port 10000. -
In window 2, type:
sbc dequeue AvgPricePSOutNo output is displayed at this point, but the dequeuer is prepared to receive output. This window will eventually show the output of the application.
-
In window 3, type:
sbc enqueue TradesInThe sbc command is now awaiting keyboard input. Then type:
1,AMAT,20,1No output is displayed yet in the dequeue window.
-
Type:
2,AMAT,21,11No output is displayed yet in the dequeue window.
-
Type:
3,AMAT,22,21No output is displayed yet in the dequeue window.
-
Type:
4,AMAT,23,31No output is displayed yet in the dequeue window.
-
Type:
5,AMAT,24,41Observe this line in the dequeue window:
AMAT,5,22,0,60,1,5 -
Type:
6,AMAT,25,61No output is displayed yet in the dequeue window.
-
Type:
7,AMAT,26,119No output is displayed yet in the dequeue window.
-
Type:
8,AMAT,27,121Observe this line in the dequeue window:
AMAT,2,25.5,60,120,6,7 -
Type:
9,AMAT,28,150No output is displayed yet in the dequeue window.
-
Type:
10,AMAT,29,151Observe this line in the dequeue window:
AMAT,3,28,120,180,8,10 -
Press Control-Z (Windows) or Control-D (UNIX).
The sbc process will exit.
-
In window 3, type:
sbadmin shutdownThe sbadmin shutdown command terminates the server and dequeuer.
-
Launched StreamBase Studio.
-
Created (or subsequently used) the
sample_operatorproject. -
From the top menu, in the SB Authoring perspective, selected → → . Selected the
sample_operatorproject and enteredAggregateByDimfor the diagram name. -
Created an input stream:
-
Dragged an input stream from the palette to the EventFlow Editor.
-
Clicked the stream on the EventFlow Editor, which invoked the Input Stream Properties view.
-
On the General tab, Name:
TradesIn -
On the Edit Schema tab, added
-
Field Name:
seqnum, Type:int -
Field Name:
symbol, Type:string, Size:12 -
Field Name:
price, Type:double -
Field Name:
time, Type:timestamp
-
-
-
Created an Aggregate operator:
-
Dragged an Aggregate operator from the palette to the EventFlow Editor.
-
On the General tab, Name:
Aggregate2Dimensions -
Connected the TradesIn input stream to the Aggregate2Dimensions operator.
-
-
Set up the Aggregate2Dimensions operator:
-
On the Dimension tab, clicked the button. In the Edit Dimension dialog, added:
Name:
CountDimType:
tupleThe buffer set up for each window will contain the specified number of tuples. When the buffer contains all the tuples required for the window, any desired calculations will take place, a tuple containing the desired results will be emitted, and the window will be closed.
Opening policy: Open per: Advance:
5This indicates that a window should be open for a group of tuples. An Advance value of 5 advances the window by five tuples. A new window will be created after each group of 5 tuples enters the operator. Because the window is closed every 5 tuples (see next step), windows do not overlap.
Window size: Close and emit every
5tuplesThe number of tuples in the buffer for this window.
Emission policy: Selected "No intermediate emissions based on this dimension."
"Emission policy" allows tuples to be emitted before the window closes. For example, one could emit a tuple every second during the 30-second window, rather than waiting for the window to close.
Optional windows Unchecked the Create partial windows checkbox.
When set, this option creates "partial" windows which encompass the values that would have occurred before the arrival of the first tuple. For example, where
Advanceis less thanSize, additional windows would be opened to include the first tuple; these windows would start before theTimein the first tuple.At this point, the Edit Dimensions dialog for CountDim looks like this:
-
Clicked .
-
Clicked the button again. In the Edit Dimension dialog, added:
Name:
TimeDimType:
field. In the pull-down list, selected .The buffer set up for each window will contain a set of tuples based on the value of the time field. When a tuple arrives whose field value exceeds the range of the open window, the calculations specified in the operator will take place, a tuple containing the desired results will be emitted, and the window will be closed.
Opening policy: Open per: Advance:
60and Offset:0This indicates that a window should be open for a group of tuples. An Advance value of
60will advance the window by 60 seconds. In this case, where the "Close and emit" value is also 60 seconds, only one window will be open at a time for each group (see "Group by" below). If Advance were set to15, then windows would be created every 15 seconds and stay open for 60 seconds, overlapping each other.Window size: Close and emit after
60.This indicates each window is open for 60 seconds (as reflected in the time field for each tuple), and its buffer contains the tuples whose "time" field falls within that 60 second period.
Emission policy: Selected No intermediate emissions based on this dimension.
Emission policyallows tuples to be emitted before the window closes. For example, one could emit a tuple every second during the 60-second window, rather than waiting for the window to close.Optional windows Unchecked Open windows before first tuple.
When set, this option creates windows which encompass the values that would have occurred before the arrival of the first tuple. For example, where
Advanceis less thanSize, additional windows would be opened to include the first tuple; these windows would start before theTimein the first tuple.At this point, the Edit Dimensions dialog for TimeDim looks like this:
Clicked .
-
On the Aggregate Functions tab, unchecked the delta option, Output all input fields. Then added:
Output Field Name:
numberSharesExpression:
count()Returns the number of tuples (i.e. trades) that are represented in the window. For details about the available aggregate functions, see the StreamBase Expression Language and Functions topic in the Authoring Guide.
Output Field Name:
averagePricePerShareExpression:
avg(price)Returns the average price per share of the trades represented in this window.
Output Field Name:
lowerBoundTimeWindowExpression:
openval("TimeDim")Returns the lower boundary of the window, as determined by the TimeDim dimension. In this case, because the window's offset is 0, each window starts at a multiple of 60 seconds. Note that the window boundary might be different from the actual value in the window when it opens (for example, the value returned by the startval function).
Output Field Name:
upperBoundTimeWindowExpression:
closeval("TimeDim")Returns the upper limit of the window. In this case, each window ends at a multiple of 60 seconds. Note that the window boundary might be different from the actual value in the window when it closes (for example, the value returned by the startval function).
Output Field Name:
firstSeqnumExpression:
firstval(seqnum)Returns the sequence number of the first tuple that formed part of this window.
Output Field Name:
lastSeqnumExpression:
lastval(seqnum)Returns the sequence number of the last tuple that formed part of this window.
-
On the Group Options tab, added:
Output Field Name:
symbolExpression:
symbolCreates a window for each set of tuples whose value for the symbol field is the same.
Note that the "Output Field Name" need not be the same as the input field in the "Expression" column. This is most useful when the "Expression" is more complicated than just a field value.
That completed the definition of this two-dimension aggregate operator.
-
-
Created a map operator:
-
Dragged a Map operator from the palette to the EventFlow Editor.
-
On the General tab, Name:
ConvertTimeToSeconds -
Connected the Aggregate2Dimensions operator to the ConvertTimeToSeconds Map operator.
-
On the Output Settings tab, chose the Output option, explicitly specified fields, then clicked the button.
-
Changed the expression for lowerBoundTimeWindow to
to_seconds(lowerBoundTimeWindow). -
Changed the Expression for upperBoundTimeWindow to
to_seconds(upperBoundTimeWindow).
-
-
Created an output stream:
-
Dragged an output stream from the palette to the EventFlow Editor.
-
On the General tab, Name:
AvgPricePSOut -
Connected the ConvertTimeToSeconds operator to the AvgPricePSOut output stream.
-
