Aggregate Operator Two-Dimension Sample

This sample demonstrates one use of the Aggregate operator: an aggregate with two window dimensions. One dimension counts tuples and the other is based on a timestamp field. Together they compute the average price per share of each symbol.

Consider the following problem: You are interested in the average price per share of a stock over some number of trades. You also want to know if the stock is active or not. If you get fewer than the requisite number of trades during some time period, then you conclude that the stock is relatively inactive. If you see more than that number of trades in the time period, the stock is very active.

This problem can be solved using an Aggregate operator with two window dimensions: one for the number of trades (counted as tuples) and another for the time period. The time period is computed as a field-based aggregate using a timestamp field. In the following example, the first tuple emitted from Aggregate2Dimensions shows the average of five tuples. The second emitted tuple shows the average of only two tuples, because only those two tuples fall within the time window defined by the second dimension. The third tuple is emitted because Aggregate2Dimensions has received five tuples since the last five-tuple group. However, because the first two of those input tuples were already included in the second emitted tuple, only three tuples are available to calculate the average. The numberShares, firstSeqnum, and lastSeqnum fields reflect this fact.
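
The interaction of the two dimensions can be approximated outside StreamBase with a short Python sketch. This is not how the Aggregate operator is implemented; it is a simplified model that assumes tuples arrive in time order, that a time window covers the half-open range from a multiple of 60 up to the next multiple of 60, and that there are no intermediate emissions. The names Bucket, summarize, aggregate, and TRADES are hypothetical and exist only for this illustration. The input values are the ten trades used in the terminal walk-through of this sample.

```python
from dataclasses import dataclass, field

COUNT_SIZE = 5     # assumed to mirror CountDim: close and emit every 5 tuples
TIME_SIZE = 60.0   # assumed to mirror TimeDim: close and emit after 60, offset 0


@dataclass
class Bucket:
    """Tuples that share both a count group and a time window."""
    time_index: int
    seqnums: list = field(default_factory=list)
    prices: list = field(default_factory=list)


def summarize(symbol, bucket):
    """Build one output row in the field order shown by the sample."""
    lower = bucket.time_index * TIME_SIZE
    return (
        symbol,
        len(bucket.prices),                       # numberShares
        sum(bucket.prices) / len(bucket.prices),  # averagePricePerShare
        lower,                                    # lowerBoundTimeWindow
        lower + TIME_SIZE,                        # upperBoundTimeWindow
        bucket.seqnums[0],                        # firstSeqnum
        bucket.seqnums[-1],                       # lastSeqnum
    )


def aggregate(trades):
    """trades: iterable of (seqnum, symbol, price, time), in time order."""
    open_buckets = {}   # symbol -> currently open Bucket
    received = {}       # symbol -> number of tuples seen so far
    emitted = []
    for seqnum, symbol, price, time in trades:
        time_index = int(time // TIME_SIZE)
        bucket = open_buckets.get(symbol)
        # A tuple outside the open time window closes that window first.
        if bucket is not None and bucket.time_index != time_index:
            emitted.append(summarize(symbol, bucket))
            bucket = None
        if bucket is None:
            bucket = open_buckets[symbol] = Bucket(time_index)
        bucket.seqnums.append(seqnum)
        bucket.prices.append(price)
        # Every fifth tuple for a symbol closes the current count group.
        received[symbol] = received.get(symbol, 0) + 1
        if received[symbol] % COUNT_SIZE == 0:
            emitted.append(summarize(symbol, bucket))
            del open_buckets[symbol]
    return emitted


TRADES = [
    (1, "AMAT", 20, 1), (2, "AMAT", 21, 11), (3, "AMAT", 22, 21),
    (4, "AMAT", 23, 31), (5, "AMAT", 24, 41), (6, "AMAT", 25, 61),
    (7, "AMAT", 26, 119), (8, "AMAT", 27, 121), (9, "AMAT", 28, 150),
    (10, "AMAT", 29, 151),
]

for row in aggregate(TRADES):
    print(row)
```

Under these assumptions the sketch emits three rows: one covering seqnum 1 through 5, one covering 6 and 7, and one covering 8 through 10, which matches the pattern described above.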

Importing This Sample into StreamBase Studio

In StreamBase Studio, import this sample with the following steps:

  • From the top menu, click File → Load StreamBase Sample.

  • Select operator from the Applications list.

  • Click OK.

StreamBase Studio creates a single project for all the operator samples.

Sample Location

By default, the sample files are installed in:

  • On Windows: C:\Program Files\StreamBase Systems\StreamBase.n.m\sample\operator

  • On UNIX: /opt/streambase/sample/operator

When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace. StreamBase Systems recommends that you use the workspace copy of the sample, especially on UNIX, where you may not have write access to /opt/streambase. In the default installation, the path to this sample in your Studio workspace is:

UNIX:       
  ~/streambase-studio-n.m-workspace/sample_operator
Windows XP:
  C:\Documents and Settings\username\My Documents\StreamBase Studio n.m Workspace\
      sample_operator
Windows Vista:
  C:\Users\username\Documents\StreamBase Studio n.m Workspace\
      sample_operator

Running AggregateByDim.sbapp in StreamBase Studio

  1. In the Package Explorer, double-click to open the AggregateByDim.sbapp application. Make sure the application is the currently active tab in the EventFlow Editor.

  2. Click the Run button. This opens the SB Test/Debug perspective and starts the application.

  3. In the Application Output view, select the AvgPricePSOut output stream. No output is displayed at this point, but the dequeuer is prepared to receive output. This view will eventually show the output of the application: the first tuple received will open a window that will close after receiving either five tuples or a tuple with time greater than or equal to 60.

  4. In the Manual Input view, enter 1, AMAT, 20, and 1 in the seqnum, symbol, price, and time fields, respectively.

  5. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  6. Enter 2, AMAT, 21, and 11 in the seqnum, symbol, price, and time fields, respectively.

  7. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  8. Enter 3, AMAT, 22, and 21 in the seqnum, symbol, price, and time fields, respectively.

  9. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  10. Enter 4, AMAT, 23, and 31 in the seqnum, symbol, price, and time fields, respectively.

  11. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  12. Enter 5, AMAT, 24, and 41 in the seqnum, symbol, price, and time fields, respectively.

  13. Click Send Data, and observe this line in the Application Output view:

    symbol=AMAT, numberShares=5, averagePricePerShare=22.0,
    lowerBoundTimeWindow=0.0, upperBoundTimeWindow=60.0, firstSeqnum=1, lastSeqnum=5

    This input causes the Aggregate operator to close the first window, which triggers the release of the output tuple.

    Tip

    If output data is too long to easily see in the Application Output table, click a row to display its field data in the Display Fields pane below the table.

  14. Enter 6, AMAT, 25, and 61 in the seqnum, symbol, price, and time fields, respectively.

    This input causes a new window to open. Like the first window, it will close after receiving five tuples or a tuple whose time reaches the window's upper bound, whichever comes first. For this window, that means a tuple with time greater than or equal to 120.

  15. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  16. Enter 7, AMAT, 26, and 119 in the seqnum, symbol, price, and time fields, respectively.

  17. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  18. Enter 8, AMAT, 27, and 121 in the seqnum, symbol, price, and time fields, respectively.

  19. Click Send Data, and observe this line in the Application Output view:

    symbol=AMAT, numberShares=2, averagePricePerShare=25.5,
    lowerBoundTimeWindow=60.0, upperBoundTimeWindow=120.0, firstSeqnum=6, lastSeqnum=7

  20. Enter 9, AMAT, 28, and 150 in the seqnum, symbol, price, and time fields, respectively.

  21. Click Send Data, and observe that no output is displayed yet in the Application Output view.

  22. Enter 10, AMAT, 29, and 151 in the seqnum, symbol, price, and time fields, respectively.

  23. Click Send Data, and observe this line in the Application Output view:

    symbol=AMAT, numberShares=3, averagePricePerShare=28.0,
    lowerBoundTimeWindow=120.0, upperBoundTimeWindow=180.0, firstSeqnum=8, lastSeqnum=10

  24. When done, press F9 or click the Stop Running Application button.

Running AggregateByDim.sbapp in Terminal Windows

This section describes how to run the sample in UNIX terminal windows or Windows command prompt windows. On Windows, be sure to use the StreamBase Command Prompt from the Start menu as described in the Test/Debug Guide, not the default command prompt.

  1. Open three terminal windows on UNIX, or three StreamBase Command Prompts on Windows. In each window, navigate to the directory where the sample is installed, or to your workspace copy of the sample, as described above.

  2. In window 1, type:

    sbd AggregateByDim.sbapp

    The window shows notice[StreamBaseServer] listening on port 10000.

  3. In window 2, type:

    sbc dequeue AvgPricePSOut

    No output is displayed at this point, but the dequeuer is prepared to receive output. This window will eventually show the output of the application.

  4. In window 3, type:

    sbc enqueue TradesIn

    The sbc command is now awaiting keyboard input. Type:

    1,AMAT,20,1

    No output is displayed yet in the dequeue window.

  5. Type:

    2,AMAT,21,11

    No output is displayed yet in the dequeue window.

  6. Type:

    3,AMAT,22,21

    No output is displayed yet in the dequeue window.

  7. Type:

    4,AMAT,23,31

    No output is displayed yet in the dequeue window.

  8. Type:

    5,AMAT,24,41

    Observe this line in the dequeue window:

    AMAT,5,22,0,60,1,5

  9. Type:

    6,AMAT,25,61

    No output is displayed yet in the dequeue window.

  10. Type:

    7,AMAT,26,119

    No output is displayed yet in the dequeue window.

  11. Type:

    8,AMAT,27,121

    Observe this line in the dequeue window:

    AMAT,2,25.5,60,120,6,7

  12. Type:

    9,AMAT,28,150

    No output is displayed yet in the dequeue window.

  13. Type:

    10,AMAT,29,151

    Observe this line in the dequeue window:

    AMAT,3,28,120,180,8,10

  14. Press Control-Z (Windows) or Control-D (UNIX).

    The sbc process will exit.

  15. In window 3, type:

    sbadmin shutdown

    The sbadmin shutdown command terminates the server and dequeuer.
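
The dequeue window prints each output tuple as a comma-separated line without field names. The following Python sketch shows one way to map those lines back to the field names that StreamBase Studio displays. It assumes the columns appear in the same order as the fields in the Studio output (symbol, numberShares, averagePricePerShare, lowerBoundTimeWindow, upperBoundTimeWindow, firstSeqnum, lastSeqnum). AvgPriceRow and parse_dequeue_line are hypothetical names used only for this illustration and are not part of StreamBase.

```python
import csv
from typing import NamedTuple


class AvgPriceRow(NamedTuple):
    """One line from the dequeue window, with assumed field order."""
    symbol: str
    numberShares: int
    averagePricePerShare: float
    lowerBoundTimeWindow: float
    upperBoundTimeWindow: float
    firstSeqnum: int
    lastSeqnum: int


def parse_dequeue_line(line):
    """Split one comma-separated output line and convert each column."""
    f = next(csv.reader([line]))
    return AvgPriceRow(
        f[0], int(f[1]), float(f[2]), float(f[3]), float(f[4]),
        int(f[5]), int(f[6]),
    )


# The three lines shown in the dequeue window during the walk-through.
LINES = [
    "AMAT,5,22,0,60,1,5",
    "AMAT,2,25.5,60,120,6,7",
    "AMAT,3,28,120,180,8,10",
]

rows = [parse_dequeue_line(line) for line in LINES]
for row in rows:
    print(row)
```

Reading the second line this way makes it clear that numberShares is 2 and that the window spans 60 to 120, covering seqnum 6 and 7.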

How We Created the AggregateByDim Sample

  1. Launched StreamBase Studio.

  2. Created (or subsequently used) the sample_operator project.

  3. From the top menu, in the SB Authoring perspective, selected File → New → EventFlow Application. Selected the sample_operator project and entered AggregateByDim for the diagram name.

  4. Created an input stream:

    1. Dragged an input stream from the palette to the EventFlow Editor.

    2. Clicked the stream on the EventFlow Editor, which invoked the Input Stream Properties view.

    3. On the General tab, Name: TradesIn

    4. On the Edit Schema tab, added

      • Field Name: seqnum, Type: int

      • Field Name: symbol, Type: string, Size: 12

      • Field Name: price, Type: double

      • Field Name: time, Type: timestamp

  5. Created an Aggregate operator:

    1. Dragged an Aggregate operator from the palette to the EventFlow Editor.

    2. On the General tab, Name: Aggregate2Dimensions

    3. Connected the TradesIn input stream to the Aggregate2Dimensions operator.

  6. Set up the Aggregate2Dimensions operator:

    1. On the Dimension tab, clicked the Add button. In the Edit Dimension dialog, added:

      Name: CountDim

      Type: tuple

      The buffer set up for each window will contain the specified number of tuples. When the buffer contains all the tuples required for the window, any desired calculations will take place, a tuple containing the desired results will be emitted, and the window will be closed.

      Opening policy: Open per: Advance: 5

      This indicates that a window should be open for a group of tuples. An Advance value of 5 advances the window by five tuples. A new window will be created after each group of 5 tuples enters the operator. Because the window is closed every 5 tuples (see next step), windows do not overlap.

      Window size: Close and emit every 5 tuples

      The number of tuples in the buffer for this window.

      Emission policy: Selected "No intermediate emissions based on this dimension."

      "Emission policy" allows tuples to be emitted before the window closes. For example, one could emit a tuple every second during the 30-second window, rather than waiting for the window to close.

      Optional windows: Unchecked the Create partial windows checkbox.

      When set, this option creates "partial" windows which encompass the values that would have occurred before the arrival of the first tuple. For example, where Advance is less than Size, additional windows would be opened to include the first tuple; these windows would start before the Time in the first tuple.

      At this point, the Edit Dimensions dialog for CountDim looks like this:

    2. Clicked OK.

    3. Clicked the Add button again. In the Edit Dimension dialog, added:

      Name: TimeDim

      Type: field. In the pull-down list, selected time.

      The buffer set up for each window will contain a set of tuples based on the value of the time field. When a tuple arrives whose field value exceeds the range of the open window, the calculations specified in the operator will take place, a tuple containing the desired results will be emitted, and the window will be closed.

      Opening policy: Open per: Advance: 60 and Offset: 0

      This indicates that a window should be open for a group of tuples. An Advance value of 60 will advance the window by 60 seconds. In this case, where the "Close and emit" value is also 60 seconds, only one window will be open at a time for each group (see "Group by" below). If Advance were set to 15, then windows would be created every 15 seconds and stay open for 60 seconds, overlapping each other.

      Window size: Close and emit after 60.

      This indicates each window is open for 60 seconds (as reflected in the time field for each tuple), and its buffer contains the tuples whose "time" field falls within that 60 second period.

      Emission policy: Selected No intermediate emissions based on this dimension.

      Emission policy allows tuples to be emitted before the window closes. For example, one could emit a tuple every second during the 60-second window, rather than waiting for the window to close.

      Optional windows: Unchecked Open windows before first tuple.

      When set, this option creates windows which encompass the values that would have occurred before the arrival of the first tuple. For example, where Advance is less than Size, additional windows would be opened to include the first tuple; these windows would start before the Time in the first tuple.

      At this point, the Edit Dimensions dialog for TimeDim looks like this:

      Clicked OK.

    4. On the Aggregate Functions tab, unchecked the delta option, Output all input fields. Then added:

      Output Field Name: numberShares

      Expression: count()

      Returns the number of tuples (i.e. trades) that are represented in the window. For details about the available aggregate functions, see the StreamBase Expression Language and Functions topic in the Authoring Guide.

      Output Field Name: averagePricePerShare

      Expression: avg(price)

      Returns the average price per share of the trades represented in this window.

      Output Field Name: lowerBoundTimeWindow

      Expression: openval("TimeDim")

      Returns the lower boundary of the window, as determined by the TimeDim dimension. In this case, because the window's offset is 0, each window starts at a multiple of 60 seconds. Note that the window boundary might be different from the actual value in the window when it opens (for example, the value returned by the startval function).

      Output Field Name: upperBoundTimeWindow

      Expression: closeval("TimeDim")

      Returns the upper limit of the window. In this case, each window ends at a multiple of 60 seconds. Note that the window boundary might be different from the actual value in the window when it closes (for example, the value returned by the lastval function).

      Output Field Name: firstSeqnum

      Expression: firstval(seqnum)

      Returns the sequence number of the first tuple that formed part of this window.

      Output Field Name: lastSeqnum

      Expression: lastval(seqnum)

      Returns the sequence number of the last tuple that formed part of this window.

    5. On the Group Options tab, added:

      Output Field Name: symbol

      Expression: symbol

      Creates a window for each set of tuples whose value for the symbol field is the same.

      Note that the "Output Field Name" need not be the same as the input field in the "Expression" column. This is most useful when the "Expression" is more complicated than just a field value.

    That completed the definition of this two-dimension aggregate operator.

  7. Created a Map operator:

    1. Dragged a Map operator from the palette to the EventFlow Editor.

    2. On the General tab, Name: ConvertTimeToSeconds

    3. Connected the Aggregate2Dimensions operator to the ConvertTimeToSeconds Map operator.

    4. On the Output Settings tab, chose the Output option, explicitly specified fields, then clicked the Pass All button.

    5. Changed the Expression for lowerBoundTimeWindow to to_seconds(lowerBoundTimeWindow).

    6. Changed the Expression for upperBoundTimeWindow to to_seconds(upperBoundTimeWindow).

  8. Created an output stream:

    1. Dragged an output stream from the palette to the EventFlow Editor.

    2. On the General tab, Name: AvgPricePSOut

    3. Connected the ConvertTimeToSeconds operator to the AvgPricePSOut output stream.
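
The TimeDim settings described above determine where window boundaries fall: with Offset 0, Advance 60, and a size of 60, windows do not overlap, whereas an Advance of 15 would produce overlapping windows. The following Python sketch illustrates that arithmetic. It is an idealized model rather than StreamBase code. It assumes each window covers the half-open range from its lower bound up to, but not including, its upper bound, and windows_containing is a hypothetical helper name.

```python
import math


def windows_containing(t, advance=60.0, size=60.0, offset=0.0):
    """Return the (lower, upper) bounds of every window that contains t.

    Windows are assumed to open at offset + k * advance for integer k
    and to cover the half-open range [lower, lower + size).
    """
    # Latest window that opens at or before t.
    k = math.floor((t - offset) / advance)
    bounds = []
    # Walk backwards through earlier windows while they still cover t.
    while offset + k * advance + size > t:
        lower = offset + k * advance
        bounds.append((lower, lower + size))
        k -= 1
    return sorted(bounds)


# Settings used in this sample: one window per time value.
print(windows_containing(61))
# Hypothetical Advance of 15: four overlapping 60-second windows.
print(windows_containing(61, advance=15.0))
```

With the sample's settings, a time of 61 falls only in the window from 60 to 120. With an Advance of 15, the same time falls in the four windows that start at 15, 30, 45, and 60.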