Library Articles
Tracking the Top N Query Items
Author: Denis Bradford
Contributors: Eddie Galvez, Ricardo Garcia, Dr. John Lifter, Richard Tibbetts
StreamBase Systems
10-April-2007
Topics:
Suppose you want to track a sorted range of values that are stored in a query table, such as the top n highest-priced stocks. One method is to use a query operator to repeatedly read all the rows in the table, and then use a series of other operators to process the result, compare prices across stock symbols, and compute the top n. Conversely, you could pre-process tuples upstream from a query table that has just one row containing n fields. Each time a tuple arrives on the input stream, compare values to see if the table needs to be updated.
Both of these options require complicated processing to compare the values and continually calculate the top n stocks. This article suggests a simpler method: using the Query operator's built-in option to limit the number of output rows.
We'll describe a simple application to demonstrate the basic idea: The application has two input streams. We feed stock data through one input stream and use one query operator to store it in a query table. When we have accumulated some data, we send a tuple through the other stream to another query operator. That tuple triggers a read operation that gives us just the current top three highest stocks. The following EventFlow screen shows the components and flow of data through the application:

Let's look at the StreamSQL representation of our hypothetical application, and then discuss the numbered callouts:
(1) CREATE INPUT STREAM StocksIn (
ID int,
Symbol string(5),
Price double
);
CREATE INPUT STREAM TriggerIn (
ShowTopN int
);
(2) CREATE MEMORY TABLE StocksTable (
ID int,
Symbol string(5),
Price double,
PRIMARY KEY(ID) USING hash
);
CREATE INDEX i1 ON StocksTable USING BTREE (Price);
CREATE OUTPUT STREAM OutputTableFields;
CREATE OUTPUT STREAM OutputTopN;
(3) INSERT INTO StocksTable (ID, Symbol, Price)
SELECT ID AS ID, StocksIn.Symbol AS Symbol, StocksIn.Price AS Price
FROM StocksIn
RETURNING StocksTable.ID AS table_new_ID, StocksTable.Symbol
AS table_new_Symbol, StocksTable.Price AS table_new_Price
INTO OutputTableFields;
(4) SELECT StocksTable.ID AS table_current_ID, StocksTable.Symbol
AS table_current_Symbol, StocksTable.Price AS Price
FROM TriggerIn OUTER JOIN StocksTable
ORDER BY Price DESC
LIMIT 3
INTO OutputTopN;
Note the following features of the application, while referring to the callouts in the preceding code:
- In addition to the stock symbol and price, the StocksIn schema includes a unique ID for each tuple; this will enable the table to store multiple rows for each symbol.
- In the Query table we set the primary key to the ID field using hash indexing. We set a secondary, btree index on the Price field.
- The INSERT and first SELECT operations in the StreamSQL code correspond to the LoadStocksTable Query operator in the EventFlow. As each tuple arrives on the StocksIn stream, this operator adds a new row to the table. The only purpose of the OutputTableFields output stream is diagnostic: it shows the values of each row as they are loaded.
- The second SELECT statement in the StreamSQL code reads the StockTable rows that we specify, but we also constrain the read operation by specifying a LIMIT. The following EventFlow screen shows how the operation is set up in the corresponding ReadStocksTable Query operator:

We specified a small number, and we might have called this application TOP-3, but you could substitute a different limit.
When we send a tuple from the TriggerIn stream, this operator queries only the range of rows that we have specified, and outputs their values on the OutputTopN stream in descending order. If you were interested in the bottom-n prices, you would use ascending order (the default) instead. The following EventFlow screen shows the corresponding Query output settings:

Back to Top ^