Detecting Patterns in StreamBase Applications

Authors: Richard Tibbetts, Kimberly Burchett, Denis Bradford, Dr. John Lifter
StreamBase Systems

Date: 01-September-2007

Applicable To: StreamBase 5.0

Introduction


In your applications, you may want to know when tuples arrive in a specific sequence. You may also want to limit processing to events that occur within a specific time span, perhaps with an additional constraint such as a common Internet Protocol address or radio frequency identification number.

StreamBase 5 introduces a pattern matching language that can be used to detect these desired patterns. This functionality is available in both EventFlow and StreamSQL applications and may be applied to events enqueued through single or multiple input streams.

The Pattern Matching Language


Statements in the pattern matching language adhere to the following syntax:

  <template> WITHIN <interval> TIME

where <interval> and <template> have the following meanings.

  • <interval> is the duration, in seconds, over which to evaluate the pattern.
  • <template> is the pattern to evaluate and consists of one of the following:
    • a stream identifier, with an optional AS alias clause
    • ( <template> )
    • <template> OR <template>
    • <template> AND <template>
    • <template> THEN <template>
    • NOT <template>

That is, a template may be composed of nested templates or templates combined using logical operators. When writing a template, the logical operators may be expressed as a keyword (OR, THEN, AND, NOT) or as the corresponding shorthand symbol (||, ->, &&, !).
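The recursive shape of a template, and the keyword-to-symbol correspondence listed above, can be illustrated with a small Python sketch. This is not StreamBase code: the class names Stream, Not, and Binary and the render function are hypothetical, and the fully parenthesized output (including the spacing around NOT and !) is a choice made for this sketch rather than verified StreamBase syntax.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical illustration only: none of these names belong to StreamBase.

@dataclass(frozen=True)
class Stream:
    name: str
    alias: Optional[str] = None  # the optional AS clause

@dataclass(frozen=True)
class Not:
    operand: "Template"

@dataclass(frozen=True)
class Binary:
    op: str  # "OR", "AND", or "THEN"
    left: "Template"
    right: "Template"

Template = Union[Stream, Not, Binary]

# Keyword-to-shorthand mapping taken from the article text.
SYMBOLS = {"OR": "||", "THEN": "->", "AND": "&&", "NOT": "!"}

def render(t, shorthand=False):
    """Render a template tree as text, fully parenthesized."""
    def op(keyword):
        return SYMBOLS[keyword] if shorthand else keyword

    if isinstance(t, Stream):
        return f"{t.name} AS {t.alias}" if t.alias else t.name
    if isinstance(t, Not):
        # Spacing and parentheses here are this sketch's own convention.
        sep = "" if shorthand else " "
        return f"{op('NOT')}{sep}({render(t.operand, shorthand)})"
    return f"({render(t.left, shorthand)} {op(t.op)} {render(t.right, shorthand)})"

# A template with two nested levels: (s1 AND s2) THEN s3.
example = Binary(
    "THEN",
    Binary("AND", Stream("InputStream", "s1"), Stream("InputStream", "s2")),
    Stream("InputStream", "s3"),
)

print(render(example))
print(render(example, shorthand=True))
```

The two print calls render the same tree once with keywords and once with shorthand symbols, showing that the choice of notation does not change the structure of the template.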

In an EventFlow application, pattern evaluation is performed by the Pattern operator, where separate controls on the StreamBase Properties view, Pattern Settings tab are used to specify the desired template, dimension, and interval values. A predicate entry allows further tuning of the match pattern.

In a StreamSQL application, pattern matching expressions are included within a SELECT statement's FROM clause, and further tuning of the match pattern is specified in the WHERE clause.

Application Examples


Pattern matching on a single stream and pattern matching across multiple streams are discussed as separate topics.

Single Stream Pattern Matching

When you want to detect patterns within tuples, or events, arriving on a single stream, you must use aliasing to indicate how many events are included in the template. For example, if you want to consider pairs of events, the template references the same stream twice, applying a different alias to each reference. The template

  InputStream AS stream1 THEN InputStream AS stream2

indicates that the pattern involves two events arriving in a specified order on the stream InputStream.

If your desired pattern involves a greater number of events, the template becomes more complex. The following template describes a pattern involving three events.

  (InputStream AS s1 AND InputStream AS s2)
    THEN InputStream AS s3

Note how parentheses are used to specify the pattern; the AND operator indicates that the order of the first two events is not specified, but both of these events must occur before the third event, which follows the THEN operator.

To complete either of these pattern specifications, you must specify the period of time within which the pattern must be detected and apply constraints that would further limit the events that satisfy the pattern. Without these additional entries, the template itself has little meaning.

Let’s flesh out the details of an example more fully, first using StreamSQL and then an EventFlow diagram.

  CREATE INPUT STREAM InputStream
    (stock string(5), price double, shares int);

  CREATE OUTPUT STREAM Out AS
    SELECT s1.stock AS stock1, s1.price AS price1,
           s1.shares AS shares1, s2.shares AS shares2
    FROM PATTERN InputStream AS s1 THEN
                 InputStream AS s2 THEN
                 InputStream AS s3
         WITHIN 20 TIME
    WHERE s1.stock=s2.stock AND s2.stock=s3.stock AND
          ((s3.price>s2.price) OR (s3.price>s1.price));

A single input stream is referenced three times in the FROM PATTERN clause; consequently, each reference must later be identified through its alias name. The WITHIN clause specifies that the desired pattern be detected within 20 seconds; fractional seconds, for example 20.5, are also acceptable. Finally, the WHERE clause specifies that the pattern involves three events with identical values in the stock field, where the value in the price field of the third event is greater than the price field in either the first or second event. The target list following the SELECT keyword lists the field values that will be included in the tuple emitted by this statement when a successful pattern match has been identified.

The following EventFlow application produces the same result.

Two tabs within the Pattern operator’s StreamBase Properties view are used to configure this operator. On the Pattern Settings tab, enter the template, size value, and predicate (which correspond to the FROM PATTERN, WITHIN, and WHERE clauses in the StreamSQL statement), and on the Output Settings tab specify the fields that will be included in the emitted tuple. Since an arc in an EventFlow application does not have a name, in the following figures the incoming stream is referred to using a name derived from its corresponding input port. The single input port in this example is named input1.

 

Running the Application

Start either version of this application and then enqueue the following tuples. (You must submit all of the tuples within the specified time period. If you are uncomfortable with this constraint, increase the interval value.)

  Stock  Price  Shares
  a      22.0   1
  a      23.0   2
  a      21.0   3
  a      25.0   4

After enqueuing the third tuple, nothing is emitted on the output stream. This is correct, as s3.price is not greater than the price value in either of the first two tuples.

After enqueuing the fourth tuple, three tuples are emitted.

  Stock  Price  Shares1  Shares2
  a      22.0   1        2
  a      22.0   1        3
  a      23.0   2        3

Since all the tuples were enqueued during the specified time duration, the first emitted tuple results from the first, second and fourth tuples satisfying the pattern and the second emitted tuple results from the first, third and fourth tuples satisfying the pattern. Finally, the third tuple results from the second, third and fourth tuples satisfying the pattern.
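The result above can be reproduced outside StreamBase with a brute-force Python sketch of the same query. It is only an emulation under stated assumptions: the arrival times are invented, the window is assumed to be measured from the first event of a candidate match to the last, and the order of rows within one emission is simply the order in which the sketch enumerates candidates. The function name matches_on_arrival is hypothetical.

```python
from itertools import combinations

# Each tuple: (arrival_time_seconds, stock, price, shares).
# Arrival times are assumed values that all fall inside the 20-second window.
events = [
    (0.0, "a", 22.0, 1),
    (1.0, "a", 23.0, 2),
    (2.0, "a", 21.0, 3),
    (3.0, "a", 25.0, 4),
]

def matches_on_arrival(events, window=20.0):
    """For each arriving event, list the rows it completes in the role of s3.

    Emulates: s1 THEN s2 THEN s3 WITHIN <window> TIME, with the WHERE
    clause from the StreamSQL example. Not StreamBase's implementation.
    """
    emitted = []
    for k in range(len(events)):
        t3, stock3, price3, _ = events[k]
        rows = []
        # combinations() preserves arrival order, so s1 always precedes s2.
        for s1, s2 in combinations(events[:k], 2):
            t1, stock1, price1, shares1 = s1
            _, stock2, price2, shares2 = s2
            in_window = t3 - t1 <= window
            same_stock = stock1 == stock2 == stock3
            price_rose = price3 > price2 or price3 > price1
            if in_window and same_stock and price_rose:
                # Same target list as the SELECT statement.
                rows.append((stock1, price1, shares1, shares2))
        emitted.append(rows)
    return emitted

for arrival, rows in enumerate(matches_on_arrival(events), start=1):
    print(f"after tuple {arrival}: {rows}")
```

Under these assumptions, nothing is produced for the first three tuples, and the fourth tuple yields the three rows shown in the output table.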

As a further example, you should enqueue the following eight tuples.

  Stock  Price  Shares
  a      22.0   1
  a      23.0   2
  a      21.0   3
  a      21.0   4
  a      21.0   5
  a      21.0   6
  a      21.0   7
  a      25.0   8

If you work quickly enough that all eight tuples are enqueued within the specified time interval, no tuples are emitted until the eighth tuple is enqueued. Then, 21 tuples are emitted, one for each way of choosing the first and second events, in arrival order, from the seven earlier tuples. Review the values in the shares1 and shares2 fields, which illustrate that multiple matching patterns have been detected. Now, investigate the effect of changing the value of the stock field in some of the tuples; just be certain to enqueue at least three tuples with the same stock field value, with the price field in the third tuple greater than the price field value in the first or second tuple.
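The count of 21 can be checked with a short Python sketch. It assumes that all eight tuples arrive inside the window and carry the same stock value, so only the price predicate from the WHERE clause matters; the helper name completed_by is hypothetical.

```python
from itertools import combinations
from math import comb

# Prices of the eight tuples, in arrival order (all with stock "a").
prices = [22.0, 23.0, 21.0, 21.0, 21.0, 21.0, 21.0, 25.0]

def completed_by(k, prices):
    """Count (s1, s2) pairs, in arrival order, that tuple k completes as s3."""
    return sum(
        1
        for p1, p2 in combinations(prices[:k], 2)
        if prices[k] > p2 or prices[k] > p1
    )

counts = [completed_by(k, prices) for k in range(len(prices))]
print(counts)       # [0, 0, 0, 0, 0, 0, 0, 21]
print(comb(7, 2))   # 21
```

No tuple with a price of 21.0 exceeds any earlier price, so tuples three through seven complete nothing. The final price of 25.0 exceeds every earlier price, so each of the C(7, 2) = 21 ordered pairs of earlier tuples produces a match.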

Multiple Stream Pattern Matching

Setting up a pattern matching specification that involves multiple streams is similar to the single stream variant with the exception that it is no longer necessary to provide an alias name for each distinct stream. Of course, if the pattern involves multiple events on one of the streams, then the multiple references to this stream will need to be aliased.

The following StreamSQL application detects a pattern across the tuples enqueued onto two separate input streams.

  CREATE INPUT STREAM InputStream1 (
    stock string(5),
    value double
  );
  CREATE INPUT STREAM InputStream2 (
    stock string(5),
    value double
  );
  CREATE OUTPUT STREAM Out;

  SELECT InputStream1.stock AS stock,
         InputStream1.value AS value1,
         InputStream2.value AS value2
  FROM PATTERN (InputStream1 THEN InputStream2) WITHIN 20 TIME
  WHERE (InputStream2.value > InputStream1.value) AND
        (InputStream1.stock = InputStream2.stock)
  INTO Out;

Notice how it is unnecessary to provide an alias name for each distinct stream.
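To make the two-stream semantics concrete, the following Python sketch emulates the query by brute force. It is an illustration under assumptions, not StreamBase's implementation: the sample data and arrival times are invented, THEN is assumed to mean that the InputStream2 event arrives strictly after the InputStream1 event, and the window is assumed to span the time between those two events. The function name match_then is hypothetical.

```python
# Each tuple: (arrival_time_seconds, stock, value). The data is made up.
stream1 = [(0.0, "a", 10.0), (5.0, "b", 7.0)]
stream2 = [(2.0, "a", 12.0), (6.0, "b", 6.5), (30.0, "a", 15.0)]

def match_then(first, second, window=20.0):
    """Emulate: (first THEN second) WITHIN <window> TIME, plus the WHERE clause."""
    rows = []
    for t1, stock1, value1 in first:
        for t2, stock2, value2 in second:
            follows = t1 < t2               # assumed reading of THEN
            in_window = t2 - t1 <= window   # assumed reading of WITHIN
            if follows and in_window and stock1 == stock2 and value2 > value1:
                # Same target list as the SELECT statement.
                rows.append((stock1, value1, value2))
    return rows

print(match_then(stream1, stream2))  # [('a', 10.0, 12.0)]
```

Only one pair qualifies here. The second "a" tuple on the second stream arrives 30 seconds after its counterpart, outside the window, and the "b" tuple on the second stream does not carry a higher value.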

The equivalent EventFlow application is described in the following figures.

Note how each incoming arc is referenced through its associated input port.

Related Topics


StreamSQL SELECT Statement

StreamBase Pattern Matching Language