Using Legacy Feed Simulations

Introduction and Caveats

This topic is for application developers who need to run legacy feed simulations created with StreamBase 3.5 and earlier. These instructions require you to run your legacy feed simulations from a StreamBase Command Prompt or a terminal window, instead of in StreamBase Studio's Feed Simulations View.

Some background information: StreamBase 3.7 introduced a new version of the StreamBase Feed Simulator. Now feed simulations are functionally equivalent whether they are run in StreamBase Studio or from the command line using sbfeedsim. The new feed simulator offers several advantages, including application independence and the ability in StreamBase Studio to enqueue data to multiple streams from a single feed simulation. In Studio, when you open a legacy feed simulation and then save it, the .sbfs file is automatically upgraded to the new version. The XML elements that comprise the .sbfs file are all new as of StreamBase 3.7.

However, upgrading to the new StreamBase Feed Simulator may not be possible for all users with a large set of saved test cases in legacy feed simulation files. In addition, new format feed simulations do not support the group-by or switch elements that were available in prior releases. Thus, for compatibility with legacy feed simulations, StreamBase includes the sbfeedsim-old command. It can run legacy .sbfs configurations that contain elements no longer supported by the new StreamBase Feed Simulator.

The Basics for Legacy Feed Simulations

The StreamBase Feed Simulator connects to a running StreamBase Server and generates tuples for some or all of its input streams. Its default behavior is to generate uniformly random data on your application's input streams at a rate of one tuple per second, but you can customize the generated data in many ways.

The StreamBase Feed Simulator always talks to a StreamBase Server process that is hosting a running application. In this legacy topic, we will use the firstapp sample that came with your StreamBase software distribution. (This sample is the same application that you may have built yourself by running the Creating Your First StreamBase Application tutorial, but with some new files added to demonstrate the Feed Simulator features described here.)

Start a StreamBase Server by typing sbd /opt/streambase/sample/firstapp/firstapp.sbapp. In a separate window, type sbc dequeue -v --all to view the data being input and output by the StreamBase Server.

The simplest way to use sbfeedsim-old is to give it no configuration at all and let it generate "default load." Default load means that the Feed Simulator will generate about one tuple per second for each input stream in your application. Every int and double field will be assigned a random value from 0 to 10000; every boolean field will be assigned true or false; and every string field will be filled with a random set of uppercase characters. Start sbfeedsim-old by simply typing the command, and you will see something like the following, though with slightly different times and field values. (Press Control-C to terminate the Feed Simulator after a few lines have been output.)

t=0.927: ItemsInputStream1 ((time=2007-01-17 17:35:43.801Z)
    ITEM_NAME="ZNRNCRCDMW" SKU=6714)
t=2.947: ItemsInputStream1 ((time=2007-01-17 17:35:45.820Z)
    ITEM_NAME="CTAHKGLLSI" SKU=4863)
t=3.230: ItemsInputStream1 ((time=2007-01-17 17:35:46.104Z)
    ITEM_NAME="GATZCOEVLG" SKU=6013)
t=3.788: ItemsInputStream1 ((time=2007-01-17 17:35:46.661Z)
    ITEM_NAME="TWCXQTHKJC" SKU=243)

At 0.927 seconds after it was started, the Feed Simulator generated the first line; about 2 seconds later (at 2.947 seconds after it was started), it generated the second line, and so forth. Note that tuples are not generated exactly one second apart: they are generated on average one second apart, according to an exponential distribution, which is more representative of real-world, randomly arriving data. (You can, however, instruct the Feed Simulator to generate exactly one tuple per second, that is, at t=1.0, t=2.0, and so on; we will get to that explanation later.)

Look in the sbc dequeue terminal you started earlier; you will see each of these ItemsInputStream1 tuples there. Recall that the firstapp sample separates tuples into ItemsOutputStream1 (items with SKU > 5000) and AllTheRest (items with SKU <= 5000); therefore, for each of the four lines above you will also see a line containing ItemsOutputStream1 or AllTheRest.

There are a few command-line options you can try to modify the Feed Simulator's behavior. Try running:

  • sbfeedsim-old -n: The Feed Simulator will display output but not actually send it to the server (nothing will appear in your sbc dequeue window). This is useful for debugging more complex Feed Simulator configurations (you will see one later).

  • sbfeedsim-old -x 5: The Feed Simulator will generate data 5 times as fast as normal, i.e., about 5 tuples per second rather than 1 per second. (You could also use sbfeedsim-old -x .2 to generate 0.2 tuples per second, i.e., 5 times as slow as normal.)

  • sbfeedsim-old -z 100: The Feed Simulator will use a different random seed, so it will generate similar but slightly different data. In general, any two invocations of sbfeedsim-old with the same command-line arguments and the same configuration file (if any) will output exactly the same data, but -z provides a way to change the data. You can also use -z clock to use the system clock as a random seed, which will result in different data every time.

  • sbfeedsim-old --max-tuples 5: The Feed Simulator will stop after generating 5 tuples.

  • sbfeedsim-old --max-time 3: The Feed Simulator will stop after 3 seconds, regardless of how many tuples have been generated.

  • sbfeedsim-old -a ItemsInputStream1: The Feed Simulator will generate default load only on ItemsInputStream1, rather than on all input streams in the application. ItemsInputStream1 happens to be the only input stream in this simple application, so this behaves exactly like sbfeedsim-old without any command-line arguments, but the option is useful for more complicated applications.

Legacy .sbfs Configuration Files

Often you will want to configure the data generated by the Feed Simulator in more complex ways than the command-line switches above allow; to this end the Feed Simulator lets you provide a configuration file. Type sbfeedsim-old -s > firstapp.sbfs to generate a (legacy-format) customizable "skeleton" configuration file for your application into a file named firstapp.sbfs. (Note that, like any other sbfeedsim-old invocation, sbfeedsim-old -s requires that a StreamBase Server containing your application is running.) Open firstapp.sbfs in a text editor and you'll see something like this:

<?xml version="1.0"?>

<!-- FeedSim skeleton generated by ... at ... -->
<feed-simulation>
    <stream name="ItemsInputStream1">
        <rate per-second="1.0"/>
        <field name="ITEM_NAME"> <random-string/> </field>
        <field name="SKU"> <uniform min="0" max="10000"/> </field>
    </stream>
</feed-simulation>

As you can see, the skeleton is tailored for your application; it's actually the "default load" specification described earlier, i.e., one tuple per second per input stream. You can run the Feed Simulator using this configuration with sbfeedsim-old firstapp.sbfs (although right now, since you haven't modified the file, it will work exactly like sbfeedsim-old with no arguments).

The configuration file contains one or more stream elements, each describing the data to be generated for a particular input stream. (There's only one input stream in our sample application, so there's only one stream section here.) Each stream element contains, in order:

  1. A rate specification describing how often to generate a tuple for that input stream. The simplest form of rate specification is <rate per-second="n"/>, where n is a rate in tuples per second. We'll discuss other kinds of rate specifications in a bit.

  2. A field element for each field in that stream. Each field element must contain a source, which is a description of how the Feed Simulator should create each value for that field in the tuples it generates. There are several kinds of sources, including:

    • <random-string/>: fill the field with a series of randomly-generated uppercase characters (for example, "RGQWOZ" for a six-character string field). This applies to string fields only.

    • <uniform min="min" max="max"/>: generate a random number greater than or equal to min and less than max.

    • <step min="min"/>: start at min (defaults to 0), incrementing by 1 each time a value is generated. For example, 0, 1, 2, ...

    • <random-walk min="min" max="max"/>: start at a value between min and max, incrementing or decrementing by 1 (chosen at random) each time a value is generated (pinned between min and max). For example, <random-walk min="1" max="3"/> might generate 2, 3, 3, 2, 1, 2, 3, ...

    • <constant value="value"/>: always just use value.

    Note that this is not a complete reference: there are many other sources (and other parameters for some of the sources that are listed here). Refer to the StreamBase Legacy Feed Simulation XML reference topic for a complete list.

Try replacing the <uniform .../> tag in the configuration with <step min="10"/> or <random-walk min="10" max="30"/>, and type sbfeedsim-old -n --max-tuples=20 -x10 firstapp.sbfs to see what is generated (without sending tuples to the server [-n], stopping after 20 tuples [--max-tuples=20], and at 10x speed [-x10]).
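For instance, with the step substitution applied, the configuration would look roughly like the following. This is a sketch built from the skeleton shown earlier; only the SKU source has changed.

```xml
<?xml version="1.0"?>
<feed-simulation>
    <stream name="ItemsInputStream1">
        <rate per-second="1.0"/>
        <field name="ITEM_NAME"> <random-string/> </field>
        <!-- SKU now counts upward from 10 (10, 11, 12, ...) instead of being random -->
        <field name="SKU"> <step min="10"/> </field>
    </stream>
</feed-simulation>
```

Given the description of the step source above, a 20-tuple run should show SKU values climbing from 10 through 29, while ITEM_NAME remains random.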

Legacy Rate Specifications

We have covered the simplest form of rate specification, <rate per-second="n"/>, but there are two other forms:

  • <interval> source </interval>: Use source to determine the amount of time the Feed Simulator should let pass between tuples. For example, if you want tuples to be exactly one second apart, use:

    <interval> <constant value="1.0"/> </interval>
    

    The following would put 1 second before the first tuple, then 2 seconds between the first and second tuple, then 3 seconds between the second and third tuples, and so forth.

    <interval> <step min="1.0"/> </interval>
    
  • <timestamp> source </timestamp>: Use source to determine the relative point in time at which the Feed Simulator should generate the tuple. The values generated by source must be non-decreasing. (It doesn't make sense to generate a tuple at t=4 seconds, then t=5 seconds, then t=3 seconds.) This is mostly useful in combination with trace files, described in the next section.

Try replacing the rate element in the configuration with one of the interval snippets above, and type sbfeedsim-old -n --max-tuples=20 firstapp.sbfs again. (Add -x10 if you are in a hurry, but be aware this will make all your rate specifications 10 times faster.)
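As a sketch of that substitution, here is the skeleton configuration with the rate element swapped for the constant interval snippet. Everything else is unchanged from the generated skeleton.

```xml
<?xml version="1.0"?>
<feed-simulation>
    <stream name="ItemsInputStream1">
        <!-- Replaces <rate per-second="1.0"/>: tuples are spaced exactly 1.0 second apart -->
        <interval> <constant value="1.0"/> </interval>
        <field name="ITEM_NAME"> <random-string/> </field>
        <field name="SKU"> <uniform min="0" max="10000"/> </field>
    </stream>
</feed-simulation>
```

With this file, the t= values in the output should fall on whole seconds (t=1.0, t=2.0, and so on) rather than following the exponential spacing of the default rate.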

Legacy Trace (Data) Files

You can also use the Feed Simulator to read tuples (or parts of tuples) from a trace file rather than randomly generating data. Let's say that we have a list of item names and SKUs in a CSV (comma-separated values) file called firstapp-trace1.csv that looks like this:

TEA,96371
COFFEE,785799
EGGS,873904
CHEESE,353728
EGGS,394293
COCOA,575788

(The first column is the item name, and the second column is the SKU.) We want to send these tuples to the server at the usual 1 tuple per second. The following legacy (pre-3.7) configuration will do this:

<feed-simulation>
    <stream name="ItemsInputStream1" trace-file="firstapp-trace1.csv">
        <rate per-second="1.0"/>
        <field name="ITEM_NAME"> <trace column="1"/> </field>
        <field name="SKU"> <trace column="2"/> </field>
    </stream>
</feed-simulation>

This file is included in the firstapp sample as firstapp-trace1.sbfs; to run it, go into the /opt/streambase/sample/firstapp directory and type sbfeedsim-old firstapp-trace1.sbfs.

Note that the name of the trace file is given in the trace-file attribute of the stream element, and the special <trace column="n"/> source is used to refer to a particular column in that trace file.

Not all fields in a stream must come from the trace file. You could replace the SKU field specification with another source, such as uniform as above; the item name for each row would still be read from the trace file but the SKU would be randomly generated as before.
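A sketch of such a mixed configuration follows. It reuses the trace example above and replaces only the SKU source, so column 2 of the trace file is simply not referenced.

```xml
<feed-simulation>
    <stream name="ItemsInputStream1" trace-file="firstapp-trace1.csv">
        <rate per-second="1.0"/>
        <!-- Item names still come from column 1 of the trace file -->
        <field name="ITEM_NAME"> <trace column="1"/> </field>
        <!-- SKUs are generated at random instead of being read from column 2 -->
        <field name="SKU"> <uniform min="0" max="10000"/> </field>
    </stream>
</feed-simulation>
```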

Timestamps in Trace Files

Trace files may also include timestamps. firstapp-trace2.csv looks like the trace file above, except that it has a timestamp in the third column:

TEA,96371,0.171
COFFEE,785799,1.041
EGGS,873904,1.733
CHEESE,353728,2.479
EGGS,394293,3.211

By replacing the rate tag with <timestamp> <trace column="3"/> </timestamp>, we can tell the Feed Simulator to generate the given tuples at the given times in the third column (TEA at t=0.171 seconds, COFFEE at t=1.041 seconds, and so on).
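Put together, the configuration might look like the following sketch. It is assembled from the earlier trace example and the timestamp element described here; the firstapp-trace2.sbfs file shipped with the sample may differ in detail.

```xml
<feed-simulation>
    <stream name="ItemsInputStream1" trace-file="firstapp-trace2.csv">
        <!-- Replaces the rate element: each tuple is sent at the time given in column 3 -->
        <timestamp> <trace column="3"/> </timestamp>
        <field name="ITEM_NAME"> <trace column="1"/> </field>
        <field name="SKU"> <trace column="2"/> </field>
    </stream>
</feed-simulation>
```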

Timestamps must be non-decreasing: if the timestamp on any line in the file is less than the timestamp on a previous line, the Feed Simulator aborts with an error message. (It wouldn't make sense for the Feed Simulator to generate a tuple at, say, t=2.479 seconds, and then generate a tuple at t=2.211 seconds.)

Try sbfeedsim-old firstapp-trace2.sbfs to run this example.

If your timestamps don't start at zero, you can use the origin attribute of the timestamp tag to specify an offset. See the StreamBase Feed Simulation XML reference topic for more information.

Legacy Enumerations

Often you will find it useful to instruct the Feed Simulator to choose at random from a predefined set of possible values. You can use enumerations to do this. Define an enumeration with the define-enum tag, and then refer to it later with the special enum source:

<feed-simulation>
    <define-enum name="item-names">
        <value>CEREAL</value>
        <value>MILK</value>
        <value weight="2">EGGS</value>
    </define-enum>
    <stream name="ItemsInputStream1">
        <rate per-second="1.0"/>
        <field name="ITEM_NAME"> <enum ref="item-names"/> </field>
        <field name="SKU"> <uniform min="0" max="10000"/> </field>
    </stream>
</feed-simulation>

The Feed Simulator will pick an item name for each tuple at random from the item-names enumeration. The weight="2" attribute makes EGGS twice as likely to appear as each of the other items. This configuration can be run by typing sbfeedsim-old firstapp-enum.sbfs.

Legacy Advanced Topics

This section covers two advanced constructs, switch and group-by, which are available only in legacy feed simulations.

Legacy Switches

Sometimes you may find it useful for some fields' values to depend on the contents of a "key field." For instance, let's say that for CEREAL we want SKUs to be fixed at 1, for MILK we want SKUs chosen at random from between 1000 and 2000, and for everything else we want SKUs to start at 3000 and increase. We can use the switch construct to do this:

<switch field="ITEM_NAME">
    <case value="CEREAL">
        <field name="SKU"> <constant value="1"/> </field>
    </case>
    <case value="MILK">
        <field name="SKU"> <uniform min="1000" max="2000"/> </field>
    </case>
    <default>
        <field name="SKU"> <step min="3000"/> </field>
    </default>
</switch>
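For context, here is a sketch of how the switch fragment might sit inside a complete configuration. The placement of switch inside the stream element, after the key field, is an assumption modeled on the group-by example in the next section; the item-names enumeration is the one defined under Legacy Enumerations.

```xml
<feed-simulation>
    <define-enum name="item-names">
        <value>CEREAL</value>
        <value>MILK</value>
        <value weight="2">EGGS</value>
    </define-enum>
    <stream name="ItemsInputStream1">
        <rate per-second="1.0"/>
        <!-- Key field: its value selects which case below supplies SKU -->
        <field name="ITEM_NAME"> <enum ref="item-names"/> </field>
        <!-- Assumed placement: after the key field, as with group-by -->
        <switch field="ITEM_NAME">
            <case value="CEREAL">
                <field name="SKU"> <constant value="1"/> </field>
            </case>
            <case value="MILK">
                <field name="SKU"> <uniform min="1000" max="2000"/> </field>
            </case>
            <!-- EGGS has no case of its own, so it falls through to default -->
            <default>
                <field name="SKU"> <step min="3000"/> </field>
            </default>
        </switch>
    </stream>
</feed-simulation>
```

Running a file like this with sbfeedsim-old -n is a low-risk way to check whether the placement is accepted, since -n displays the generated tuples without sending them to the server.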

Legacy Grouping by a Key Field

Some sources, like random-walk, are stateful: their value is not generated independently each time but depends on previous values. Use group-by to cause a separate state to be used for each value of a key field, such as one random-walk for CEREAL, one for MILK, and one for EGGS:

<stream name="ItemsInputStream1">
    <rate per-second="1.0"/>
    <field name="ITEM_NAME"> <enum ref="item-names"/> </field>

    <group-by field="ITEM_NAME">
        <field name="SKU"> <random-walk min="1000" max="2000"/> </field>
    </group-by>
</stream>

Of course, this does not make much sense for groceries. But it would be useful, for example, in the case of stock tickers (where each ticker symbol should have its own state) or position information (where each of a series of objects is moving independently).
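The stream fragment above refers to the item-names enumeration without defining it. A self-contained version might look like the following sketch, which assumes the same define-enum block used in the Legacy Enumerations section.

```xml
<feed-simulation>
    <define-enum name="item-names">
        <value>CEREAL</value>
        <value>MILK</value>
        <value weight="2">EGGS</value>
    </define-enum>
    <stream name="ItemsInputStream1">
        <rate per-second="1.0"/>
        <field name="ITEM_NAME"> <enum ref="item-names"/> </field>

        <!-- One independent random walk per distinct ITEM_NAME value -->
        <group-by field="ITEM_NAME">
            <field name="SKU"> <random-walk min="1000" max="2000"/> </field>
        </group-by>
    </stream>
</feed-simulation>
```

With this configuration, the SKU values seen for CEREAL should move by 1 from one CEREAL tuple to the next, independently of the values seen for MILK and EGGS.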

Related Topics

This topic described legacy StreamBase feed simulations, which are provided for compatibility with pre-3.7 .sbfs configurations. For a complete list of legacy elements and their parameters, see the StreamBase Legacy Feed Simulation XML reference topic.
