Using the Feed Simulation Editor

Introduction

The Feed Simulation editor is StreamBase Studio's interface for creating new feed simulations or editing existing ones. See Running Feed Simulations to learn about running your feed simulations.

StreamBase 3.7 introduced a new version of the StreamBase Feed Simulator and a new file format for the feed simulation files. If you have feed simulations saved with StreamBase 3.5 or earlier, you can upgrade your simulation files to the 3.7 or later format. See Upgrading Legacy Feed Simulations.

Opening the Feed Simulation Editor

Open the Feed Simulation editor by opening an existing feed simulation file or by creating a new feed simulation.

Open an existing feed simulation file as follows:

  • In the SB Authoring perspective, double-click the name of a feed simulation file (with .sbfs extension) in the Package Explorer.

  • In the SB Test/Debug perspective, in the Feed Simulations view, double-click the name of an existing feed simulation file. (A StreamBase application must be running to see its list of feed simulations.)

Create a new feed simulation in one of the following ways:

  • From Studio's top-level menu, select File > New > Feed Simulation.

  • In the SB Authoring perspective, right-click in the Package Explorer and select New > Feed Simulation.

  • In the SB Test/Debug perspective, right-click in the Feed Simulations view and select New Feed Simulation.

In the New StreamBase Feed Simulation dialog:

  1. Select an existing project to contain the feed simulation file.

  2. Provide a name for your feed simulation, which must be unique within the project.

  3. Click Finish.

Graphical and Source Views

The Feed Simulation editor contains both graphical and source views of the same feed simulation. Notice that there are two tabs at the bottom of the editor. Use these tabs to switch between graphical and source presentations. The settings and values you enter on one tab are reflected on its partner tab after you save the changes.

Overview of the Graphical Presentation

The graphical presentation of the Feed Simulation editor contains the following sections:

Simulation Description Section

Enter a brief description for the overall feed simulation in the top Simulation Description field. (You can document each stream in this simulation with the second description field at the bottom of the editor view.)

An example of the Simulation Description section:

Simulation Streams Section

Use the Simulation Streams section to add, edit, or remove input streams from the current feed simulation. For example:

A single feed simulation file can be defined to enqueue data to one or more input streams. The feed simulation definition for each stream can define properties unique to that stream.

Note

The name you specify for each stream must exactly match an input stream name in the StreamBase application this simulation will run against. It is not enough that the stream's schema matches. The stream name in the simulation file and in the application file must be identical.

You can use a feed simulation with more than one application, provided that the input stream names match in those applications.

The schema for an input stream in the application does not have to exactly match the schema defined in the feed simulation. For example, you might use a feed simulation to send a subset of data to a stream, using default values for the other fields in the schema.

Let's say a StreamBase application has two input streams: TradesIn and FuturesIn. In the Simulation Streams section, click Add to invoke the Add Simulation Stream dialog. In the following example, we have identified the name of the additional input stream in the application:

In this dialog, add a schema to the feed simulation using either of the following methods:

  • Use the plus sign icon to add the fields and their data types and sizes line by line.

  • Use the Copy Schema From Existing Component icon, which invokes the Copy Schema dialog, shown below. Use this dialog to select an existing schema from the Saved Schemas view, from a system container stream, or from any application in your Studio workspace.

After clicking OK on this dialog and again on the Add Simulation Stream dialog, the updated Simulation Streams section might look like this:

Note

If you have multiple streams defined in the Simulation Streams section, remember to select the target stream before editing the other sections in the feed simulation editor. This is especially true of the Generation Method and Processing Options sections.

Generation Method Section

Use the Generation Method section to specify how this feed simulation should generate or read data for the selected stream. Be sure to select the stream of interest in the Simulation Streams section before continuing with the Generation Method section.

There are four ways to generate or read data:

  • Default: generate uniformly random data of the correct data type for each field in the selected input stream.

  • Data File: read from a file containing delimited values for each field in the selected input stream.

  • Custom: generate random data, with precise control over the type and range of data that goes to each field.

  • JDBC: read data for each field from a table in a JDBC-compliant database.

Generation Method: Default

Select the Default option to specify that this feed simulation should generate a default load. This means the following when you run the feed simulation on a running StreamBase application:

  • The feed simulation generates about ten tuples per second for each input stream in your application. (You can adjust the rate in the Generation Options section of the Feed Simulation editor.)

  • Every int, long, double, and timestamp field is assigned a random value from 0 to 10000.

  • Every boolean field is assigned true or false.

  • Every string field is filled with a random set of uppercase ASCII characters.

  • Every blob field is assigned 16 bytes of random data, corresponding to uppercase ASCII characters.

  • Every tuple field has all of its subordinate fields filled using the rules above.

When you select the Default generation method, the Timestamp from column, Tuple buffer, and Prefill Tuple buffer controls are greyed out.
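
The default-load rules listed above can be approximated in a few lines of Java. This is only an illustrative sketch, not StreamBase code: the class and method names are hypothetical, the inclusive upper bound of 10000 is an assumption, and the string length is passed in as a parameter because the rules above do not state one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Hypothetical sketch of the default-load rules; not part of StreamBase. */
final class DefaultLoadSketch {
    private static final String UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Random numeric value from 0 to 10000 (inclusive upper bound is an assumption). */
    static int randomNumeric(Random rng) {
        return rng.nextInt(10001);
    }

    /** Random string of uppercase ASCII letters with the requested length. */
    static String randomUppercase(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(UPPERCASE.charAt(rng.nextInt(UPPERCASE.length())));
        }
        return sb.toString();
    }

    /** 16 random bytes, each one the ASCII code of an uppercase letter. */
    static byte[] randomBlob(Random rng) {
        return randomUppercase(rng, 16).getBytes(StandardCharsets.US_ASCII);
    }
}
```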

Generation Method: Data File

In the Generation Method section, select the Data File option, then click Options. In the Data File Options dialog, specify the path to an existing data file that this feed simulation should use to populate the selected stream. You can also specify which fields in the application's input stream correspond to which columns in the data file.

For example:

The following table describes the data file options:

Option: Data file
Default: None
Description: Type the name of an existing delimited value file in your current project, or click Import to import a data file from any location on the local file system. The specified data file is imported into the current Studio project, and a preview of its contents is shown in the File preview section.

Option: File preview
Default: None
Description: Shows a read-only view of the first few rows of the data file specified in the previous field.

Option: First row as header
Default: Disabled
Description: Enable this option to specify that the selected data file has one row of column headers in the first line of the file. You can watch the preview table to the right of the First row control to see if enabling this option is correct for your data file.

Option: Column mapping
Default: None
Description: Use this section to map fields in the selected input stream to fields in the specified data file. By default, fields are lined up one to one. Use the dropdown list in the Map to file column to specify which data file column should be mapped to each schema field.

If your data file has a column with timestamp data, you can designate that column for use as a relative timestamp entry. In this case, you map the timestamp data column using the Timestamp from column control in the Processing Options section, not with the Column Mapping section of this dialog. See the Timestamp from column section for details.

Option: Map to Leaf Fields
Default: Disabled
Description: Use the Map to Leaf Fields option (shown greyed out in the example above) to specify that the fields of a flat CSV file are to be mapped to the subfields of tuple fields, not to the tuple fields themselves. This feature lets Studio read flat CSV files generated manually or generated by other applications and apply them to schemas that have nested schemas. See Using the Map to Leaf Fields Option for more on this subject.

Option: Delimiter
Default: Comma
Description: Specifies the character that delimits fields in your data file.

Option: Quote
Default: Single
Description: Specifies the quote character, single or double, that delimits strings within fields in your data file.

Option: Timestamp format
Default: "yyyy-MM-dd HH:mm:ss"
Description: Specifies the incoming format of the timestamp field you have designated as the Timestamp from column field. See the Timestamp from column section for details.

The timestamp format pattern uses the time formats of the SimpleDateFormat class described in the Sun Java Platform SE reference documentation.

Generation Method: Customize Fields

In the Generation Method section, select Custom. In this case, the Timestamp from column, Tuple buffer, and Prefill Tuple buffer controls are greyed out.

Click Customize Fields. In the Customize Fields dialog, choose values in the Generation Method column for each field in the schema. For example:

The table displays the names and types of the fields available in the selected input stream. The Generation Method column summarizes the way you customize each field. To specify a customized value, select the row and click into the Generation Method cell. Use the pull-down menu for that cell to select from a number of different options, then press Tab or click to move the cursor to the Data Type column. With the cursor moved, one or more data fields appear below the table. In those fields, specify the values relevant for the input stream field.

The choices in the dropdown list vary by the data type for each field, as described in the following sections.

For Numeric Value Types

  • Constant: a constant numeric value you specify will always be used.

  • Enumerated: selects a numeric value from an enumeration, or set of possible values, each of which may have a different weight. The weight for each value in the enumeration defaults to 1, meaning all values are equally likely to appear. Use a larger weight to make a value appear more often, or a smaller weight to make it appear less often.

  • Incremented: a numeric value that starts at a specified value, then is incremented by a specified value until it is outside a restricted range, defined by minimum and maximum values. If you specify a double as the increment, that increment will be used as is, but the result will be truncated. When the value is outside the range, you can choose whether or not to reset the values and repeat.

  • Random: performs a random walk. That is, a numeric value starts at a particular value, increments by a particular value, and has a restricted range, defined by minimum and maximum values. If you specify a double as the increment, that increment will be used as is, but the result will be truncated.

  • Uniform: a uniformly-generated random number in the range defined by minimum and maximum values, inclusive.

  • Undefined: No custom value is generated for this field.
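
As a rough illustration of the Incremented option, the following Java sketch produces a sequence that starts at a given value, adds a fixed step, and either stops or resets once the value leaves the allowed range. The class and method names are hypothetical, the sketch uses integers only, and stopping when the repeat choice is off is an assumption about behavior the text above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Incremented generation method; not StreamBase code. */
final class IncrementedSketch {
    /**
     * Returns up to count values. When the next value falls outside
     * [min, max], the sequence restarts at start if repeat is true;
     * otherwise it stops (assumed behavior).
     */
    static List<Integer> generate(int start, int step, int min, int max,
                                  boolean repeat, int count) {
        if (start < min || start > max) {
            throw new IllegalArgumentException("start must lie within [min, max]");
        }
        List<Integer> out = new ArrayList<>();
        int value = start;
        while (out.size() < count) {
            if (value < min || value > max) {
                if (!repeat) {
                    break;          // range exhausted and no repeat requested
                }
                value = start;      // reset and repeat
            }
            out.add(value);
            value += step;
        }
        return out;
    }
}
```

For example, calling generate(1, 2, 1, 6, true, 5) walks 1, 3, 5, then resets because 7 is out of range and continues from 1 again.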

For String Value Types

  • Constant: a constant string value you specify will always be used.

  • RandomString: a string value consisting of a series of random, uppercase characters will be used. You can also specify whether the value should have a random length (up to the maximum length of the string's defined size).

  • Enumerated: selects a string value from an enumeration, or set of possible values, each of which may have a different weight. The weight for each value in the enumeration defaults to 1, meaning all values are equally likely to appear. Use a larger weight to make a value appear more often, or a smaller weight to make it appear less often.

  • Undefined: No custom value is generated for this field.
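
The effect of enumeration weights can be sketched as a cumulative-weight lookup: each value is chosen with probability equal to its weight divided by the sum of all weights. The Java below is a minimal sketch under that assumption. The class and method names are hypothetical, and the feed simulator's actual selection algorithm is not documented here.

```java
import java.util.Random;

/** Hypothetical sketch of weighted selection for the Enumerated option. */
final class WeightedEnumSketch {
    /**
     * Picks a value for a uniform draw u in [0, 1): value i is chosen
     * with probability weights[i] / sum(weights).
     */
    static String pick(String[] values, double[] weights, double u) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double target = u * total;
        double cumulative = 0;
        for (int i = 0; i < values.length; i++) {
            cumulative += weights[i];
            if (target < cumulative) {
                return values[i];
            }
        }
        return values[values.length - 1]; // guards against rounding at the upper edge
    }

    /** Convenience overload that draws u from the supplied generator. */
    static String pick(String[] values, double[] weights, Random rng) {
        return pick(values, weights, rng.nextDouble());
    }
}
```

With values BA, DEAR, and EPIC and weights 1, 1, and 2, EPIC is selected for half of all draws and the other two values for a quarter each.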

For Boolean Value Types

For Boolean fields, a number with an absolute value of 0.5 or greater is considered true. You can select:

  • Constant: set the same specified value for each tuple. Specify true, false, or any of the recognized synonyms for true and false.

  • Uniform: set a range of values to alternate between. Setting a range of 0 to 1 will alternate between true and false, setting the field to true about half the time. Specify different range values to weight the distribution of true and false values towards 1 (true) or towards 0 (false). For example, a range of .25 to 1.0 will set the field to true about twice as often as it sets the field to false.

  • Undefined: No custom value is generated for this field.
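
The arithmetic behind the Uniform option for Boolean fields can be checked with a short calculation. The Java sketch below assumes a non-negative range and a uniform draw between the minimum and maximum, with draws of 0.5 or greater counted as true. The class and method names are hypothetical.

```java
/** Hypothetical sketch: share of true values for a Boolean Uniform range. */
final class BooleanUniformSketch {
    static final double TRUE_THRESHOLD = 0.5;

    /**
     * Probability that a uniform draw from [min, max] is at least 0.5.
     * Assumes 0 <= min <= max.
     */
    static double probabilityTrue(double min, double max) {
        if (min >= TRUE_THRESHOLD) {
            return 1.0;             // every draw counts as true
        }
        if (max <= TRUE_THRESHOLD) {
            return 0.0;             // draws below the threshold count as false
        }
        return (max - TRUE_THRESHOLD) / (max - min);
    }
}
```

For the range 0.25 to 1.0, the true portion is 0.5 out of a total width of 0.75, or two thirds, which matches the "twice as often" statement above. For the range 0 to 1, the result is one half.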

Null Probability Value

For all data types, in the Probability of generating null option, you can specify a value in the range 0.0 to 1.0 to indicate how often to randomly set the field's value to null. For example, entering 0.12 causes the field's value to be null in about 12% of the generated tuples. By default, this field contains the value 0, meaning the feed simulation will not set any field values to null.
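
One way to picture the null probability is as an independent random draw per generated value. The following Java sketch assumes exactly that. The class and method names are hypothetical, and the feed simulator may implement the option differently.

```java
import java.util.Random;

/** Hypothetical sketch of the "Probability of generating null" option. */
final class NullProbabilitySketch {
    /**
     * Returns null with the given probability (0.0 to 1.0),
     * otherwise returns the generated value unchanged.
     */
    static <T> T maybeNull(T value, double nullProbability, Random rng) {
        // nextDouble() is in [0, 1), so 0.0 never yields null and 1.0 always does.
        return rng.nextDouble() < nullProbability ? null : value;
    }
}
```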

Generation Method: JDBC Data Source

In the Generation Method section, select the JDBC option, then click Options. In the JDBC Data Source Options dialog, specify the information required to connect to a JDBC-compliant database, and specify a SQL query that will generate a response from the database that will populate the selected input stream.

Note

The JDBC generation method cannot be used for an input stream whose schema contains a field of type tuple.

To use this option, you must obtain and install the JAR file or files that implement the JDBC driver for the target database from the database vendor. These JDBC JAR files must be installed in one of the following locations on the computer that will run the feed simulation:

Windows

Copy the JDBC JAR file or files to:

%STREAMBASE_HOME%\jdk\jre\lib\ext
(Example: C:\Program Files\StreamBase Systems\StreamBase.n.m\jdk\jre\lib\ext)

or

%windir%\Sun\Java\lib\ext
(Example: C:\Windows\Sun\Java\lib\ext)

UNIX

Copy the JDBC JAR file or files to:

$STREAMBASE_HOME/jdk/jre/lib/ext
(Example: /opt/streambase/jdk/jre/lib/ext)

or

/usr/java/packages/lib/ext

The following table describes the JDBC Data Source options:

Option: Driver Class
Default: None
Description: Required field. Select or enter the fully qualified name of the class that implements the JDBC driver for the database you want to use.

The dropdown list for this field is populated with example JDBC driver class names. The class name to enter in this field is determined by the actual JDBC driver you obtain from your database vendor and may have changed from these examples.

You can select an example driver class name and then edit it, if your driver's class name has changed.

Option: URI
Default: None
Description: Required field. Enter (or select and edit) the JDBC URI that connects to the target database at your site.

The dropdown list for this field is populated with example JDBC URI strings, with placeholders for site-specific information such as hostname and database name. These example URI strings cannot be used as provided. They must be edited to specify the correct local information for your target database.

Option: User Name
Default: None
Description: Optional field. If access to your target database requires it, enter a user name that has the authorization level necessary to run the SQL query specified in the SQL field.

Option: Password
Default: None
Description: Optional field. If access to your target database requires it, enter the password for the user name specified in the previous field.

Option: SQL
Default: None
Description: Required field. Enter a fully tested and known working SQL statement that returns rows with the columns in the correct order for the schema of the specified input stream. Use the SQL syntax of your target database to construct your SQL statement.

StreamBase Systems strongly recommends using your database vendor's command line query tool or a third-party database query tool to develop the SQL query to use in this field. Get the SQL query to a known, working state outside of StreamBase Studio before attempting to use it with a feed simulation.

If your SQL SELECT statement returns the right columns in the wrong order, then adjust your SQL statement to return columns that line up by data type with the schema of the specified input stream.

Option: JDBC Fetch Size
Default: 0 (disabled)
Description: Optional field. Specify an integer to designate a JDBC fetch size, which gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed. The fetch size is a standard feature of JDBC drivers, and does not designate a row limit. Some JDBC drivers ignore the fetch size.

Consult your database vendor's documentation to learn about methods of determining the optimum fetch size for your target database.

Option: Connect timeout
Default: 15 seconds
Description: Optional field. Specify an integer number of seconds for the JDBC driver to wait for results before declaring an error.
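
To show how the Driver Class, URI, User Name, Password, SQL, and JDBC Fetch Size options relate to the standard JDBC API, here is a minimal Java sketch that runs a query outside of StreamBase Studio, which can also help when testing a query before using it in a feed simulation. The class and method names are hypothetical, the connect timeout is not modeled, and this is not a description of how the feed simulator itself is implemented.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: run a query with the same settings the dialog asks for. */
final class JdbcSourceSketch {
    static List<Object[]> fetchRows(String driverClass, String uri, String user,
                                    String password, String sql, int fetchSize)
            throws SQLException, ClassNotFoundException {
        // Load the vendor driver class; its JAR must be on the classpath.
        Class.forName(driverClass);
        try (Connection conn = DriverManager.getConnection(uri, user, password);
             Statement stmt = conn.createStatement()) {
            if (fetchSize > 0) {
                stmt.setFetchSize(fetchSize); // a hint only; some drivers ignore it
            }
            try (ResultSet rs = stmt.executeQuery(sql)) {
                int columns = rs.getMetaData().getColumnCount();
                List<Object[]> rows = new ArrayList<>();
                while (rs.next()) {
                    Object[] row = new Object[columns];
                    for (int c = 1; c <= columns; c++) {
                        row[c - 1] = rs.getObject(c); // columns in SELECT order
                    }
                    rows.add(row);
                }
                return rows;
            }
        }
    }
}
```

Because the rows are returned in SELECT order, printing the first few rows is a quick way to confirm that the columns line up with the input stream's schema.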

Processing Options Section

Use the Processing Options section to set the runtime behavior of the selected stream. Be sure to select the stream of interest in the Simulation Streams section before continuing with the Processing Options section.

The following table describes the feed simulation processing options:

Option: Log to Application Input
Applies to: All generation methods
Default: Enabled (after an input stream and its schema are selected)
Description: If enabled, the generated feed simulation data is shown in the SB Test/Debug perspective's Application Input view. This helps you see the generated data and compare it with the dequeued results in the Application Output view.

Option: Specify maximum tuples
Applies to: All generation methods
Default: 0 (no set maximum)
Description: Specifies the maximum number of tuples to generate on the input stream. For example, to stop generating on the input stream after 20 tuples, enter 20 in the Specify maximum tuples field. The default setting, zero, means there is no limit.

Option: Specify maximum time
Applies to: All generation methods
Default: 0 (no time limit)
Description: Specifies in seconds the longest time period to run this feed simulation. The default setting, zero, means there is no time limit.

Option: Null string
Applies to: All generation methods
Default: "null"
Description: Specifies the string used to designate null values when encountered in an incoming data stream (whether read from a data file, read from a JDBC database, or generated as part of customized simulation data). The default value is null, which means that by default, StreamBase sends a null value for that field when it encounters the lowercase letters n u l l in a field in an incoming data stream. You can change the null string value to blank to specify the empty string, or to any value you expect the incoming data to have. For example, a CSV file generated from a MySQL database table dump might contain \N to designate null fields. See Null Handling for further information.

Option: Data rate
Applies to: All generation methods
Default: 10 tuples per second (after an input stream and its schema are selected)
Description: Specifies the number of tuples per second to be sent to the specified input stream. You can use the up and down arrows to change the value, or click into the text box and enter an integer. A feed simulation with a specification of zero tuples per second will not run.

Option: As fast as possible
Applies to: All generation methods
Default: Disabled
Description: Enables an alternative to specifying a Data Rate value. Enable this option to send the feed simulation data as fast as the host computer allows. The actual data rate at runtime is a factor of the machine's speed, and may be somewhat limited. The feed simulator uses a substantial amount of CPU resources when this option is enabled.

Use this setting only if any one of the following is true:

  • You are sending a small finite data set.

  • You are sending a large or infinite data set, and the Maximum Tuples and Maximum Time options are set to low values.

  • You are sending a large or infinite data set, you have disabled the Log to Application Input setting, and you have clicked the Disable dequeuing of application output icon on the Application Output view.

Note

Remember that StreamBase Studio is not intended for benchmarking the full performance capabilities of StreamBase Server on production machines. This IDE-managed environment is not suitable for high-speed data rates and high CPU utilization.

Option: Timestamp from column
Applies to: Data File and JDBC generation methods only
Default: Disabled
Description: Allows you to designate the field or column that contains relative timing values to be used to control the pace of sending tuples for this feed simulation.

You can designate a column in an incoming JDBC database query, in which case the data type of the column must be double or timestamp. You can also designate a column in an incoming data file, in which case the data type of the column must be double or must contain a string representation of a timestamp.

When enabling this option, you must also specify whether to start counting time values from zero or from the first value read. For most cases, select first value read.

With this option enabled, the feed simulation uses the relative times from the designated timestamp column to drive the timing of the feed simulation. See Using the Timestamp from Column Feature for more on this feature.

Option: Tuple buffer
Applies to: Data File and JDBC generation methods only
Default: 40,000 tuples
Description: Specifies the size in tuples of the buffer that holds tuples read from a data file or database query.

Option: Prefill Tuple buffer before starting simulation
Applies to: Data File and JDBC generation methods only
Default: Disabled
Description: Designates whether to fill the entire tuple buffer before sending the first tuple to StreamBase Server.

Note that when you enter default values for these settings, the values do not appear in the generated feed simulation source file.

Using the Timestamp from Column Feature

When using a data file or database query as input for your feed simulation, you can specify one input column as the source for a relative timestamp to use in timing the simulation. If you are using a captured feed as input, and the original feed includes timestamp values, you can use this feature to run your feed simulation with the same pace of sending tuples used by the original feed.

When using a column of a JDBC database query as the timestamp column, the data type of that column must be StreamBase timestamp or double. When using a column of a data file as the timestamp column, the column must contain either double values or string representations of timestamps. When using a column of type double, the numbers represent seconds. When using a string representation of a timestamp, you must also use the Timestamp format field in the Data File Options dialog (described above) to specify a format that describes how the string is to be interpreted as a timestamp.

When the feed simulation is run, tuples are sent on the schedule specified by the time difference in seconds between each successive timestamp. That is, StreamBase ignores the absolute time and date in the timestamp value, and instead calculates the relative amount of time between the timestamps of successive incoming tuples.

For example, we might have an input CSV file that starts with the following lines:

BA,51.25,"09/27/08 16:20:30",100.0
DEAR,25.43,"09/27/08 16:20:45",102.0
EPIC,142.85,"09/27/08 16:20:50",105.0
FUL,15.76,"09/27/08 16:21:00",105.5
GE,25.151,"09/27/08 16:21:05",106.0

This is the same file used as an example in the picture of the Data File Options dialog described above.

In this example input file, we can designate either column 3 or 4 to be a source of relative timestamps:

  • To use column 3 as the timing source, designate column 3 in the Timestamp from column option, and set the option to start counting from the first value read. You must also specify the string MM/dd/yy HH:mm:ss as the format specifier for the timestamp strings in column 3. (In SimpleDateFormat patterns, uppercase MM is the month and lowercase mm is the minute.) In this case, the feed simulator sends the first tuple immediately on startup, then sends the next few tuples on the following schedule:

    First tuple sent     On startup of the feed simulation
    Second tuple sent    15 seconds later    16:20:45 minus 16:20:30 = 15 seconds
    Third tuple sent     5 seconds later     16:20:50 minus 16:20:45 = 5 seconds
    Fourth tuple sent    10 seconds later    16:21:00 minus 16:20:50 = 10 seconds
    Fifth tuple sent     5 seconds later     16:21:05 minus 16:21:00 = 5 seconds
    And so on            ...                 ...

  • To use column 4 as the timing source, designate column 4 in the Timestamp from column option, and set the option to start counting from the first value read. In this case, the feed simulator sends tuples according to the following schedule:

    First tuple sent     On startup of the feed simulation
    Second tuple sent    2 seconds later     102 minus 100 seconds = 2 seconds
    Third tuple sent     3 seconds later     105 minus 102 seconds = 3 seconds
    Fourth tuple sent    1/2 second later    105.5 minus 105.0 seconds = 0.5 seconds
    Fifth tuple sent     1/2 second later    106.0 minus 105.5 seconds = 0.5 seconds
    And so on            ...                 ...
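
The two schedules above follow from simple differences between successive values. The Java sketch below reproduces that arithmetic for the sample rows, using SimpleDateFormat with the pattern MM/dd/yy HH:mm:ss for the string column and plain subtraction for the double column. The class and method names are hypothetical, and fixing the time zone to UTC is a choice made for this sketch so that daylight saving changes cannot skew the differences.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/** Hypothetical sketch of the relative-timing arithmetic; not StreamBase code. */
final class TimestampPacingSketch {
    /** Delays in seconds between successive timestamp strings. */
    static double[] delaysFromStrings(String[] stamps, String pattern) {
        if (stamps.length < 2) {
            return new double[0];
        }
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false);
        try {
            double[] delays = new double[stamps.length - 1];
            long previous = fmt.parse(stamps[0]).getTime();
            for (int i = 1; i < stamps.length; i++) {
                long current = fmt.parse(stamps[i]).getTime();
                delays[i - 1] = (current - previous) / 1000.0; // milliseconds to seconds
                previous = current;
            }
            return delays;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Timestamp does not match " + pattern, e);
        }
    }

    /** Delays in seconds between successive values of a double column. */
    static double[] delaysFromSeconds(double[] seconds) {
        if (seconds.length < 2) {
            return new double[0];
        }
        double[] delays = new double[seconds.length - 1];
        for (int i = 1; i < seconds.length; i++) {
            delays[i - 1] = seconds[i] - seconds[i - 1];
        }
        return delays;
    }
}
```

Only the differences matter in this calculation, so shifting every timestamp by the same amount leaves the resulting schedule unchanged.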

Null Handling in CSV Files or Generated Data

StreamBase handles empty strings encountered in incoming data files, database queries, or feed simulations with custom generated data as follows:

  • If an empty string is encountered in an incoming field whose target schema field is a blob or a string, StreamBase sends an empty string value.

  • If an empty string is encountered in an incoming field whose target schema field has a type other than blob or string, StreamBase sends a null value.

You can also set a specific string to be treated as an incoming null field value, as described for the Null string option in the Processing Options section.
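
The rules above can be summarized as a small decision function. The following Java sketch is illustrative only: the class and method names are hypothetical, and checking the configured null string before applying the empty-string rule is an assumption about precedence that the text does not state.

```java
/** Hypothetical sketch of the null-handling rules; not StreamBase code. */
final class NullHandlingSketch {
    /**
     * Returns the text to send for a field, or null when the field
     * should be sent as a null value.
     */
    static String resolve(String incoming, boolean targetIsStringOrBlob, String nullString) {
        if (incoming.equals(nullString)) {
            return null;            // matches the configured Null string
        }
        if (incoming.isEmpty() && !targetIsStringOrBlob) {
            return null;            // empty input for a non-string, non-blob field
        }
        return incoming;            // includes the empty string for string and blob fields
    }
}
```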

Description for Each Stream

Use the Description for StreamName field at the bottom of the Editor view to document the currently selected stream. (Use the Description field at the top of the view to document the feed simulation as a whole.)

For example:

Saving Feed Simulations

On the upper tabs of the Feed Simulation Editor, StreamBase Studio displays an asterisk if the file has unsaved changes. You cannot toggle between the graphical and source presentations of the feed simulation until you save the file. All feed simulations have the .sbfs file extension, and are saved by default in the currently active Studio project.