Using the Feed Simulation Editor

Introduction

The Feed Simulation Editor is StreamBase Studio's interface for creating new feed simulations or editing existing ones. See Running Feed Simulations to learn about running your feed simulations.

Opening the Feed Simulation Editor

Open the Feed Simulation Editor by opening an existing feed simulation file or by creating a new feed simulation.

Open an existing feed simulation file as follows:

  • In the SB Authoring perspective, double-click the name of a feed simulation file (with .sbfs extension) in the Package Explorer.

  • In the SB Test/Debug perspective, in the Feed Simulations view, double-click the name of an existing feed simulation file. (A StreamBase application must be running to see its list of feed simulations.)

Create a new feed simulation in one of the following ways:

  • From Studio's top-level menu, select File > New > Feed Simulation.

  • Click the New Feed Simulation button in the toolbar.

  • In the SB Authoring perspective, right-click in the Package Explorer and select New > Feed Simulation.

  • In the SB Test/Debug perspective, right-click in the Feed Simulations view and select New Feed Simulation.

In the New StreamBase Feed Simulation dialog:

  1. Select an existing project to contain the feed simulation file.

  2. Provide a name for your feed simulation, which must be unique within the project.

  3. Click Finish.

Graphical and Source Views

The Feed Simulation Editor contains both graphical and source views of the same feed simulation. Notice that there are two tabs at the bottom of the editor. Use these tabs to switch between graphical and source presentations. The settings and values you enter on one tab are reflected on its partner tab after you save the changes.

Overview of the Graphical Presentation

The graphical presentation of the Feed Simulation editor contains the following sections:

Simulation Description Section

Enter a brief description for the overall feed simulation in the top Simulation Description field. (You can document each stream in this simulation with the second description field at the bottom of the editor view.)

An example of the Simulation Description section:

Simulation Streams Section

Use the Simulation Streams section to add, edit, or remove input streams from the current feed simulation.

A single feed simulation file can be defined to enqueue data to one or more input streams. The feed simulation definition for each stream can define properties unique to that stream.

Note

The name you specify for each stream must exactly match an input stream name in the StreamBase application this simulation will run against. It is not enough that the stream's schema matches. The stream name in the simulation file and in the application file must be identical.

You can use a feed simulation with more than one application, provided that the input stream names match in those applications.

The schema for an input stream in the application does not have to exactly match the schema defined in the feed simulation. For example, you might use a feed simulation to send a subset of data to a stream, relying on default values for the other fields in the schema.

Starting with release 7.1.0, you can specify an input stream with an empty schema, as described in Using Empty Schemas. Only the Default generation method is available for such a stream, and only a limited set of Processing Options applies. When run, a feed simulation for a stream with an empty schema sends a series of no-fields tuples to that stream.

Copy from Stream or Named Schema

Use the Copy from Stream/Named Schema button to quickly copy the schema from an existing stream or named schema in your workspace.

Clicking this button opens a dialog that shows you a tree list of all modules and interfaces in your current Studio workspace. Click the arrows next to the project folder that you know contains the module or interface of interest. Select the module or interface that contains the stream or named schema of interest. Select that stream or schema and click OK to copy that schema to the Simulation Streams section.

Adding a New Stream

Let's say a StreamBase application has two input streams: TradesIn and FuturesIn. In the Simulation Streams section, click New Stream to invoke the Add Simulation Stream dialog. In the following example, we have identified the name of the additional input stream in the application:

In this dialog, add a schema to the feed simulation using either of the following methods:

  • Use the plus sign icon to add the fields and their data types and sizes line by line.

  • Use the Copy Schema From Existing Component icon, which invokes the Copy Schema From dialog described in Copying Schemas. Use this dialog to select an existing schema from a system container stream, or from any module in your Studio workspace.

After clicking OK twice, the updated Simulation Streams section looks like this:

Note

If you have multiple streams defined in the Simulation Streams section, remember to select the target stream before editing the other sections in the feed simulation editor. This is especially true of the Generation Method and Processing Options sections.

Generation Method Section

Use the Generation Method section to specify how this feed simulation should generate or read data for the selected stream. Be sure to select the stream of interest in the Simulation Streams section before continuing with the Generation Method section.

There are four ways to obtain data for this feed simulation:

  • Default: generate uniformly random data of the correct data type for each field in the selected input stream.

  • Data File: read from a file containing delimited values for each field in the selected input stream.

  • Custom: generate random data, with precise control over the type and range of data that goes to each field.

  • JDBC: read data for each field from a table in a JDBC-compliant database.

Generation Method: Default

Select the Default option to specify that this feed simulation should generate a default load. This means the following when you run the feed simulation on a running StreamBase application:

  • The feed simulation generates about ten tuples per second for each input stream in your application. (You can adjust the rate in the Generation Options section of the Feed Simulation editor.)

  • Every int, long, double, and timestamp field is assigned a random value from 0 to 10000.

  • Every boolean field is assigned true or false.

  • Every string field is filled with characters from a random set of uppercase ASCII characters.

  • Every blob field is assigned 16 bytes of random data, corresponding to uppercase ASCII characters.

  • Every tuple field has all of its subordinate fields filled using the rules above.

When you select the Default generation method, the Timestamp from column, Tuple buffer, Prefill Tuple buffer, and Loop on Tuple buffer controls are dimmed.
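
The Default rules listed above can be illustrated with a short sketch. This is not StreamBase code; it is a minimal Python model of the stated rules. The function name default_value, the dictionary representation of a tuple schema, and the string length of 4 are assumptions made for illustration (the text above does not state a default string length), and timestamp fields are omitted because the text does not say in which unit the 0 to 10000 range is expressed.

```python
import random
import string


def default_value(field_type, rng, string_length=4):
    """Return one random value that follows the Default generation rules.

    field_type is a type name, or a dict of sub-field names to types for a
    tuple field (a hypothetical representation used only in this sketch).
    string_length is an assumed value, not a documented default.
    """
    if field_type in ("int", "long"):
        return rng.randint(0, 10000)
    if field_type == "double":
        return rng.uniform(0, 10000)
    if field_type == "bool":
        return rng.choice([True, False])
    if field_type == "string":
        return "".join(rng.choice(string.ascii_uppercase) for _ in range(string_length))
    if field_type == "blob":
        # 16 bytes that correspond to uppercase ASCII characters
        letters = (rng.choice(string.ascii_uppercase) for _ in range(16))
        return "".join(letters).encode("ascii")
    if isinstance(field_type, dict):
        # Tuple field: fill every subordinate field with the same rules
        return {name: default_value(sub, rng, string_length)
                for name, sub in field_type.items()}
    raise ValueError(f"unsupported field type: {field_type!r}")
```

For example, calling default_value({"symbol": "string", "qty": "int"}, random.Random()) returns a dictionary with a four-letter uppercase symbol and a quantity between 0 and 10000.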

Generation Method: Data File

In the Generation Method section, select the Data File option, then click Options. In the Data File Options dialog, specify the path to an existing data file that this feed simulation should use to populate the selected stream. You can also specify which fields in the application's input stream correspond to which columns in the data file.

The following image shows the Data File Options dialog filled in for a simple data file example.

The following table describes the options in the Data File Options dialog.

Option Default Description
Data file None Type the name of an existing delimited value file in your current project, or click Import to import a data file from any location on the local file system into the current Studio project. A preview of the selected file's contents is shown in the File preview section.

The feed simulator reads uncompressed plain text CSV files, compressed CSV files, and StreamBase binary files. Compressed and binary files are recognized by their file name extension, as described in the following table:

File name extension                   Read by feed simulator as
.zip                                  CSV file compressed with zip.
.gz                                   CSV file compressed with gzip.
.bz2                                  CSV file compressed with bzip2. (Compression with bzip2 can result in significantly smaller files, but at the cost of slower reading times.)
.bin                                  Binary output file generated with the StreamBase Binary File Writer adapter. (You must generate the binary files with the same release of StreamBase currently running the feed simulation.)
.bin.gz                               Binary output file generated with the StreamBase Binary File Writer adapter, with that adapter's compression option enabled.
Any other extension or no extension   Uncompressed, plain text CSV file.
Custom reader None Advanced. Use this button to specify the fully qualified name of a Java class that implements a custom file reader for non-standard, proprietary, or binary files. The specified class must be on the classpath of the JVM running StreamBase Studio (or the JVM running the sbfeedsim command). See Feed Simulation with Custom File Reader for instructions on using this feature.
File preview None Shows a read-only view of the first few rows of the file specified in the Data file field. The preview updates automatically to reflect selections in other fields of the dialog.
Lines to skip 0

Enter an integer number of lines to skip before interpreting lines as data. Use this control to skip a header line without using the First row as header option. This allows you to designate a column as a timestamp value without having to map fields to data by incoming column name.

You can also use this control to start reading the data file at an arbitrary preferred starting point. For example, a compressed file of market data might have been recorded starting at 9:00 AM on a trading day, but you want to run your feed simulation starting with the trades that occurred after the exchange opened at 9:30 AM. You must empirically determine the starting point row in the data file, such as by entering a guessed number of rows to skip and checking the timestamp for the new top row in the File preview section or the Column mapping grid. Then enter a lower or higher number of rows to skip until the timestamp in the previews matches the timestamp you seek.

When the First row as header option is enabled at the same time as this option, the number of lines to skip are counted from the second row of the data file, leaving the first row intact to be interpreted for its label information.

First row as header Disabled

When disabled (the default), the column headings of the File preview grid are numbered instead of named. This helps you line up data file columns to field names when using the Map to file control.

Enable this option to specify that the selected data file has one row of delimited column headers in the first line of the file. Watch the File preview control and the Column mapping grid to see if enabling this option is correct for your data file. When enabled, the column headings of the File preview grid show the heading text read from the file instead of numbers.

This option does not skip the first row; it reads and interprets the first row as labels. If you have enabled both the First row as header and Lines to skip options, the first row is read for its header information, then the lines to skip start counting after the first row.

If the current input stream (for which you are defining this input data file) has an empty schema, the First row as header control is dimmed and unavailable. It is still possible to specify a data file as input to a stream with an empty schema, such as to specify a sequence of timestamps to use with the Timestamp from column field, as described in Using the Timestamp from Column Feature.

Column mapping None

Use the Column mapping grid to map fields in the data file to fields in the currently selected input stream. By default, fields are lined up one to one. Use the drop-down list in the Map to file column to specify which data file column should be mapped to each schema field. The Map to file column shows column names from the header row if you enabled the First row as header control, and shows column numbers otherwise.

If your data file has a column with timestamp data, you can designate that column for use as a relative timestamp entry. In this case, map the timestamp data column using the Timestamp from column control in the Processing Options section, not with the Column mapping section of this dialog, as described in Using the Timestamp from Column Feature.

Map data file columns to sub-fields of tuple fields Disabled

This option only appears in the dialog when the schema of at least one of the input streams for this feed simulation contains at least one field of type tuple.

Use the Map data file columns to sub-fields option to specify that the fields of a flat CSV file are to be mapped one-to-one to the sub-fields of tuple fields. This feature lets Studio read flat CSV files generated manually or generated by non-StreamBase applications and apply them to schemas that have tuple fields. See Map to Sub-Fields Option for details.

Delimiter Comma Specifies the character that delimits fields in your data file.
Quote Double Specifies the quote character, single or double, that delimits strings within fields in your data file. The default is the double quote.
Timestamp format "yyyy-MM-dd HH:mm:ss"

Specifies the incoming format of the timestamp field you have designated as the Timestamp from column field. See the Timestamp from column section for details.

The timestamp format pattern uses the time formats of the java.text.SimpleDateFormat class described in the Sun Java Platform SE reference documentation.

Timestamp builder None Used in conjunction with the Timestamp format field, this control allows you to build a single timestamp from a combination of two or more data file columns plus optional text strings. See Using the Timestamp from Column Feature for details on using this feature.
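
The interaction between the Lines to skip and First row as header options described in the table above can be modeled with a small sketch. This is an illustration written in Python, not the feed simulator's reader; the function name read_rows and its parameters are hypothetical, and the 1-based column numbering used when no header is present is an assumption. The sketch also counts parsed CSV rows rather than physical lines, which differ only when a quoted field contains a line break.

```python
import csv
import io


def read_rows(text, lines_to_skip=0, first_row_as_header=False,
              delimiter=",", quote='"'):
    """Return (column_labels, data_rows) for delimited text.

    When first_row_as_header is set, the first row is read for its labels and
    the skipped rows are counted from the second row onward.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    rows = list(reader)

    header = None
    if first_row_as_header and rows:
        header, rows = rows[0], rows[1:]

    data = rows[lines_to_skip:]

    if header is None:
        # No header row: label columns by number (1-based here by assumption)
        width = len(data[0]) if data else 0
        header = [str(i) for i in range(1, width + 1)]
    return header, data
```

With a file whose first row is a header, read_rows(text, lines_to_skip=1, first_row_as_header=True) keeps the labels and drops the first data row, while read_rows(text, lines_to_skip=1) drops the header row itself and labels the columns by number.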

Generation Method: Customize Fields

In the Generation Method section, select Custom. In this case, the Timestamp from column, Tuple buffer, Prefill Tuple buffer, and Loop on Tuple buffer controls are dimmed.

Click Customize Fields. In the Customize Fields dialog, choose values in the Generation Method column for each field in the schema. For example:

The table displays the names and types of the fields available in the selected input stream. The Generation Method column summarizes the way you customize each field. To specify a customized value, select the row and click into the Generation Method cell. Use the drop-down menu for that cell to select from a number of different options, then press Tab or click to move the cursor to the Data Type column. With the cursor moved, one or more data fields appear below the table. In those fields, specify the values relevant for the input stream field.

The choices in the drop-down list vary by the data type for each field, as described in the following sections.

For Numeric Value Types

  • Constant: a constant numeric value you specify will always be used.

  • Enumerated: selects a numeric value from an enumeration, or set of possible values, each of which may have a different weight. Weights for each value in the enumeration default to 1, meaning they are all equally likely to appear. Use a larger weight to make a value appear more often, or a smaller weight to make it appear less often.

  • Incremented: a numeric value that starts at a specified value, then is incremented by a specified value until it is outside a restricted range, defined by minimum and maximum values. If you specify a double as the increment, that increment will be used as is, but the result will be truncated. When the value is outside the range, you can choose whether or not to reset the values and repeat.

  • Random: performs a random walk. That is, a numeric value starts at a particular value, increments by a particular value, and has a restricted range, defined by minimum and maximum values. If you specify a double as the increment, that increment will be used as is, but the result will be truncated.

  • Uniform: a uniformly-generated random number in the range defined by minimum and maximum values, inclusive.

  • Undefined: No custom value is generated for this field.
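
To make the Enumerated weights concrete, here is a minimal Python sketch. It assumes that weights act as relative proportions, which is consistent with the description above (equal weights give equal likelihood) but is not confirmed by it. The function name enumerated and the sample values are hypothetical.

```python
import random


def enumerated(values_with_weights, rng):
    """Pick one value from (value, weight) pairs, treating weights as
    relative proportions (an assumption made for this sketch)."""
    values = [value for value, _ in values_with_weights]
    weights = [weight for _, weight in values_with_weights]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_counts(values_with_weights, n, rng):
    """Count how often each value appears in n draws."""
    counts = {value: 0 for value, _ in values_with_weights}
    for _ in range(n):
        counts[enumerated(values_with_weights, rng)] += 1
    return counts
```

Under this assumption, the pairs (100, 1), (200, 1), and (500, 3) would yield 500 in roughly three of every five tuples, and 100 and 200 in roughly one of every five each.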

For String Value Types

  • Constant: a constant string value you specify will always be used.

  • RandomString: a string value consisting of a series of random, uppercase characters will be used. You can specify the minimum and maximum length of the generated strings; both default to 4 characters.

  • Enumerated: selects a string value from an enumeration, or set of possible values, each of which may have a different weight. Weights for each value in the enumeration default to 1, meaning they are all equally likely to appear. Use a larger weight to make a value appear more often, or a smaller weight to make it appear less often.

  • Undefined: No custom value is generated for this field.

For Boolean Value Types

For Boolean fields, a number with an absolute value of 0.5 or greater is considered true. You can select:

  • Constant: set the same specified value for each tuple. Specify true, false, or any of the recognized synonyms for true and false.

  • Uniform: set a range of values to alternate between. Setting a range of 0 to 1 will alternate between true and false, setting the field to true about half the time. Specify different range values to weight the distribution of true and false values towards 1 (true) or towards 0 (false). For example, a range of .25 to 1.0 will set the field to true about twice as often as it sets the field to false.

  • Undefined: No custom value is generated for this field.
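
The arithmetic behind the Uniform option for Boolean fields can be checked with a short sketch. It models the rule stated above (a uniformly drawn number with absolute value 0.5 or greater counts as true); the function names are hypothetical, and the closed-form helper assumes a range with 0 <= low < high.

```python
import random


def uniform_boolean(low, high, rng):
    """Draw a number uniformly from [low, high] and apply the 0.5 threshold."""
    return abs(rng.uniform(low, high)) >= 0.5


def expected_true_fraction(low, high):
    """Fraction of the range at or above 0.5, assuming 0 <= low < high."""
    true_width = max(0.0, high - max(low, 0.5))
    return true_width / (high - low)
```

For the range 0 to 1 the true portion is 0.5 of a width of 1, or one half. For the range 0.25 to 1.0 the true portion is 0.5 of a width of 0.75, or two thirds, which is why true appears about twice as often as false.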

For List Value Types

For fields of type list, the Generation Method column offers the ListSize option. You can specify the minimum and maximum number of elements to be generated; both default to 4 elements.

For Tuple Value Types

For fields of type tuple, each field of the tuple is shown separately. Each field's row offers the appropriate choice for its data type, as described in previous sections.

Null Probability Value

For all data types, in the Probability of generating null option, you can specify a value in the range 0.0 to 1.0 to indicate how often to randomly set the field's value to null. For example, entering 0.12 causes the field's value to be null in about 12% of the generated tuples. By default, this field contains the value 0, meaning the feed simulation will not set any field values to null.
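
The effect of the null probability can be sketched as follows. This is an illustrative Python model, not the simulator's implementation; the function name maybe_null is hypothetical.

```python
import random


def maybe_null(value, null_probability, rng):
    """Return None with the given probability, otherwise the generated value."""
    if not 0.0 <= null_probability <= 1.0:
        raise ValueError("null_probability must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so 0 never yields None and 1 always does
    return None if rng.random() < null_probability else value
```

With a probability of 0.12, about 12 of every 100 generated values come back as None; with the default of 0, none do.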

Generation Method: JDBC Data Source

You can configure a feed simulation to use the response from a SQL query to a JDBC database as the source of input tuples for the feed simulation.

The configuration of this feature is discussed in Feed Simulation with a JDBC Data Source.

Processing Options Section

Use the Processing Options section to set the runtime behavior of the selected stream. Be sure to select the stream of interest in the Simulation Streams section before continuing with the Processing Options section.

The following table describes the feed simulation processing options. Note that when you specify default values for these settings, those values do not appear in the feed simulation source file.

Option Applies to Default Description
Log to Application Input All generation methods. Enabled (after an input stream and its schema are selected) If enabled, the generated feed simulation data is shown in the SB Test/Debug perspective's Application Input view. This helps you see the generated data and compare it with the dequeued results in the Application Output view.
Specify maximum tuples All generation methods. 0 (no set maximum) Specifies the maximum number of tuples to generate on the input stream. For example, to stop generating on the input stream after 20 tuples, enter 20 in the Specify maximum tuples field. The default setting, zero, means there is no limit.
Specify maximum time All generation methods. 0 (no time limit) Specifies in seconds the longest time period to run this feed simulation. The default setting, zero, means there is no time limit.
Null string All generation methods. "null" Specifies the string used to designate null values when encountered in an incoming data stream (whether read from a data file, read from a JDBC database, or generated as part of customized simulation data). The default value is null, which means that by default, StreamBase sends a null value for that field when it encounters the lowercase letters n u l l in a field in an incoming data stream. You can change the null string value to blank to specify the empty string, or to any value you expect the incoming data to have. For example, a CSV file generated from a MySQL database table dump might contain \N to designate null fields. See Null Handling for further information.
Data rate All generation methods. 10 tuples per second (after an input stream and its schema are selected) Specifies the number of tuples per second to be sent to the specified input stream. You can use the up and down arrows to change the value, or click into the text box and enter an integer. A feed simulation with a specification of zero tuples per second will not run.
As fast as possible All generation methods. Disabled

Enables an alternative to specifying a Data Rate value. Enable this option to send the feed simulation data as fast as the host computer allows. The actual data rate at runtime is a factor of the machine's speed, and may be somewhat limited. The feed simulator uses a substantial amount of CPU resources when this option is enabled.

Use this setting only if any one of the following is true:

  • You are sending a small finite data set.

  • You are sending a large or infinite data set, and the Specify maximum tuples and Specify maximum time options are set to low values.

  • You are sending a large or infinite data set, you have disabled the Log to Application Input setting, and you have clicked the Disable dequeuing of application output icon on the Application Output view.

Note

Remember that StreamBase Studio is not intended for benchmarking the full performance capabilities of StreamBase Server on production machines. This IDE-managed environment is not suitable for high-speed data rates and high CPU utilization.

Timestamp from column Data File and JDBC generation methods only. Disabled

Allows you to designate the field or column that contains relative timing values to be used to control the pace of sending tuples for the currently selected stream in the Simulation Streams section.

You can designate a column in an incoming JDBC database query, in which case the data type of the column must be double or timestamp. You can also designate a column in an incoming data file, in which case the data type of the column must be double or must contain a string representation of a timestamp.

When enabling this option, you must also specify whether to start counting time values from zero or from the value in the first value read. For most cases, select first value read.

With this option enabled, the feed simulation uses the relative times from the designated timestamp column to drive the timing of the feed simulation. See Using the Timestamp from Column Feature for more on this feature.

Include in synchronized timestamp group Data File and JDBC generation methods only. Disabled This check box is disabled unless (1) the currently selected stream in the Simulation Streams section uses a Data File or JDBC generation method, and (2) you used the Timestamp from column feature to specify a column in the input schema to use as a source of timestamp information.

This option designates the currently selected stream as a member of a group of streams in the current feed simulation file for which StreamBase attempts to coordinate delivery of tuples in timestamp order. See Using Synchronized Timestamp Groups for more on this feature.

Tuple buffer Data File and JDBC generation methods only. 40,000 tuples

Specifies the size in tuples of the buffer that holds tuples read from a data file or database query.

When used in conjunction with the two Prefill options, the specified size serves as the upper limit of the buffer to be prefilled.

Prefill Tuple buffer before starting simulation Data File and JDBC generation methods only. Disabled Designates whether to fill the entire tuple buffer before sending the first tuple to StreamBase Server.
Prefill and loop on Tuple buffer Data File and JDBC generation methods only. Disabled

When enabled, specifies that this feed simulation is to run in a loop, starting over and resending the first tuple in the buffer after it reaches the last tuple in the buffer. Use this feature to continuously replay a known, repeatable data set, or to generate a longer data set from a smaller one. See Using the Prefill and Loop on Tuple Buffer Feature for more information.

When you select this check box, the Prefill Tuple buffer before starting simulation check box is automatically checked for you and dimmed.
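
The idea of pacing a simulation from relative times, as the Timestamp from column option in the table above describes, can be sketched in a few lines. This is an interpretation for illustration only: the function name send_offsets is hypothetical, timing values are assumed to be numeric seconds, and the meaning given to the two start choices (subtract the first value read, or subtract nothing) is an assumption. See Using the Timestamp from Column Feature for the actual behavior.

```python
def send_offsets(timestamps, start_from="first"):
    """Return, for each tuple, the assumed delay in seconds after the
    simulation starts at which that tuple would be sent.

    start_from="first" measures from the first value read;
    start_from="zero" measures from zero.
    """
    if start_from not in ("first", "zero"):
        raise ValueError("start_from must be 'first' or 'zero'")
    if not timestamps:
        return []
    base = timestamps[0] if start_from == "first" else 0.0
    return [t - base for t in timestamps]
```

Under these assumptions, a file recorded from 34200 seconds after midnight (9:30 AM) would start sending immediately with the first value read choice, but would wait 34200 seconds before the first tuple when counting from zero, which suggests why the first value read is the usual selection.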

Timestamp from Column Feature

The Timestamp from Column feature is discussed in Using the Timestamp from Column Feature.

Include in Synchronized Timestamp Group Feature

The Include in Synchronized Timestamp Group feature is discussed in Using Synchronized Timestamp Groups.

Using the Prefill and Loop on Tuple Buffer Feature

With the Prefill and loop on Tuple buffer feature enabled, the feed simulation reads from the specified data file or JDBC data source. The tuple buffer grows to fit the data source, until either all data source rows have been read, or the buffer reaches the limit specified in the Tuple buffer field, or the process runs out of memory (which results in an error). The feed simulation then sends tuples from the tuple buffer from beginning to end. When the last tuple is reached, the feed simulation immediately resends the first tuple in the buffer, then resends subsequent tuples according to the Data rate and other settings for this simulation.

If you also specify the Timestamp from column option, the buffer is sent according to the timing information in the timestamp column. When replaying the tuple buffer, the second and subsequent times through the buffer are also replayed with the same timing.

Tip

To minimize or eliminate the effects of disk access speed when running a feed simulation for benchmarking purposes, you can restrict the size of the buffer with the Tuple buffer option to exactly fit the number of tuples you expect to read from the data source, then use the Prefill and loop on Tuple buffer option.

As tuples are resent from the tuple buffer, no fields are modified in the buffer, including any timestamp or sequence number fields. Let's say the first tuple read into the buffer has sequence number 1, and the last tuple has sequence number 20,000. On starting over and rereading the buffer, sequence number 1 is resent to the running StreamBase application. If your application requires monotonically increasing sequence numbers or timestamp fields, then you must adjust the application logic to detect the restart of a looped-buffer feed simulation, to compensate for the restart of sequenced fields. For example, your application's input stream logic might add 20,000 to each sequence number after a loop restart is detected.
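
The compensation logic described above might look like the following sketch. It is written in Python for illustration rather than as StreamBase application logic, and the class name LoopedSequenceAdjuster is hypothetical. It assumes that sequence numbers within the buffer increase from 1 to the buffer size, so that a drop in the incoming sequence number signals a loop restart.

```python
class LoopedSequenceAdjuster:
    """Rewrite sequence numbers from a looped buffer so they keep increasing."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size  # for example, 20000 in the text above
        self.loops = 0                  # number of restarts detected so far
        self.last_raw = None            # last sequence number as received

    def adjust(self, raw_seq):
        # A sequence number lower than the previous one means the buffer
        # has started over (an assumption of this sketch).
        if self.last_raw is not None and raw_seq < self.last_raw:
            self.loops += 1
        self.last_raw = raw_seq
        return raw_seq + self.loops * self.buffer_size
```

With a buffer of 20,000 tuples, the first tuple of the second pass arrives with sequence number 1 and is rewritten to 20,001.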

You can use this technique to generate a larger feed simulation data set from a smaller one. For example, let's say you captured a day's worth of trading data in real time in a CSV data file or JDBC database. You want to use this data set with its real-world timestamps to run a month's worth of trading data for testing or benchmarking purposes. To do so, use the Prefill and loop on Tuple buffer option, and have your application's input logic detect when the last tuple in the buffer is received. Then add one day to the received timestamp of the next set of tuples received.

Null Handling in CSV Files or Generated Data

StreamBase handles empty strings encountered in incoming data files, database queries, or feed simulations with custom generated data as follows:

  • If an empty string is encountered in an incoming field whose target schema field is a blob or a string, StreamBase sends an empty string value.

  • If an empty string is encountered in an incoming field whose target schema is other than blob or string, StreamBase sends a null value.

You can also set a specific string to be treated as an incoming null field value, as described in the Null string field in the preceding section.
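
The null handling rules above can be summarized in a short sketch. This Python function is a model of the stated rules, not StreamBase code; the name convert_incoming is hypothetical, conversion of non-null text to the target data type is omitted, and checking the null string before the empty-string rules is an assumption.

```python
def convert_incoming(text, field_type, null_string="null"):
    """Apply the null-handling rules to one incoming field value.

    Returns None for a null field, otherwise the text unchanged
    (conversion to the target data type is left out of this sketch).
    """
    if text == null_string:
        # Matches the configured null string, for example "null" or "\\N"
        return None
    if text == "":
        # Empty strings survive only for string and blob fields
        return "" if field_type in ("string", "blob") else None
    return text
```

For example, an empty incoming field becomes an empty string for a string field and null for an int field, while a field containing \N becomes null for any field type once the null string is set to \N.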

Description for Each Stream

Use the Description for StreamName field at the bottom of the Editor view to document the currently selected stream. (Use the Description field at the top of the view to document the feed simulation as a whole.)

For example:

Saving Feed Simulations

On the upper tabs of the Feed Simulation Editor, StreamBase Studio displays an asterisk if the file has unsaved changes. You cannot switch between the graphical and source presentations of the feed simulation until you save the file. All feed simulations have the .sbfs file extension, and are saved by default in the currently active Studio project.