StreamBase Tuning Tips

Measure Current Performance to Establish Baseline for Comparisons

Measure the current performance of your StreamBase application before starting a tuning project. You can then use the current performance numbers as a baseline, and observe the positive or negative results of experimenting with different application designs and configuration settings.

If you have the resources, create a final test or staging environment that contains the same hardware and software configuration you propose to use when the application is deployed into production.

Applications typically have bottlenecks that limit overall throughput. Bottlenecks can occur with:

  • Enqueuing (input) speed in adapters or clients

  • Dequeuing (output) speed

  • Operators waiting for matching tuples on queues (Merge, Join, Gather), which could benefit from the addition of Heartbeat operators

  • Possible misuse of Aggregate or Query operators

  • Java operators that are providing custom functionality, but are not performing well

To measure normal and peak performance of a StreamBase application, you can use several tools:

  • The StreamBase Feed Simulator provides a convenient way to send test data to your application during the development and testing phases. It is also a good tool during staging, where you can submit real-world data through a feed simulation that uses CSV input files specified in an *.sbfs configuration file. For example, the Moving Average Convergence Divergence (MACD) sample installed with StreamBase contains the MACD.sbfs configuration file, which references the test data file MACD.csv. Run the sbfeedsim command from the command line to submit test data to your application. Do not use the Feed Simulator view in StreamBase Studio for performance testing, because it incurs the overhead of the IDE itself, which may skew your results.

    For further information, see Using the Feed Simulation Editor and the sbfeedsim command reference. Feed simulation configurations provide many options, including the ability to throttle up the data rate to stress-test the application closer to expected peak data rates.

  • With your application running, you can use the StreamBase Monitor to see which operators are hot spots, meaning which ones receive the most tuples, and which threads are using the most CPU resources. Once hot spots are identified, you can distribute the processing of hot spot operators into separate threads.

    With the StreamBase Server running, and real-world streaming test data flowing through the application, run StreamBase Monitor as described in Using the Performance Monitors.

    The following example shows a Monitor window for the running MACD sample:


Filter Early, Filter Wide

For traditional and real-time applications, an important guideline is to minimize the amount of data movement for a given task. This simple rule is critical when processing high-volume, real-time data streams.

Avoid defining multiple sets of Filter operators too far downstream in your application (where downstream means toward the right side of the EventFlow). Instead, split the data as soon as possible after it enters an Input Stream, so that the downstream operators only process or move the data appropriate for that path through the application. A common design mistake is to add multiple Filter operators throughout the EventFlow, when a single Filter operator with multiple predicates (expressions that result in TRUE or FALSE) would be sufficient to do the work, and would not incur the overhead of excessive data movement.

Remember that each operator in a StreamBase application has to move data in, and move data out. Assume for the moment that the cost of performing these steps, per operator in your application, is 1. In reality, the cost is different for each operator, but let's start by adding up the costs of moving data through applications using varying designs.

Example 1: Simple View of Just Moving Tuples Through Three Operators

In the example above, the total cost to simply move tuples through the operators is 3.

In the next example, consider the cost when you add a Filter operator on the right side (downstream) of the EventFlow. In this example, assume that the Filter operator's predicates and the inbound data values resulted in 30% of the tuples being moved out port #1, and 70% out port #2, to the next set of downstream components:

Example 2: Filter Operator is Added, But Too Far Downstream for Efficient Processing

We can redefine this application so that the Filter operator does the split processing further upstream, closer to when the data first enters the system. Doing this avoids moving 100% of the data tuples through the operators that preceded the (further downstream) Filter operator shown above.

Example 3: Filter Operator Moved Upstream, Avoiding Expensive Data Movement by Downstream Operators

In the example above, we pushed our predicates in the Filter operator upstream (to the left in an EventFlow), to reduce the flow of tuples in the downstream operators. In addition, you can define your Filter operator so that the predicates cover the conditional cases you want, and also drop any non-matching tuples that do not return TRUE for any of the predicates. In doing so, you can design the application so that uninteresting data records do not clog your system, further reducing the cost of data processing and increasing the overall throughput of the application.

Filter operators are efficient. The point is that the more expensive operation is actually moving the data, not determining how to split the data based on the predicate expressions.
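
The comparison between the downstream and upstream Filter placements can be sketched as simple arithmetic. The layout here is an assumption made for illustration, not the exact topology of the examples above: two hypothetical operators A and B whose work is needed only on the port #1 path, one Filter with the 30%/70% split, and one further operator on each output port (P1 and P2). Each operator costs 1 per unit of traffic, as assumed earlier.

```latex
% f_i = fraction of the input tuples that reach operator i; unit cost per operator is 1.
\begin{align*}
C &= \sum_i f_i \\
% Filter downstream: A and B see every tuple before the split.
C_{\text{downstream}} &= \underbrace{1}_{A} + \underbrace{1}_{B} + \underbrace{1}_{\text{Filter}}
                        + \underbrace{0.3}_{P_1} + \underbrace{0.7}_{P_2} = 4.0 \\
% Filter upstream: only the Filter sees every tuple; A and B see the 30% on port #1.
C_{\text{upstream}} &= \underbrace{1}_{\text{Filter}} + \underbrace{0.3}_{A} + \underbrace{0.3}_{B}
                      + \underbrace{0.3}_{P_1} + \underbrace{0.7}_{P_2} = 2.6
\end{align*}
```

Under these assumptions the upstream placement moves about a third less data. The saving grows further if the Filter also drops tuples that match no predicate, because those tuples never reach any downstream operator.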

Define or Process Minimal Inbound Tuples

The previous section described increasing performance by performing data filtering as early as possible in the EventFlow. We can generalize this rule for more cases than filtering: optimize performance by reducing the number of tuples processed on each path, or the number of fields in each tuple, as early as possible. Narrowing down the tuples to the essential fields has the biggest benefit when you do this at the source, before the data is enqueued into StreamBase. So if possible for your application, determine which fields are essential for a tuple, and remove any unnecessary fields.

For data that must be entered into the StreamBase application, you can use the Filter and Map operators to define criteria that set when certain tuples can be dropped from the stream. However, these operators have caveats about their placement in an application, as noted in the previous section about Filters, and the next section about Maps.

Remember: developers who are new to StreamBase tend to focus on the relative processing cost of individual operators or operator types. However, the number of tuples processed by an operator dominates its relative cost.

Consolidate Multiple Map Operators

As an application developer, you have an instinct to design or create modular code. In the case of StreamBase applications, this can sometimes result in using multiple sets of Map operators. You might think that (for example) Map1 should change the value of FieldA, while Map2 should exist to do something to the value of FieldB, and Map3 could drop FieldC and FieldD. However, unless Map2 is dependent on the result of Map1's processing, and unless Map3 is dependent on the results of Map1 and Map2's processing, you can consolidate these Map operators into a single Map, and use multiple expressions to perform the work. Using one Map operator is much more efficient in performing the work and not forcing the application to move data three times before it is then sent to the next downstream component.

Moving data has a cost, and so do evaluating expressions and changing the schema. As noted earlier, filter streams as far to the left in the EventFlow as you can, and mutate with a Map operator as far to the right in the EventFlow as you can.

The Rule of Thumb summary: filter on the left, mutate on the right.

Set Query Table Cache Size If Using Disk-Based Option

If your application uses disk-based Query Tables (available only in StreamBase Enterprise Edition), your performance can improve if you increase the cache size internally allocated to disk-based query table operations. Consider the <server> section of the sbd.sbconf server configuration file:

<server>
    <param name="tcp-port" value="portno"/>
    <param name="datadir" value="dir"/>   
    <param name="disk-querytable-cache" value="number"/>
</server>

The disk-querytable-cache parameter is commented out by default. The value attribute sets the amount of main memory to be allocated to any disk-based query table operations. The value units are in MB of main memory and must be a power of 2. When unspecified, the default value is 1 MB. Use caution when setting this parameter, as too high a value can consume more memory than needed and could negatively impact other resources that require memory during the execution of the StreamBase application or other applications. The memory footprint of the sbd program is increased by the amount specified in this parameter. As with any resource setting, establish baseline performance metrics and then test the effect of increasing or decreasing values under normal and peak loads.
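
As a minimal sketch, the fragment below shows the parameter uncommented with a concrete value. The value 64 is purely illustrative, not a recommendation. The right number depends on your own baseline measurements and on the memory available to sbd and other processes on the machine.

```xml
<server>
    <!-- Other <server> parameters (tcp-port, datadir) omitted for brevity. -->
    <!-- Illustrative value: 64 MB of main memory for disk-based query table
         operations. The value is in MB and must be a power of 2. -->
    <param name="disk-querytable-cache" value="64"/>
</server>
```

With this setting, expect the memory footprint of the sbd process to grow by roughly the same 64 MB.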

In Clients, Narrow Results with Filtered Subscriptions

You can use the StreamBase Java or C++ client API to write a consumer dequeue client that narrows the result set, by using a predicate on the subscription to an output stream. The predicate is applied to the data output from the stream before it is delivered to the client. By narrowing the result set that the server must provide to the client, you can increase the efficiency of your application. The predicate language follows the same rules and syntax as implemented in the Filter operator.

We have provided examples with the installed client sample:

streambase-install-dir/sample/client/FilteredSubscribe.cpp
streambase-install-dir/sample/client/FilteredSubscribe.java

For details, see Narrowing Dequeue Results with Filtered Subscribe in the API Guide.

Add Custom Functions to Increase Performance

You may find that the processing of your application can be improved by introducing your own custom functions, instead of performing special calculations in client applications. Once implemented and configured, your custom functions can be used in the expressions of most operators in your application. To learn how to write your own functions by extending the StreamBase Java or C++ APIs, see the API Guide.

Consider Using Concurrency and Data Parallelism Features

The Properties view for most StreamBase components has a Concurrency tab that lets you specify running that component in a separate thread, with data parallelism options. For compute-intensive components that you know can be run without data dependencies on other components in the StreamBase application, you may be able to improve performance by enabling these options.

When a component runs in its own thread, the server processes the component's requests concurrently with other processing in the application. The processing of the threads may be distributed automatically across multiple processors on an SMP machine.

The concurrency features are not suitable for every application. For details, see Execution Order, Concurrency, and Parallelism, which includes important cautions on using these features.

Tune Memory Parameters

Before you can use tuples that exceed the operating system's page size, you must increase the StreamBase memory parameters defined in the StreamBase Server configuration file. The StreamBase page-size parameter must be a multiple of the system page size. You must allocate sufficient memory for the StreamBase Server process (sbd) so that it can host your application and its largest tuples. However, remember that the operating system is constrained by competing concurrent requests for RAM from other processes. Thus, you must estimate the total of all memory requests on the machine during peak loads. You can always help your application's throughput by limiting the size of input tuples to the bare minimum. Keeping the input tuples as small as possible is always more efficient than dropping tuples during the processing of the StreamBase application.

The maximum size of a tuple in your application cannot exceed the value set by the page-size parameter. The default StreamBase page sizes are:

  • 4096 bytes on Windows or Linux machines

  • 8192 bytes on SPARC

If you have larger tuples, increase the page-size parameter's value in the page-pool section of the StreamBase Server configuration file for your application. The StreamBase page size, if increased, must be a multiple of the system's page size. The default maximum size of a schema is 1048576 bytes (1 MB).
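
For illustration, a page-pool fragment with a larger page size might look like the following sketch. It assumes a system page size of 4096 bytes and that the value is given in bytes, as the defaults above suggest. The value 16384 is a hypothetical choice, so check the configuration reference for your release before using it.

```xml
<page-pool>
    <!-- Assumes a 4096-byte system page size: 16384 = 4 x 4096, so the
         StreamBase page size remains a multiple of the system page size. -->
    <param name="page-size" value="16384"/>
</page-pool>
```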

The initial memory allocation is determined by the JVM heap size for the hosted StreamBase application. If needed for StreamBase applications with large memory demands, you can set a higher initial allocation of heap by specifying -Xms (min) and -Xmx (max) JVM arguments. For your StreamBase Server instance, these values can be set in the java-vm section of sbd.sbconf configuration files. The default maximum for the JVM is 256 MB. For example:

<param name="jvm-args" value="-Xms64m -Xmx256m"/>

At runtime, this default may be insufficient for your StreamBase application. Experiment with the JVM arguments for memory use and increase the values as determined by your testing. For example:

<java-vm>
    <param name="java-home" value="/usr/lib/jvm/java" />
    <param name="jvm-args" value="-Xms512m -Xmx1024m" />
</java-vm>

For more information, see the <java-vm> section of StreamBase Server Configuration XML.

Also pay attention to the max-client-pages parameter in the <page-pool> section of the sbd server configuration. Its value sets the maximum number of pages that a dequeue client connection can allocate. The sbd process disconnects clients that try to allocate more. This parameter is designed to protect sbd from a slow or hung dequeue client. To determine the resulting number of bytes allowed, multiply the max-client-pages value by the page-size value. With a page size of 4096 bytes, the default max-client-pages value of 2048 pages results in 8 MB (8,388,608 bytes). To allow a dequeue client connection to allocate unlimited memory, change the max-client-pages value to 0.
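
The relationship between the two page-pool parameters can be sketched in a single fragment. The values shown are the defaults described above for a 4096-byte page size. Placing both parameters in the same <page-pool> element is an assumption based on the descriptions in this section.

```xml
<page-pool>
    <param name="page-size" value="4096"/>
    <!-- Per-client cap: 2048 pages x 4096 bytes = 8,388,608 bytes (8 MB).
         Set the value to 0 to remove the cap for dequeue client connections. -->
    <param name="max-client-pages" value="2048"/>
</page-pool>
```

Note that increasing page-size without lowering max-client-pages raises the per-client memory cap proportionally. For example, doubling the page size to 8192 bytes doubles the cap to 16 MB.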

Increase Computing Resources

If design and configuration changes do not result in sufficient performance, consider increasing your computing resources, such as the processor speed, the number of processors on an SMP machine, and the amount of memory. You can also investigate using a cluster of StreamBase Servers using high availability techniques. See Clustering and High Availability for more information.
