Using the BLOB data type
Authors: Jason Garbis, StreamBase Systems
Date: 01-October-2007
Applicable To: StreamBase 5.0
The BLOB data type, introduced as a new native data type in StreamBase 5.0, allows developers to handle more complex data types than were supported in earlier product versions. While the BLOB data type can be used as a catchall for almost any large or complex set of data, the use cases that drove its design and implementation involve handling media data types (such as audio and video streams, and images) and large documents in unstructured or structured format (such as XML).
BLOB data is passed through a StreamBase application very efficiently: the data is stored in memory as it is received, and passed by reference thereafter. This eliminates the need to perform additional memory allocation and deallocation operations. BLOB data is “opaque” to the StreamBase server in the sense that no built-in StreamBase functions operate on it directly. Instead, BLOB data is accessible from custom StreamBase functions or operators, which can perform any necessary application-specific processing.
These functions may perform some analysis of the BLOB data, for example calling an image processing library to detect text within an image. Or, these functions may extract a subset of a large document, for example calling an XPath operation on an XML document. Note that BLOBs are passed through as immutable fields, and may not be modified by functions or operators.
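As a rough illustration of the XPath case, the sketch below evaluates an XPath expression against an XML document held as raw bytes. It is plain Java using only the JDK's XML libraries. The class name XmlBlobExtractor and its extract method are hypothetical, and the method takes a byte[] rather than a StreamBase BLOB so that the example stays self-contained; a real custom function would first obtain the bytes from its BLOB argument.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of the StreamBase API.
final class XmlBlobExtractor {
    private XmlBlobExtractor() {}

    /**
     * Evaluates an XPath expression against an XML document held as raw bytes
     * and returns the result as a string (empty if nothing matches).
     */
    static String extract(byte[] xmlBytes, String xpathExpression) throws Exception {
        // Parse directly from the bytes so the parser honors the document's declared encoding.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        // The input bytes are only read, never modified.
        return XPathFactory.newInstance().newXPath().evaluate(xpathExpression, doc);
    }
}
```

Called with an expression such as /order/symbol, the method returns the text of the first matching element, so only that small string needs to be added to the outbound tuple.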
A Simple Example
Let's explore a simple use of the BLOB data type. We construct an application that has a single input stream (carrying tuples that contain a BLOB field), a single Map operator, and an output stream.
Here's the input stream's schema:

and here's the simple application:

Notice that the BLOB data type is shown with a size of -1, indicating that it is of variable length.
The Map operator is used to call a custom Java function that calculates the length of the BLOB, adding the result as an integer field in the outbound tuple:

Notice that we're calling what appears to be a native function to calculate the length of the field. In actuality, this is a custom Java function; we're using the new function alias feature to eliminate the need for the “calljava(...)” syntax. This is briefly explained below, in the “Using the Custom Function Alias” section.
This simple example peeks into the contents of the BLOB from a custom function, which returns a single integer value representing the BLOB's length. BLOBs can also be accessed from within custom operators, which are much more powerful than custom functions: functions are limited to a single return value, while custom operators output results through streams, and can control the number of output streams as well as their schemas.
Note that Studio includes built-in wizards for both custom functions and custom operators. These simplify their creation considerably by generating working source code as a starting point.
Running the simple application, we see that in Test/Debug mode, Studio offers us the chance to enter sample BLOB data as a string. (For testing with binary data, we recommend using the wizard to create a StreamBase client that can submit such data.)
Here's the Manual Input pane, where we enter a simple string which is passed through as BLOB data.

Once the tuple is sent, we can see the results in the Application Output pane – our function has successfully determined that the string is 44 bytes long.

This example illustrates basic use of the BLOB data type, as well as basic examination of BLOB contents via a custom function.
While this example shows a very simple use of the BLOB, real-world applications that have more complex requirements will naturally exhibit more complex BLOB usage than shown here.
Another Example
A common application scenario for BLOBs is the processing of a stream of image files, for example frames from a video camera. Each frame is passed into the application as part of a StreamBase tuple. An operator in the application performs some initial image analysis, and adds its results to fields in the tuple. Downstream components in the application then use this metadata to determine whether and how to further process the tuple.
For instance, a function might extract the EXIF header from a BLOB in JPEG format, storing information of interest (such as the image resolution) in a tuple field.
Consider the video stream example from above, where police cameras may be sending video frames from stoplights throughout a city. The application needs to process these frames in a series of stages. First, the application performs a quick pre-analysis to determine whether there is a readable license plate in the frame. This could be performed by the initial function, which adds an indicator of the result to the other metadata fields in the tuple (such as location, time, and camera ID). The next operator in the application then makes a decision about the tuple by looking at this new field. If no license plate was detected, it discards the image. If a license plate was detected, it compares the plate to a database of those being actively sought, such as stolen cars. If a match is found, it initiates the appropriate follow-up response, such as alerting a human or pushing the information into a downstream application for processing.
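The staged flow described above can be sketched in plain Java. This is only an outline of the logic, not a StreamBase operator: PlatePipeline, Frame, and the plateReader function are hypothetical names, the plate-reading step is supplied by the caller rather than implemented here, and a byte[] stands in for the BLOB field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch in plain Java; none of these types belong to the StreamBase API.
final class PlatePipeline {
    /** A frame plus a metadata field that travels with it in the tuple. */
    static final class Frame {
        final String cameraId;
        final byte[] image; // stands in for the BLOB field; never modified

        Frame(String cameraId, byte[] image) {
            this.cameraId = cameraId;
            this.image = image;
        }
    }

    private final Function<byte[], Optional<String>> plateReader; // stage 1: pre-analysis
    private final Set<String> watchList;                          // stage 3: plates being sought

    PlatePipeline(Function<byte[], Optional<String>> plateReader, Set<String> watchList) {
        this.plateReader = plateReader;
        this.watchList = watchList;
    }

    /** Returns one alert ("cameraId:plate") per frame whose plate is on the watch list. */
    List<String> process(List<Frame> frames) {
        List<String> alerts = new ArrayList<>();
        for (Frame frame : frames) {
            Optional<String> plate = plateReader.apply(frame.image);
            if (!plate.isPresent()) {
                continue; // stage 2: discard frames with no readable plate as early as possible
            }
            if (watchList.contains(plate.get())) {
                alerts.add(frame.cameraId + ":" + plate.get());
            }
        }
        return alerts;
    }
}
```

The point of the ordering is that the cheap check runs first, so frames without a readable plate are dropped before the watch-list lookup is attempted.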
Using the Custom Function Alias
Newly introduced in StreamBase 5.0 is the concept of a custom function alias. This eliminates the need to use the awkward “calljava” method to invoke a custom Java function. We encountered this earlier in this article, where we saw the Map operator calling
calcBlobLength(BLOBData)
rather than
calljava('com.sb.demo.BlobAnalyzer', 'calculateBlobLength',BLOBData)
This more concise syntax is enabled by adding a configuration entry to the <custom-functions> section of the sbd.sbconf file:
<custom-function language="java" class="com.sb.demo.BlobAnalyzer" name="calculateBlobLength" alias="calcBlobLength" args="auto"/>
This line maps the Java class and method to the more user-friendly alias used in the Map operator, and provides a single place in which to control the Java class used in operators.
For more information on this topic, see the article “Using Configuration to Add a Custom Function to a StreamBase Application.”
For more information on the ByteArrayView data type, see the product documentation.
Recommendations
While there is no architectural limit on the size of a StreamBase BLOB, there are practical limits based on the number of simultaneous tuples to be processed and the memory capacity of the system. Memory should be treated as a scarce resource, and for that reason StreamBase developers should design their applications to identify BLOBs of interest, and discard the others, as early as possible in the application flow.
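To make the practical limit concrete, here is a back-of-the-envelope sketch in plain Java. The class BlobMemoryEstimate and its methods are hypothetical, and the calculation assumes each in-flight tuple holds exactly one BLOB that is stored once and passed by reference; it ignores all other per-tuple overhead.

```java
// Hypothetical sizing helper for illustration; not part of the StreamBase API.
final class BlobMemoryEstimate {
    private BlobMemoryEstimate() {}

    /** Approximate bytes held by BLOBs, assuming one BLOB per in-flight tuple. */
    static long bytesInFlight(long tuplesInFlight, long averageBlobBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(tuplesInFlight, averageBlobBytes);
    }

    /** True if the estimated BLOB footprint stays within the given memory budget. */
    static boolean fits(long tuplesInFlight, long averageBlobBytes, long memoryBudgetBytes) {
        return bytesInFlight(tuplesInFlight, averageBlobBytes) <= memoryBudgetBytes;
    }
}
```

With made-up figures of 2,000 tuples in flight and an average BLOB of 512,000 bytes, the estimate is 1,024,000,000 bytes, which would not fit in a 512 MB budget. Discarding uninteresting BLOBs early reduces the first factor.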
Custom Function Java Source
package com.sb.demo;

import com.streambase.sb.ByteArrayView;

public class BlobAnalyzer {
    // Returns the length, in bytes, of the BLOB passed in from the Map operator.
    public static int calculateBlobLength(ByteArrayView theData) {
        return theData.length();
    }
}