StreamBase Data Types

This topic describes how data types are represented in StreamBase. For information on entering literals for each data type in expressions, see Specifying Literals in Expressions in the Expression Language topic.

blob Data Type

Blobs provide a way of representing binary data in a tuple. They are designed to efficiently process large data objects such as video frames or other multimedia data, although performance might diminish with larger sizes.

In expressions, blobs can be used in any StreamBase function that supports all data types, such as string() or firstval(). To use a blob in other functions, you must first convert the blob to a supported data type. For example, error() does not accept a blob, but you can call error(string(b)), which converts the blob b to a string before passing it to error().

The StreamBase blob() function converts a string to a blob.

Important

See Using Large Fields if you need to define very large blob fields.

bool Data Type

A Boolean always evaluates to either true or false.

int Data Type

An int is always 4 bytes. The valid range of the int data type is -2147483648 to 2147483647, inclusive. Integers are always signed. Thus, the following expression is not valid, because the number 2147483648 is not a valid int:

2147483648 + 0

However, integer computations wrap around unchecked, so the following expression is valid and evaluates to -2147483648:

2147483647 + 1

The following expression is valid, because the period means that the number is treated as a floating-point value, which can have a much greater value:

2147483648. + 0.
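The range and wrap-around behavior described above can be illustrated outside StreamBase. The following is a minimal sketch in plain Java, whose int type is also a 4-byte signed integer with unchecked wrap-around; it does not use any StreamBase API, and the class and method names (IntWrapDemo, wrappingAdd, fitsInInt) are hypothetical names chosen for illustration.

```java
final class IntWrapDemo {
    // Adds two 4-byte signed ints; on overflow the result wraps around silently.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Reports whether a value lies within the int range -2147483648 to 2147483647.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 2147483648 is one past the largest int, so it is not a valid int value.
        System.out.println(fitsInInt(2147483648L));      // false
        // The largest int plus one wraps to the smallest int.
        System.out.println(wrappingAdd(2147483647, 1));  // -2147483648
        // With a trailing period the literals are doubles, so nothing overflows.
        System.out.println(2147483648. + 0.);
    }
}
```

The sketch only mirrors the arithmetic; how StreamBase reports an out-of-range int literal is not shown here.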

double Data Type

A double is always 8 bytes. Any numeric literal in scientific notation is considered a double. That is, you do not need a decimal point for a numeric literal to be recognized as a double (instead of an int). For example, 1e1 is considered to be 10.0 (double) instead of 10 (integer).

long Data Type

A long is always 8 bytes. The range is -9,223,372,036,854,775,808 [-2^63] to +9,223,372,036,854,775,807 [2^63 - 1]. You can use the long data type to contain integer numbers that are too large to fit in the four-byte int data type.

When specifying a long value in a StreamBase expression, append L to the number. Thus, 100L and 314159L are both long values. Without the L, StreamBase interprets the number as an int.
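As a rough cross-check of the stated range, the following plain Java sketch computes the two bounds from powers of two and compares them with Java's own 8-byte long limits. Java happens to use the same L suffix for long literals; that parallel is a convenience for the illustration, not a statement about how StreamBase parses expressions. The class name LongRangeDemo and its members are hypothetical.

```java
import java.math.BigInteger;

final class LongRangeDemo {
    // Lower and upper bounds of an 8-byte signed integer: -2^63 and 2^63 - 1.
    static final BigInteger MIN = BigInteger.valueOf(2).pow(63).negate();
    static final BigInteger MAX = BigInteger.valueOf(2).pow(63).subtract(BigInteger.ONE);

    // Reports whether an arbitrary-precision integer fits in the long range.
    static boolean fitsInLong(BigInteger value) {
        return value.compareTo(MIN) >= 0 && value.compareTo(MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(MIN);  // -9223372036854775808
        System.out.println(MAX);  // 9223372036854775807
        // 3000000000 exceeds the int maximum, so it needs the L suffix and a long.
        long big = 3000000000L;
        System.out.println(big > Integer.MAX_VALUE);                       // true
        System.out.println(fitsInLong(MAX.add(BigInteger.ONE)));           // false
    }
}
```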

string Data Type

A string is a text field whose size must be declared. The theoretical maximum for a string in a single-field tuple is around 2 gigabytes, but the practical limit is much smaller.

StreamBase provides support for large tuples, including large string fields. Be aware that moving huge amounts of data through any application negatively impacts its throughput.

Important

See Using Large Fields if you need to define very large string fields.

timestamp Data Type

Timestamp expressions are represented in the following date and time format:

YYYY-MM-DD hh:mm:ss[.sss][+TTTT]

The maximum precision is milliseconds.
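To make the optional parts of the format concrete, the following sketch parses strings of that shape with the standard java.time API. It is an illustration under stated assumptions, not StreamBase's own parser: it assumes hh means a 24-hour clock and that +TTTT is a UTC offset in hours and minutes, such as +0100. The class name TimestampFormatDemo and the sample values are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

final class TimestampFormatDemo {
    // Square brackets mark optional sections: milliseconds and a +HHMM offset.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss[.SSS][Z]");

    // Returns an OffsetDateTime when an offset is present, otherwise a LocalDateTime.
    static TemporalAccessor parse(String text) {
        return FORMAT.parseBest(text, OffsetDateTime::from, LocalDateTime::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("2024-01-15 10:30:00"));           // date and time only
        System.out.println(parse("2024-01-15 10:30:00.123"));       // with milliseconds
        System.out.println(parse("2024-01-15 10:30:00.123+0100"));  // with milliseconds and offset
    }
}
```

The pattern accepts exactly three fractional digits, which matches the millisecond precision stated above.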

tuple Data Type

The tuple data type is an ordered collection of fields, each of which has a name and a data type. The fields in the collection must be defined by a schema, and can be of any StreamBase data type, including other tuples. The size of a tuple depends on the aggregate size of its fields.

The tuple data type allows schemas to contain other schemas, nested to any depth.

Nested Schemas in EventFlow Applications

In EventFlow applications, a schema must be named to be used as a field in another schema. For example, the following image shows the schema for an input port named input1. It contains two fields, recd_time of type timestamp, and the named schema feed_format of type tuple:

The following image shows the feed_format field expanded to display its own fields:

Using the tuple Data Type in StreamSQL

In StreamSQL, tuple is not a keyword like string or blob. Instead, you use the tuple data type when you define a schema, either named or anonymous. The following example shows how the tuple data type is used in StreamSQL to form nested schemas:

CREATE SCHEMA NamedSchema1 (myInt1 int, myDouble1 double);         -- 1
CREATE SCHEMA NamedSchema2 (myInt2 int, myTuple1 NamedSchema1);    -- 2

CREATE INPUT STREAM in (
  myInt3 int,
  AnonTupleField1 (myDouble2 double, myString1 string(8)),         -- 3
  AnonTupleField2 (myString2 string(32), mySubTuple NamedSchema1)  -- 4
);

These comments refer to lines with callout numbers in the example above:

1

Create a named schema containing an int field and a double field.

2

Create another named schema, this time containing an int field and a tuple field. The tuple field's schema consists of a reference to the first named schema. Therefore, its fields are myInt1 and myDouble1. Notice that named schemas are defined independently of any other component.

3

The input stream's schema includes an int field followed by two tuple fields with anonymous schemas.

4

The second tuple field's schema includes a string followed by a nested tuple, which references one of the named schemas.

See the CREATE SCHEMA Statement reference page for more on the distinction between named and anonymous schemas.

Using the tuple Data Type in Expressions

For information about using tuples in expressions, see tuple in the Expression Language reference page.

Specifying Hierarchical Data in CSV Format

In contexts where a tuple value appears in textual string form, comma-separated value (CSV) format is used. Examples of such contexts include the contents of files read by the CSV Input Adapter or written by the CSV Output Adapter, and the result of the Tuple.toString() Java API method.

For example, the string form of a tuple with three integer fields whose values are 1, 2, and 3 is the following:

1,2,3 

We will refer to the above as the string version of tuple A.

When a value like tuple A appears inside another tuple whose value is presented as a string, quotes must be used. For example, a second tuple, B, whose first field is a string and whose second field is tuple A, has a string form like the following:

Hello,"1,2,3" 

These quotes protect the comma-separated values inside the second field from being interpreted as individual values.

With deeper nesting, the quoting gets more complex. For example, suppose tuple B, the two-field tuple above, is itself the second field inside a third tuple, C, whose first field is a double. The CSV string form of tuple C is:

3.14159,"Hello,""1,2,3""" 

The above form shows doubled pairs of quotes around 1,2,3, which is necessary to ensure that the nested quotes are interpreted correctly. There is another set of quotes around the entire second field, which contains tuple B.

StreamBase's quoting rules follow standard CSV practices, as defined in RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files.
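The nested quoting for tuples A, B, and C can be reproduced with a small RFC 4180 style quoting routine. The following plain Java sketch is an illustration only: it does not call the StreamBase Tuple API, and the names CsvNestingDemo, quoteField, and toCsv are hypothetical. A field is quoted when it contains a comma, a double quote, or a line break, and any embedded double quote is doubled.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class CsvNestingDemo {
    // Quotes a single field if needed, doubling any embedded double quotes.
    static String quoteField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-stringified field values into one CSV record.
    static String toCsv(String... fields) {
        return Arrays.stream(fields)
                .map(CsvNestingDemo::quoteField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String tupleA = toCsv("1", "2", "3");     // three integer fields
        String tupleB = toCsv("Hello", tupleA);   // a string, then tuple A
        String tupleC = toCsv("3.14159", tupleB); // a double, then tuple B
        System.out.println(tupleA);  // 1,2,3
        System.out.println(tupleB);  // Hello,"1,2,3"
        System.out.println(tupleC);  // 3.14159,"Hello,""1,2,3"""
    }
}
```

Each level of nesting applies the same rule to the string form of the inner tuple, which is why the quotes around 1,2,3 are doubled once tuple B is itself placed inside tuple C.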

Using Large Fields

Before you can use tuples that exceed the operating system's page size (such as 4096 bytes on Windows or Linux, or 8192 bytes on Solaris), you must increase the StreamBase memory parameters defined in the StreamBase Server configuration file. The StreamBase page-size parameter must be a multiple of the system page size. You must allocate sufficient memory for the StreamBase Server process so that it can host your application and its largest tuples.

See Tune Memory Parameters for details.
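The multiple-of-page-size requirement comes down to simple arithmetic. The following plain Java sketch rounds a required size up to the next multiple of the system page size. It is only a sizing aid under assumptions: the 10000-byte tuple size is an invented example, the helper name roundUpToPageMultiple is hypothetical, and the sketch does not read or write any StreamBase configuration parameter.

```java
final class PageSizeDemo {
    // Rounds requiredBytes up to the nearest multiple of systemPageSize.
    static long roundUpToPageMultiple(long requiredBytes, long systemPageSize) {
        if (requiredBytes <= 0 || systemPageSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long pages = (requiredBytes + systemPageSize - 1) / systemPageSize;
        return pages * systemPageSize;
    }

    public static void main(String[] args) {
        // Hypothetical largest tuple of 10000 bytes.
        System.out.println(roundUpToPageMultiple(10000, 4096));  // 12288 (3 pages of 4096 bytes)
        System.out.println(roundUpToPageMultiple(10000, 8192));  // 16384 (2 pages of 8192 bytes)
    }
}
```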