Regular Expression Socket Reader Input Adapter

Introduction

This input adapter allows StreamBase applications to read custom-formatted text from a TCP socket, parsed with regular expressions. It closely resembles the Regular Expression File Reader adapter.

Unlike the Regular Expression File Reader adapter, though, this socket adapter reads input data from a TCP socket connected to a specified external address. Also unlike the file reader, the input source of this adapter is indefinite and naturally timed, so repetition and timing are not specified as properties.

Properties

Property | StreamSQL Property | Default | Description
Host Name | HostName | none | A string specifying the host name or IP address to connect to.
Port | Port | none | An integer specifying the TCP port to connect to.
Format | Format | none | A string specifying the regular expression used to parse the input. This must be a Java regular expression as expected by the java.util.regex.Pattern class. For example, ([^,]*),([^,]*) could be used to parse simple, two-field CSV lines.
Drop Mismatches | DropMismatches | checked (true) | If this check box is selected, records that do not match the regular expression in the Format field are ignored and the next record is immediately examined. Otherwise, a tuple with all fields set to null is emitted when a non-matching input line is encountered.
Timestamp Format | TimestampFormat | MM/dd/yyyy hh:mm:ss aa | Specifies the format used to parse timestamp fields extracted from the input. Specify a string in the form expected by the java.text.SimpleDateFormat class described in the Sun Java Platform SE reference documentation.
Log Level | LogLevel | INFO | Controls the level of verbosity the adapter uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL.

Use the Edit Schemas tab to specify the schema to output from the adapter.
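
To try out a Format expression before configuring the adapter, a minimal Java sketch such as the following may help. It applies the two-field CSV expression from the Format description to single lines of text. The class and method names are hypothetical, and requiring the expression to match the whole line (matches() rather than find()) is an assumption about the adapter, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatSketch {
    // The two-field CSV expression from the Format property description.
    static final Pattern FORMAT = Pattern.compile("([^,]*),([^,]*)");

    // Returns one string per capture group, or null if the line does not match.
    // Assumption: the expression must match the entire line.
    static List<String> parseLine(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("IBM,104.5"));      // prints [IBM, 104.5]
        System.out.println(parseLine("no comma here"));  // prints null
    }
}
```

Each capture group corresponds to one field of the output schema, in order.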

Typechecking and Error Handling

Typechecking fails if the Format property contains an invalid regular expression, if the number of fields in the output schema does not match the number of capture groups in the Format property, or if the Timestamp Format is malformed.
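
The three conditions above can be approximated outside StreamBase with standard Java classes. The following is only a sketch of equivalent checks, not the adapter's actual typecheck code; the class name, method name, and returned messages are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class TypecheckSketch {
    // Returns null when all three checks pass, otherwise a description
    // of the first check that failed.
    static String check(String format, int schemaFieldCount, String timestampFormat) {
        Pattern pattern;
        try {
            pattern = Pattern.compile(format);
        } catch (PatternSyntaxException e) {
            return "invalid regular expression: " + e.getDescription();
        }

        // Pattern exposes its group count through a Matcher.
        int groups = pattern.matcher("").groupCount();
        if (groups != schemaFieldCount) {
            return "schema has " + schemaFieldCount + " fields but Format has "
                    + groups + " capture groups";
        }

        try {
            // The constructor rejects unknown pattern letters and unterminated quotes.
            new SimpleDateFormat(timestampFormat);
        } catch (IllegalArgumentException e) {
            return "malformed timestamp format: " + e.getMessage();
        }
        return null;
    }
}
```

For example, check("([^,]*),([^,]*)", 2, "MM/dd/yyyy hh:mm:ss aa") returns null, while passing a field count of 3 reports a capture group mismatch.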

Malformed records (lines that do not match the Format regular expression) cause the adapter either to ignore the input line or to emit a tuple with all fields set to null, depending on the value of the Drop Mismatches property.
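
The effect of Drop Mismatches can be illustrated with a short Java sketch. It models a tuple as a String array, which is a simplification chosen for this example, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DropMismatchesSketch {
    // Converts input lines to "tuples" (String arrays, one slot per capture group).
    static List<String[]> process(List<String> lines, Pattern format, boolean dropMismatches) {
        List<String[]> tuples = new ArrayList<>();
        int fieldCount = format.matcher("").groupCount();
        for (String line : lines) {
            Matcher m = format.matcher(line);
            if (m.matches()) {
                String[] tuple = new String[fieldCount];
                for (int i = 0; i < fieldCount; i++) {
                    tuple[i] = m.group(i + 1);
                }
                tuples.add(tuple);
            } else if (!dropMismatches) {
                // Drop Mismatches cleared: emit a tuple whose fields are all null.
                tuples.add(new String[fieldCount]);
            }
            // Drop Mismatches selected: skip the line and move on to the next one.
        }
        return tuples;
    }
}
```

Given the lines "a,b", "garbage", and "c,d" with the expression ([^,]*),([^,]*), this sketch yields two tuples when dropMismatches is true and three when it is false, the second of which contains only nulls.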

If a field extracted from the input cannot be coerced into the type specified for that field in the schema (for example, if "abc" is extracted where an int field is expected), that field is set to null in the output tuple. Likewise, if a capture group in the Format expression fails to match, but the overall regular expression does match, the corresponding field in the output tuple is set to null.
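
Both null cases can be reproduced in plain Java. The sketch below is an illustration under assumptions rather than the adapter's conversion code: the helper names are hypothetical, and the example expression with an optional second group was made up for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoercionSketch {
    // Returns null when the text is absent or is not a valid int.
    static Integer toInt(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Returns null when the text is absent or does not fit the timestamp format.
    static Date toTimestamp(String raw, String timestampFormat) {
        if (raw == null) {
            return null;
        }
        try {
            return new SimpleDateFormat(timestampFormat).parse(raw);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The second group is optional, so the line can match without it participating.
        Matcher m = Pattern.compile("([^,]*),(\\d+)?").matcher("abc,");
        if (m.matches()) {
            System.out.println(toInt(m.group(1))); // prints null: "abc" is not an int
            System.out.println(toInt(m.group(2))); // prints null: group 2 did not match
        }
    }
}
```

In java.util.regex, Matcher.group(n) returns null for a group that did not participate in a successful match, which corresponds to the second case described above.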

Suspend/Resume Behavior

On suspend, this adapter closes its input socket.

On resumption, it reconnects its socket and continues reading tuples from it.

This adapter does not leave its socket open during suspend because the input source is naturally timed and cannot itself be paused. Leaving the socket open could lead to buffering problems, ultimately causing the socket to close with an error.
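
The close-on-suspend, reconnect-on-resume pattern can be sketched with java.net.Socket as follows. This is a single-threaded illustration only: the class and method names are hypothetical, the UTF-8 charset is an assumption, and the real adapter's lifecycle handling is not shown in this document.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocketLifecycleSketch {
    private final String host;
    private final int port;
    private Socket socket;
    private BufferedReader reader;

    SocketLifecycleSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Opens a fresh connection; used at startup and again on resume.
    void resume() throws IOException {
        if (socket != null) {
            return; // already connected
        }
        socket = new Socket(host, port);
        reader = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    // Closes the connection instead of leaving unread data to accumulate.
    void suspend() throws IOException {
        if (socket != null) {
            socket.close();
            socket = null;
            reader = null;
        }
    }

    // Reads the next input line, or returns null while suspended or at end of stream.
    String readLine() throws IOException {
        return reader == null ? null : reader.readLine();
    }

    boolean isConnected() {
        return socket != null;
    }
}
```

Because the connection is reopened rather than paused, any data the remote source sends while the adapter is suspended is not delivered through this sketch after resume.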