Regular Expression Socket Reader Input Adapter

Introduction

This input adapter allows StreamBase applications to read custom-formatted text from a TCP socket, parsed with regular expressions. It closely resembles the Regular Expression File Reader adapter.

Unlike the Regular Expression File Reader adapter, however, this adapter reads its input from a TCP socket connected to a specified host and port rather than from a file. Its input source is also open-ended and naturally timed: records arrive whenever the remote peer sends them. For that reason, and again unlike the file reader, it has no properties for repetition or timing.

Properties

Property (type): Description
Host Name (string): The host name or IP address to connect to.
Port (int): The TCP port to connect to.
Format (string): The regular expression used to parse each input record. This must be a Java regular expression in the syntax accepted by the java.util.regex.Pattern class. For example, ([^,]*),([^,]*) could be used to parse simple two-field CSV input.
DropMismatches (optional boolean): If true, records that do not match the Format regular expression are ignored and the next record is examined immediately. If false, a tuple with all fields set to null is emitted for each non-matching input line. Defaults to true.
TimestampFormat (string): The format used to parse timestamp fields extracted from the input. This should be a pattern in the form expected by the java.text.SimpleDateFormat class. For more information, see the com.streambase.sb.adapter.Adapter class description in the StreamBase Java API documentation.
Schema (schema): The schema of the output tuples.
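The following minimal sketch shows how the Format and DropMismatches properties could interact when a single input line is turned into field values. It uses the two-field CSV expression from the table above. It assumes that a record must match the expression as a whole (Matcher.matches) rather than merely contain a match. The class RecordParser and its parse method are hypothetical and are not part of the StreamBase API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the adapter's implementation.
// Assumption: a record must match the Format expression in its entirety.
class RecordParser {
    private final Pattern format;
    private final boolean dropMismatches;

    RecordParser(String format, boolean dropMismatches) {
        this.format = Pattern.compile(format);
        this.dropMismatches = dropMismatches;
    }

    // Returns one string per capture group for a matching line.
    // For a non-matching line, returns null (record dropped) when dropMismatches
    // is true, or an all-null array (an all-null tuple) when it is false.
    String[] parse(String line) {
        Matcher m = format.matcher(line);
        String[] fields = new String[m.groupCount()]; // new arrays start out all null
        if (!m.matches()) {
            return dropMismatches ? null : fields;
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match, so start at 1
        }
        return fields;
    }
}
```

With DropMismatches left at its default of true, a line such as `no separator` produces no output at all. With DropMismatches set to false, the same line yields two null fields.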

Typechecking and Error Handling

Typechecking fails if the Format property contains an invalid regular expression, if the number of fields in the output schema does not match the number of capture groups in the Format expression, or if the TimestampFormat property is malformed.
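The three typecheck conditions correspond to checks that the standard Java classes already support. This minimal sketch shows them. The class TypecheckSketch and its check method are hypothetical illustrations, and the error messages are invented. The sketch shows one plausible way to detect each condition and is not a description of how the adapter reports errors.

```java
import java.text.SimpleDateFormat;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch of the three typecheck conditions; messages are illustrative.
class TypecheckSketch {
    // Returns null if the configuration is acceptable, otherwise an error message.
    static String check(String format, int schemaFieldCount, String timestampFormat) {
        Pattern pattern;
        try {
            pattern = Pattern.compile(format);
        } catch (PatternSyntaxException e) {
            return "invalid Format expression: " + e.getDescription();
        }
        // Pattern has no public group count; a Matcher on any input reports it.
        int groups = pattern.matcher("").groupCount();
        if (groups != schemaFieldCount) {
            return "Format has " + groups + " capture groups but the schema has "
                    + schemaFieldCount + " fields";
        }
        try {
            new SimpleDateFormat(timestampFormat); // rejects unknown pattern letters
        } catch (IllegalArgumentException e) {
            return "malformed TimestampFormat: " + e.getMessage();
        }
        return null;
    }
}
```

For example, `check("([^,]*", 1, "yyyy-MM-dd")` reports an invalid expression because the group is never closed. `check("(.*)", 1, "yyyy-q")` reports a malformed timestamp format because `q` is not a SimpleDateFormat pattern letter.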

Malformed records (lines that do not match the Format regular expression) cause the adapter either to ignore the line or to emit a tuple with all fields set to null, depending on the value of the DropMismatches property.

If a field extracted from the input cannot be coerced into the type specified for that field in the schema (for example, if "abc" is extracted where an int field is expected), that field is set to null in the output tuple. Likewise, if the overall regular expression matches but one of its capture groups does not participate in the match, the corresponding field in the output tuple is set to null.
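The null-on-failure behavior can be sketched with the standard Java parsers. This minimal sketch handles an int field and a timestamp field. The class CoercionSketch and its methods are hypothetical, and the adapter's actual conversion rules (for example, whether it trims whitespace) may differ.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of null-on-failure coercion; not the adapter's code.
class CoercionSketch {
    // Converts an extracted field to an int, or null if it is absent or not a number.
    static Integer toInt(String field) {
        if (field == null) {
            return null; // e.g. a capture group that did not participate in the match
        }
        try {
            return Integer.valueOf(field);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Converts an extracted field to a timestamp using a TimestampFormat pattern,
    // or returns null if the field is absent or cannot be parsed.
    static Date toTimestamp(String field, String timestampFormat) {
        if (field == null) {
            return null;
        }
        try {
            return new SimpleDateFormat(timestampFormat).parse(field);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

The second case in the paragraph above arises because Matcher.group returns null for a capture group that did not take part in an otherwise successful match. An example is the second group of `(\w+),(\d+)?` applied to the line `abc,`. In this sketch, that null simply passes through toInt.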

Suspend/Resume Behavior

On suspend, this adapter closes its input socket.

On resume, it reconnects its socket and continues reading input from it.

This adapter does not keep its socket open while suspended because its input source is naturally timed and cannot itself be paused. On an open socket, data would keep arriving with nothing reading it. This could lead to buffering problems and ultimately cause the socket to close with an error.
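The close-on-suspend, reconnect-on-resume behavior can be sketched with a plain java.net.Socket. In this minimal sketch, the class SuspendableSocketReader is hypothetical. Decoding the input as UTF-8 is an assumption. The sketch does not model the adapter's threading or tuple emission.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the adapter's code. UTF-8 decoding is an assumption.
class SuspendableSocketReader {
    private final String host;
    private final int port;
    private Socket socket;
    private BufferedReader reader;

    SuspendableSocketReader(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Opens a fresh connection to the configured host and port if none is open.
    synchronized void resume() throws IOException {
        if (socket == null) {
            socket = new Socket(host, port);
            reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        }
    }

    // Closes the socket so nothing accumulates unread while suspended.
    synchronized void suspend() throws IOException {
        if (socket != null) {
            socket.close(); // closing the socket also closes its input stream
            socket = null;
            reader = null;
        }
    }

    synchronized boolean isConnected() {
        return socket != null;
    }

    // Reads the next input line; returns null at end of stream or while suspended.
    String readLine() throws IOException {
        BufferedReader r;
        synchronized (this) {
            r = reader;
        }
        return r == null ? null : r.readLine();
    }
}
```

Because suspend discards the connection entirely, each resume starts a new TCP session. Anything the remote peer sends in between never reaches the application.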