Developers: Using the StreamBase XML Normalizer

Using the StreamBase XML Normalizer

Authors: Robert Hoffman, John Smart
StreamBase Systems
Updated: 27 Aug 2008

Applicable To: StreamBase 3.7, 5.0, 5.1, 6.0, 6.1

Introduction

This feature is an add-on to a full StreamBase kit. Here are the download pages:

  • StreamBase XML Normalizer download kits.
  • Full StreamBase Developer Edition kits.
  • Or request an evaluation copy of StreamBase Enterprise Edition.

The StreamBase XML Normalizer is a custom Java operator that takes a well-formed XML string and creates one or more output tuples from the parsed XML. The operator acts as a front end to the SAX parsing engine, which decomposes the XML string. Once converted, the tuples can be processed by any StreamBase operator. Like other custom StreamBase Java operators, the XML Normalizer extends the com.streambase.sb.operator.Operator class.

 

Note that installation and configuration instructions differ slightly across StreamBase Studio releases. Specific differences are noted in this article.

Also note that the XML Normalizer does not currently support the blob data type available in StreamBase 5.0 and later.

Installed XML Normalizer Files

File Locations

You can install the StreamBase XML Normalizer in any directory. Remember the location, because you will need to import its JAR (and optionally, other sample files) into a StreamBase Studio project.

Defaults

On Windows: C:\Program Files\StreamBase Systems\StreamBase XML Normalizer\

On UNIX: The files are installed in a "streambase-xml-normalizer" subdirectory of your current working directory.

Contents of the Installed Directory

Runtime files:

  • XMLNormalizer.jar - includes the Java Operator that you will import into a StreamBase Studio project.
  • XMLSimple.sbapp - a sample StreamSQL Event Flow application diagram that demonstrates the operator's features.
  • sbd.sbconf - a StreamBase Server configuration file, for use if you have Enterprise Edition and start the sbd server from the command line.

Source files:

  • XMLNormalizerOperator.java - StreamBase Java Operator
  • XMLNormalizerOperatorBeanInfo.java - StreamBase properties bean info
  • XMLNormalizer.java - class which interfaces with SAX parsing
  • build.xml - ant build file

Import the JAR into Your StreamBase Studio Project

For StreamBase Studio 3.5 and 3.7:

Follow these steps:

  1. In StreamBase Studio, define or open a project. For example, select File > New > Project...
  2. In the Projects view, right-click the project name and select Import...
  3. On the Import dialog, select the "User-defined Operators and Adapters (JAR)" option.
  4. On the Select resources dialog, click Add... and browse to the installed directory for the XML Normalizer. Select the XMLNormalizer.jar file.
  5. Click Finish.

For StreamBase Studio 5.0 and later: 

Follow these steps:

  1. In StreamBase Studio, define or open a project if you have not already done so.

    For example, click File > New > StreamBase Project. In the wizard, you do not need to select the options to create application or configuration files, or to add the StreamBase ClientAPI to the build path. Just enter a project name and click Finish to create the project.

  2. In the Package Explorer, right-click the project and choose Build Path > Add External Archives.
  3. In the Jar selection dialog, navigate to the XML Normalizer's installed directory and select the XMLNormalizer.jar file.
  4. Click Finish.

 

Import Sample File (Recommended)

To make it easier to see how the StreamBase XML Normalizer parses XML strings, we recommend that you also import the sample StreamSQL Event Flow application diagram, XMLSimple.sbapp.

For StreamBase Studio 3.5 and 3.7:

  1. In StreamBase Studio, select File > Import > StreamBase Resources on File System. If the dialog does not already show the installed XML Normalizer directory, browse to it.
  2. Select only the XMLSimple.sbapp file and import it into the same project that contains the JAR.

For StreamBase Studio 5.0 and later:

  1. In the Package Explorer, right-click the project and choose Import. In the Import wizard, expand General, choose File System, and click Next.
  2. In the Import wizard's From directory field, enter or browse to the XML Normalizer's installed directory. The directory is added to the wizard's left selection pane, below the From directory field.
  3. In the right selection pane, select the XMLSimple.sbapp file. (The file is selected when its checkmark is displayed in the list.)

Open the Sample EventFlow: XMLSimple.sbapp

In the Projects view, under Application Diagrams, double-click the XMLSimple.sbapp sample file to open it on the Palette.

Click the Java1 operator, which is our implementation of the XMLNormalizer Java operator class. StreamBase Studio displays its Properties view.



Subsequent sections of this topic describe the XMLNormalizer Java operator's UI parameters, and the parsing rules.

Parsing Supported by the XML Normalizer Java Operator

This Java Operator accepts exactly one input stream and generates one output stream.

The Properties view for the Java operator in StreamBase Studio allows you to set the parameters which control how the XML normalization is done. Except where otherwise noted, the properties are required.

To help clarify the descriptions in subsequent sections, refer back to this XML string:

<transactions>
    <trade>
        <symbol market="Nasdaq">MSFT</symbol>
        <price>
            <value>29.33</value>
            <currency>USD</currency>
        </price>
        <volume>100</volume>
    </trade>
    <trade>
        <symbol market="NYSE">IBM</symbol>
        <price>
            <value>93.51</value>
            <currency>USD</currency>
        </price>
    </trade>
</transactions>

Input field

The name of a String field in the input stream which contains the XML string to be parsed.

XML element being parsed

The node under which the fields of interest are found, sometimes called the parent element. This is not the same as the XML root node. In the above example, "trade" is the XML element being parsed and "transactions" is the root element. There may be many instances of the parent element, but only one root.

Note: SAX parsing supports only a single root element.

List of elements to be returned

Enter XPath-compliant specifications for choosing data from the XML string. When there is no match in the XML for the specified element, the resultant field in the output tuple is null. If a tuple would consist of all null fields, the tuple is not generated.

This operator supports a subset of XPath-compliant parsing strings for normalizing the input XML string.

The entry can be in one of the following forms:

  • simple node, e.g. 'symbol' or 'volume'. 'symbol' returns a value of 'MSFT' in the first tuple and 'IBM' in the second. 'volume' returns a value of '100' in the first tuple and null in the second.
  • hierarchical node, e.g. 'price/value'. 'price/value' returns '29.33' in the first tuple and '93.51' in the second.
  • attribute node, e.g. 'symbol@market'. 'symbol@market' returns 'Nasdaq' in the first tuple and 'NYSE' in the second.
  • attribute predicate, e.g. 'symbol[@market="NYSE"]'. Chooses the node symbol which has an attribute of 'market' with a value of 'NYSE'. The first tuple returns null (because the attribute value does not match 'NYSE') and the second tuple returns IBM (the contents of symbol where the attribute 'market' does match 'NYSE').

Note: The parsing supports only double quotes in the XML string and element list.
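The following is a minimal sketch of how the four forms above could be evaluated with the JDK's SAX parser. It is not the shipped XMLNormalizer.java source; the class name ElementSpecSketch, the method normalize, and the String[] row layout are assumptions made for illustration, and only leaf-element text is captured.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the operator's actual implementation.
class ElementSpecSketch {

    // Predicate form path[@attr="value"]; double quotes only, as the note above requires.
    private static final Pattern PREDICATE =
            Pattern.compile("([^\\[@]+)\\[@([^=]+)=\"([^\"]*)\"\\]");

    /** One String[] per parent element; slot i holds the match for specs[i], or null. */
    public static List<String[]> normalize(String xml, String parent, String... specs)
            throws Exception {
        List<String[]> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>(); // element names below the parent
            private final StringBuilder text = new StringBuilder();
            private final boolean[] capturing = new boolean[specs.length];
            private String[] row; // non-null only while inside a parent element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (row == null) {
                    if (qName.equals(parent)) row = new String[specs.length];
                    return;
                }
                path.addLast(qName);
                text.setLength(0);
                String here = String.join("/", path);
                for (int i = 0; i < specs.length; i++) {
                    Matcher m = PREDICATE.matcher(specs[i]);
                    int at = specs[i].indexOf('@');
                    if (m.matches()) {        // attribute predicate: keep text if attribute matches
                        capturing[i] = m.group(1).equals(here)
                                && m.group(3).equals(atts.getValue(m.group(2)));
                    } else if (at > 0) {      // attribute node: element@attribute
                        if (specs[i].substring(0, at).equals(here)) {
                            row[i] = atts.getValue(specs[i].substring(at + 1));
                        }
                    } else {                  // simple or hierarchical node
                        capturing[i] = specs[i].equals(here);
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (row != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (row == null) return;
                if (path.isEmpty()) {         // closing the parent element itself
                    // A row whose fields are all null is not emitted.
                    if (Arrays.stream(row).anyMatch(Objects::nonNull)) rows.add(row);
                    row = null;
                    return;
                }
                for (int i = 0; i < specs.length; i++) {
                    if (capturing[i]) {
                        row[i] = text.toString().trim();
                        capturing[i] = false;
                    }
                }
                path.removeLast();
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transactions>"
                + "<trade><symbol market=\"Nasdaq\">MSFT</symbol>"
                + "<price><value>29.33</value><currency>USD</currency></price>"
                + "<volume>100</volume></trade>"
                + "<trade><symbol market=\"NYSE\">IBM</symbol>"
                + "<price><value>93.51</value><currency>USD</currency></price></trade>"
                + "</transactions>";
        for (String[] r : normalize(xml, "trade", "symbol", "volume", "price/value",
                "symbol@market", "symbol[@market=\"NYSE\"]")) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Run against the sample XML string, the sketch yields two rows: MSFT, 100, 29.33, Nasdaq, null for the first trade, and IBM, null, 93.51, NYSE, IBM for the second, which mirrors the results described in the list above.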

List of output fields for parsed elements

The names of the fields in the output tuples. There is a strict one-to-one correspondence between the list of elements and the output fields. The field names must be StreamBase compliant field names and may not contain any duplicates.

The field names do not have to be the same as the node names. In fact, several XPath separators are not legal StreamBase field name characters, such as "/", "@", and "[".

Size of tuple output fields

All the output fields (specified in the above list) are Strings, and all are the same length. Specify a size which is sufficient for any anticipated data from the original XML string.

If the value returned for a field is longer than the field size, the field is set to null.
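As a rough sketch of the size rule, the helper below returns null for any value that does not fit. The class and method names are hypothetical, and the sketch assumes the size is measured in characters, which this article does not state.

```java
// Hypothetical helper illustrating the field-size rule; not part of the operator's API.
class FieldSizeSketch {

    /** Returns value unchanged if it fits in maxSize characters (an assumption), else null. */
    public static String fit(String value, int maxSize) {
        if (value == null) {
            return null;
        }
        return value.length() <= maxSize ? value : null;
    }

    public static void main(String[] args) {
        System.out.println(fit("MSFT", 8)); // fits, so the value is kept
        System.out.println(fit("MSFT", 3)); // too long, so the field becomes null
    }
}
```

Because an oversized value is replaced by null rather than truncated, it is usually safer to choose a size with some headroom.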

Pass input fields

If set to true, all input fields, except the field containing the original XML string, will be copied to each output tuple. If set to false, no input fields will be copied. The default setting is true. If you desire only a subset of fields to be passed, insert a Map operator before the XML Normalizer to strip the unwanted fields.

This feature could be used to mark the output tuples with an input identifier to associate the output tuples with the original XML string.
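The effect of this option can be sketched with plain Java maps standing in for tuples. The names below (PassInputFieldsSketch, buildOutput, and the field names id and xml) are assumptions for illustration and do not come from the operator's source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: maps stand in for StreamBase tuples.
class PassInputFieldsSketch {

    /** Builds one output "tuple" from the input fields and the parsed XML fields. */
    public static Map<String, Object> buildOutput(Map<String, Object> input, String xmlField,
            Map<String, String> parsed, boolean passInputFields) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (passInputFields) {
            for (Map.Entry<String, Object> e : input.entrySet()) {
                // Every input field is copied except the one holding the XML string.
                if (!e.getKey().equals(xmlField)) {
                    out.put(e.getKey(), e.getValue());
                }
            }
        }
        out.putAll(parsed);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("id", 7);
        input.put("xml", "<transactions>...</transactions>");
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("Instrument", "MSFT");
        System.out.println(buildOutput(input, "xml", parsed, true));
        System.out.println(buildOutput(input, "xml", parsed, false));
    }
}
```

With the option set to true, the id field travels with every output tuple, which is what lets a downstream operator tie each tuple back to its original XML string.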

Field for per tuple error message (Optional)

If this field is filled in with the name of an output field, then each output tuple will contain this output field. The field name must be a legal StreamBase field name and not be a duplicate of any other output field.

Errors that are specific to a tuple are placed in this field. For example, if the data is larger than the field size, the error field contains a message to that effect. If the input XML string is not well formed within a parent element, the error field indicates that as well.

Some errors terminate SAX parsing. If an error is not associated with an output tuple, an extra tuple is sent with the error field filled in and the output fields for the parsed elements set to null. If the "Pass input fields" option is set to true, the input tuple fields are copied to this special error output tuple, which lets you associate the global error with the correct input tuple.
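To illustrate the kind of error that terminates SAX parsing, the sketch below feeds a string to the JDK's SAX parser and returns a message when the string is not well formed. The class and method names are hypothetical, and the message format is an assumption; the operator's actual error text may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of detecting a parse-terminating error.
class ParseErrorSketch {

    /** Returns null when xml is well formed, otherwise a short error message. */
    public static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The parser reports where it gave up; parsing does not resume after this.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError("<transactions><trade></trade></transactions>"));
        System.out.println(parseError("<transactions><trade></transactions>"));
    }
}
```

A non-null result here corresponds to the case where an extra tuple carries only the error field (plus any passed input fields), because no parsed values are available to accompany it.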

How to Run XMLSimple.sbapp in StreamBase Studio

This section assumes that you have already imported the XMLNormalizer JAR and sample files, as described in previous sections.

  1. If you have not already done so, in the Projects View, open the Application Diagrams folder, right-click on XMLSimple.sbapp, and select Open. The Java1 operators that you see on the application diagram are XMLNormalizerOperator.class operators.
  2. Click the Java1 operator on the canvas. On the Parameters tab of its Properties View, notice the various parameters. In particular, see that the parent element is trade and the returned elements are symbol, price, and volume.
  3. In the Projects View, right-click on XMLSimple.sbapp, and select Run Application (F11).
    StreamBase Studio switches to the Test/Debug perspective.
  4. Click on the Manual Input tab.
  5. Type the following string, which is shown here on two lines only to fit on this page; use a continuous string when you enter it:
    <transactions><trade><symbol>MSFT</symbol><price>29.33</price>
    <volume>100</volume></trade></transactions>


    Click on Send Data. On the Application Output View, notice that on OutputStream1, Instrument (from symbol) is MSFT, SalePrice (from price) is 29.33, and Quantity (from volume) is 100.

    The Java operator has transformed the XML string into tuple fields.
  6. Click Run > Stop Application (F9).
    Note: While using the sample in the Test/Debug perspective, if you switch from Lower Case to Upper Case on the operator's Parameters tab, you must save the application file and restart the server to see the updated behavior of the operator.

Using the StreamBase Feed Simulation Tool with XML Attributes

The StreamBase Feed Simulation (sbfeedsim) tool may be used to send data to a StreamBase application. If the XML data contains attributes, there are two salient points to note:

  • The Java Operator supports quoted strings using only double quotes. This applies to both the list of elements and the actual XML input string.
  • Because sbfeedsim by default treats double quotes as the quote character, you have to instruct sbfeedsim to use an alternate quote character, such as the single quote character. In your Feed Simulation Configuration XML file (typically with the file extension ".sbfs"), you must indicate the alternate quote character using the "trace-file-quote" attribute on the "stream" element. For example:
    <stream name="InputStream1" trace-file="XMLAttributes.csv" trace-file-quote="'">

    For details on this configuration file, see "Using the Feed Simulation Editor," a topic in the StreamBase Test/Debug Guide.
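If you generate the trace file programmatically, a helper along the following lines could produce each line with single-quoted fields, so that double quotes inside the XML pass through untouched. This is only a sketch: the class and method names are hypothetical, and because this article does not describe how sbfeedsim treats a quote character embedded in a field, the sketch simply rejects fields that contain a single quote.

```java
// Hypothetical helper for building one line of a single-quoted CSV trace file.
class TraceLineSketch {

    /** Joins fields with commas, wrapping each in single quotes. */
    public static String traceLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].indexOf('\'') >= 0) {
                // Escaping rules are not covered here, so refuse rather than guess.
                throw new IllegalArgumentException("field contains a single quote: " + fields[i]);
            }
            if (i > 0) {
                line.append(',');
            }
            line.append('\'').append(fields[i]).append('\'');
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String xml = "<transactions><trade><symbol market=\"NYSE\">IBM</symbol></trade></transactions>";
        System.out.println(traceLine(xml));
    }
}
```

Each generated line can then be written to the file named in the trace-file attribute, XMLAttributes.csv in the example above.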
