Developer Edition Note: The feature described in this topic is available only for the StreamBase Enterprise Edition.
This topic explains how to use the JDBC Table data construct and one or more Query operators in a StreamBase application diagram to access an external JDBC data source. To use a JDBC data source with a StreamSQL application, see the APPLY Statement in the StreamSQL Guide.
In a StreamBase EventFlow, a JDBC Table data construct points to the JDBC data source that you want to use. When connected to the JDBC Table, a Query operator can manipulate the JDBC data together with tuples from the application's input streams. JDBC Tables can be associated with multiple Query operators, but each Query operator can be associated only with one JDBC Table.
When you develop an application that uses a JDBC data source, StreamBase Studio actually connects to the data source to perform typechecking. Before performing the steps in this topic, you will need the following information from the database administrator:
-
The JDBC URI to connect to the data source server.
-
If required, a username and password to use when connecting to the data source server.
-
The class name of the driver.
-
The name and location of the JARs and libraries that the data source server uses.
Notes
Under some circumstances, the JVM can access the wrong data source if multiple drivers attempt to access the same data source. To avoid this potential problem, it is best for all applications that access a given data source to be in the same project, within a StreamBase workspace.
One difference between using JDBC data sources in EventFlows and StreamSQL diagrams is that the APPLY JDBC statement always passes through all of the input stream's fields. To suppress one or more fields in the output, you can add a SELECT statement after the APPLY JDBC statement. By contrast, in an EventFlow, the Query operator allows you to explicitly choose the input stream fields to include in the output.
To use a JDBC data source in an EventFlow:
-
In the StreamBase Studio Package Explorer, navigate to your application's project folder and open the project's server configuration file, traditionally named
sbd.sbconf. -
In your configuration file, declare the JDBC data source in the
data-sourcessection. For example:<data-sources> <data-source name="myDB" type="jdbc"> <uri value="jdbc:derby://localhost:1527/myDB;user=me;password=mine"/> <driver value="org.apache.derby.jdbc.ClientDriver"/> </data-source> </data-sources>The name you assign to your data source must follow the StreamBase identifier naming rules, as described in Identifiers.
-
Make the JAR files and any library files required by your JDBC data source driver (such as
.soorDLLfiles) visible in your project, as described in Managing Studio Resource Files. For example, import them into your project. -
Drag a JDBC Table icon from the Palette view to your canvas, creating a new data construct.
-
Open the Properties view of your JDBC Table operator. Name the component and identify the data source it will use, as described in Using the JDBC Table Data Construct.
-
Create the Query operator or operators that will be associated with the JDBC Table. That is, for each one, drag a Query Table icon from the Palette view to your EventFlow.
-
Connect the Query operator or operators to the JDBC Table. At this time, StreamBase Studio performs typechecking. If a typecheck error reports an unknown data-source, make sure the JDBC data source is running correctly configured in the
sbd.sbconfserver configuration file. -
For each Query operator, edit its properties as described in Using the Query Operator.
When you complete your application and are ready to deploy it outside of
StreamBase Studio, you can export all the files you have developed in
StreamBase Studio to your deployment environment, using the Eclipse
Export wizard. The sbd.sbconf file that you use in your
runtime environment must then be configured to access the JDBC data sources there, as
follows (refer to StreamBase Server
Configuration XML for full details):
-
Each JDBC data source used in the application must be declared in the
data-sourcessection. -
The JAR files and library files required by the JDBC driver (which were loaded by your project during development) must be added within the
java-vmsection so that they are loaded at run time:-
Declare each JAR file in a
jarelement. For example:<jar file="saxon.jar">
-
Declare each library file in a
libraryelement.
Note
During development, you could enable StreamBase Studio to resolve the locations of resources in various ways. For example, the resources in your exported
sbd.sbconfmay be specified as just file names with no directory paths. By contrast, for deployment (that is, outside of StreamBase Studio) you must explicitly include the path information for all resources in your configuration file. -
Different JDBC drivers allocate different sizes for column types. Very large column
types (such as the text column for the Microsoft JDBC driver) can cause typecheck
errors. For this reason, the StreamBase Server sbd.sbconf configuration file sets a maximum column size for JDBC
tables. By default, this parameter is set to 2048.
If your JDBC database uses wide column types, there are several options to avoid typecheck problems in StreamBase:
-
Limit column sizes in the SQL statements of individual APPLY JDBC operations (in StreamSQL applications) or Query operators (in EventFlow applications). As long as you specify a column size of 2048 or less, output tuple sizes will be acceptable. The SQL syntax depends on your database; here are three different examples of setting a column size to 1000:
select cast(large_col as varchar(1000)) from tableselect substr(large_col,1,1000) from tableselect convert(large_col,varchar(1000)) from tableNote
If the actual data in your columns exceeds the limit you set, the data will be truncated.
-
Set the StreamBase Server's maximum column size to a higher value: Open the StreamBase Server
sbd.sbconfconfiguration file, uncomment thejdbc-max-column-sizeparam, and set the value. For example:<data-sources> <param name="jdbc-max-column-size" value="4096"/>
This setting applies to all JDBC tables connected to the StreamBase Server. Note: The
jdbc-max-column-sizecan affect the size of tuples sent to the output stream. Because the entire tuple's schema must be able to fit into the page size, consider also increasing the page-pool/page-size parameter, as documented in StreamBase Server Configuration XML.
If neither of these options is acceptable, consider changing your JDBC table schema to avoid wide column types.
Recall that StreamBase Studio communicates with the data source server
in order to typecheck an application that has a JDBC data source connection. If the
data source server does not respond during a preset jdbc-timeout interval, the application fails to
typecheck.
The default timeout value of 15 seconds is adequate for normal local area network
connections; consider increasing the jdbc-timeout interval if you experience typecheck
failures due to a slow network connection:
-
Open your project's
sbd.sbconffile in a text editor. -
Uncomment the
param name="jdbc-timeout"element underdata-sources. -
Change the default value. For example:
<data-sources> <param name="jdbc-timeout" value="25000"/>
You may need to repeat this process, trying different jdbc-timeout values, until your application typechecks normally.
It is possible for deadlocks to occur if you perform multiple query operations against the same JDBC data source, and when the output of one query operation feeds the input of another. By default, when multiple EventFlow Query operators or StreamSQL APPLY statements connect to the same JDBC Table, all the query operations run in the same thread, and each query operation holds a connection to the external data source. Therefore, a deadlock can occur if the following conditions are all true:
-
The application contains a path running on a single thread with a Query operator connected to a JDBC database.
-
The query returns more than one row from one or more tables.
-
On the same path and in the same thread, a downstream Query operator attempts to update one or more of the selected JDBC tables.
Note
When we refer to a JDBC Query operator, we mean a Query operator that is connected to a JDBC Table data construct, which in turn is configured to connect to an external database. In a text-based StreamSQL application, we mean cases where you are using APPLY JDBC statements that operate on a selected result set, following by an attempted update of the selected tables.
For example, consider the following EventFlow diagram, where the first Query operator is used to read data from a large result set in a JDBC table. The second Query operator consumes those output tuples and then attempts to update the same JDBC table rows:
In this scenario, if the first query runs to completion first (all rows in the result set are fetched), the second query can run without a problem.
Now, consider what happens if the Select operator passes only a partial result set in its first output tuple to the Update operator. Because the select operation is not finished, the JDBC server forces the Update operator to wait. The Update operator cannot write to the data source while the JDBC server is waiting. Meanwhile, the Select cannot finish fetching rows because the thread shared by the two operators is waiting for the update operation.
To avoid this issue, you can take any one of these steps:
-
Redesign your application to eliminate multiple query operations on the same data source. This might involve writing more complex SQL update operations that select as well as update data. For example, combine the operations into a single Query operator, and within the updating expression use UPDATE ... WHERE.
-
Write a Query that will return a result set (per SELECT) that is less than the configured
jdbc-fetch-size. Most JDBC drivers have a default fetch size that is larger than 1, but the actual value varies in different JDBC drivers. Consider explicitly setting thejdbc-fetch-sizeparameter in the data-sources section of the StreamBase Server configuration file,sbd.sbconf. This parameter attempts to specify the size of the buffer used by the JDBC driver when fetching rows during query execution. Decide on the minimum number of tuples that you can reasonably fetch from the database per query, and then make sure thatjdbc-fetch-sizeis set to a larger value. -
Change the fetch size used by your JDBC server to match or exceed the maximum result set expected in select and update operations, as described in Limiting Buffer Size for Fetched JDBC Table Rows. For more information about
jdbc-fetch-size, please refer to the StreamBase Server Configuration XML reference topic, which is part of the StreamBase Reference Guide. -
Process one or more of the query operations in parallel mode instead of the default serial mode, if feasible for your application.
-
In an EventFlow application, set the option on one or more of the Query operators to Run this component in a separate thread, as described in Using the Query Operator.
-
In a StreamSQL application, add the PARALLEL keyword to the APPLY JDBC, as described in the APPLY Statement topic.
-
-
Another workaround is to use the (separately installed) external StreamBase Adapter for JDBC, instead of the JDBC Table data construct within an application. If you are interested in this option, please contact your StreamBase Systems representative for information about obtaining the external StreamBase Adapter for JDBC.
Multiple operators in multiple threads can conflict, but they will usually succeed eventually, because StreamBase retries the DML statement that fails. In general, one way to work around deadlocks is to put operations in separate threads.
Queries to some external JDBC data sources can produce very large result sets. If a
SELECT statement returns too much data, memory can be exhausted. If the problem
cannot be addressed in the JDBC data source itself, consider setting the jdbc-fetch-size parameter in the data-sources section of the StreamBase Server
configuration file.
This parameter attempts to limit the size of the buffer used by the JDBC driver when fetching rows during query execution. The use and effects of this parameter vary in different JDBC drivers. Please refer to the StreamBase Server Configuration XML reference topic for details.
If you know that the database may return an exception due to transaction deadlocks in the database, and the expected behavior of your JDBC server is to roll back the operation or transaction and retry, consider enabling your StreamBase application to detect such exceptions for a select, insert, or update query.
To use the retry feature, edit your application's sbd.sbconf configuration file and add a param name = "jdbc-retry-sqlstate" entry in the data-source section of your JDBC data source (as described
in the StreamBase Server
Configuration XML reference topic). For example, the following code checks for
deadlocks in a MYSQL database:
<data-source name="mysqldb" type="jdbc"> <uri value="jdbc:derby://localhost:1527/mysqldb;user=jeanne;password=darc"/> <driver value="org.apache.derby.jdbc.ClientDriver"/> <param name="jdbc-retry-sqlstate" value="40001"/> </data-source>
If the database driver returns an sqlstate code of 40001, the StreamBase application retries the query.
