Stream Processing Technology — Gearing Up for Low-Latency FX Trading
Author: Dr. Ugur Cetintemel, StreamBase Systems
07-September-2006
Introduction
Stream Processing: Combining Performance, Programmability and Persistence
Trends and Challenges in FX Trading
Example Streaming Applications
Addressing Risk Management and Compliance
Stream Processing and the Future of FX Trading
Introduction
Stream processing has recently emerged in response to sophisticated monitoring and analytical applications that require low-latency processing of high-volume, real-time event streams, often integrated with historical data. A Stream Processing Engine (SPE) makes it possible to execute the same types of queries and computations against real-time streaming data that were previously possible only on stored data. The goal of this article is to review the latest innovations in stream processing and describe their applications in the foreign exchange (FX) trading market.
Real-time analytical applications on high-volume event streams have traditionally been built in general-purpose languages such as C++ and Java. Custom coding with such low-level tools means long, expensive development and maintenance cycles. SPEs offer much greater ease of use, flexibility, and extensibility than custom solutions.
Stream Processing: Combining Performance, Programmability and Persistence
Unlike any other existing systems software, an SPE offers a unique combination of real-time performance, ease of programming, integrated access to live and historical data, and flexible storage options.
Real-Time Performance: SPEs perform inbound processing: incoming event streams are processed in memory before (optionally) being stored in a database. This is in sharp contrast to the traditional outbound processing model of database management systems, in which data must be persisted before any processing can take place. Figure 1 illustrates the two models.
Figure 1: Outbound vs. Inbound Processing
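To make the contrast in Figure 1 concrete, here is a minimal Python sketch, assuming a toy tick format of (symbol, bid, ask) and a running maximum bid as the "query". The outbound version persists the ticks in an in-memory SQLite table before querying them; the inbound version updates the answer in memory as each tick arrives and only optionally appends the tick to a store afterwards. The tick values and function names are illustrative and are not part of any SPE's API.

```python
import sqlite3

# Illustrative ticks: (symbol, bid, ask). The values are made up.
TICKS = [
    ("EUR/USD", 1.2790, 1.2792),
    ("EUR/USD", 1.2791, 1.2793),
    ("EUR/USD", 1.2795, 1.2797),
]

def outbound_max_bid(ticks):
    """Outbound model: store every tick first, then run the query over the stored rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ticks (symbol TEXT, bid REAL, ask REAL)")
    db.executemany("INSERT INTO ticks VALUES (?, ?, ?)", ticks)
    (max_bid,) = db.execute("SELECT MAX(bid) FROM ticks").fetchone()
    db.close()
    return max_bid  # a single answer, available only after all data is persisted

def inbound_max_bid(ticks, store=None):
    """Inbound model: update the answer in memory as each tick arrives.
    Persistence is optional and happens after the tick has been processed."""
    max_bid = None
    answers = []
    for tick in ticks:
        bid = tick[1]
        max_bid = bid if max_bid is None else max(max_bid, bid)
        answers.append(max_bid)  # a fresh answer after every event
        if store is not None:
            store.append(tick)   # optional storage, off the critical path
    return answers
```

The inbound version produces an up-to-date answer after every event, whereas the outbound version can answer only once the data has been written.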
Ease of Programming with StreamSQL: SPEs use StreamSQL, a high-level language that extends the familiar SQL standard with complex operations on continuous data streams. StreamSQL extensions include rich windowing constructs (e.g., to define moving averages over time) and operators that filter, merge, combine, and correlate streams and run user-defined analytical functions over them. These operators have built-in mechanisms to handle out-of-order, late, or missing data. A key feature of StreamSQL is that it can access and manipulate both real-time (streaming) and historical (stored) data in a uniform manner.
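StreamSQL's own windowing syntax is not reproduced here. As a hedged illustration of what a time-based sliding window computes, the following Python sketch maintains a moving average over the last `width` seconds of a stream. It assumes events arrive in timestamp order and that `width` is positive; the disorder and late-data handling attributed above to StreamSQL operators is deliberately left out. The class name and parameters are hypothetical.

```python
from collections import deque

class TimeWindowAverage:
    """Moving average over the sliding time window (ts - width, ts].

    Assumes width > 0 and non-decreasing timestamps; out-of-order events are not handled.
    """

    def __init__(self, width: float):
        self.width = width
        self.window = deque()  # (timestamp, value) pairs currently in the window
        self.total = 0.0       # running sum of the values in the window

    def push(self, ts: float, value: float) -> float:
        self.window.append((ts, value))
        self.total += value
        # Evict events that are now at least `width` seconds old.
        # The event just appended always stays, so the window never empties.
        while self.window[0][0] <= ts - self.width:
            _, old_value = self.window.popleft()
            self.total -= old_value
        return self.total / len(self.window)
```

For example, with `width=60`, pushing 1.0, 2.0, and 4.0 at t = 0, 30, and 60 seconds yields averages 1.0, 1.5, and 3.0, because the first event leaves the window at t = 60. Keeping a running sum makes each update amortized constant time instead of rescanning the window.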
Flexible Storage Model: SPEs provide a spectrum of integrated storage options covering a wide range of data volumes and latency requirements. An SPE can store and query gigabytes of data with near-zero latency, or slice and dice terabytes of historical data spanning months or years, all in a fully integrated fashion (within the same process). SPEs can also access data residing in external databases through connectivity standards such as JDBC. Figure 2 shows the available storage options and their access latency vs. data capacity characteristics.
Figure 2: Persistence Options in a Stream Processing Engine
Trends and Challenges in FX Trading
FX transaction volumes are continually increasing. While the opportunity depth is high, owing to extreme liquidity, round-the-clock trading, a large and diverse set of participants, and profit potential even in falling markets, the opportunity windows are getting narrower as automation increases and algorithmic trading tools emerge. These characteristics call for customizable real-time solutions that can be adapted quickly; this is exactly the scenario where stream processing technology shines.
On the sell side, FX institutions must continually optimize the entire price delivery chain, from price sourcing, setting, and publishing to trade processing. Price quality is the key differentiator, and given high market volatility and the growing choices available to buyers, it is a function of speed. Latency is a key consideration in data cleaning and price setting, two fundamental pricing-engine tasks that are commonly performed on the stream and therefore sit in the critical path of price publishing. Even when operations are manual, sub-second latencies are highly desirable.
On the buy side, the trend towards integrated access to multiple sell-side institutions and liquidity portals (e.g., FxConnect, Hotspot) creates new opportunities for temporal arbitrage and cross-market trading. With programmatic trading, latency requirements become drastically tighter; milliseconds make a big difference for those trying to "steal a pip". In particular, FX-based hedge funds are aggressively exploiting these inefficiencies by arbitraging price differences across liquidity providers.
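A minimal Python sketch of the cross-provider check behind such arbitrage, assuming each provider's latest quote is an already validated (bid, ask) pair: an opportunity exists when the best bid at one provider exceeds the best ask at another. Provider names and prices are hypothetical, and a real strategy would also have to account for transaction costs, credit limits, and the risk that a quote disappears before both legs execute.

```python
from typing import Dict, Optional, Tuple

Quote = Tuple[float, float]  # (bid, ask) from one liquidity provider

def find_arbitrage(quotes: Dict[str, Quote]) -> Optional[Tuple[str, str, float]]:
    """Return (buy_at, sell_at, edge_per_unit) if buying at the lowest ask and
    selling at the highest bid, at two different providers, is profitable."""
    buy_at = min(quotes, key=lambda p: quotes[p][1])   # cheapest place to buy
    sell_at = max(quotes, key=lambda p: quotes[p][0])  # best place to sell
    edge = quotes[sell_at][0] - quotes[buy_at][1]
    if buy_at != sell_at and edge > 0:
        return buy_at, sell_at, edge
    return None
```

With quotes {"A": (1.2790, 1.2792), "B": (1.2794, 1.2796)}, the sketch reports buying at A and selling at B for an edge of about 0.0002 per unit; if B's bid were 1.2791 instead, it would return None.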
Besides latency, customizability and agility are major concerns, as they are in exchange-traded markets. Existing black-box platforms expose only a few "knobs" and thus lack the flexibility firms need to turn their expertise into a custom "secret sauce". Roll-your-own solutions address this problem but are too expensive to build and evolve. When new trading opportunities arise, it is important to revise processes quickly enough to capitalize on them before they disappear. As with latency, SPEs can offer significant value here: they provide easily customizable platforms that let developers focus on the semantics of what they want to do rather than on complex "plumbing" issues.
Example Streaming Applications
Data validation and pricing are two applications that can make immediate use of stream processing. Bad data, such as "backwards" prices (Ask < Bid) and partial data (a Bid with no Ask), must be detected and removed or corrected. There are usually multiple data sources of varying reliability, so one of them must be selected. An important selection criterion is staleness: although the real age of values can rarely be assessed in the absence of timestamps, it is possible to track arrival rates and look for shifts or spikes in them, which can indicate throttling problems.
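The two checks described above can be sketched in Python as follows: a classifier for backwards and partial quotes, and a monitor that counts a feed's arrivals over a sliding time window and flags rates that deviate from an expected baseline. The baseline rate, window length, and tolerance are hypothetical per-feed parameters; note that the monitor reports a slowdown until its first window has filled.

```python
from collections import deque
from typing import Optional

def classify_quote(bid: Optional[float], ask: Optional[float]) -> str:
    """Label a raw quote as 'partial', 'backwards' (Ask < Bid), or 'ok'."""
    if bid is None or ask is None:
        return "partial"
    if ask < bid:
        return "backwards"
    return "ok"

class ArrivalRateMonitor:
    """Compare a feed's recent arrival rate with an expected baseline rate."""

    def __init__(self, width: float, baseline: float, tolerance: float):
        self.width = width          # window length in seconds (must be > 0)
        self.baseline = baseline    # expected ticks per second for this feed
        self.tolerance = tolerance  # allowed relative deviation, e.g. 0.5 = +/-50%
        self.arrivals = deque()     # arrival timestamps inside the window

    def on_tick(self, ts: float) -> str:
        self.arrivals.append(ts)
        # Drop arrivals that fell out of the window (ts - width, ts].
        while self.arrivals[0] <= ts - self.width:
            self.arrivals.popleft()
        rate = len(self.arrivals) / self.width
        if rate < self.baseline * (1 - self.tolerance):
            return "slowdown"  # possible throttling or a stale feed
        if rate > self.baseline * (1 + self.tolerance):
            return "spike"
        return "normal"
```

A feed that has been ticking once per second is "normal" against a baseline of 1.0; if its next tick arrives 20 seconds later, the window holds a single arrival and the monitor reports a slowdown, a candidate signal for switching to another source.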
Outlier detection, e.g., spotting excessive movement of the bid or ask relative to preceding values, is also common. Outliers are either ignored or marked as having a lower trust level. Large changes in spread may also indicate unreliable data; they can be detected using time-localized comparisons, such as the change from the last tick or a comparison with a moving average.
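Both comparisons can be sketched in a small Python class: each quote is checked against the last accepted tick (bid or ask moved too far) and against an exponential moving average of the spread (spread blew out). The thresholds are hypothetical and expressed in price units. Flagged quotes are not folded into the baseline, so a genuine, persistent price jump would keep being flagged; a production filter would need a rule for re-anchoring, for example after several consistent ticks.

```python
from typing import Optional, Tuple

class OutlierDetector:
    """Flag quotes whose bid/ask jump or spread looks implausible."""

    def __init__(self, max_move: float, spread_factor: float, alpha: float = 0.1):
        self.max_move = max_move            # largest plausible tick-to-tick move of bid or ask
        self.spread_factor = spread_factor  # spread above factor * average spread is suspicious
        self.alpha = alpha                  # weight of the newest spread in the moving average
        self.last: Optional[Tuple[float, float]] = None  # last accepted (bid, ask)
        self.avg_spread = 0.0

    def check(self, bid: float, ask: float) -> str:
        spread = ask - bid
        if self.last is None:  # the first quote seeds the baseline
            self.last, self.avg_spread = (bid, ask), spread
            return "ok"
        move = max(abs(bid - self.last[0]), abs(ask - self.last[1]))  # change from last tick
        if move > self.max_move or spread > self.spread_factor * self.avg_spread:
            return "low-trust"  # not folded into the baseline; caller may drop or tag it
        self.last = (bid, ask)
        self.avg_spread += self.alpha * (spread - self.avg_spread)  # EMA of the spread
        return "ok"
```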
Once the data is sanitized, prices can be calculated. This process is complicated by various date and holiday adjustments, especially when composite crossing currencies (those that are not directly traded) are involved. For example, a composite such as GBP/USD can be derived from GBP/EUR, EUR/JPY, and JPY/USD.
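Setting aside the date and holiday adjustments mentioned above, the arithmetic of a composite cross is a chain of multiplications. If every leg is quoted as units of the next currency per unit of the previous one, the cross bid is the product of the leg bids and the cross ask is the product of the leg asks, because selling the base currency along the chain hits each leg's bid and buying it lifts each leg's ask. A leg quoted the other way round (e.g., USD/JPY instead of JPY/USD) is inverted first, which reciprocates and swaps its bid and ask. A minimal Python sketch with illustrative function names:

```python
from typing import Iterable, Tuple

Quote = Tuple[float, float]  # (bid, ask) in units of quote currency per unit of base

def invert(quote: Quote) -> Quote:
    """Turn an X/Y quote into a Y/X quote: the new bid is 1/ask, the new ask is 1/bid."""
    bid, ask = quote
    return 1.0 / ask, 1.0 / bid

def chain_cross(legs: Iterable[Quote]) -> Quote:
    """Combine legs A/B, B/C, C/D, ... into an A/D quote.

    Selling A for D goes through every leg at its bid; buying A with D goes
    through every leg at its ask. Value-date and holiday adjustments are ignored.
    """
    bid, ask = 1.0, 1.0
    for leg_bid, leg_ask in legs:
        bid *= leg_bid
        ask *= leg_ask
    return bid, ask
```

For the GBP/USD example, one would call `chain_cross([gbp_eur, eur_jpy, invert(usd_jpy)])` (hypothetical variable names) when the JPY leg is only available as a USD/JPY quote.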
On the buy side, whether one is fishing or downhill skiing, it is often useful to produce a "view" of the raw input streams that hides their details and complexities from the trader. A "Super-Feed" application can aggregate prices from various liquidity platforms and present a single best bid and ask. This view can also derive prices for composite crossing pairs as described above.
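A Super-Feed can be sketched as a small piece of streaming state: keep the latest quote from each platform and, on every update, publish the best (highest) bid and best (lowest) ask among quotes that are not older than a staleness limit. This Python sketch assumes timestamps in seconds and a hypothetical non-negative `max_age` parameter; platform names are placeholders.

```python
from typing import Dict, Tuple

class SuperFeed:
    """Consolidate per-platform quotes into a single best bid/ask view."""

    def __init__(self, max_age: float):
        self.max_age = max_age  # quotes older than this many seconds are ignored
        self.latest: Dict[str, Tuple[float, float, float]] = {}  # platform -> (ts, bid, ask)

    def update(self, platform: str, ts: float, bid: float, ask: float):
        """Record a quote and return (best_bid, bid_platform, best_ask, ask_platform)."""
        self.latest[platform] = (ts, bid, ask)
        # The quote just received is always fresh, so `live` is never empty.
        live = {p: q for p, q in self.latest.items() if ts - q[0] <= self.max_age}
        bid_platform = max(live, key=lambda p: live[p][1])
        ask_platform = min(live, key=lambda p: live[p][2])
        return live[bid_platform][1], bid_platform, live[ask_platform][2], ask_platform
```

Note that the best bid and best ask may come from different platforms; the trader sees one consolidated number per side without tracking each source.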
Alerting-style applications are also common: temporal patterns in prices and derivatives (options, futures, etc.) can be tracked in real time, and the trader alerted when a pattern of interest is identified. For example, an application can track a Moving Average Convergence/Divergence (MACD) indicator, in which two exponential moving averages with different speeds (e.g., a 10-minute and a 1-minute moving average) are computed in real time and an alert is produced when the averages are about to cross. Alternatively, the system can be instructed to take buy/sell actions automatically if a cross price hits or exceeds a threshold. The threshold might be a fixed value, a dynamically changeable value, or one derived via MACD or another user-specified analytic. One can also employ a dynamic stop-loss limit that tracks the price and buys or sells at a pre-specified minimum/maximum price.
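A minimal Python sketch of the crossover alert, assuming a per-tick exponential moving average with the common span-to-weight conversion alpha = 2/(span + 1), rather than the time-based 1-minute and 10-minute averages in the example. It reports when the fast-minus-slow difference (the MACD line) actually changes sign; detecting that the averages are about to cross would additionally require extrapolating that difference. Class and parameter names are illustrative.

```python
from typing import Optional

class CrossoverAlert:
    """Alert when a fast EMA of the price crosses a slow EMA."""

    def __init__(self, fast_span: int, slow_span: int):
        self.a_fast = 2.0 / (fast_span + 1)  # smoothing weight of the fast average
        self.a_slow = 2.0 / (slow_span + 1)  # smoothing weight of the slow average
        self.fast: Optional[float] = None
        self.slow: Optional[float] = None
        self.last_sign = 0  # sign of the last non-zero fast-minus-slow difference

    def on_price(self, price: float) -> Optional[str]:
        if self.fast is None:
            self.fast = self.slow = price  # seed both averages with the first price
        else:
            self.fast += self.a_fast * (price - self.fast)
            self.slow += self.a_slow * (price - self.slow)
        diff = self.fast - self.slow  # the MACD line
        sign = (diff > 0) - (diff < 0)
        alert = None
        if sign != 0:
            # Alert only on a change of sign, not on the first non-zero reading.
            if self.last_sign != 0 and sign != self.last_sign:
                alert = "bullish crossover" if sign > 0 else "bearish crossover"
            self.last_sign = sign
        return alert
```

With `fast_span=2` and `slow_span=5`, the price sequence 1, 2, 3, 1, 3 produces no alert for the first three ticks, then a bearish crossover, then a bullish one. An automated buy/sell rule could act on these alerts instead of notifying a trader.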
In a similar manner, an application can track trading performance in real time by computing the profit/loss over a window of recent transactions and trigger alerts if the performance is below a threshold.
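A count-based window over recent transactions is enough to sketch this. The following Python class keeps the profit/loss of the last `n_trades` trades (assumed to be computed upstream for each trade) and flags when their sum falls below a threshold; both parameters are hypothetical.

```python
from collections import deque
from typing import Tuple

class RollingPnL:
    """Sum of P&L over the most recent `n_trades` trades, with a threshold alert."""

    def __init__(self, n_trades: int, threshold: float):
        self.window = deque(maxlen=n_trades)  # the oldest trade drops out automatically
        self.threshold = threshold

    def on_trade(self, pnl: float) -> Tuple[float, bool]:
        self.window.append(pnl)
        total = sum(self.window)
        return total, total < self.threshold  # (window P&L, alert?)
```

With a three-trade window and a threshold of -100, trades of +50, -80, and -90 sum to -120 and raise an alert; a subsequent +40 evicts the +50, so the window sum becomes -130 and the alert persists.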
Finally, stream processing engines simplify the integration of historical data to calculate a derived currency value along with other attributes such as variance and realized (actual) volatility. One can annualize these figures, deliver the results to analysts and traders, and use them in a particular trading or pricing scenario.
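As an illustration of the historical side, realized volatility is commonly computed as the sample standard deviation of log returns, scaled by the square root of the number of return periods per year; squaring the unscaled standard deviation gives the variance. The Python sketch below assumes daily prices and defaults to 252 periods per year. That convention is an assumption (some FX desks use roughly 260 weekdays), so it is left as a parameter.

```python
import math
import statistics
from typing import Sequence

def annualized_volatility(prices: Sequence[float], periods_per_year: float = 252) -> float:
    """Realized volatility: sample stdev of log returns times sqrt(periods per year).

    Needs at least three prices (two returns) for a sample standard deviation.
    """
    returns = [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]
    return statistics.stdev(returns) * math.sqrt(periods_per_year)
```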
Addressing Risk Management and Compliance
Real-time risk management operations can also benefit from stream processing. FX traders must manage portfolio risk across many moving variables (e.g., a large, globally dispersed market open 24 hours a day), so risk values need to be recalculated continually and precisely to guide future trading actions. Here, auto-hedging can be combined with auto-trading: a risk exposure indicator is monitored continually for various (possibly composite) currency positions, and buy/sell orders are issued when the risk exceeds a customizable threshold.
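A simplified Python sketch of the core of such a monitor, assuming positions are (pair, signed base amount, traded rate) triples, that net per-currency exposure stands in for the risk indicator, and that each currency has a hypothetical limit. A composite position is entered as its component legs. Real risk measures would convert exposures to a common currency and account for volatility and correlation; this only shows the threshold-and-offset logic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[str, float, float]  # (pair such as "EUR/USD", signed base amount, rate)

def net_exposure(positions: List[Position]) -> Dict[str, float]:
    """Long `amount` of BASE/QUOTE at `rate` adds +amount BASE and -amount*rate QUOTE."""
    exposure: Dict[str, float] = defaultdict(float)
    for pair, amount, rate in positions:
        base, quote = pair.split("/")
        exposure[base] += amount
        exposure[quote] -= amount * rate
    return dict(exposure)

def hedge_orders(exposure: Dict[str, float], limits: Dict[str, float]) -> Dict[str, float]:
    """Offsetting amounts for currencies whose absolute exposure breaches its limit."""
    return {
        ccy: -amount
        for ccy, amount in exposure.items()
        if abs(amount) > limits.get(ccy, float("inf"))  # no limit means never hedge
    }
```

For example, long 1,000,000 EUR/USD at 1.25 plus short 400,000 EUR/JPY at 150 nets to +600,000 EUR, -1,250,000 USD, and +60,000,000 JPY; with a 500,000 EUR limit, the sketch proposes selling 600,000 EUR. In a streaming setting, both functions would be re-evaluated on every fill or price update.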
Some aspects of auto-compliance exhibit similar characteristics; for example, cross-currency positions can be tracked in real time to make sure compliance thresholds are met (e.g., IAS 39 restricts the ability to net positions).
Stream Processing and the Future of FX Trading
Fueled by system and protocol standardization, there is a big push towards automating all aspects of FX trading. Transaction volumes and complexity will continue to increase, and opportunity windows will get narrower as markets consolidate and new participants (e.g., hedge funds and proprietary traders), some of whom already have significant algorithmic trading experience in equities, enter the market.
Stream processing will also enable applications to easily analyze historical data and consult historical trends within real-time queries. StreamSQL makes it possible to switch processing instantly from live to historical data, simplifying the execution of sophisticated "What-if" and "Why-it-happened" scenarios. Moreover, increased automation will enable more fundamental analysis and risk-taking models that respond automatically to events from various electronic information sources (e.g., a text analysis query that tracks data releases or breaking news to derive cross-asset impacts). Packaged black-box approaches will not support such complex models, which are likely to be heavily guided by "secret sauce".
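The idea of running one query over either live or historical data can be sketched in Python with generators: the query below consumes any iterable of (timestamp, price) ticks, so it can be fed by a live feed or by a replay of ticks stored in a database (here an in-memory SQLite table with made-up rows). Re-running the replay with a different threshold is a simple form of "What-if" analysis. This illustrates the concept only; it is not how StreamSQL implements the switch, and all names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

Tick = Tuple[float, float]  # (timestamp, price)

def large_moves(ticks: Iterable[Tick], threshold: float) -> Iterator[float]:
    """Yield timestamps at which the price moved by more than `threshold` since the previous tick."""
    prev = None
    for ts, price in ticks:
        if prev is not None and abs(price - prev) > threshold:
            yield ts
        prev = price

def replay(db: sqlite3.Connection, symbol: str) -> Iterator[Tick]:
    """Historical source: stored ticks for one symbol, in timestamp order."""
    return db.execute("SELECT ts, price FROM ticks WHERE symbol = ? ORDER BY ts", (symbol,))

# Illustrative stored history (made-up values).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ticks (symbol TEXT, ts REAL, price REAL)")
db.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [("EUR/USD", 1.0, 1.2790), ("EUR/USD", 2.0, 1.2792),
     ("EUR/USD", 3.0, 1.2810), ("EUR/USD", 4.0, 1.2809)],
)

# The same query over stored history and over a (simulated) live feed.
historical_hits = list(large_moves(replay(db, "EUR/USD"), threshold=0.0010))
live_hits = list(large_moves(iter([(10.0, 1.2800), (11.0, 1.2830)]), threshold=0.0010))
```

Here the replay flags the jump at t = 3.0 and the simulated live feed flags t = 11.0; the query code itself does not know which source it is reading.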
In the long run, the FX market will look more and more like the exchange-traded markets, where stream processing has already proven its value in a very short time. Stream processing engines promise to be a key component of next-generation FX platforms thanks to their unmatched performance, flexibility, and agility advantages.
Dr. Ugur Cetintemel received his doctorate in computer science from the University of Maryland, College Park, in 2001. He is currently an assistant professor in the Department of Computer Science at Brown University. His work focuses on the architecture and performance of advanced information systems and databases. Cetintemel has published numerous papers in leading database and systems conferences, primarily in the areas of data stream processing, distributed data storage, and replication. He received the prestigious CAREER award from the National Science Foundation in 2004. Cetintemel serves as a Technical Advisory Board member for StreamBase Systems.