Pipelining Vector-Based Statistical Functions: Q&A with McObject CEO and Co-Founder Steve Graves



Founded by embedded database and real-time systems experts, McObject provides the eXtremeDB Financial Edition database system to build low latency, high scalability and reliability into real-time financial systems while harnessing the growing volume of data in capital markets. McObject counts among its customers industry leaders such as TradeStation Technologies, NSE.IT, SunGard Kingstar, Transaction Network Services, Dalian Commodity Exchange, Financial Technologies of India Ltd. (FTIL), BAE Systems, Northrop Grumman, Siemens, Philips, EADS, F5 Networks, Motorola and Boeing. Based in Federal Way, WA, McObject is committed to providing innovative technology and first-rate services to customers and partners.

McObject CEO and Co-Founder Steve Graves answered some questions on the programming technique of pipelining vector-based statistical functions. The interview follows.

Q: McObject has been talking about something called “pipelining vector-based statistical functions” to accelerate automated trading and other capital markets software. What is pipelining – a new kind of trading algorithm?

A: It is a programming technique that we exploit for accomplishing the tick data management that underlies algorithms (software programs) for trading. Capital markets produce huge amounts of data. Think of capturing every change in value for all the data points (BID, ASK, VOL, etc.) associated with a security, going back years, and you get an idea of the tick data volumes that exist. Traders race to roll out algorithms that analyze this data in new ways – often by identifying a trading opportunity or risk before other market participants – in order to gain a competitive edge.

The portions of an algorithm that retrieve, sort, and perform statistical operations on tick data are its data management functions. Rather than “re-invent the wheel” by creating these functions from scratch, engineers often turn to a software product called a database management system (DBMS) to provide them. McObject’s eXtremeDB Financial Edition DBMS provides a library of over 100 functions for working with tick data. They’re termed “vector-based” because they work efficiently with large sequences, or vectors, of data. “Pipelining” is a technique that forms these vector-based statistical functions into assembly lines of processing, so that the output of one function becomes the input of the next.

Q: Isn’t that how any software program works – Function A works on the data, then Function B works on the output of Function A, and so on until the program ends?

A: That is true when applications are working with one, or a few, elements of data. In that context it is usually the case that Function A calls Function B, passing the data through the call stack. Pipelining is different in both ways. First, as mentioned previously, vectors of tick data can be extremely large (millions of elements, and more). Second, with pipelining, Function A does not call Function B. Rather, the functions are executed in sequence by the calling program: Function A, then Function B and so on. Again, it helps to think of this as a conveyor belt on an assembly line. Data elements are what is carried on the assembly line. The functions are stations on the assembly line that perform some operation or transformation on the data as it passes through.

Q: Why is this an advantage?

A: To understand the advantage, you have to understand the alternative.  Without the conveyor belt (pipelining), each station (function) would have to retrieve a part from a shelf, work on it, and put the part back on the shelf.  The next station (function) would take the same part off the shelf, work on it, and put the part back on the shelf, and so on for each station.  In programming terms, without pipelining, each function would have to retrieve the entire vector of data from storage, perform its operation on it and store the interim result back to temporary storage to be retrieved by the next function, and so on for each function.

Pipelining optimizes the location where the handoff from function to function takes place. Without pipelining, the temporary storage is usually main memory, though if the vector is really large it could be persistent storage. In this case, memory or persistent storage is the “shelf.” With pipelining, the vector is chunked into manageable pieces we call “tiles,” which are kept in the CPU L1/L2 cache. In this case, tiles in the cache are the equivalent of parts on the conveyor belt.

Where tick data resides within a trading system’s hardware and software greatly affects the speed with which it can be processed. The worst possible place is in traditional storage media (hard disk or solid state drive). Crunching data that is stored there requires that it be fetched with a file I/O from disk, transferred into main memory and then into the DBMS’s cache, and finally transferred into the CPU cache of the chip that actually does the number crunching. In contrast, CPU cache is the best possible place to have tick data when the system needs to process it. When tick data is in CPU cache, it is literally at the doorstep of the system’s CPU cores.
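To make the conveyor-belt idea concrete, here is a minimal Python sketch of tile-by-tile pipelining. It is not eXtremeDB code: the function names (`tiles`, `scale`, `shift`, `total`) and the tile size are hypothetical, and Python gives no control over what actually stays in CPU cache. The sketch only shows the shape of the technique: the calling program chains the stages, and each small tile passes through every stage before the next tile is touched, so no full-length intermediate vector is ever stored.

```python
from typing import Iterable, Iterator, List

# Hypothetical tile length; a real engine would size tiles to fit the CPU cache.
TILE_SIZE = 4


def tiles(vector: List[float], size: int = TILE_SIZE) -> Iterator[List[float]]:
    """Split a long vector into fixed-size tiles (the parts on the conveyor belt)."""
    for start in range(0, len(vector), size):
        yield vector[start:start + size]


def scale(stream: Iterable[List[float]], factor: float) -> Iterator[List[float]]:
    """Station A: multiply every element of each tile by `factor`."""
    for tile in stream:
        yield [x * factor for x in tile]


def shift(stream: Iterable[List[float]], offset: float) -> Iterator[List[float]]:
    """Station B: add `offset` to every element of each tile."""
    for tile in stream:
        yield [x + offset for x in tile]


def total(stream: Iterable[List[float]]) -> float:
    """Final station: reduce all tiles to a single sum."""
    return sum(sum(tile) for tile in stream)


def unpipelined(vector: List[float]) -> float:
    """The 'shelf' approach: each step stores a full-length intermediate vector."""
    scaled = [x * 2.0 for x in vector]    # full intermediate result
    shifted = [x + 1.0 for x in scaled]   # second full intermediate result
    return sum(shifted)


prices = [float(i) for i in range(10)]

# The calling program wires the stations together; the stages do not call each other.
result = total(shift(scale(tiles(prices), 2.0), 1.0))
```

Both approaches compute the same value; the difference is that the pipelined version only ever holds one tile per stage in flight.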

Q: Once the tick data is in CPU cache, does pipelining continue to confer any advantage?

A: Yes – pipelining keeps tick data in CPU cache during the often multi-step process of “transforming” it into a needed result. The typical algorithm for trading or risk management uses more than one statistical function. In the absence of pipelining, tick data is fetched into CPU cache to be processed by function A, then the results are transferred into a temporary table residing in main memory; next, these results are fetched back into CPU cache, where function B does its work, and the new intermediate results go back into main memory until they’re needed by function C, and so on. These “round trips” between CPU cache and main memory impose latency. The pipelining approach eliminates them by keeping even the intermediate results of processing in CPU cache until the data is fully transformed. If you’re interested in a visual depiction of this, we’ve prepared a video using a real-life example of how pipelining is used in calculating the crossover points for 5-day and 21-day moving average closing prices.
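The moving-average crossover calculation mentioned above can be sketched in plain Python as a chain of streaming stages. This is an illustration under assumptions, not McObject's implementation: `moving_average` and `crossovers` are hypothetical helper names, and the sample data uses short 2-day and 4-day windows so the result is easy to check by hand (the defaults are the 5-day and 21-day windows from the example).

```python
from collections import deque
from itertools import tee
from typing import Iterable, Iterator, List, Optional


def moving_average(values: Iterable[float], window: int) -> Iterator[Optional[float]]:
    """Yield the trailing mean once `window` values have been seen, else None."""
    buf = deque(maxlen=window)  # keeps only the last `window` values
    for v in values:
        buf.append(v)
        yield sum(buf) / window if len(buf) == window else None


def crossovers(closes: Iterable[float],
               short_window: int = 5,
               long_window: int = 21) -> List[int]:
    """Return the indices at which the short average crosses the long average."""
    a, b = tee(closes)  # feed the same stream of closes to both averages
    short_ma = moving_average(a, short_window)
    long_ma = moving_average(b, long_window)
    points = []
    prev_sign = 0
    for i, (s, l) in enumerate(zip(short_ma, long_ma)):
        if s is None or l is None:
            continue  # not enough history yet for both averages
        sign = (s > l) - (s < l)  # +1 short above long, -1 below, 0 equal
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                points.append(i)
            prev_sign = sign
    return points


# Hypothetical closing prices: a rise followed by a fall.
closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]
points = crossovers(closes, short_window=2, long_window=4)  # -> [6]
```

Here the short average falls below the long average at index 6, which is the single crossover reported. Each close flows through both averages and the comparison step without either full moving-average series being stored.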

Q: In-memory database systems (IMDSs) have received a lot of attention recently. Is this faster?

A: Yes. McObject knows something about this, because the IMDS edition of eXtremeDB, introduced in 2001, was a pioneering in-memory database system, and is considered one of the fastest available. IMDSs introduced a streamlined design that stores all data in main memory, eliminating both the need to fetch data from disk or DBMS cache, and the latency entailed in managing that DBMS cache. This delivers performance an order of magnitude faster than traditional “on-disk” DBMSs.

But the capital markets sector demands a faster approach. Frankly speaking, the performance requirements of corporate applications – even real-time business analytics – cannot compare to demands of algorithmic trading, in which profitability can hinge on cutting milliseconds or even microseconds of latency. Pipelining vector-based statistical functions moves the locus of performance optimization away from main memory and onto the CPU itself. Think of this as on-chip database processing.

Q: How significant are the advantages of keeping more relevant data within the CPU cache, and avoiding latency-inducing “round trips” between CPU cache and main memory, when working with market data?

A: The Securities Technology Analysis Center’s STAC-M3 benchmark suite serves as the gold standard by which DBMSs can demonstrate tick data management performance. The suite’s tests are designed by trading companies to reflect real-world trading demands, and STAC audits vendors’ STAC-M3 implementations and results. McObject’s recent STAC-M3 implementation using pipelining set new speed records for 9 of the STAC-M3’s 17 tests, and completed all 17 tests in 62% of the time of the previous record. Not only that, but these results were delivered using the SQL database programming language. All previous STAC-M3 implementations relied on low-level programming languages – generally considered faster – to interact with the DBMS. (McObject’s goal in using SQL was to show that with pipelining, traders can leverage widely-used and highly productive SQL while remaining competitive in terms of performance.)

McObject could provide additional evidence of pipelining’s speed advantages, from our internal benchmarking. However, the audited, published STAC-M3 results stand as the most impartial testament to these benefits.


Q: What else can a DBMS product such as eXtremeDB Financial Edition do to enhance trading algorithm performance?

A: In optimizing the location of relevant tick data – keeping it near the CPU cores – pipelining works hand-in-glove with eXtremeDB Financial Edition’s column-oriented data handling capability. Database systems organize data into tables consisting of rows and columns; traditional DBMSs are row-oriented, meaning that they fetch data into CPU cache for processing in a row-wise fashion. This approach fits poorly with tick data; vectors are naturally columnar. For example, all BIDs for a security over time typically occupy a single column in a table. Column-oriented handling enables the DBMS to bring just the column of BIDs into CPU cache, instead of entire rows in which BID may be the only item of interest. In this way, column-oriented data handling maximizes the proportion of relevant data in a given fetch, and avoids flooding the CPU cache with irrelevant data.
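The difference between row-oriented and column-oriented layouts can be illustrated with a small Python sketch. The field names and values below are made up, and Python's `array` module stands in for a column store only in the sense that each column is one contiguous block of same-typed values. Actual cache behavior depends on the DBMS and hardware.

```python
from array import array

# Row-oriented: each record holds every field for one tick
# (timestamp, BID, ASK, VOL). Values are hypothetical.
rows = [
    (1, 100.10, 100.12, 500),
    (2, 100.11, 100.13, 200),
    (3, 100.15, 100.16, 300),
    (4, 100.09, 100.14, 100),
]

# Column-oriented: one contiguous, typed array per field.
columns = {
    "ts": array("q", (r[0] for r in rows)),
    "bid": array("d", (r[1] for r in rows)),
    "ask": array("d", (r[2] for r in rows)),
    "vol": array("q", (r[3] for r in rows)),
}

# Row-wise scan: every record is visited even though only BID is of interest.
row_max_bid = max(r[1] for r in rows)

# Column-wise scan: only the BID array is read.
col_max_bid = max(columns["bid"])

# Bytes occupied by the BID column alone (4 doubles of 8 bytes each).
bid_bytes = columns["bid"].itemsize * len(columns["bid"])
```

Both scans return the same maximum BID, but the column-wise scan reads a single compact array rather than stepping through records that also carry timestamp, ASK and VOL.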

We’ve already touched on the question of in-memory vs. on-disk storage. Tick data has to be somewhere when it’s not in CPU cache, and main memory is the next best location. A DBMS should provide the capability to act as an IMDS, so that in-memory storage is an option. With eXtremeDB Financial Edition, we offer both in-memory and on-disk (that is, traditional disk + DBMS cache) storage. System designers can specify main memory storage for streaming, real-time tick data, while keeping less frequently accessed historical data on persistent media (disk, SSD or memory-tier flash storage).

Distributed query processing is another must-have DBMS capability when your goal is to minimize latency in tick data management. With this feature, a database is partitioned into multiple chunks or “shards”, with a database server associated with each shard. Query processing is distributed across the database servers, leveraging multiple compute nodes, CPUs and/or CPU cores. Performance is accelerated via parallel execution of database operations and by harnessing the capabilities of many CPUs/CPU cores, on one or more compute nodes. In the STAC-M3 benchmark with eXtremeDB Financial Edition mentioned above, the database was partitioned into 72 shards on a storage array, and distributed query processing was used with a single 24-core compute node. In another STAC-M3 implementation, the database was partitioned into 64 shards and distributed query processing was used with four 24-core compute nodes. As you see, shards, compute nodes, and CPU cores can be combined and calibrated for maximum performance and minimum latency.
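The shard-and-merge pattern behind distributed query processing can be sketched as follows. This is a toy model, not eXtremeDB's mechanism: worker threads in one Python process stand in for per-shard database servers, the round-robin `partition` function and the choice of a volume-weighted average price (VWAP) query are assumptions made for illustration, and prices are integers (cents) so the arithmetic is exact.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Tick = Tuple[int, int]  # (price in cents, volume); hypothetical record layout


def partition(ticks: List[Tick], n_shards: int) -> List[List[Tick]]:
    """Assign ticks to shards round-robin (a real system might shard by symbol)."""
    shards: List[List[Tick]] = [[] for _ in range(n_shards)]
    for i, tick in enumerate(ticks):
        shards[i % n_shards].append(tick)
    return shards


def partial_vwap(shard: List[Tick]) -> Tuple[int, int]:
    """Per-shard work: return (sum of price * volume, sum of volume)."""
    notional = sum(price * volume for price, volume in shard)
    volume = sum(volume for _, volume in shard)
    return notional, volume


def distributed_vwap(ticks: List[Tick], n_shards: int = 4) -> float:
    """Run the partial aggregate on every shard in parallel, then merge."""
    shards = partition(ticks, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(partial_vwap, shards))
    notional = sum(n for n, _ in partials)
    volume = sum(v for _, v in partials)
    return notional / volume


ticks = [(10010, 100), (10020, 300), (10000, 200), (10030, 400)]
vwap = distributed_vwap(ticks, n_shards=3)
```

The key design point is that each shard returns a small partial result (two numbers here) rather than its raw data, so the merge step stays cheap no matter how many shards or cores are used.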

In the big picture, trading organizations need a DBMS that minimizes latency while also delivering the flexibility and reliability needed for data management. Your DBMS may be fast enough, but does it support SQL and Python, the use of which can shorten time-to-deployment while enlarging the pool of available developer talent? Does it enforce ACID (Atomic, Consistent, Isolated and Durable) transactions, which protect data integrity? It may seem strange for a vendor focused on high performance to say “performance isn’t everything” – but if you can get the necessary speed along with these other needed features and characteristics, you are that much ahead of other market participants who may not have them in their chosen DBMS.

For more information on McObject, visit http://financial.mcobject.com.


McObject is a sponsor of the upcoming conference and exhibition Trading Show Chicago 2015, taking place June 3 & 4, 2015 at Navy Pier in Chicago, IL. It is the premier strategic conference for big data in finance, quant, automated trading, exchange technology, and derivatives.

Representatives from McObject will be at booth #206 in the Exhibition Hall at Trading Show Chicago.
