High Performance Complex Event Processing

June 20, 2012

Complex Event Processing

Complex Event Processing (CEP):: consists of processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.¹

Example Applications:
- Airbag deployment
- Credit card fraud detection
- Buy/sell signals
- Risk management
Example Financial Events:
- New product added
- Risk limit breached
- Order filled
- Fed announcement
- Company news reported

Sample Decision Flow

High Performance Complex Event Processor

Why High Performance

Software cost
Hardware costs
Support costs
Opportunity costs

CEP = KDB+

Complex Event Processor

Where to Focus

Q is handicapped
- Every operation checks the type of each operand
- Arithmetic operations must check for null and +-infinity
- Types are converted at runtime
Q is optimized
- Vector operations use cpu specific instruction sets
- Single threaded so no mutex needed
- High level routines are highly optimized
Maximize the time your code spends executing algorithms in C instead of Q
Minimize the amount of memory allocated and avoid removing elements from data structures
Prefer high level data-structures and native operators to atoms and hand-rolled functions

Use Native Left Join

lj:{
 $[`s=-2!y;aj[!+!y;x;0!y]; <1>
   .Q.ft[{ <2>
     $[&/j:(#y:. y)>i?:(!+i:!y)#x; <3>
      .Q.fl[x]y i; <4>
      +.[+x;(f;j);:; <6>
       .+.Q.fl[((f:!+y)#x:.Q.ff[x]y)j] y i j:&j] <5>
       ]}[;y]]x]}

check for sorted attribute and call aj
.Q.ft unkeys tables applies function and rekeys
check for case where all keys are found
just fill data with .Q.fl
.Q.ff adds all new columns with null values
fills only those rows with new data

Lightweight CEP Event

No key
- Every q-sql update performed unkeys the table, updates the table, then rekeys the table.
Not sorted
- We can ask for what we want
No filling
- Nulls are awesome
- Wait until business decision is needed
- Null trade price
- Null bid size vs no bid size

Setup

q)n:10000
q)sym:([]`u#sym:neg[n]?`4)
q)quote:sym!([]bs:n?100;bp:.1*n?100;ap:.1*n?100;as:n?100)
q)trade:sym!([]ts:n?100;tp:.1*n?100)
q)quote
sym | bs bp  ap  as
----| -------------
jkom| 75 4.8 3.9 23
deco| 51 2.1 7.3 6 
knoe| 63 2.4 1.6 19
ppbc| 64 1.6 7.5 5 
..
q)r:rand key quote
q)t:enlist r

Left Join

q)t lj quote
sym  bs bp  ap  as
------------------
lgda 85 3.2 9.2 42
q)r lj quote
sym| `lgda
bs | 85
bp | 3.2
ap | 9.2
as | 42

q)\ts do[1000;t lj quote]
25 1232j

q)\ts do[1000;r lj quote]
23 1040j

Faster Left Join

k).q.flj:{x,'y (!+!y)#x} / fast left join
k).q.dlj:{x, y (!+!y)#x} / dict left join

q)\ts do[1000;t lj quote]
25 1232j
q)\ts do[1000;t flj quote]
9 704j

q)\ts do[1000;r lj quote]
23 1040j
q)\ts do[1000;r dlj quote]
9 624j

Native Left Join

q)\ts do[1000;t lj quote]
25 1232j
q)\ts do[1000;t flj quote]
9 704j
q)\ts do[1000;t ,\: quote]
7 704j

q)\ts do[1000;r lj quote]
23 1040j
q)\ts do[1000;r dlj quote]
9 624j
q)\ts do[1000;r , quote]
9 624j

Use Views

Derived data requires computations
Three choices:
1. Update computation on every tick - performance degrades exactly when we need it
2. Recompute every time we need the data - can result in computations even when nothing has changed
3. Recompute on a timer - can produce stale results

Views Are The Solution

Only performs computations when the data has changed AND we need the result of the computation
Lazy evaluation/caching
Not memoization

Example

q)micro::update mp:(as;bs) wavg (bp;ap) from quote
q)micro
sym | bs bp  ap  as mp       
----| -----------------------
jkom| 75 4.8 3.9 23 4.644604 
deco| 51 2.1 7.3 6  3.448148 
knoe| 63 2.4 1.6 19 2.31028  
ppbc| 64 1.6 7.5 5  3.030303 
ikel| 54 0.2 8.4 12 0.3344262
..

Quote update only updates quote table
Recalc only gets called if quote table has changed
Calculations are vectorized
Depends on the frequency of read vs write

Use Foreign Keys

CEP events inevitably grow to dozens of fields

The Problem:

Linear search is slow - unless add `u#
Fields at end take longest time to find
Dict keeps re-allocating memory

The Solution:

Create a view with foreign keys to all reference tables
Native join with view on CEP event
Use foreign key lookups (.) instead of lj
Create functions for calculations with many throw away steps

Example Use of Foreign Keys

q)fkey::1!update sym,q:`quote$sym,t:`trade$sym from sym
q)r:r,fkey
q)t:t,\:fkey
q)select sym,q.as,q.bs,t.ts from t
sym  as bs ts
-------------
life 85 80 62
q)select sym,q.as,q.bs,t.ts from r
sym| `life
as | 85
bs | 80
ts | 62
q)select sym,q.as,q.bs,t.ts from t
sym  as bs ts
-------------
life 85 80 62

Use Native Column Select

q)1!select sym,bp,ap from quote
sym | bp  ap 
----| -------
jkom| 4.8 3.9
deco| 2.1 7.3
knoe| 2.4 1.6
ppbc| 1.6 7.5
ikel| 0.2 8.4
..

Can not select with key intact
We must rekey

Native Column Select

q)`bp`ap#/:quote
sym | bp  ap 
----| -------
jkom| 4.8 3.9
deco| 2.1 7.3
knoe| 2.4 1.6
ppbc| 1.6 7.5
ikel| 0.2 8.4
..

q)\ts do[1000;1!select sym,bp,ap from quote]
6 1184j
q)\ts do[1000;`bp`ap#/:quote]
3 704j

Summary

Writing Q with the familiar code adds many behind the scenes operations:

upd:{[t]
 t:t lj 1!select sym,bs,bp,ap,as from quote;
 t:t lj 1!select sym,ts,tp from trade;
 t:update mp:(as;bs;ts) wavg (bp;ap;tp) from t;
 t}

Unkeying then rekeying
‘friendly’ filling²
Asking for the obvious (key)
O(n) field/column lookups

Putting It All Together

nupd:{[t]
 t:t,\:fkey;
 t:update mp:(q.as;q.bs;t.ts) wavg (q.bp;q.ap;t.tp) from t;
 t}
dupd:{[r]
 r,:fkey;
 r:update mp:(q.as;q.bs;t.ts) wavg (q.bp;q.ap;t.tp) from r;
 r}

q)\ts do[1000;upd t]
65 1792j
q)\ts do[1000;nupd t]
21 912j
q)\ts do[1000;dupd r]
17 624j

nil ↩
KDB+ 3.0 has redefined lj to remove filling: lj:{.Q.ft[,\\:[;y]][x]} ↩

Share on

Twitter Facebook LinkedIn

Nick Psaris