Q for Quants

November 10, 2017

Is Q for Quants?

A Quant vs. Quant trading
Pricing vs. Backtesting
Compiled vs. Interpreted
Specialist vs. Generalist

Where is Q?

KDB+ is fast and flexible time series database/CEP engine

Order book recordings
Order/Execution recordings
Surveillance engines
Hedge fund historical databases
Bitcoin Mercantile Exchange

History of Q

APL -> A -> A+

life←{                                   ⍝ John Conway's "Game of Life".
    ↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵  ⍝ Expression for next generation.
}

J -> K -> K-SQL -> Q (KDB+)

life:{3=a-x*4=a:2 sum/2(1 0 -1 rotate'\:)/x}

kdb+tic
kdb+taq
FD vs KX
FD + KX

Why is Q popular?

KDB+ is one of the fastest (if not the fastest) columnar database.
Computations are vectorized - thus reducing the interpreter overhead.
The functional paradigm of separating data and algorithms allows parallel execution.
There are distinct types for dates, times and datetimes, GUID
Databases are just directories, tables are sub-directories, columns are files.
Even integer and temporal types have null and infinite values.

知之为知之，不知为不知，是知也 – 孔子

To know what you know and what you do not know, that is true knowledge – Confucius

To know that we know what we know, and to know that we do not know what we do not know, that is true knowledge – Nicolaus Copernicus

My Experience with Q

Java -> Q backtest -> Q trading (2006-2009)
Q automated market making engine (2009-2012)
Q Tips (2012-2015)
Q CRB platform (2015-present)

Learning Outcomes

Students will be able to load CSV/fixed-width datasets into Q
Students will be able to list/differentiate each Q datatype
Students will know the four Q attributes and explain when to use each
Students will be able to perform simple q-SQL queries with aggregation
Students will be able to save datasets to disk

In summary, given a large dataset, students will be able to load, join, analyze and report their results.

I’ve found the best way to learn a new computer language is to learn by example. Please download the code and follow along.

Sample Financial Data

CME Datamine provides samples of their datasets

Click on “Access CME DataMine”
Click on “Find Time & Sales”
Click on “Sample Files”

Installing Q

Running Q

After downloading q and following the installation instructions:

mac

export QHOME=~/Downloads/q
cd  ~/Downloads/q4q-master
rlwrap ~/Downloads/q/m32/q bbo.q

rlwrap should be installed to enable command recall

windows

setx QHOME "C:\q"
cd %USERPROFILE%\AppData\Local\Temp\q4q-master\q4q-master
C:\q\w32\q bbo.q

downloading and unzipping on windows seems to repeat the last directory

What is Q?

Q reads right to left

q)prd 0N!1+x:til 10
0 1 2 3 4 5 6 7 8 9
3628800

variables can be assigned inline
0N! can be used to print intermediate values without side effects

Q computes on atoms and vectors

q)y*y:last x
81
q)x*x
0 1 4 9 16 25 36 49 64 81

Q is overloaded on type

q)5?10f  / generate 5 random floats between (0,10f)
7.291734 8.583736 4.149568 6.787018 9.346245
q)5?10 / generate 5 random long integers between  [0,10)
7 5 1 4 1

and sign

q)5?10     / with replacement
3 9 8 9 9
q)-5?10    / without replacement
7 6 3 4 8
q)0N?10
6 1 5 8 9 3 2 0 4 7

Q is vector functional

q)fact:{prd 1+til x} / x,y,z are automatically defined
q)fact 10
3628800

Q is concise

\begin{equation} \sigma = \sqrt{ \frac{1}{n} \sum \textstyle\frac{1}{2}\displaystyle \left( \log \frac{H_i}{L_i} \right)^2 - (2\log 2-1) \left( \log \frac{C_i}{O_i} \right)^2 } \end{equation}
```
/ garman klass volatility (% is division - not mod)
gk:{[o;h;l;c]sqrt avg (.5*x*x:log h%l)-(-1f+2f*log 2f)*x*x:log c%o}
```

Q is picky about white space and semicolons

/ pivot table
pivot:{[t]
 u:`\$string asc distinct last f:flip key t;
 pf:{x#(`\$string y)!z};
 p:?[t;();g!g:-1_ k;(pf;`u;last k:key f;last key flip value t)];
 p}

Q has tables as primary data structures ³

19.10.167:6000>tqmergeT[2017.07.31;`GEH0;-0Wn;0Wn]
date       sym  seq  t                     bsiz bid    siz prc    aggr ask    asiz 
-----------------------------------------------------------------------------------
07.31 GEH0 31   -0D07:10:30.515273472 79   9799.5                 9800   34   
07.31 GEH0 32   -0D07:01:32.062721024 37   9799.5                 9800   34   
07.31 GEH0 36   -0D06:59:59.882012928 37   9799.5                 9800.5 45   
07.31 GEH0 43   -0D06:59:59.834411776 313  9799.5                 9800.5 45   
07.31 GEH0 45   -0D06:59:59.834379776 313  9799.5                 9800   47   
07.31 GEH0 47   -0D06:59:59.836360406             2   9800   B                
07.31 GEH0 49   -0D06:59:59.817472512 313  9799.5                 9800   45   
07.31 GEH0 50   -0D06:59:59.788884224 314  9799.5                 9800   45   
07.31 GEH0 51   -0D06:59:59.785928192 319  9799.5                 9800   45   
07.31 GEH0 52   -0D06:59:59.784531200 287  9799.5                 9800   45   
07.31 GEH0 54   -0D06:59:59.783757568 240  9799.5                 9800   45   
07.31 GEH0 59   -0D06:59:59.756612864 240  9799.5                 9800   46   
..

Downloading Time and Sales Data

Loading ts.q automatically attempts to download time and sales data (stored in CSV files) for each of the available futures datasets:

$ q
KDB+ 3.5 2017.03.15 Copyright (C) 1993-2017 Kx Systems
m32/ 4()core 8192MB nick nicks-macbookpro.local 25.34.72.89 NONEXPIRE

q)\l ts.q
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-corn-f..
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-crude-..
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-e-mini..
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-euro-f..
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-eurodo..
"http://www.cmegroup.com/market-data/datamine-historical-data/files/2012-11-05-gold-f..

Dictionaries

ts.q has also defined a dictionary mapping a symbol to the longer CSV file name strings.

q)d
corn      | "2012-11-05-corn-futures.csv"
crude     | "2012-11-05-crude-oil-futures.csv"
emini     | "2012-11-05-e-mini-s-p-futures.csv"
eurusd    | "2012-11-05-euro-fx-futures.csv"
eurodollar| "2012-11-05-eurodollar-futures.csv"
gold      | "2012-11-05-gold-futures.csv"

Q dictionaries are associative lists that are declared with the ! operator, and indexed with brackets or juxtaposition.

q)1 2 3!"abc"
1| a
2| b
3| c
q)d[`corn]
"2012-11-05-corn-futures.csv"
q)d`corn
"2012-11-05-corn-futures.csv"

Loading CSV files

q)read0 `\$d`corn
"T.Date,T.Time,Sequence,Session Ind,Symbol,C/P/F,Contract Delivery,Volume,Strike Price,T.Price,A/..
"20121105,16:00:12,3,E,C,F,1212,0,.0000000,738.6000000,,I,,,,,,,,,,20121104,CBT"
"20121105,16:02:45,14,E,C,F,1212,0,.0000000,739.0000000,,I,,,,,,,,,,20121104,CBT"
"20121105,16:04:17,19,E,C,F,1212,0,.0000000,738.6000000,,I,,,,,,,,,,20121104,CBT"
"20121105,16:04:25,20,E,C,F,1212,0,.0000000,739.0000000,,I,,,,,,,,,,20121104,CBT"
"20121105,16:06:02,29,E,C,F,1212,0,.0000000,738.6000000,,I,,,,,,,,,,20121104,CBT"
"20121105,16:06:54,30,E,C,F,1212,0,.0000000,736.4000000,,I,,,,,,,,,,20121104,CBT"
..

The read0 operator loads the data from the file as a list of strings
The \$ operator casts (or converts) between types. In this case, from a string to a symbol

q)d`corn
"2012-11-05-corn-futures.csv"
q)`\$d`corn
`2012-11-05-corn-futures.csv

q)t:(" VI   MI FCC         D ";1#",") 0: read0 `\$d`emini
q)t:(" VI   MI FCC         D ";1#",") 0: `\$d`emini
q)t
T.Time   Sequence Contract Delivery Volume T.Price A/B IND Entry Date
---------------------------------------------------------------------
16:02:57 11       2012.12           0      1405.75     I   2012.11.04
16:04:23 12       2012.12           0      1405.5      I   2012.11.04
16:22:24 29       2012.12           0      1406.75     I   2012.11.04
16:22:41 30       2012.12           0      1406.5      I   2012.11.04
16:22:50 35       2012.12           0      1405.75     I   2012.11.04
..

The 0: operator is the dyadic version of the monadic read0
The 0: operator can be supplied with a list of strings or the file itself
It allows us to supply the types: " VI MI FCC D "
and the delimiter: "," (1#"," treats the first row as column headers)
Ignored columns are indicated by the space characters

Types

num	char	size	literal	null	inf	name	num	char	size	literal	null	inf	name
1	b	1	0b			boolean	11	s	.	` \| `		symbol
2	g	16		0Ng		guid	12	p	8	dateDtimespan	0Np	0Wp	timestamp
3						unused	13	m	4	2000.01m	0Nm	0Wm	month
4	x	1	0x0			byte	14	d	4	2000.01.01	0Nd	0Wd	date
5	h	2	0h	0Nh		short	15	z	8	dateTtime	0Nz	0Wz	datetime
6	i	4	0i	0Ni	0Wi	int	16	n	8	00:00:00.000000000	0Nn	0Wn	timespan
7	j	8	0	0N	0W	long	17	u	4	00:00	0Nu	0Wu	minute
8	e	4	0e	0Ne	0we	real	18	v	4	00:00:00	0Nv	0Wv	second
9	f	8	0.0/0f	0n	0w	float	19	t	4	00:00:00.000	0Nt	0Wt	time
10	c	1	” “	” “		char	20						enum

Renaming Columns

q)t:`time`seq`expiry`qty`px`side`ind`date xcol t
q)t
time     seq expiry  qty px      side ind date
----------------------------------------------------
16:02:57 11  2012.12 0   1405.75      I   2012.11.04
16:04:23 12  2012.12 0   1405.5       I   2012.11.04
16:22:24 29  2012.12 0   1406.75      I   2012.11.04
16:22:41 30  2012.12 0   1406.5       I   2012.11.04
16:22:50 35  2012.12 0   1405.75      I   2012.11.04
..

The original data had column names with spaces
To use the data with q-SQL, we need to fix this
The xcol operator accepts a symbol list as the new column names

Using q-SQL

q)trade:select `p#expiry,seq,time+date,tp:px,ts:qty from t where null side, null ind
q)trade
expiry  seq  time                          tp      ts
-----------------------------------------------------
2012.12 1977 2012.11.04D17:00:00.000000000 1403    1
2012.12 1978 2012.11.04D17:00:00.000000000 1403    1
2012.12 1979 2012.11.04D17:00:00.000000000 1403    2
2012.12 1982 2012.11.04D17:00:00.000000000 1402.75 2
2012.12 1983 2012.11.04D17:00:00.000000000 1402.5  1
..

q-SQL allows database queries within your code
Many operations, such as time+date, leave the column title unchanged
Optionally, we may rename the column, tp:px
The where clause applies the filters from left to right
The `p# operator is applying the parted attribute to the seq column

Attributes

By default, searches on lists (and therefore dictionaries and table columns) use a linear search across all elements. Attributes can be applied to lists to speed up these searches.

There are four attributes: `s`u`p`g

sorted: asserts the data is sorted and permits q to perform a binary search: 1 1 2 3 3 3 3 4 5 (typically used on the time column of an RDB)
unique: creates an internal hashmap from each value to its array index: 1 10 9 12 -2 (typically used on dictionaries keys or the first column of a 1-column keyed table)
parted: creates an internal hashmap of each value to its first array index and element count: 5 5 5 1 1 2 2 4 4 4 (typically used on the sym column of an HDB)
grouped: creates an internal hashmap of each value to each of its array indices: 1 2 1 10 2 -1 2 1 (typically used on the sym column of an RDB)

Determining the key of a dataset is of vital importance. This column (or columns) should be the first few columns of the table. In this case, the expiry column uniquely identifies each future contract. Analysis of the data revealed that it was not stored in chronological order across all contracts - but only within each contract. Since each contract was stored in consecutive rows of the table, the `p attribute was the correct choice.

Downloading Best Bid and Offer Data

Loading bbo.q automatically downloads (and uncompresses) the BBO data (stored in fixed-width text files).

q)\l bbo.q
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XCBT_C_FUT_110110.zip"
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XNYM_CL_FUT_110110.zip"
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XCME_ES_FUT_110110.zip"
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XCME_EC_FUT_110110.zip"
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XCME_ED_FUT_110110.zip"
"http://www.cmegroup.com/market-data/datamine-historical-data/files/XNYM_GC_FUT_110110.zip"
q)4#read0 `XNYM_GC_FUT_110110.TXT
"2011011013030900000010EGC F110100000       100136851 I          110109"
"2011011017001000000020EGC F110100002       100136971A M        M110109"
"2011011017001000000020EGC F110100001       100136911B M        M110109"
"2011011017001500000030EGC F110100002       100136981A M        M110109"

Configuration Files

Instead of polluting source code with column names, types and descriptions, this information has been stored in bbo.csv which we can easily load.

q)m:("HSHHJC*";1#",") 0: `:bbo.csv
q)m
id name    start end len typ description
-------------------------------------------------------------------------------------------------------
1  date    1     8   8   D   "YYYYMMDD- Day the trade or quote was entered"
2  time    9     14  6   V   "HHMMSS- Time the trade or quote was entered in the system"
3  seq     15    22  8   I   "######## - sequence the quote or trade was entered into the system"
4  session 23    23  1   C   "(R/E) Indicates the Regular (PIT) or Electronic (GLOBEX) trading session"
5  sym     24    26  3   S   "The product code"
..

type (a q keyword) is often replaced with typ, tipe, [a-z]type
sym is the most common name for a security identifiers and has special treatment in Q
session is a single text character, and should therefore be stored as a "C", and not an "S"

Ignoring Columns

We can make loading the file faster by ignore unused columns

This reduces the memory consumed and increases performance by avoiding parsing those columns

q)m:update typ:" " from m where not name in `expiry`seq`time`edate`side`px`pxdl`qty`ind`mq
q)select from m where not null typ
id name   start end len typ description
------------------------------------------------------------------------------------------------
time   9     14  6   V   "HHMMSS- Time the trade or quote was entered in the system"
seq    15    22  8   I   "######## - sequence the quote or trade was entered into the system"
expiry 28    31  4   M   "(YYMM) Indicates the month the contract expires"
qty    32    36  5   I   "Number of contracts available for trade or traded"
px     45    51  7   I   "Indicates actual price traded"
pxdl   52    52  1   H   "Decimal place indicator for traded price"
side   53    53  1   C   "Indicates for Bids (B) / Offers (A)"
ind    54    54  1   C   "Indicative Market Quotes ( I )"
mq     55    55  1   C   "Indicator for Market Quotes ( M )"
edate  65    70  6   D   "YYMMDD - Entry date of trade"

A common data transfer technique is to pass floats as integers with another integer indicating the number of decimals

Parsing Fixed-Width Files

The 0: operator can be passed a list of types and lengths (instead of the field delimiter)

q)m[`typ`len]
  V I       M I     I H C C C                 D
8 6 8 1 3 1 4 5 7 1 7 1 1 1 1 1 2 1 1 1 1 1 1 6

Fixed-with files are returned as a list of lists (because there are no column headers)

q)m[`typ`len] 0: `\$d`emini
13:02:40   16:18:13   16:18:14   16:29:28   16:30:35   16:30:58   16:31:01   16:31:38   16:31:41 ..
10         20         30         40         50         60         70         80         90       ..
2011.03    2011.03    2011.03    2011.03    2011.03    2011.03    2011.03    2011.03    2011.03  ..
0          0          0          0          0          0          0          0          0        ..
126750     126800     126750     126775     126825     126775     126825     126775     126725   ..
2          2          2          2          2          2          2          2          2        ..
                                 B                     B                     B                   ..
I          I          I          I          I          I          I          I          I        ..
                                                                                                 ..
2011.01.09 2011.01.09 2011.01.09 2011.01.09 2011.01.09 2011.01.09 2011.01.09 2011.01.09 2011.01.0..

Flipped Dictionaries of Lists

The exec operator can be used (instead of select) to return a list of values

q)exec name from m where not null typ
`time`seq`expiry`qty`px`pxdl`side`ind`mq`edate

Tables are flipped dictionaries of lists

q)t:flip (exec name from m where not null typ)!m[`typ`len] 0: `\$d`emini
q)t
time     seq expiry  qty px     pxdl side ind mq edate
-----------------------------------------------------------
13:02:40 10  2011.03 0   126750 2         I      2011.01.09
16:18:13 20  2011.03 0   126800 2         I      2011.01.09
16:18:14 30  2011.03 0   126750 2         I      2011.01.09
16:29:28 40  2011.03 0   126775 2    B    I      2011.01.09
16:30:35 50  2011.03 0   126825 2         I      2011.01.09
16:30:58 60  2011.03 0   126775 2    B    I      2011.01.09
16:31:01 70  2011.03 0   126825 2         I      2011.01.09
..

Updating Columns

The update operator adds new columns or modifies columns in-place
The xexp operator is a two-parameter variant of exp which allows the specification of the base
Multiplication (by .01 xexp pdxl) is faster than division (by 10 xexp pdxl)

q)t:update time+edate,px*.01 xexp pxdl from t
q)t
time                          seq expiry  qty px      pxdl side ind mq edate
---------------------------------------------------------------------------------
2011.01.09D13:02:40.000000000 10  2011.03 0   12.675  2         I      2011.01.09
2011.01.09D16:18:13.000000000 20  2011.03 0   12.68   2         I      2011.01.09
2011.01.09D16:18:14.000000000 30  2011.03 0   12.675  2         I      2011.01.09
2011.01.09D16:29:28.000000000 40  2011.03 0   12.6775 2    B    I      2011.01.09
2011.01.09D16:30:35.000000000 50  2011.03 0   12.6825 2         I      2011.01.09
..

Building a Trade Table

The side column distinguishes between Bid ("B"), Ask ("A") and Trades (" ")
The ind column distinguishes between Indicative ("I") and Firm (" ") prices
A trade table can be built from the non-indicative trade records

q)trade:select `p#expiry,seq,time,tp:px,ts:qty from t where null side, null ind
q)trade
expiry  seq time                          tp      ts
-----------------------------------------------------
2011.03 230 2011.01.09D17:00:00.000000000 12.6625 752
2011.03 250 2011.01.09D17:00:00.000000000 12.6625 6
2011.03 270 2011.01.09D17:00:00.000000000 12.665  5
2011.03 290 2011.01.09D17:00:00.000000000 12.665  6
2011.03 310 2011.01.09D17:00:00.000000000 12.665  1
..

Building a Rack

The mq column indicates a valid market quote
The time column only has resolution down to the second
The seq column was provided to pair associated Bid and Ask prices
Not every Bid record has an associated Ask record and vice versa
We first create a ‘rack’ which includes every distinct expiry and seq along with the associated time.

q)quote:select distinct expiry,seq,time from t where not null mq, not null side
q)quote
expiry  seq time
-----------------------------------------
2011.03 240 2011.01.09D17:00:00.000000000
2011.03 260 2011.01.09D17:00:00.000000000
2011.03 280 2011.01.09D17:00:00.000000000
2011.03 300 2011.01.09D17:00:00.000000000
2011.03 320 2011.01.09D17:00:00.000000000
..

Keyed Tables

When given a numeric left operand, the ! operator creates a keyed table with that many columns
As hinted by the use of the ! operator, a keyed table is actually a dictionary

q)2!select `p#expiry,seq,bs:qty,bp:px from t where side="B"
expiry  seq| bs  bp
-----------| -----------
2011.03 40 | 0   12.6775
2011.03 60 | 0   12.6775
2011.03 80 | 0   12.6775
2011.03 100| 0   12.6775
2011.03 240| 100 12.6625
..

Left Joins

The lj (left-join) operator joins two tables based on the key of the right operand
Any attribute on the first keyed column is used to improve performance

q)quote:quote lj 2!select `p#expiry,seq,bs:qty,bp:px from t where side="B"
q)quote:quote lj 2!select `p#expiry,seq,ap:px,as:qty from t where side="A"
q)quote
expiry  seq time                          bs  bp      ap     as
---------------------------------------------------------------
2011.03 240 2011.01.09D17:00:00.000000000 100 12.6625 12.665 73
2011.03 260 2011.01.09D17:00:00.000000000 94  12.6625 12.665 73
2011.03 280 2011.01.09D17:00:00.000000000 94  12.6625 12.665 68
2011.03 300 2011.01.09D17:00:00.000000000 94  12.6625 12.665 62
2011.03 320 2011.01.09D17:00:00.000000000 94  12.6625 12.665 61
..

Asof Join

The aj operator joins the most recent record of the right operand
The last specified join key is treated as the ‘time’ column and must be in ascending order (within the other specified join keys)
This is usually the time column, but is seq in this dataset
On large tables, the lack of an attribute on the first join key will result in extremely slow joins
aj is a 3 parameter function. For clarity, i have created a projection by supplying the first two parameters, then using that as a monadic function on the final table

q)taq:aj[`expiry`seq;trade] select `p#expiry,seq,bs,bp,ap,as from quote
q)taq
expiry  seq time                          tp      ts  bs  bp      ap     as
---------------------------------------------------------------------------
2011.03 230 2011.01.09D17:00:00.000000000 12.6625 752
2011.03 250 2011.01.09D17:00:00.000000000 12.6625 6   100 12.6625 12.665 73
2011.03 270 2011.01.09D17:00:00.000000000 12.665  5   94  12.6625 12.665 73
2011.03 290 2011.01.09D17:00:00.000000000 12.665  6   94  12.6625 12.665 68
2011.03 310 2011.01.09D17:00:00.000000000 12.665  1   94  12.6625 12.665 62
..

Aggregations

The xbar operator rounds data down to the nearest specified unit
The by q-SQL clause groups the data before operations are performed on each group

q)ohlc:select o:first tp,h:max tp,l:min tp,c:last tp by expiry,0D00:01 xbar time from taq
q)ohlc
expiry  time                         | o       h       l       c
-------------------------------------| -------------------------------
2011.03 2011.01.09D17:00:00.000000000| 12.6625 12.6675 12.6525 12.6575
2011.03 2011.01.09D17:01:00.000000000| 12.6575 12.6575 12.6475 12.65
2011.03 2011.01.09D17:02:00.000000000| 12.65   12.6525 12.6475 12.65
2011.03 2011.01.09D17:03:00.000000000| 12.65   12.65   12.645  12.6475
2011.03 2011.01.09D17:04:00.000000000| 12.65   12.6525 12.645  12.65
..

Pivoting

The .q4q.pivot function creates a grid of data based on the last column of the supplied keyed table
The new column headers are taken from the last keyed column

q).q4q.pivot select vol:sqrt[252*24*60]*.q4q.gk[o;h;l;c] by 0D02 xbar time,expiry from ohlc
time                         | 2011.03    2011.06    2011.09
-----------------------------| -------------------------------
01.09D16:00:00.000000000| 0.1053295  0.04650148
01.09D18:00:00.000000000| 0.08071778 0
01.09D20:00:00.000000000| 0.07622256 0
01.09D22:00:00.000000000| 0.04644545 0
01.10D00:00:00.000000000| 0.07905985 0
01.10D02:00:00.000000000| 0.1330154  0
01.10D04:00:00.000000000| 0.1124014  0
01.10D06:00:00.000000000| 0.1224625  0
01.10D08:00:00.000000000| 0.2600369  0.05781686 0.3431435
01.10D10:00:00.000000000| 0.2180361  0.03049376
01.10D12:00:00.000000000| 0.1405572  0.02547496
01.10D14:00:00.000000000| 0.1533374  0

Saving/Loading KDB+ Data

The set operator saves KDB+ variables as a single binary file
```
q)`:trade set trade
`:trade
```

The get operator loads a single binary KDB+ file into memory

q)get `:trade
expiry  seq time                          tp      ts
-----------------------------------------------------
2011.03 230 2011.01.09D17:00:00.000000000 12.6625 752
2011.03 250 2011.01.09D17:00:00.000000000 12.6625 6
2011.03 270 2011.01.09D17:00:00.000000000 12.665  5
2011.03 290 2011.01.09D17:00:00.000000000 12.665  6
2011.03 310 2011.01.09D17:00:00.000000000 12.665  1
..

Saving CSV Data

With a character left operand, the 0: operator converts a table into a delimited string

q)"," 0: trade
"expiry,seq,time,tp,ts"
"2011-03,230,2011-01-09D17:00:00.000000000,12.6625,752"
"2011-03,250,2011-01-09D17:00:00.000000000,12.6625,6"
"2011-03,270,2011-01-09D17:00:00.000000000,12.665,5"
"2011-03,290,2011-01-09D17:00:00.000000000,12.665,6"
"2011-03,310,2011-01-09D17:00:00.000000000,12.665,1"
"2011-03,330,2011-01-09D17:00:00.000000000,12.665,1"
..

With a symbol left operand, the 0: operator saves a list of strings as text in a file
```
q)`:trade.csv 0: "," 0: trade
`:trade.csv
```

Homework

Teams of 4/5 will submit a single .q file - that I will execute - which contains the following:

A block comment containing the banner printed when you run q. (20 pts)

/
KDB+ 3.5 2017.03.15 Copyright (C) 1993-2017 Kx Systems
m32/ 4()core 8192MB nick nicks-macbookpro.local 25.34.72.89 NONEXPIRE
\

A comment specifying the URL of a CSV or fixed-width dataset that is freely available on the web and has more than 10,000 rows. (20 pts)
Code using the provided .q4q.download function to download the dataset from within q. (5 pts)
Code to import the dataset into KDB+ using the 0: operator - making sure to load each column as the most appropriate data type. (10 pts)
A select statement to reshape the dataset - ensuring the identifying columns are in the front. (5 pts)
An update statement to apply an appropriate attribute (`s,~`u~,~`p~ or `g) to the first column and a comment explaining why you chose the attribute. (10 pts)
A query to computing an aggregating statistic using the by operator. (15 pts)
A statement generating a report by using the .q4q.pivot function on the previous query. (5 pts)
Code to save the report as a CSV to disk. (5 pts)
A block comment summarizing your findings. (5 pts)

windows users should install the proper version (32 vs 64) for their operating system - installed in “C:\Program Files\7-zip” ↩
If you decide to use this editor - you can connect to your q process by starting it with a port q ts.q -p 5000 ↩ ↩²
CME data from Market Microstructure and Algorithmic Trading taught by Robert Almgren and Duane Seppi ↩

Share on

Twitter Facebook LinkedIn

Nick Psaris