MIT License
A toolkit to process NASDAQ TotalView-ITCH data for academic research.
Nasdaq TotalView-ITCH ("TotalView") is a data feed used by professional traders to maintain a real-time view of market conditions. TotalView disseminates all quote and order activity for securities traded on the Nasdaq exchange (several billion messages per day), allowing users to reconstruct the limit order book for any security up to arbitrary depth with nanosecond precision. It is a unique data source for financial economists and engineers examining topics such as information flows through lit exchanges, optimal trading strategies, and the development of macro-level indicators from micro-level signals (e.g., a market turbulence warning).
While TotalView data is provided at no charge to academic researchers via the Historical TotalView-ITCH offering, the historical data offering uses a binary file specification that poses challenges for researchers. TotalViewITCH.jl is a pure Julia package developed to efficiently process historical data files for academic research purposes. The package consists of: (1) a core module to parse Historical TotalView binary file format messages (i.e., deserialization), (2) a module to reconstruct limit order books from parsed messages, and (3) a module to store processed data into a research-friendly format.
The package is not yet part of the general registry. You can install it from GitHub instead:
add https://github.com/cswaney/TotalViewITCH.jl.git
Usage is straightforward:
using TotalViewITCH: Parser, FileSystem, find
using Dates
parser = Parser{FileSystem}("./data/test")
parser("./data/bin/S031413-v41.txt", Date("2013-03-14"), ["A"], 4.1)
This example parses a raw ITCH file, S031413-v41.txt, which happens to have v4.1 formatting, and stores the extracted data (messages, order books, etc.) as CSV files in ./data/test. To process multiple tickers, simply add additional tickers to the list:
parser("./data/bin/S031413-v41.txt", Date("2013-03-14"), ["A", "AAPL"], 4.1)
Processing of multiple files (i.e., dates) should be performed with multiple processes or, better yet, using multiple jobs on a high-performance computing cluster.
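When looping over many files, the date and version arguments can be derived from each file name rather than typed by hand. The following is a minimal sketch, assuming every file follows the naming pattern of the example above (S, then MMDDYY, then -v and a two-digit version, then .txt). That pattern is inferred from a single file name, and parse_itch_filename is a hypothetical helper, not part of the package.

```julia
using Dates

# Hypothetical helper (not part of TotalViewITCH.jl). Assumes names like
# "S031413-v41.txt": "S", then MMDDYY, then "-v" and a two-digit version.
function parse_itch_filename(path::AbstractString)
    m = match(r"^S(\d{2})(\d{2})(\d{2})-v(\d)(\d)\.txt$", basename(path))
    m === nothing && throw(ArgumentError("unrecognized file name: $path"))
    month, day, yy = parse.(Int, (m[1], m[2], m[3]))
    # Assumption: two-digit years refer to 20YY.
    date = Date(2000 + yy, month, day)
    # "41" becomes 4.1, matching the version argument used in the examples.
    version = parse(Float64, "$(m[4]).$(m[5])")
    return date, version
end

date, version = parse_itch_filename("./data/bin/S031413-v41.txt")
```

Each file, together with its derived date and version, can then be handed to a separate process or cluster job that calls the parser as in the examples above.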
The processed data can be loaded using your favorite data processing tools (e.g., DataFrames.jl). For convenience, TotalViewITCH.jl provides a find method to pull all data associated with a ticker-date pair:
df = find(parser.backend, "messages", "A", Date("2013-03-14"))
This method isn't recommended for large-scale analysis, but works fine for exploring single ticker-dates.
[!TIP] For large-scale analyses, it's recommended to convert the processed data to the Apache Parquet format and use tools such as Apache Spark.
TotalViewITCH.jl aims to support a variety of data storage options via Backends. A backend is a struct that knows how to read and write ITCH data stored in a particular format. The currently supported backends are FileSystem and MongoDB.
The FileSystem backend stores data in CSV format. Output has the following directory structure:
test
|- messages
   |- ticker=A
      |- date=2013-03-14
         |- partition.csv
|- orderbooks
|- noii
|- trades
This structure is convenient for parallelizing analyses performed at the ticker-date level.
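Because each ticker-date pair maps to its own file, a worker can locate its input without scanning the whole output directory. A minimal sketch of that mapping follows; partition_path is a hypothetical helper, not a package function, and it mirrors only the messages subtree shown above. Whether the other tables use the same partitioning is an assumption to verify against your own output.

```julia
using Dates

# Hypothetical helper: builds the path of one ticker-date partition,
# mirroring the messages layout shown above.
function partition_path(root::AbstractString, table::AbstractString,
                        ticker::AbstractString, date::Date)
    # Interpolating a Date prints it as YYYY-MM-DD.
    return joinpath(root, table, "ticker=$ticker", "date=$date", "partition.csv")
end

path = partition_path("./data/test", "messages", "A", Date(2013, 3, 14))
```

The resulting path can be passed to any CSV reader, which makes it straightforward to give each worker in a parallel job its own ticker-date file.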
MongoDB
For small to medium-sized databases, TotalViewITCH.jl also provides a MongoDB backend. To set up a MongoDB database with Docker, run the following command in a terminal:
docker run -p 27017:27017 --volume path/to/data/db:/data/db mongo:latest
This command exposes the database to your local machine on port 27017. Now you can populate the database in Julia:
using TotalViewITCH: Parser, MongoDB
using Dates
backend = MongoDB("mongodb://localhost:27017", "test")
parser = Parser{MongoDB}(backend)
parser("./data/bin/S031413-v41.txt", Date("2013-03-14"), ["A"], 4.1)
Coming soon.
The default parsing method creates four tables/collections:
- messages: messages that reflect order book updates,
- orderbooks: order book snapshots following each message,
- noii: net order imbalance indicator messages,
- trades: messages that indicate trades involving non-displayed orders.

All records are stored in ascending temporal order, and all data is stored without modification, i.e., all fields adhere to the format described in the relevant TotalView specification.
messages
Each row of the messages table indicates an update to the order book. The types of updates are:
- Add (A or F)
- Cancel (X)
- Delete (D)
- Replace (U)
- Execute (E or C)

Note that replace orders are not split into their constituent add and delete orders in the database.
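To make the effect of each update type concrete, here is a simplified sketch of how such messages can be applied to a price-level book. It is not the package's implementation: the Order and Book types and all function names are hypothetical, and it assumes each add, cancel, and execute update carries a share count.

```julia
# Simplified sketch, not the package's implementation. All names are hypothetical.
struct Order
    side::Char      # 'B' or 'S'
    price::Int
    shares::Int
end

struct Book
    orders::Dict{Int,Order}   # live orders keyed by reference number
    bids::Dict{Int,Int}       # price => total displayed shares
    asks::Dict{Int,Int}
end
Book() = Book(Dict{Int,Order}(), Dict{Int,Int}(), Dict{Int,Int}())

levels(book::Book, side::Char) = side == 'B' ? book.bids : book.asks

# Add ('A' or 'F'): register the order and grow its price level.
function add!(book::Book, refno::Int, side::Char, price::Int, shares::Int)
    book.orders[refno] = Order(side, price, shares)
    lv = levels(book, side)
    lv[price] = get(lv, price, 0) + shares
    return book
end

# Cancel ('X') or execute ('E' or 'C'): remove `shares` from a resting order.
function reduce!(book::Book, refno::Int, shares::Int)
    o = book.orders[refno]
    lv = levels(book, o.side)
    lv[o.price] -= shares
    lv[o.price] == 0 && delete!(lv, o.price)
    remaining = o.shares - shares
    if remaining == 0
        delete!(book.orders, refno)
    else
        book.orders[refno] = Order(o.side, o.price, remaining)
    end
    return book
end

# Delete ('D'): remove whatever is left of the order.
delete_order!(book::Book, refno::Int) = reduce!(book, refno, book.orders[refno].shares)

# Replace ('U'): delete the old order and add a new one on the same side.
function replace_order!(book::Book, refno::Int, newrefno::Int, price::Int, shares::Int)
    side = book.orders[refno].side
    delete_order!(book, refno)
    return add!(book, newrefno, side, price, shares)
end

book = Book()
add!(book, 1, 'B', 100_000, 300)
add!(book, 2, 'B', 100_000, 200)
reduce!(book, 1, 100)                      # partial cancel or execution
replace_order!(book, 2, 3, 99_900, 200)    # order 2 becomes order 3
```

Keeping a dictionary of live orders alongside the price levels is what allows cancel, delete, execute, and replace updates, which identify an order only by reference number, to find the side and price they affect.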
Field | Type | Description | Required? | Default |
---|---|---|---|---|
date | Date | The file date (YYYY-MM-DD). | ✓ | |
sec | Int | The number of seconds since midnight. | ✓ | |
nano | Int | The number of nanoseconds since the most recent second. | ✓ | |
type | Char | The message type symbol as defined in the TotalView specification. | ✓ | |
ticker | String | The stock ticker associated with the message. | ✓ | |
side | Char | The side of the order book affected by the message (B or S). | ✓ | |
price | Int | The price associated with an order update. | ✓ | |
refno | Int | A day-unique reference number associated with an original limit order. | ✓ | |
newrefno | Int | A day-unique reference number associated with a new limit order. | | Missing |
mpid | String | An optional market participant identifier. | | Missing |
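Since fields are stored as they appear in the raw feed, timestamps and prices usually need a small conversion before analysis. The sketch below shows one way to do that with the Dates standard library. The helper names are hypothetical, and the price scaling assumes the four implied decimal places used for standard price fields in the TotalView-ITCH specifications, which is worth confirming against the specification for your file version.

```julia
using Dates

# Nanosecond-precision time of day from the `sec` and `nano` fields.
# `Time` is used because `DateTime` only resolves milliseconds.
time_of_day(sec::Integer, nano::Integer) = Time(0) + Second(sec) + Nanosecond(nano)

# Assumption: prices are integers with four implied decimal places.
price_in_dollars(price::Integer) = price / 10_000

t = time_of_day(34_200, 123_456_789)   # 34,200 seconds after midnight is 09:30:00
p = price_in_dollars(1_234_500)
```

For exact comparisons and aggregation it is generally safer to keep prices as integers and convert only for display.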
orderbooks
Each row of the orderbooks table represents a snapshot of the order book associated with an order book update. That is, the n-th row of the orderbooks table represents the state of the order book immediately following the update indicated by the n-th row of the messages table. The exact fields available depend on the number of levels tracked during parsing, N. For a given N, prices and shares are recorded in order from best to worst offer for bids and asks, respectively.
Field | Type | Description | Required? | Default |
---|---|---|---|---|
date | Date | The file date (YYYY-MM-DD). | ✓ | |
sec | Int | The number of seconds since midnight. | ✓ | |
nano | Int | The number of nanoseconds since the most recent second. | ✓ | Missing |
bid_price_n | Int | The price of the n-th best bid (n = 1, ..., N). | ✓ | Missing |
ask_price_n | Int | The price of the n-th best ask (n = 1, ..., N). | ✓ | Missing |
bid_shares_n | Int | The volume at the n-th best bid (n = 1, ..., N). | ✓ | Missing |
ask_shares_n | Int | The volume at the n-th best ask (n = 1, ..., N). | ✓ | Missing |
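As an illustration of the naming scheme, the sketch below generates the level-dependent field names for a given N and computes the spread and midprice from the first level of one snapshot. It assumes n is written as a plain integer suffix (for example bid_price_1), uses hypothetical helper names, and works on a made-up row rather than real output.

```julia
# Level-dependent field names for a book tracked to N levels. Assumes the
# suffix is a plain integer; the column order in real output may differ.
function book_columns(N::Integer)
    kinds = ("bid_price", "ask_price", "bid_shares", "ask_shares")
    return [Symbol(kind, "_", n) for kind in kinds for n in 1:N]
end

# Spread and midprice from the best bid and ask, in raw integer price units.
spread(row) = row.ask_price_1 - row.bid_price_1
midprice(row) = (row.ask_price_1 + row.bid_price_1) / 2

# Made-up snapshot for illustration only.
row = (bid_price_1 = 100_000, ask_price_1 = 100_200,
       bid_shares_1 = 300, ask_shares_1 = 500)

cols = book_columns(2)
s = spread(row)
m = midprice(row)
```

If one side of the book is empty, the corresponding fields are Missing, and in Julia these arithmetic expressions then evaluate to missing rather than raising an error.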
noii
Net Order Imbalance Indicator (NOII) messages are disseminated prior to market open and close as well as during quote-only periods. The noii collection stores these messages for all tickers in a single file for each date.
Field | Type | Description | Required? | Default |
---|---|---|---|---|
date | Date | The file date (YYYY-MM-DD). | ✓ | |
sec | Int | The number of seconds since midnight. | ✓ | |
nano | Int | The number of nanoseconds since the most recent second. | ✓ | |
type | Char | The cross type: opening (O), close (C) or halted (H). | ✓ | |
ticker | String | The stock ticker associated with the message. | ✓ | |
paired | Int | The number of shares matched at the current reference price. | ✓ | |
imbalance | Int | The number of shares not paired at the current reference price. | ✓ | |
direction | Char | The side of the imbalance (B, S, N or O). | ✓ | |
far | Int | A hypothetical clearing price for cross orders only. | ✓ | |
near | Int | A hypothetical clearing price for cross and continuous orders. | ✓ | |
current | Int | The price at which the imbalance is calculated. | ✓ | |
trades
Rows of the trades collection reflect two types of trades that are not captured in the order book updates: cross and non-cross trades. Non-cross trade messages "provide details for normal match events involving non-displayable order type", i.e., hidden orders. Cross trade messages (type == 'Q') "indicate that Nasdaq has completed its cross process for a specific security". Neither trade type affects the state of the (visible) order book, but both should be included in volume calculations.
Field | Type | Description | Required? | Default |
---|---|---|---|---|
date | Date | The file date (YYYY-MM-DD). | ✓ | |
sec | Int | The number of seconds since midnight. | ✓ | |
nano | Int | The number of nanoseconds since the most recent second. | ✓ | |
type | Char | The type of trade: hidden (P) or cross (Q). | ✓ | |
ticker | String | The stock ticker associated with the trade. | ✓ | |
refno | Int | A day-unique reference number associated with an original limit order. | Hidden trades only. | Missing |
matchno | Int | A day-unique reference number associated with the trade or cross. | ✓ | |
side | Char | The type of non-display order matched (B or S). | Hidden trades only. | Missing |
price | Int | The price of the cross. | Cross trades only. | Missing |
shares | Int | The number of shares traded. | ✓ | |
cross | Int | The cross type: opening (O), close (C), halted (H) or intraday (I). | ✓ | |
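For volume calculations, the shares in this table can be aggregated by trade type. The sketch below does this for a few made-up rows carrying only the type and shares fields. The volume helper is hypothetical, and a real analysis would read the rows from the processed data instead.

```julia
# Made-up rows with the `type` and `shares` fields of the trades table.
trades = [
    (type = 'Q', shares = 15_000),   # cross trade
    (type = 'P', shares = 200),      # hidden (non-displayed) trade
    (type = 'P', shares = 100),
]

# Total shares for one trade type ('P' or 'Q'); returns 0 when none match.
volume(rows, kind::Char) = sum((r.shares for r in rows if r.type == kind); init = 0)

hidden_volume = volume(trades, 'P')
cross_volume = volume(trades, 'Q')
```

Adding these totals to the shares executed against displayed orders (the E and C updates in the messages table) gives total traded volume for a ticker-date.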
TotalViewITCH.jl supports versions 4.1 and 5.0 of the TotalView-ITCH file specification. The parser processes all message types required to reconstruct limit order books as well as several types that do not impact the order book.
Message Type | Symbol | Supported? | Notes |
---|---|---|---|
Timestamp | T | 4.1 | Message type only exists for v4.1. |
System | S | ✓ | |
Market Participant | L | | |
Trade Action | H | ✓ | |
Reg SHO | Y | | |
Stock Directory | R | | |
Add | A | ✓ | |
Add w/ MPID | F | ✓ | |
Execute | E | ✓ | |
Execute w/ Price | C | ✓ | |
Cancel | X | ✓ | |
Delete | D | ✓ | |
Replace | U | ✓ | |
Cross Trade | Q | ✓ | Ignored by order book updates. |
Trade | P | ✓ | Ignored by order book updates. |
Broken Trade | B | | Ignored by order book updates. |
NOII | I | ✓ | |
RPII | N | | |
We plan to process and record the following additional message types:
[!WARNING] Note that the format of the database is not stable and will likely change in the near future.
There are no plans to support the following message categories:
This package is intended to be a community resource for researchers working with TotalViewITCH. If you find a bug, have a suggestion or otherwise wish to contribute to the package, please feel free to create an issue.