IBKR Quant Blog




Quant

Feature Engineering in a Stock Price Estimation Modeling System


By The Alpha Scientist

 

This post is going to delve into the mechanics of feature engineering for the sorts of time series data that you may use as part of a stock price estimation modeling system.

I'll cover the basic concept, then offer some useful Python code recipes for transforming your raw source data into features which can be fed directly into an ML algorithm or ML pipeline.

Anyone who has dabbled with any systems-based trading or charting already has experience with simple forms of feature engineering, whether or not they realized it. For instance:

  • Converting a series of asset prices into percent change values is a simple form of feature engineering
  • Charting prices vs. a moving average is an implicit form of feature engineering
  • Any technical indicator (RSI, MACD, etc...) is also a form of feature engineering
  • The process takes in one or more columns of "raw" input data (e.g., OHLC price data, 10-Q financials, social media sentiment, etc...) and converts it into many columns of engineered features

 

Motivation

I believe (and I don't think I'm alone here!) that feature engineering is the most under-appreciated part of the art of machine learning. It's certainly the most time-consuming and tedious, but it's also creative and "fun" (for those who like getting their hands dirty with data, anyway...).

Feature engineering is also one of the key areas where those with domain expertise can shine. Those whose expertise in investing is greater than their skill in machine learning will find that feature engineering lets them put that domain knowledge to work.

Feature engineering is a term of art for data science and machine learning which refers to pre-processing and transforming raw data into a form which is more easily used by machine learning algorithms. Much like industrial processing can extract pure gold from trace elements within raw ore, feature engineering can extract valuable "alpha" from very noisy raw data.

[Image: gold ore]

You have to dig through a lot of dirt to find gold.

Principles and guidelines

Feature engineering is fundamentally a creative process which should not be overly constrained by rules or limits.

However, I do believe there are a few guidelines to be followed:

  • No peeking: Peeking (into the future) is the "original sin" of feature engineering (and estimation modeling in general). It refers to using information about the future (or information which would not yet be known to us...) to engineer a piece of data. This can be obvious, like using next_12_months_returns. However, it's most often quite subtle, like using the mean or standard deviation across the full time period to normalize data points (which implicitly leaks future information into our features). The test is whether you would have gotten the exact same value if you had calculated the data point at that point in time rather than today
  • Only the knowable: As a corollary to the above, you also need to be honest about what you would have known at the time, not just what had happened at the time. For instance, short borrowing data is reported by exchanges with a considerable time lag. You would want to stamp the feature with the date on which you would have known it
  • Complete the table: Many machine learning algorithms expect that every input feature will have a value (of a certain type) for each observation. If you envision a spreadsheet where each feature is a column and each observation is a row, there should be a value in each cell of the table. Quite often, some features in the table will naturally update more frequently than others. Price data updates almost continuously, while short inventory, analyst estimates, or EBITDA tend to update every few weeks or months. In these cases, we'll use a scheme like last observation carried forward (LOCF) so that the naturally lower-frequency columns always have a value. Of course, we will be careful to avoid inadvertent peeking!
  • Avoid false ordinality: Finally, it's extremely important to represent features in a way that captures ordinality only if it has meaning. For instance, it's usually a bad idea to represent "day of the week" as an integer 1 to 7, since this implicitly tells the model to treat Friday as very similar to Thursday, but "a little more". It would also say that Sunday and Monday are totally different (if Sunday = 7 and Monday = 1). We could miss all manner of interesting patterns in the data. The short sketch after this list illustrates these guidelines in pandas
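To make the guidelines concrete, here is a minimal pandas sketch. The toy price series and the specific column names are my own illustrative assumptions, not code from the original post.

Python

import numpy as np
import pandas as pd

np.random.seed(0)

# Toy daily price series (hypothetical data, for illustration only)
idx = pd.date_range("2017-01-02", periods=250, freq="B")
prices = pd.Series(100 + np.random.randn(250).cumsum(), index=idx, name="close")

features = pd.DataFrame(index=idx)

# No peeking: normalize returns with an *expanding* mean/std so each row
# uses only information that was available at that point in time
rets = prices.pct_change()
features["f01"] = (rets - rets.expanding(min_periods=20).mean()) \
                  / rets.expanding(min_periods=20).std()

# Complete the table: carry a low-frequency (monthly) value forward so
# every daily row is populated; ffill only looks backward, so no peeking
monthly = prices.resample("M").last()
features["f02"] = monthly.reindex(idx, method="ffill")

# Avoid false ordinality: encode day-of-week as indicator columns
# (dow_0 .. dow_4 for business days) rather than an integer 1-7
dow = pd.get_dummies(idx.dayofweek, prefix="dow")
dow.index = idx
features = features.join(dow)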

 

Getting Started

I will begin by pulling some toy data into a dataframe, using free data from quandl.

First, we'll make a utility function which downloads one or more symbols from quandl and returns the adjusted OHLC data (I generally find adjusted data to be best).

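The code in the original post appeared as an image. Below is a minimal reconstruction of such a utility; it assumes a free quandl API key and the WIKI dataset's adjusted-column naming, both of which are my assumptions rather than details taken from the original.

Python

import pandas as pd
import quandl

quandl.ApiConfig.api_key = "YOUR_API_KEY"  # assumption: you have a free quandl key

def get_symbols(symbols, begin, end):
    """Download adjusted OHLCV data for one or more symbols from quandl."""
    frames = []
    for sym in symbols:
        df = quandl.get("WIKI/" + sym, start_date=begin, end_date=end)
        # keep only the split/dividend-adjusted columns
        df = df[["Adj. Open", "Adj. High", "Adj. Low", "Adj. Close", "Adj. Volume"]]
        df.columns = ["open", "high", "low", "close", "volume"]
        df["symbol"] = sym
        frames.append(df)
    return pd.concat(frames).set_index("symbol", append=True)

prices = get_symbols(["AAPL", "CSCO"], begin="2015-01-01", end="2018-01-01")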

With the data collected, we can create a new dataframe called "features" which will be used to compile all of the features we engineer. Good practice is to create this dataframe with an index from your downloaded data, since you should only have new feature values as you have new primary source data.

As the simple example below illustrates, we can then construct features from the data and store into multiple feature columns. Note that there will often be null values inserted if the formula doesn't produce valid values for each row index.

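Again, the original snippet was an image; the sketch below follows the pattern described (the specific transforms are illustrative assumptions of mine, and the f01/f02 names follow the author's convention). It assumes the prices dataframe from the previous sketch.

Python

features = pd.DataFrame(index=prices.index)  # same index as the source data

# f01: one-day percent change (NaN on each symbol's first row)
features["f01"] = prices.groupby(level="symbol")["close"].pct_change()

# f02: close relative to its trailing 20-day moving average
ma20 = prices.groupby(level="symbol")["close"] \
             .rolling(20).mean().reset_index(level=0, drop=True)
features["f02"] = prices["close"] / ma20 - 1.0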

Side note: I favor following a bland naming convention like f01, f02, etc... for each feature (and then documenting what each feature represents...) rather than using descriptive column names. My reasons for this are three-fold:

  1. Descriptive names tend to be long and cumbersome to use,
  2. They're rarely truly self-describing, and
  3. It's often useful to create an abstraction to conceal from the modeler (either yourself or someone else) what each represents. Think of it like a blind taste test.

Following this basic code pattern, we can generate endless variations on our features. This is where your domain expertise and analytical creativity come into play!

My suggestion is to make sure you have a reasonable hypothesis before you create any feature, but don't be afraid to try many variations on a theme. There is much to be learned from trying out several flavors of feature transformation.

 

In the next post, the author will demonstrate a series of "recipes" for some of the transforms he has found useful in helping models - especially linear or quasi-linear models - extract meaningful relationships. Stay tuned for examples with pandas + numpy.

----------------

About The Alpha Scientist

I'm Chad, aka The Alpha Scientist. I've created The Alpha Scientist blog to explore the intersection of my two professional passions: locating "alpha" in market inefficiencies and applying data science methods. If you've found this post useful, please follow @data2alpha on Twitter and forward to a friend or colleague who may also find this topic interesting. https://alphascientist.com/

 

This article is from The Alpha Scientist and is being posted with The Alpha Scientist’s permission. The views expressed in this article are solely those of the author and/or The Alpha Scientist and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.






Quant

Deep Learning - Artificial Neural Network Using TensorFlow In Python - Part 3


By Umesh Palai

 

Get started with TensorFlow and Python with the first and second articles of this series.

Cost function

We use a cost function to optimize the model. The cost function generates a measure of deviation between the network's predictions and the actual observed training targets. For regression problems, the mean squared error (MSE) function is commonly used. MSE computes the average squared deviation between predictions and targets.

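The original code appeared as an image. A minimal TensorFlow 1.x sketch of an MSE cost is shown below, assuming out is the network's output tensor and Y the target placeholder defined in the earlier parts of this series:

Python

import tensorflow as tf

# mean squared deviation between the network's output and the observed targets
mse = tf.reduce_mean(tf.squared_difference(out, Y))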
 

Optimizer

The optimizer takes care of the necessary computations that are used to adapt the network's weight and bias variables during training. Those computations involve the calculation of gradients that indicate the direction in which the weights and biases have to be changed during training in order to minimize the network's cost function. The development of stable and speedy optimizers is a major field in neural network and deep learning research.


In this model we use the Adam (Adaptive Moment Estimation) optimizer, an extension of stochastic gradient descent and one of the default optimizers in deep learning development.
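A one-line sketch of how that optimizer would be attached to the cost defined above (again a reconstruction, not the original image):

Python

# Adam adapts per-parameter step sizes from moment estimates of the gradients
opt = tf.train.AdamOptimizer().minimize(mse)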

Fitting the neural network

Now we need to fit the neural network that we have created to our training dataset. After having defined the placeholders, variables, initializers, cost function and optimizer of the network, the model needs to be trained. Usually, this is done by mini-batch training. During mini-batch training, random samples of size batch_size are drawn from the training data and fed into the network; the training dataset thereby gets divided into n / batch_size batches (where n is the number of training observations) that are sequentially fed into the network. At this point the placeholders X and Y come into play: they store the input and target data and present them to the network as inputs and targets.

A sampled data batch of X flows through the network until it reaches the output layer. There, TensorFlow compares the model's estimations against the actual observed targets Y in the current batch. Afterwards, TensorFlow conducts an optimization step and updates the network's parameters according to the selected learning scheme. After the weights and biases have been updated, the next batch is sampled and the process repeats. The procedure continues until all batches have been presented to the network. One full sweep over all batches is called an epoch.

The training of the network stops once the maximum number of epochs is reached or another stopping criterion defined by the user applies. Here, we stop training once the epoch count reaches 10.

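The original training loop was shown as an image; the sketch below is a reconstruction in the TensorFlow 1.x style, assuming the X and Y placeholders, the opt step above, and X_train / y_train arrays from the earlier parts of this series:

Python

import numpy as np

epochs = 10
batch_size = 256

net = tf.Session()
net.run(tf.global_variables_initializer())

for e in range(epochs):
    # shuffle the training data at the start of each epoch
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]
    # mini-batch training: feed each batch and take one optimization step
    for i in range(len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})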

With this, our artificial neural network has been trained.

Now that the neural network has been trained, we can use the predict() method to make estimations. We pass X_test as its argument and store the result in a variable named pred. We then convert pred into a dataframe and save it in another variable called y_pred. Finally, we convert y_pred to binary values by applying the condition y_pred > 0.5, so that y_pred stores True or False depending on whether the predicted value was greater or less than 0.5.

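The original code was an image. With the raw TensorFlow graph built in the earlier parts, the predict() step amounts to running the output tensor on the test inputs; the sketch below is a reconstruction under that assumption (the names net, out and X, and the output shape, are assumed from Parts 1 and 2):

Python

import pandas as pd

# generate estimations for the test inputs (the predict() step)
pred = net.run(out, feed_dict={X: X_test})
y_pred = pd.DataFrame(pred.reshape(-1))  # one column of estimations
y_pred = y_pred > 0.5                    # True where the estimate exceeds 0.5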

Next, we create a new column in the dataframe dataset with the column header ‘y_pred’ and fill it with NaN values. We then store the values of y_pred into this new column, starting from the first row of the test dataset. This is done by slicing the dataframe using the iloc method, as shown in the code below. We then drop all rows containing NaN values from dataset and store the result in a new dataframe named trade_dataset.

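A reconstruction of that slicing (the dataset dataframe comes from the earlier parts of the series):

Python

import numpy as np

dataset["y_pred"] = np.nan
# write the estimations against the last len(y_pred) rows (the test period)
dataset.iloc[len(dataset) - len(y_pred):, -1:] = y_pred.values
# keep only the rows that received an estimation
trade_dataset = dataset.dropna()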
 

Computing Strategy Returns

We can now compute the returns of the strategy. We will take a long position when the predicted value of y_pred is True and a short position when it is False.

We first compute the returns that the strategy will earn if a long position is taken at the end of today and squared off at the end of the next day. We start by creating a new column named ‘Tomorrows Returns’ in trade_dataset and store in it a value of 0., using the decimal notation to indicate that floating point values will be stored in this new column. Next, we store in it the log returns of today, i.e. the logarithm of today’s closing price divided by yesterday’s closing price. Finally, we shift these values upward by one element so that tomorrow’s return is stored against today’s prices.

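A reconstruction of that step (the Close column name is assumed from the earlier parts):

Python

trade_dataset["Tomorrows Returns"] = 0.
trade_dataset["Tomorrows Returns"] = np.log(trade_dataset["Close"] /
                                            trade_dataset["Close"].shift(1))
# shift up one row so tomorrow's return sits against today's prices
trade_dataset["Tomorrows Returns"] = trade_dataset["Tomorrows Returns"].shift(-1)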

Next, we compute the strategy returns. We create a new column under the header ‘Strategy Returns’ and initialize it with 0. to indicate floating point values. Using the np.where() function, we then store in it the value from the ‘Tomorrows Returns’ column where ‘y_pred’ is True (a long position), and the negative of that value where it is False (a short position).

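A reconstruction of that step:

Python

trade_dataset["Strategy Returns"] = 0.
trade_dataset["Strategy Returns"] = np.where(trade_dataset["y_pred"] == True,
                                             trade_dataset["Tomorrows Returns"],
                                             -trade_dataset["Tomorrows Returns"])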

We now compute the cumulative returns for both the market and the strategy. These values are computed using the cumsum() function.

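A reconstruction (cumulative log returns are simply the running sum of the daily log returns):

Python

trade_dataset["Cumulative Market Returns"] = np.cumsum(trade_dataset["Tomorrows Returns"])
trade_dataset["Cumulative Strategy Returns"] = np.cumsum(trade_dataset["Strategy Returns"])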

 

Plotting The Graph Of Returns

We will now plot the market returns and our strategy returns to visualize how the strategy performs against the market. We use the plot function to chart the cumulative market and strategy returns stored in the dataframe trade_dataset, then create the legend and show the plot using the legend() and show() functions respectively. The plot shown below is the output of the code: the green line represents the returns generated by the strategy, and the red line represents the market returns.

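A reconstruction of the plotting code:

Python

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot(trade_dataset["Cumulative Market Returns"], color="r", label="Market Returns")
plt.plot(trade_dataset["Cumulative Strategy Returns"], color="g", label="Strategy Returns")
plt.legend()
plt.show()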

[Figure: cumulative strategy returns (green) vs. market returns (red)]

Conclusion

The objective of this project is to make you understand how to build an artificial neural network using TensorFlow in Python.

My advice is to use more than 100,000 data points when building an artificial neural network or any other deep learning model; that is when such models are most effective. This model was developed on daily prices to help you understand how to build the model; for actual training, it is advisable to use minute or tick data.

Now you can build your own Artificial Neural Network in Python and start trading using the power and intelligence of your machines.

You can also download the Python code and dataset from my GitHub account.

 

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

If you want to learn more about  Decision Tree and Neural Network methods in trading strategies, or to download the code in this article, visit QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.






Quant

Join IBKR for a Free Webinar with qplum - Technology Stack of a Systematic Trade Execution Engine


Tuesday, December 18, 2018 12:00 PM EST
 

Register

 

qplum - Technology Stack of a Systematic Trade Execution Engine

 

In this webinar, Hardik Patel will discuss the systematic trading infrastructure that qplum uses to efficiently execute its moderately active, quantitative strategies. He will give a detailed architectural blueprint of the trade execution engine. He will also share how working with brokers like Interactive Brokers can cut trading costs, reduce slippage, and increase transparency.

 

Speaker: Hardik Patel, Machine Learning Engineer at qplum

Sponsored by:  qplum

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results






Quant

Machine Learning Basics


By Shagufta Tahsildar


We have been learning since the dawn of time. From the basics of talking, walking and eating to learning more advanced skills like cooking, dancing or singing. But in today’s world, learning is not just limited to humans. As machines have taken over many of our manual tasks, they’ve also developed the ability to learn. According to a new research report, the Machine Learning market size is expected to grow from USD 1.41 Billion in 2017 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1%.

To learn more about the development of the field of Machine Learning, you can refer to this blog.

In this blog post, we will review fundamental machine learning topics for beginners and professionals alike, covering the machine learning process and more.

What is Machine Learning?

Machine Learning, as the name suggests, provides machines with the ability to learn autonomously based on experiences, observations and analyzing patterns within a given data set, without being explicitly programmed. When we write a program or code for some specific purpose, we are writing a definite set of instructions which the machine will follow. In machine learning, by contrast, we input a data set from which the machine learns by identifying and analyzing patterns, and it learns to take decisions autonomously based on its observations and learnings from that dataset.

Timeline of Machine Learning

An article on machine learning basics would be incomplete without covering the history of machine learning. Below, we’ve covered a brief history highlighting critical events.

[Figure: timeline of key machine learning milestones]

 

The difference between Machine Learning, Artificial Intelligence and Deep Learning

While learning about machine learning basics, one often confuses Machine Learning, Artificial Intelligence and Deep Learning. The diagram below clarifies how the three relate.

[Figure: Deep Learning as a subset of Machine Learning, which is in turn a subset of Artificial Intelligence]

 

How do Machines learn?

Well, the simple answer is: just like humans do! First, we receive information and attempt to store it so that we may recognize and contextualize it later. In addition, past experiences help us to make decisions in the future. Our brain trains itself by identifying features and patterns in the knowledge/data it receives, enabling us to successfully identify or distinguish new information.

Similarly, we feed knowledge/data to the machine; this data is divided into two parts — training data and testing data. The machine learns the patterns and features from the training data and trains itself to identify, classify and predict new data. We use the testing data to measure the machine’s accuracy. Here’s a basic machine learning example:

You want to predict whether the next day is going to be rainy or sunny. Generally, we will do this by looking at a combination of data like the weather conditions of the past few days and present data such as wind direction, cloud formation etc. Had it been raining for the past few days, we would predict that it would rain for the next day too based on the pattern and vice versa. Similarly, we feed the past few days’ weather data along with the present data to the machine. The machine will analyze the patterns and eventually predict the weather for the next day.
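As a minimal illustration of the train/test split described above, here is a small scikit-learn sketch. The toy arrays and the choice of logistic regression are my own additions for illustration, not from the original article.

Python

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# toy weather-style data: [wind speed, cloud cover] -> rain (1) or sun (0)
X = [[10, 0.8], [2, 0.1], [12, 0.9], [3, 0.2], [9, 0.7], [1, 0.05]]
y = [1, 0, 1, 0, 1, 0]

# the machine learns patterns from the training data...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# ...and the held-out testing data measures how accurately it predicts new data
print(model.score(X_test, y_test))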

 

Classification of Machine Learning Algorithms

Machine Learning algorithms can be classified into:

  1. Supervised Algorithms – Linear Regression, Logistic Regression, Support Vector Machine (SVM), Decision Trees, Random Forest
  2. Unsupervised Algorithms – K-Means Clustering
  3. Reinforcement Algorithm

 

Visit QuantInsti website to learn more about the different types of Machine Learning Algorithms and their application.

 

 

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

 

If you want to learn more about  quant methods in trading strategies, or to download sample machine learning code to train and test your algos, visit QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.






Quant

Samssara Capital - Statistics: The Missing Link between Technical Analysis and Algorithmic Trading


Learn how statistical tools like PCA, ARCH, and GARCH are applied in algorithmic trading with this webinar recording:

View the Recording

 


 

Description

Trading leveraged derivatives using only technical or speculative analysis can lead to windfall losses for even the most disciplined trader and investor, and statistics is an often ignored area of work when it comes to derivatives trading. The webinar focuses on how volatility can be used to dynamically adjust stop orders, how correlation is an essential method to diversify the class of derivatives being traded or hedged, and how co-integration is a key method to distinguish a mean-reverting time series from a non-mean-reverting one. It also touches upon other essential time series econometrics such as the OU process and VRT, as well as statistical tools like PCA, ARCH and GARCH, which are essential for derivatives pricing and for forecasting volatility.

Speaker: Manish Jalan, MD and CEO, Samssara Capital

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.







Disclosures

We appreciate your feedback. If you have any questions or comments about IBKR Quant Blog please contact ibkrquant@ibkr.com.

The material (including articles and commentary) provided on IBKR Quant Blog is offered for informational purposes only. The posted material is NOT a recommendation by Interactive Brokers (IB) that you or your clients should contract for the services of or invest with any of the independent advisors or hedge funds or others who may post on IBKR Quant Blog or invest with any advisors or hedge funds. The advisors, hedge funds and other analysts who may post on IBKR Quant Blog are independent of IB and IB does not make any representations or warranties concerning the past or future performance of these advisors, hedge funds and others or the accuracy of the information they provide. Interactive Brokers does not conduct a "suitability review" to make sure the trading of any advisor or hedge fund or other party is suitable for you.

Securities or other financial instruments mentioned in the material posted are not suitable for all investors. The material posted does not take into account your particular investment objectives, financial situations or needs and is not intended as a recommendation to you of any particular securities, financial instruments or strategies. Before making any investment or trade, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice. Past performance is no guarantee of future results.

Any information provided by third parties has been obtained from sources believed to be reliable and accurate; however, IB does not warrant its accuracy and assumes no responsibility for any errors or omissions.

Any information posted by employees of IB or an affiliated company is based upon information that is believed to be reliable. However, neither IB nor its affiliates warrant its completeness, accuracy or adequacy. IB does not make any representations or warranties concerning the past or future performance of any financial instrument. By posting material on IB Quant Blog, IB is not representing that any particular financial instrument or trading strategy is appropriate for you.