FINDING TECHNICAL TRADING RULES IN HIGH-FREQUENCY DATA BY USING
GENETIC PROGRAMMING

by
Cheng Liu



A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE
(APPLIED MATHEMATICS)




August 2014








Copyright 2014 Cheng Liu


To my dear parents, for their everlasting love and faith in me
To my close friends and professors, for their support
Paul, Cheng Liu

 




Table of Contents

Abstract
Introduction
Genetic Algorithms (GA)
    Genetic Programming
Applying GP to Finding Trading Rules
    Definition of structures of trading rules
    Mathematical Models
    Implementation of Genetic Programming
    Data and Numerical Simulations
    Analysis of Results
    Further Discussion
Summary
References

 

Abstract
 I use genetic programming to find technical trading rules for the S&P 500 index, using
one-minute high-frequency intraday data covering about one and a half years. The model also
allows short selling. With zero or very low transaction fees, the model finds several rules that
provide a positive excess return, i.e. a return above that of the passive buy-and-hold strategy.
When the transaction cost is high enough, no rule generates a positive excess return, and as the
cost grows further it becomes very hard to apply the model to high-frequency data.
Keywords: Genetic Programming; Tree; High Frequency; Technical Trading Rules;
Excess Return; S&P 500 index















 Introduction
In the technical analysis of stock markets today, demand is growing rapidly for real-time
analysis and for short-term responses to minute-level or even second-level financial data, and
trading rules are found with the aid of ever faster methods of analysis. Genetic algorithms are
widely regarded as a useful technique for finding trading rules; this thesis develops the approach
further, with some improvements, on a minute-level high-frequency data set to make it 'swift'.
Technical analysis uses historical price movements to forecast future price trends: it evaluates
securities by analyzing market statistics such as past prices and volume. Although the approach
does not measure the intrinsic value of a security or other asset, it is still widely used by
investment professionals, from hedge funds to individual investors, for making trading decisions.
Despite its long history, technical analysis and its claims have traditionally been regarded with
suspicion. Under the well-known Efficient Market Hypothesis (EMH), technical analysis is
useless if the market is completely efficient. However, with accumulating evidence that financial
markets may be less efficient than was once believed, academic and industrial interest in
forecasting techniques has revived. Research on technical trading rules has moved back and forth.
In the 1960s and early 1970s, researchers such as (Alexander, 1961) and (Fama, 1966) concluded,
from case studies of filter rules on Dow Jones and Standard & Poor's stocks, that technical
trading rules are useless or unprofitable. It was not until the 1990s that researchers such as
(Brock, 1992) brought technical trading rules back onto the stage. The analysis here is carried out
on the Standard & Poor's composite index (S&P 500). Although price data for this index have
been studied before, this thesis digs into minute-level data, motivated by real-time analysis and
by the growing popularity of high-frequency financial research. With the development of
electronic trading, daily data are no longer adequate for institutional investors, who exploit every
trading-related advantage. High-frequency data imply not only a large amount of data but also
short test and data-collection periods, and such data sets help to find trading rules in this 'swift'
setting. In addition, short selling became easier after the old uptick rule expired and was replaced
by the alternative uptick rule, which marked a revival of short selling and makes it possible to
take short selling into account in algorithmic trading. Using genetic algorithms, we can test
whether there is an effective way to find profitable trading rules.

Genetic Algorithms (GA)
A genetic algorithm (GA) is a stochastic global search method that mimics natural
evolution. It applies a survival principle to select the best individuals from the offspring of
randomly generated parents. GAs work on a group, or population, of potential solutions and
search for a local optimum that, one hopes, approximates the ideal solution. In each generation,
new offspring are created by selecting individuals according to their fitness and recombining
outstanding parents. This process leads to populations that are better adapted to their
environment; because it mimics natural adaptation, it is called a genetic algorithm. The simple
genetic algorithm (SGA) given by (Goldberg, 1989) provides the basic components of a GA.
The following is a pseudo-code outline of the SGA:





Procedure GA
Begin
    t = 0;
    Initialize P(t);
    Evaluate P(t);
    While not finished
    Begin
        t = t + 1;
        Select P(t) from P(t-1);
        Reproduce pairs in P(t);
        Evaluate P(t);
    End
End
Table 1: Simple Genetic Algorithm
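
The outline in Table 1 can be sketched in Python roughly as follows. This is only a toy illustration, not the code used in this thesis: the bit-string encoding, binary tournament selection, one-point crossover, and all parameter values are assumptions chosen to keep the example short.

```python
import random

def simple_ga(fitness, n_bits=20, pop_size=50, generations=40,
              p_mutation=0.02, seed=0):
    """Toy simple genetic algorithm over fixed-length bit strings (illustrative only)."""
    rng = random.Random(seed)
    # Initialize P(0) with random individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def pick():
        # Select from the current population by binary tournament (an assumed scheme).
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)                 # Evaluate P(0)
    for _ in range(generations):                 # "While not finished"
        new_pop = []
        while len(new_pop) < pop_size:
            # Reproduce pairs: one-point crossover, then bit-flip mutation.
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mutation else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        # Evaluate P(t) and remember the best individual seen so far.
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best

# Usage: maximize the number of ones in the bit string.
best = simple_ga(fitness=sum)
print(sum(best), "ones out of", len(best))
```

The fitness function here simply counts ones; in the trading application the fitness would be the excess return of a rule.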

 Genetic Programming
GAs operate on a number of potential solutions, i.e. a population, which usually consists
of 100-500 individuals depending on the scale of the problem. Individuals can be represented in
several ways, for example as binary strings. For technical trading rules, however, a real-valued
representation is not enough, since trading rules include both real and Boolean functions.
Instead, individuals can be represented as tree-like structures: each node with successors is a
function whose arguments are supplied by its successors, while the terminal nodes, i.e. nodes
without successors, correspond to input values. The entire structure is evaluated recursively,
starting from the root node of the tree. This method, developed by (Koza, 1992), is an extension
of traditional genetic algorithms and is called genetic programming (GP). It removes the
fixed-length restriction of traditional genetic representations. The function set is chosen
according to the specific problem. Genetic programming satisfies a closure property, which
guarantees that every possible recombination of subtrees is well defined. Furthermore, genetic
programming preserves the structure of the population. The initial population is built from
random trees: the root node is chosen randomly from the functions of the required type, and each
argument is then selected from the category that is legal for its function.
In genetic programming there are two very important concepts.
Definition
 Crossover: Crossover is applied to an individual by switching one of its nodes with a node
from another individual in the population.
In genetic programming, the crossover operator recombines two individuals by replacing a random
subtree of one with a subtree of the other. To maintain the structure, the operator only exchanges
nodes of the same type, as determined by the functions on the nodes.
Definition
 Mutation: Mutation affects a single individual in the population; it can replace a whole node in
the selected individual, or it can replace just the node's information.
Mutation is implemented by generating a random tree to replace the second parent, which
maintains the diversity of the population. Mutation is applied with a fixed probability.
(Koza, 1992) showed the effectiveness of genetic programming: there is a more than 99%
probability of finding the correct solution by searching fewer than 160,000 individuals in a
population of $2^{64}$. That is why this method is very attractive for optimizing the search for
trading rules.
The following illustrates the recombination, or crossover, process:

Parent 1:
and
├── <
│   ├── average
│   └── price
└── >
    ├── price
    └── max

Parent 2:
and
├── <
│   ├── min
│   └── price
└── +
    ├── average
    └── max

Offspring:
and
├── <
│   ├── average
│   │   └── 50
│   └── price
└── <
    ├── min
    │   └── data
    └── price
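
A type-preserving subtree crossover of the kind described above might be sketched as follows. The `Node` class, the `rtype` field, and the two example parents are hypothetical; in particular the second parent is a simplified variant of the one in the figure, chosen so that every subtree is correctly typed.

```python
import copy
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                  # e.g. "and", "<", "average", "price"
    rtype: str                                 # return type of the node: "bool" or "real"
    children: list = field(default_factory=list)

def subtrees(node):
    """Yield every node of the tree in depth-first order."""
    yield node
    for child in node.children:
        yield from subtrees(child)

def crossover(parent1, parent2, rng):
    """Replace a random subtree of parent1 with a same-typed subtree of parent2."""
    child = copy.deepcopy(parent1)
    target = rng.choice(list(subtrees(child)))
    donors = [n for n in subtrees(parent2) if n.rtype == target.rtype]
    if not donors:                             # no compatible subtree: unchanged copy
        return child
    donor = copy.deepcopy(rng.choice(donors))
    # Overwrite the chosen node in place so that its parent still points at it.
    target.name, target.rtype, target.children = donor.name, donor.rtype, donor.children
    return child

def show(node):
    """Render a tree as a nested expression string."""
    if not node.children:
        return node.name
    return f"{node.name}({', '.join(show(c) for c in node.children)})"

def leaf(name):
    return Node(name, "real")

parent1 = Node("and", "bool", [Node("<", "bool", [leaf("average"), leaf("price")]),
                               Node(">", "bool", [leaf("price"), leaf("max")])])
parent2 = Node("and", "bool", [Node("<", "bool", [leaf("min"), leaf("price")]),
                               Node(">", "bool", [leaf("average"), leaf("max")])])
offspring = crossover(parent1, parent2, random.Random(1))
print(show(offspring))
```

Because the replacement subtree always has the same return type as the node it replaces, every offspring remains a well-formed Boolean rule, which is the closure property in practice.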

Applying GP to Finding Trading Rules
In this thesis, I use a genetic algorithm to find technical trading rules, in practice
moving-average (MA) rules, for the S&P 500. The goal of the algorithm is to find a criterion for
making decisions in the stock market. Each individual, or chromosome, in the genetic algorithm
represents a randomly generated technical trading rule at the start of the algorithm. The most
important step is therefore to define the building blocks of trading rules, which serve as the
genomes of the algorithm.
The chromosomes are trading-rule structures built from past prices, numerical or Boolean values,
logical functions, and all their possible combinations. Restricting rules to these building blocks
guarantees that every trading strategy is well defined.

 Definition of structures of trading rules
For the functions, we consider real and Boolean types. Among real functions, we can define the
moving average (MA), which is a function of time and past prices, as well as maximum and
minimum functions over time. A norm function is also useful. The real-valued operators
+, −, ∗, ÷ are needed as well. Boolean functions include the logical functions and, or, if-else
and not, together with the comparisons ≤, ≥ and the constants true, false.

Category            Elements
Real Functions      MA, numerical values, Max, Min, norm, lag
Real Operations     +, −, ∗, ÷
Boolean Functions   ≤, ≥, true, false, and, if-else

Table 2. Categories of functions in the structure of trading rules
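
To make the building blocks concrete, here is a minimal sketch of a recursive evaluator for rule trees written as nested tuples. The function names follow Table 2, but their exact semantics are assumptions made for illustration: windowed functions look back over the most recent prices, `lag` returns an earlier price, and `norm` is taken to be the absolute difference of its two arguments.

```python
import operator
from statistics import mean

# Assumed semantics for the operators of Table 2 (illustrative, not the thesis code).
BINARY = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
    "norm": lambda a, b: abs(a - b),
    "<=": operator.le, ">=": operator.ge,
    "and": lambda a, b: a and b, "or": lambda a, b: a or b,
}
WINDOWED = {"ma": mean, "max": max, "min": min}

def evaluate(node, prices):
    """Recursively evaluate a rule tree on a price history (latest price last)."""
    if not isinstance(node, tuple):
        return node                              # numeric or Boolean constant
    op, *args = node
    if op == "price":
        return prices[-1]                        # current price
    vals = [evaluate(a, prices) for a in args]   # evaluate the successors first
    if op in WINDOWED:
        return WINDOWED[op](prices[-int(vals[0]):])
    if op == "lag":
        return prices[-1 - int(vals[0])]
    if op == "not":
        return not vals[0]
    if op == "if-else":
        return vals[1] if vals[0] else vals[2]
    return BINARY[op](*vals)

# Usage: "price >= MA(3) and price <= Max(5)" on a short normalized price history.
prices = [1.00, 1.01, 0.99, 1.02, 1.03]
rule = ("and", (">=", ("price",), ("ma", 3)), ("<=", ("price",), ("max", 5)))
print(evaluate(rule, prices))
```

The root of a trading rule must return a Boolean, which is then read as the long or short signal for that minute.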




 Mathematical Models
 The goal of this problem is to earn an excess return over the passive buy-and-hold strategy and
over the market risk-free return. The question this thesis examines is whether there is any
trading rule, built from real and Boolean functions, that provides a stable excess return in the
stock market, in particular on the S&P 500 composite index. Let us begin with a single trade:
define $p_b$ and $p_s$ as the buy and sell prices respectively, let $c$ be the one-way transaction
fee, and let $r$ be the return of this single trade. Then
$$r = \frac{(1 - c)\,p_s}{(1 + c)\,p_b} - 1 = e^{\log\frac{p_s}{p_b} + \log\frac{1 - c}{1 + c}} - 1$$
In the market we consider two signals, bullish and bearish. If the forecast is bullish, we go long
and close any short position; similarly, if the forecast is bearish, we go short and close any long
position. Short selling is allowed in the US market and is supported by the alternative uptick
rule, which the SEC approved on February 24, 2010 and which replaced the old uptick rule. This
makes short selling very easy to implement, especially for a highly liquid composite stock index
such as the S&P 500, since a composite index is very unlikely to fall by more than 10% in a
minute, a day, or even a month. Hence there are four trading actions we can take: buy-to-open,
buy-to-close, sell-to-open and sell-to-close.
For time $t$, let $p_t$ be the closing price of the $t$-th minute. If we hold a long position, then
$$r_l(t) = \log\frac{p_t}{p_{t-1}}$$
is the continuously compounded return per minute at $t$. Likewise, let $r_s(t)$ be the return of a
short position at time $t$:
$$r_s(t) = \log\frac{p_{t-1}}{p_t}$$

Since we cannot be in the buy and sell states at the same time, we define two indicator
functions, $I_b(t)$ and $I_s(t)$, which equal 1 if a rule signals buy or sell respectively. It
follows that $I_b(t) \times I_s(t) = 0 \;\; \forall t$. If $I_b(t) = 1$, we buy-to-close the previous
sell-to-open position and buy-to-open; similarly, if $I_s(t) = 1$, we sell-to-close the previous
buy-to-open position and sell-to-open.
We can therefore model the continuously compounded return of a set of trades as
$$r = \sum_{t=t_1}^{T} r_l(t)\,I_b(t) + \sum_{t=t_1}^{T} r_s(t)\,I_s(t) + 2N \log\frac{1 - c}{1 + c}$$
where $N$ is the total number of trade points and $T$ is the last time.
For the passive strategy, buy and then hold until the last day, the return follows from the same
method:
$$r_{passive} = \log\frac{p_T}{p_{t_1}} + \log\frac{1 - c}{1 + c}$$
where $p_T$ is the closing price at the last time and $p_{t_1}$ is the closing price at the initial time.
Therefore, the excess return is
$$r_{excess} = r - r_{passive}$$
and the final return $R$ is
$$R = e^{r_{excess}} - 1$$
Since $e^{x} - 1$ is a monotonically increasing function and $r_{passive}$ does not depend on the
rule, we can consider $r$ directly for convenience.
So the problem becomes an optimization problem, in very brief form:
$$\text{maximize } R \quad \text{subject to } N,\; c,\; p_t \quad \forall t \in (t_1, T)$$

To solve this problem, we need to search through trading rules for a local optimum, because
this is clearly a nonlinear optimization problem that is not even convex. Nor can we go through
all trading rules, since without restrictions there are infinitely many. Genetic programming is
therefore well suited to this problem.
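
As a rough sketch of the return model above, the excess log return of a sequence of long and short positions could be computed as follows. The function name, the `positions` encoding (+1 for long, -1 for short), and the way $N$ is counted (one trade point per flip between long and short) are assumptions made for this example, not definitions taken from the thesis code.

```python
import math

def excess_log_return(prices, positions, c):
    """Excess log return of a long/short rule over buy-and-hold.

    prices    : closing prices p_{t_1}, ..., p_T
    positions : +1 (long) or -1 (short) held over each move prices[t-1] -> prices[t],
                so len(positions) == len(prices) - 1
    c         : one-way transaction fee as a fraction (0.00001 means 0.001%)
    """
    cost = math.log((1 - c) / (1 + c))
    # r_l(t) = log(p_t / p_{t-1}) when long; r_s(t) = -r_l(t) when short.
    gross = sum(pos * math.log(p1 / p0)
                for pos, p0, p1 in zip(positions, prices, prices[1:]))
    # Assumption: N counts the points where the position flips between long and short.
    n_trade_points = sum(a != b for a, b in zip(positions, positions[1:]))
    r = gross + 2 * n_trade_points * cost
    r_passive = math.log(prices[-1] / prices[0]) + cost
    return r - r_passive

# Usage: long on the way up, short on the way down, long again.
prices = [100.0, 101.0, 100.0, 102.0]
positions = [1, -1, 1]
r_excess = excess_log_return(prices, positions, c=0.0)
R = math.exp(r_excess) - 1
print(r_excess, R)
```

With a positive fee, each additional flip subtracts a fixed amount from the log return, which is why rules that trade very often are penalized heavily.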

 Implementation of Genetic Programming
 According to the results of (Sweeney, 1988), institutional investors could obtain one-way
transaction fees of 0.1%-0.2%, and traders on brokerage trading floors could get even lower fees.
In the 21st century, with the development of high-frequency trading and a much more competitive
brokerage environment, transaction fees for large trades have decreased significantly. Most prime
brokerages, such as Interactive Brokers and TD Ameritrade, offer fixed trading rates such as
$0.40 per 100 shares to hedge funds. For SPY, that is in the range of 0.001%-0.002%. According
to the results of (Brogaard, 2014), the mean transaction cost for the FTSE 250 is 0.15% per day,
based on 102 stocks and 9,129 million shares. I test all of these rates, as well as zero transaction
fees. Since the intraday data are one-minute, high-frequency data for the S&P 500, transaction
costs will significantly influence the results when the number of trades is very large, because the
index does not vary much from minute to minute. Sometimes the profit from minute-level
volatility, without leverage, cannot cover the transaction fee.
 First, an initial population of trading rules is generated randomly for one trial. During the
pre-decided training period, the rules are applied to the one-minute S&P 500 data. The genetic
algorithm then creates a new generation of rules by recombining parent rules. The best rule, i.e.
the rule that provides the maximum excess return, is applied to the selection period, which
validates the inferred trading rules. If the new rule provides a higher excess return than the
previous best, it is saved. If there is no improvement in the selection period for a pre-determined
number of generations, the evolution is terminated. The best rule is then used in the test period.
If no rule can beat the passive strategy, or even the risk-free compound return, the trial fails and
a new trial starts.
 The following is the process of the genetic algorithm for one trial in finding trading rules¹:

One trial of genetic programming to find trading rules

Step 1
Create a random rule.
Compute the excess return of this rule in the training period.
Repeat 500 times to form the initial population.

Step 2
Apply the rule with the highest excess return to the selection period and compute its excess
return there.
Save it as the initial best rule.

Step 3
Randomly pick two of the best rules as parents.
Crossover: create a new rule by breaking the parents apart at random and recombining them.
Mutation: randomly change parts of the offspring.
Compute the excess return of the new rule and save it in place of an old one.
Repeat 500 times.
This is one generation.

Step 4
Compute the excess return of the best rule in the selection period. If the return improves, save
the new rule as the best one.
Stop the trial if there is no improvement for 30 generations or after 50 generations in total;
otherwise go back to Step 3.


If this process produces a rule, the rule is applied to the test period, which is fixed to make
comparisons easy.
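
The four steps of one trial can be outlined in Python as below. This is a simplified sketch rather than the program used for the experiments: `new_rule`, `crossover`, `mutate`, `train_fit` and `select_fit` are hypothetical callables supplied by the caller, parents are drawn from the fitter half of the population, and each generation replaces the whole population instead of replacing old rules one at a time.

```python
import random

def run_trial(new_rule, crossover, mutate, train_fit, select_fit, rng,
              pop_size=500, patience=30, max_generations=50, p_mutation=0.1):
    """One trial: evolve rules on the training period, validate on the selection period."""
    # Step 1: random initial population (fitness is the training-period excess return).
    pop = [new_rule(rng) for _ in range(pop_size)]
    # Step 2: the best training rule, scored on the selection period, is the initial best.
    best = max(pop, key=train_fit)
    best_score = select_fit(best)
    stale = 0
    for _ in range(max_generations):
        # Step 3: breed one generation from parents drawn among the fitter rules.
        ranked = sorted(pop, key=train_fit, reverse=True)
        elite = ranked[: max(2, pop_size // 2)]
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(elite, 2)
            child = crossover(p1, p2, rng)
            if rng.random() < p_mutation:
                child = mutate(child, rng)
            children.append(child)
        pop = children
        # Step 4: keep the generation's best rule only if it improves on the selection period.
        candidate = max(pop, key=train_fit)
        score = select_fit(candidate)
        if score > best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
            if stale >= patience:        # no improvement for `patience` generations
                break
    return best, best_score

# Toy usage: a "rule" is a single number and fitness peaks at 0.3.
def toy_fitness(x):
    return -(x - 0.3) ** 2

best_rule, best_score = run_trial(
    new_rule=lambda rng: rng.random(),
    crossover=lambda a, b, rng: (a + b) / 2,
    mutate=lambda x, rng: x + rng.gauss(0, 0.05),
    train_fit=toy_fitness, select_fit=toy_fitness,
    rng=random.Random(0), pop_size=200)
print(best_rule, best_score)
```

Scoring candidates on a separate selection period, rather than on the training period alone, is what guards the procedure against keeping rules that merely overfit the training data.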


                                               
¹ I modified the code of (Allen, 1999), with permission, for my model and data. All code files and
log files are available on request by email: liu210@usc.edu


 Data and Numerical Simulations
 I use one-minute data for the S&P 500 index from July 1, 2012 to April 20, 2014, provided by
Wharton Research Data Services (WRDS). This period represents the post-crisis era of U.S.
stocks.

Figure 1. The trend graph of SPY from July 1st, 2012 to April 20th, 2014

Since the price of SPY varies from 130 to 190, I use the first day as the base to normalize the
price. Consider the 365-minute moving average, MA(365): each price from the 366th minute to
the end of the sample is divided by MA(365). The normalized price is then around 1, so the
usable time covers the rest of the timeline.
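
One way to read this normalization is sketched below, under the assumption that MA(365) is a rolling average over the 365 minutes preceding each price; the function name is hypothetical, and the thesis code may define the window differently.

```python
def normalize_by_moving_average(prices, window=365):
    """Divide each price from index `window` onward by the mean of the `window` prices before it."""
    normalized = []
    running_sum = sum(prices[:window])
    for t in range(window, len(prices)):
        normalized.append(prices[t] / (running_sum / window))
        # Slide the window forward by one minute.
        running_sum += prices[t] - prices[t - window]
    return normalized

# Usage with a tiny window so the arithmetic is easy to follow.
print(normalize_by_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2))
```

The first `window` prices are consumed by the moving average, so the normalized series is shorter than the raw one by that many observations.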
 I set the transaction fee to 0, 0.0005% and 0.001%. For each fee, there were 25 independent
trials based on 5 different training periods, each followed by a selection period. Each training
period lasts one month and each selection period lasts one week. For each pair of training and
selection periods, I ran 5 independent trials. After finding rules, I test them all in the same test
period, which lasts one year.

 Analysis of Results
 The results differ considerably across transaction fees.
 When $c = 0\%$, i.e. trading is free of cost, the model finds many rules that provide a large
excess return, and the trading frequency is very high.
Table of Results for 1st training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.3668 0.3668 0.3668 0.3659 0.3740 0.3681
Return 0.2009 0.7331 0.7331 0.7331 0.7315 0.7456 0.7353
Trades/Day 0 185.74 185.74 185.74 185.72 185.78 185.744

Table of Results for 2nd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.3668 0.3949 0.1172 0.3728 0.3293 0.3162
Return 0.2009 0.7331 0.7825 0.3503 0.7435 0.6693 0.6476
Trades/Day 0 185.74 139.12 143.38 185.74 115.50 153.896

Table of Results for 3rd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.4264 0.1299 -0.0031 0.1340 0.1233 0.1621
Return 0.2009 0.8395 0.3675 0.1972 0.3731 0.3585 0.4123
Trades/Day 0 115.58 62.14 128 31.68 126.44 92.788

Table of Results for 4th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.3466 0.3659 0.3467 -0.1713 -0.0804 0.1615
Return 0.2009 0.6984 0.7315 0.6986 0.0119 0.1082 0.4114
Trades/Day 0 156.36 185.72 156.38 75.76 77.84 130.412


Table of Results for 5th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.3411 0.3408 0.1821 0.3389 0.3421 0.3090
Return 0.2009 0.6891 0.6886 0.4408 0.6854 0.6908 0.6357
Trades/Day 0 137.2 137.2 146.54 137.22 137.22 139.076

The results above show that when trading is free of cost, the trading frequency is very high and
most rules provide more than 30% excess return, i.e. around 50% total return. Rules rarely lose
to the passive strategy. However, a zero transaction fee is not realistic outside of paper trading.
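
The tabulated values appear consistent with the following reading, which is an inference from the numbers rather than a statement in the text: "Compound" is a continuously compounded return (the excess return for the trial columns, the plain log return for the passive column), and "Return" is the corresponding total simple return. A short check on Trial 1 of the 1st training period:

```python
import math

passive_compound = 0.1831   # "Compound" entry of the passive strategy
rule_compound = 0.3668      # "Compound" entry of Trial 1, 1st training period, c = 0

# Passive: simple return from its log return.
passive_return = math.exp(passive_compound) - 1
# Rule: total simple return from excess log return plus the passive log return.
rule_total_return = math.exp(rule_compound + passive_compound) - 1

print(round(passive_return, 4), round(rule_total_return, 4))
```

Under this reading, a negative "Compound" entry means the rule underperformed buy-and-hold even when its "Return" entry is still positive.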
Besides, genetic programming also output rules in log files. Here is one simple example of them:
Id = 1388, fitness = 0.0057, birth = 2, time = 1140517.22:49:14 (parents: 475 and 1224)
0  Boolean  >
 1  Real  maximum
  2  Variable  price
  3  Real  constant  1.0046
 4  Real  data
  5  Variable  price
This rule is equivalent to
$$data(price) < \max(price,\ 1.0046)$$
That is, the rule tests whether the current price is less than the larger of 1.0046 and the current
price, where the price in the rule is the normalized price.
The rule presented in tree form:


>
├── maximum
│   ├── price
│   └── 1.0046
└── data
    └── price
Figure 2. The trading rule shown in tree form.
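
The indented node listing from the log file can be turned back into a nested expression with a small parser such as the one below. It assumes, as in the listing shown above, that the depth of a node is given by its number of leading spaces; the function names are hypothetical.

```python
def parse_rule_log(lines):
    """Rebuild a nested (label, children) tree from the indented node listing."""
    root, stack = None, []          # stack holds (depth, node) along the current path
    for line in lines:
        depth = len(line) - len(line.lstrip(" "))
        fields = line.split()
        # fields: node index, type, name, and for constants the numeric value
        label = fields[3] if fields[2] == "constant" else fields[2]
        node = (label, [])
        # Pop back up to the parent of this node.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1][1].append(node)
        else:
            root = node
        stack.append((depth, node))
    return root

def to_expression(node):
    """Render the tree as a prefix expression string."""
    label, children = node
    if not children:
        return label
    return f"{label}({', '.join(to_expression(c) for c in children)})"

log = [
    "0  Boolean  >",
    " 1  Real  maximum",
    "  2  Variable  price",
    "  3  Real  constant  1.0046",
    " 4  Real  data",
    "  5  Variable  price",
]
print(to_expression(parse_rule_log(log)))
```

For the listing above this yields the prefix form of the same comparison between the maximum node and the data node.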


Figure 3. Trading rule and normalized price


The figure above shows when trades are executed. The red line is the trading rule: while the
price is below the red line the position stays long, and when the price rises above the red line
the position turns short.
Some rules, however, are very complicated:
Id = 1258, fitness = 0.0057, birth = 2, time = 1140517.22:46:33 (parents: 977 and 718)
0  Boolean  <
 1  Real  data
  2  Variable  price
 3  Real  moving-average
  4  Variable  price
  5  Real  lag
   6  Real  +
    7  Real  data
     8  Variable  price
    9  Real  constant  0.8495
   10  Real  norm
    11  Real  *
     12  Real  constant  1.5848
     13  Real  data
      14  Variable  price
    15  Real  *
     16  Real  *
      17  Real  data
       18  Variable  price
      19  Real  constant  1.1515
     20  Real  constant  1.1120
This rule is equivalent to:
$$data(price) < MA(price,\ lag(0.304332 \times price,\ price + 0.8495))$$
Shown in tree form:
<
├── data
│   └── price
└── MA
    ├── price
    └── lag
        ├── +
        │   ├── data
        │   │   └── price
        │   └── 0.8495
        └── norm
            ├── *
            │   ├── 1.5848
            │   └── data
            │       └── price
            └── *
                ├── *
                │   ├── data
                │   │   └── price
                │   └── 1.1515
                └── 1.1120

When $c = 0.0005\%$, the results are not very stable, i.e. not all trials generate rules that
provide a positive excess return, although the majority of rules generate a positive total return.
The number of trades is still very large because trading is cheap.
Table of Results for 1st training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.1179 -0.0235 -0.1120 -0.1188 -0.1333 -0.1011
Return 0.2009 0.0674 0.1730 0.0737 0.0664 0.0511 0.0855
Trades/Day 0 185.73 166.19 185.75 185.72 186 181.878

Table of Results for 2nd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.0391 0.0391 0.0336 0.0391 0.0402 0.0382
Return 0.2009 0.2488 0.2488 0.2420 0.2488 0.2502 0.2477
Trades/Day 0 114.27 114.27 115.44 114.27 114.29 114.508

Table of Results for 3rd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.0321 -0.1085 0.0587 -0.1586 -0.0077 -0.0497
Return 0.2009 0.1630 0.0755 0.2735 0.0248 0.1917 0.1427
Trades/Day 0 1.33 36.28 32.43 6.80 2.17 15.802

Table of Results for 4th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.2839 -0.2836 -0.0620 -0.2839 -0.1096 -0.2046
Return 0.2009 -0.0959 -0.0956 0.1287 -0.0959 0.0763 -0.0213
Trades/Day 0 77.83 77.84 156.39 77.83 127.86 103.55

Table of Results for 5th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 0.1041 -0.0488 -0.0437 0.1671 -0.0160 0.0326
Return 0.2009 0.3327 0.1437 0.1496 0.4194 0.1819 0.2407
Trades/Day 0 114.32 156.97 156.33 100.05 137.23 132.98


When $c = 0.001\%$, the results become extremely poor, so I increased the number of generations
to 100 in the hope of finding better results. As in the previous case, many trials show the same
return; this is because they have already reached the best rule, i.e. the global optimum. With
this higher transaction cost, no rule with a positive excess return is found, and only a few rules
provide even a positive total return.
Table of Results for 1st training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.6035 -0.6027 -0.6027 -0.6027 -0.5779 -0.5979
Return 0.2009 -0.3432 -0.3427 -0.3427 -0.3427 -0.3262 -0.3395
Trades/Day 0 185.72 185.73 185.73 185.73 168.09 182.2

Table of Results for 2nd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.2586 -0.3000 -0.2724 -0.2034 -0.1813 -0.2431
Return 0.2009 -0.0727 -0.1103 -0.0854 -0.0201 0.0018 -0.0582
Trades/Day 0 6.18 7.56 4.22 4.27 5.61 5.568

Table of Results for 3rd training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 NaN -0.0142 -0.4249 -0.0379 -0.0979 -0.1437
Return 0.2009 NaN 0.1840 -0.2148 0.1563 0.0889 0.0402
Trades/Day 0 0 1.92 8.86 0.65 9.22 5.1625

Table of Results for 4th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.1875 -0.2843 -0.3206 -0.3297 -0.3521 -0.2948
Return 0.2009 -0.0044 -0.0962 -0.1285 -0.1364 -0.1555 -0.1057
Trades/Day 0 6.35 4.41 0.47 6.35 2.14 3.944




Table of Results for 5th training period
Rules Passive Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Average
Compound 0.1831 -0.3752 -0.5968 -0.0916 -0.3742 -0.3742 -0.3624
Return 0.2009 -0.1748 -0.3388 0.0958 -0.1740 -0.1740 -0.1641
Trades/Day 0 137.21 185.75 100.08 137.23 137.23 139.5

When $c = 0.1\%$, the model has to evolve for over 200 generations to find a rule, and even then
not in every trial. The only rule found still loses to the passive strategy. In the other training
periods, the model cannot find a single rule.
Table of Results for 1st training period
Rules Passive Trial 5
Compound 0.1811 -0.0269
Return 0.1985 0.1667
Trades/Day 0 0.05

We can see that when the transaction cost is higher still, the number of trades becomes very
small, i.e. the trading frequency is very low, for example only one trade a month. High-frequency
data are not very meaningful in this situation; daily or other longer-period data would be better.
Put another way, high-frequency data show the short-term trend of the market, and the moving
average is a trend-following method, so analysis of high-frequency data produces a large number
of trades. If the transaction cost is relatively high, high-frequency trading is not worthwhile.

 Further Discussion
 As the analysis above shows, this genetic-algorithm trading model is not ready for real-life
commercial use, because it is too sensitive to transaction costs. Genetic programming is also not
very stable, since its initial population is generated randomly. Compared with (Allen, 1999), I
take short positions into account and work with a high-frequency financial data set. When facing
high-frequency intraday data, however, institutional investors use leverage to overcome
transaction fees. The computation also ignores dividend payments, which may introduce seasonal
distortions into the results and underestimates the return of the passive strategy. We can see
from the above that different training periods generate quite different results. If the training
period falls in a bearish trend, the resulting rules are of course not suitable for a bullish trend.
Similarly, rules trained in a bullish trend perform badly when the selection and test periods do
not share that trend.
 Apart from the above, this thesis considers only the moving average and other simple functions
of price. There are other common technical indicators, such as MACD, RSI and KDJ. Those
indicators are more complicated and harder to incorporate into the tree structure of genetic
programming.
 Moreover, this model could also be used in other liquid markets, such as futures markets, forex
markets and other composite indices. The model also does not consider leverage. With leverage,
the performance of the algorithm under high trading fees would improve considerably, since any
small price movement could be multiplied by 10x or more. Of course, the cost of leverage should
also be considered in that case.



Summary
I use genetic programming to find technical trading rules of S&P 500 index, using one-
minute high frequency intraday data during about one and half year. The model in this paper also
considers short sell when necessary. Without or with very low transaction fee, the model finds
several rules that provide positive excess return, i.e. return over return of passive strategy (buy
and hold). While when the transaction cost is high enough, there is no rule that can generate
positive excess return. And when transaction cost is greater, it is very hard to apply the model to
high frequency data. But it can work well on daily data.  
The model I build is a very direct one. I use limited information as inputs of the
algorithm. Besides, the parameters of model are not necessary optimal. There is much potential
for this model.
Furthermore, in practical investment, especially in high-frequency trading, the
requirement on the speed of the algorithm is very high. Genetic programming is efficient but not fast
enough. When the training period is long, the speed decreases noticeably. Besides, the process
takes more than one minute. To use it in a real-time model, it has to be optimized to
run faster.  
In conclusion, the model can find profitable rules when the transaction cost is very low.
When the cost becomes high, however, there is barely any rule that provides positive excess return. In my view,
when the cost is very high, the model should be applied to longer-term data rather than
high-frequency data.  





References
Alexander, S. (1961). Price Movements in speculative markets: trends or random walks.
Industrial Management Review 2, 7-26.
Allen, F. (1999). Using Genetic Algorithms to Find Technical Trading Rules. Journal of
Financial Economics 51, 245-271.
Bauer, R. J. (1994). Genetic Algorithms and Investment Strategies. New York: Wiley.
Brock, W. L. (1992). Simple technical trading rules and the stochastic properties of stock returns.
Journal of Finance 47, 1731-1764.
Brogaard, J. (2014). High-Frequency Trading and the Execution Cost of Institutional Investors.
The Financial Review 49, 345-369.
Dacorogna, M. M., U. A. Muller and R. B. Olsen. (2001). An introduction to high-frequency
Finance. San Diego: Academic Press.
Fama, E. B. (1966). Filter rules and stock market trading. Security prices: a supplement. Journal
of Business 39, 226-241.
Gencay, R. (1998). The predictability of Security Returns with Simple Technical Trading Rules.
Journal of Empirical Finance 5, 347-359.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning.
Addison Wesley Publishing Company.
Koza. (1992). Genetic Programming: On the Programming of Computers by Means of Natural
Selection. Cambridge: MIT Press.
Stephen Boyd, Lieven Vandenberghe. (2004). Convex Optimization. Cambridge: Cambridge
University Press.

Sweeney, R. (1988). Some new filter rule tests: methods and results. Journal of Financial and
Quantitative Analysis 23, 285-300. 
Asset Metadata
Creator Liu, Cheng (author) 
Core Title Finding technical trading rules in high-frequency data by using genetic programming 
Contributor Electronically uploaded by the author (provenance) 
School College of Letters, Arts and Sciences 
Degree Master of Science 
Degree Program Applied Mathematics 
Publication Date 07/24/2014 
Defense Date 06/23/2014 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag excess return,genetic programming,high frequency,OAI-PMH Harvest,S,technical trading rules,Tree 
Format application/pdf (imt) 
Language English
Advisor Lototsky, Sergey V. (committee chair), Mancera, Ricardo (committee member), Sacker, Robert (committee member) 
Creator Email liu.paul813@gmail.com 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c3-448943 
Unique identifier UC11287043 
Identifier etd-LiuCheng-2742.pdf (filename),usctheses-c3-448943 (legacy record id) 
Legacy Identifier etd-LiuCheng-2742.pdf 
Dmrecord 448943 
Document Type Thesis 
Rights Liu, Cheng 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA