How To Calculate Technical Indicators For Trading Using Pandas

FXC911 – Forex & Crypto News & Insights

Who Can Follow This Article?

  • No need to be an expert

  • Some knowledge of classic technical indicators (RSI, moving averages, ...)

  • Ideally, some experience with basic Pandas functionality

  • A rough idea of how machine learning predictions are used in the chaos of Forex and stock markets

What Is Included In This Article?

  • Download price data from an online source

  • Load the data into our framework

  • Check and clean the data

  • Run a quick statistical analysis to reveal hidden patterns

  • Can predictions be made from technical indicators? The statistics will answer

Dow Jones Industrial Average (DJIA)

The US30 Index, often referred to as the Dow Jones Industrial Average (DJIA), is one of the most well-known stock market indices in the United States. Here’s a brief overview:

  1. Composition: The US30 Index tracks 30 of the largest and most influential publicly traded companies in the U.S. It includes major corporations across various industries, such as technology, finance, healthcare, and consumer goods.

  2. Historical Significance: Established in 1896 by Charles Dow, it was one of the first indices to represent the overall performance of the U.S. stock market. The DJIA has a long history and is often used as a barometer for the health of the U.S. economy.

  3. Price-Weighted: Unlike market-capitalization-weighted indices, the DJIA is price-weighted. This means that stocks with higher prices have more influence on the index's movement than those with lower prices.

  4. Purpose: Investors and analysts use the DJIA to gauge the overall performance of the stock market and the U.S. economy. It provides a snapshot of how the largest companies in the country are performing.

  5. Updates: The components of the DJIA are periodically reviewed and adjusted. Changes are made to reflect shifts in the economy and industry dynamics.

Historical Data Feed

Dow Jones Dataset

The dataset above is a CSV file containing candlestick data for the US30 index (also known as the Dow Jones Industrial Average, DJIA) at a 4-hour interval, covering the period from August 24, 2023, to August 24, 2024. Here's a summary of the dataset:

1. Columns:

  1. Local time: The timestamp of each data point in the format DD.MM.YYYY HH:MM:SS.SSS GMT+0330.

  2. Open: The opening price of the index for the 4-hour interval.

  3. High: The highest price of the index during the 4-hour interval.

  4. Low: The lowest price of the index during the 4-hour interval.

  5. Close: The closing price of the index for the 4-hour interval.

  6. Volume: The trading volume during the 4-hour interval.

2. Dataset Details:

  • Number of Entries: 2,197 rows

  • Data Types:

    • The Local time column is of type object (string).

    • The Open, High, Low, Close, and Volume columns are of type float64.

Prerequisite

python -m pip install pandas 
python -m pip install numpy
python -m pip install pandas_ta

These commands are used to install Python packages via pip, the package installer for Python. Here’s what each one does:

  1. python -m pip install pandas: This installs the pandas library, which is a powerful data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, and functions to manipulate and analyze data.

  2. python -m pip install numpy: This installs the numpy library, which is fundamental for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions to operate on these data structures.

  3. python -m pip install pandas_ta: This installs the pandas_ta library, which is a technical analysis library for pandas. It adds a wide range of technical indicators and trading strategy tools to pandas, making it useful for financial and trading applications.

These packages are often used together for data analysis and financial modeling.
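
As a quick sanity check (a small sketch, not part of the original tutorial), you can confirm that all three packages import correctly before continuing:

# Verify that the required packages are installed and importable
import pandas as pd
import numpy as np
import pandas_ta as ta  # registers the DataFrame .ta accessor used later

print("pandas:", pd.__version__)
print("numpy:", np.__version__)

If pandas_ta fails to import, check that its version is compatible with the installed pandas and numpy.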

Loading And Preparing Data

import pandas as pd
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv")
df.tail()

1. What this code does:

The code reads a CSV file containing candlestick data (price data for a financial instrument over 4-hour intervals) into a DataFrame and then displays the last 5 rows. The purpose of this snippet is to confirm that the data loaded correctly and to inspect the most recent entries.

2. Result

	Local time	Open	High	Low	Close	Volume
2192	23.08.2024 07:30:00.000 GMT+0330	40804.999	40842.599	40803.739	40819.099	0.1609
2193	23.08.2024 11:30:00.000 GMT+0330	40817.099	40899.051	40808.063	40894.503	0.2315
2194	23.08.2024 15:30:00.000 GMT+0330	40895.063	41207.739	40839.787	41030.763	3.3244
2195	23.08.2024 19:30:00.000 GMT+0330	41034.727	41212.239	40834.751	41164.203	3.5005
2196	23.08.2024 23:30:00.000 GMT+0330	41166.215	41180.727	41153.275	41162.775	0.0983

3. Code Breakdown

This code is written in Python and uses the pandas library, which is commonly used for data manipulation and analysis. Here's a breakdown of what each line does:

  1. import pandas as pd: This line imports the pandas library and gives it the alias pd. This is a standard convention to make the code more concise when using pandas functions.

  2. df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"): This line reads a CSV (Comma-Separated Values) file named "USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv" into a DataFrame (df). A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure in pandas, similar to a table in a database or an Excel spreadsheet. The pd.read_csv() function is used to load the data from the CSV file into this DataFrame.

  3. df.tail(): This line returns the last 5 rows of the DataFrame df by default. The tail() function is useful to quickly inspect the end of a dataset, which might show you the most recent data points or the conclusion of the dataset.
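
To tie this back to the dataset summary above (2,197 rows, Local time stored as a string, the other columns as float64), a quick inspection is useful here. The datetime conversion below is an optional sketch that assumes the timestamp format shown earlier (DD.MM.YYYY HH:MM:SS.SSS GMT+0330); adjust the format string if your export differs:

print(df.shape)   # expected: (2197, 6)
print(df.dtypes)  # 'Local time' is object; the price and volume columns are float64

# Optional: parse 'Local time' into timezone-aware timestamps
df['Local time'] = pd.to_datetime(df['Local time'],
                                  format='%d.%m.%Y %H:%M:%S.%f GMT%z')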

Check For Zero-Volume Rows

# Find the indices where 'Volume' is 0
indexZeros = df[df['Volume'] == 0].index

# Drop the rows where 'Volume' is 0
df.drop(indexZeros, inplace=True)

# Verify that no rows with 'Volume' == 0 remain
zero_volume_rows = df[df['Volume'] == 0]

# Print the result of the check (should be an empty DataFrame)
print(zero_volume_rows)

1. What this code does:

  • The code filters and removes all rows where the Volume is 0 from the DataFrame df.

  • The last line would check for any remaining rows where Volume is 0 (there should be none after the drop operation).

2. Result

Local time	Open	High	Low	Close	Volume

3. Code Breakdown

  • indexZeros = df[df['Volume'] == 0].index:

    • This line identifies all rows in the DataFrame df where the value in the Volume column is 0.

    • df['Volume'] == 0 creates a boolean mask where each entry is True if the corresponding row's Volume is 0, and False otherwise.

    • df[df['Volume'] == 0] filters the DataFrame to only include rows where the Volume is 0.

    • The .index attribute retrieves the indices (row labels) of these filtered rows.

    • The resulting indices (where Volume is 0) are stored in the variable indexZeros.

  • df.drop(indexZeros, inplace=True):

    • This line removes all rows from the DataFrame df that have indices listed in indexZeros.

    • indexZeros contains the indices of all rows where Volume is 0.

    • The inplace=True argument means that the DataFrame df is modified directly, without creating a copy.

    • As a result, all rows with Volume equal to 0 are permanently removed from df.

  • zero_volume_rows = df[df['Volume'] == 0] and print(zero_volume_rows):

    • These lines select and display any rows in the DataFrame df where the Volume column is still 0.

    • df['Volume'] == 0 is a boolean condition that checks which rows have a Volume of 0, and df[df['Volume'] == 0] filters the DataFrame down to just those rows.

    • After the previous drop operation there should ideally be no rows where Volume is 0, so printing the selection (which comes back empty, as shown in the result above) confirms that all such rows were correctly removed.
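
An equivalent, more compact alternative (a sketch, not the original code) keeps only the non-zero-volume rows with a single boolean mask:

# Same effect as the drop above: keep only rows with non-zero volume
df = df[df['Volume'] != 0]

Either approach produces the same DataFrame; the one-liner simply skips building the index list first.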

Check for missing values

df.isna().sum()

1. What this code does:

df.isna().sum() returns a Series where each index is a column name from df, and each value is the count of missing values in that column.

2. Result

Local time    0
Open          0
High          0
Low           0
Close         0
Volume        0
dtype: int64

3. Code Breakdown

Here's a step-by-step explanation of what each part does:

  1. df.isna():

    • The isna() function (also available as isnull()) is a pandas method used on a DataFrame (df) to identify missing values.

    • It returns a DataFrame of the same shape as df, where each element is a boolean (True or False):

      • True indicates that the value is missing (NaN).

      • False indicates that the value is not missing.

  2. .sum():

    • The sum() function, when applied to a DataFrame with boolean values (True as 1 and False as 0), sums up the True values along the specified axis.

    • By default, sum() operates column-wise (i.e., along axis=0).

    • Therefore, df.isna().sum() calculates the total number of missing values (NaNs) in each column of the DataFrame df.
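
Here every count is 0, so no cleaning is needed. If a future export did contain gaps, a common approach for OHLC data (shown here only as a hedged sketch) is to forward-fill prices or drop incomplete rows:

# Only needed if df.isna().sum() reports missing values
df = df.ffill()          # carry the last known value forward
# or: df = df.dropna()   # discard incomplete rows instead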

Imports and Data Preparation

1. Imports:

import numpy as np
import pandas_ta as ta
from scipy.stats import linregress

  • numpy: Used for numerical operations, particularly for handling arrays.

  • pandas_ta: A library that provides technical analysis indicators for pandas DataFrames.

  • scipy.stats.linregress: A function used to perform linear regression and compute the slope.

2. Calculating Technical Indicators:

df['ATR'] = df.ta.atr(length=20)
df['RSI'] = df.ta.rsi()
df['Average'] = df.ta.midprice(length=1)  # midprice
df['MA40'] = df.ta.sma(length=40)
df['MA80'] = df.ta.sma(length=80)
df['MA160'] = df.ta.sma(length=160)

  • ATR (Average True Range): Measures market volatility over the past 20 periods.

  • RSI (Relative Strength Index): Measures the speed and change of price movements; the default length is 14 periods when not specified.

  • Average: The midprice of each bar. With length=1 this is simply the average of that bar's high and low.

  • MA40, MA80, MA160: Simple Moving Averages over 40, 80, and 160 periods, respectively.
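
To make it clearer what these pandas_ta calls compute, here is a rough plain-pandas equivalent for the moving averages and the midprice, plus an RSI sketch. The ATR and RSI in pandas_ta use Wilder-style smoothing internally, so this hand-rolled RSI may differ slightly from df.ta.rsi():

# Plain-pandas equivalents (illustrative sketch only)
ma40_manual     = df['Close'].rolling(40).mean()   # same idea as df.ta.sma(length=40)
midprice_manual = (df['High'] + df['Low']) / 2     # df.ta.midprice(length=1), bar by bar

# Rough RSI using Wilder-style smoothing
delta = df['Close'].diff()
gain  = delta.clip(lower=0).ewm(alpha=1/14, adjust=False).mean()
loss  = (-delta.clip(upper=0)).ewm(alpha=1/14, adjust=False).mean()
rsi_manual = 100 - 100 / (1 + gain / loss)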

3. Define get_slope Function:

def get_slope(array):
    y = np.array(array)
    x = np.array(array)
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope

  • This function is intended to calculate the slope of a line fitted to the given array using linear regression. However, there is a logical error: both x and y are set to the same array, so the regression always returns a slope of 1. Instead, x should be the positions of the elements (0, 1, 2, ...) and y the values; see the corrected sketch below.
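
A corrected version might look like the sketch below. This is a fix for the issue just described, not the code the results further below were produced with (get_slope_fixed is a name introduced here for illustration):

def get_slope_fixed(array):
    # y = the values inside the rolling window, x = their positions 0..N-1
    y = np.array(array)
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope

Because the original version regresses the array against itself, every slope column in the result table below comes out as exactly 1.0; swapping in get_slope_fixed would yield the actual per-window slopes instead.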

4. Set Rolling Window Size:

backrollingN = 6

  • Defines the size of the rolling window for applying the get_slope function.

5. Calculate Slopes:

df['slopeMA40'] = df['MA40'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA80'] = df['MA80'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA160'] = df['MA160'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['AverageSlope'] = df['Average'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['RSISlope'] = df['RSI'].rolling(window=backrollingN).apply(get_slope, raw=True)

  • For each of the columns (MA40, MA80, MA160, Average, RSI), a rolling window of size backrollingN is applied.

  • The get_slope function is used to compute the slope of the moving average, average price, or RSI over each window.

  • raw=True ensures that the rolling window provides raw NumPy arrays to the get_slope function.

6. Display the DataFrame:

df.tail()

  • Displays the last few rows of the DataFrame to inspect the results.

7. Result

	Local time	Open	High	Low	Close	Volume	ATR	RSI	Average	MA40	MA80	MA160	slopeMA40	slopeMA80	slopeMA160	AverageSlope	RSISlope
2192	23.08.2024 07:30:00.000 GMT+0330	40804.999	40842.599	40803.739	40819.099	0.1609	124.245864	55.787964	40823.169	40700.3338	40033.09270	40138.774363	1.0	1.0	1.0	1.0	1.0
2193	23.08.2024 11:30:00.000 GMT+0330	40817.099	40899.051	40808.063	40894.503	0.2315	122.582971	60.139694	40853.557	40722.4529	40057.39270	40138.476237	1.0	1.0	1.0	1.0	1.0
2194	23.08.2024 15:30:00.000 GMT+0330	40895.063	41207.739	40839.787	41030.763	3.3244	134.851422	66.547493	41023.763	40746.5160	40080.90565	40140.769987	1.0	1.0	1.0	1.0	1.0
2195	23.08.2024 19:30:00.000 GMT+0330	41034.727	41212.239	40834.751	41164.203	3.5005	146.983251	71.396852	41023.495	40772.9939	40108.13175	40143.642337	1.0	1.0	1.0	1.0	1.0
2196	23.08.2024 23:30:00.000 GMT+0330	41166.215	41180.727	41153.275	41162.775	0.0983	141.006689	71.277773	41167.001	40798.7129	40136.31395	40146.574663	1.0	1.0	1.0	1.0	1.0

Target Categories

#Target flexible way
pipdiff = 100 #for TP
SLTPRatio = 2 #pipdiff/Ratio gives SL

def mytarget(barsupfront, df1):
    length = len(df1)
    high = list(df1['High'])
    low = list(df1['Low'])
    close = list(df1['Close'])
    open = list(df1['Open'])
    trendcat = [None] * length
    
    for line in range(0, length-barsupfront-2):
        valueOpenLow = 0
        valueOpenHigh = 0
        for i in range(1,barsupfront+2):
            value1 = open[line+1]-low[line+i]
            value2 = open[line+1]-high[line+i]
            valueOpenLow = max(value1, valueOpenLow)
            valueOpenHigh = min(value2, valueOpenHigh)

            if ( (valueOpenLow >= pipdiff) and (-valueOpenHigh <= (pipdiff/SLTPRatio)) ):
                trendcat[line] = 1 # downtrend
                break
            elif ( (valueOpenLow <= (pipdiff/SLTPRatio)) and (-valueOpenHigh >= pipdiff) ):
                trendcat[line] = 2 # uptrend
                break
            else:
                trendcat[line] = 0 # no clear trend
            
    return trendcat


# mytarget(barsfront to take into account, dataframe)
df['mytarget'] = mytarget(16, df)
df.head()

Let's break down and interpret the given Python code, which is used to determine trend categories based on historical price data.

1. Code Overview:

Parameters:

  • pipdiff = 100: This sets the take-profit (TP) threshold, i.e. the price move (in index points) required before a move is considered a significant trend.

  • SLTPRatio = 2: This ratio is used to derive the Stop Loss (SL) threshold from pipdiff (pipdiff / SLTPRatio = 50 points).

Function: mytarget(barsupfront, df1)

  1. Input Parameters:

    • barsupfront: This parameter defines how many bars (data points) ahead to consider for trend analysis.

    • df1: A DataFrame containing historical price data with columns 'High', 'Low', 'Close', and 'Open'.

  2. Initial Setup:

    • length = len(df1): The number of rows in the DataFrame.

    • high, low, close, open: Lists containing the high, low, close, and open prices from the DataFrame.

    • trendcat: A list initialized with None values to store the trend category for each row.

  3. Main Loop:

    • The loop iterates from 0 up to (but not including) length - barsupfront - 2. This ensures that we have enough future data points for the look-ahead within the bounds of the DataFrame.

    • valueOpenLow and valueOpenHigh are initialized to track the maximum and minimum values of differences between the current open price and the future low and high prices.

  4. Inner Loop:

    • Iterates from 1 to barsupfront + 1. For each iteration:

      • value1 is the difference between the current open price and the future low price.

      • value2 is the difference between the current open price and the future high price.

      • valueOpenLow tracks the maximum of value1 across all iterations.

      • valueOpenHigh tracks the minimum of value2 across all iterations.

  5. Trend Detection:

    • If valueOpenLow >= pipdiff and -valueOpenHigh <= (pipdiff / SLTPRatio), it suggests a downtrend, and trendcat[line] is set to 1.

    • If valueOpenLow <= (pipdiff / SLTPRatio) and -valueOpenHigh >= pipdiff, it suggests an uptrend, and trendcat[line] is set to 2.

    • If neither condition is met, it indicates no clear trend, and trendcat[line] is set to 0.

  6. Return Value:

    • The function returns the trendcat list, which contains the trend category for each row in the DataFrame.
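
To make the classification concrete, here is a tiny example with made-up prices (a sketch that assumes pipdiff = 100, SLTPRatio = 2 and mytarget() are defined exactly as in the code block above):

# Hypothetical 5-bar DataFrame to illustrate the labelling
toy = pd.DataFrame({
    'Open':  [40000, 40005, 40010, 40050, 40120],
    'High':  [40010, 40020, 40060, 40130, 40150],
    'Low':   [39990, 39995, 39980, 40030, 40100],
    'Close': [40005, 40010, 40050, 40120, 40140],
})
print(mytarget(2, toy))   # -> [2, None, None, None, None]

Relative to the reference open (40005), the price rises about 125 points (>= pipdiff = 100) while never dipping more than about 25 points (<= pipdiff / SLTPRatio = 50), so the first bar is labelled 2 (uptrend). The trailing bars have too little look-ahead data and keep their initial None value.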

2. Result

Local time	Open	High	Low	Close	Volume	ATR	RSI	Average	MA40	MA80	MA160	slopeMA40	slopeMA80	slopeMA160	AverageSlope	RSISlope	mytarget
0	23.08.2023 23:30:00.000 GMT+0330	34475.959	34509.449	34455.409	34474.699	0.2835	NaN	NaN	34482.429	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
1	24.08.2023 03:30:00.000 GMT+0330	34474.999	34526.497	34454.469	34503.439	0.3250	NaN	NaN	34490.483	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
2	24.08.2023 07:30:00.000 GMT+0330	34503.447	34556.797	34443.209	34458.499	0.3351	NaN	NaN	34500.003	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
3	24.08.2023 11:30:00.000 GMT+0330	34459.499	34493.297	34387.799	34423.999	0.6830	NaN	NaN	34440.548	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
4	24.08.2023 15:30:00.000 GMT+0330	34423.229	34680.469	34260.499	34264.289	6.8789	NaN	NaN	34470.484	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0

Quick Analysis Example

import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
df_model= df[['Volume', 'ATR', 'RSI', 'Average', 'MA40', 'MA80', 'MA160', 'slopeMA40', 'slopeMA80', 'slopeMA160', 'AverageSlope', 'RSISlope', 'mytarget']] 
df_model.hist(ax = ax)
plt.show()

This code snippet creates a figure and plots histograms for multiple columns from the DataFrame df. Each histogram shows the distribution of values in the specified columns, helping visualize the spread and frequency of the data for each feature.
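
A complementary quick check (a small sketch, not part of the original snippet) is to look at how the target classes are distributed, since a heavily imbalanced target would make the histograms harder to compare:

# How many bars fall into each trend category?
print(df['mytarget'].value_counts(dropna=False))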

Can We Use The RSI Alone As a Trend Indicator?

from matplotlib import pyplot
df_up=df.RSI[ df['mytarget'] == 2 ]
df_down=df.RSI[ df['mytarget'] == 1 ]
df_unclear=df.RSI[ df['mytarget'] == 0 ]
pyplot.hist(df_unclear, bins=100, alpha=0.5, label='unclear')
pyplot.hist(df_down, bins=100, alpha=0.5, label='down')
pyplot.hist(df_up, bins=100, alpha=0.5, label='up')

pyplot.legend(loc='upper right')
pyplot.show()

This code generates histograms to visualize the distribution of RSI values by trend category. By plotting separate histograms for the RSI values in uptrends, downtrends, and unclear trends, you can compare the distributions and see how RSI varies across the categories.
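
To put numbers behind the visual comparison, you can also summarise the RSI per trend category; this is a minimal sketch of the "statistics will answer" check promised in the introduction:

# Mean and spread of RSI within each trend category
print(df.groupby('mytarget')['RSI'].agg(['count', 'mean', 'std']))

If the mean RSI differs clearly between the up and down categories, that supports using RSI as one input feature; if the distributions overlap heavily, RSI alone is unlikely to separate the trends.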
