How To Calculate Technical Indicators For Trading Using Pandas
FXC911 – Forex & Crypto News & Insights
Who Can Follow This Article?
No need to be an expert
Some knowledge about classic technical indicators (RSI, moving average ...)
Maybe you are already experienced in basic Pandas functionalities
A quick idea on machine learning predictions in the Forex or stocks Chaos
What Is Included In This Article?
Download prices data from an online source
Load the data into our framework
Check and Clean data content
Quick statistical analysis to reveal hidden patterns
Is it possible to make predictions based on technical indicators? Stats will answer
Dow Jones Industrial Average (DJIA)

The US30 Index, often referred to as the Dow Jones Industrial Average (DJIA), is one of the most well-known stock market indices in the United States. Here’s a brief overview:
Composition: The US30 Index tracks 30 of the largest and most influential publicly traded companies in the U.S. It includes major corporations across various industries, such as technology, finance, healthcare, and consumer goods.
Historical Significance: Established in 1896 by Charles Dow, it was one of the first indices to represent the overall performance of the U.S. stock market. The DJIA has a long history and is often used as a barometer for the health of the U.S. economy.
Price-Weighted: Unlike market-capitalization-weighted indices, the DJIA is price-weighted. This means that stocks with higher prices have more influence on the index's movement than those with lower prices.
Purpose: Investors and analysts use the DJIA to gauge the overall performance of the stock market and the U.S. economy. It provides a snapshot of how the largest companies in the country are performing.
Updates: The components of the DJIA are periodically reviewed and adjusted. Changes are made to reflect shifts in the economy and industry dynamics.
Historical Data Feed
Above dataset is a CSV file containing candlestick data for the USA30 index (also known as the Dow Jones Industrial Average, DJIA) on a 4-hour interval basis, covering the period from August 24, 2023, to August 24, 2024. Here's a summary of the dataset:
1. Columns:
Local time: The timestamp of each data point in the format
DD.MM.YYYY HH:MM:SS.SSS GMT+0330.Open: The opening price of the index for the 4-hour interval.
High: The highest price of the index during the 4-hour interval.
Low: The lowest price of the index during the 4-hour interval.
Close: The closing price of the index for the 4-hour interval.
Volume: The trading volume during the 4-hour interval.
2. Dataset Details:
Number of Entries: 2,197 rows
Data Types:
The
Local timecolumn is of typeobject(string).The
Open,High,Low,Close, andVolumecolumns are of typefloat64.
Prerequisite
python -m pip install pandas 
python -m pip install numpy
python -m pip install pandas_taThese commands are used to install Python packages via pip, the package installer for Python. Here’s what each one does:
python -m pip install pandas: This installs thepandaslibrary, which is a powerful data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, and functions to manipulate and analyze data.python -m pip install numpy: This installs thenumpylibrary, which is fundamental for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions to operate on these data structures.python -m pip install pandas_ta: This installs thepandas_talibrary, which is a technical analysis library forpandas. It adds a wide range of technical indicators and trading strategy tools topandas, making it useful for financial and trading applications.
These packages are often used together for data analysis and financial modeling.
Loading And Preparing Data
import pandas as pd
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv")
df.tail()1. What this code does:
The code reads a CSV file containing candlestick data (price data for financial instruments over 4-hour intervals) into a DataFrame and then displays the last 5 rows of this data. The purpose of this code snippet is likely to check the last few entries in the dataset to see if they were loaded correctly or to inspect the most recent data.
2. Result
	Local time	Open	High	Low	Close	Volume
2192	23.08.2024 07:30:00.000 GMT+0330	40804.999	40842.599	40803.739	40819.099	0.1609
2193	23.08.2024 11:30:00.000 GMT+0330	40817.099	40899.051	40808.063	40894.503	0.2315
2194	23.08.2024 15:30:00.000 GMT+0330	40895.063	41207.739	40839.787	41030.763	3.3244
2195	23.08.2024 19:30:00.000 GMT+0330	41034.727	41212.239	40834.751	41164.203	3.5005
2196	23.08.2024 23:30:00.000 GMT+0330	41166.215	41180.727	41153.275	41162.775	0.09833. Code Breakdown
The code you provided is written in Python and uses the pandas library, which is commonly used for data manipulation and analysis. Here's a breakdown of what each line does:
import pandas as pd: This line imports thepandaslibrary and gives it the aliaspd. This is a standard convention to make the code more concise when usingpandasfunctions.df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"): This line reads a CSV (Comma-Separated Values) file named"USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"into a DataFrame (df). A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure inpandas, similar to a table in a database or an Excel spreadsheet. Thepd.read_csv()function is used to load the data from the CSV file into this DataFrame.df.tail(): This line returns the last 5 rows of the DataFramedfby default. Thetail()function is useful to quickly inspect the end of a dataset, which might show you the most recent data points or the conclusion of the dataset.
Check If Any Zero Volume Are Available
# Find the indices where 'Volume' is 0
indexZeros = df[df['Volume'] == 0].index
# Drop the rows where 'Volume' is 0
df.drop(indexZeros, inplace=True)
# Correctly select rows where 'Volume' is 0 (if needed for verification)
zero_volume_rows = df[df['Volume'] == 0]
# If you want to print or see the result of the selection
print(zero_volume_rows)1. What this code does:
The code filters and removes all rows where the Volume is 0 from the DataFrame
df.The last line would check for any remaining rows where
Volumeis0(there should be none after thedropoperation).
2. Result
Local time	Open	High	Low	Close	Volume
3. Code Breakdown
indexZeros = df[df['Volume'] == 0].index:This line identifies all rows in the DataFrame
dfwhere the value in theVolumecolumn is0.df['Volume'] == 0creates a boolean mask where each entry isTrueif the corresponding row'sVolumeis0, andFalseotherwise.df[df['Volume'] == 0]filters the DataFrame to only include rows where theVolumeis0.The
.indexattribute retrieves the indices (row labels) of these filtered rows.The resulting indices (where
Volumeis0) are stored in the variableindexZeros.
df.drop(indexZeros, inplace=True):This line removes all rows from the DataFrame
dfthat have indices listed inindexZeros.indexZeroscontains the indices of all rows whereVolumeis0.The
inplace=Trueargument means that the DataFramedfis modified directly, without creating a copy.As a result, all rows with
Volumeequal to0are permanently removed fromdf.
df.loc[(df['Volume'] == 0)]:This line is used to select and display any rows in the DataFrame
dfwhere theVolumecolumn is still0.df['Volume'] == 0is a boolean condition that checks which rows have aVolumeof0.df.loc[(df['Volume'] == 0)]uses.loc[]to filter the DataFrame based on this condition.After the previous
dropoperation, ideally, there should be no rows whereVolumeis0. Thus, this line is likely used to confirm that all such rows were correctly removed.
Check for missing values
df.isna().sum()1. What this code does:
df.isna().sum() returns a Series where each index is a column name from df, and each value is the count of missing values in that column.
2. Result
Local time    0
Open          0
High          0
Low           0
Close         0
Volume        0
dtype: int643. Code Breakdown
Here's a step-by-step explanation of what each part does:
df.isna():The
isna()function (also available asisnull()) is apandasmethod used on a DataFrame (df) to identify missing values.It returns a DataFrame of the same shape as
df, where each element is a boolean (TrueorFalse):Trueindicates that the value is missing (NaN).Falseindicates that the value is not missing.
.sum():The
sum()function, when applied to a DataFrame with boolean values (Trueas 1 andFalseas 0), sums up theTruevalues along the specified axis.By default,
sum()operates column-wise (i.e., along axis=0).Therefore,
df.isna().sum()calculates the total number of missing values (NaNs) in each column of the DataFramedf.
Imports and Data Preparation
1. Imports:
import numpy as np
import pandas_ta as ta
from scipy.stats import linregressnumpy: Used for numerical operations, particularly for handling arrays.pandas_ta: A library that provides technical analysis indicators forpandasDataFrames.scipy.stats.linregress: A function used to perform linear regression and compute the slope.
2. Calculating Technical Indicators:
df['ATR'] = df.ta.atr(length=20)
df['RSI'] = df.ta.rsi()
df['Average'] = df.ta.midprice(length=1)  # midprice
df['MA40'] = df.ta.sma(length=40)
df['MA80'] = df.ta.sma(length=80)
df['MA160'] = df.ta.sma(length=160)ATR(Average True Range): Measures market volatility over the past 20 periods.RSI(Relative Strength Index): Measures the speed and change of price movements, default length is 14 periods if not specified.Average: Calculates the midpoint price of each period (default length is 1, so it will be the average of the high and low for each period).MA40,MA80,MA160: Simple Moving Averages over 40, 80, and 160 periods, respectively.
3. Define get_slope Function:
get_slope Function:def get_slope(array):
    y = np.array(array)
    x = np.array(array)
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slopeThis function calculates the slope of a line fitted to the given
arrayusing linear regression. However, there's a logical error here: bothxandyare set to the samearray, so this will always result in a slope of 1. Instead,xshould be an index range or a sequence of numbers representing the positions of the elements in the array.
4. Set Rolling Window Size:
backrollingN = 6Defines the size of the rolling window for applying the
get_slopefunction.
5. Calculate Slopes:
df['slopeMA40'] = df['MA40'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA80'] = df['MA80'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA160'] = df['MA160'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['AverageSlope'] = df['Average'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['RSISlope'] = df['RSI'].rolling(window=backrollingN).apply(get_slope, raw=True)For each of the columns (
MA40,MA80,MA160,Average,RSI), a rolling window of sizebackrollingNis applied.The
get_slopefunction is used to compute the slope of the moving average, average price, or RSI over each window.raw=Trueensures that the rolling window provides raw NumPy arrays to theget_slopefunction.
6. Display the DataFrame:
df.tail()Displays the last few rows of the DataFrame to inspect the results.
7. Result
	Local time	Open	High	Low	Close	Volume	ATR	RSI	Average	MA40	MA80	MA160	slopeMA40	slopeMA80	slopeMA160	AverageSlope	RSISlope
2192	23.08.2024 07:30:00.000 GMT+0330	40804.999	40842.599	40803.739	40819.099	0.1609	124.245864	55.787964	40823.169	40700.3338	40033.09270	40138.774363	1.0	1.0	1.0	1.0	1.0
2193	23.08.2024 11:30:00.000 GMT+0330	40817.099	40899.051	40808.063	40894.503	0.2315	122.582971	60.139694	40853.557	40722.4529	40057.39270	40138.476237	1.0	1.0	1.0	1.0	1.0
2194	23.08.2024 15:30:00.000 GMT+0330	40895.063	41207.739	40839.787	41030.763	3.3244	134.851422	66.547493	41023.763	40746.5160	40080.90565	40140.769987	1.0	1.0	1.0	1.0	1.0
2195	23.08.2024 19:30:00.000 GMT+0330	41034.727	41212.239	40834.751	41164.203	3.5005	146.983251	71.396852	41023.495	40772.9939	40108.13175	40143.642337	1.0	1.0	1.0	1.0	1.0
2196	23.08.2024 23:30:00.000 GMT+0330	41166.215	41180.727	41153.275	41162.775	0.0983	141.006689	71.277773	41167.001	40798.7129	40136.31395	40146.574663	1.0	1.0	1.0	1.0	1.0Target Categories
#Target flexible way
pipdiff = 100 #for TP
SLTPRatio = 2 #pipdiff/Ratio gives SL
def mytarget(barsupfront, df1):
    length = len(df1)
    high = list(df1['High'])
    low = list(df1['Low'])
    close = list(df1['Close'])
    open = list(df1['Open'])
    trendcat = [None] * length
    
    for line in range (0,length-barsupfront-2):
        valueOpenLow = 0
        valueOpenHigh = 0
        for i in range(1,barsupfront+2):
            value1 = open[line+1]-low[line+i]
            value2 = open[line+1]-high[line+i]
            valueOpenLow = max(value1, valueOpenLow)
            valueOpenHigh = min(value2, valueOpenHigh)
            if ( (valueOpenLow >= pipdiff) and (-valueOpenHigh <= (pipdiff/SLTPRatio)) ):
                trendcat[line] = 1 #-1 downtrend
                break
            elif ( (valueOpenLow <= (pipdiff/SLTPRatio)) and (-valueOpenHigh >= pipdiff) ):
                trendcat[line] = 2 # uptrend
                break
            else:
                trendcat[line] = 0 # no clear trend
            
    return trendcat
# mytarget(barsfront to take into account, dataframe)
df['mytarget'] = mytarget(16, df)
df.head()Let's break down and interpret the given Python code, which is used to determine trend categories based on historical price data.
1. Code Overview:
Parameters:
pipdiff = 500 * 1e-5: This sets the threshold for the price difference to 0.025. This is used to determine the point at which a trend is considered significant.SLTPRatio = 2: This ratio is used to calculate the Stop Loss (SL) level based on thepipdiff.
Function: mytarget(barsupfront, df1)
mytarget(barsupfront, df1)Input Parameters:
barsupfront: This parameter defines how many bars (data points) ahead to consider for trend analysis.df1: A DataFrame containing historical price data with columns'High','Low','Close', and'Open'.
Initial Setup:
length = len(df1): The number of rows in the DataFrame.high,low,close,open: Lists containing the high, low, close, and open prices from the DataFrame.trendcat: A list initialized withNonevalues to store the trend category for each row.
Main Loop:
The loop iterates from
0tolength - barsupfront - 2. This ensures that we have enough data points for analysis within the bounds of the DataFrame.valueOpenLowandvalueOpenHighare initialized to track the maximum and minimum values of differences between the current open price and the future low and high prices.
Inner Loop:
Iterates from
1tobarsupfront + 1. For each iteration:value1is the difference between the current open price and the future low price.value2is the difference between the current open price and the future high price.valueOpenLowtracks the maximum ofvalue1across all iterations.valueOpenHightracks the minimum ofvalue2across all iterations.
Trend Detection:
If
valueOpenLow >= pipdiffand-valueOpenHigh <= (pipdiff / SLTPRatio), it suggests a downtrend, andtrendcat[line]is set to1.If
valueOpenLow <= (pipdiff / SLTPRatio)and-valueOpenHigh >= pipdiff, it suggests an uptrend, andtrendcat[line]is set to2.If neither condition is met, it indicates no clear trend, and
trendcat[line]is set to0.
Return Value:
The function returns the
trendcatlist, which contains the trend category for each row in the DataFrame.
2. Result
Local time	Open	High	Low	Close	Volume	ATR	RSI	Average	MA40	MA80	MA160	slopeMA40	slopeMA80	slopeMA160	AverageSlope	RSISlope	mytarget
0	23.08.2023 23:30:00.000 GMT+0330	34475.959	34509.449	34455.409	34474.699	0.2835	NaN	NaN	34482.429	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
1	24.08.2023 03:30:00.000 GMT+0330	34474.999	34526.497	34454.469	34503.439	0.3250	NaN	NaN	34490.483	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
2	24.08.2023 07:30:00.000 GMT+0330	34503.447	34556.797	34443.209	34458.499	0.3351	NaN	NaN	34500.003	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
3	24.08.2023 11:30:00.000 GMT+0330	34459.499	34493.297	34387.799	34423.999	0.6830	NaN	NaN	34440.548	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0
4	24.08.2023 15:30:00.000 GMT+0330	34423.229	34680.469	34260.499	34264.289	6.8789	NaN	NaN	34470.484	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0Quick Analysis Example
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
df_model= df[['Volume', 'ATR', 'RSI', 'Average', 'MA40', 'MA80', 'MA160', 'slopeMA40', 'slopeMA80', 'slopeMA160', 'AverageSlope', 'RSISlope', 'mytarget']] 
df_model.hist(ax = ax)
plt.show()This code snippet creates a figure and plots histograms for multiple columns from the DataFrame df. Each histogram shows the distribution of values in the specified columns, helping visualize the spread and frequency of the data for each feature.

Can We Use The RSI Alone As a Trend Indicator?
from matplotlib import pyplot
df_up=df.RSI[ df['mytarget'] == 2 ]
df_down=df.RSI[ df['mytarget'] == 1 ]
df_unclear=df.RSI[ df['mytarget'] == 0 ]
pyplot.hist(df_unclear, bins=100, alpha=0.5, label='unclear')
pyplot.hist(df_down, bins=100, alpha=0.5, label='down')
pyplot.hist(df_up, bins=100, alpha=0.5, label='up')
pyplot.legend(loc='upper right')
pyplot.show()This code generates histograms to visualize the distribution of RSI values based on trend categories from your DataFrame. By plotting histograms for RSI values corresponding to uptrends, downtrends, and unclear trends, you can compare the distributions and observe how RSI values vary across different trend categories.

Visit me on:
🚀 Finance & Trading Insights: fxc911.ir
🐦 Twitter: x.com/fxc911_official/
Last updated