How To Calculate Technical Indicators For Trading Using Pandas
FXC911 – Forex & Crypto News & Insights
Who Can Follow This Article?
No need to be an expert
Some knowledge of classic technical indicators (RSI, moving averages, ...)
Some experience with basic Pandas functionality
A rough idea of how machine learning predictions fit into the chaos of Forex or stock markets
What Is Included In This Article?
Download price data from an online source
Load the data into our framework
Check and clean the data
Quick statistical analysis to reveal hidden patterns
Is it possible to make predictions based on technical indicators? The stats will answer
Dow Jones Industrial Average (DJIA)

The US30 Index, often referred to as the Dow Jones Industrial Average (DJIA), is one of the most well-known stock market indices in the United States. Here’s a brief overview:
Composition: The US30 Index tracks 30 of the largest and most influential publicly traded companies in the U.S. It includes major corporations across various industries, such as technology, finance, healthcare, and consumer goods.
Historical Significance: Established in 1896 by Charles Dow, it was one of the first indices to represent the overall performance of the U.S. stock market. The DJIA has a long history and is often used as a barometer for the health of the U.S. economy.
Price-Weighted: Unlike market-capitalization-weighted indices, the DJIA is price-weighted: the index level is essentially the sum of the component prices divided by a divisor, so stocks with higher prices have more influence on the index's movement than those with lower prices (see the short sketch after this overview).
Purpose: Investors and analysts use the DJIA to gauge the overall performance of the stock market and the U.S. economy. It provides a snapshot of how the largest companies in the country are performing.
Updates: The components of the DJIA are periodically reviewed and adjusted. Changes are made to reflect shifts in the economy and industry dynamics.
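To make the price-weighting concrete, here is a minimal illustrative sketch in Python; the prices and the divisor are made-up numbers for illustration only, not the actual DJIA components or divisor:
# A price-weighted index is the sum of its component prices divided by a divisor,
# so a $1 move in a high-priced stock shifts the index as much as a $1 move in a cheap one.
prices = {"StockA": 350.0, "StockB": 120.0, "StockC": 45.0}  # hypothetical components
divisor = 0.152  # hypothetical divisor
index_level = sum(prices.values()) / divisor
print(round(index_level, 2))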
Historical Data Feed
The dataset above is a CSV file containing candlestick data for the US30 index (also known as the Dow Jones Industrial Average, DJIA) at a 4-hour interval, covering the period from August 24, 2023, to August 24, 2024. Here's a summary of the dataset:
1. Columns:
Local time: The timestamp of each data point, in the format DD.MM.YYYY HH:MM:SS.SSS GMT+0330.
Open: The opening price of the index for the 4-hour interval.
High: The highest price of the index during the 4-hour interval.
Low: The lowest price of the index during the 4-hour interval.
Close: The closing price of the index for the 4-hour interval.
Volume: The trading volume during the 4-hour interval.
2. Dataset Details:
Number of Entries: 2,197 rows
Data Types:
The Local time column is of type object (string).
The Open, High, Low, Close, and Volume columns are of type float64.
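Since Local time arrives as a plain string, it usually needs converting to a proper datetime before any time-based analysis. A minimal sketch, assuming the file name shown above and that pandas is installed:
import pandas as pd

df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv")

# Parse "DD.MM.YYYY HH:MM:SS.SSS GMT+0330" into timezone-aware datetimes;
# utc=True normalizes the fixed +03:30 offset to UTC.
df["Local time"] = pd.to_datetime(
    df["Local time"], format="%d.%m.%Y %H:%M:%S.%f GMT%z", utc=True
)
print(df["Local time"].dtype)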
Prerequisite
python -m pip install pandas
python -m pip install numpy
python -m pip install pandas_ta
These commands are used to install Python packages via pip, the package installer for Python. Here’s what each one does:
python -m pip install pandas: This installs the pandas library, which is a powerful data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, and functions to manipulate and analyze data.
python -m pip install numpy: This installs the numpy library, which is fundamental for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions to operate on these data structures.
python -m pip install pandas_ta: This installs the pandas_ta library, which is a technical analysis library for pandas. It adds a wide range of technical indicators and trading strategy tools to pandas, making it useful for financial and trading applications.
These packages are often used together for data analysis and financial modeling.
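A quick way to confirm the installation worked is to import the packages and print their versions; a small sanity check, not part of the original workflow:
import pandas as pd
import numpy as np
import pandas_ta  # if this import fails, pandas_ta is not installed

print("pandas:", pd.__version__)
print("numpy:", np.__version__)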
Loading And Preparing Data
import pandas as pd
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv")
df.tail()
1. What this code does:
The code reads a CSV file containing candlestick data (price data for financial instruments over 4-hour intervals) into a DataFrame and then displays the last 5 rows of this data. The purpose of this code snippet is likely to check the last few entries in the dataset to see if they were loaded correctly or to inspect the most recent data.
2. Result
Local time Open High Low Close Volume
2192 23.08.2024 07:30:00.000 GMT+0330 40804.999 40842.599 40803.739 40819.099 0.1609
2193 23.08.2024 11:30:00.000 GMT+0330 40817.099 40899.051 40808.063 40894.503 0.2315
2194 23.08.2024 15:30:00.000 GMT+0330 40895.063 41207.739 40839.787 41030.763 3.3244
2195 23.08.2024 19:30:00.000 GMT+0330 41034.727 41212.239 40834.751 41164.203 3.5005
2196 23.08.2024 23:30:00.000 GMT+0330 41166.215 41180.727 41153.275 41162.775 0.0983
3. Code Breakdown
The code is written in Python and uses the pandas library, which is commonly used for data manipulation and analysis. Here's a breakdown of what each line does:
import pandas as pd: This line imports the pandas library and gives it the alias pd. This is a standard convention that makes the code more concise when calling pandas functions.
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"): This line reads the CSV (Comma-Separated Values) file named "USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv" into a DataFrame (df). A DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure in pandas, similar to a table in a database or an Excel spreadsheet. The pd.read_csv() function loads the data from the CSV file into this DataFrame.
df.tail(): This line returns the last 5 rows of the DataFrame df by default. The tail() function is useful for quickly inspecting the end of a dataset, which typically shows the most recent data points.
Check If Any Zero-Volume Rows Are Present
# Find the indices where 'Volume' is 0
indexZeros = df[df['Volume'] == 0].index
# Drop the rows where 'Volume' is 0
df.drop(indexZeros, inplace=True)
# Correctly select rows where 'Volume' is 0 (if needed for verification)
zero_volume_rows = df[df['Volume'] == 0]
# If you want to print or see the result of the selection
print(zero_volume_rows)
1. What this code does:
The code filters and removes all rows where the Volume is 0 from the DataFrame df. The last two lines then select and print any remaining rows where Volume is 0 (there should be none after the drop operation).
2. Result
Local time Open High Low Close Volume
3. Code Breakdown
indexZeros = df[df['Volume'] == 0].index:
This line identifies all rows in the DataFrame df where the value in the Volume column is 0. df['Volume'] == 0 creates a boolean mask where each entry is True if the corresponding row's Volume is 0, and False otherwise. df[df['Volume'] == 0] filters the DataFrame to only the rows where the Volume is 0, and the .index attribute retrieves the indices (row labels) of these filtered rows. The resulting indices are stored in the variable indexZeros.
df.drop(indexZeros, inplace=True):
This line removes all rows from the DataFrame df whose indices are listed in indexZeros, i.e. all rows where Volume is 0. The inplace=True argument means that df is modified directly, without creating a copy. As a result, all rows with Volume equal to 0 are permanently removed from df.
zero_volume_rows = df[df['Volume'] == 0] and print(zero_volume_rows):
These lines select and display any rows in the DataFrame df where the Volume column is still 0. df['Volume'] == 0 is the same boolean condition as before, and df[df['Volume'] == 0] filters the DataFrame based on it. After the previous drop operation, ideally, there should be no rows where Volume is 0. Thus, these lines confirm that all such rows were correctly removed.
Check for missing values
df.isna().sum()
1. What this code does:
df.isna().sum() returns a Series where each index is a column name from df, and each value is the count of missing values in that column.
2. Result
Local time 0
Open 0
High 0
Low 0
Close 0
Volume 0
dtype: int64
3. Code Breakdown
Here's a step-by-step explanation of what each part does:
df.isna():
The isna() function (also available as isnull()) is a pandas method used on a DataFrame (df) to identify missing values. It returns a DataFrame of the same shape as df, where each element is a boolean (True or False): True indicates that the value is missing (NaN), and False indicates that it is not.
.sum():
The sum() function, when applied to a DataFrame of boolean values (True as 1 and False as 0), sums up the True values along the specified axis. By default, sum() operates column-wise (i.e., along axis=0). Therefore, df.isna().sum() calculates the total number of missing values (NaNs) in each column of the DataFrame df.
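All counts are zero here, but if missing values did appear, a common follow-up (not part of the original workflow) would be to drop or fill the affected rows:
# Drop any rows containing NaN values ...
df = df.dropna()

# ... or forward-fill gaps with the previous candle's values instead:
# df = df.ffill()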
Imports and Data Preparation
1. Imports:
import numpy as np
import pandas_ta as ta
from scipy.stats import linregress
numpy: Used for numerical operations, particularly for handling arrays.
pandas_ta: A library that provides technical analysis indicators for pandas DataFrames.
scipy.stats.linregress: A function used to perform linear regression and compute the slope.
2. Calculating Technical Indicators:
df['ATR'] = df.ta.atr(length=20)
df['RSI'] = df.ta.rsi()
df['Average'] = df.ta.midprice(length=1) # midprice
df['MA40'] = df.ta.sma(length=40)
df['MA80'] = df.ta.sma(length=80)
df['MA160'] = df.ta.sma(length=160)
ATR (Average True Range): Measures market volatility over the past 20 periods.
RSI (Relative Strength Index): Measures the speed and change of price movements; the default length is 14 periods if not specified.
Average: Calculates the midpoint price of each period (with length=1, this is simply the average of the high and low of each bar).
MA40, MA80, MA160: Simple Moving Averages over 40, 80, and 160 periods, respectively.
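To see roughly what pandas_ta computes here, the moving averages and midprice can be reproduced with plain pandas. The RSI below uses a simple rolling mean of gains and losses, which approximates (but does not exactly match) pandas_ta's Wilder-style smoothing:
# Simple moving average over 40 closes, the same idea as df.ta.sma(length=40)
ma40_manual = df['Close'].rolling(window=40).mean()

# Midprice with length=1 is just the average of High and Low for each bar
average_manual = (df['High'] + df['Low']) / 2

# Approximate RSI(14) from average gains and losses of close-to-close changes
delta = df['Close'].diff()
gain = delta.clip(lower=0).rolling(window=14).mean()
loss = (-delta.clip(upper=0)).rolling(window=14).mean()
rsi_approx = 100 - 100 / (1 + gain / loss)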
3. Define the get_slope Function:
def get_slope(array):
    y = np.array(array)
    x = np.array(array)
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope
This function is meant to calculate the slope of a line fitted to the given array using linear regression. However, there is a logical error here: both x and y are set to the same array, so the regression always returns a slope of 1. Instead, x should be an index range or a sequence of numbers representing the positions of the elements in the array; a corrected version is shown below.
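A corrected version, assuming the intended x-axis is simply the position of each value within the rolling window, would look like this:
import numpy as np
from scipy.stats import linregress

def get_slope(array):
    y = np.array(array)
    x = np.arange(len(y))  # positions 0, 1, ..., n-1 as the x-axis
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope
With this fix, the slope columns would reflect the actual per-bar rate of change instead of the constant 1.0 visible in the result table further down.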
4. Set Rolling Window Size:
backrollingN = 6
Defines the size of the rolling window for applying the get_slope function.
5. Calculate Slopes:
df['slopeMA40'] = df['MA40'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA80'] = df['MA80'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['slopeMA160'] = df['MA160'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['AverageSlope'] = df['Average'].rolling(window=backrollingN).apply(get_slope, raw=True)
df['RSISlope'] = df['RSI'].rolling(window=backrollingN).apply(get_slope, raw=True)
For each of the columns (MA40, MA80, MA160, Average, RSI), a rolling window of size backrollingN is applied. The get_slope function computes the slope of the moving average, average price, or RSI over each window. raw=True ensures that the rolling window passes raw NumPy arrays to the get_slope function.
6. Display the DataFrame:
df.tail()
Displays the last few rows of the DataFrame to inspect the results.
7. Result
Local time Open High Low Close Volume ATR RSI Average MA40 MA80 MA160 slopeMA40 slopeMA80 slopeMA160 AverageSlope RSISlope
2192 23.08.2024 07:30:00.000 GMT+0330 40804.999 40842.599 40803.739 40819.099 0.1609 124.245864 55.787964 40823.169 40700.3338 40033.09270 40138.774363 1.0 1.0 1.0 1.0 1.0
2193 23.08.2024 11:30:00.000 GMT+0330 40817.099 40899.051 40808.063 40894.503 0.2315 122.582971 60.139694 40853.557 40722.4529 40057.39270 40138.476237 1.0 1.0 1.0 1.0 1.0
2194 23.08.2024 15:30:00.000 GMT+0330 40895.063 41207.739 40839.787 41030.763 3.3244 134.851422 66.547493 41023.763 40746.5160 40080.90565 40140.769987 1.0 1.0 1.0 1.0 1.0
2195 23.08.2024 19:30:00.000 GMT+0330 41034.727 41212.239 40834.751 41164.203 3.5005 146.983251 71.396852 41023.495 40772.9939 40108.13175 40143.642337 1.0 1.0 1.0 1.0 1.0
2196 23.08.2024 23:30:00.000 GMT+0330 41166.215 41180.727 41153.275 41162.775 0.0983 141.006689 71.277773 41167.001 40798.7129 40136.31395 40146.574663 1.0 1.0 1.0 1.0 1.0
Target Categories
# Target, flexible way
pipdiff = 100     # threshold for TP (take profit), in index points
SLTPRatio = 2     # pipdiff / SLTPRatio gives the SL (stop loss) distance

def mytarget(barsupfront, df1):
    length = len(df1)
    high = list(df1['High'])
    low = list(df1['Low'])
    close = list(df1['Close'])
    open = list(df1['Open'])
    trendcat = [None] * length

    for line in range(0, length - barsupfront - 2):
        valueOpenLow = 0
        valueOpenHigh = 0
        for i in range(1, barsupfront + 2):
            value1 = open[line + 1] - low[line + i]   # drop below the entry open
            value2 = open[line + 1] - high[line + i]  # rise above the entry open (negative)
            valueOpenLow = max(value1, valueOpenLow)
            valueOpenHigh = min(value2, valueOpenHigh)

            if (valueOpenLow >= pipdiff) and (-valueOpenHigh <= (pipdiff / SLTPRatio)):
                trendcat[line] = 1  # downtrend
                break
            elif (valueOpenLow <= (pipdiff / SLTPRatio)) and (-valueOpenHigh >= pipdiff):
                trendcat[line] = 2  # uptrend
                break
            else:
                trendcat[line] = 0  # no clear trend
    return trendcat

# mytarget(bars ahead to take into account, dataframe)
df['mytarget'] = mytarget(16, df)
df.head()
Let's break down and interpret this Python code, which determines trend categories based on historical price data.
1. Code Overview:
Parameters:
pipdiff = 100: This sets the take-profit threshold to 100 index points; it is the price move used to decide whether a trend is significant.
SLTPRatio = 2: This ratio sets the Stop Loss (SL) distance as pipdiff / SLTPRatio, i.e. 50 points here.
Function: mytarget(barsupfront, df1)
Input Parameters:
barsupfront: Defines how many bars (data points) ahead to consider for the trend analysis.
df1: A DataFrame containing historical price data with columns 'High', 'Low', 'Close', and 'Open'.
Initial Setup:
length = len(df1): The number of rows in the DataFrame.
high, low, close, open: Lists containing the high, low, close, and open prices from the DataFrame.
trendcat: A list initialized with None values to store the trend category for each row.
Main Loop:
The outer loop iterates from 0 up to (but not including) length - barsupfront - 2, which ensures there are enough future bars within the bounds of the DataFrame. valueOpenLow and valueOpenHigh are initialized to track the largest and smallest differences between the entry open price and the future low and high prices.
Inner Loop:
Iterates from 1 to barsupfront + 1. For each iteration:
value1 is the difference between the entry open price (open[line+1], the open of the bar right after the current row) and the future low price (low[line+i]).
value2 is the same difference measured against the future high price (high[line+i]).
valueOpenLow tracks the maximum of value1 across the iterations so far.
valueOpenHigh tracks the minimum of value2 across the iterations so far.
Trend Detection:
If valueOpenLow >= pipdiff and -valueOpenHigh <= (pipdiff / SLTPRatio), the move down reached the take-profit distance while the move up stayed within the stop distance; this suggests a downtrend, and trendcat[line] is set to 1.
If valueOpenLow <= (pipdiff / SLTPRatio) and -valueOpenHigh >= pipdiff, the move up reached the take-profit distance while the move down stayed within the stop distance; this suggests an uptrend, and trendcat[line] is set to 2.
If neither condition is met, there is no clear trend, and trendcat[line] is set to 0.
Return Value:
The function returns the trendcat list, which contains the trend category for each row in the DataFrame.
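Before moving on, it is worth checking how the three categories are distributed, since a heavily imbalanced target will skew any statistics or model built on top of it. A minimal sketch, assuming df['mytarget'] has been assigned as above:
# Count how many bars fall into each trend category:
# 0 = no clear trend, 1 = downtrend, 2 = uptrend (NaN = unlabeled rows near the end)
print(df['mytarget'].value_counts(dropna=False))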
2. Result
Local time Open High Low Close Volume ATR RSI Average MA40 MA80 MA160 slopeMA40 slopeMA80 slopeMA160 AverageSlope RSISlope mytarget
0 23.08.2023 23:30:00.000 GMT+0330 34475.959 34509.449 34455.409 34474.699 0.2835 NaN NaN 34482.429 NaN NaN NaN NaN NaN NaN NaN NaN 0.0
1 24.08.2023 03:30:00.000 GMT+0330 34474.999 34526.497 34454.469 34503.439 0.3250 NaN NaN 34490.483 NaN NaN NaN NaN NaN NaN NaN NaN 0.0
2 24.08.2023 07:30:00.000 GMT+0330 34503.447 34556.797 34443.209 34458.499 0.3351 NaN NaN 34500.003 NaN NaN NaN NaN NaN NaN NaN NaN 0.0
3 24.08.2023 11:30:00.000 GMT+0330 34459.499 34493.297 34387.799 34423.999 0.6830 NaN NaN 34440.548 NaN NaN NaN NaN NaN NaN NaN NaN 0.0
4 24.08.2023 15:30:00.000 GMT+0330 34423.229 34680.469 34260.499 34264.289 6.8789 NaN NaN 34470.484 NaN NaN NaN NaN NaN NaN NaN NaN 0.0
Quick Analysis Example
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
df_model= df[['Volume', 'ATR', 'RSI', 'Average', 'MA40', 'MA80', 'MA160', 'slopeMA40', 'slopeMA80', 'slopeMA160', 'AverageSlope', 'RSISlope', 'mytarget']]
df_model.hist(ax = ax)
plt.show()
This code snippet creates a figure and plots histograms for multiple columns of the DataFrame df. Each histogram shows the distribution of values in the corresponding column, helping visualize the spread and frequency of the data for each feature.

Can We Use The RSI Alone As a Trend Indicator?
from matplotlib import pyplot
df_up=df.RSI[ df['mytarget'] == 2 ]
df_down=df.RSI[ df['mytarget'] == 1 ]
df_unclear=df.RSI[ df['mytarget'] == 0 ]
pyplot.hist(df_unclear, bins=100, alpha=0.5, label='unclear')
pyplot.hist(df_down, bins=100, alpha=0.5, label='down')
pyplot.hist(df_up, bins=100, alpha=0.5, label='up')
pyplot.legend(loc='upper right')
pyplot.show()
This code generates histograms to visualize the distribution of RSI values by trend category. By plotting the RSI values for uptrends, downtrends, and unclear trends separately, you can compare the distributions and see how RSI values vary across the different trend categories.
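To put numbers behind what the histograms suggest, the same comparison can be made with summary statistics of RSI per trend category; a short sketch under the same assumptions:
# Mean, standard deviation, and count of RSI for each trend category
# (0 = no clear trend, 1 = downtrend, 2 = uptrend)
print(df.groupby('mytarget')['RSI'].agg(['mean', 'std', 'count']))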

Visit me on:
🚀 Finance & Trading Insights: fxc911.ir
🐦 Twitter: x.com/fxc911_official/