How To Calculate Technical Indicators For Trading Using Pandas

FXC911 – Forex & Crypto News & Insights

Who Can Follow This Article?

  • You don't need to be an expert

  • Some knowledge of classic technical indicators (RSI, moving averages, ...)

  • Ideally, some prior experience with basic Pandas functionality

  • A rough idea of machine learning predictions in the chaos of the Forex or stock markets

What Is Included In This Article?

  • Download price data from an online source

  • Load the data into our framework

  • Check and clean the data

  • Quick statistical analysis to reveal hidden patterns

  • Is it possible to make predictions based on technical indicators? The statistics will tell us

Dow Jones Industrial Average (DJIA)

The US30 Index, often referred to as the Dow Jones Industrial Average (DJIA), is one of the most well-known stock market indices in the United States. Here’s a brief overview:

  1. Composition: The US30 Index tracks 30 of the largest and most influential publicly traded companies in the U.S. It includes major corporations across various industries, such as technology, finance, healthcare, and consumer goods.

  2. Historical Significance: Established in 1896 by Charles Dow, it was one of the first indices to represent the overall performance of the U.S. stock market. The DJIA has a long history and is often used as a barometer for the health of the U.S. economy.

  3. Price-Weighted: Unlike market-capitalization-weighted indices, the DJIA is price-weighted. This means that stocks with higher prices have more influence on the index's movement than those with lower prices.

  4. Purpose: Investors and analysts use the DJIA to gauge the overall performance of the stock market and the U.S. economy. It provides a snapshot of how the largest companies in the country are performing.

  5. Updates: The components of the DJIA are periodically reviewed and adjusted. Changes are made to reflect shifts in the economy and industry dynamics.

Historical Data Feed

Dow Jones Dataset

The dataset above is a CSV file containing candlestick data for the USA30 index (also known as the Dow Jones Industrial Average, DJIA) at a 4-hour interval, covering the period from August 24, 2023, to August 24, 2024. Here's a summary of the dataset:

1. Columns:

  1. Local time: The timestamp of each data point in the format DD.MM.YYYY HH:MM:SS.SSS GMT+0330.

  2. Open: The opening price of the index for the 4-hour interval.

  3. High: The highest price of the index during the 4-hour interval.

  4. Low: The lowest price of the index during the 4-hour interval.

  5. Close: The closing price of the index for the 4-hour interval.

  6. Volume: The trading volume during the 4-hour interval.

2. Dataset Details:

  • Number of Entries: 2,197 rows

  • Data Types:

    • The Local time column is of type object (string).

    • The Open, High, Low, Close, and Volume columns are of type float64.

Prerequisite

These commands are used to install Python packages via pip, the package installer for Python. Here’s what each one does:

  1. python -m pip install pandas: This installs the pandas library, which is a powerful data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, and functions to manipulate and analyze data.

  2. python -m pip install numpy: This installs the numpy library, which is fundamental for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions to operate on these data structures.

  3. python -m pip install pandas_ta: This installs the pandas_ta library, which is a technical analysis library for pandas. It adds a wide range of technical indicators and trading strategy tools to pandas, making it useful for financial and trading applications.

These packages are often used together for data analysis and financial modeling.
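
For reference, the three install commands run from a terminal:

```
python -m pip install pandas
python -m pip install numpy
python -m pip install pandas_ta
```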

Loading And Preparing Data

1. What this code does:

The code reads a CSV file containing candlestick data (price data for a financial instrument over 4-hour intervals) into a DataFrame and then displays the last 5 rows. The purpose of this snippet is to check the last few entries of the dataset, confirming that it loaded correctly and showing the most recent data points.

2. Result

3. Code Breakdown

The code is written in Python and uses the pandas library, which is commonly used for data manipulation and analysis. Here's a breakdown of what each line does:

  1. import pandas as pd: This line imports the pandas library and gives it the alias pd. This is a standard convention to make the code more concise when using pandas functions.

  2. df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"): This line reads a CSV (Comma-Separated Values) file named "USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv" into a DataFrame (df). A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure in pandas, similar to a table in a database or an Excel spreadsheet. The pd.read_csv() function is used to load the data from the CSV file into this DataFrame.

  3. df.tail(): This line returns the last 5 rows of the DataFrame df by default. The tail() function is useful to quickly inspect the end of a dataset, which might show you the most recent data points or the conclusion of the dataset.
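
Putting those three lines together (the filename matches the dataset described above):

```python
import pandas as pd

# Load the 4-hour candlestick data for the USA30 (DJIA) index into a DataFrame
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv")

# Inspect the last 5 rows to confirm the file loaded correctly
df.tail()
```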

Check If Any Zero-Volume Rows Are Present

1. What this code does:

  • The code filters and removes all rows where the Volume is 0 from the DataFrame df.

  • The last line would check for any remaining rows where Volume is 0 (there should be none after the drop operation).

2. Result

3. Code Breakdown

  • indexZeros = df[df['Volume'] == 0].index:

    • This line identifies all rows in the DataFrame df where the value in the Volume column is 0.

    • df['Volume'] == 0 creates a boolean mask where each entry is True if the corresponding row's Volume is 0, and False otherwise.

    • df[df['Volume'] == 0] filters the DataFrame to only include rows where the Volume is 0.

    • The .index attribute retrieves the indices (row labels) of these filtered rows.

    • The resulting indices (where Volume is 0) are stored in the variable indexZeros.

  • df.drop(indexZeros, inplace=True):

    • This line removes all rows from the DataFrame df that have indices listed in indexZeros.

    • indexZeros contains the indices of all rows where Volume is 0.

    • The inplace=True argument means that the DataFrame df is modified directly, without creating a copy.

    • As a result, all rows with Volume equal to 0 are permanently removed from df.

  • df.loc[(df['Volume'] == 0)]:

    • This line is used to select and display any rows in the DataFrame df where the Volume column is still 0.

    • df['Volume'] == 0 is a boolean condition that checks which rows have a Volume of 0.

    • df.loc[(df['Volume'] == 0)] uses .loc[] to filter the DataFrame based on this condition.

    • After the previous drop operation, there should be no remaining rows where Volume is 0, so this line serves as a check that all such rows were removed.
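
The three steps above as a single snippet:

```python
# Find the index labels of all rows where no volume was traded
indexZeros = df[df['Volume'] == 0].index

# Drop those rows from the DataFrame in place
df.drop(indexZeros, inplace=True)

# Verify that no zero-volume rows remain (this should return an empty DataFrame)
df.loc[(df['Volume'] == 0)]
```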

Check For Missing Values

1. What this code does:

df.isna().sum() returns a Series where each index is a column name from df, and each value is the count of missing values in that column.

2. Result

3. Code Breakdown

Here's a step-by-step explanation of what each part does:

  1. df.isna():

    • The isna() function (also available as isnull()) is a pandas method used on a DataFrame (df) to identify missing values.

    • It returns a DataFrame of the same shape as df, where each element is a boolean (True or False):

      • True indicates that the value is missing (NaN).

      • False indicates that the value is not missing.

  2. .sum():

    • The sum() function, when applied to a DataFrame with boolean values (True as 1 and False as 0), sums up the True values along the specified axis.

    • By default, sum() operates column-wise (i.e., along axis=0).

    • Therefore, df.isna().sum() calculates the total number of missing values (NaNs) in each column of the DataFrame df.
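
In code, this is a single call:

```python
# Count missing values (NaNs) per column; all zeros means no data is missing
df.isna().sum()
```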

Imports and Data Preparation

1. Imports:

  • numpy: Used for numerical operations, particularly for handling arrays.

  • pandas_ta: A library that provides technical analysis indicators for pandas DataFrames.

  • scipy.stats.linregress: A function used to perform linear regression and compute the slope.

2. Calculating Technical Indicators:

  • ATR (Average True Range): Measures market volatility over the past 20 periods.

  • RSI (Relative Strength Index): Measures the speed and change of price movements, default length is 14 periods if not specified.

  • Average: Calculates the midpoint price of each period (with a length of 1, this is simply the average of the high and low of each bar).

  • MA40, MA80, MA160: Simple Moving Averages over 40, 80, and 160 periods, respectively.

3. Define get_slope Function:

  • This function calculates the slope of a line fitted to the given array using linear regression. However, there's a logical error here: both x and y are set to the same array, so this will always result in a slope of 1. Instead, x should be an index range or a sequence of numbers representing the positions of the elements in the array.

4. Set Rolling Window Size:

  • Defines the size of the rolling window for applying the get_slope function.

5. Calculate Slopes:

  • For each of the columns (MA40, MA80, MA160, Average, RSI), a rolling window of size backrollingN is applied.

  • The get_slope function is used to compute the slope of the moving average, average price, or RSI over each window.

  • raw=True ensures that the rolling window provides raw NumPy arrays to the get_slope function.

6. Display the DataFrame:

  • Displays the last few rows of the DataFrame to inspect the results.

7. Result
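
Putting the steps above together, here is a minimal sketch. It assumes the pandas_ta DataFrame accessor (df.ta), uses an illustrative rolling window of 6 bars (backrollingN is not fixed in the text), names the slope columns slopeMA40, slopeRSI, and so on for convenience, and implements get_slope with an index range for x, as the breakdown recommends:

```python
import numpy as np
import pandas_ta as ta
from scipy.stats import linregress

# Technical indicators
df['ATR'] = df.ta.atr(length=20)           # volatility over the past 20 periods
df['RSI'] = df.ta.rsi()                    # default length of 14 periods
df['Average'] = df.ta.midprice(length=1)   # midpoint of high and low for each bar
df['MA40'] = df.ta.sma(length=40)
df['MA80'] = df.ta.sma(length=80)
df['MA160'] = df.ta.sma(length=160)

def get_slope(array):
    """Slope of a straight line fitted to the values in a rolling window."""
    y = np.array(array)
    x = np.arange(len(y))                  # positions 0..N-1, not the values themselves
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    return slope

backrollingN = 6  # illustrative rolling window size

# Slope of each series over the trailing backrollingN bars
for col in ['MA40', 'MA80', 'MA160', 'Average', 'RSI']:
    df['slope' + col] = df[col].rolling(window=backrollingN).apply(get_slope, raw=True)

df.tail()
```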

Target Categories

Let's break down and interpret the given Python code, which is used to determine trend categories based on historical price data.

1. Code Overview:

Parameters:

  • pipdiff = 500 * 1e-5: This sets the price-difference threshold to 0.005 (500 × 1e-5), the point at which a move is considered significant.

  • SLTPRatio = 2: This ratio divides pipdiff to set the Stop Loss (SL) tolerance relative to the Take Profit threshold.

Function: mytarget(barsupfront, df1)

  1. Input Parameters:

    • barsupfront: This parameter defines how many bars (data points) ahead to consider for trend analysis.

    • df1: A DataFrame containing historical price data with columns 'High', 'Low', 'Close', and 'Open'.

  2. Initial Setup:

    • length = len(df1): The number of rows in the DataFrame.

    • high, low, close, open: Lists containing the high, low, close, and open prices from the DataFrame.

    • trendcat: A list initialized with None values to store the trend category for each row.

  3. Main Loop:

    • The loop iterates from 0 to length - barsupfront - 2. This ensures that we have enough data points for analysis within the bounds of the DataFrame.

    • valueOpenLow and valueOpenHigh are initialized to track the maximum and minimum values of differences between the current open price and the future low and high prices.

  4. Inner Loop:

    • Iterates from 1 to barsupfront + 1. For each iteration:

      • value1 is the difference between the current open price and the future low price.

      • value2 is the difference between the current open price and the future high price.

      • valueOpenLow tracks the maximum of value1 across all iterations.

      • valueOpenHigh tracks the minimum of value2 across all iterations.

  5. Trend Detection:

    • If valueOpenLow >= pipdiff and -valueOpenHigh <= (pipdiff / SLTPRatio), it suggests a downtrend, and trendcat[line] is set to 1.

    • If valueOpenLow <= (pipdiff / SLTPRatio) and -valueOpenHigh >= pipdiff, it suggests an uptrend, and trendcat[line] is set to 2.

    • If neither condition is met, it indicates no clear trend, and trendcat[line] is set to 0.

  6. Return Value:

    • The function returns the trendcat list, which contains the trend category for each row in the DataFrame.

2. Result
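
A sketch of the function as described in the breakdown above. valueOpenLow and valueOpenHigh start at zero, the list named open in the breakdown is renamed open_ here to avoid shadowing Python's built-in, and the final usage line (the barsupfront value and the 'Target' column name) is purely illustrative:

```python
pipdiff = 500 * 1e-5   # price-difference threshold (0.005)
SLTPRatio = 2          # stop-loss tolerance = pipdiff / SLTPRatio

def mytarget(barsupfront, df1):
    length = len(df1)
    high = list(df1['High'])
    low = list(df1['Low'])
    close = list(df1['Close'])
    open_ = list(df1['Open'])
    trendcat = [None] * length

    for line in range(0, length - barsupfront - 2):
        valueOpenLow = 0
        valueOpenHigh = 0
        for i in range(1, barsupfront + 1):
            value1 = open_[line] - low[line + i]    # drop from the open to a future low
            value2 = open_[line] - high[line + i]   # rise from the open to a future high
            valueOpenLow = max(value1, valueOpenLow)
            valueOpenHigh = min(value2, valueOpenHigh)

        # Downtrend: price fell by at least pipdiff while the upside stayed within pipdiff/SLTPRatio
        if (valueOpenLow >= pipdiff) and (-valueOpenHigh <= (pipdiff / SLTPRatio)):
            trendcat[line] = 1
        # Uptrend: price rose by at least pipdiff while the downside stayed within pipdiff/SLTPRatio
        elif (valueOpenLow <= (pipdiff / SLTPRatio)) and (-valueOpenHigh >= pipdiff):
            trendcat[line] = 2
        else:
            trendcat[line] = 0

    return trendcat

# Illustrative usage: label each bar using the next 16 bars and store the result in a new column
df['Target'] = mytarget(16, df)
```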

Quick Analysis Example

This code snippet creates a figure and plots histograms for multiple columns from the DataFrame df. Each histogram shows the distribution of values in the specified columns, helping visualize the spread and frequency of the data for each feature.
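
A minimal sketch, assuming matplotlib is available and using the indicator columns computed earlier (the exact column list plotted in the article may differ):

```python
import matplotlib.pyplot as plt

# Histogram of each feature to inspect its spread and frequency
features = ['ATR', 'RSI', 'Average', 'MA40', 'MA80', 'MA160']
df[features].hist(bins=50, figsize=(12, 8))
plt.show()
```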

Can We Use The RSI Alone As a Trend Indicator?

This code generates histograms to visualize the distribution of RSI values by trend category from the DataFrame. By plotting histograms of the RSI values corresponding to uptrends, downtrends, and unclear trends, you can compare the distributions and observe how RSI behaves across the different categories.
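
A minimal sketch, assuming the trend categories returned by mytarget() were stored in a column named 'Target' (the actual column name in the article may differ):

```python
import matplotlib.pyplot as plt

# Split the RSI values by trend category and overlay their histograms
rsi_up = df['RSI'][df['Target'] == 2]        # uptrend
rsi_down = df['RSI'][df['Target'] == 1]      # downtrend
rsi_unclear = df['RSI'][df['Target'] == 0]   # no clear trend

plt.figure(figsize=(12, 6))
plt.hist(rsi_unclear, bins=50, alpha=0.5, label='no clear trend')
plt.hist(rsi_down, bins=50, alpha=0.5, label='downtrend')
plt.hist(rsi_up, bins=50, alpha=0.5, label='uptrend')
plt.xlabel('RSI')
plt.ylabel('Frequency')
plt.legend()
plt.show()
```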
