Thursday, February 15, 2024

CHAPTER 14 Building a Machine Learning Forecasting Model in Python: A Step-by-Step Guide

Here's a basic framework for building a forecasting model using machine learning in Python:

  1. Data Collection and Preparation:

    • Collect relevant data for your forecasting task. This could be historical data of the metric you want to forecast.
    • Preprocess the data, handling missing values, outliers, and encoding categorical variables if necessary.
    • Split the data into training and testing sets. For forecasting, split chronologically so that the test set covers the most recent period.
  2. Feature Engineering:

    • Extract relevant features from the data that can help improve the forecasting accuracy.
    • Features could include lagged values, rolling statistics, seasonality indicators, etc.
  3. Model Selection:

    • Choose appropriate machine learning algorithms for your forecasting task. Common choices include:
      • Linear Regression
      • Decision Trees
      • Random Forests
      • Gradient Boosting Machines
      • Long Short-Term Memory (LSTM) Networks (for time series forecasting)
    • You may also consider ensemble methods or stacking multiple models for better performance.
  4. Model Training:

    • Train your selected models using the training data.
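    • Tune hyperparameters with a search strategy such as grid search, scoring each candidate by cross-validation. For time series, use time-ordered folds so that a model is never validated on data older than its training data.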
  5. Model Evaluation:

    • Evaluate the trained models using appropriate metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), etc.
    • Compare the performance of different models to select the best one.
  6. Model Deployment:

    • Deploy the selected model for forecasting new data.
    • Monitor and update the model as needed.
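
The steps above can be sketched end to end on a small example. This is a minimal illustration, not a definitive pipeline: the synthetic monthly series, the feature names (lag_1, lag_12, roll_mean_3, month), the 24-month holdout, and the two candidate models are all assumptions chosen for the example. It evaluates one-step-ahead forecasts, meaning each test row uses the actual lagged values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic monthly series (assumed for illustration): trend + yearly seasonality + noise
rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
series = pd.Series(
    100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, n),
    index=pd.date_range("2014-01-01", periods=n, freq="MS"),
    name="y",
)

# Feature engineering: lagged values, a rolling statistic, a seasonality indicator
df = series.to_frame()
df["lag_1"] = df["y"].shift(1)
df["lag_12"] = df["y"].shift(12)
# Shift before rolling so the window never includes the value being predicted
df["roll_mean_3"] = df["y"].shift(1).rolling(3).mean()
df["month"] = df.index.month
df = df.dropna()  # the first 12 rows have no lag_12 value

# Chronological split: the last 24 months form the test set
train, test = df.iloc[:-24], df.iloc[-24:]
features = ["lag_1", "lag_12", "roll_mean_3", "month"]

# Model selection, training, and evaluation on the held-out period
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(train[features], train["y"])
    pred = model.predict(test[features])
    results[name] = {
        "MAE": mean_absolute_error(test["y"], pred),
        "RMSE": np.sqrt(mean_squared_error(test["y"], pred)),
    }

print(pd.DataFrame(results).T)
```

Comparing the MAE and RMSE rows in the printed table is one simple way to pick between the candidates before deployment.
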
MODEL WITH RANDOM SALES DATA:

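Let's create a simple forecasting model that predicts the sales price of an instrument for the next 30 years from randomly generated sales prices for the last 10 years. We'll use linear regression for simplicity. Because the input values are random, the fitted trend illustrates the workflow rather than a meaningful forecast.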
Sample Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generating random sales price values for the last 10 years
np.random.seed(0)
years = np.arange(2014, 2024)
sales_price = np.random.randint(500, 1500, size=10)

# Creating a DataFrame for the historical sales data
data = pd.DataFrame({'Year': years, 'Sales_Price': sales_price})

# Plotting the historical sales data
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-')
plt.title('Historical Sales Price of Instrument')
plt.xlabel('Year')
plt.ylabel('Sales Price')
plt.grid(True)
plt.show()

# Creating features and target variable
X = data[['Year']]
y = data['Sales_Price']

# Creating a linear regression model
model = LinearRegression()

# Fitting the model
model.fit(X, y)

# Predicting sales price for the next 30 years
future_years = np.arange(2024, 2054).reshape(-1, 1)
future_sales_price = model.predict(future_years)

# Creating DataFrame for future predictions
future_data = pd.DataFrame({'Year': future_years.flatten(), 'Sales_Price': future_sales_price})

# Plotting historical and predicted sales data
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-', label='Historical Data')
plt.plot(future_data['Year'], future_data['Sales_Price'], marker='o', linestyle='--', color='red', label='Predicted Data')
plt.title('Sales Price of Instrument (Historical and Predicted)')
plt.xlabel('Year')
plt.ylabel('Sales Price')
plt.legend()
plt.grid(True)
plt.show()


Explanation:

Let's break down the code step by step.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

These lines import the necessary libraries: numpy for numerical operations, pandas for data manipulation, matplotlib.pyplot for plotting, and LinearRegression from sklearn.linear_model for fitting a linear regression model.

    np.random.seed(0)
    years = np.arange(2014, 2024)
    sales_price = np.random.randint(500, 1500, size=10)

Here, we set a random seed for reproducibility using np.random.seed(0). We then generate an array years containing the years 2014 to 2023, and generate 10 random integer sales prices from 500 to 1499 using np.random.randint() (the upper bound is exclusive).

    data = pd.DataFrame({'Year': years, 'Sales_Price': sales_price})

We create a DataFrame data using pd.DataFrame(), which stores the years and the corresponding sales prices generated in the previous step.

    plt.figure(figsize=(10, 6))
    plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-')
    plt.title('Historical Sales Price of Instrument')
    plt.xlabel('Year')
    plt.ylabel('Sales Price')
    plt.grid(True)
    plt.show()

This section plots the historical sales data. We create a figure of 10x6 inches using plt.figure(figsize=(10, 6)), then plot sales price against year using plt.plot(). The marker='o' and linestyle='-' arguments draw a marker at each data point and connect the markers with solid lines. We set the title and the x and y axis labels, enable the grid, and display the plot using plt.show().

    X = data[['Year']]
    y = data['Sales_Price']

Here, we create the feature matrix X and the target variable y. X contains the 'Year' column as a one-column DataFrame (note the double brackets), while y contains the 'Sales_Price' column.

    model = LinearRegression()

We instantiate a linear regression model using LinearRegression().

    model.fit(X, y)

We fit the linear regression model to the data using the fit() method. This step trains the model on the historical data.

    future_years = np.arange(2024, 2054).reshape(-1, 1)
    future_sales_price = model.predict(future_years)

We generate the future years 2024 to 2053 using np.arange() and reshape them into a column vector using reshape(-1, 1), because scikit-learn expects a two-dimensional feature array. Then we use the trained model to predict the sales prices for these years using the predict() method.

    future_data = pd.DataFrame({'Year': future_years.flatten(), 'Sales_Price': future_sales_price})

We create a DataFrame future_data containing the future years and the predicted sales prices.

    plt.figure(figsize=(10, 6))
    plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-', label='Historical Data')
    plt.plot(future_data['Year'], future_data['Sales_Price'], marker='o', linestyle='--', color='red', label='Predicted Data')
    plt.title('Sales Price of Instrument (Historical and Predicted)')
    plt.xlabel('Year')
    plt.ylabel('Sales Price')
    plt.legend()
    plt.grid(True)
    plt.show()

Finally, we plot both historical and predicted sales prices on the same graph. We create a new figure, plot the historical data as a solid line and the predicted data as a dashed red line, set the title, labels, and legend, enable the grid, and display the plot.
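
One detail worth knowing about the prediction step: the model is fitted on a DataFrame with a 'Year' column, and scikit-learn 1.0 and later may emit a feature-name warning when predict() then receives a bare NumPy array. The sketch below is one possible variant, not the original listing: it passes a DataFrame with the same column name to predict() and also prints the fitted slope, intercept, and R^2. The names future_X, slope, intercept, and r2 are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same random historical data as in the listing above
np.random.seed(0)
data = pd.DataFrame({
    "Year": np.arange(2014, 2024),
    "Sales_Price": np.random.randint(500, 1500, size=10),
})

model = LinearRegression()
model.fit(data[["Year"]], data["Sales_Price"])

# The fitted line is Sales_Price = intercept + slope * Year
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope per year: {slope:.2f}, intercept: {intercept:.2f}")

# Predict with a DataFrame whose column name matches the one used in fit()
future_X = pd.DataFrame({"Year": np.arange(2024, 2054)})
future_sales_price = model.predict(future_X)

# R^2 on the historical data shows how much variance the line explains
r2 = model.score(data[["Year"]], data["Sales_Price"])
print(f"R^2 on historical data: {r2:.3f}")
```

With random input the R^2 value is typically low, which is a useful reminder that a straight line extrapolated 30 years ahead carries little information here.
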


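OUTPUT:

[Figures: "Historical Sales Price of Instrument" and "Sales Price of Instrument (Historical and Predicted)"]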

MODEL WITH USER DEFINED SALES DATA:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Prompting the user to input sales prices for the last 10 years
sales_data = []
for year in range(2014, 2024):
    sales_price = float(input(f"Enter sales price for year {year}: "))
    sales_data.append((year, sales_price))

# Creating a DataFrame for the historical sales data
data = pd.DataFrame(sales_data, columns=['Year', 'Sales_Price'])

# Plotting the historical sales data
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-')
plt.title('Historical Sales Price of Instrument')
plt.xlabel('Year')
plt.ylabel('Sales Price')
plt.grid(True)
plt.show()

# Creating features and target variable
X = data[['Year']]
y = data['Sales_Price']

# Creating a linear regression model
model = LinearRegression()

# Fitting the model
model.fit(X, y)

# Predicting sales price for the next 30 years
future_years = np.arange(2024, 2054).reshape(-1, 1)
future_sales_price = model.predict(future_years)

# Creating DataFrame for future predictions
future_data = pd.DataFrame({'Year': future_years.flatten(), 'Sales_Price': future_sales_price})

# Plotting historical and predicted sales data
plt.figure(figsize=(10, 6))
plt.plot(data['Year'], data['Sales_Price'], marker='o', linestyle='-', label='Historical Data')
plt.plot(future_data['Year'], future_data['Sales_Price'], marker='o', linestyle='--', color='red', label='Predicted Data')
plt.title('Sales Price of Instrument (Historical and Predicted)')
plt.xlabel('Year')
plt.ylabel('Sales Price')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:

Enter sales price for year 2014: 450
Enter sales price for year 2015: 470
Enter sales price for year 2016: 478
Enter sales price for year 2017: 500
Enter sales price for year 2018: 560
Enter sales price for year 2019: 580
Enter sales price for year 2020: 590
Enter sales price for year 2021: 600
Enter sales price for year 2022: 603
Enter sales price for year 2023: 605
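
The values entered in the sample session above can also be used to check the model before trusting its long-range forecast. The following is a rough sketch under stated assumptions: it skips the interactive prompts and hard-codes the ten entered prices, holds out the last two years as a test set (an arbitrary choice for illustration), reports the Mean Absolute Error on them, and then refits on all ten years. The names train, test, full_model, and future are introduced here and are not part of the original listing.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Sales prices copied from the sample session above (2014 to 2023)
years = np.arange(2014, 2024)
prices = [450, 470, 478, 500, 560, 580, 590, 600, 603, 605]
data = pd.DataFrame({"Year": years, "Sales_Price": prices})

# Chronological holdout: train on 2014-2021, test on 2022-2023 (assumed split)
train, test = data.iloc[:-2], data.iloc[-2:]

model = LinearRegression().fit(train[["Year"]], train["Sales_Price"])
test_pred = model.predict(test[["Year"]])
mae = mean_absolute_error(test["Sales_Price"], test_pred)
print(f"MAE on 2022-2023: {mae:.1f}")

# Refit on all ten years before forecasting 2024 to 2053
full_model = LinearRegression().fit(data[["Year"]], data["Sales_Price"])
future = pd.DataFrame({"Year": np.arange(2024, 2054)})
future["Sales_Price"] = full_model.predict(future[["Year"]])
print(future.head())
```

A holdout error that is large relative to the year-to-year changes suggests that a straight line is a poor fit for the recent years, and that a 30-year extrapolation should be read with caution.
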





