Python Tutorial: Using Lasso to Predict Stock Prices

Seng Wee Ngui
Mar 9, 2023

In this tutorial, we will learn how to use Lasso, a linear regression technique with L1 regularization, to predict stock prices from historical data. We will use the yfinance library to download historical prices for Google (ticker GOOGL).

Prerequisites

To follow along with this tutorial, you will need to have the following installed:

  • Python 3
  • numpy library
  • pandas library
  • scikit-learn library
  • yfinance library

You can install these libraries using pip, by running the following command in your terminal:

pip install numpy pandas scikit-learn yfinance

Pros and Cons of Lasso

Pros

  • Lasso can perform feature selection by shrinking the coefficients of less important features to zero.
  • Lasso can handle high-dimensional data, where the number of features is greater than the number of observations.
  • Lasso can improve the interpretability of a model by reducing the number of features.
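
The feature-selection behaviour described above can be seen on a small example. The following is a minimal sketch on synthetic data (the data, the variable names, and the alpha value are assumptions for illustration, not part of the stock example): only the first two of ten features influence the target, and Lasso is expected to zero out the rest.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features, of which
# only the first two actually influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

demo_model = Lasso(alpha=0.1)
demo_model.fit(X_demo, y_demo)

# Indices of the features whose coefficients were not shrunk to zero
selected = np.flatnonzero(demo_model.coef_)
print("Features kept by Lasso:", selected)
print("Coefficients:", np.round(demo_model.coef_, 3))
```

The surviving coefficients are also pulled slightly toward zero, which is the price Lasso pays for the sparsity.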

Cons

  • When the number of predictors exceeds the number of observations, Lasso selects at most as many predictors as there are observations. When predictors are highly correlated, it tends to keep one of them and drop the others somewhat arbitrarily.
  • Lasso can be sensitive to outliers in the data, because it minimizes squared error.
  • Lasso selects only among the predictors it is given, so the selected set is not guaranteed to be the "correct" one, and relevant predictors missing from the data cannot be recovered.
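
To illustrate the sensitivity to outliers, here is a small sketch on synthetic straight-line data (the data and names are assumptions for illustration). A single corrupted observation is enough to move the fitted slope noticeably.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic straight-line data (an assumption for illustration): y is about 2 * x
rng = np.random.default_rng(1)
x_line = np.linspace(0, 10, 50).reshape(-1, 1)
y_clean = 2.0 * x_line.ravel() + rng.normal(scale=0.5, size=50)

# Same data with a single corrupted observation at the right-hand end
y_outlier = y_clean.copy()
y_outlier[-1] += 200.0

slope_clean = Lasso(alpha=0.1).fit(x_line, y_clean).coef_[0]
slope_outlier = Lasso(alpha=0.1).fit(x_line, y_outlier).coef_[0]

print(f"Slope without the outlier: {slope_clean:.2f}")
print(f"Slope with the outlier:    {slope_outlier:.2f}")
```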

Cases where Lasso should be used

  1. Feature selection: Lasso can be used in medical research to identify the most important biomarkers for a particular disease, where the dataset has a large number of biomarkers and we want to identify the most important ones for our analysis.
  2. Multicollinearity: Lasso can be used in finance to predict stock prices, where there is high correlation between the predictors, such as interest rates, inflation rates, and GDP growth rates. Lasso can shrink the coefficients of redundant predictors to zero, which can reduce the effect of multicollinearity and may improve the accuracy of the model. Keep in mind that the choice of which correlated predictor survives can be somewhat arbitrary.
  3. Interpretability: Lasso can be used in marketing research to understand the relationship between advertising spending and sales revenue. By reducing the number of advertising channels to the most effective ones, Lasso can improve the interpretability of the model and help marketers to make better decisions.

Cases where Lasso should not be used

  1. Small sample size: Lasso may not perform well in drug discovery, where the number of samples is much smaller than the number of molecular descriptors. In such cases, the model may overfit the data, leading to poor performance on new drugs.
  2. Non-linear relationships: Lasso may not be suitable for predicting the stock prices of companies whose business models are highly dependent on social media sentiment. In such cases, the relationship between social media sentiment and stock prices may not be linear, and Lasso may not be able to capture the underlying patterns in the data.
  3. Outliers: Lasso may not be suitable for predicting the credit risk of borrowers who have a history of bankruptcy or foreclosure. In such cases, the dataset may contain outliers, and Lasso may not be the best choice for modeling the data.
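
The non-linear case can be sketched with a purely quadratic relationship. The data below is synthetic and the names are assumptions for illustration. Lasso on the raw feature explains almost none of the variation, while Lasso on a hand-expanded feature set (x and x squared) fits well, because the model is linear in whatever features it is given.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, purely quadratic relationship (an assumption for illustration)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y_quad = x.ravel() ** 2

# Lasso on the raw feature: a straight line cannot follow the curve
linear_fit = Lasso(alpha=0.01).fit(x, y_quad)
r2_linear = linear_fit.score(x, y_quad)

# Lasso on expanded features [x, x^2]: still linear in its inputs
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly_fit = Lasso(alpha=0.01).fit(x_poly, y_quad)
r2_poly = poly_fit.score(x_poly, y_quad)

print(f"R^2 with x only:    {r2_linear:.3f}")
print(f"R^2 with x and x^2: {r2_poly:.3f}")
```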

Downloading the Data

First, let's download the data using yfinance. We will be using historical data for Google stock prices from January 1st, 2010 to December 31st, 2019.

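import yfinance as yf

# Download daily prices for GOOGL.
# The end date is exclusive, so "2020-01-01" includes December 31st, 2019.
# auto_adjust=False keeps the separate "Adj Close" column used below;
# some yfinance versions adjust prices by default and omit that column.
stock_data = yf.download("GOOGL", start="2010-01-01", end="2020-01-01", auto_adjust=False)

# Print the first 5 rows of the data
print(stock_data.head())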

This will output the first 5 rows of the downloaded data.
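
Before modelling, it can help to check the column labels and look for missing values. Depending on the installed yfinance version, the returned frame may carry two-level column labels (field, ticker) even for a single ticker. The sketch below uses a hypothetical helper, flatten_price_columns, and a small hand-made frame with made-up values standing in for the download, so it runs without network access.

```python
import pandas as pd

def flatten_price_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with single-level column names (hypothetical helper)."""
    result = frame.copy()
    if isinstance(result.columns, pd.MultiIndex):
        # Keep only the first level, e.g. ("Close", "GOOGL") -> "Close"
        result.columns = result.columns.get_level_values(0)
    return result

# Small hand-made frame standing in for a download (values are made up)
sample = pd.DataFrame(
    [[10.0, 10.5, 1000.0], [10.4, float("nan"), 1200.0]],
    columns=pd.MultiIndex.from_tuples(
        [("Close", "GOOGL"), ("Adj Close", "GOOGL"), ("Volume", "GOOGL")]
    ),
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
)

flat = flatten_price_columns(sample)
missing_per_column = flat.isna().sum()

print(list(flat.columns))
print(missing_per_column)
```

Applied to the real data, the same two checks would be flatten_price_columns(stock_data) followed by isna().sum().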

Preparing the Data

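Next, we need to prepare the data for our Lasso model. We will be using the Adj Close column as our target variable and the other columns as our features. Note that these features come from the same trading day as the target, so the model estimates the adjusted close from same-day prices and volume rather than forecasting a future price.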

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Create the feature matrix
X = stock_data.drop("Adj Close", axis=1)

# Create the target variable vector
y = stock_data["Adj Close"]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

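Here, we have split the data into training and testing sets with test_size=0.2, so 20% of the rows are held out for testing. Because train_test_split shuffles rows by default, the test rows are scattered across the whole date range rather than taken from the end of it.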
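
For time-ordered data, one alternative is a chronological split, where the model is trained on earlier rows and tested on later ones. The sketch below shows this with shuffle=False on a synthetic daily series (the series and names are assumptions for illustration, not the tutorial's actual data).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic daily series standing in for the stock data (values are made up)
dates = pd.date_range("2019-01-01", periods=100, freq="D")
X_ts = pd.DataFrame({"feature": np.arange(100, dtype=float)}, index=dates)
y_ts = pd.Series(np.arange(100, dtype=float) * 2.0, index=dates)

# shuffle=False keeps row order: the first 80% train, the last 20% test
X_tr, X_te, y_tr, y_te = train_test_split(X_ts, y_ts, test_size=0.2, shuffle=False)

print("Last training date:", X_tr.index.max().date())
print("First test date:   ", X_te.index.min().date())
```

With this split, every test row is later than every training row, which is closer to how a price model would be used in practice.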

Training the Model

Now, let's train our Lasso model using the training data.

from sklearn.linear_model import Lasso

# Create the Lasso model
lasso = Lasso(alpha=0.1)

# Train the model
lasso.fit(X_train, y_train)

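Here, we have created a Lasso model with an alpha value of 0.1 and trained it using the training data. The alpha parameter controls the strength of the L1 penalty: larger values shrink more coefficients to zero, while values close to zero make the model behave like ordinary least squares.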
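
The value alpha=0.1 is a fixed choice, and the penalty treats all coefficients alike, so features on very different scales (prices versus volume, for example) are penalized unevenly. One common approach, sketched below on synthetic data (the data and names are assumptions for illustration), is to standardize the features and let cross-validation pick alpha with scikit-learn's LassoCV.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (an assumption for illustration),
# loosely mimicking a price-like column next to a volume-like column.
rng = np.random.default_rng(42)
price_like = rng.normal(loc=100.0, scale=10.0, size=300)
volume_like = rng.normal(loc=5e6, scale=1e6, size=300)
X_scaled_demo = np.column_stack([price_like, volume_like])
y_scaled_demo = 0.5 * price_like + rng.normal(scale=1.0, size=300)

# Standardize the features, then let 5-fold cross-validation pick alpha
pipeline = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
pipeline.fit(X_scaled_demo, y_scaled_demo)

fitted_lasso = pipeline.named_steps["lassocv"]
chosen_alpha = fitted_lasso.alpha_
print("Alpha chosen by cross-validation:", chosen_alpha)
print("Coefficients on standardized features:", fitted_lasso.coef_)
```

Putting the scaler inside the pipeline means it is fitted on the training folds only, so no information from held-out rows leaks into the scaling.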

Evaluating the Model

Finally, let's evaluate our model using the testing data and calculate the mean squared error (MSE).

from sklearn.metrics import mean_squared_error

# Make predictions using the testing data
y_pred = lasso.predict(X_test)

# Calculate the MSE
mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)

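This will output the mean squared error of our Lasso model. Lower values are better, and because the errors are squared, the result is expressed in squared price units.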
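
MSE on its own can be hard to interpret. The sketch below, using made-up numbers standing in for y_test and y_pred, adds the root mean squared error (in the same units as the price), the R^2 score, and a simple baseline that always predicts the mean, so there is something to compare against.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, standing in for y_test and y_pred
y_true_demo = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
y_pred_demo = np.array([99.0, 103.0, 101.5, 104.0, 108.0])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # same units as the target
r2_demo = r2_score(y_true_demo, y_pred_demo)

# Baseline: always predict the mean of the true values
baseline_pred = np.full_like(y_true_demo, y_true_demo.mean())
baseline_mse = mean_squared_error(y_true_demo, baseline_pred)

print(f"MSE:          {mse_demo:.3f}")
print(f"RMSE:         {rmse_demo:.3f}")
print(f"R^2:          {r2_demo:.3f}")
print(f"Baseline MSE: {baseline_mse:.3f}")
```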

Conclusion

In this tutorial, we learned how to use Lasso to predict stock prices using historical data. We downloaded the data using yfinance, prepared the data for our Lasso model, trained the model, and evaluated its performance using mean squared error.
