Using LangChain and RAG to Talk to the S&P 500

Hello, how are you? Welcome back to my "Learn to Program with Whiteboard and Marker" channel. Today we’re going to run an interesting experiment that mixes three technologies: Pandas, ChatGPT, and LangChain… well, actually four, because we’re also using a technique called RAG (retrieval-augmented generation) to interact with S&P 500 data. What makes the experiment interesting is that RAG is what lets the model answer questions about our own data. Let’s see if it works.
First, we need to decide where the data will come from. I found a site with information about S&P 500 stocks; it has several tables and, at the end, one listing all the components. I also found another source on Kaggle, which provides additional information such as the market, ticker, company name, industry, and some financial data like EBITDA, current price, market cap, and profit growth.
Now let’s see how to do it. We move to the code and load the necessary imports: requests, pandas, and BeautifulSoup, which we use to extract data from the HTML page. First, we use BeautifulSoup to get the last table on the page, then ask Pandas to read it as a dataframe. Next, we bring in the Kaggle dataset, which has a file called SP500_Companies, the one we’re interested in, and load it into a dataframe as well.
Since both tables share a key column (the ticker in one, the symbol in the other), I join them and remove some repeated or irrelevant columns, like the business summary, short name, and weight in the S&P 500 index. This way, we have the current price and other relevant data without redundancies. Let’s look at the result of the data load, where we get all the information for each company.
Since there are many columns, let’s prepare the prompt that sends the data to ChatGPT. I tell it that it’s a financial assistant providing information on S&P 500 stocks, and I list the dataframe’s columns and what each one contains. It’s a large prompt, but it lets the model understand the data and interact with it.
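The join and the prompt described above can be sketched roughly as follows. This is only an illustration, not the code from the video: the two small dataframes stand in for the scraped table and the Kaggle file, every column name and value is made up, and `build_prompt` is a hypothetical helper. No call to ChatGPT or LangChain is made here; the sketch stops at the text that would be sent.

```python
import pandas as pd

# Hypothetical stand-in for the components table scraped from the web page.
components = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "Company": ["Alpha Corp", "Beta Inc", "Gamma Ltd"],
    "Price": [10.5, 20.0, 30.25],
})

# Hypothetical stand-in for the Kaggle company file (made-up columns and values).
kaggle = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "ShortName": ["Alpha", "Beta", "Gamma"],
    "Sector": ["Technology", "Healthcare", "Healthcare"],
    "MarketCap": [1.0e9, 2.5e9, 4.0e8],
    "BusinessSummary": ["...", "...", "..."],
    "Weight": [0.5, 0.3, 0.2],
})

# Join the two tables on the ticker symbol, which has a different name in each.
merged = components.merge(kaggle, left_on="Symbol", right_on="Ticker", how="inner")

# Drop the duplicated key and the columns considered irrelevant for the questions.
merged = merged.drop(columns=["Ticker", "ShortName", "BusinessSummary", "Weight"])


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Assemble the text sent to the model: role, column list, data, question."""
    columns = ", ".join(df.columns)
    data = df.to_csv(index=False)  # the whole table serialized as CSV text
    return (
        "You are a financial assistant that provides information on S&P 500 stocks.\n"
        f"The data has these columns: {columns}.\n\n"
        f"{data}\n"
        f"Question: {question}"
    )


prompt = build_prompt(merged, "Which Healthcare company has the largest market cap?")
print(prompt)
```

Serializing the whole table into the prompt is the simplest option for a small dataframe; with the full list of companies the prompt gets large, which is the size issue mentioned above.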
Now I hide some elements to make things easier to see on screen. I send the data, but it looks like I made a mistake that makes the process end abruptly. Don’t worry; this is almost like a live stream, and there’s no time to edit. Remember to click the bell and subscribe, because sometimes I forget to mention it.
Now let’s start interacting with the data. We can ask, for example, “How did Tesla do today?” ChatGPT responds that Tesla closed at $258.72, but that figure looks wrong to me. Let’s check on Google, and sure enough, the price doesn’t match. This shows that the model might make up some data.
Let’s try another: “Give me the tickers for the top three companies in the pharmaceutical sector.” The response is Molina Healthcare, Bristol-Myers, and AbbVie. One more: “Give me the tickers for the top three companies.” Here it responds with Zoetis, Dexcom, and Universal Health Services.
This is what RAG does: it sends our data to ChatGPT along with the question, and ChatGPT analyzes it and responds with a conclusion. Although promising, the system isn’t always accurate, so it’s essential to validate what ChatGPT returns, because it can sometimes make things up.
I hope you enjoyed the experiment. I’ll upload the code to a GitHub repository and let you know in the comments. Don’t forget to click the bell, subscribe, and “like” the video, because it really helps and motivates me. Bye, and thank you so much!
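For anyone following along, the validation step recommended above could be sketched like this. It is only one possible check, not something shown in the video: `extract_price` and `price_matches` are hypothetical helpers, and the reference price is a made-up number standing in for whatever trusted value you compare against (for example, the price column of the dataframe).

```python
import re


def extract_price(answer: str):
    """Return the first dollar amount mentioned in the answer, or None."""
    match = re.search(r"\$(\d[\d,]*(?:\.\d+)?)", answer)
    if match is None:
        return None
    # Remove thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


def price_matches(answer: str, reference_price: float, tolerance: float = 0.01) -> bool:
    """True only if the answer quotes a price within `tolerance` of the reference."""
    quoted = extract_price(answer)
    return quoted is not None and abs(quoted - reference_price) <= tolerance


answer = "Tesla closed at $258.72."
reference = 250.00  # made-up reference value, for illustration only

if not price_matches(answer, reference):
    print("The quoted price does not match the reference; do not trust this answer.")
```

A check like this only catches numeric mismatches; answers such as a list of company names would need a different comparison against the source data.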