Paper Key : IRJ************323
Author: Sony Nagilla, Gayatri Mavuri, Kothapally Akanksha
Date Published: 01 Apr 2025
Abstract
Traditional web scraping tools often struggle to navigate and extract data from complex, dynamic websites, and they typically lack any deep semantic understanding of the content they collect. This poses significant challenges for businesses that need comprehensive product insights. WebSage, an AI-powered web scraping tool, addresses these challenges by combining natural language processing with structured data extraction. Its main objective is to give businesses, researchers, and consumers an effective means of extracting and analyzing product data from a variety of websites. To fill the gaps in current web scraping solutions, WebSage uses Gemini embeddings for semantic understanding and BeautifulSoup for precise data extraction, which together support sophisticated querying and analytics. It uses a recursive crawling method to navigate and collect data across an entire website, and it exposes these capabilities through a user-friendly Streamlit interface. WebSage can answer complex queries, perform semantic searches, and generate detailed analytics and visualizations, which support decision-making and provide a competitive advantage in market analysis and e-commerce.
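The recursive crawling and semantic search described in the abstract can be illustrated with a minimal sketch. This is not the WebSage implementation, and it runs without network access or third-party packages: the standard-library `html.parser` stands in for BeautifulSoup, a hypothetical keyword-count function `toy_embed` stands in for Gemini embeddings, and the in-memory `SITE` dictionary with its `shop.example` URLs is invented for the demonstration. All function and variable names are assumptions.

```python
from html.parser import HTMLParser
from math import sqrt
from urllib.parse import urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(url, fetch, max_depth=2, visited=None):
    """Recursively visit same-host pages and return {url: page text}."""
    visited = {} if visited is None else visited
    if max_depth < 0 or url in visited:
        return visited
    html = fetch(url)  # fetch is injected; it returns None for unknown pages
    if html is None:
        return visited
    parser = LinkAndTextParser()
    parser.feed(html)
    visited[url] = " ".join(parser.text_parts)
    for href in parser.links:
        target = urljoin(url, href).split("#")[0]  # resolve and drop fragments
        if urlparse(target).netloc == urlparse(url).netloc:
            crawl(target, fetch, max_depth - 1, visited)
    return visited


# Hypothetical stand-in for an embedding model: counts of a tiny vocabulary.
VOCAB = ("laptop", "phone", "price")


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, pages):
    """Order page URLs by cosine similarity between query and page text."""
    q = toy_embed(query)
    return sorted(pages, key=lambda u: cosine(q, toy_embed(pages[u])), reverse=True)


# Invented three-page site with one off-site link that the crawl should skip.
SITE = {
    "https://shop.example/": '<a href="/laptops">Laptops</a> '
                             '<a href="/phones">Phones</a> '
                             '<a href="https://other.example/">Ad</a>',
    "https://shop.example/laptops": '<p>Laptop price list</p> <a href="/">Home</a>',
    "https://shop.example/phones": "<p>Phone deals</p>",
}

pages = crawl("https://shop.example/", SITE.get)
best = rank("laptop price", pages)[0]
print(best)
```

Passing the fetch function in as a parameter keeps the crawl logic separate from the network layer, so the same sketch could be pointed at a real HTTP client, and the `visited` dictionary is what prevents the recursion from looping on pages that link back to each other.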
DOI Requested