As a Junior Data Scientist, I’m deeply passionate about every stage of working with data, from collecting raw datasets to cleaning and transforming them into valuable insights. I thrive on building models that drive meaningful decisions and solve real-world problems.
I have built several projects in web scraping, data cleaning, and machine learning model development.
Web scraping is a technique used to collect and extract structured data from publicly available websites.
It enables businesses and researchers to gather valuable information for analysis, monitoring, and decision-making.
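Every project below starts by downloading a page. The following is a minimal sketch of that "collect" step using only the Python standard library (`urllib.request` and `urllib.robotparser`), so it is not the code from the repositories. Checking `robots.txt` before fetching is an assumption I added as a common courtesy. The user-agent string is hypothetical.

```python
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

# Hypothetical user-agent string: replace it with one that identifies your scraper.
USER_AGENT = "portfolio-scraper-example/0.1"


def is_allowed(robots_lines, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def fetch_html(url, user_agent=USER_AGENT, timeout=10):
    """Download a page's HTML, refusing if the site's robots.txt disallows it."""
    parts = urlparse(url)
    robots = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()  # downloads and parses robots.txt
    if not robots.can_fetch(user_agent, url):
        raise PermissionError(f"robots.txt disallows fetching {url}")
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

`is_allowed` is split out so the rule check can be tried offline. For example, `is_allowed(["User-agent: *", "Disallow: /private/"], "https://example.com/private/x")` returns `False`.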
Below are some projects in which I scraped data from various websites using Python with Beautiful Soup, Selenium, and Playwright.
This project walks step by step through extracting an HTML data table from a static website, Wikipedia, and demonstrates how to collect structured data from a web page.
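The core of table scraping is walking the `<table>`, `<tr>`, `<th>`, and `<td>` tags and keeping each cell's text by row. The minimal sketch below does that with Python's built-in `html.parser` module instead of Beautiful Soup, so it runs without extra dependencies. It is not the repository's implementation. It assumes well-formed markup with explicit closing tags and no nested tables, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return every table in the HTML as a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


def table_to_records(table):
    """Use the first row as column names and zip it with each remaining row."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]
```

On a real Wikipedia page, cells can also contain footnote markers such as `[1]`, which would still need a cleaning pass afterwards.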
The source code is available in the GitHub repository Scrape-table-from-website.
This project collects product and pricing data from an e-commerce website to support competitor monitoring and market research. It extracts product details, prices, and customer ratings from public product pages. Static content is handled using Beautiful Soup.
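Prices and ratings usually come off the page as display strings such as `"$1,299.99"` or `"4.5 out of 5 stars"`, for example from Beautiful Soup's `get_text(strip=True)`. They need converting to numbers before any analysis. The minimal sketch below assumes US-style number formatting (comma thousands separator, dot decimal) and takes the first number found in each string. The field names `name`, `price`, and `rating` are hypothetical.

```python
import re
from decimal import Decimal

# First number in the text, allowing comma thousands separators and a decimal part.
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")
RATING_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """'$1,299.99' -> Decimal('1299.99'); None if no number is present."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding on currency values.
    return Decimal(match.group(0).replace(",", ""))


def parse_rating(text):
    """'4.5 out of 5 stars' -> 4.5; None if no number is present."""
    match = RATING_RE.search(text)
    return float(match.group(0)) if match else None


def clean_product(raw):
    """Convert one scraped product record of strings into typed values."""
    return {
        "name": raw["name"].strip(),
        "price": parse_price(raw["price"]),
        "rating": parse_rating(raw["rating"]),
    }
```

European formats such as `"1.299,99 €"` would be parsed incorrectly by this sketch and need a locale-aware rule.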
The source code is available in the GitHub repository ecommerce-data-collection.
This project focuses on collecting news articles from a dynamic news website. It extracts the article title, full content, and publication date using Selenium, as the site relies on dynamic loading. Playwright can also be used as an alternative for handling dynamic content.
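With a dynamic site, the scraper has to wait until JavaScript has rendered the article before reading it. The minimal sketch below uses Selenium's explicit wait (`WebDriverWait` with `expected_conditions.presence_of_element_located`) in headless Chrome. The CSS selectors (`h1`, `article p`, `time`) are hypothetical placeholders, and reading the date from a `<time datetime="...">` attribute is an assumption about the page's markup. Selenium is imported inside the function so the date helper can be used on its own.

```python
from datetime import datetime


def normalize_published(value):
    """Turn an ISO 8601 timestamp string into a normalized ISO string (or None)."""
    if not value:
        return None
    value = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 onward.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).isoformat()


def scrape_article(url, timeout=15):
    """Open a dynamically loaded article in headless Chrome and return its fields."""
    # Imported lazily so normalize_published() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the headline exists in the DOM (hypothetical selector).
        heading = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
        )
        paragraphs = driver.find_elements(By.CSS_SELECTOR, "article p")
        time_el = driver.find_element(By.CSS_SELECTOR, "time")
        return {
            "title": heading.text.strip(),
            "content": "\n".join(p.text for p in paragraphs if p.text.strip()),
            "published": normalize_published(time_el.get_attribute("datetime")),
        }
    finally:
        driver.quit()  # always close the browser, even if a selector fails
```

Using an explicit wait rather than a fixed `time.sleep()` means the scraper continues as soon as the content appears and fails with a clear timeout if it never does.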
The source code is available in the GitHub repository scrape-news-articles.
This project focuses on collecting publicly available Airbnb listing data for market analysis.
It extracts key details such as listing title, price, location, ratings, and availability from dynamically loaded pages. Selenium is used to handle dynamic content and user interactions, with Playwright as an alternative approach.
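Where a results page lazy-loads more items as you scroll, one common approach is to keep scrolling until the number of loaded items stops growing. The minimal sketch below takes the Playwright route mentioned as the alternative. It does not reproduce the repository's Selenium code, and the card selector you pass in is a hypothetical placeholder. The stopping rule is written as a separate function so it can be tested without a browser.

```python
def scroll_until_stable(count_items, scroll_once, max_rounds=20):
    """Call scroll_once() until count_items() stops growing; return the count reached."""
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        current = count_items()
        if current <= previous:
            break  # nothing new loaded, so assume we reached the end
        previous = current
    return previous


def collect_listing_texts(url, card_selector, max_rounds=20):
    """Return the visible text of every listing card after scrolling to load them all."""
    # Imported lazily so scroll_until_stable() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector(card_selector)
            cards = page.locator(card_selector)

            def scroll_once():
                page.mouse.wheel(0, 3000)
                page.wait_for_timeout(1500)  # give lazy-loaded cards time to render

            scroll_until_stable(cards.count, scroll_once, max_rounds)
            return cards.all_inner_texts()
        finally:
            browser.close()
```

`max_rounds` caps the loop so a page that never stops loading cannot scroll forever. The raw card texts would then be split into title, price, location, rating, and availability fields in a separate cleaning step.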
The source code is available in the GitHub repository airbnb-market-data-collection.
Collecting data to gain insights is an important part of data science.
Here, I share the projects I have completed, along with their source code.