
zennn077/budget

Source: Hugging Face | Updated: 2024-02-10 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/zennn077/budget
Resource description:
# Notebook setup from the original: !pip install requests-html
# (the script itself only imports requests and bs4)
import csv

import requests
from bs4 import BeautifulSoup


# Function to scrape data from the website
def scrape_website(url):
    # Send a GET request to the URL (the timeout avoids hanging indefinitely)
    response = requests.get(url, timeout=30)
    # Check if the request was successful
    if response.status_code != 200:
        print("Failed to retrieve data from the website.")
        return None, None
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the press release content
    press_release_content = soup.find('div', {'id': 'divPressRelease'})
    if press_release_content is None:
        print("Press release container not found on the page.")
        return None, None
    # Extract the title and content, guarding against missing elements
    title_tag = press_release_content.find('h1')
    content_tag = press_release_content.find('div', {'class': 'pressreldetail'})
    if title_tag is None or content_tag is None:
        print("Title or content element not found on the page.")
        return None, None
    return title_tag.text.strip(), content_tag.text.strip()


# Main function
def main():
    # URL of the website to scrape
    url = 'https://www.pib.gov.in/PressReleasePage.aspx?PRID=1895315'
    # Scrape data from the website
    title, content = scrape_website(url)
    # Write the scraped data to a CSV file
    if title and content:
        with open('scraped_data.csv', 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(['Title', 'Content'])
            writer.writerow([title, content])
        print("Scraped data has been saved to 'scraped_data.csv'.")
    else:
        print("No data was scraped.")


# Run the scraper when executed as a script
if __name__ == '__main__':
    main()
Provider:
zennn077
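Summary of original information

Dataset overview

Data source

  • The data comes from the website: https://www.pib.gov.in/PressReleasePage.aspx?PRID=1895315

Data content

  • The data consists of a press release, specifically its title and body text.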

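Data processing

  • The data is scraped from the specified web page.
  • The BeautifulSoup library parses the HTML and extracts the title and body text of the press release.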
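
The extraction step described above can be illustrated offline, without a network request. The sketch below is only an approximation: it uses the standard-library `html.parser` module as a stand-in for BeautifulSoup, and `SAMPLE_HTML`, `PressReleaseParser`, and `extract_title_and_content` are hypothetical names introduced for illustration. The `h1` element and the `pressreldetail` class are taken from the dataset's script; the sample markup itself is made up, and the real page may be structured differently. Unlike the original script, this sketch does not restrict the search to the `divPressRelease` container, and it collapses runs of whitespace in the body text.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<html><body>
<div id="divPressRelease">
  <h1> Sample Title </h1>
  <div class="pressreldetail">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</div>
</body></html>
"""


class PressReleaseParser(HTMLParser):
    """Collects the text of the first <h1> and of div.pressreldetail."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.detail_depth = 0  # > 0 while inside div.pressreldetail
        self.title_parts = []
        self.content_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.title_parts:
            # Only the first <h1> is treated as the title.
            self.in_title = True
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.detail_depth > 0:
                # Nested <div> inside the detail block: track depth.
                self.detail_depth += 1
            elif "pressreldetail" in classes:
                self.detail_depth = 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_title = False
        elif tag == "div" and self.detail_depth > 0:
            self.detail_depth -= 1

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif self.detail_depth > 0:
            self.content_parts.append(data)


def extract_title_and_content(html):
    """Return (title, content) as stripped, whitespace-normalized strings."""
    parser = PressReleaseParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    title = "".join(parser.title_parts).strip()
    content = " ".join("".join(parser.content_parts).split())
    return title, content


title, content = extract_title_and_content(SAMPLE_HTML)
```

For real-world pages BeautifulSoup, as used in the dataset's script, is generally more forgiving of malformed HTML than a hand-written parser like this one.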

Data storage

  • The scraped data is stored in a CSV file named scraped_data.csv.
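
As a rough illustration of the storage format, the sketch below writes one row with the `Title` and `Content` header used by the dataset's script and reads it back with `csv.DictReader`. The helper names `write_rows` and `read_rows` and the sample row are hypothetical; the file is written to a temporary directory so that nothing in the working directory is overwritten.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write (title, content) pairs under a Title/Content header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Content"])
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical sample row; the body contains a newline and a comma
# to show that the csv module quotes such fields automatically.
sample = [("Sample Title", "Line one.\nLine two, with a comma.")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scraped_data.csv")
    write_rows(path, sample)
    records = read_rows(path)
```

Opening the file with `newline=""` for both writing and reading, as the original script does when writing, lets the `csv` module handle line endings itself, so multi-line press release text should survive a round trip.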