zennn077/budget
Hugging Face, updated 2024-02-10, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/zennn077/budget
Resource description:
# Requires the requests and beautifulsoup4 packages (pip install requests beautifulsoup4)
import csv

import requests
from bs4 import BeautifulSoup

# Function to scrape data from the website
def scrape_website(url):
    # Send a GET request to the URL; the timeout keeps the call from hanging
    response = requests.get(url, timeout=30)
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')
        # Find the press release content
        press_release_content = soup.find('div', {'id': 'divPressRelease'})
        if press_release_content is None:
            print("Press release container not found on the page.")
            return None, None
        # Extract the title and content, guarding against missing elements
        title_tag = press_release_content.find('h1')
        content_tag = press_release_content.find('div', {'class': 'pressreldetail'})
        if title_tag is None or content_tag is None:
            print("Title or content element not found on the page.")
            return None, None
        return title_tag.text.strip(), content_tag.text.strip()
    else:
        print("Failed to retrieve data from the website.")
        return None, None

# Main function
def main():
    # URL of the website to scrape
    url = 'https://www.pib.gov.in/PressReleasePage.aspx?PRID=1895315'
    # Scrape data from the website
    title, content = scrape_website(url)
    # Write the scraped data to a CSV file
    if title and content:
        with open('scraped_data.csv', 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(['Title', 'Content'])
            writer.writerow([title, content])
        print("Scraped data has been saved to 'scraped_data.csv'.")
    else:
        print("No data was scraped.")

# Run the scraper when the file is executed as a script
if __name__ == '__main__':
    main()
Provider:
zennn077
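Original information summary
Dataset overview
Data source
- The data comes from the website: https://www.pib.gov.in/PressReleasePage.aspx?PRID=1895315
Data content
- The data consists of a press release, specifically its title and body text.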
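Data processing
- The data is scraped from the specified website with a web crawler.
- The BeautifulSoup library parses the HTML and extracts the press release title and content.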
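The extraction step described above can be tried offline. The following is a minimal sketch, assuming the page wraps the release in a `div` with id `divPressRelease` that contains an `h1` and a `div` with class `pressreldetail`, as the script expects. The `SAMPLE_HTML` fragment and the `extract_press_release` helper are hypothetical and written only for illustration; the live page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment imitating the structure the script looks for;
# the real page markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="divPressRelease">
    <h1>  Sample Press Release Title </h1>
    <div class="pressreldetail">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </div>
</body></html>
"""


def extract_press_release(html):
    """Return (title, content), or (None, None) if an expected element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"id": "divPressRelease"})
    if container is None:
        return None, None
    title_tag = container.find("h1")
    body_tag = container.find("div", {"class": "pressreldetail"})
    if title_tag is None or body_tag is None:
        return None, None
    # strip=True trims each text piece; the " " separator keeps paragraphs apart
    return title_tag.get_text(strip=True), body_tag.get_text(" ", strip=True)


title, content = extract_press_release(SAMPLE_HTML)
print(title)
print(content)
```

Returning `(None, None)` when an element is missing matches the script's failure convention, so a caller can use the same `if title and content:` check before writing the CSV.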
Data storage
- The scraped data is stored in a CSV file named scraped_data.csv.
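Reading the stored file back could look like the sketch below. It assumes the two-column `Title`/`Content` layout the script writes. The `load_scraped_rows` helper and the placeholder row are hypothetical, and the demo writes its own stand-in file to a temporary directory so it runs without the real data.

```python
import csv
import os
import tempfile


def load_scraped_rows(path):
    """Read a Title/Content CSV and return its rows as a list of dicts."""
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.DictReader(csvfile))


# Self-contained demo: write a stand-in file with placeholder values in the
# same layout, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "scraped_data.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Title", "Content"])
    writer.writerow(["Placeholder title", "Line one\nLine two, with a comma"])

rows = load_scraped_rows(demo_path)
print(len(rows))
print(rows[0]["Title"])
```

Opening the file with `newline=""` matters here because press release text usually contains line breaks and commas, which the `csv` module stores inside quoted fields.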



