news-fetch 0.2.8


news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both the most recent and older, archived articles. You only need to provide the root URL of the news website to crawl it completely. news-fetch combines the power of multiple state-of-the-art libraries and tools, such as news-please by Felix Hamborg and Newspaper3k by Lucas (欧阳象) Ou-Yang, and offers the features of both.
I built this to reduce the NaN, '', [], and 'None' values that scraping some news websites otherwise returns. It is platform-independent and written in Python 3. Programmers and developers can easily use this package to access news data from their programs.



Source Links

PyPI: https://pypi.org/project/news-fetch/
Repository: https://santhoshse7en.github.io/news-fetch/
Documentation: https://santhoshse7en.github.io/news-fetch_doc/ (Not Yet Created!)



Dependencies

news-please
newspaper3k
beautifulsoup4
fake_useragent
selenium
chromedriver-binary
pandas

Extracted information
news-fetch extracts the following attributes from news articles (a sketch of one such record follows the list below). Also, have a look at an exemplary JSON file extracted by news-please.

headline
name(s) of author(s)
publication date
publication
category
source_domain
article
summary
keyword
url
language
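
For orientation only, one extracted record could be pictured as a JSON-like Python dictionary. The keys below mirror the attribute list above, but the exact key names used by the library may differ, and the values are illustrative rather than actual news-fetch output:

record = {
    "headline": "g20 summit: trump and xi agree to restart us china trade talks",
    "author": ["BBC News"],             # name(s) of author(s); hypothetical key and value
    "date_publish": "2019-06-29",       # publication date; hypothetical key and value
    "publication": "bbc",
    "category": "world",
    "source_domain": "www.bbc.co.uk",
    "article": "US President Donald Trump ...",  # full article text, truncated here
    "summary": "Trade talks between the US and China are back on track ...",
    "keyword": ["g20", "trade"],
    "url": "https://www.bbc.co.uk/news/world-48810070",
    "language": "en",
}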

Dependencies Installation
Use the package manager pip to install the following:
pip install -r requirements.txt
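
The package itself is published on PyPI (see the link above), so it can also be installed directly with pip:
pip install news-fetch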

Usage
Download it by clicking the green download button here on GitHub. To extract URLs from a targeted website, call the google_search function. You only need to pass the keyword and the newspaper link as arguments.
>>> from newsfetch.google import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')

Use the urls attribute to get the links of all the scraped news articles.
>>> google.urls

(Screenshot: list of Google search result URLs)

To scrape all the news details, call the newspaper function:
>>> from newsfetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')

(Screenshot: directory of extracted news attributes)

>>> news.headline

'g20 summit: trump and xi agree to restart us china trade talks'
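
Putting the two pieces together, here is a minimal sketch that loops over the scraped URLs and collects each headline into a pandas DataFrame (pandas is already a dependency). It only uses the urls and headline attributes documented above; any other attribute names would need to be checked against the library:

from newsfetch.google import google_search
from newsfetch.news import newspaper
import pandas as pd

# Gather article URLs for a keyword from a single news site
google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')

# Scrape each URL and keep the headline alongside it
rows = []
for url in google.urls:
    news = newspaper(url)
    rows.append({'url': url, 'headline': news.headline})

df = pd.DataFrame(rows)
print(df.head())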

Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License
MIT
