datasetscraper 0.0.4

Last updated: September 28, 2024

Free

Creator: bradpython12

Languages

Python

Description:

DatasetScraper
Tool to create image datasets for machine learning problems by scraping search engines like Google, Bing and Baidu.
Features:

Search engine support: Google, Bing, Baidu. In development: Yahoo, Yandex, DuckDuckGo
Image format support: jpg, png, svg, gif, jpeg
Fast multiprocessing-enabled scraper
Very fast multithreaded downloader
Post-download verification that each downloaded file is a valid image
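
The post-download verification step can be pictured with a small sketch. This is not the package's own code: the helper names `sniff_image_type` and `remove_invalid_images` are hypothetical, and the check assumes that inspecting the leading bytes of each file is enough to tell an image from, say, an HTML error page.

```python
from pathlib import Path

# Leading byte signatures for the binary formats listed in the features.
MAGIC_PREFIXES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data: bytes):
    """Return 'jpeg', 'png', 'gif', 'svg', or None for unrecognised bytes."""
    for kind, prefixes in MAGIC_PREFIXES.items():
        if data.startswith(prefixes):
            return kind
    # SVG is XML text, so look for an <svg tag near the start instead.
    head = data[:1024].lstrip().lower()
    if head.startswith((b"<?xml", b"<svg")) and b"<svg" in head:
        return "svg"
    return None


def remove_invalid_images(directory):
    """Delete files that do not look like images; return the removed names."""
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            header = fh.read(1024)
        if sniff_image_type(header) is None:
            path.unlink()
            removed.append(path.name)
    return removed
```

A check like this also explains one way the final image count can be lower than the number of URLs fetched: files that fail verification are discarded.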

Installation

Coming soon on PyPI.

Usage:

Import
from datasetscraper import Scraper

Defaults

obj = Scraper()
urls = obj.fetch_urls('kiniro mosaic')
obj.download(urls, directory='kiniro_mosaic/')

Specify a search engine

obj = Scraper()
urls = obj.fetch_urls('kiniro mosaic', engine=['google'])
obj.download(urls, directory='kiniro_mosaic/')

Specify a list of search engines

obj = Scraper()
urls = obj.fetch_urls('kiniro mosaic', engine=['google', 'bing'])
obj.download(urls, directory='kiniro_mosaic/')

Specify the maximum number of images per engine (the default is 200)

obj = Scraper()
urls = obj.fetch_urls('kiniro mosaic', engine=['google', 'bing'], maxlist=[500, 300])
obj.download(urls, directory='kiniro_mosaic/')
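
The example above suggests that each entry in maxlist applies to the engine at the same position in the engine list. A minimal sketch of that pairing, under that assumption, is given here. The helper `pair_limits` is hypothetical and is not part of datasetscraper; only the default of 200 comes from the text.

```python
DEFAULT_MAX = 200  # default limit stated in this README


def pair_limits(engines, maxlist=None, default=DEFAULT_MAX):
    """Map each engine name to its image limit (hypothetical helper)."""
    if maxlist is None:
        # No limits given: every engine gets the default.
        maxlist = [default] * len(engines)
    if len(maxlist) != len(engines):
        raise ValueError("maxlist needs exactly one entry per engine")
    return dict(zip(engines, maxlist))


limits = pair_limits(["google", "bing"], [500, 300])
```

With these inputs, `limits` maps 'google' to 500 and 'bing' to 300, so the combined request is for at most 800 URLs.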

FAQs

Why aren't Yandex, Yahoo, DuckDuckGo and other search engines supported?
They are hard to scrape. I am working on them and will update as soon as I can.

I set maxlist=[500]. Why are fewer than 500 images downloaded?
There can be several reasons for this:

Search ran out: This happens very often. Google or Bing might not have enough images for your query.
Slow internet: Increase the timeout (the default is 60 seconds), for example: obj.download(urls, directory='kiniro_mosaic/', timeout=100)
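
To see why a short timeout reduces the number of images, here is a rough sketch of a multithreaded downloader with a per-request timeout. It is an illustration only, not the package's implementation: `fetch_bytes` and `download_all` are hypothetical names, and the sketch assumes that any URL that times out or errors is simply skipped.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_bytes(url, timeout):
    """Fetch one URL, raising if the request exceeds `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, timeout=60, workers=8, fetch=fetch_bytes):
    """Return (results, failures): url -> bytes and url -> error message."""
    results, failures = {}, {}

    def task(url):
        # Catch errors per URL so one slow host does not abort the batch.
        try:
            return url, fetch(url, timeout), None
        except Exception as exc:
            return url, None, str(exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, data, error in pool.map(task, urls):
            if error is None:
                results[url] = data
            else:
                failures[url] = error
    return results, failures
```

Every entry that lands in `failures` is one image missing from the final count, which is why raising the timeout on a slow connection can bring the total closer to the requested maximum.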

How to debug?
You can change the logging level when creating the scraper object: obj = Scraper(logger.INFO)
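
The name `logger` in the line above is not defined anywhere in this README. If it refers to Python's standard `logging` module (an assumption), the level constants would be `logging.INFO`, `logging.DEBUG` and so on. The sketch below shows only how such a level behaves in the standard library; the logger name `datasetscraper_example` is made up for illustration.

```python
import logging

# Configure the root handler once; has no effect if handlers already exist.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("datasetscraper_example")
log.setLevel(logging.INFO)

log.debug("hidden: DEBUG is below the INFO threshold")
log.info("shown: INFO messages pass the threshold")
```

Lowering the level to `logging.DEBUG` would show both messages, which is usually the more useful setting when tracking down why URLs are skipped.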

TODO:

More search engines
Better debug
Write documentation
Text data? Audio data?

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
