facebook-page-scraper 5.0.6

Creator: bigcodingguy24


Description:


Facebook Page Scraper


No API key needed and no limit on the number of requests. Import the library and just do it!

Table of Contents

Getting Started
  Prerequisites
  Installation
    Installing from source
    Installing with PyPI
Usage
  How to instantiate?
    Parameters for Facebook_scraper()
  Scrape in JSON format
    JSON Output Format
  Scrape in CSV format
    Parameters for scrap_to_csv() method
  Keys of the output data
Tech
License



Prerequisites

Internet Connection
Python 3.7+
Chrome or Firefox browser installed on your machine
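Before installing, a quick local sanity check of these prerequisites can save a failed first run. The sketch below is not part of the library; it checks the Python version and looks for browser executables on PATH. The executable names are assumptions that vary by operating system (Chrome in particular is often not on PATH on Windows or macOS), so a False result is a hint, not proof that the browser is missing.

```python
import shutil
import sys

# Assumed executable names; these differ between OSes and install methods.
BROWSER_EXECUTABLES = {
    "chrome": ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome"],
    "firefox": ["firefox"],
}


def check_prerequisites():
    """Return a dict describing whether the local prerequisites look satisfied."""
    report = {"python_ok": sys.version_info >= (3, 7)}
    for browser, names in BROWSER_EXECUTABLES.items():
        # shutil.which returns the full path if the executable is on PATH, else None
        report[browser] = any(shutil.which(name) for name in names)
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```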



Installation:
Installing from source:
git clone https://github.com/shaikhsajid1111/facebook_page_scraper

Then, inside the project's directory (cd facebook_page_scraper):
python3 setup.py install


Installing with PyPI:
pip3 install facebook-page-scraper



How to use?
# import os to read the credentials from environment variables
import os

# import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class

page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
proxy = None  # or "IP:PORT"; if the proxy requires authentication, use "user:password@IP:PORT"
timeout = 600  # 600 seconds
headless = True
# read the Facebook credentials from environment variables (None if unset)
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
# indicates whether the Facebook target is a FB group (True) or a FB page (False)
isGroup = False
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup, username=fb_email, password=fb_password)
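The proxy argument is a plain string in either IP:PORT or user:password@IP:PORT form, as the comment in the example above notes. The hypothetical helper below only assembles that string from its parts; it is not a library function, and it does not URL-encode special characters in the user name or password, which some proxies may require.

```python
def build_proxy(ip, port, user=None, password=None):
    """Build a proxy string in IP:PORT or user:password@IP:PORT form.

    Hypothetical helper: the scraper only needs the final string.
    """
    address = f"{ip}:{port}"
    if user is None and password is None:
        return address
    if user is None or password is None:
        # an authenticated proxy needs both parts of the credentials
        raise ValueError("user and password must be given together")
    return f"{user}:{password}@{address}"
```

Usage: proxy = build_proxy("203.0.113.5", 8080, "user", "password"), then pass proxy=proxy to Facebook_scraper as before.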

Parameters for the Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password) class

| Parameter Name | Parameter Type | Description |
|---|---|---|
| page_or_group_name | String | Name of the Facebook page or group |
| posts_count | Integer | Number of posts to scrape. Default is 10 |
| browser | String | Which browser to use, either "chrome" or "firefox". Default is "chrome" |
| proxy (optional) | String | Proxy to use. If the proxy requires authentication, the format is user:password@IP:PORT |
| timeout | Integer | Maximum time, in seconds, the scraper should run. Default is 600 (10 minutes) |
| headless | Boolean | Whether to run the browser in headless mode. Default is True |
| isGroup | Boolean | Whether the Facebook target is a group (True) or a page (False). Default is False |
| username (optional) | String | Username or email used to log in to Facebook when scraping (recommended: load it from a .env file) |
| password (optional) | String | Password used to log in to Facebook when scraping (recommended: load it from a .env file) |
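The table recommends keeping the credentials in a .env file. The usual tool for that is load_dotenv() from the python-dotenv package; the sketch below is a deliberately small standard-library stand-in, assuming a .env file with lines such as fb_email=... and fb_password=... (the variable names used in the example above). It skips blank lines and # comments and strips matching surrounding quotes, but does not handle multi-line values, escapes or variable expansion.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Returns the parsed pairs; existing variables are kept unless override=True.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # strip one pair of matching quotes around the value
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```

Call load_env_file() before the os.getenv() lines so fb_email and fb_password are populated. Remember to keep the .env file out of version control.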







⚠️ Warning: Use Logged-In Scraping at Your Own Risk ⚠️
Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.
Done with instantiation? Let the scraping begin!


For the posts' data in JSON format:
# call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

Output:
{
"2024182624425347": {
"name": "Meta AI",
"shares": 0,
"reactions": {
"likes": 154,
"loves": 19,
"wow": 0,
"cares": 0,
"sad": 0,
"angry": 0,
"haha": 0
},
"reaction_count": 173,
"comments": 2,
"content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
"posted_on": "2022-01-20T22:43:35",
"video": [],
"image": [
"https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
],
"post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
}, ...

}
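Because the output is keyed by post id, ranking posts is a matter of sorting the dictionary items. The sketch below assumes scrap_to_json() returns the JSON string printed above; it also accepts an already-parsed dict in case your version returns one. top_posts is a hypothetical helper, not part of the library.

```python
import json


def top_posts(json_data, n=3):
    """Return (post_id, reaction_count, comments) for the n most-reacted posts."""
    posts = json.loads(json_data) if isinstance(json_data, str) else json_data
    ranked = sorted(
        posts.items(),
        key=lambda item: item[1].get("reaction_count", 0),
        reverse=True,
    )
    return [
        (post_id, post.get("reaction_count", 0), post.get("comments", 0))
        for post_id, post in ranked[:n]
    ]
```

Usage: print(top_posts(meta_ai.scrap_to_json(), n=5)).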


Output Structure for JSON format:
{
"id": {
"name": string,
"shares": integer,
"reactions": {
"likes": integer,
"loves": integer,
"wow": integer,
"cares": integer,
"sad": integer,
"angry": integer,
"haha": integer
},
"reaction_count": integer,
"comments": integer,
"content": string,
"video" : list,
"image" : list,
"posted_on": datetime, //string containing datetime in ISO 8601
"post_url": string
}
}
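The structure above can double as a lightweight check on scraped data. The sketch below tests each key's type against this structure, verifies that posted_on parses as ISO 8601, and compares reaction_count with the sum of the individual reactions. That last check is an assumption based on the sample output (154 + 19 = 173), so treat a mismatch as a hint rather than a definite error. check_post is a hypothetical helper.

```python
from datetime import datetime

REACTION_KEYS = ("likes", "loves", "wow", "cares", "sad", "angry", "haha")
EXPECTED_TYPES = {
    "name": str, "shares": int, "reactions": dict, "reaction_count": int,
    "comments": int, "content": str, "video": list, "image": list,
    "posted_on": str, "post_url": str,
}


def check_post(post):
    """Return a list of problems found in one post dict (empty list = looks valid)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(post.get(key), expected):
            problems.append(f"{key}: expected {expected.__name__}")
    reactions = post.get("reactions", {})
    if isinstance(reactions, dict):
        missing = [k for k in REACTION_KEYS if k not in reactions]
        if missing:
            problems.append(f"reactions missing {missing}")
        elif isinstance(post.get("reaction_count"), int):
            if sum(reactions[k] for k in REACTION_KEYS) != post["reaction_count"]:
                problems.append("reaction_count != sum of reactions")
    try:
        # posted_on is an ISO 8601 string such as 2022-01-20T22:43:35
        datetime.fromisoformat(post.get("posted_on", ""))
    except (TypeError, ValueError):
        problems.append("posted_on: not ISO 8601")
    return problems
```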





To save the posts' data directly to a CSV file:
# call the scrap_to_csv(filename, directory) method


filename = "data_file"  # file name, without the .csv extension, where the data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved (raw string, so the backslash is not an escape)
meta_ai.scrap_to_csv(filename, directory)

Contents of data_file.csv:
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
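To load the CSV back for analysis, the standard csv module is enough. The sketch below converts the count columns from the header above to integers and leaves everything else as text. Two assumptions: the file is UTF-8 (the sample content contains non-ASCII quotes), and the video/image cells are kept as raw strings because how several URLs share one cell is not documented here. read_posts_csv is a hypothetical helper.

```python
import csv

# Count columns as they appear in the CSV header shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad",
               "angry", "haha", "reactions_count", "comments")


def read_posts_csv(path):
    """Read a CSV written by scrap_to_csv() into a list of dicts with int counts."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for column in INT_COLUMNS:
                if row.get(column):
                    row[column] = int(row[column])
            rows.append(row)
    return rows
```

The id column stays a string, matching the id key described in the output keys.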




Parameters for the scrap_to_csv(filename, directory) method:

| Parameter Name | Parameter Type | Description |
|---|---|---|
| filename | String | Name of the CSV file (without the .csv extension) where the posts' data will be saved |
| directory | String | Directory where the CSV file will be stored |
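Since filename is passed without its extension, a common slip is to pass "data_file.csv". The hypothetical helper below guards against that and creates the directory ahead of time. It assumes the scraper writes to <directory>/<filename>.csv, as the data_file example suggests; whether the library creates missing directories itself is not documented here.

```python
from pathlib import Path


def csv_output_path(filename, directory):
    """Return the expected CSV path, creating the directory if needed.

    Assumes the output lands at <directory>/<filename>.csv.
    """
    if filename.lower().endswith(".csv"):
        raise ValueError("pass the file name without the .csv extension")
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{filename}.csv"
```

Usage: path = csv_output_path(filename, directory), then meta_ai.scrap_to_csv(filename, directory), and open path once scraping finishes.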






Keys of the outputs:

| Key | Type | Description |
|---|---|---|
| id | String | Post identifier (an integer stored as a string) |
| name | String | Name of the page |
| shares | Integer | Share count of the post |
| reactions | Dictionary | Reactions as keys and their counts as values. Keys => ["likes","loves","wow","cares","sad","angry","haha"] |
| reaction_count | Integer | Total reaction count of the post |
| comments | Integer | Comment count of the post |
| content | String | Text content of the post |
| video | List | List containing URLs of all videos present in the post |
| image | List | List containing URLs of all images present in the post |
| posted_on | Datetime | Time at which the post was published (ISO 8601 string) |
| post_url | String | URL of the post |
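The same keys appear in both outputs, but the CSV flattens them: the reactions dictionary becomes one column per reaction, and reaction_count is written as reactions_count. The sketch below maps one JSON post onto the CSV column order shown earlier. Joining the video/image lists with a space is a hypothetical choice; the library's own separator for several URLs may differ. flatten_post is not a library function.

```python
CSV_COLUMNS = ["id", "name", "shares", "likes", "loves", "wow", "cares", "sad",
               "angry", "haha", "reactions_count", "comments", "content",
               "posted_on", "video", "image", "post_url"]
REACTION_KEYS = ["likes", "loves", "wow", "cares", "sad", "angry", "haha"]


def flatten_post(post_id, post, sep=" "):
    """Return one JSON post as a list of values in CSV column order."""
    reactions = post.get("reactions", {})
    row = {
        "id": post_id,
        "name": post.get("name", ""),
        "shares": post.get("shares", 0),
        # spread the reactions dict into one column per reaction
        **{key: reactions.get(key, 0) for key in REACTION_KEYS},
        "reactions_count": post.get("reaction_count", 0),
        "comments": post.get("comments", 0),
        "content": post.get("content", ""),
        "posted_on": post.get("posted_on", ""),
        "video": sep.join(post.get("video", [])),
        "image": sep.join(post.get("image", [])),
        "post_url": post.get("post_url", ""),
    }
    return [row[column] for column in CSV_COLUMNS]
```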





Tech
This project relies on the following libraries:

Selenium
Webdriver Manager
Python Dateutil
Selenium-wire
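If an import error appears at startup, it helps to know which of these libraries is missing. The sketch below checks them with importlib.util.find_spec, which returns None for a top-level module that cannot be found. The import names (selenium, webdriver_manager, dateutil, seleniumwire) differ from the pip package names, so both are listed; missing_dependencies is a hypothetical helper.

```python
import importlib.util

# import name -> pip package name
DEPENDENCIES = {
    "selenium": "selenium",
    "webdriver_manager": "webdriver-manager",
    "dateutil": "python-dateutil",
    "seleniumwire": "selenium-wire",
}


def missing_dependencies():
    """Return the pip names of the listed dependencies that cannot be imported."""
    return [pip_name for module, pip_name in DEPENDENCIES.items()
            if importlib.util.find_spec(module) is None]
```

Usage: anything returned can be installed with pip3 install followed by the listed names.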



If you encounter anything unusual, please feel free to open an issue on the GitHub repository (https://github.com/shaikhsajid1111/facebook_page_scraper).

LICENSE
MIT

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
