0 purchases
revealuserclassification 0.2.8
Performs user classification into labels using a set of seed Twitter users with known labels and the structure of the interaction network between them.
Features
Implementation of the [REVEAL FP7](http://revealproject.eu/) user-network-profile-classifier module.
Utilization of ARCTE algorithm for graph embedding via [reveal-graph-embedding](https://github.com/MKLab-ITI/reveal-graph-embedding).
Community weighting for improved graph-based user classification and via [reveal-graph-embedding](https://github.com/MKLab-ITI/reveal-graph-embedding).
Twitter list crowdsourcing for user annotation via [reveal-user-annotation](https://github.com/MKLab-ITI/reveal-user-annotation).
Messaging and communication with databases via [reveal-user-annotation](https://github.com/MKLab-ITI/reveal-user-annotation).
Install
### Required packages
- numpy
- scipy
- scikit-learn
- networkx
- [reveal-user-annotation](https://github.com/MKLab-ITI/reveal-user-annotation)
- [reveal-graph-embedding](https://github.com/MKLab-ITI/reveal-graph-embedding)
### Installation
To install for all users on Unix/Linux:
python3.4 setup.py build
sudo python3.4 setup.py install
Alternatively:
pip install reveal-user-classification
Reveal-FP7 Integration
The name of the entry point script is user_network_profile_classifier.
user_network_profile_classifier -uri MONGODBURI−idMONGO_ASSESSMENT_ID
-tak TWITTERAPPKEY−tasTWITTER_APP_SECRET
-rmquri AMQPURI−rmqqAMQP_QUEUE_NAME -rmqe AMQPEXCHANGE−rmqrkAMQP_ROUTING_KEY
-ln LATESTN−ltsLOWER_TIMESTAMP -uts UPPERTIMESTAMP−ntNUMBER_OF_PARALLEL_TASKS -nua NUMBEROFUSERSTOANNOTATE−unpcdbUSER_NETWORK_PROFILE_CLASSIFIER_MONGO_DB
The following two arguments are for establishing a connection to a Mongo database and
accessing the documents in a collection.
$MONGO_DB_URI example: “mongodb://admin:[email protected]:27017”
$MONGO_ASSESSMENT_ID example: “new_tweets_database_name.new_tweets_collection_name”, separated by a “.” as shown.
The following two arguments are for using a Twitter app in order to fetch data from Twitter.
TWITTERAPPKEYandTWITTER_APP_SECRET: Both are taken from one’s created app in the Twitter development site.
The following four arguments are for publishing messages to a RabbitMQ queue.
The queue is used both for publishing a “SUCCESS” message at completion,
but also for publishing the results of the module.
$AMQP_URI example: amqp://guest:guest@localhost:5672//
One must also supply: AMQPQUEUENAME,AMQP_EXCHANGE and $AMQP_ROUTING_KEY
There are some optional arguments that can be considered. The following three can be used either together or apart;
otherwise all of the tweets in the collection will be read.
$LATEST_N: The N latest chronologically documents will be read from the defined collection.
In order for this to work properly, the “created_at” field of the tweets must be in the proper time format as defined by MongoDB.
$LOWER_TIMESTAMP: A UNIX timestamp; based on the created_at tweet field. Only tweets after this timestamp will be used for the analysis.
$UPPER_TIMESTAMP: Similarly, for an upper limit.
The following four arguments set various parameters for the execution of the module.
$NUMBER_OF_PARALLEL_TASKS: Number of parallel tasks initiated for each assessment analysis launch. If not specified, tries to set as number of cores.
$NUMBER_OF_USERS_TO_ANNOTATE: Number of users to annotate automatically, using Twitter data. Each user requires approximately at least an additional minute. Default value is 90. For faster testing, try a smaller number.
Some intermediate data and the resulting user-to-topic association will be written in a Mongo database on the same Mongo client used for the input.
$USER_NETWORK_PROFILE_CLASSIFIER_MONGO_DB: A distinctive name should be chosen so as not to interfere with the databases reserved for input data. The collection in which the results are written is: “user_topics_collection”.
The entry point script can be viewed on /reveal_user_classification/entry_points/user_network_profile_classifier.py
where the argument usage can be read in greater detail.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.