Threaded-Sparse-TFIDF 0.2

Creator: bradpython12

Description:

A repository for multithreaded TF-IDF vectorization for similarity search, using sparse matrices for the computations.
Usage:
from TF_IDF import TF_IDF_Vectorizer

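k = 4  # number of workers (example value; choose what suits your machine)
tf_idf = TF_IDF_Vectorizer(use_cached=True, print_output=False)
# The second return value holds the ranking for the query.
_, ranking = tf_idf.get_similarity_score("science fiction super hero movie", num_workers=k)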
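
To make the idea behind the call above concrete, here is a minimal, standard-library-only sketch of partitioned TF-IDF similarity search. It is not the repository's implementation: the function names (`build_tfidf`, `weigh`, `normalize`, `score_partition`, `rank`) are hypothetical, sparse rows are stored as plain dicts rather than sparse matrices, and the smoothed IDF formula is an assumption. It only illustrates how documents can be split into partitions that are scored by separate workers.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def weigh(tokens, idf):
    """TF-IDF weights for one token list; terms missing from idf are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def normalize(vec):
    """Scale a sparse vector to unit L2 length (empty dict if it has no weight)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_tfidf(docs):
    """Return (rows, idf): one normalized sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    rows = [normalize(weigh(tokens, idf)) for tokens in tokenized]
    return rows, idf


def score_partition(query_vec, rows, start, stop):
    """Cosine similarity of the query against rows[start:stop]."""
    return [
        (i, sum(w * rows[i].get(t, 0.0) for t, w in query_vec.items()))
        for i in range(start, stop)
    ]


def rank(query, rows, idf, num_workers=2):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    query_vec = normalize(weigh(query.lower().split(), idf))
    # Split the rows into contiguous partitions, one per worker.
    size = max(1, math.ceil(len(rows) / num_workers))
    bounds = [(s, min(s + size, len(rows))) for s in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda b: score_partition(query_vec, rows, *b), bounds)
        scores = [pair for part in parts for pair in part]
    return sorted(scores, key=lambda p: p[1], reverse=True)


docs = [
    "science fiction space movie",
    "romantic comedy in paris",
    "super hero science fiction movie",
]
rows, idf = build_tfidf(docs)
ranking = rank("science fiction super hero movie", rows, idf, num_workers=2)
```

Because the scoring here is pure Python, threads will not speed it up in CPython. The sketch shows the partitioning pattern only and makes no claim about performance.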

Performance:
Table:

num_workers   time                  partition_size
1             1.1117637634277344    6.778499999999999
2             0.8195240020751953    3.4149000000000003
3             0.7357232332229614    2.2773
4             0.7232689380645752    1.7081
5             0.7375946760177612    1.3555999999999997
6             0.7682486534118652    1.1307000000000003
7             0.7640876531600952    0.9618
8             0.7513441801071167    0.8506
9             0.7795052766799927    0.7587
10            0.8141436100006103    0.6807
11            0.8003325223922729    0.6195000000000002
12            0.8441393852233887    0.5697
13            0.8490614175796509    0.5258000000000002
14            0.9322290658950806    0.48739999999999994
15            0.8824400186538697    0.45729999999999993
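
The timings in the table are easier to read as speedups relative to a single worker. The short sketch below copies the num_workers and time columns from the table and computes that ratio. The variable names are illustrative, and because the ratio is unitless it does not matter which unit the time column uses.

```python
# (num_workers -> time) values copied from the table above.
timings = {
    1: 1.1117637634277344,
    2: 0.8195240020751953,
    3: 0.7357232332229614,
    4: 0.7232689380645752,
    5: 0.7375946760177612,
    6: 0.7682486534118652,
    7: 0.7640876531600952,
    8: 0.7513441801071167,
    9: 0.7795052766799927,
    10: 0.8141436100006103,
    11: 0.8003325223922729,
    12: 0.8441393852233887,
    13: 0.8490614175796509,
    14: 0.9322290658950806,
    15: 0.8824400186538697,
}

baseline = timings[1]
# Speedup of each configuration relative to the single-worker run.
speedup = {workers: baseline / t for workers, t in timings.items()}
# Worker count with the lowest measured time.
best_workers = min(timings, key=timings.get)

for workers in sorted(speedup):
    print(f"{workers:>2} workers: {speedup[workers]:.2f}x")
```

On these numbers the lowest time is at 4 workers, roughly 1.5x faster than one worker, and adding more workers beyond that does not reduce the time further.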



Data
A subset of the Information Retrieval Dataset - Internet Movie Database (IMDB), restricted to movies released after 2007.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
