preprocessingtext 0.0.4


# preprocessingtext

A short but very useful tool to help with pre-processing text data.

## How to Install

```
pip install --user preprocessingtext
```

## Usage

#### Using stem_sentence()

```
>> from preprocessingtext import CleanSentence
>> cleaner = CleanSentence(idiom='portuguese')
>> cleaner.stem_sentence(sentence="String", remove_stop_words=True, remove_punctuation=True, normalize_text=True, replace_garbage=True)
```

To initialize the class, pass the idiom you want to work with; the default value is "portuguese". First instantiate a new object from CleanSentence, then call the method stem_sentence. You can choose whether to apply "remove_stop_words" (pass True or False), "remove_punctuation" (True or False), "replace_garbage" (True or False) to remove garbage values from the data, and "normalize_text" (True or False) to normalize the text.

#### Usage of list_to_replace

You can control what gets replaced (cleaned) in your data. You can use "cleaner.list_to_replace.append('what_you_need_to_add')", or you can assign a whole new list of values: cleaner.list_to_replace = ['item1', 'item2', 'item3']

```
# Default value of list_to_replace
>> list_to_replace = ['https://', 'http://', '#', '$', 'item1']

# Replacing the values
>> list_to_replace = ['item1', 'item2', 'item3']
```

#### Using tokenizer()

```
>> cleaner.tokenizer('Um exemplo de tokens.')
>> ['Um', 'exemplo', 'de', 'tokens']
```

## Examples

#### Using all parameters of stem_sentence()

```
>> string = "Eu sou uma sentença comum. Serei pré-processada com este módulo, veremos a seguir usando os métodos disponíveis"
>> cleaner.stem_sentence(sentence=string, remove_stop_words=True, remove_punctuation=True, normalize_text=True, replace_garbage=True)
>> sentenc comum pre-process modul ver segu us metod disponi
```

#### Without remove_stop_words

```
>> print(cleaner.stem_sentence(sentence=string, remove_stop_words=False, remove_punctuation=True, normalize_text=True, replace_garbage=True))
>> eu sou uma sentenc comum ser pre-process com est modul ver a segu us os metod disponi
```

#### Tokenizer

```
>> print(cleaner.tokenizer('Um exemplo de tokens.'))
>> ['Um', 'exemplo', 'de', 'tokens']
```

#### Cleaning garbage words

```
>> string_web = 'Acesse esses links para ganhar dinheiro: https://easymoney.com.net and http://falselink.com'
>> cleaner.stem_sentence(sentence=string_web, remove_stop_words=False, remove_punctuation=True, replace_garbage=True)
>> acess ess link par ganh dinh easymoney.com.net and falselink.com
```

#### English example

```
>> en_cleaner = CleanSentence(idiom='english')
>> string_web = 'Access these links to gain money: https://easymoney.com.net and http://falselink.com'
>> print(en_cleaner.stem_sentence(sentence=string_web, remove_stop_words=True, remove_punctuation=True, replace_garbage=True))
>> acc link gain money easymoney.com.net falselink.com
```

# Author

```
{
    'name': 'Everton Tomalok',
    'email': 'evertontomalok123@gmail.com'
}
```
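## Full example

The snippet below ties the documented calls together into one runnable script. It is a minimal sketch that assumes only the CleanSentence API shown above (the idiom constructor argument, stem_sentence(), tokenizer(), and the list_to_replace attribute); the 'www.' entry appended to list_to_replace is an illustrative value, not part of the default list.

```python
from preprocessingtext import CleanSentence

# Work with Portuguese text (the default idiom).
cleaner = CleanSentence(idiom='portuguese')

# Append an extra garbage token to strip from the data
# ('www.' is an illustrative value, not a package default).
cleaner.list_to_replace.append('www.')

raw = 'Acesse esses links para ganhar dinheiro: https://easymoney.com.net'

# Full pipeline: drop stop words and punctuation, normalize the text,
# and replace the garbage values listed in list_to_replace.
stemmed = cleaner.stem_sentence(
    sentence=raw,
    remove_stop_words=True,
    remove_punctuation=True,
    normalize_text=True,
    replace_garbage=True,
)
print(stemmed)

# Plain tokenization, without stemming.
print(cleaner.tokenizer('Um exemplo de tokens.'))
```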
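To clean many documents at once, a small wrapper around stem_sentence() is enough. preprocess_corpus below is a hypothetical helper (not part of the package), written against the same assumed API:

```python
from preprocessingtext import CleanSentence

def preprocess_corpus(texts, idiom='portuguese'):
    """Return the stemmed/cleaned version of each text in `texts`."""
    # Hypothetical helper: reuse one CleanSentence instance for the
    # whole corpus instead of constructing one per document.
    cleaner = CleanSentence(idiom=idiom)
    return [
        cleaner.stem_sentence(
            sentence=text,
            remove_stop_words=True,
            remove_punctuation=True,
            normalize_text=True,
            replace_garbage=True,
        )
        for text in texts
    ]

print(preprocess_corpus([
    'Eu sou uma sentença comum.',
    'Acesse esses links: https://easymoney.com.net',
]))
```

Reusing a single CleanSentence instance across the corpus avoids reinitializing it for every document.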

