How to add stop word removal, lemmatization and stemming to Rasa NLU

Apr 21, 2019

Hi everyone. I have been working with the Rasa Stack for the past 4 months, building a chatbot for a wedding card website. For that chatbot, I planned to add a stop word remover so that it could predict intents and entities more easily.

What is a stop word?

Stop words are words that web search engines and other enterprise search and indexing platforms filter out. They are natural language words that carry very little meaning, such as “and”, “the”, “a”, “an”, and similar words.
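To make the idea concrete, here is a minimal sketch of stop word removal in plain Python. The tiny STOP_WORDS set and the remove_stop_words function are my own illustrative names, not part of Rasa or NLTK; a real list, such as the one shipped with NLTK, is much longer.

```python
# Illustrative stop word list (an assumption for this sketch);
# real lists such as NLTK's contain far more entries.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "for", "of", "i", "me"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


print(remove_stop_words("Show me the price of a wedding card"))
# prints ['show', 'price', 'wedding', 'card']
```

The words that remain are the ones most likely to help with intent and entity prediction.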

NLTK stop words

Natural language processing (nlp) is a research field that presents many challenges such as natural language…

pythonspot.com

What are lemmatization and stemming?

Stemming and Lemmatization in Python

Stemming and Lemmatization are Text Normalization (or sometimes called Word Normalization) techniques in the field of…

www.datacamp.com
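In short, stemming chops suffixes off a word by rule, while lemmatization maps a word to its dictionary form. The toy sketch below only illustrates that difference: the suffix list, the lookup table, and the function names toy_stem and toy_lemmatize are all made up for this example. In a real project you would more likely use something like NLTK's PorterStemmer and WordNetLemmatizer.

```python
# Toy stemmer: strip the first matching suffix, keeping at least 3 characters.
# The suffix list is an assumption for illustration, not a real algorithm.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: a hand-written lookup table standing in for a real dictionary.
LEMMAS = {"cards": "card", "invited": "invite", "better": "good", "was": "be"}


def toy_lemmatize(word):
    return LEMMAS.get(word, word)


print(toy_stem("invited"), toy_lemmatize("invited"))
# prints: invit invite
```

Notice that the stem "invit" is not a real word, while the lemma "invite" is. That is the usual trade-off: stemming is cheap and rule-based, lemmatization needs a vocabulary.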

Please download the file from the link below and use it to overwrite the existing file in your Rasa installation.

vigneshgig/rasanlu_stopword

Contribute to vigneshgig/rasanlu_stopword development by creating an account on GitHub.

github.com

Note: The new Rasa Stack plans to merge Rasa NLU and Rasa Core, so I don’t know the exact path of the file. Search for whitespace_tokenizer.py in your Rasa installation directory and overwrite it. This only works with the tensorflow embedding pipeline, because that is the only pipeline where the whitespace tokenizer is used. For any other tokenizer, you can write your own stop word removal code.
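If you want to write the stop word removal yourself for another tokenizer, the general idea can be sketched as below. This is a standalone example and does not reproduce the file in the repository above or Rasa's actual classes: the Token tuple, the STOP_WORDS set, and tokenize_without_stop_words are hypothetical names I am using only for illustration. It keeps each token's character offset, since tokenizers typically need to know where a token starts in the original text.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token type: the word and where it starts in the text."""
    text: str
    offset: int


# Illustrative stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "for", "of", "i", "me"}


def tokenize_without_stop_words(text: str) -> List[Token]:
    """Split on whitespace and skip stop words, keeping original offsets."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        word = match.group()
        if word.lower() not in STOP_WORDS:
            tokens.append(Token(word, match.start()))
    return tokens


for token in tokenize_without_stop_words("I want a wedding card"):
    print(token.text, token.offset)
# prints:
# want 2
# wedding 9
# card 17
```

The same filter could be placed inside whichever tokenizer your pipeline uses, right where it builds its list of tokens.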