Spell Correction using CNN 1D and BiLSTM

Hi everyone! Recently I have been working on a conversational UI chatbot. For that chatbot we planned to add spell correction so that it stays robust to spelling mistakes typed by the user. Our team built the conversational chatbot for a wedding card website, where users can easily make spelling mistakes: for example, many people type "wedding cards" as "wedd cards" in short form, or "hindu card" as "hinducard" with no space. Because of that, I built the spell correction for a closed domain, training it on words limited to the wedding domain. You can train it on words from any other domain for your own problem.

This spell correction model can handle up to five words joined without spaces, including misspelled words, but it is not a contextual spelling correction model, because I did not train it as a seq-to-seq model. If you want contextual spelling correction, the architecture of the model needs to change.

In an upcoming blog, I will try to explain how to build a contextual spell correction model.

For example, let's say I trained a model with both the words "card" and "cart".

misspelled sentence 1: I want a laseer wedding cart

misspelled sentence 2: In amazon.com my add to card is nott showing in the screen.

If it is a non-contextual model

corrected sentence 1: I want a laser wedding cart

corrected sentence 2: In amazon.com my add to card is not showing in the screen.

It corrected "laseer" to "laser" and "nott" to "not", but it did not change "cart" to "card", because "cart" is also a valid word; it is just not the contextually correct word.

If it is a contextual model

corrected sentence 1: I want a laser wedding card

corrected sentence 2: In amazon.com my add to cart is not showing in the screen.

As we can see, it also changed "cart" to "card" in the first sentence and "card" to "cart" in the second. Even though both "cart" and "card" are valid words, they were changed because they were not contextually correct in those sentences. I hope this clarifies what contextual spell correction means.

Now I am going to give a brief overview of how my spelling correction model works.

As I said, it can handle up to five words joined without spaces, but in the real world people mostly make this mistake with two words joined together, or rarely three.

I designed the architecture as two models. The first model adds spaces between words when a sentence contains words joined without spaces; it does not correct misspelled words.

For example,

Sentence: showwmesomme weddcard and also a special birthday card.

First, we remove punctuation and single-letter words like 'a', then split the sentence using space as the delimiter, like below.

The '.' punctuation and 'a' are removed from the sentence.

Split sentence: [showwmesomme,weddcard,and,also,special,birthday,card]

Then we send this list to the first model, which adds spaces like below:

Space-added sentence: [showw me somme, wedd card, and, also, special, birthday, card]

After that, we split on spaces again:

Split word: [showw,me,somme,wedd,card,and,also,special,birthday,card]

Now all the words are split out of the sentence, and we send them to the second model, which corrects the misspelled words.

Second model output: [show,me,some,wedding,card,and,also,special,birthday,card]

Then we combine these words back into a sentence:

corrected sentence: Show me some wedding card and also special birthday card
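The walkthrough above can be sketched end to end. The two trained models are stubbed out below (only the pre-processing actually runs), and dropping words of a single letter is my generalisation of the "remove 'a'" step in the example:

```python
import string

def preprocess(sentence):
    # Strip punctuation, drop single-letter filler words like "a",
    # then split on spaces.
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return [w for w in sentence.split() if len(w) > 1]

tokens = preprocess("showwmesomme weddcard and also a special birthday card.")
# ['showwmesomme', 'weddcard', 'and', 'also', 'special', 'birthday', 'card']

# Next steps (the two trained models are not included here):
#   spaced = [space_model(t) for t in tokens]      # first model: add spaces
#   words  = " ".join(spaced).split()
#   fixed  = [correction_model(w) for w in words]  # second model: fix spelling
#   sentence = " ".join(fixed)
```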

Note: it does not handle numbers, but we can train a model to handle them. For example:

I want1000sample of wedding card

First step (spacing): I want 1000 sample of wedding card

Then, if a token is a number, we leave it unchanged.
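This number rule can be a simple token-level check before the correction model runs. In this sketch, `correct_word` stands in for the trained second model (here stubbed with a tiny lookup for illustration):

```python
def correct_tokens(tokens, correct_word):
    # Pure-digit tokens are kept as-is; everything else goes through
    # the spell-correction model.
    return [t if t.isdigit() else correct_word(t) for t in tokens]

fixed = correct_tokens(
    ["I", "want", "1000", "sample", "of", "wedding", "card"],
    lambda w: {"sampel": "sample"}.get(w, w),  # stand-in for the real model
)
# "1000" passes through unchanged
```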

That's it for how my spelling correction works.

Now I will explain how to create the dataset and how to train the model.

The first step is to create the dataset. It is explained in the notebook itself, so I am not going to repeat it here; please download the dataset.ipynb notebook and run it.


To train the model, please download the notebook from the link below and run it.

A 1D CNN is very fast compared to an LSTM, but each has its own advantages and disadvantages. In my model I used both a 1D CNN and a BiLSTM, then concatenated their two output layers and connected the result to a dense (fully connected) layer.
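A minimal sketch of how the two branches can be combined in Keras. The layer sizes, vocabulary size, and number of output words below are assumptions, not the exact values used in my notebooks:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, N_WORDS = 50, 60, 1000  # assumed sizes, not the real ones

inp = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB, VOCAB, trainable=False)(inp)  # frozen embedding

# CNN 1D branch: fast, picks up local character n-gram features
cnn = layers.Conv1D(64, 3, padding="same", activation="relu")(emb)

# BiLSTM branch: sequential context in both directions
rnn = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)

# Concatenate the two branches, flatten, and classify the correct word
merged = layers.Concatenate()([cnn, rnn])
flat = layers.Flatten()(merged)
out = layers.Dense(N_WORDS, activation="softmax")(flat)

model = Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```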

First Model Architecture:

There are many types of sequential models. In this one, we use a many-to-many, char-to-char level sequential model.

For example, sentence: pleasegive your contact number

input to model : [p,l,e,a,s,e,g,i,v,e, ,y,o,u,r, ,c,o,n,t,a,c,t, ,n,u,m,b,e,r]

In deep learning, everything needs a fixed-size input vector, so we set a fixed input length. Let's say we set it to 50.

Then it is converted to indices:

[1,4,5,3,9,5,10,30,5,39,33,34,29,20,39, …, 0,0,0] (the sentence has 30 characters, so after them we append 0s until the length reaches 50)

Output of the model: [1,4,5,3,9,5,39,10,30,5,39,33,34,29,20,39, …, 0,0,0,0,0,0,0,0,0,0,0] (note the new index inserted after "please" for the added space)

After decoding we get [p,l,e,a,s,e, ,g,i,v,e, ,y,o,u,r, ,c,o,n,t,a,c,t, ,n,u,m,b,e,r]

The input and output map many-to-many, so the input and output sequences have the same length.
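The fixed-length encoding described above can be sketched in a few lines. The character-to-index mapping here is an assumption for illustration; the actual index values in my notebooks differ:

```python
def encode(text, char_to_idx, max_len=50):
    # Map each character to its index, then right-pad with 0 up to max_len.
    ids = [char_to_idx[c] for c in text][:max_len]
    return ids + [0] * (max_len - len(ids))

# Index 0 is reserved for padding (assumed layout: space + a-z from index 1).
char_to_idx = {c: i + 1 for i, c in enumerate(" abcdefghijklmnopqrstuvwxyz")}

vec = encode("pleasegive your contact number", char_to_idx)
# len(vec) == 50; the first 30 slots hold character indices, the rest are 0
```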

Second Model Architecture:

In the second model, we combine the CNN 1D and BiLSTM output layers, then reshape the combined layer into a 1D vector (flatten it) so we can connect it to a dense layer that classifies the correct word.

So it is a character-to-word level model, not a sequential model like the first one.


word_list: [pllease, give, youtr, conttacte, num]
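As an illustration of the training targets for this second model, each misspelled word (character-encoded) is labelled with the index of its correct word in the domain vocabulary. The vocabulary and pairs below are assumptions built from the example word_list:

```python
word_vocab = ["please", "give", "your", "contact", "number"]  # assumed vocab

# (misspelled input, correct word) pairs; the model classifies the
# character-encoded input into one of the vocabulary indices.
pairs = [("pllease", "please"), ("youtr", "your"),
         ("conttacte", "contact"), ("num", "number")]
labels = [word_vocab.index(correct) for _, correct in pairs]
# labels == [0, 2, 3, 4]
```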


One thing I want to point out: don't set the embedding layer to trainable. Because we are working at the character level, a trainable embedding layer would learn relations between characters according to the dataset.

In the real world, 'A' is not related to 'B', and so on. Maybe there is a relation between alphabets and numbers, but not between the letters of the alphabet themselves.

For example, suppose in our dataset the most common letter is 'e', and 'e' often follows 't' (as in 'te'). Then the embedding layer may give more weight to 'e' and learn that 'e' usually follows 't', so the embedding layer becomes biased toward the letter 'e'.

So, for a character-level process, don't train the embedding layer; instead set the weight for each and every letter as a one-hot vector.
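A frozen one-hot embedding can be built as an identity matrix, one fixed row per character index. The index layout below is an assumption for illustration:

```python
import numpy as np

VOCAB = 28  # 0 = padding, 1 = space, 2-27 = a-z (assumed layout)

# Identity matrix: row i is the fixed one-hot vector for character index i.
embedding = np.eye(VOCAB, dtype="float32")

# Lookup is plain row selection, so no relations between letters are learned.
idx = [ord(c) - ord("a") + 2 for c in "ple"]
vectors = embedding[idx]

# In Keras these weights can be frozen, e.g.:
#   layers.Embedding(VOCAB, VOCAB, weights=[np.eye(VOCAB)], trainable=False)
```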

Hint for Contextual spell correction:

If you want contextual spell correction, the first thing you need is a larger dataset, so the model can learn context. With a small dataset it will be biased toward that dataset.

The second thing is that you need a seq-to-seq model, like a translation model.

Please download the code below and make use of it.

Also download the pre-trained model and dataset below, so that you can quickly try it out. Please check which words I trained on.

Thanks for the wonderful time you spent on my blog!

Data Scientist