Basic Chatbots and Voice Processing Functionality in Python

azam sayeed
4 min read · May 26, 2020

ChatBots

A chatbot is AI-based software that gives customers quicker responses by mimicking human behavior.

Benefits

  1. Cost- and time-effective
  2. Cheaper to maintain
  3. Automates mundane tasks
  4. Increases productivity, especially in customer support

Chatbot evolution

1st Generation: Traditional Bots

  • System-driven
  • Automation scripts
  • Minimal functionality within a specific context

2nd Generation: Current Bots

  • Task-level automation (can understand and perform the required task)
  • System and task context
  • Immediate communication and customer help

Next Generation: Future Bots

  • Multiple levels of communication
  • Service-level automation
  • Service, task, and people context
  • Understand and behave like humans, with emotions

How Chatbots Work

  1. Rule-Based Bots: These bots are trained on predefined rules and cannot handle complex queries. They are designed for preconditioned questions and need only simple training.
  2. Self-Learning Bots: These are machine-learning-based bots that can be more efficient but require complex training. They try to understand, analyze, and learn from user queries.
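
To make the rule-based idea concrete, here is a minimal sketch of a keyword-driven bot. The rules, the replies, and the names RULES, FALLBACK, and rule_based_reply are made up for illustration; a real rule set would come from the questions the bot is expected to handle.

```python
# Hypothetical rules: keyword -> canned reply (illustrative only).
RULES = {
    "hours": "We are open from 9 am to 5 pm.",
    "refund": "Refunds are processed within 7 days.",
}
FALLBACK = "Sorry, I don't understand."

def rule_based_reply(message):
    """Return the reply for the first known keyword, else a fallback."""
    for word in message.lower().split():
        key = word.strip(".,!?")  # drop trailing punctuation
        if key in RULES:
            return RULES[key]
    return FALLBACK

print(rule_based_reply("What are your hours?"))
print(rule_based_reply("Tell me a joke"))
```

Anything outside the rule table falls through to the fallback reply, which is the limitation that self-learning bots try to overcome.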

Types of Self-Learning Bots

  1. Generative Bots: Generate a reply by analyzing the query word by word; they may not always produce a response.
  2. Retrieval-Based Bots: Select from predefined responses and give context-based answers.

[Figure: Flow chart of chatbots]

NLTK (Natural Language Toolkit) is a package for preprocessing human language data using NLP concepts such as classification and tagging (identifying stop words, nouns, verbs, etc.), tokenization (breaking text into smaller units called tokens), stemming, and parsing.

A chatbot cannot learn from a raw stream of text, whether that text is a query string or training data. It needs numerical input, and text preprocessing is the first step toward producing it.

Case Conversion: converting the text to one familiar, basic case (usually lowercase) to avoid complex mixed-case input.

Sentence Tokenizers: split the text into sentences, so that each sentence becomes a token.

Word Tokenizers: split the text into words, so that each word becomes a token.
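
The three preprocessing steps above can be sketched with the standard library alone. This is a simplified stand-in, not NLTK's tokenizers: the regular expressions and the helper names to_lower, split_sentences, and split_words are assumptions made for illustration, and they will mishandle abbreviations and other edge cases that NLTK covers.

```python
import re

def to_lower(text):
    """Case conversion: map everything to lowercase."""
    return text.lower()

def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    """Naive word tokenizer: runs of lowercase letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence)

text = to_lower("Chatbots are useful. They answer FAQs quickly!")
sentences = split_sentences(text)
words = [split_words(s) for s in sentences]
print(sentences)  # ['chatbots are useful.', 'they answer faqs quickly!']
print(words)      # [['chatbots', 'are', 'useful'], ['they', 'answer', 'faqs', 'quickly']]
```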

Building a Chatbot

  • NLTK Tokenizer: NLTK provides a pretrained tokenizer, and the toolkit also supports removal of noise (anything that is not a number or a letter), removal of stop words (such as a, at, in, the), stemming, which reduces inflected words to a common stem (e.g. love, loving, loved), and lemmatization, which maps a word to its dictionary form (lemma) based on context (e.g. better becomes good).
  • Bag of Words: The chatbot's input is a string, which must be converted into a numerical format the bot can process. Not every word is converted, only the ones that matter to the bot. The representation is called a bag of words because the words are simply collected together with no regard for order or grammar. Ex: the input string "I don't like injustice" may sit in the bag as don't, I, like, injustice. Feature vectorization then assigns 1 to each word that is required and 0 otherwise. If we do not need the word "don't", the required words are I, like, injustice, and over the vocabulary (I, don't, like, injustice) the resulting vector is (1, 0, 1, 1).
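
The bag-of-words example above can be reproduced in a few lines. This is a minimal sketch: the function name bag_of_words_vector and its ignored parameter are illustrative choices, not part of NLTK or scikit-learn.

```python
def bag_of_words_vector(tokens, vocabulary, ignored=()):
    """Return a 0/1 vector over `vocabulary`: 1 if the word occurs in
    `tokens` and is not in `ignored`, otherwise 0."""
    present = set(tokens) - set(ignored)
    return [1 if word in present else 0 for word in vocabulary]

tokens = "i don't like injustice".split()
vocabulary = ["i", "don't", "like", "injustice"]
vector = bag_of_words_vector(tokens, vocabulary, ignored={"don't"})
print(vector)  # [1, 0, 1, 1]
```

Because the tokens are turned into a set, their order in the input has no effect on the vector, which is exactly the "no order or grammar" property of a bag of words.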

TF-IDF Formula

A bag-of-words input may contain a lot of words. Having many words is an advantage, but the downside is a loss of information for the bot, because frequent yet uninformative words count just as much as the informative ones. TF-IDF fixes this by weighting each term as follows:

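Term Frequency (TF) = number of times a term appears in the document / total number of terms in the document

Inverse Document Frequency (IDF):

IDF = 1 + log(x / y)

where x is the total number of documents and y is the number of documents in which the term occurs. The TF-IDF weight of a term is the product TF × IDF.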
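
As a worked example of the formulas above, the following sketch computes TF, IDF, and their product on a tiny made-up corpus. It assumes the natural logarithm and returns 0 for a term that occurs in no document; both choices, along with the function names, are assumptions of this sketch. scikit-learn's TfidfVectorizer, used in the chatbot code below, applies a smoothed IDF and normalizes each vector by default, so its numbers will differ.

```python
import math

def term_frequency(term, document):
    """TF: occurrences of the term divided by the total terms in the document."""
    return document.count(term) / len(document)

def inverse_document_frequency(term, documents):
    """IDF = 1 + log(x / y), with x documents in total and y containing the term."""
    x = len(documents)
    y = sum(1 for doc in documents if term in doc)
    if y == 0:  # assumption: an unseen term gets weight 0
        return 0.0
    return 1 + math.log(x / y)

def tf_idf(term, document, documents):
    return term_frequency(term, document) * inverse_document_frequency(term, documents)

# Made-up corpus of three tokenized documents.
docs = [
    ["chatbots", "answer", "questions"],
    ["chatbots", "learn", "from", "questions"],
    ["bots", "answer", "quickly"],
]

common = tf_idf("chatbots", docs[0], docs)  # term found in 2 of 3 documents
rare = tf_idf("quickly", docs[2], docs)     # term found in 1 of 3 documents
print(round(common, 4), round(rare, 4))
```

Both terms have the same term frequency (1/3), so the rarer term ends up with the larger weight purely because of its higher IDF.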

import random
import string  # to process standard Python strings
import warnings

import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings('ignore')

nltk.download('popular', quiet=True)  # downloads the commonly used NLTK data packages
# uncomment the following only the first time
# nltk.download('punkt')    # first-time use only
# nltk.download('wordnet')  # first-time use only

# Reading in the corpus
with open('chatbot.txt', 'r', encoding='utf8', errors='ignore') as fin:
    raw = fin.read().lower()

# Tokenisation
sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words

# Preprocessing
lemmer = WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    # lowercase, strip punctuation, tokenize, then lemmatize
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

# Keyword matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    """If user's input is a greeting, return a greeting response"""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

# Generating response
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)  # query vs. every sentence
    idx = vals.argsort()[0][-2]  # best match other than the query itself
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]  # similarity score of that best match
    if req_tfidf == 0:
        robo_response = robo_response + "I am sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response

flag = True
print("ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!")
while flag:
    user_response = input()
    user_response = user_response.lower()
    if user_response != 'bye':
        if user_response == 'thanks' or user_response == 'thank you':
            flag = False
            print("ROBO: You are welcome..")
        else:
            reply = greeting(user_response)  # call once so the reply is not re-drawn
            if reply is not None:
                print("ROBO: " + reply)
            else:
                print("ROBO: ", end="")
                print(response(user_response))
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("ROBO: Bye! take care..")

Reference Link: https://github.com/parulnith/Building-a-Simple-Chatbot-in-Python-using-NLTK/blob/master/chatbot.py
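
The response() function in the code above ranks the corpus sentences by cosine similarity to the user's query and returns the closest one. The sketch below shows that retrieval step in plain Python on small hand-made vectors. The names cosine and most_similar and the example vectors are illustrative, and this is a simplified stand-in for scikit-learn's cosine_similarity, not its implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_vec, corpus_vecs):
    """Return (index, score) of the corpus vector closest to the query."""
    scores = [cosine(query_vec, v) for v in corpus_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Made-up term vectors for three corpus sentences and one query.
corpus_vecs = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
query_vec = [1, 0, 1, 1]

best_index, best_score = most_similar(query_vec, corpus_vecs)
print(best_index, round(best_score, 4))
```

A best score of 0 means the query shares no terms with any sentence, which is the case where the chatbot answers that it does not understand.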

For speech recognition we use the Recognizer class provided by the SpeechRecognition package. A Recognizer instance is what recognizes speech: it has seven methods for recognizing speech from audio, each backed by a different API. Here we use recognize_google().

Basic Example for Speech to Text

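import speech_recognition as sr

r1 = sr.Recognizer()
with sr.Microphone() as source:  # needs a working microphone
    print('Speak now')
    audio = r1.listen(source)  # record until a pause is detected
try:
    text = r1.recognize_google(audio)  # transcribe using Google's speech API
    print(text)
except sr.UnknownValueError:
    print('Could not understand the audio')
except sr.RequestError as e:
    print('Speech service request failed:', e)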

Basic Example for Text to Speech

# importing the pyttsx3 library
import pyttsx3

# initialisation
engine = pyttsx3.init()

# testing
engine.say("My first code on text-to-speech")
engine.say("Thank you, Geeksforgeeks")
engine.runAndWait()
