The Basics of Natural Language Processing

Riya Kumar
4 min read · Feb 23, 2019

Some days, I feel quite bad for Siri. Poor Siri has to endure countless hours of my little sister, and most conversations go like this:

“Hey Siri.”

“Hey Siri!”

“HEY SIRI!”

“HEY SIIIIIIRI!”

*beep*

“Tell me a joke!”

*I taught a wolf to meditate. Now he’s Aware Wolf.*

“Hahaha, hey Siri! Tell me a joke!”

Siri is a comedian

It’s a never-ending cycle, but Siri always seems to come up with a response, which got me thinking: how?

How does the software on a phone detect when a speaker is talking to it, how does it know what they’re saying, and how does it know what to respond with? Natural language processing (NLP) is, in large part, the reason we have so many apps that can communicate with humans and make it feel like we’re talking to another person. But NLP goes well beyond programs like Siri and Alexa: its uses range from sorting email to suggesting search terms as you type into Google.

NLP has two main components, NLU and NLG, so let’s break them down.

NLUs

NLU stands for natural language understanding. It takes the speech collected from the user and converts it into text so it can be interpreted. Recognizing what the user is saying is called speech recognition, and speech recognition systems use Hidden Markov Models (HMMs) to turn speech into words.

HMMs do this by breaking up what the user says into smaller chunks. For example, when we talk to Alexa, it takes our input and splits it into chunks a few milliseconds long. Those chunks are then mapped to phonemes, the smallest units of speech that distinguish one word from another; the word “hat”, for instance, has the phonemes “h”, “a”, and “t”. Alexa tries to detect which phonemes we said, compares them to the phonemes in its pre-recorded data set, chooses the ones most likely to match, and assembles them into words as text.
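To make that less abstract, here’s a toy Python sketch of the decoding step an HMM performs. Everything in it (the phoneme set, the probabilities, the “audio chunks”) is made up purely for illustration; a real recognizer learns these numbers from hours of recorded speech and works on continuous audio features, not labels.

```python
# Toy sketch of the idea behind HMM-based speech recognition.
# All probabilities and "chunk" labels below are invented for illustration.

# Hidden states: the phonemes of the word "hat".
phonemes = ["h", "a", "t"]

# Probability of starting in each phoneme.
start_prob = {"h": 0.8, "a": 0.1, "t": 0.1}

# Probability of moving from one phoneme to the next.
trans_prob = {
    "h": {"h": 0.3, "a": 0.6, "t": 0.1},
    "a": {"h": 0.1, "a": 0.4, "t": 0.5},
    "t": {"h": 0.1, "a": 0.1, "t": 0.8},
}

# Probability of each phoneme producing a given acoustic "chunk".
emit_prob = {
    "h": {"h-ish": 0.7, "a-ish": 0.2, "t-ish": 0.1},
    "a": {"h-ish": 0.1, "a-ish": 0.8, "t-ish": 0.1},
    "t": {"h-ish": 0.1, "a-ish": 0.2, "t-ish": 0.7},
}

def viterbi(chunks):
    """Return the most likely phoneme sequence for a list of chunks."""
    # best[state] = (probability of best path ending in state, that path)
    best = {s: (start_prob[s] * emit_prob[s][chunks[0]], [s]) for s in phonemes}
    for chunk in chunks[1:]:
        new_best = {}
        for s in phonemes:
            prob, path = max(
                (best[prev][0] * trans_prob[prev][s] * emit_prob[s][chunk],
                 best[prev][1] + [s])
                for prev in phonemes
            )
            new_best[s] = (prob, path)
        best = new_best
    return max(best.values())[1]

# Millisecond-scale chunks of someone saying "hat".
print(viterbi(["h-ish", "h-ish", "a-ish", "a-ish", "t-ish"]))
# -> ['h', 'h', 'a', 'a', 't']  (collapse the repeats to read off "h a t")
```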

How POS works!

It then tries to understand what each word means through POS (part-of-speech) tagging, where it works out the role of each word: whether it’s a noun or a verb, what tense it’s in, and so on. Since words can have multiple meanings or share meanings with synonyms, many NLUs are trained with rules that help them tell synonyms and ambiguous words apart.

What a sentence looks like after the POS system
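If you want to see POS tagging for yourself, the NLTK library gives a quick way to try it (assuming it’s installed, e.g. with `pip install nltk`, and its tagger data has been downloaded; the exact data package names can vary slightly between NLTK versions):

```python
import nltk

nltk.download("punkt")                        # tokenizer model
nltk.download("averaged_perceptron_tagger")   # POS tagger model

sentence = "Siri told my little sister a joke"
tokens = nltk.word_tokenize(sentence)   # split the sentence into words
print(nltk.pos_tag(tokens))
# Something like:
# [('Siri', 'NNP'), ('told', 'VBD'), ('my', 'PRP$'), ('little', 'JJ'),
#  ('sister', 'NN'), ('a', 'DT'), ('joke', 'NN')]
```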

NLGs

The second component of NLP systems like Siri is NLG, which stands for natural language generation. An NLG takes the information or answer the system wants to give and delivers it back to the user. It first plans the structure of the response, then forms sentences using the grammar rules and dictionary built into the system. The last step is presenting the text, as speech if that’s what the program calls for (as it is in Siri or Alexa), though it isn’t always needed (Google’s suggested search terms, for example, stay as text). When speech is needed, text-to-speech takes over: a prosody model shapes the pauses and breaks, and the words are assembled from phonemes pre-recorded in the system’s data set, producing a sentence the user can hear and understand.
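Here’s a rough Python sketch of that plan-then-realize flow. The joke “database”, the templates, and the function names are all invented for illustration, not how Siri actually works, but they show the two NLG steps in order:

```python
# A very small sketch of the NLG side: plan what to say, then realize it
# as a sentence. The jokes and templates are made up for illustration.
import random

jokes = [
    "I taught a wolf to meditate. Now he's Aware Wolf.",
    "Why did the scarecrow win an award? He was outstanding in his field.",
]

def plan_response(intent):
    """Step 1: decide *what* to say (content planning)."""
    if intent == "tell_joke":
        return {"type": "joke", "text": random.choice(jokes)}
    return {"type": "fallback", "text": None}

def realize(plan):
    """Step 2: decide *how* to say it (sentence realization)."""
    if plan["type"] == "joke":
        return f"Okay, here's one: {plan['text']}"
    return "Sorry, I didn't catch that."

print(realize(plan_response("tell_joke")))
# A real assistant would now hand this text to a text-to-speech system.
```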

The Future

What does the future of NLP look like? Google and others are researching deep neural networks (DNNs) that make human-to-machine interactions feel more and more natural, bringing us closer to machines that can finally pass the Turing test. With DNNs, speech can be modeled in even smaller chunks, making text-to-speech sound far more realistic.
