Anyone who has played a choose-your-own-adventure game, used a chatbot, or asked Siri about the weather has likely encountered natural language processing (NLP), a branch of artificial intelligence (AI) that combines linguistics and machine learning to teach programs to understand human language.
Whereas other forms of AI may focus on images or robotics, NLP works with spoken and written language. It breaks language into smaller chunks of data, such as words, that a program can compare against data it has seen before. It then analyzes the meaning of the words and picks up on patterns and contextual information.
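To make the idea of "dividing language into chunks" concrete, here is a minimal Python sketch. It assumes simple regular-expression word tokenization and a tiny hypothetical corpus standing in for the program's "previous data"; real NLP systems typically use learned subword tokenizers and far larger datasets, so treat this only as an illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical "previous data": a tiny corpus the program has already seen.
PREVIOUS_DATA = [
    "what is the weather today",
    "will it rain tomorrow",
    "play some music",
]

# Count how often each token appeared in the earlier examples.
seen_counts = Counter(tok for line in PREVIOUS_DATA for tok in tokenize(line))


def compare_with_previous(text: str) -> dict[str, int]:
    """Map each token of a new input to how often it was seen before (0 if never)."""
    return {tok: seen_counts[tok] for tok in tokenize(text)}


print(compare_with_previous("Is it going to rain today?"))
```

Here the question is split into six tokens, and words such as "rain" and "today" are recognized from earlier data while "going" is new to the program.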
Among the most widespread NLP applications are virtual assistants such as Google Assistant, Siri, and Alexa. By converting speech into text and interpreting the meaning of the words, these programs can carry out a user's requested command.
Natural language understanding (NLU) is a subset of NLP that converts human language into representations that programs can work with. NLU lets computers respond to commands phrased in everyday language rather than in the syntax of a programming language, which allows them to hold back-and-forth conversations with people. Chatbots, particularly those used for customer service, are a prime example.
The faculty of the Center for Language and Technologies (CeLT) at De La Salle University (DLSU) have worked on natural language processing, understanding, and generation since the early 2000s.
Dr. Charibeth Cheng, associate dean of the College of Computer Studies, describes her current Filipino NLP work: “I am currently part of two DOST-funded projects. The first project will create the new Filipino Wordnet using purely deep learning approaches. The second project will develop a bilingual (Filipino and Tagalog) health monitoring chatbot for children.”
She also explains that Filipino NLP projects have run into obstacles because of the grammatical structure of our language. “Our language is morphologically rich and we can form so many words with our affixes,” she says, adding that “models have been developed in DLSU that show the machine ‘understanding’ our verbs and generating naturally formed sentences by simply seeing samples of our language.”
Cheng also points to other problems in Filipino NLP research, such as a lack of resources and the difficulty of evaluating whether data is informative and representative enough for machine learning. Despite these challenges, she wants to keep working on Filipino NLP, hoping for more language resources and translators for the more than 160 languages and dialects in the Philippines.
On stories and chatbots
Natural language generation (NLG), on the other hand, is the production of text by a machine. A computer applies the grammatical rules it has learned to create anything from a single sentence to a full story. In a chatbot, the bot generates an appropriate response to whatever the user types, so NLU and NLG work hand in hand to let users interact with it.
For Dr. Ethel Ong, an associate professor in the Department of Software Technology who works at CeLT alongside Cheng, these chatbots could also support mental health, an area DLSU's NLP research is exploring with the help of the Office of Counseling and Career Services. In this role, chatbots would offer an outlet for people who may not feel comfortable discussing certain topics with another person.
Ong also researches “computational storytelling,” in which users, primarily children, interact with a computer and share stories with it. This approach to story generation relies mainly on the computer having a set of grammatical rules and a vocabulary from which to build stories.
Users can choose certain events and build the story together with the computer, or the computer can generate a story entirely on its own.
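To illustrate how stories can be built from grammatical rules and a vocabulary, here is a minimal Python sketch of a toy grammar-based generator. It is not CeLT's storytelling system; the grammar, its symbol names, and the `fixed` parameter (standing in for a user choosing story events) are all hypothetical. Each part of the story is expanded with a randomly chosen rule unless the user has already picked that part.

```python
import random
from typing import Optional

# Hypothetical toy grammar: each symbol maps to a list of possible expansions.
GRAMMAR = {
    "STORY": [["OPENING", "EVENT", "ENDING"]],
    "OPENING": [["Once upon a time,", "CHARACTER", "lived near the sea."]],
    "CHARACTER": [["a curious girl"], ["a brave carabao"], ["an old fisherman"]],
    "EVENT": [["One day, a storm came."], ["One day, a lost puppy appeared."]],
    "ENDING": [["In the end, everyone was safe."], ["From then on, they were good friends."]],
}


def expand(symbol: str, rng: random.Random, fixed: dict[str, str]) -> str:
    """Recursively expand a grammar symbol into text."""
    if symbol in fixed:          # the user chose this part of the story
        return fixed[symbol]
    if symbol not in GRAMMAR:    # plain vocabulary is returned unchanged
        return symbol
    production = rng.choice(GRAMMAR[symbol])  # pick one rule for this symbol
    return " ".join(expand(part, rng, fixed) for part in production)


def generate_story(seed: int, fixed: Optional[dict[str, str]] = None) -> str:
    """Generate a story; a fixed seed makes the random choices reproducible."""
    return expand("STORY", random.Random(seed), fixed or {})


# The computer writes on its own:
print(generate_story(seed=7))
# The user picks the character and the event; the computer fills in the rest:
print(generate_story(seed=7, fixed={"CHARACTER": "a brave carabao",
                                    "EVENT": "One day, a storm came."}))
```

Passing a `fixed` dictionary mirrors the collaborative mode, while calling `generate_story` with no choices mirrors the computer generating a story by itself. Research systems use far richer rules and vocabularies than this toy grammar.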
A common concern about computer-generated stories is their quality, originality, and creativity. In response, Ong explains, “Part of our evaluation is showing the generated stories to literary writers and telling them to evaluate it based on creativity.” Creativity, in other words, is one of the criteria the researchers use to judge the stories.
One of the best-known examples of text generation is GPT-3, a language model developed by OpenAI that has been said to produce prose so fluent that it can be difficult to tell it was written by a machine rather than a human. Beyond generating text, it can also hold conversations and answer questions, making it one of the most prominent examples of NLP technology.
Despite this technological progress, researchers still have to consider ethical issues in NLP. For instance, the data used to process language can be biased. “These biases may be [based] on stereotypes, gender, race, religion, underrepresentation, misrepresentation, et cetera,” Cheng says.
Computers may also misunderstand user inputs and give incorrect responses. The problem grows when a system learns from its interactions with arbitrary users, as Microsoft's Tay chatbot showed.
Tay was designed to interact with Twitter users, but people exploited its ability to learn and taught it to post racist and extremely offensive messages, which led Microsoft to suspend it.
Of course, not everyone will try to exploit a chatbot's interactivity, but a chatbot can still be taught incorrect or malicious things. For instance, users may teach it the wrong meaning of certain words.
There are, however, safeguards against this. One has the chatbot repeat words and their definitions to different users to confirm that each word is being used correctly. These checks reduce how often the chatbot learns incorrect information.
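The confirmation safeguard can be sketched in Python as follows. This is a hypothetical illustration, not the mechanism of any particular chatbot: the class name `DefinitionLearner` and the threshold of three distinct confirming users are assumptions. A proposed definition is only accepted once enough different users have confirmed it, so a single user cannot teach the bot a wrong meaning alone.

```python
from collections import defaultdict
from typing import Optional


class DefinitionLearner:
    """Hypothetical safeguard: accept a word's definition only after
    enough distinct users have confirmed it."""

    def __init__(self, required_confirmations: int = 3):
        self.required = required_confirmations
        # (word, definition) -> set of user IDs who confirmed that pairing
        self.confirmations: dict[tuple[str, str], set[str]] = defaultdict(set)
        # word -> definition that has passed the confirmation check
        self.learned: dict[str, str] = {}

    def confirm(self, user_id: str, word: str, definition: str) -> bool:
        """Record one user's confirmation; return True once the definition is accepted."""
        key = (word.lower(), definition.lower())
        self.confirmations[key].add(user_id)  # a set, so repeat votes by one user count once
        if len(self.confirmations[key]) >= self.required:
            self.learned[key[0]] = key[1]
            return True
        return False

    def lookup(self, word: str) -> Optional[str]:
        """Return the accepted definition, or None if none has been confirmed yet."""
        return self.learned.get(word.lower())


bot = DefinitionLearner()
for user in ["alice", "alice", "bob", "carol"]:
    bot.confirm(user, "sulat", "to write")
print(bot.lookup("sulat"))  # accepted only after three different users agreed
```

Counting distinct users rather than total confirmations is the key design choice: one person repeating a wrong definition many times still counts only once.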
Ultimately, there is still much to be explored in the field of NLP.
For Cheng, the many dialects of the Philippines are an area that still needs work. For Ong, there is plenty of room to build better storytellers and chatbots that can serve as temporary therapists for various health-related concerns. “Basically, this is my dream—[for us to] have a Commander Data like in Star Trek who can really be immersed in our daily lives,” she says.