A Genetic-Bayesian Short Message Service (SMS) Spam Filter with Text Normalization and Semantic Indexing (TNSI)

Author: Dr. Oluwaseun Ebiesuwa

Journal/Publisher: International Journal Of Computer & Organization Trends

Volume/Edition: 7

Language: English

Pages: 14 - 19

Abstract

Since the Short Message Service (SMS) was first introduced in 1993, its popularity has grown steadily, and SMS communication now constitutes a major segment of telecommunication. This popularity and extensive usage have drawn many researchers to the potential of harvesting data and metadata from SMS corpora for linguistic, diachronic, normalization, and sociolinguistic studies, and for validating and comparing the classifiers used in SMS spam filters. However, freely available datasets containing this type of information are difficult to obtain for research purposes, mostly because SMS messages are confidential and users want to reveal as little of the contents of their phones as possible. This paper examines the techniques adopted in creating SMS corpora and the ethical considerations involved in protecting users’ interests and privacy. For successful SMS corpus creation, a main consideration is the requirement to protect the rights and interests of the message donors and of any other person mentioned in the text messages, without altering the original text, so that sufficient metadata can be gathered. Existing work in the field was reviewed to ascertain the ethical practices adopted. Participant consent, data anonymization, and the safe storage of participants’ information are the basic ethical considerations adopted to ensure successful SMS corpus creation.

