Comparison of String Similarity Algorithms to Measure Lexical Similarity
Published: 2017
Author(s) Name: Sagar J. Gandhi, Mihirraj M. Thakor, Jikitsha Sheth, Hariom I. Pandit, Hemin S. Patel
Author(s) Affiliation: Student, MCA Prog., Shrimad Rajchandra Instt. of Mgt. and Comp. Applicn. of UTU, Bardoli, Gujarat.
Abstract
String similarity captures the lexical similarity between two words, and it can be extended to identify similarity between questions. Several string similarity algorithms exist in the literature. In this paper the authors implement five of them: Dice coefficient, Jaccard similarity, Levenshtein distance, Jaro distance, and cosine similarity. The results of these algorithms are then compared with the judgments of human judges to determine which algorithm most closely resembles the way humans distinguish between the given strings. The experiment is carried out on 1000 English word pairs.
Keywords: N.A.
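
The five measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not describe. The function names are hypothetical, and the choice of character bigrams as the unit of comparison for the Dice, Jaccard, and cosine measures is an assumption; the paper may tokenize differently or normalize the Levenshtein distance.

```python
from collections import Counter
from math import sqrt


def bigrams(word):
    """Return adjacent character pairs (assumed unit of comparison)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a, b):
    """Dice coefficient over sets of character bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a, b):
    """Jaccard similarity over sets of character bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def cosine(a, b):
    """Cosine similarity over bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def jaro(a, b):
    """Jaro similarity (1.0 means identical, 0.0 means no matching characters)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up out of order count as half-transpositions.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(p != q for p, q in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    for name, fn in [("dice", dice), ("jaccard", jaccard), ("cosine", cosine),
                     ("levenshtein", levenshtein), ("jaro", jaro)]:
        print(name, fn("night", "nacht"))
```

Note that Levenshtein returns a distance (lower means more similar), while the other four return a similarity in the range 0 to 1, so any comparison against human ratings would first need to put the scores on a common scale.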