An Open Source Configuration for a Large-Scale Web Crawler with Clustering

IMS Manthan (The Journal of Mgt., Comp. Science & Journalism)

Volume 4 Issue 2

Published: 2009
Author(s) Name: Prof. R. Nedunchelian

Abstract

This paper aims to implement a fully functional search engine based on Free and Open Source Software (FOSS). The implementation involves the information retrieval procedures of crawling, indexing and searching. In addition, the results of search queries are clustered so that similar results are grouped together, sparing the end user from having to go through long lists of results. We use an open source search engine called Nutch, which performs the crawling and indexing procedures and displays the results of search queries. An open source results clustering engine called Carrot2 is used to cluster the results of user queries and display them in groups of related results called clusters. Such a search engine is a fully customizable and cost-effective solution for non-profit organizations and small businesses, and it can be tailored to suit individual needs.

Keywords: Open source, Search engine, Clustering
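The index, search and cluster stages described in the abstract can be illustrated with a minimal sketch. It does not use Nutch or Carrot2 and does not reproduce their algorithms; it is a toy stand-in in plain Python. The page contents, the function names (`build_index`, `search`, `cluster`) and the labelling rule (group each result under its most widely shared non-query term) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for crawled pages (not from the paper).
PAGES = {
    "p1": "jaguar car dealer prices",
    "p2": "jaguar cat habitat rainforest",
    "p3": "jaguar car engine review",
    "p4": "jaguar cat diet prey",
}


def build_index(pages):
    """Indexing: map each term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, query):
    """Searching: return the sorted ids of pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


def cluster(pages, result_ids, query):
    """Clustering (assumed toy rule): label each result with the non-query
    term that occurs in the most results, breaking ties alphabetically."""
    query_terms = set(query.lower().split())

    # Count in how many result pages each non-query term appears.
    doc_freq = defaultdict(int)
    for page_id in result_ids:
        for term in set(pages[page_id].lower().split()) - query_terms:
            doc_freq[term] += 1

    clusters = defaultdict(list)
    for page_id in result_ids:
        candidates = set(pages[page_id].lower().split()) - query_terms
        if not candidates:
            clusters["other"].append(page_id)
            continue
        label = min(candidates, key=lambda t: (-doc_freq[t], t))
        clusters[label].append(page_id)
    return dict(clusters)


if __name__ == "__main__":
    index = build_index(PAGES)
    results = search(index, "jaguar")
    for label, members in cluster(PAGES, results, "jaguar").items():
        print(label, members)
```

In this sketch an ambiguous query such as "jaguar" yields one group of pages about cars and another about the animal, which is the kind of grouping that lets a user skip a flat result list.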
