Thesis and Dissertation Workshop (WTDBD)
Object pages, Object search, Web page classification
This paper proposes OPIS (Object Page Identifying and Searching), a new method for identifying and searching object pages. An object page is a web page that represents exactly one inherent real-world object. OPIS targets searches for pages about such real-world objects, a type of query that current General Search Engines (GSEs) do not answer satisfactorily. The core of the method is the use of relevance feedback and machine learning techniques for content-based page classification. When integrated into a GSE, OPIS filters the results of a user's keyword query so that only pages classified as object pages are retrieved, rather than every page that contains the query words. Preliminary experiments show that OPIS improved the precision at 20 (p@20) of the retrieved results by 37% on average compared with a GSE.
Miriam Pizzatto Colpo, Edimar Manica, Renata Galante
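
As an illustration of the filtering step and the p@20 measure mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The names `filter_object_pages`, `precision_at_k`, and `is_object_page` are hypothetical, the classifier is replaced by a toy predicate, and the data are invented. It also assumes the convention that a result list shorter than k is still divided by k.

```python
from typing import Callable, Iterable, List, Set


def filter_object_pages(ranked_urls: Iterable[str],
                        is_object_page: Callable[[str], bool]) -> List[str]:
    """Keep only results that the (hypothetical) classifier labels as
    object pages, preserving the original rank order."""
    return [url for url in ranked_urls if is_object_page(url)]


def precision_at_k(ranked_urls: List[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k results that are relevant.

    Assumption: if fewer than k results exist, the empty slots count
    as misses, so the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_urls[:k]
    return sum(1 for url in top_k if url in relevant) / k


# Toy data (invented for illustration): a ranked GSE result list and
# the subset of results that are true object pages.
gse_results = ["a", "b", "c", "d"]
object_pages = {"a", "c"}

# Stand-in for a trained classifier: a perfect oracle on the toy data.
filtered = filter_object_pages(gse_results, lambda url: url in object_pages)

baseline = precision_at_k(gse_results, object_pages, k=2)
with_filter = precision_at_k(filtered, object_pages, k=2)
```

With a real classifier the predicate would make mistakes, so the gain from filtering depends on classification quality; the oracle here only shows how the two p@k values are compared.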