Title:

Correlation between the quality of focused crawlers and the linguistic resources obtained from them

Category:

Thesis and Dissertation Workshop (WTDBD)

Topics of interest:

Focused Crawling, Lexical Resources, Machine Translation, Correlation

Abstract:

Focused web crawlers have been used to automatically acquire lexical resources for particular domains by gathering web pages related to a set of topics of interest. To do so, a crawler traverses a portion of the web graph and stores the documents from pages considered relevant, which are then treated as a corpus. The graph must be traversed in a targeted way, with pages kept in a queue that gives priority to those more likely to be relevant. Texts collected by these tools can be used to train domain-specific machine translation (MT) systems. In this work, we compare the performance of focused crawling algorithms, measured with standard metrics, against the quality of the linguistic resources they yield, in order to determine whether the two are correlated. We also propose a novel extrinsic metric for evaluating the efficiency of a focused crawling algorithm.

Author(s):

Bruno Rezende Laranjeira, Aline Villavicencio, Viviane P. Moreira
