Extracting compound terms from domain corpora

Lopes, Lucelene; Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel

dc.contributor.author	Lopes, Lucelene	pt_BR
dc.contributor.author	Vieira, Renata	pt_BR
dc.contributor.author	Finatto, Maria José Bocorny	pt_BR
dc.contributor.author	Martins, Daniel	pt_BR
dc.date.accessioned	2018-04-03T02:26:06Z	pt_BR
dc.date.issued	2010	pt_BR
dc.identifier.issn	0104-6500	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/174302	pt_BR
dc.description.abstract	The need for domain ontologies motivates research on extracting structured information from texts. A foundational part of this process is the identification of domain-relevant compound terms. This paper presents an evaluation of compound term extraction from a corpus in the domain of Pediatrics. Bigrams and trigrams were automatically extracted from a corpus composed of 283 texts from a Portuguese journal, Jornal de Pediatria, using three different extraction methods. Because these methods generate a large number of candidates, we analyzed the quality of the resulting terms according to different methods and cut-off points. The evaluation is reported using metrics such as precision, recall and f-measure, computed against a hand-made reference list of domain-relevant compounds.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.relation.ispartof	Journal of the Brazilian Computer Society. Rio de Janeiro, RJ. Vol. 16 (2010), p. [247]-259	pt_BR
dc.rights	Open Access	en
dc.subject	Ontology	pt_BR
dc.subject	Term extraction	en
dc.subject	Statistical and linguistic methods	en
dc.subject	Terminology	pt_BR
dc.subject	Ontology automatic construction	en
dc.subject	Extraction from corpora	en
dc.title	Extracting compound terms from domain corpora	pt_BR
dc.type	Journal article	pt_BR
dc.identifier.nrb	001057475	pt_BR
dc.type.origin	National	pt_BR
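
The abstract describes scoring ranked candidate lists against a hand-made reference list at different cut-off points. A minimal sketch of that kind of evaluation follows. The function name, the sample candidate terms and the reference set are all hypothetical, not taken from the paper, and the balanced F-measure (F1) is an assumption, since the record does not say which weighting the authors used.

```python
def evaluate_at_cutoff(ranked_candidates, reference, cutoff):
    """Return (precision, recall, f_measure) for the top `cutoff` candidates.

    ranked_candidates: list of terms, best-ranked first (hypothetical input).
    reference: set of terms judged domain relevant (hypothetical input).
    """
    selected = set(ranked_candidates[:cutoff])
    hits = len(selected & reference)  # candidates that appear in the reference list
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    # Balanced F-measure (F1); assumed here, not confirmed by the record.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical bigram candidates, ordered by some extraction score.
ranked = [
    "aleitamento materno",
    "peso ao nascer",
    "presente estudo",
    "leite materno",
    "ano passado",
]
# Hypothetical reference list of domain-relevant compounds.
reference = {
    "aleitamento materno",
    "peso ao nascer",
    "leite materno",
    "idade gestacional",
}

for cutoff in (3, 5):
    p, r, f = evaluate_at_cutoff(ranked, reference, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Raising the cut-off admits more candidates, which typically raises recall while precision may fall; comparing the three values across cut-offs is one way to choose a threshold for each extraction method.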

Name: 001057475.pdf
Size: 493.9 KB
Format: PDF
Description: Full text (English)

This item is licensed under a Creative Commons License