Computational modeling of 19th-century Nheengatu from the Lower Amazon region using Universal Dependencies

Alexandre, Dominick Maia

Use this identifier to cite or link to this item: http://repositorio.ufc.br/handle/riufc/84856

Type:	Master's thesis
Title:	Computational modeling of 19th-century Nheengatu from the Lower Amazon region using Universal Dependencies
Title in English:	Computational modeling of 19th-century Nheengatu from the Lower Amazon region using Universal Dependencies
Author(s):	Alexandre, Dominick Maia
Advisor:	Araripe, Leonel Figueiredo de Alencar
Keywords in Portuguese:	Morfossintaxe;Nheengatu;Dependências universais;Processamento de Linguagem Natural
Keywords in English:	Morphosyntax;Universal dependencies;Natural Language Processing
CNPq:	CNPQ::LINGUISTICA, LETRAS E ARTES::LINGUISTICA
Document date:	2026
Citation:	ALEXANDRE, Dominick Maia. Computational modeling of 19th-century Nheengatu from the Lower Amazon region using Universal Dependencies. 2026. 108 f. Dissertação (Mestrado em Linguística) - Programa de Pós-graduação em Linguística, Centro de Humanidades, Universidade Federal do Ceará, Fortaleza, 2026.
Abstract:	This thesis analyzes the morphosyntax of Nheengatu using the Universal Dependencies (UD) framework. The work is situated within the field of Digital Humanities and seeks to contribute to the linguistic description of this Indigenous language, as well as to the development of computational resources for its morphosyntactic annotation. Given that Nheengatu is undergoing a decline in its number of speakers and in intergenerational transmission (NAVARRO, 2012; EBERHARD et al., 2025), we start from the premise that the development of natural language processing tools can contribute to its documentation and digital inclusion, particularly through the application of computational formalisms that are increasingly adopted in the state of the art, such as the UD framework. From a theoretical perspective, the study is grounded in Universal Dependencies theory (NIVRE et al., 2017a; NIVRE et al., 2017b) and in previous linguistic descriptions, especially those by Cruz (2011), Stradelli (2014), Navarro (2016), and Avila (2021). The corpus consists of 19th-century records of Nheengatu as spoken in the Lower Amazon region, as documented in Hartt (1938). The methodology involves the orthographic adaptation of these sentences based on Avila (2021), followed by their morphosyntactic annotation and revision according to the principles of the UD project. The practical objective is to expand the existing Nheengatu treebank in the UD collection (ALENCAR, 2023; ALENCAR, 2024a), as well as to identify and describe syntactic patterns characteristic of this historical variety of the language. In doing so, the study aims to promote the inclusion of Nheengatu in the current landscape of computational-linguistic research, building on the already established applicability of the UD framework to the morphosyntactic analysis of minority languages (GERARDI et al., 2022; RODRÍGUEZ et al., 2022; SANTOS et al., 2024).
URI:	http://repositorio.ufc.br/handle/riufc/84856
Author ORCID:	https://orcid.org/0009-0000-7749-7762
Author Lattes CV:	http://lattes.cnpq.br/3506433597849477
Advisor ORCID:	https://orcid.org/0000-0001-8148-6994
Advisor Lattes CV:	http://lattes.cnpq.br/0669766218971125
Access type:	Open access
Appears in collections:	PPGL - Dissertações defendidas na UFC

Files associated with this item:

File	Description	Size	Format
2026_dis_dmalexandre.pdf		2.15 MB	Adobe PDF
