Linguistic Analyses and Corpus Building using Natural Language Processing

This work examines the use of Tunisian Arabic encoded in Arabizi, a non-standard orthographic system using the Latin alphabet and numbers, which emerged in digital contexts and transformed written communication. Traditionally considered primarily oral, Tunisian Arabic has seen increased use on social media over the past two decades, challenging its conventional definition.
This study examines whether Arabizi is influenced by the specific writing context (blogs, forums, and social networks) and how it facilitates the use of French vocabulary during code-mixing. A key aspect of the study is the quasi-oral nature of Arabizi, understood as a system that allows independence from established writing traditions, such as that of the Arabic alphabet.
These analyses are corpus-based, and in particular, the Tunisian Arabish Corpus (TArC) was created to observe these linguistic dynamics. TArC collects texts produced over ten years, comprising 43,327 words with various levels of linguistic annotation, allowing for a detailed study of the language. The hybrid methodology adopted to build TArC combines approaches from Arabic dialectology, Corpus Linguistics, and Natural Language Processing. TArC has been annotated using semi-automatic procedures, making it useful for NLP research as well. Finally, the work examines the challenges and limitations of interdisciplinary research, proposing preliminary hypotheses on the linguistic and sociolinguistic trends related to the use of Arabizi in Tunisia. These observations aim to mark a starting point for future research in the field of dialectology and linguistic technologies applied to Tunisian Arabic.

BIBLIOGRAPHIC DATA
Author: Elisa Gugliotta
Publisher: Ledizioni
Published: October 2024
Series: Quaderni del CERM
Language: English
Format: paperback, 277 pp.; PDF in Open Access
Print ISBN: 9791256002139
Print price: €28.00

Dimensions: 17 × 24 cm
Format: print, eBook in PDF

Elisa Gugliotta

TUNISIAN ARABIZI

Price range: €0.00 to €28.00

Subject or series: Quaderni del CERM
Tags: arabic, language studies, linguistic, Open Access

Shipping and delivery

Shipping

We ship throughout Italy with GLS courier, within Europe with BRT, and outside Europe with DHL.
We normally dispatch the volume within 24 hours, Monday to Friday; you will normally receive it within 48 to 72 hours of placing your order.

Costs

Shipping within Italy is free for orders over €29; for orders below that amount it costs €6 plus VAT. eBook purchases carry no shipping cost.
Shipping within Europe depends on the country and the weight, starting at €14.50.
Shipping outside Europe depends on the country and the weight, starting at €26.

For special shipments and requests, a personalized quote will be provided.
