(2024). Building Foundations for Inclusiveness through Expert-Annotated Data. Retrieved from https://hdl.handle.net/10446/268489

Building Foundations for Inclusiveness through Expert-Annotated Data

Tonti, Michela;
2024-01-01

Abstract

Natural language understanding and generation models have a limited ability to capture the nuances of inclusive communication because they are trained on massive datasets that often include substantial amounts of non-inclusive content. Even models specifically designed for non-inclusive language detection or reformulation largely disregard inclusiveness-related features that are likely correlated with these nuances, such as the discourse type, the level of inclusiveness, and the intended context of use. To assess the importance of these additional features, we collect a new corpus of Italian administrative documents manually annotated by linguistic experts. The experts not only highlight non-inclusive text snippets and propose possible reformulations, but also assign multi-aspect labels covering different nuances of inclusive language. We empirically show that a multi-task learning approach that leverages the multi-aspect annotations can improve non-inclusive text reformulation performance, confirming the potential of expert-annotated data in inclusive language processing.
2024
La Quatra, Moreno; Greco, Salvatore; Cagliero, Luca; Tonti, Michela; Dragotto, Francesca; Raus, Rachele; Cavagnoli, Stefania; Cerquitelli, Tania
Files attached to this record:
File: DARLI-AP-3.pdf
Access: open access
Version: publisher's version
License: Creative Commons
File size: 1.02 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/10446/268489