A multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the war from October 7, 2023 to January 31, 2024, and comprises posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts per language.
The tables below show the distribution of the bias and propaganda classes across languages.
| Class | Ar | En | He | Fr | Hi | Total |
|---|---|---|---|---|---|---|
| Biased Against Palestine | 466 | 514 | 595 | 807 | 534 | 2916 |
| Biased Against Israel | 94 | 79 | 23 | 19 | 70 | 285 |
| Biased Against Both | 6 | 7 | 11 | 6 | 14 | 44 |
| Biased Against Others | 42 | 28 | 53 | 39 | 49 | 211 |
| Unbiased | 1371 | 1486 | 1369 | 1212 | 1386 | 6824 |
| Not applicable | 49 | 7 | 17 | 20 | 25 | 118 |
| Unclear | 132 | 39 | 92 | 57 | 82 | 402 |
| Total | 2160 | 2160 | 2160 | 2160 | 2160 | 10800 |
Table 1: Distribution of bias classes across languages
| Type of Bias | Ar | En | He | Fr | Hi | Total |
|---|---|---|---|---|---|---|
| Explicit (تحيز صريح) | 388 | 412 | 563 | 336 | 394 | 2093 |
| Implicit (تحيز ضمني) | 269 | 236 | 265 | 217 | 199 | 1186 |
| Vague (تحيز مبهم) | 27 | 52 | 59 | 37 | 36 | 211 |
Table 2: Types of Bias
| Class | Ar | En | He | Fr | Hi | Total |
|---|---|---|---|---|---|---|
| Propaganda | 524 | 679 | 648 | 809 | 673 | 3333 |
| Not propaganda | 1484 | 1443 | 1447 | 1297 | 1413 | 7084 |
| Not applicable | 48 | 11 | 17 | 16 | 24 | 116 |
| Unclear | 104 | 27 | 48 | 38 | 50 | 267 |
| Total | 2160 | 2160 | 2160 | 2160 | 2160 | 10800 |
Table 3: Distribution of propaganda subtask classes across languages
| Class | Ar | En | He | Fr | Hi | Total |
|---|---|---|---|---|---|---|
| Propaganda Must be deleted | 192 | 191 | 266 | 348 | 277 | 1274 |
| Propaganda May be deleted | 332 | 488 | 382 | 461 | 396 | 2059 |
| Propaganda not to be deleted | 451 | 422 | 565 | 648 | 436 | 2522 |
Table 4: Types of Propaganda classes
Table 1 and Table 2 show the distribution of the bias classes and the types of bias across languages, respectively. In Table 1, about 27% of the posts (2,916 of 10,800) are biased against Palestine and 63% (6,824) are unbiased. French posts contribute the largest share of the bias against Palestine (807 posts). Table 2 gives the counts for each type of bias; the largest number of posts annotated as Explicit bias are in Hebrew. For propaganda, Table 3 shows that 31% of the posts (3,333) are annotated as "Propaganda" and 66% (7,084) as "Not propaganda". French posts also contribute the largest share of the propaganda (809 posts). Table 4 shows the distribution of the propaganda types across languages; the largest number of posts classified as "Propaganda Must be deleted" are in French (348 posts).
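The percentages quoted above can be re-derived from the "Total" columns of Table 1 and Table 3. The snippet below is only a quick arithmetic check; the dictionary and function names are illustrative and are not part of the repository.

```python
# Counts copied from the "Total" column of Table 1 (bias) and Table 3 (propaganda).
bias_totals = {
    "Biased Against Palestine": 2916,
    "Biased Against Israel": 285,
    "Biased Against Both": 44,
    "Biased Against Others": 211,
    "Unbiased": 6824,
    "Not applicable": 118,
    "Unclear": 402,
}
propaganda_totals = {
    "Propaganda": 3333,
    "Not propaganda": 7084,
    "Not applicable": 116,
    "Unclear": 267,
}


def shares(counts):
    """Return each class's share of the table total as a rounded percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


bias_shares = shares(bias_totals)
propaganda_shares = shares(propaganda_totals)

print(bias_shares["Biased Against Palestine"], bias_shares["Unbiased"])
print(propaganda_shares["Propaganda"], propaganda_shares["Not propaganda"])
```

Both tables sum to 10,800 posts, so the percentages are shares of that total rather than of the full 12,000-post corpus.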
The data directory in this repo contains four groups of sheets:
(a) Main and IAA-(1..4): These are the annotation sheets.
Main contains 90% of the posts, and the IAA sheets contain the other 10%, repeated in each IAA sheet. IAA stands for inter-annotator agreement; the IAA sheets make it possible to measure consistency between annotators. For better results, it is preferred to use at least IAA-1 and IAA-2; copies are provided for IAA-3 and IAA-4. Each annotator must be given a unique ID (1, 2, 3, 4) and provide their details in the Annotation Team sheet. All these sheets have the following columns:
- Batch: Batch id ranging from B01 to B15.
- Source Language: Original language of the text - Arabic, English, Hebrew, French, and Hindi
- ID: A unique identifier per Source Language
- Type: MAIN or IAA
- Text: Message text to annotate
- English MT: Machine translation of the text to English
- Arabic MT: Machine translation of the text to Arabic
- Annotator ID: Unique identifier per annotator (as listed in the 'Annotation Team' sheet)
- Bias: Bias subtask annotation labels: Unbiased, Biased against Palestine, Biased against Israel, Biased against both Palestine and Israel, Biased against others, Unclear, Not Applicable
- Propaganda: Propaganda subtask annotation labels: Propaganda, Not Propaganda, Unclear, Not Applicable
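As a rough illustration of how the IAA sheets could be used, the sketch below computes Cohen's kappa between two annotators on the Bias column. It assumes the IAA sheets have been exported to rows keyed by the column names listed above and that the same post carries the same (Source Language, ID) pair in every IAA sheet. The choice of Cohen's kappa, the function names, and the sample rows are assumptions made for illustration; this README does not state which agreement measure is used.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of posts with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def paired_labels(rows, column, annotator_a, annotator_b):
    """Align two annotators' labels on (Source Language, ID)."""
    by_annotator = {annotator_a: {}, annotator_b: {}}
    for row in rows:
        annotator = row["Annotator ID"]
        if annotator in by_annotator and row[column]:
            key = (row["Source Language"], row["ID"])
            by_annotator[annotator][key] = row[column]
    shared = sorted(by_annotator[annotator_a].keys() & by_annotator[annotator_b].keys())
    return (
        [by_annotator[annotator_a][key] for key in shared],
        [by_annotator[annotator_b][key] for key in shared],
    )


# Hypothetical rows standing in for exported IAA-1 and IAA-2 sheets.
first = ["Unbiased", "Unbiased", "Biased against Palestine", "Unclear"]
second = ["Unbiased", "Biased against Palestine", "Biased against Palestine", "Unclear"]
rows = []
for post_id, (label_1, label_2) in enumerate(zip(first, second), start=1):
    base = {"Source Language": "English", "ID": str(post_id)}
    rows.append({**base, "Annotator ID": "1", "Bias": label_1})
    rows.append({**base, "Annotator ID": "2", "Bias": label_2})

labels_1, labels_2 = paired_labels(rows, "Bias", "1", "2")
kappa = cohen_kappa(labels_1, labels_2)
print(round(kappa, 3))
```

The same two functions apply to the Propaganda column by passing `"Propaganda"` as the `column` argument.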
(b) Status: This sheet is an automatically updated record of annotation completion that helps you track your progress.
- Batches: Batch id ranging from B01 to B15.
- Sub-Batch: MAIN or IAA
- Bias #: Number of posts with a completed Bias annotation
- Propaganda #: Number of posts with a completed Propaganda annotation
- Bias %: Percentage of posts with a completed Bias annotation
- Propaganda %: Percentage of posts with a completed Propaganda annotation
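The Status columns can be approximated outside the spreadsheet with a few lines of Python. The sketch below assumes that a post counts as completed when its label cell is non-empty and that the annotation sheets have been exported to rows using the column names listed earlier. The function name and the sample rows are hypothetical, and the spreadsheet's own formula may differ.

```python
from collections import defaultdict


def completion_status(rows):
    """Count completed Bias/Propaganda labels per (Batch, Type)."""
    stats = defaultdict(lambda: {"posts": 0, "Bias #": 0, "Propaganda #": 0})
    for row in rows:
        entry = stats[(row["Batch"], row["Type"])]
        entry["posts"] += 1
        for column in ("Bias", "Propaganda"):
            # Assumption: a non-empty cell means the annotation is done.
            if (row.get(column) or "").strip():
                entry[f"{column} #"] += 1
    for entry in stats.values():
        for column in ("Bias", "Propaganda"):
            entry[f"{column} %"] = 100 * entry[f"{column} #"] / entry["posts"]
    return dict(stats)


# Hypothetical rows: four MAIN posts and one IAA post in batch B01.
rows = [
    {"Batch": "B01", "Type": "MAIN", "Bias": "Unbiased", "Propaganda": "Propaganda"},
    {"Batch": "B01", "Type": "MAIN", "Bias": "Unclear", "Propaganda": "Not Propaganda"},
    {"Batch": "B01", "Type": "MAIN", "Bias": "Unbiased", "Propaganda": ""},
    {"Batch": "B01", "Type": "MAIN", "Bias": "", "Propaganda": ""},
    {"Batch": "B01", "Type": "IAA", "Bias": "Unbiased", "Propaganda": "Propaganda"},
]

status = completion_status(rows)
for (batch, sub_batch), entry in sorted(status.items()):
    print(batch, sub_batch, entry)
```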
(c) Annotation Team: This sheet must be filled in by every annotation team to provide the following information.
- Team Name: Pick a cool and inspiring name!
- Subtask: Bias, Propaganda
- Annotator ID: A unique identifier (1, 2, 3, 4) for each annotator. Annotator n should complete the IAA-n sheet. The annotator IDs for Bias and Propaganda do not need to align.
- Arabic Source Annotation Language: The language the annotator used to annotate the Arabic source posts (e.g., the annotator can read the Arabic or the English MT.)
- Hebrew Source Annotation Language: The language the annotator used to annotate the Hebrew source posts (e.g., the annotator can read the Arabic or the English MT.)
- French Source Annotation Language: The language the annotator used to annotate the French source posts (e.g., the annotator can read the Arabic or the English MT.)
- Hindi Source Annotation Language: The language the annotator used to annotate the Hindi source posts (e.g., the annotator can read the Arabic or the English MT.)
- English Source Annotation Language: The language the annotator used to annotate the English source posts (e.g., the annotator can read the Arabic or the English MT.)
- Native Language: of the Annotator.
- Gender: of the Annotator (defined as they prefer).
- Country of Origin: of the Annotator (could be more than one).
- Education Level: of the Annotator.
- Contribution: Main (Do not edit this column, it will be updated automatically)
- Contribution: IAA (Do not edit this column, it will be updated automatically)
(d) Our Bias/Propaganda Guidelines: These sheets are to be filled in by the team members with detailed annotation guidelines covering the following points:
- Define the Objective: Outline the purpose of this specific task.
- Describe the Task: Provide a detailed task description with correct examples.
- Establish Categories: List and define all annotation categories/tags.
- Detailed Category Guidelines: Explain application criteria for each category/tag, with examples.
- Include Examples: Offer examples for correct application and common mistakes.
- Outline the Process: Describe the step-by-step annotation process and tools used.
- Set Quality Standards: Define expectations for accuracy and consistency, along with quality check procedures.
- Handle Ambiguities: Provide guidance on ambiguous cases and a protocol for seeking clarification.
- Ensure Consistency: Implement measures for annotator consistency and recommend calibration sessions.
- Ethical Considerations: Highlight unbiased annotation practices and handling of sensitive data.
- Training and Support: Detail training procedures and support resources for annotators.
- Review and Update: Schedule guideline reviews for updates based on feedback and new insights.
- Feedback Mechanism: Include a system for annotator feedback to refine guidelines and processes.
To get the data, clone this repo:
git clone https://github.com/SinaLab/BiasFignews
Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel: Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda. In Proceedings of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024), Bangkok, Thailand. Association for Computational Linguistics.