Bias Fignews

A multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the War from October 7, 2023 to January 31, 2024. The corpus comprises 12,000 posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts for each language.

The Distribution of the Bias and Propaganda Classes Across Languages.

The tables below show the distribution of the bias and propaganda classes across languages.

Class	Ar	En	He	Fr	Hi	Total
Biased Against Palestine	466	514	595	807	534	2916
Biased Against Israel	94	79	23	19	70	285
Biased Against Both	6	7	11	6	14	44
Biased Against Others	42	28	53	39	49	211
Unbiased	1371	1486	1369	1212	1386	6824
Not applicable	49	7	17	20	25	118
Unclear	132	39	92	57	82	402
Total	2160	2160	2160	2160	2160	10800

Table 1: Distribution of bias classes across languages

Types of Bias	Ar	En	He	Fr	Hi	Total
Explicit (تحيز صريح)	388	412	563	336	394	2093
Implicit (تحيز ضمني)	269	236	265	217	199	1186
Vague (تحيز مبهم)	27	52	59	37	36	211

Table 2: Types of Bias

Class	Ar	En	He	Fr	Hi	Total
Propaganda	524	679	648	809	673	3333
Not propaganda	1484	1443	1447	1297	1413	7084
Not applicable	48	11	17	16	24	116
Unclear	104	27	48	38	50	267
Total	2160	2160	2160	2160	2160	10800

Table 3: Distribution of propaganda subtask classes across languages

Class	Ar	En	He	Fr	Hi	Total
Propaganda Must be deleted	192	191	266	348	277	1274
Propaganda May be deleted	524	488	382	461	396	2059
Propaganda not to be deleted	451	422	565	648	436	2522

Table 4: Types of Propaganda classes

Table 1 and Table 2 show the distribution of the bias classes and the types of bias across languages, respectively. In Table 1, about 27% of the posts (2916) are biased against Palestine and 63% (6824) are unbiased. French posts account for the largest number of posts biased against Palestine (807). Table 2 gives the counts for each type of bias; Hebrew has the most posts annotated as Explicit bias. For propaganda, Table 3 shows that 31% of the posts (3333) are annotated as "Propaganda" and 66% (7084) as "Not propaganda". French posts also contribute the largest number of propaganda posts (809). Table 4 shows the distribution of the propaganda types across languages; French has the most posts classified as "Propaganda Must be deleted", with 348 posts.
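The percentages quoted above can be reproduced from the Total columns of Table 1 and Table 3. The following is a minimal sketch; the dictionary names and the `share` helper are illustrative and are not part of the repository.

```python
# Totals copied from the "Total" column of Table 1 (bias) and Table 3 (propaganda).
bias_totals = {
    "Biased Against Palestine": 2916,
    "Biased Against Israel": 285,
    "Biased Against Both": 44,
    "Biased Against Others": 211,
    "Unbiased": 6824,
    "Not applicable": 118,
    "Unclear": 402,
}
propaganda_totals = {
    "Propaganda": 3333,
    "Not propaganda": 7084,
    "Not applicable": 116,
    "Unclear": 267,
}


def share(counts, label):
    """Percentage of posts carrying `label`, rounded to the nearest integer."""
    return round(100 * counts[label] / sum(counts.values()))


print(share(bias_totals, "Biased Against Palestine"))  # 27
print(share(bias_totals, "Unbiased"))                  # 63
print(share(propaganda_totals, "Propaganda"))          # 31
print(share(propaganda_totals, "Not propaganda"))      # 66
```

Both dictionaries sum to 10800, the number of posts covered by the tables.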

Corpus Download

The data directory in this repo contains four groups of sheets:

(a) Main and IAA-(1..4): These are the annotation sheets.

Main contains 90% of the posts; the IAA (inter-annotator agreement) sheets contain the remaining 10%, repeated in every IAA sheet so that inter-annotator consistency can be measured. For better results, it is preferred to use at least IAA-1 and IAA-2. Copies are provided for IAA-3 and IAA-4. Each annotator must be given a unique ID (1, 2, 3, 4) and provide their details in the Annotation Team sheet. All these sheets have the following columns:

Batch: Batch id ranging from B01 to B15.
Source Language: Original language of the text - Arabic, English, Hebrew, French, and Hindi
ID: A unique identifier per Source Language
Type: MAIN or IAA
Text: Message text to annotate
English MT: machine translation of text to English
Arabic MT: machine translation of text to Arabic
Annotator ID: Unique identifier per annotator (as listed in the 'Annotation Team' sheet)
Bias: Bias subtask annotation labels: Unbiased, Biased against Palestine, Biased against Israel, Biased against both Palestine and Israel, Biased against others, Unclear, Not Applicable
Propaganda: Propaganda subtask annotation labels: Propaganda, Not Propaganda, Unclear, Not Applicable
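Since the IAA sheets repeat the same posts for each annotator, agreement between two annotators can be estimated by pairing their labels on Source Language and ID. The sketch below computes Cohen's kappa, which is one common agreement measure and not necessarily the one used in the shared task. It assumes the sheet rows have already been loaded as dictionaries keyed by the column names listed above; the function names and the sample rows are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of posts with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def paired_labels(rows_a, rows_b, column):
    """Pair two annotators' labels on (Source Language, ID)."""
    def key(row):
        return (row["Source Language"], row["ID"])

    by_key = {key(r): r[column] for r in rows_b}
    pairs = [(r[column], by_key[key(r)]) for r in rows_a if key(r) in by_key]
    return [a for a, _ in pairs], [b for _, b in pairs]


# Hypothetical rows from IAA-1 and IAA-2 (IDs and labels are invented).
iaa_1 = [
    {"Source Language": "English", "ID": "1", "Bias": "Unbiased"},
    {"Source Language": "English", "ID": "2", "Bias": "Unbiased"},
    {"Source Language": "French", "ID": "1", "Bias": "Biased against Palestine"},
    {"Source Language": "French", "ID": "2", "Bias": "Biased against Palestine"},
]
iaa_2 = [
    {"Source Language": "English", "ID": "1", "Bias": "Unbiased"},
    {"Source Language": "English", "ID": "2", "Bias": "Unbiased"},
    {"Source Language": "French", "ID": "1", "Bias": "Biased against Palestine"},
    {"Source Language": "French", "ID": "2", "Bias": "Unbiased"},
]

labels_1, labels_2 = paired_labels(iaa_1, iaa_2, "Bias")
kappa = cohen_kappa(labels_1, labels_2)
print(kappa)  # 0.5
```

The key combines Source Language and ID because the ID column is only unique within a source language.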

(b) Status: This sheet includes an automatically updated record of completing the annotations to help you track your progress.

Batches: Batch id ranging from B01 to B15.
Sub-Batch: MAIN or IAA
Bias #: number of completed posts
Propaganda #: number of completed posts
Bias %: percentage of completed posts
Propaganda %: percentage of completed posts
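As a rough illustration of what the Status sheet tracks, the counts and percentages can be derived from the annotation rows by treating an empty Bias or Propaganda cell as not yet completed. That treatment is an assumption, as are the function name and the sample rows; the sheet's own formulas are not described here.

```python
from collections import defaultdict


def completion_status(rows, column):
    """Return {(Batch, Type): (completed count, completed percentage)}."""
    done = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = (row["Batch"], row["Type"])
        total[key] += 1
        # Assumption: a non-empty cell means the post has been annotated.
        if (row.get(column) or "").strip():
            done[key] += 1
    return {key: (done[key], 100 * done[key] / total[key]) for key in total}


# Hypothetical rows; only the columns needed here are included.
rows = [
    {"Batch": "B01", "Type": "MAIN", "Bias": "Unbiased"},
    {"Batch": "B01", "Type": "MAIN", "Bias": "Unclear"},
    {"Batch": "B01", "Type": "MAIN", "Bias": ""},
    {"Batch": "B01", "Type": "IAA", "Bias": "Unbiased"},
]

status = completion_status(rows, "Bias")
for (batch, sub_batch), (count, percent) in status.items():
    print(f"{batch} {sub_batch}: {count} completed ({percent:.0f}%)")
```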

(c) Annotation Team: This sheet must be filled in by every annotation team to provide the following information.

Team Name: Pick a cool and inspiring name!
Subtask: Bias, Propaganda
Annotator ID: A unique identifier (1, 2, 3, 4) for each annotator. Annotator n should finish the IAA-n sheet. The annotator IDs for Bias and Propaganda do not need to align.
Arabic Source Annotation Language: The language the annotator used to annotate the Arabic source posts (e.g., the annotator can read the Arabic or the English MT.)
Hebrew Source Annotation Language: The language the annotator used to annotate the Hebrew source posts (e.g., the annotator can read the Arabic or the English MT.)
French Source Annotation Language: The language the annotator used to annotate the French source posts (e.g., the annotator can read the Arabic or the English MT.)
Hindi Source Annotation Language: The language the annotator used to annotate the Hindi source posts (e.g., the annotator can read the Arabic or the English MT.)
English Source Annotation Language: The language the annotator used to annotate the English source posts (e.g., the annotator can read the Arabic or the English MT.)
Native Language: of the Annotator.
Gender: of the Annotator (defined as they prefer).
Country of Origin: of the Annotator (could be more than one).
Education Level: of the Annotator.
Contribution: Main (Do not edit this column, it will be updated automatically)
Contribution: IAA (Do not edit this column, it will be updated automatically)

(d) Our Bias/Propaganda Guidelines: These sheets are to be filled by the team members with detailed annotation guidelines covering the following subtasks:

Define the Objective: Outline the purpose of this specific task.
Describe the Task: Provide a detailed task description with correct examples.
Establish Categories: List and define all annotation categories/tags.
Detailed Category Guidelines: Explain application criteria for each category/tag, with examples.
Include Examples: Offer examples for correct application and common mistakes.
Outline the Process: Describe the step-by-step annotation process and tools used.
Set Quality Standards: Define expectations for accuracy and consistency, along with quality check procedures.
Handle Ambiguities: Provide guidance on ambiguous cases and a protocol for seeking clarification.
Ensure Consistency: Implement measures for annotator consistency and recommend calibration sessions.
Ethical Considerations: Highlight unbiased annotation practices and handling of sensitive data.
Training and Support: Detail training procedures and support resources for annotators.
Review and Update: Schedule guideline reviews for updates based on feedback and new insights.
Feedback Mechanism: Include a system for annotator feedback to refine guidelines and processes.

Clone this repo

git clone https://github.com/SinaLab/BiasFignews

Citation

Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel: Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda. In Proceedings of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024), Bangkok, Thailand. Association for Computational Linguistics.
