WesScivetti edited this page Jan 5, 2021 · 22 revisions
Data Readings

Use this space for collecting papers related to the project. As you find papers, please add them to this list, and we will decide what we want to read and discuss together.

Readings

| Paper | Title and link | Notes: why interesting? |
|---|---|---|
| Ruder et al., 2019 | A Survey of Cross-Lingual Word Embedding Models | Excellent survey of methods for learning cross-lingual word embeddings |
| Kelly et al., 2019 | Morphological Analysis in Context for Many Languages, with Supervision from Only a Few | System description paper for our submission to SIGMORPHON 2019 Shared Task 2 |
| Mikolov et al., 2013a | Efficient Estimation of Word Representations in Vector Space | Introduces word2vec, a pair of early (and still relevant) word embedding algorithms |
| Mikolov et al., 2013b | Exploiting Similarities among Languages for Machine Translation | Introduces translation matrices, allowing cross-lingual transfer from one language to another |
| Arora, 2020 | iNLTK: Natural Language Toolkit for Indic Languages | Introduces iNLTK, a toolkit with pre-trained models for 13 major languages of India |
| Islam et al., 2017 | A Study on Various Applications of NLP Developed for North-East Languages | Surveys prior work in North-East India on electronic dictionaries, POS tagging, machine translation, WordNet, and word sense disambiguation |

References

| Paper | Title and link | Notes |
|---|---|---|
| Agić & Vulić, 2019 | JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages | Introduces the JW300 corpus |
| Sylak-Glassman, 2016 | The Composition and Use of the Universal Morphological Feature Schema (UniMorph Schema) | Describes the UniMorph schema |

Reading Group Ideas

| Paper | Title and link | Notes |
|---|---|---|
| Wu & Dredze, 2020 | Are All Languages Created Equal in Multilingual BERT? | Compares the performance of mBERT on high- and low-resource languages |
| Devlin et al., 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Introduces BERT as a state-of-the-art language model |
| Lin et al., 2019 | Choosing Transfer Languages for Cross-Lingual Learning | Explores typological and other considerations for choosing a higher-resource transfer language for a low-resource language model |
| Karthikeyan et al., 2019 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | Examines the effects of model depth and lexical overlap on mBERT's performance on Spanish, Hindi, and Russian |
| Wang et al., 2020 | Extending Multilingual BERT to Low-Resource Languages | Investigates methods for transferring mBERT to 16 languages outside the 104 used in mBERT pretraining |
| Pires et al., 2019 | How multilingual is Multilingual BERT? | Performs probing experiments on zero-shot transfer of fine-tuning in mBERT |
| Gerz et al., 2020 | On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling | Investigates the performance of language models across morphologically diverse languages |
| Lane & Bird, 2020 | Bootstrapping Techniques for Polysynthetic Morphological Analysis | Creates a morphological analyzer for polysynthetic languages and demonstrates its performance on Kunwinjku |
| Lane & Bird, 2020 | Interactive Word Completion for Morphologically Complex Languages | Develops a morphologically aware text input mechanism for polysynthetic languages |
| Bird, 2020 | Decolonising Speech and Language Technology | Investigates colonial ideologies in language technology and discusses postcolonial and cooperative approaches that prioritize agency for communities |
| Mosbach et al., 2020 | A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English | Investigates the grammatical and semantic knowledge of popular pre-trained models on relative clauses in English |
| Kanojia et al., 2020 | Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages | Investigates cross-lingual embeddings as a means of detecting cognates in 14 Indian languages |
| Ramachandran & de Melo, 2020 | Cross-Lingual Emotion Lexicon Induction using Representation Alignment in Low-Resource Settings | Derives and evaluates emotion lexicons in 350 low-resource languages using cross-lingually aligned word embeddings |
| Zhai et al., 2020 | Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models | Automatically detects non-literal translations in parallel corpora by fine-tuning a pre-trained cross-lingual language model |
| Saunack et al., 2020 | Analysing cross-lingual transfer in lemmatisation for Indian languages | Uses cross-lingual transfer learning for lemmatisation in 6 low-resource, morphologically rich Indian languages |
| Rama et al., 2020 | Probing Multilingual BERT for Genetic and Typological Signals | Investigates the importance of genetic, geographical, and typological information in mBERT's representations by automatically generating language trees and measuring words' diachronic meaning stability |
| Grießhaber et al., 2020 | Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning | Uses pool-based active learning to fine-tune BERT for low-resource downstream tasks |
| Liang et al., 2020 | Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations | Proposes a novel technique for identifying and reducing gender bias in BERT and tests it in monolingual and multilingual settings |
| Do et al., 2020 | To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? | Analyzes the transfer of knowledge between languages in a BERT-based multilingual model for spoken language |
| Lepori, 2020 | Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis | Investigates representational differences in word embeddings for evidence of intersectional biases toward Black women |
| Conia & Navigli, 2020 | Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations | Creates language-independent multilingual vector representations for semantic concepts, leveraged to improve performance on word similarity and word sense disambiguation tasks, particularly in low-resource languages |
| Lepori & McCoy, 2020 | Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis | Investigates the extent to which BERT embeddings represent syntactic dependencies |
