WesScivetti edited this page Jan 5, 2021 · 22 revisions
Data Readings

Use this space for collecting papers related to the project. As you find papers, please add them to this list, and we will decide what we want to read and discuss together.

Readings

| Paper | Title and link | Notes: why interesting? |
|---|---|---|
| Ruder et al., 2019 | A Survey of Cross-Lingual Word Embedding Models | Excellent survey of methods for learning cross-lingual word embeddings |
| Kelly et al., 2019 | Morphological Analysis in Context for Many Languages, with Supervision from Only a Few | System description paper for our submission to SIGMORPHON 2019 Shared Task 2 |
| Mikolov et al., 2013a | Efficient Estimation of Word Representations in Vector Space | Introduces word2vec, a pair of early (and still relevant) word embedding algorithms |
| Mikolov et al., 2013b | Exploiting Similarities among Languages for Machine Translation | Introduces translation matrices, allowing cross-lingual transfer from one language to another |
| Arora, 2020 | iNLTK: Natural Language Toolkit for Indic Languages | Introduces iNLTK, a toolkit with pre-trained models for 13 major languages of India |
| Islam et al., 2017 | A Study on Various Applications of NLP Developed for North-East Languages | Surveys prior work in North-East India on electronic dictionaries, POS tagging, machine translation, WordNet, and word sense disambiguation |

References

| Paper | Title and link | Notes |
|---|---|---|
| Agić & Vulić, 2019 | JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages | Introduces the JW300 corpus |
| Sylak-Glassman, 2016 | The Composition and Use of the Universal Morphological Feature Schema (UniMorph Schema) | Describes the UniMorph schema |

Reading Group Ideas

| Paper | Title and link | Notes |
|---|---|---|
| Wu & Dredze, 2020 | Are All Languages Created Equal in Multilingual BERT? | Compares the performance of mBERT on high- and low-resource languages |
| Devlin et al., 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Introduces BERT as a state-of-the-art language model |
| Lin et al., 2019 | Choosing Transfer Languages for Cross-Lingual Learning | Explores typological and other considerations for choosing a higher-resource transfer language for a low-resource language model |
| Karthikeyan et al., 2019 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | Examines the effects of model depth and lexical overlap on mBERT's performance on Spanish, Hindi, and Russian |
| Wang et al., 2020 | Extending Multilingual BERT to Low-Resource Languages | Investigates methods for transferring mBERT to 16 languages outside the 104 used in mBERT pretraining |
| Pires et al., 2019 | How multilingual is Multilingual BERT? | Performs probing experiments on zero-shot transfer of fine-tuning in mBERT |
| Gerz et al., 2020 | On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling | Investigates the performance of language models across morphologically diverse languages |
| Lane & Bird, 2020 | Bootstrapping Techniques for Polysynthetic Morphological Analysis | Creates a morphological analyzer for polysynthetic languages and demonstrates its performance on Kunwinjku |
| Lane & Bird, 2020 | Interactive Word Completion for Morphologically Complex Languages | Develops a morphologically aware text input mechanism for polysynthetic languages |
| Bird, 2020 | Decolonising Speech and Language Technology | Investigates colonial ideologies in language technology and discusses postcolonial and cooperative approaches that prioritize agency for communities |
| Mosbach et al., 2020 | A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English | Investigates the grammatical and semantic knowledge of popular pre-trained models on relative clauses in English |
| Kanojia et al., 2020 | Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages | Investigates cross-lingual embeddings as a means of detecting cognates in 14 Indian languages |
| Ramachandran & de Melo, 2020 | Cross-Lingual Emotion Lexicon Induction using Representation Alignment in Low-Resource Settings | Derives and evaluates emotion lexicons in 350 low-resource languages using cross-lingually aligned word embeddings |
| Zhai et al., 2020 | Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models | Automatically detects non-literal translations in parallel corpora by fine-tuning a pre-trained cross-lingual language model |
| Saunack et al., 2020 | Analysing cross-lingual transfer in lemmatisation for Indian languages | Uses cross-lingual transfer learning for lemmatisation in 6 low-resource, morphologically rich Indian languages |
| Rama et al., 2020 | Probing Multilingual BERT for Genetic and Typological Signals | Investigates the importance of genetic, geographical, and typological information in mBERT's representations by automatically generating language trees and measuring words' diachronic meaning stability |
| Grießhaber et al., 2020 | Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning | Uses pool-based active learning to fine-tune BERT for low-resource downstream tasks |
| Liang et al., 2020 | Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations | Proposes a novel technique for identifying and reducing gender bias in BERT and tests it in monolingual and multilingual settings |
| Do et al., 2020 | To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? | Analyzes the transfer of knowledge between languages in a BERT-based multilingual model for spoken language |
| Lepori, 2020 | Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis | Investigates representational differences in word embeddings for evidence of intersectional biases toward Black women |
| Conia & Navigli, 2020 | Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations | Creates language-independent multilingual vector representations for semantic concepts, leveraged to improve performance on word similarity and word sense disambiguation tasks, particularly in low-resource languages |
| Lepori & McCoy, 2020 | Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis | Investigates the extent to which BERT embeddings represent syntactic dependencies |
