# Readings
WesScivetti edited this page Jan 5, 2021
| Home | Data | Readings |
|---|---|---|
Use this space for collecting papers related to the project. As you find papers, please add them to this list, and we will decide what we want to read and discuss together.
| PaperRef | Title and link | Notes: why interesting? |
|---|---|---|
| Ruder et al. 2019 | A Survey of Cross-Lingual Word Embedding Models | excellent survey of methods for learning cross-lingual WEs |
| Kelly et al. 2019 | Sigmorphon 2019 Task 2 system description paper: Morphological analysis in context for many languages, with supervision from only a few | Our system description paper for SIGMORPHON 2019 Shared Task 2 |
| Mikolov, 2013a | Efficient Estimation of Word Representations in Vector Space | Introduction of Word2Vec, a pair of early (and still relevant) word embedding algorithms |
| Mikolov, 2013b | Exploiting Similarities among Languages for Machine Translation | Introduction of translation matrices, enabling cross-lingual transfer of embeddings from one language to another |
| Arora, 2020 | iNLTK: Natural Language Toolkit for Indic Languages | Introduces iNLTK, a toolkit with pre-trained models for 13 major languages of India |
| Islam et al. 2017 | A Study on Various Applications of NLP Developed for North-East Languages | Gives an overview of previous work done in NE India on Electronic Dictionaries, POS Tagging, Machine Translation, WordNet, and Word Sense Disambiguation |
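
The translation-matrix idea from the Mikolov (2013b) entry above is compact enough to sketch: given a small seed dictionary of translation pairs, learn a linear map from source-language embeddings to the target space by least squares. This is a minimal numpy sketch; the random vectors stand in for real word embeddings, and all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5       # embedding dimensionality (toy-sized)
n_pairs = 50  # size of the seed translation dictionary

# Fake "source language" embeddings for the seed-dictionary words.
X = rng.normal(size=(n_pairs, dim))

# Pretend the true cross-lingual relation is linear plus a little noise.
W_true = rng.normal(size=(dim, dim))
Y = X @ W_true + 0.01 * rng.normal(size=(n_pairs, dim))

# Solve min_W ||X W - Y||^2 in closed form with least squares.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A held-out source vector is "translated" by multiplying with W_hat;
# its image should land near where the true map would put it.
x_new = rng.normal(size=(dim,))
error = float(np.linalg.norm(x_new @ W_hat - x_new @ W_true))
```

In the paper, the translated vector's nearest neighbours in the target embedding space are then retrieved as candidate translations.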

| PaperRef | Title and link | Notes |
|---|---|---|
| Agić & Vulić, 2019 | JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages | paper introducing the JW300 corpus |
| Sylak-Glassman, 2016 | The Composition and Use of the Universal Morphological Feature Schema (UniMorph Schema) | Introduces and describes the UniMorph schema |
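
As context for the Sylak-Glassman (2016) entry: UniMorph data files list one analysis per line as three tab-separated fields — lemma, inflected form, and a semicolon-joined bundle of features from the universal schema. A minimal sketch of reading such a line (the example word is made up):

```python
# One UniMorph analysis: lemma, inflected form, feature bundle.
# "V;PST" marks a past-tense verb in the UniMorph schema.
line = "run\tran\tV;PST"

lemma, form, features = line.split("\t")
feature_set = set(features.split(";"))
```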

| PaperRef | Title and link | Notes |
|---|---|---|
| Wu & Dredze, 2020 | Are All Languages Created Equal in Multilingual BERT? | Compares performance of mBERT on high and low resource languages |
| Devlin et al. 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Introduces BERT as a state-of-the-art language model |
| Lin et al. 2019 | Choosing Transfer Languages for Cross-Lingual Learning | Explores typological and other considerations for choosing a higher resource transfer language for a low resource language model |
| Karthikeyan et al. 2019 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | Examines effects of model depth and lexical overlap on the performance of mBERT on Spanish, Hindi, and Russian |
| Wang et al. 2020 | Extending Multilingual BERT to Low-Resource Languages | Investigates methods for transferring mBERT onto 16 languages outside of the 104 used in mBERT pretraining |
| Pires et al. 2019 | How multilingual is Multilingual BERT? | Performs probing experiments for zero-shot transfer of fine-tuning in mBERT |
| Gerz et al. 2020 | On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling | Investigates the performance of language models across morphologically diverse languages |
| Lane & Bird, 2020 | Bootstrapping Techniques for Polysynthetic Morphological Analysis | Create a morphological analyzer for polysynthetic languages and demonstrate its performance on Kunwinjku |
| Lane & Bird, 2020 | Interactive Word Completion for Morphologically Complex Languages | Develop a morphologically aware text input mechanism for polysynthetic languages |
| Bird, 2020 | Decolonising Speech and Language Technology | Investigates colonial ideologies in language technology and discusses postcolonial and cooperative approaches that prioritize agency for communities |
| Mosbach et al. 2020 | A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English | Investigates grammatical and semantic knowledge of popular pre-trained models on relative clauses in English |
| Kanojia et al. 2020 | Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages | Investigates cross-lingual embeddings as a means of detecting cognates in 14 Indian languages |
| Ramachandran & de Melo, 2020 | Cross-Lingual Emotion Lexicon Induction using Representation Alignment in Low-Resource Settings | Derives and evaluates emotion lexicons in 350 low-resource languages using cross-lingually aligned word embeddings |
| Zhai et al. 2020 | Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models | Automatically detects non-literal translations in parallel corpora by fine-tuning a pre-trained cross-lingual Language Model |
| Saunack et al. 2020 | Analysing cross-lingual transfer in lemmatisation for Indian languages | Uses cross-lingual transfer learning for lemmatisation in 6 low-resource and morphologically rich Indian languages |
| Rama et al. 2020 | Probing Multilingual BERT for Genetic and Typological Signals | Investigates the importance of genetic, geographical, and typological information in mBERT's representations by automatically generating language trees and measuring the diachronic stability of word meanings |
| Grießhaber et al. 2020 | Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning | Uses pool-based active learning to fine-tune BERT for low-resource downstream tasks |
| Liang et al. 2020 | Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations | Proposes a novel technique for identifying and reducing gender bias in BERT and tests this strategy in monolingual and multilingual contexts |
| Do et al. 2020 | To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? | Analyzes transfer of knowledge between languages in a BERT-based Multilingual model for spoken language |
| Lepori, 2020 | Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis | Investigates representational differences in word embeddings for evidence of intersectional biases toward Black women |
| Conia & Navigli, 2020 | Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations | Creates language-independent multilingual vector representations for semantic concepts, which are leveraged to improve performance on Word Similarity and Word Sense Disambiguation tasks, particularly in low-resource languages |
| Lepori and McCoy, 2020 | Picking BERT’s Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis | Investigates the extent to which BERT embeddings represent syntactic dependencies |
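
Several entries above (Lepori, 2020; Lepori & McCoy, 2020) rely on Representational Similarity Analysis (RSA): two representations are compared not vector-by-vector but via the similarity structure they induce over the same set of stimuli. A minimal numpy sketch of that core idea, with random vectors standing in for real model embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 20, 16

# Representation A, and a noisy copy B that shares its similarity structure.
emb_a = rng.normal(size=(n_words, dim))
emb_b = emb_a + 0.05 * rng.normal(size=(n_words, dim))

def similarity_matrix(emb):
    """Pairwise cosine similarities between all row vectors."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

def rsa_score(emb_x, emb_y):
    """Correlate the off-diagonal entries of the two similarity
    matrices -- a common RSA comparison statistic."""
    mask = ~np.eye(len(emb_x), dtype=bool)
    sims_x = similarity_matrix(emb_x)[mask]
    sims_y = similarity_matrix(emb_y)[mask]
    return float(np.corrcoef(sims_x, sims_y)[0, 1])

score = rsa_score(emb_a, emb_b)  # shared structure: close to 1
baseline = rsa_score(emb_a, rng.normal(size=(n_words, dim)))  # unrelated: near 0
```

Because only the similarity matrices are compared, the two representations may have different dimensionalities, which is what lets these papers compare, say, contextual embeddings against hand-built hypothesis models.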