This repository contains scripts and helpers to study intertextual relations in the books of the Travelogues Corpus.
!!! Work in progress !!!
The tools folder contains scripts and helpers for studying intertextual relations between texts.
- text-reuse: contains code to detect verbatim text re-use
- spatial-focus: contains tools for geocoding and studying the spatial focus of texts
- topics: contains helpers for topic modelling
Detailed descriptions can be found in the README files in the specific subfolders.
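As a rough illustration of what verbatim text re-use detection can look like, the following minimal sketch compares two texts by their shared word n-grams. It is not the code in the text-reuse folder; the function names `word_ngrams` and `reuse_ratio`, the n-gram length, and the example sentences are all assumptions made for this example.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of word n-grams (shingles) in a text.

    Hypothetical helper: lowercases the text and splits on word characters.
    """
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_ratio(source, target, n=5):
    """Fraction of the target's n-grams that also occur verbatim in the source."""
    target_grams = word_ngrams(target, n)
    if not target_grams:
        # Target is shorter than n words, so there is nothing to compare.
        return 0.0
    shared = target_grams & word_ngrams(source, n)
    return len(shared) / len(target_grams)


if __name__ == "__main__":
    # Invented example sentences, not taken from the corpus.
    earlier = "We sailed from Venice in the spring and reached the coast after six days."
    later = "As reported, we sailed from Venice in the spring and rested at the coast."
    print(reuse_ratio(earlier, later, n=4))
```

Exact n-gram matching of this kind only catches literal copying; historical spelling variation or OCR noise would likely call for fuzzier matching.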
The sample-data folder contains useful test material, sampled from the corpus.
- random-samples: 18 random works sampled from the corpus
- two-related: two travelogues from the 17th and 18th century, respectively (AC08439291 and AC10001446), where the later work is known to refer to the earlier one
- three-17c: three selected travelogues from the 17th century (AC09750782, AC10232182, AC10307407)
- near-duplicates: two selected texts that are known to contain a high amount of text re-use (~20%)
Contains sample results.
- Plot similarity CSVs as networks
- Plot similarity CSVs as histograms
- Implement spatial similarity through a nearest-neighbour approach
- Draft a report/workshop paper