The script file "text_lexical_proportions_analysis.py" contains the main code used in the article "Vers une estimation robuste des proportions lexicales" (submitted to JADT 2020).
This code analyses a text file (by default, the Jules Verne novel "De la Terre à la Lune", available at http://www.gutenberg.org/files/799/799-0.txt) to compute lexical proportions for different word properties. It also requires a file of French stopwords (we use the list provided by Jacques Savoy: http://members.unine.ch/jacques.savoy/clef/frenchST.txt).
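As a rough illustration of how the two input files might be read, here is a minimal sketch. It is not taken from the script itself: the helper names `load_text` and `load_stopwords` are hypothetical, and both the UTF-8 encoding and the one-stopword-per-line layout are assumptions to check against your copies of the files.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    # Hypothetical helper: read the whole novel into one string.
    # The encoding is an assumption; change it if decoding fails.
    return Path(path).read_text(encoding=encoding)


def load_stopwords(path, encoding="utf-8"):
    # Hypothetical helper: assumes one stopword per line.
    # Blank lines are skipped and entries are lowercased.
    lines = Path(path).read_text(encoding=encoding).splitlines()
    return {line.strip().lower() for line in lines if line.strip()}
```

With both files saved locally, for instance as "799-0.txt" and "frenchST.txt", one would call `load_text("799-0.txt")` and `load_stopwords("frenchST.txt")`.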
The three lexical proportions studied in this code are:
- The proportion of stopwords.
- The proportion of words longer than the mean length.
- The proportion of words with an even length.
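The three proportions listed above can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the script or the article: the functions `tokenize` and `lexical_proportions` are hypothetical, words are taken to be lowercase runs of letters, and the mean length is computed over all word occurrences (tokens) rather than over distinct words.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, accents included.
    # Apostrophes and hyphens split words (an assumption).
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_proportions(tokens, stopwords):
    # Each proportion is a count of matching tokens divided by the
    # total number of tokens.
    if not tokens:
        raise ValueError("cannot compute proportions of an empty text")
    n = len(tokens)
    # Assumption: the mean length is taken over all token occurrences.
    mean_length = sum(len(t) for t in tokens) / n
    return {
        "stopwords": sum(t in stopwords for t in tokens) / n,
        "longer_than_mean": sum(len(t) > mean_length for t in tokens) / n,
        "even_length": sum(len(t) % 2 == 0 for t in tokens) / n,
    }


# Toy example with a made-up stopword set (not Savoy's list).
sample = "La Lune est loin de la Terre"
props = lexical_proportions(tokenize(sample), {"la", "est", "de"})
print(props)
```

On this toy sentence of seven tokens, four are stopwords, three are longer than the mean length of 22/7, and five have an even length.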
A link to the article will be added once it is available.