---
title: "Resources"
subtitle: "A curated guide to tools, corpora, courses, and communities for language technology and text analysis"
toc: true
toc-depth: 2
---
```{=html}
<style>
/* ── Resource cards (generic) ────────────────────────────────────── */
.resource-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
gap: 16px;
margin: 1.5rem 0;
}
.resource-card {
background: #fff;
border: 1px solid #e8e4f0;
border-top: 4px solid #51247A;
border-radius: 6px;
padding: 18px;
}
.resource-card.aqua { border-top-color: #00A2C7; }
.resource-card.magenta { border-top-color: #962A8B; }
.resource-card.blue { border-top-color: #4085C6; }
.resource-card.green { border-top-color: #2EA836; }
.resource-card h4 { margin: 0 0 6px 0; font-size: 0.95rem; font-weight: 700; color: #51247A; }
.resource-card h4 a { color: #51247A; text-decoration: none; }
.resource-card h4 a:hover { color: #00A2C7; }
.resource-card .rc-badge {
display: inline-block;
font-size: 0.7rem;
padding: 2px 8px;
border-radius: 20px;
font-weight: 600;
margin-bottom: 6px;
}
.rc-badge.free { background: #e6f7ea; color: #1a7a22; }
.rc-badge.paid { background: #fff0e6; color: #a04a00; }
.rc-badge.partial { background: #fef9e6; color: #8a6a00; }
.resource-card p { margin: 0 0 8px 0; font-size: 0.85rem; color: #444; line-height: 1.55; }
.resource-card ul { margin: 0; padding-left: 16px; }
.resource-card ul li { font-size: 0.82rem; color: #555; line-height: 1.5; margin-bottom: 2px; }
.resource-card .rc-link {
display: inline-block;
margin-top: 10px;
font-size: 0.78rem;
color: #51247A;
font-weight: 600;
text-decoration: none;
}
.resource-card .rc-link:hover { color: #00A2C7; }
/* ── Section intro strip ─────────────────────────────────────────── */
.section-intro {
background: #f7f5fb;
border-left: 4px solid #51247A;
border-radius: 4px;
padding: 14px 20px;
margin: 0 0 1.5rem 0;
font-size: 0.875rem;
color: #444;
line-height: 1.6;
}
.section-intro a { color: #51247A; font-weight: 600; }
/* ── Tool feature list ───────────────────────────────────────────── */
.tool-detail {
background: #fff;
border: 1px solid #e8e4f0;
border-radius: 8px;
overflow: hidden;
margin-bottom: 16px;
}
.tool-detail-header {
background: #51247A;
padding: 16px 22px;
display: flex;
align-items: center;
gap: 14px;
}
.tool-detail-header.aqua { background: #007a9a; }
.tool-detail-header.magenta { background: #7a2272; }
.tool-detail-header h4 { margin: 0; color: white; font-size: 1rem; font-weight: 700; }
.tool-detail-header .tl-sub { color: rgba(255,255,255,0.75); font-size: 0.8rem; margin: 0; }
.tool-detail-header .tl-badge {
margin-left: auto;
font-size: 0.72rem;
padding: 3px 10px;
border-radius: 20px;
font-weight: 600;
white-space: nowrap;
flex-shrink: 0;
}
.tl-badge.free { background: #2EA836; color: white; }
.tl-badge.paid { background: #EB602B; color: white; }
.tl-badge.partial { background: #FBB800; color: #333; }
.tool-detail-body { padding: 18px 22px; }
.tool-detail-body p { font-size: 0.875rem; color: #444; line-height: 1.6; margin: 0 0 10px 0; }
.tool-detail-body ul { margin: 0 0 10px 0; padding-left: 18px; }
.tool-detail-body ul li { font-size: 0.875rem; color: #444; line-height: 1.6; margin-bottom: 3px; }
.tool-tags { display: flex; flex-wrap: wrap; gap: 6px; margin-top: 10px; }
.tool-tag {
background: #f0eaf7; color: #51247A;
font-size: 0.72rem; padding: 2px 9px;
border-radius: 20px; font-weight: 500;
}
.tool-link {
display: inline-block; margin-top: 10px;
background: #51247A; color: white !important;
font-size: 0.78rem; padding: 5px 14px;
border-radius: 4px; text-decoration: none !important;
font-weight: 500;
}
.tool-link:hover { background: #00A2C7; }
/* ── Learning path ───────────────────────────────────────────────── */
.learning-path {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 0;
margin: 1.5rem 0;
border: 1px solid #e8e4f0;
border-radius: 8px;
overflow: hidden;
}
.lp-step {
padding: 20px;
border-right: 1px solid #e8e4f0;
position: relative;
}
.lp-step:last-child { border-right: none; }
.lp-step-num {
background: #51247A; color: white;
font-size: 0.75rem; font-weight: 700;
width: 26px; height: 26px; border-radius: 50%;
display: flex; align-items: center; justify-content: center;
margin-bottom: 10px;
}
.lp-step h4 { margin: 0 0 6px 0; font-size: 0.875rem; color: #51247A; font-weight: 700; }
.lp-step p { margin: 0; font-size: 0.8rem; color: #555; line-height: 1.5; }
.lp-step ul { margin: 6px 0 0 0; padding-left: 16px; }
.lp-step ul li { font-size: 0.8rem; color: #555; line-height: 1.5; margin-bottom: 2px; }
/* ── Corpus table ────────────────────────────────────────────────── */
.corpus-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
gap: 12px;
margin: 1.5rem 0;
}
.corpus-card {
background: #fff;
border: 1px solid #e8e4f0;
border-radius: 6px;
padding: 14px 16px;
}
.corpus-card h4 { margin: 0 0 4px 0; font-size: 0.875rem; font-weight: 700; color: #51247A; }
.corpus-card h4 a { color: #51247A; text-decoration: none; }
.corpus-card h4 a:hover { color: #00A2C7; }
.corpus-card .c-meta { font-size: 0.78rem; color: #888; margin: 0 0 6px 0; }
.corpus-card p { margin: 0; font-size: 0.8rem; color: #555; line-height: 1.5; }
/* ── Compact link list ───────────────────────────────────────────── */
.link-list {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
gap: 10px;
margin: 1.5rem 0;
}
.link-item {
background: #f7f5fb;
border-radius: 4px;
padding: 10px 14px;
display: flex;
align-items: flex-start;
gap: 10px;
}
.link-item .li-icon { font-size: 1rem; flex-shrink: 0; margin-top: 1px; }
.link-item a { font-weight: 600; font-size: 0.875rem; color: #51247A; text-decoration: none; display: block; }
.link-item a:hover { color: #00A2C7; }
.link-item p { margin: 2px 0 0 0; font-size: 0.78rem; color: #666; line-height: 1.4; }
/* ── CTA banner ──────────────────────────────────────────────────── */
.cta-banner {
background: linear-gradient(135deg, #51247A 0%, #3d1a5e 100%);
color: white; border-radius: 8px;
padding: 32px 40px;
display: flex; align-items: center;
justify-content: space-between; flex-wrap: wrap; gap: 20px;
margin: 2rem 0;
}
.cta-banner h3 { margin: 0 0 6px 0; color: white; font-size: 1.2rem; }
.cta-banner p { margin: 0; opacity: 0.85; font-size: 0.875rem; }
.cta-actions { display: flex; gap: 10px; flex-wrap: wrap; }
.btn-aqua {
background: #00A2C7; color: white !important;
padding: 10px 20px; border-radius: 4px; font-weight: 600;
font-size: 0.875rem; text-decoration: none !important; white-space: nowrap;
}
.btn-aqua:hover { background: #008faf; }
.btn-ghost-white {
background: transparent; color: white !important;
padding: 10px 20px; border-radius: 4px; font-weight: 600;
font-size: 0.875rem; text-decoration: none !important;
border: 2px solid rgba(255,255,255,0.5); white-space: nowrap;
}
.btn-ghost-white:hover { border-color: white; background: rgba(255,255,255,0.1); }
@media (max-width: 640px) {
.cta-banner { flex-direction: column; }
.learning-path { grid-template-columns: 1fr; }
.lp-step { border-right: none; border-bottom: 1px solid #e8e4f0; }
.lp-step:last-child { border-bottom: none; }
}
</style>
```
This curated collection brings together the best resources for language technology, text analytics, corpus linguistics, natural language processing, and computational methods in the humanities and social sciences. Whether you're a complete beginner or an experienced researcher, you'll find tools, tutorials, datasets, and communities to support your work.
---
## LDaCA: LADAL's Home Platform {#ldaca}
LADAL is part of the [**Language Data Commons of Australia (LDaCA)**](https://ldaca.edu.au/), a national research infrastructure providing access to language data and text analysis tools for Australian researchers. LDaCA emerged from the merger of the Australian Text Analytics Platform (ATAP) and PARADISEC, bringing together text analytics infrastructure and endangered language archives.
```{=html}
<div class="resource-grid">
<div class="resource-card">
<h4>🗂️ Data Discovery</h4>
<p>Search and access diverse language datasets including Australian English, Indigenous languages, migrant languages, oral history collections, and social media data.</p>
</div>
<div class="resource-card aqua">
<h4>📓 Jupyter Notebooks</h4>
<p>Browser-based interactive coding environment — no installation needed. Ready-to-use text analysis capabilities for researchers without strong coding backgrounds.</p>
</div>
<div class="resource-card magenta">
<h4>🎓 Training & Support</h4>
<p>Workshops, tutorials, documentation, and community support for language data researchers nationally. Free for Australian researchers.</p>
</div>
</div>
<p style="font-size:0.875rem; color:#555; margin-top:-0.5rem;">
Visit <a href="https://ldaca.edu.au/" style="color:#51247A; font-weight:600;">ldaca.edu.au</a> to explore data collections, create a free account, and access tools and training.
</p>
```
---
## Concordancing and Corpus Tools {#concordance-tools}
```{=html}
<div class="tool-detail">
<div class="tool-detail-header">
<div>
<h4>AntConc and AntLab Suite</h4>
<p class="tl-sub">Laurence Anthony · Waseda University</p>
</div>
<span class="tl-badge free">Free</span>
</div>
<div class="tool-detail-body">
<p><a href="https://www.laurenceanthony.net/software/antconc/" style="color:#51247A;font-weight:600;">AntConc</a> is one of the most widely used free concordancing tools. It is cross-platform (Windows, macOS, Linux) and well suited to teaching and research.</p>
<p><strong>Core features:</strong> KWIC concordance · Collocates · Word clusters (n-grams) · Keyword analysis · Dispersion plots</p>
<p><strong>Other AntLab tools:</strong></p>
<ul>
<li><strong>AntFileConverter</strong> — convert PDF/Word/Excel to plain text</li>
<li><strong>AntPConc</strong> — parallel concordancer for translation studies</li>
<li><strong>AntWordProfiler</strong> — vocabulary profiling and analysis</li>
<li><strong>AntGram</strong> — n-gram and word frequency analysis</li>
<li><strong>FireAnt</strong> — download and organise online texts</li>
<li><strong>EncodeAnt</strong>, <strong>VariAnt</strong>, <strong>AntFileSplitter</strong>, <strong>AntMover</strong></li>
</ul>
<div class="tool-tags">
<span class="tool-tag">Concordancing</span>
<span class="tool-tag">Collocations</span>
<span class="tool-tag">Keywords</span>
<span class="tool-tag">Beginner-friendly</span>
</div>
<a href="https://www.laurenceanthony.net/software/antconc/" class="tool-link">Download AntConc →</a>
</div>
</div>
<div class="tool-detail">
<div class="tool-detail-header aqua">
<div>
<h4>#LancsBox</h4>
<p class="tl-sub">Lancaster University</p>
</div>
<span class="tl-badge free">Free</span>
</div>
<div class="tool-detail-body">
<p><a href="http://corpora.lancs.ac.uk/lancsbox/" style="color:#51247A;font-weight:600;">#LancsBox</a> is a next-generation corpus toolkit combining ease-of-use with advanced functionality and beautiful visualisations.</p>
<ul>
<li><strong>GraphColl</strong> — visualise collocational networks</li>
<li><strong>Whelk</strong> — distribution of a search term across corpus files</li>
<li><strong>Wizard</strong> — guided analysis for beginners</li>
<li>Built-in statistical tests · Multi-language support</li>
</ul>
<div class="tool-tags">
<span class="tool-tag">Network visualisation</span>
<span class="tool-tag">Collocations</span>
<span class="tool-tag">Regex</span>
</div>
<a href="http://corpora.lancs.ac.uk/lancsbox/" class="tool-link">Visit #LancsBox →</a>
</div>
</div>
<div class="tool-detail">
<div class="tool-detail-header magenta">
<div>
<h4>Sketch Engine</h4>
<p class="tl-sub">Lexical Computing</p>
</div>
<span class="tl-badge partial">Free trial</span>
</div>
<div class="tool-detail-body">
<p><a href="https://www.sketchengine.eu/" style="color:#51247A;font-weight:600;">Sketch Engine</a> is a comprehensive commercial platform with 90+ pre-loaded corpora and web access.</p>
<ul>
<li><strong>Word Sketches</strong> — grammatical/collocational summaries at a glance</li>
<li>Corpus building, terminology extraction, parallel corpora</li>
<li>Team collaboration and sharing features</li>
<li>Best for multilingual research and large-scale projects</li>
</ul>
<div class="tool-tags">
<span class="tool-tag">Multilingual</span>
<span class="tool-tag">Word sketches</span>
<span class="tool-tag">Terminology</span>
<span class="tool-tag">Translation</span>
</div>
<a href="https://www.sketchengine.eu/" class="tool-link">Visit Sketch Engine →</a>
</div>
</div>
<div class="tool-detail">
<div class="tool-detail-header">
<div>
<h4>WordSmith Tools</h4>
<p class="tl-sub">Mike Scott · Lexically.net</p>
</div>
<span class="tl-badge paid">Paid licence</span>
</div>
<div class="tool-detail-body">
<p><a href="https://lexically.net/wordsmith/" style="color:#51247A;font-weight:600;">WordSmith Tools</a> is the professional standard for corpus analysis, trusted by researchers since 1996. Windows only (runs via Wine on Mac).</p>
<ul>
<li><strong>Concord</strong> — concordancing with sophisticated search</li>
<li><strong>KeyWords</strong> — statistical keyword extraction</li>
<li><strong>WordList</strong> — frequency lists and statistics</li>
<li>Dispersion plots · Collocate analysis · Batch processing</li>
</ul>
<div class="tool-tags">
<span class="tool-tag">Professional</span>
<span class="tool-tag">Keywords</span>
<span class="tool-tag">Statistics</span>
<span class="tool-tag">Windows</span>
</div>
<a href="https://lexically.net/wordsmith/" class="tool-link">Visit WordSmith →</a>
</div>
</div>
```
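The KWIC (keyword-in-context) display that all of these tools provide can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not how any of the tools above is implemented: the function name `kwic`, the `width` parameter, and the sample text are assumptions made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of `keyword`."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a central column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this example.
sample = (
    "The corpus contains spoken and written texts. "
    "Each corpus file is tagged, and the corpus is freely available."
)

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Each match is printed on its own line with the keyword in a fixed column, which is what makes recurring patterns to the left and right of the search term easy to scan. Dedicated concordancers add sorting, filtering, and statistics on top of this basic view.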
### Online Concordancers {#online-concordancers}
No installation needed — use these directly in your browser.
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free with registration</span>
<h4><a href="https://www.english-corpora.org/">English-Corpora.org (formerly BYU Corpora)</a></h4>
<p>COCA (1 billion words, 1990–2019), COHA (400M words, 1820s–2010s), NOW Corpus, TV/Movie/Wikipedia corpora. Genre and time filtering.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://www.lextutor.ca/">Lextutor</a></h4>
<p>Web-based concordancers, vocabulary profilers, and multiple corpora. Excellent for language learning and teaching.</p>
</div>
</div>
```
---
## Text Analysis and NLP Tools {#nlp-tools}
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://voyant-tools.org/">Voyant Tools</a></h4>
<p>Zero installation — works in browser. Upload texts instantly for word clouds, trend graphs, network visualisations, and more. Perfect for digital humanities and teaching.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Browser-based</span>
<span class="tool-tag">Visualisation</span>
<span class="tool-tag">Beginner</span>
</div>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://orangedatamining.com/">Orange Data Mining</a></h4>
<p>Visual drag-and-drop tool for text analytics and machine learning — no coding required. Topic modelling (LDA), sentiment analysis, document clustering, word clouds.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">No-code</span>
<span class="tool-tag">Topic modelling</span>
<span class="tool-tag">ML</span>
</div>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>
<h4><a href="https://gate.ac.uk/">GATE</a></h4>
<p>Open-source platform for large-scale NLP pipelines: information extraction, named entity recognition, relation extraction, semantic annotation, language identification.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">NLP pipelines</span>
<span class="tool-tag">NER</span>
<span class="tool-tag">Annotation</span>
</div>
</div>
<div class="resource-card blue">
<span class="rc-badge free">Free</span>
<h4><a href="https://spacy.io/">spaCy</a></h4>
<p>Industrial-strength Python NLP library. Tokenisation, POS tagging, NER, dependency parsing, word vectors. Support for 60+ languages, with pre-trained pipelines for a subset of them. Fast and production-ready.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Python</span>
<span class="tool-tag">60+ languages</span>
<span class="tool-tag">Production</span>
</div>
</div>
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://www.nltk.org/">NLTK</a></h4>
<p>Python's foundational NLP learning platform. Comprehensive tutorials, many datasets included, wide range of algorithms. Ideal for learning NLP from scratch.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Python</span>
<span class="tool-tag">Educational</span>
<span class="tool-tag">Beginner</span>
</div>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://stanfordnlp.github.io/CoreNLP/">Stanford CoreNLP</a></h4>
<p>State-of-the-art NLP suite: tokenisation, POS, NER, parsing, coreference resolution, sentiment. Accessible via online demo, command line, Java, Python (stanza), or R.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Multi-language access</span>
<span class="tool-tag">Coreference</span>
<span class="tool-tag">Parsing</span>
</div>
</div>
</div>
```
### Specialised Tools {#specialized-tools}
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://github.com/booknlp/booknlp">BookNLP</a></h4>
<p>NLP pipeline designed specifically for books and long documents. Character name clustering, speaker identification, referential gender inference, event tagging. GPU and CPU models available.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Literary analysis</span>
<span class="tool-tag">Character analysis</span>
<span class="tool-tag">Python</span>
</div>
</div>
<div class="resource-card aqua">
<span class="rc-badge partial">Web demo free</span>
<h4><a href="http://ucrel.lancs.ac.uk/claws/">CLAWS POS Tagger</a></h4>
<p>Lancaster's world-leading POS tagger — 96–97% accuracy. Tagged the British National Corpus. Web demo and batch processing available. Multiple tagsets.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">POS tagging</span>
<span class="tool-tag">High accuracy</span>
</div>
</div>
<div class="resource-card magenta">
<span class="rc-badge partial">Academic free</span>
<h4><a href="http://ucrel.lancs.ac.uk/usas/">USAS Semantic Tagger</a></h4>
<p>UCREL Semantic Analysis System — automatic semantic tagging across 21 major discourse fields, multi-word expression recognition, multiple languages. See also <a href="http://ucrel.lancs.ac.uk/wmatrix/">Wmatrix</a> for corpus comparison.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Semantic tagging</span>
<span class="tool-tag">Discourse</span>
</div>
</div>
<div class="resource-card blue">
<span class="rc-badge free">Free</span>
<h4><a href="http://smartool.github.io/smartool-rus-eng/">SMARTool</a></h4>
<p>Corpus-based tool for English-speaking learners of Russian. Handles rich Russian morphology, 3,000 basic vocabulary items, frequency-based learning.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Russian</span>
<span class="tool-tag">Language learning</span>
</div>
</div>
</div>
```
---
## Learning Resources and Courses {#courses}
```{=html}
<div class="tool-detail">
<div class="tool-detail-header aqua">
<div>
<h4>Applied Language Technology</h4>
<p class="tl-sub">University of Helsinki · Free · Self-paced</p>
</div>
<span class="tl-badge free">Free</span>
</div>
<div class="tool-detail-body">
<p><a href="https://applied-language-technology.mooc.fi/html/index.html" style="color:#51247A;font-weight:600;">Two courses</a> designed for linguists and humanists, with Jupyter notebooks and hands-on exercises. No prior programming knowledge needed.</p>
<ul>
<li><strong>Working with Text in Python</strong> — Python basics, regular expressions, text processing</li>
<li><strong>NLP for Linguists</strong> — NLP fundamentals, machine learning basics, deep learning for NLP</li>
</ul>
<a href="https://applied-language-technology.mooc.fi/html/index.html" class="tool-link">Start Learning →</a>
</div>
</div>
<div class="tool-detail">
<div class="tool-detail-header">
<div>
<h4>Introduction to Cultural Analytics & Python</h4>
<p class="tl-sub">Melanie Walsh · Free · Online textbook</p>
</div>
<span class="tl-badge free">Free</span>
</div>
<div class="tool-detail-body">
<p><a href="https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html" style="color:#51247A;font-weight:600;">Outstanding textbook</a> written specifically for humanities and social science scholars. Clear explanations, engaging datasets, continuously updated.</p>
<ul>
<li>Python basics · Text analysis and NLP · Social media analysis</li>
<li>Network analysis · Mapping · Web scraping · Data visualisation</li>
</ul>
<a href="https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html" class="tool-link">Read the Textbook →</a>
</div>
</div>
<div class="tool-detail">
<div class="tool-detail-header magenta">
<div>
<h4>The Programming Historian</h4>
<p class="tl-sub">Peer-reviewed · Available in EN, ES, FR, PT</p>
</div>
<span class="tl-badge free">Free</span>
</div>
<div class="tool-detail-body">
<p><a href="https://programminghistorian.org/en/lessons/" style="color:#51247A;font-weight:600;">Peer-reviewed, collaborative</a> lessons for digital humanists in Python, R, JavaScript, and more. Covers data management, distant reading, network analysis, mapping, GIS, web scraping, and visualisation.</p>
<a href="https://programminghistorian.org/en/lessons/" class="tool-link">Browse Lessons →</a>
</div>
</div>
```
### R and Statistics {#r-statistics}
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://r4ds.had.co.nz/">R for Data Science</a></h4>
<p>By Hadley Wickham & Garrett Grolemund. Modern data science workflow using the tidyverse ecosystem. The standard starting point for R learners.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://www.tidytextmining.com/">Text Mining with R</a></h4>
<p>By Julia Silge & David Robinson. Tidy approach to text analysis with practical, reproducible examples throughout.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>
<h4><a href="https://tutorials.quanteda.io/">Quanteda Tutorials</a></h4>
<p>Official quanteda documentation with comprehensive corpus analysis guides — our recommended R framework for text analysis.</p>
</div>
<div class="resource-card blue">
<span class="rc-badge free">Free</span>
<h4><a href="https://adv-r.hadley.nz/">Advanced R</a></h4>
<p>By Hadley Wickham. Deep dive into R programming for experienced users wanting to understand the language fully.</p>
</div>
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="http://www.sthda.com/english/">STHDA</a></h4>
<p>Comprehensive R tutorials for statistical methods, data visualisation with ggplot2, and machine learning.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://www.statmethods.net/">Quick-R</a></h4>
<p>Quick-reference R code snippets for data management, statistics, and visualisation by Rob Kabacoff.</p>
</div>
</div>
```
### Specialist Training Platforms {#specialist-training}
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://glam-workbench.net/">GLAM Workbench</a></h4>
<p>Tools and tutorials for working with data from galleries, libraries, archives, and museums in Australia and New Zealand. Jupyter notebooks — click and run, no installation needed.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="http://tapor.ca/home">TAPoR 3</a></h4>
<p>Curated directory of 1,500+ text analysis research tools with descriptions, reviews, and comparison features. Search by analysis type, platform, language, cost, or discipline.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge paid">Paid</span>
<h4><a href="http://ucrel.lancs.ac.uk/summerschool/">Lancaster Summer Schools</a></h4>
<p>Annual intensive corpus linguistics courses, beginner to advanced, with expert instruction and hands-on training at Lancaster University.</p>
</div>
</div>
```
---
## Research Centres and Labs {#centers}
```{=html}
<div class="resource-grid">
<div class="resource-card">
<h4><a href="http://ucrel.lancs.ac.uk/">UCREL, Lancaster University</a></h4>
<p>World-leading corpus linguistics research centre. Developers of CLAWS, USAS, and Wmatrix. Home of the British National Corpus and annual summer schools.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Corpus linguistics</span>
<span class="tool-tag">NLP tools</span>
</div>
</div>
<div class="resource-card aqua">
<h4><a href="https://www.helsinki.fi/en/researchgroups/varieng">VARIENG, University of Helsinki</a></h4>
<p>Research Unit for Variation, Contacts and Change in English. Home of the Helsinki Corpus, Corpus of Early English Correspondence, and multiple parsed corpora.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Language change</span>
<span class="tool-tag">Historical corpora</span>
</div>
</div>
<div class="resource-card magenta">
<h4><a href="https://sydneycorpuslab.com/">Sydney Corpus Lab</a></h4>
<p>Promotes corpus linguistics across Australia — workshops, training, research collaboration, and community building for corpus linguists.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Australia</span>
<span class="tool-tag">Community</span>
</div>
</div>
<div class="resource-card blue">
<h4><a href="https://www.cl.uzh.ch/en/TCC.html">Text Crunching Centre, University of Zurich</a></h4>
<p>NLP expertise as a service: consulting, custom tool development, text processing pipelines, sentiment analysis, topic modelling, and named entity recognition.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">NLP consulting</span>
<span class="tool-tag">Custom pipelines</span>
</div>
</div>
<div class="resource-card">
<h4><a href="https://site.uit.no/acqvalab">AcqVA Aurora Lab, UiT</a></h4>
<p>Research in language acquisition, variation, and attrition. Offers methodological consultation, data collection facilities, and analysis support.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Acquisition</span>
<span class="tool-tag">Psycholinguistics</span>
</div>
</div>
<div class="resource-card aqua">
<h4><a href="https://litlab.stanford.edu/">Stanford Literary Lab</a></h4>
<p>Computational literary studies using quantitative methods. Research publications and innovative approaches to large-scale literary analysis.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Digital humanities</span>
<span class="tool-tag">Literary studies</span>
</div>
</div>
<div class="resource-card magenta">
<h4><a href="https://leibniz-hbi.de/en/research/research-programmes/media-research-methods-lab">Media Research Methods Lab, HBI</a></h4>
<p>Computational social science, social media analysis, automated content analysis, and network analysis at Leibniz Institute for Media Research.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Social media</span>
<span class="tool-tag">Computational SS</span>
</div>
</div>
<div class="resource-card blue">
<h4><a href="http://www.nactem.ac.uk/">NaCTeM</a></h4>
<p>The UK's first publicly funded text mining centre. Software tools, training materials, and services in literature-based discovery and biomedical text mining.</p>
<div class="tool-tags" style="margin-top:8px">
<span class="tool-tag">Text mining</span>
<span class="tool-tag">Biomedical NLP</span>
</div>
</div>
</div>
```
---
## Corpora and Datasets {#corpora}
### Major English Corpora
```{=html}
<div class="corpus-grid">
<div class="corpus-card">
<h4><a href="http://www.natcorp.ox.ac.uk/">British National Corpus (BNC)</a></h4>
<div class="c-meta">100M words · 1980s–90s · Written & spoken · POS tagged</div>
<p>Gold-standard reference corpus for British English. The BNC2014 (100M words, 2010s) is available for modern comparisons.</p>
</div>
<div class="corpus-card">
<h4><a href="https://www.english-corpora.org/coca/">COCA</a></h4>
<div class="c-meta">1 billion words · 1990–2019 · Free with registration</div>
<p>Corpus of Contemporary American English. Balanced across spoken, fiction, magazines, newspapers, and academic genres.</p>
</div>
<div class="corpus-card">
<h4><a href="https://www.ice-corpora.uzh.ch/en.html">International Corpus of English (ICE)</a></h4>
<div class="c-meta">1M words per variety · 20+ national varieties</div>
<p>Comparable corpora across national varieties including GB, USA, Ireland, Canada, India, Hong Kong, and more. Shared corpus design across varieties, grammatically annotated.</p>
</div>
<div class="corpus-card">
<h4><a href="https://books.google.com/ngrams">Google Books Ngram Corpus</a></h4>
<div class="c-meta">Trillions of words · 1500–2019 · Multiple languages</div>
<p>Phrase frequency over time across multiple languages. Excellent for diachronic studies. Dataset downloadable for offline analysis.</p>
</div>
</div>
```
### Historical Corpora
```{=html}
<div class="corpus-grid">
<div class="corpus-card">
<h4><a href="https://www.english-corpora.org/coha/">COHA</a></h4>
<div class="c-meta">400M words · 1820s–2000s · Balanced by decade</div>
<p>Corpus of Historical American English. Fiction, magazines, newspapers, and non-fiction for tracking language change over two centuries.</p>
</div>
<div class="corpus-card">
<h4><a href="https://www.textcreationpartnership.org/tcp-eebo/">Early English Books Online (EEBO)</a></h4>
<div class="c-meta">1473–1700 · 25,000+ texts</div>
<p>Covers the beginnings of English printing. Critical for historical linguistics and early modern English research.</p>
</div>
<div class="corpus-card">
<h4>Corpus of English Dialogues (CED)</h4>
<div class="c-meta">1560–1760 · Uppsala University and Lancaster University</div>
<p>Trial proceedings, drama, and didactic works representing real and simulated conversation in Early Modern English.</p>
</div>
</div>
```
### Specialised Corpora
```{=html}
<div class="corpus-grid">
<div class="corpus-card">
<h4><a href="https://quod.lib.umich.edu/m/micase/">MICASE</a></h4>
<div class="c-meta">1.8M words · Academic spoken English</div>
<p>Michigan Corpus of Academic Spoken English. Lectures, seminars, and study groups. Searchable online.</p>
</div>
<div class="corpus-card">
<h4><a href="https://www.univie.ac.at/voice/">VOICE</a></h4>
<div class="c-meta">1M words · 50+ L1 backgrounds</div>
<p>Vienna-Oxford International Corpus of English. Face-to-face interaction in English as a Lingua Franca.</p>
</div>
<div class="corpus-card">
<h4><a href="https://childes.talkbank.org/">CHILDES</a></h4>
<div class="c-meta">50+ languages · Longitudinal</div>
<p>Child Language Data Exchange System. CHAT transcription standard and CLAN analysis tools for language acquisition research.</p>
</div>
<div class="corpus-card">
<h4><a href="https://corpus.mml.cam.ac.uk/efcamdat/">EFCAMDAT</a></h4>
<div class="c-meta">70M words · 150+ countries</div>
<p>EF-Cambridge Open Language Database. Large-scale learner corpus for L2 English research.</p>
</div>
</div>
```
### Multilingual and Parallel Corpora
```{=html}
<div class="corpus-grid">
<div class="corpus-card">
<h4><a href="https://universaldependencies.org/">Universal Dependencies</a></h4>
<div class="c-meta">100+ languages · Open source</div>
<p>Syntactically annotated treebanks with cross-linguistically consistent annotation. Regular releases.</p>
</div>
<div class="corpus-card">
<h4><a href="https://corpora.uni-leipzig.de/en">Leipzig Corpora Collection</a></h4>
<div class="c-meta">136 languages · Web-crawled</div>
<p>Freely downloadable web-crawled text corpora of various sizes in 136 languages.</p>
</div>
<div class="corpus-card">
<h4><a href="https://opus.nlpl.eu/">OPUS</a></h4>
<div class="c-meta">90+ languages · Free download</div>
<p>Open parallel corpus collection: movie subtitles (OpenSubtitles), Bible translations, EU documents, and more.</p>
</div>
<div class="corpus-card">
<h4><a href="https://www.statmt.org/europarl/">Europarl</a></h4>
<div class="c-meta">21 languages · Sentence-aligned</div>
<p>European Parliament proceedings — large-scale, sentence-aligned parallel corpus for translation research and MT.</p>
</div>
</div>
```
---
## Blogs and Communities {#blogs}
```{=html}
<div class="link-list">
<div class="link-item">
<div class="li-icon">📝</div>
<div>
<a href="https://corpling.hypotheses.org/">Around the Word (Hypotheses)</a>
<p>Guillaume Desagulier's corpus linguistics notebook — usage-based methods, R for corpus analysis, cognitive linguistics, construction grammar.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">📝</div>
<div>
<a href="https://linguisticswithacorpus.wordpress.com/">Linguistics with a Corpus</a>
<p>Companion to <em>Doing Linguistics with a Corpus</em> by Egbert, Larsson & Biber. Corpus methodology, research design, statistical issues, methodological debates.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">📧</div>
<div>
<a href="https://www.hit.uib.no/corpora/">Corpora Mailing List</a>
<p>Long-running discussion list for the corpus linguistics community — tool announcements, conference info, job postings, corpus releases.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">📝</div>
<div>
<a href="https://aneesha.medium.com/">Aneesha Bakharia (Medium)</a>
<p>Data science, topic modelling, deep learning, and learning analytics with practical tutorials.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">🔬</div>
<div>
<a href="https://research.qut.edu.au/digitalobservatory/">Digital Observatory, QUT</a>
<p>Social media data collection, workshops, tool updates, open office hours, and research methods from QUT's Digital Observatory.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">📊</div>
<div>
<a href="https://simplystatistics.org/">Simply Statistics</a>
<p>Thoughtful commentary on statistics and data science from Jeff Leek, Roger Peng, and Rafa Irizarry.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">🌐</div>
<div>
<a href="https://digitalhumanitiesnow.org/">Digital Humanities Now</a>
<p>News aggregator for the DH community — project highlights, announcements, and curated content.</p>
</div>
</div>
<div class="link-item">
<div class="li-icon">📧</div>
<div>
<a href="https://dhhumanist.org/">Humanist Discussion Group</a>
<p>Long-running digital humanities mailing list (since 1987). Thoughtful, sustained discussions on DH theory and practice.</p>
</div>
</div>
</div>
```
---
## Additional Tools {#additional}
### Visualisation
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://ggplot2.tidyverse.org/">ggplot2 (R)</a></h4>
<p>Grammar of Graphics implementation for R. Publication-quality, highly customisable plots. The standard for R visualisation.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://gephi.org/">Gephi</a></h4>
<p>Interactive network visualisation for large graphs. Open source, widely used for social network and co-occurrence analysis.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>
<h4><a href="https://plotly.com/r/">Plotly (R & Python)</a></h4>
<p>Interactive, web-ready graphs in both R and Python. Excellent for sharing interactive visualisations online.</p>
</div>
<div class="resource-card blue">
<span class="rc-badge free">Free</span>
<h4><a href="https://shiny.rstudio.com/">Shiny (R)</a></h4>
<p>Build interactive web apps from R — no web development skills needed. Great for sharing research tools.</p>
</div>
</div>
```
### Annotation Tools
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://webanno.github.io/webanno/">WebAnno</a></h4>
<p>Web-based multi-user annotation with inter-annotator agreement metrics. Supports many annotation types.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://inception-project.github.io/">INCEpTION</a></h4>
<p>Semantic annotation with knowledge base integration and active learning recommendations. Open source.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>
<h4><a href="https://brat.nlplab.org/">brat</a></h4>
<p>Browser-based linguistic annotation for entities, relationships, and coreference chains.</p>
</div>
<div class="resource-card blue">
<span class="rc-badge paid">Paid (edu discount)</span>
<h4><a href="https://prodi.gy/">Prodigy</a></h4>
<p>Modern annotation tool with active learning for rapid, efficient data labelling. From the makers of spaCy.</p>
</div>
</div>
```
### Data Management
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://osf.io/">Open Science Framework (OSF)</a></h4>
<p>Project management, preregistration, DOI minting, and version control for open science workflows.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://zenodo.org/">Zenodo</a></h4>
<p>Long-term data preservation with DOI minting. GitHub integration. Free storage for datasets and code.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>
<h4><a href="https://github.com/">GitHub</a></h4>
<p>Version control, collaboration, and open science. Host code, data, and project websites. Essential for reproducible research.</p>
</div>
</div>
```
### Machine Learning and Deep Learning
```{=html}
<div class="resource-grid">
<div class="resource-card">
<span class="rc-badge free">Free</span>
<h4><a href="https://huggingface.co/">Hugging Face</a></h4>
<p>Transformers library, pre-trained models (BERT, GPT, RoBERTa, XLM), datasets, and a community model hub. The go-to platform for modern NLP.</p>
</div>
<div class="resource-card aqua">
<span class="rc-badge free">Free</span>
<h4><a href="https://pytorch.org/">PyTorch</a></h4>
<p>Research-friendly deep learning framework with dynamic computation graphs. Widely adopted in NLP research.</p>
</div>
<div class="resource-card magenta">
<span class="rc-badge free">Free</span>