DARIAH-Campus

Digging for Gold - Knowledge Extraction from Text
EN
This three-day international training school on Knowledge Extraction from Text, run by the CLS INFRA project, offered a crash course in how to “Dig for Gold” in a corpus of texts. From stylometry to natural language processing, learners can follow along using 'plug and play' tools while also getting a brief introduction to Python and R.
Authors, editors, and contributors
Guillermo Marco Remon
Alvaro Pérez
Artjoms Šeļa
EHRI in TEITOK
EN
This blog post examines TEITOK, a corpus framework used as an alternative to Omeka. TEITOK is centered on texts and resembles the Omeka interface: both let you search through the documents and display their transcriptions. The main difference is that Omeka treats the transcription as an object description, whereas TEITOK shows not only that a word appears in a document, but also where it appears and how it is used.
Authors, editors, and contributors
Maarten Janssen
CLS-INFRA Training School on Data and Annotation
EN
This event, organised and provided by the CLS INFRA project, offers an introductory course in textual data annotation. The workshop shows learners how to edit, annotate, and query a text corpus without writing a single line of code, how to structure texts with XML-TEI, and how to run an NLP tool to add linguistic information.
Authors, editors, and contributors
Lisanne van Rossum
Maarten Janssen
Silvie Cinková
