Linguistics Tools
Many corpora come with tools for extracting, searching, or otherwise manipulating the data in the corpus. Please check the "About" page and/or the corpus index page for information about tools specific to the corpus of interest to you.
Official CQP demos:
The official demos are hosted by the Computational Corpus Linguistics Group at FAU Erlangen-Nürnberg, Germany, and use the sample encoded corpora available on the CWB SourceForge site.
- DICKENS (English, 3.4M tokens): A collection of novels by Charles Dickens, used as the main example corpus in the CQP Query Language Tutorial.
- BUNDESTAG (German, 5.7M tokens): Debates of the German parliament (1994–1998) with rich morphosyntactic annotation and shallow parsing. Suitable as a substitute for the smaller GLAW-NEW corpus of law texts in the CQP Query Language Tutorial.
- EUROPARL (6 languages, ca. 40M tokens each): Web GUI for the annotated Europarl Corpus, Version 3, containing debates of the European Parliament from 1996–2006 (currently, only six languages are included in the GUI). This interface also supports the simplified CEQL syntax, aligned context display, and word lists with automatic generation of translation candidates. The Europarl corpus will be used by future editions of the CQP Query Language Tutorial to introduce query and display options for aligned corpora.
- VISL CorpusEye – a friendly interface to annotated corpora in multiple languages
- BNCweb (U Lancaster) – see the BNCweb information page and complete the form to request a guest account
Instructions: Corpus Workbench is available for download from SourceForge, along with support packages, the web GUI (CQPweb), and sample encoded corpora. The project site also provides documentation, and a YouTube channel with 27 tutorial videos (as of April 2018) is available.
Transcriber is developed with the scripting language Tcl/Tk and C extensions. It relies on the Snack sound extension, which allows support for most common audio formats, and on the tcLex lexer generator. TranscriberAG runs on multiple platforms (Windows XP, Mac OS X and Linux). It is developed in C++ using the GTK+ library for the GUI and the AGlib for the annotation file management.
Instructions: Visit the TranscriberAG site for download and installation instructions.
Instructions: Visit the 7-zip site for download and installation instructions.
Instructions: Visit the AntConc site for download and access to several other related tools.
Instructions: ELAN is free to use and runs on Mac OS X, Windows 7, Windows 8, and Linux. The software and instructions are available for download at http://tla.mpi.nl/tools/tla-tools/elan/download/