Linguistic Applications and Resources

Here you can find some linguistic applications and resources which I have implemented or compiled in my free time. Some of them are freely available, some are not. If you would like access to the non-free applications or resources mentioned here, feel free to contact me.

CityViz - Visualization of City Names

CityViz is a web application for visualizing the geographic distribution of city name patterns across Germany. More countries and regions will be added soon.

Log in (you need an account).
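As a rough illustration of the kind of pattern CityViz maps, here is a minimal sketch that groups city names by common German toponym suffixes. The suffix list and function are hypothetical, not CityViz's actual implementation:

```python
from collections import defaultdict

# Illustrative suffix list; CityViz itself may use different patterns.
SUFFIXES = ("dorf", "burg", "heim", "hausen", "ingen")

def group_by_suffix(cities):
    """Map each known suffix to the city names ending in it."""
    groups = defaultdict(list)
    for name in cities:
        for suffix in SUFFIXES:
            if name.lower().endswith(suffix):
                groups[suffix].append(name)
                break
    return dict(groups)

print(group_by_suffix(["Hamburg", "Pforzheim", "Tübingen", "Düsseldorf"]))
```

Each resulting group could then be plotted on a map with one color per suffix.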

Dynamic Corpus Analyzer - DCA

The Dynamic Corpus Analyzer (DCA) is a web application for analyzing and visualizing TCF files created by WebLicht.

Log in (you need an account).
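TCF is an XML stand-off format, so the first step of any such analysis is reading its annotation layers. A minimal sketch of extracting the token layer with the standard library (the namespace and element names follow the TCF layout as I understand it and may need adjusting; this is not DCA's actual code):

```python
import xml.etree.ElementTree as ET

# TCF TextCorpus namespace (assumed from the TCF 0.4 layout).
TC_NS = "{http://www.dspin.de/data/textcorpus}"

def extract_tokens(tcf_xml):
    """Return the token strings from a TCF <tokens> layer."""
    root = ET.fromstring(tcf_xml)
    return [tok.text for tok in root.iter(TC_NS + "token")]

sample = """<?xml version="1.0"?>
<D-Spin xmlns="http://www.dspin.de/data">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <tokens>
      <token ID="t1">Guten</token>
      <token ID="t2">Tag</token>
    </tokens>
  </TextCorpus>
</D-Spin>
"""
print(extract_tokens(sample))
```

Other layers (lemmas, POS tags, named entities) can be read the same way by iterating over their respective elements.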


WhoIsInTheNews

The web application WhoIsInTheNews downloads newsfeeds from national German news sites every day at 3 p.m. The newsfeeds are annotated with WebLicht, and statistical information about the named entities is stored in a database. WhoIsInTheNews offers several ways of searching and visualizing named entities, including geographical visualizations of locations and timeline comparisons of named entity frequencies.

You need an account to log in to WhoIsInTheNews.
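The statistical bookkeeping described above boils down to per-day frequency counts of named entities. A hedged sketch of that step, with hypothetical data structures (the real application stores these counts in a database):

```python
from collections import Counter
from datetime import date

def daily_entity_counts(annotated_items):
    """annotated_items: iterable of (date, [entity, ...]) pairs.
    Returns a dict mapping each date to a Counter of entity frequencies."""
    counts = {}
    for day, entities in annotated_items:
        counts.setdefault(day, Counter()).update(entities)
    return counts

# Illustrative feed data, not real output of the annotation pipeline.
feed = [
    (date(2014, 5, 1), ["Berlin", "Merkel"]),
    (date(2014, 5, 1), ["Merkel"]),
    (date(2014, 5, 2), ["Berlin"]),
]
counts = daily_entity_counts(feed)
print(counts[date(2014, 5, 1)]["Merkel"])
```

Plotting one entity's count per day over such a mapping yields exactly the kind of frequency timeline the application offers.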


LemmaGraphs

LemmaGraphs is a sophisticated way of visualizing lemmas within their context, e.g. concordances or "Keyword in Context" (KWIC) lines. LemmaGraphs can integrate frequencies and intersections of concordances into nice-looking graphs. More to come.
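The KWIC lines underlying such a visualization are straightforward to extract. A minimal sketch, with an assumed tokenized input (not LemmaGraphs' actual code):

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence
    of keyword in the token list, with up to `width` tokens per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            hits.append((tokens[max(0, i - width):i],
                         tok,
                         tokens[i + 1:i + 1 + width]))
    return hits

tokens = "the cat sat on the mat".split()
print(kwic(tokens, "the", width=1))
```

LemmaGraphs then merges such concordance lines into a graph, weighting edges by how often context words co-occur with the lemma.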


Sprichworte

Sprichworte is a collection of German sayings. It contains 18,665 sayings of 352 different kinds and also includes syntactic and morphological variations. The whole dataset was extracted from the TüBa-D/DC text corpus. You need an account to log in.


Spreadsheet2Map

Spreadsheet2Map is a web application that creates maps from Excel spreadsheets. It can be used to display the results of dialect analyses or for any other geographical visualization. Spreadsheet2Map is freely available.

Named Entity Model for German, Politics (NEMGP)

The Named Entity Model for German, Politics (NEMGP) is a collection of texts from Wikipedia and WikiNews, manually annotated with named entity information. The NEMGP was used to create models for the machine-learning-based tools OpenNLP and the Stanford Named Entity Recognizer.

Click here for the original raw data as well as the trained binary models.
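Training the Stanford Named Entity Recognizer requires tab-separated token/label files, so annotations like the NEMGP's have to be serialized into that shape first. A hedged sketch of that conversion step (the helper and the label set are illustrative, not the actual NEMGP tooling):

```python
def to_stanford_tsv(sentences):
    """sentences: list of sentences, each a list of (token, label) pairs.
    Returns Stanford-NER-style TSV: one token/label per line,
    blank line between sentences."""
    blocks = []
    for sent in sentences:
        blocks.append("\n".join(f"{tok}\t{label}" for tok, label in sent))
    return "\n\n".join(blocks)

# Illustrative annotated sentence, not real NEMGP data.
sents = [[("Angela", "PER"), ("Merkel", "PER"), ("sprach", "O")]]
print(to_stanford_tsv(sents))
```

OpenNLP expects a different training format (inline span markup), so a second serializer would be needed for that tool.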

Wiktionary in a Database - WikDB

Putting the whole Wiktionary into a database. This project is on hold.