TV Archive

Any Open Data is good data, right? So here's an archive of the daily TV program of German private broadcasters Pro7, Sat1, and Kabel1, starting April 2018.

Just looking for the download link? Download the Archive

Update 2020

Starting November 17th, 2020, the archive also contains data for the following broadcasters: ARD, ZDF, WDR, NDR, 3sat, arte, phoenix, vox, RTL and RTL2. This data is sourced from various undocumented APIs, and its format differs from channel to channel. For some broadcasters, each day's archived file also contains program information for the previous day. For all others, re-ingested data is available, just as it is for the originally supported channels.

Source & Motivation

For the last few years I have been providing an app for German TV program listings for the rather niche SailfishOS (see rec0de.net/upnext). In the process of live-scraping the current shows from a dozen broadcaster websites, I discovered an obscure API used by the ProSiebenSat.1 media group that returns JSON-formatted program information reaching about a week into the past and the future. This API endpoint, should you be interested in getting your own data, can be found at epgservice.7tv.de and also provides data for the smaller channels of the group.

As I was already caching these API calls on my server, I decided I might as well archive them to create a very complete dataset of historic broadcasts for whoever might be interested in such a thing. To be honest, I don't really know what this data might be useful for, but I like the idea of having it around just in case. It's your turn to do something cool with it!

Data & Format

The archive consists of one JSON file per broadcaster and day, named [channel]_yyyy-mm-dd_[timestamp_retrieved].json, where [channel] is one of 'pro7', 'sat1', or 'kabel1' and [timestamp_retrieved] is the UNIX timestamp at which the data was archived. The files contain the unedited API responses. Data is scraped as early as possible on the day in question; for most files, this should be around ten minutes after midnight. If the server fails on either side, the API call is retried every five minutes. Additionally, each day is archived a second time one day later, at the same time of day, and saved to a new file to capture short-term program changes (note that re-ingested data and retrieval timestamps are only available starting April 13th, 2018).
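The naming scheme above can be taken apart mechanically. The following is a minimal sketch, assuming the retrieval timestamp is a whole number of seconds and the channel name contains no underscore. The helper names `parse_archive_name` and `latest_per_day` and the example file names are made up for illustration and are not part of the archive. Names that do not match the pattern (for instance files from before April 13th, 2018, whose exact naming is not described here) are simply skipped.

```python
import re
from datetime import date, datetime, timezone
from typing import NamedTuple, Optional

# Pattern for [channel]_yyyy-mm-dd_[timestamp_retrieved].json
# Assumption: the channel name itself contains no underscore.
NAME_RE = re.compile(
    r"^(?P<channel>[^_]+)_(?P<day>\d{4}-\d{2}-\d{2})_(?P<ts>\d+)\.json$"
)


class ArchiveFile(NamedTuple):
    channel: str
    day: date            # the broadcast day the file describes
    retrieved: datetime  # when the data was archived (UTC)
    name: str


def parse_archive_name(name: str) -> Optional[ArchiveFile]:
    """Split a file name into its parts, or return None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return ArchiveFile(
        channel=m["channel"],
        day=date.fromisoformat(m["day"]),
        retrieved=datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc),
        name=name,
    )


def latest_per_day(names):
    """Keep only the most recently retrieved file per (channel, day)."""
    latest = {}
    for name in names:
        parsed = parse_archive_name(name)
        if parsed is None:
            continue  # not an archive file in the expected format
        key = (parsed.channel, parsed.day)
        if key not in latest or parsed.retrieved > latest[key].retrieved:
            latest[key] = parsed
    return latest
```

Picking the latest retrieval per day is one way to prefer the re-ingested copy, which should reflect short-term program changes; comparing both copies instead would show what changed.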

Information present in the API results includes the broadcast title, description, start and end timestamps, links to thumbnail images, and series, episode and season IDs (if applicable). Here's some example data representing an episode of The Simpsons:


{
	"id": "35823",
	"type": "broadcast",
	"rerun": false,
	"promamsBroadcastId": "346928911004",
	"title": "Die Geburtstags\u00fcberraschung",
	"description": "Barts M\u00fctze in der Waschmaschine verf\u00e4rbt [shortened for demo]",
	"startTime": 1523402400,
	"endTime": 1523403600,
	"images": [
		{
			"type": "image",
			"subType": "logo",
			"url": "http:\/\/i5-img.7tv.de\/pis\/mw\/c403jq [shortened]",
			"copyright": "und TM Twentieth Century Fox Film Corporation - Alle Rechte vorbehalten"
		}
	],
	"tvShow": {
		"id": "1044020",
		"promamsId": "9096",
		"title": "Die Simpsons"
	},
	"episode": {
		"promamsId": "35823"
	},
	"season": {
		"promamsId": "67098"
	},
	"links": [
		{
			"type": "subhome",
			"url": "\/tv\/simpsons"
		},
		{
			"type": "episodes",
			"url": "\/tv\/simpsons\/episoden\/staffel-3\/die-geburtstagsueberraschung"
		},
		{
			"type": "maxdome",
			"url": "http:\/\/www.maxdome.de\/4961804??cm_mmc_o=VImv [shortened]"
		}
	]
}
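To give an idea of how such an entry can be read, here is a minimal sketch in Python. It assumes that startTime and endTime are UNIX timestamps in seconds (which matches the example values) and works on a single broadcast entry only, since the surrounding structure of a full file is not described here. The function name `summarize_broadcast` is made up for illustration, and the sample is a trimmed copy of the entry above.

```python
import json
from datetime import datetime, timezone


def summarize_broadcast(entry: dict) -> dict:
    """Reduce one broadcast entry to a few human-readable fields."""
    # Assumption: timestamps are UNIX seconds.
    start = datetime.fromtimestamp(entry["startTime"], tz=timezone.utc)
    show = entry.get("tvShow") or {}  # series info is only present if applicable
    return {
        "title": entry["title"],
        "show": show.get("title"),
        "start_utc": start.isoformat(),
        "minutes": (entry["endTime"] - entry["startTime"]) // 60,
        "rerun": entry.get("rerun"),
    }


# Trimmed copy of the example entry; the raw string keeps the JSON escape
# sequence intact so that json.loads decodes it to "ü".
sample = json.loads(r'''
{
    "title": "Die Geburtstags\u00fcberraschung",
    "rerun": false,
    "startTime": 1523402400,
    "endTime": 1523403600,
    "tvShow": {"title": "Die Simpsons"}
}
''')

summary = summarize_broadcast(sample)
print(summary["title"], summary["start_utc"], summary["minutes"])
# prints: Die Geburtstagsüberraschung 2018-04-10T23:20:00+00:00 20
```

The start time is printed in UTC; in German local time (CEST at that date) this broadcast began at 01:20 on April 11th.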

Copyright Considerations

The copyright and licensing situation of EPG data in Germany has historically been quite murky - while essential information like the title or start and end times is considered (as far as I know) public and free to use, additional material like images, trailers and text material is subject to licensing fees. However, it appears that only use cases that use additional material "for the purposes of announcing or advertising radio or TV programming" are subject to those fees. Essentially, I believe that since the archived data mostly falls into the 'essential data' category and is both already public and no longer relevant, using this dataset should be fine for most use cases, especially academic and personal ones. That being said, maybe don't build your business on this.

Apart from that, I claim no copyright on any of these files and, as far as I am concerned, you are free to use / share / remix them in any way you like. In the unlikely event that you actually use this in an academic project, consider dropping a footnote :)

Download the archive here: rec0de.net/data/tvarchive