GroceryDB

Ever wondered how many grocery stores there are in Germany? Where the northernmost supermarket is? What the average location of all the grocery stores in Germany would be? I thought so - so here's GroceryDB, a somewhat complete database of most german grocery stores. All Open Data for you to play with, of course.

Data

GroceryDB includes information about ~18.000 Edeka, REWE, Penny, Netto, Kaufland, REAL, tegut, denn's, Alnatura, Globus and Basic markets in Germany (listed by market count). All entries include the store type, name, full address (street, zip, city), latitude and longitude and the store ID as used in the brands respective database. About 72% of all entries also include a phone number, 42% include a fax number. For 62% of all stores, the database contains some information about the store owner and for 34% some store-specific additional information is present. This may include e-mail addresses, store or inventory size, or other amenities provided. All data was collected April-May 2018.

Downloads

Download the GroceryDB database as an SQL dump:
zip (1MB) gzip (1MB) bzip2 (740kB)

Methodology

All data was, in one way or another, collected from the official website of the respective markets. In most cases, machine readable data can be extracted from the store finder offered on the websites. For some brands, however, data collection proved to be more difficult. While I won't be explaining the exact methods used, I will try to give a rough overview over the methodology used to obtain the data. Replicating my results should not be too difficult for someone with programming / web-scraping experience.

Edeka

Getting data for Edeka markets was probably the most straightforward: The search endpoint used by the market finder on the official website returns a very large JSON document containing huge amounts of information on all of their stores. This notably includes store size, employee count and e-mail addresses for many, but not all locations.
edeka.de/marktsuche.jsp

REWE Group

REWE and Penny both use similar store finders that return JSON data for some maximum amount of stores per request, all within a given distance of the supplied zip code. A brute-force search of all german zip codes takes some time, but yields good results.
shop.rewe.de/mc/api/markets-stationary/[zip goes here]
penny.de/marktsuche

Netto

The Netto store finder has a maximum search radius of 120km. By extracting the right function from the store finder script, this restriction can be bypassed. As the server response cuts off after a certain amount of data, several queries have to be performed from different search locations to gather a complete dataset.
netto-online.de/jquery/store_finder.js

Kaufland

Similarly to Edeka, the Kaufland store finder loads a static JSON file with basic information on all stores.
kaufland.de/.storefinder.json

Real

Real also loads a JSON data document containing all the information on the client side, although the purpose of the GET parameter in the URL is not clear to me and its value might change in the future.
real.de/markt/markt-aendern/?type=1357643188

tegut

Tegut turned out to be one of the more frustrating brands: There appears to be no easy to parse representation of all stores other than about 260 separate javascript variable declarations in the main HTML of the store finder. However it is possible to extract these definitions quite easily and with some regular expression magic most of the data can be extracted. An additional request for each store is necessary to get a phone number.
tegut.com/maerkte/marktsuche.html

denn's

Data for denn's markets can be easily obtained from their store finder, which loads a plain json file.
denns-biomarkt.de/?eID=apertoMarkets&map-standalone=1

Alnatura

The Alnatura store finder uses a search API that is passed geo coordinates and a search radius that can be set to an unreasonably high value to include all stores. The response is JSON formatted but only contains very basic data (coordinates, name, and a link to the store page), so an additional request is necessary for each market to fill in phone number and address. To make things a bit more complicated, this data is not formatted consistently, which requires loose matching with regular expressions and some manual corrections.
alnatura.de/de-de/m%C3%A4rkte/filialfinder

Globus

Data for the relatively few Globus stores can be extracted by calling globusApp.getNavigationMarketInfo(); on the store finder page, which returns a javascript object representing all stores.
globus.de/de/maerkte/maerkte.html

Basic

There does not seem to be a straightforward way to get json-formatted store data from the denn's website, but data of the few stores can easily be parsed from source code.
basicbio.de/de-DE/Maerkte

Evaluation

Edeka Group

The EDEKA Group lists 7.227 markets in its 2017 business report, split into 5.791 independent stores and 1.436 'Regie' stores. This figure seems to include some non-food and b2b retailers. The JSON file from the Edeka store finder includes 7 independent and 188 'Regie' locations from these categories, leaving 5.784 and 1.248 stores unaccounted for. The database includes 5.182 / 1.273 stores of each category, leaving 602 independent retailers apparently missing.

In the same document, the number of Netto discount markets is said to be around 4.200, matching the 4.169 stores in the database.

REWE Group

In its summarized situation report ('Zusammengefasster Lagebericht für das Geschäftsjahr 2016', page 4) the REWE Group lists 1.728 REWE and TEMMA branded stores as well as 2.073 REWE-partner and Nahkauf stores. Both TEMMA and Nahkauf are not part of this dataset because Nahkauf is more similar to a convenience store and TEMMA appears to only have a few stores nationwide. The Nahkauf store finder returns 434 markets which leaves 3367 stores that should be part of this dataset. In fact, the database contains 3.530 REWE branded stores, an increase of 163 stores compared to the 2016 figure.

The same document lists 2.148 Penny markets (as of 2016), which does fit the 2.183 markets currently in the database.

tegut

The store finder page of tegut claims to have 'around 290' markets nationwide, however only 273 are present in the page source, all of which are contained in this database.

Alnatura

On its website, Alnatura claims to have 'more than 126' stores, which matches nicely with the 130 stores in the database.

Denn's

Denn's claims to have 'more than 250' markets, which matches the 263 markets in the database.

Notably missing

The big discounters Aldi and Lidl have no straightforward way to get large-scale store data in a machine-readable format. I hope I can add them to the dataset someday, but for now they are missing.