Ever wondered how many grocery stores there are in Germany? Where the northernmost supermarket is? What the average location of all the grocery stores in Germany would be? I thought so - so here's GroceryDB, a somewhat complete database of most german grocery stores. All Open Data for you to play with, of course.
GroceryDB includes information about ~18.000 Edeka, REWE, Penny, Netto, Kaufland, REAL, tegut, denn's, Alnatura, Globus and Basic markets in Germany (listed by market count). All entries include the store type, name, full address (street, zip, city), latitude and longitude and the store ID as used in the brands respective database. About 72% of all entries also include a phone number, 42% include a fax number. For 62% of all stores, the database contains some information about the store owner and for 34% some store-specific additional information is present. This may include e-mail addresses, store or inventory size, or other amenities provided. All data was collected April-May 2018.
Download the GroceryDB database as an SQL dump:
zip (1MB) gzip (1MB) bzip2 (740kB)
All data was, in one way or another, collected from the official website of the respective markets. In most cases, machine readable data can be extracted from the store finder offered on the websites. For some brands, however, data collection proved to be more difficult. While I won't be explaining the exact methods used, I will try to give a rough overview over the methodology used to obtain the data. Replicating my results should not be too difficult for someone with programming / web-scraping experience.
Getting data for Edeka markets was probably the most straightforward: The search endpoint used by the market finder on the official website returns a very large JSON document containing huge amounts of information on all of their stores. This notably includes store size, employee count and e-mail addresses for many, but not all locations.
REWE and Penny both use similar store finders that return JSON data for some maximum amount of stores per request, all within a given distance of the supplied zip code. A brute-force search of all german zip codes takes some time, but yields good results.
shop.rewe.de/mc/api/markets-stationary/[zip goes here]
The Netto store finder has a maximum search radius of 120km. By extracting the right function from the store finder script, this restriction can be bypassed. As the server response cuts off after a certain amount of data, several queries have to be performed from different search locations to gather a complete dataset.
Similarly to Edeka, the Kaufland store finder loads a static JSON file with basic information on all stores.
Real also loads a JSON data document containing all the information on the client side, although the purpose of the GET parameter in the URL is not clear to me and its value might change in the future.
Data for denn's markets can be easily obtained from their store finder, which loads a plain json file.
The Alnatura store finder uses a search API that is passed geo coordinates and a search radius that can be set to an unreasonably high value to include all stores. The response is JSON formatted but only contains very basic data (coordinates, name, and a link to the store page), so an additional request is necessary for each market to fill in phone number and address. To make things a bit more complicated, this data is not formatted consistently, which requires loose matching with regular expressions and some manual corrections.
Data for the relatively few Globus stores can be extracted by calling
There does not seem to be a straightforward way to get json-formatted store data from the denn's website, but data of the few stores can easily be parsed from source code.
The EDEKA Group lists 7.227 markets in its 2017 business report, split into 5.791 independent stores and 1.436 'Regie' stores. This figure seems to include some non-food and b2b retailers. The JSON file from the Edeka store finder includes 7 independent and 188 'Regie' locations from these categories, leaving 5.784 and 1.248 stores unaccounted for. The database includes 5.182 / 1.273 stores of each category, leaving 602 independent retailers apparently missing.
In the same document, the number of Netto discount markets is said to be around 4.200, matching the 4.169 stores in the database.
In its summarized situation report ('Zusammengefasster Lagebericht für das Geschäftsjahr 2016', page 4) the REWE Group lists 1.728 REWE and TEMMA branded stores as well as 2.073 REWE-partner and Nahkauf stores. Both TEMMA and Nahkauf are not part of this dataset because Nahkauf is more similar to a convenience store and TEMMA appears to only have a few stores nationwide. The Nahkauf store finder returns 434 markets which leaves 3367 stores that should be part of this dataset. In fact, the database contains 3.530 REWE branded stores, an increase of 163 stores compared to the 2016 figure.
The same document lists 2.148 Penny markets (as of 2016), which does fit the 2.183 markets currently in the database.
The store finder page of tegut claims to have 'around 290' markets nationwide, however only 273 are present in the page source, all of which are contained in this database.
On its website, Alnatura claims to have 'more than 126' stores, which matches nicely with the 130 stores in the database.
Denn's claims to have 'more than 250' markets, which matches the 263 markets in the database.
The big discounters Aldi and Lidl have no straightforward way to get large-scale store data in a machine-readable format. I hope I can add them to the dataset someday, but for now they are missing.