GeoIP database

Create MaxMind-compatible GeoIP databases for IP geolocation from scratch, using only freely available address registry information and Whois-based Geofeeds.

Many internet services still seem to rely on outdated or practically unavailable GeoIP databases from MaxMind for IP geolocation. While MaxMind does not disclose its sources, IPv4 block-to-country registry information and so-called Geofeeds are freely available. So it is time to create a new version from scratch!

MaxMind-compatible GeoIP.dat databases are still widely used by various tools, for example for access restriction or analytics, even though support for the format has since been dropped completely. Because authoritative IPv4 address space registry information is used directly, no conversion from a third-party geolocation provider is involved that could taint the contents or the licensing.

A regularly updated database export is also available for download and direct online lookup.

Comparison

To validate the results, the reported country code is compared for all IPv4 addresses against an ancient 2015 MaxMind database and a recent 2024 DB-IP database, not considering reserved blocks or continent code fallbacks. Shown is the number of IPs in millions: how often all three lookups give the same result, where two agree, and where one has a unique result.

Venn diagram of shared lookup results

Most noteworthy, the great majority of addresses receives the same value from all sources (or from both, when comparing two databases). Otherwise, the databases perform relatively similarly, and further interpretation would need an ultimately authoritative baseline to tell which one is most accurate or which ranges are especially important or volatile.

Usage and Quickstart

The whole processing pipeline is based on a Makefile with a collection of small Python scripts. Apart from a recent Python environment (tested with 3.8 and 3.10), there are no additional requirements.

TL;DR: Running make automatically downloads all needed files (i.e., IPv4 address space registry information as well as third-party Geofeeds) and converts them into MaxMind-formatted CSV files:

make -j
python3 -m src.parse_locations --in-file data/countryInfo.txt --out-file out/GeoLite2-Country-Locations-en.csv
python3 -m src.parse_transfers --in-files data/transfers-*.json --out-file data/transfers.json
python3 -m src.parse_delegations --format address-space --source iana --in-file data/ipv4-address-space.csv --out-file data/ipv4-address-space.json
python3 -m src.parse_delegations --format delegated --source afrinic --in-file data/delegated-afrinic.txt --out-file data/delegated-afrinic.json
python3 -m src.parse_delegations --format delegated --source apnic --in-file data/delegated-apnic.txt --out-file data/delegated-apnic.json
python3 -m src.parse_delegations --format delegated --source arin --in-file data/delegated-arin.txt --out-file data/delegated-arin.json
python3 -m src.parse_delegations --format delegated --source lacnic --in-file data/delegated-lacnic.txt --out-file data/delegated-lacnic.json
python3 -m src.parse_delegations --format delegated --source ripe --in-file data/delegated-ripe.txt --out-file data/delegated-ripe.json
python3 -m src.parse_whois --in-file data/whois-afrinic.db.gz --out-file data/whois-afrinic.json
python3 -m src.parse_whois --in-file data/whois-apnic.db.gz --out-file data/whois-apnic.json
python3 -m src.parse_whois --in-file data/whois-arin.db.gz --out-file data/whois-arin.json
python3 -m src.parse_whois --in-file data/whois-lacnic.db.gz --out-file data/whois-lacnic.json
python3 -m src.parse_whois --in-file data/whois-ripe.db.gz --out-file data/whois-ripe.json
python3 -m src.fetch_geofeeds --in-files data/whois-*.json --out-file data/geofeeds.json --out-dir data/
python3 -m src.parse_geofeeds --in-file data/geofeeds.json --out-file data/geofeed.json
python3 -m src.merge_delegations --transfers-file data/transfers.json --in-files data/delegated-*.json data/geofeed.json --out-file data/delegated.json
python3 -m src.merge_countries --format range --address-space-file data/ipv4-address-space.json --in-file data/delegated.json --out-file out/ranges.json
python3 -m src.merge_countries --format net --address-space-file data/ipv4-address-space.json --in-file data/delegated.json --out-file out/networks.json
python3 -m src.dump_csv --format range --location-file out/GeoLite2-Country-Locations-en.csv --in-file out/ranges.json --out-file out/geoip-ranges.csv
python3 -m src.dump_csv --format net --location-file out/GeoLite2-Country-Locations-en.csv --in-file out/networks.json --out-file out/geoip-networks.csv
python3 -m src.dump_csv --format geoip2 --location-file out/GeoLite2-Country-Locations-en.csv --in-file out/networks.json --out-file out/GeoLite2-Country-Blocks-IPv4.csv
python3 -m src.dump_csv --format legacy --location-file out/GeoLite2-Country-Locations-en.csv --in-file out/ranges.json --out-file out/GeoIP.csv

While other artifacts could also be of interest, most importantly, this will create:

GeoLite2-Country-Locations-en.csv
Most recent MaxMind2-compatible country information, originating from the GeoNames countryInfo export.
GeoLite2-Country-Blocks-IPv4.csv
IPv4 network to country mappings as MaxMind GeoLite CSV database. GeoName-IDs refer to the also created locations file.
GeoIP.csv
All-in-one IPv4 range to country database in MaxMind “legacy” format. Suitable for generating a .dat file in the next step.

Provided that, for example, the geoip-bin Ubuntu or Debian package is installed, make release creates a corresponding GeoIP.dat MaxMind “legacy” database:

make release
/usr/lib/geoip/geoip-generator -o out/GeoIP.dat out/GeoIP.csv

The result of geoip-generator can directly be checked by using the geoiplookup tool:

geoiplookup -f out/GeoIP.dat 1.2.3.4
GeoIP Country Edition: AU, Australia

Running make clean removes all output files; make reallyclean also removes the cached input files so that they are downloaded afresh on the next run. For more information on the involved input and output files, see below.

Background

The approach is quite straightforward: parse the address space country delegations, merge results by extending ranges, enrich them with country codes and names, and dump them into the different CSV formats. Additionally, ISPs or similar institutions can publish so-called Geofeeds via Whois entries, which are used to refine the results. Multiple simple standalone Python 3 scripts are involved, orchestrated by a single call to make, which provides a simple unified interface, parallelism (make -j), and caching.
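The "merge results by extending ranges" step can be illustrated with a minimal sketch. It is not the project's actual implementation: the tuple layout, the `merge_ranges` name, and the sample values are assumptions made for this example, and overlapping ranges with different country codes are deliberately not resolved.

```python
from typing import List, Tuple

# Assumed layout for this sketch: (first address, last address, country code),
# with both addresses as integers and the range bounds inclusive.
Range = Tuple[int, int, str]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Extend a range whenever the next one touches or overlaps it
    and carries the same country code."""
    merged: List[Range] = []
    for start, end, code in sorted(ranges):
        if merged and merged[-1][2] == code and merged[-1][1] + 1 >= start:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), code)
        else:
            merged.append((start, end, code))
    return merged


# Hypothetical input: two adjacent AU blocks and two separated CN blocks.
example = [(0, 255, "AU"), (256, 511, "AU"), (512, 767, "CN"), (1024, 1279, "CN")]
print(merge_ranges(example))
```

Only the two adjacent AU blocks collapse into one entry; the CN blocks stay separate because of the gap between them.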

IP Address Space Registry Input Files

Everything is based on the following authoritative input files, mostly representing the current address space country delegations:

All input sources will be fetched automatically during the first run.
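As a rough illustration of what such a delegation file contains, the registries publish pipe-separated records of the form registry|cc|type|start|value|date|status, where value is the number of addresses for IPv4 entries. The following sketch turns one such line into an inclusive integer range. The `parse_delegated_line` helper is hypothetical and not necessarily how src.parse_delegations works.

```python
import ipaddress
from typing import Optional, Tuple


def parse_delegated_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (first, last, country) for an IPv4 record, else None.

    Header, summary, and comment lines as well as records without a
    country code are skipped by the field checks below.
    """
    fields = line.strip().split("|")
    if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
        return None
    country, first, count = fields[1], fields[3], int(fields[4])
    start = int(ipaddress.IPv4Address(first))
    return start, start + count - 1, country


print(parse_delegated_line("apnic|AU|ipv4|1.0.0.0|256|20110811|assigned"))
```

Because the value field is an address count rather than a prefix length, a record does not have to align with a single CIDR network, which is one reason why a start-end representation is convenient at this stage.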

GeoIP Database Output Files

The following GeoIP database files are provided as output and cover the most recent address space registry information in a MaxMind-compatible format:

See the above section for more details and how to run the corresponding scripts. In addition, the following files might also be of interest:

networks.json, ranges.json
The whole IPv4 address space as JSON in address/mask or start-end notation, respectively. Maps all addresses to either a country code, a registry (if not reported as country-delegated), or RESERVED (e.g., local or multicast networks).
geoip-networks.csv, geoip-ranges.csv
All-in-one CSV exports of the whole address space in address/mask or start-end notation, respectively. This format has each entry already resolved to country or continent Alpha-2 ISO codes (or ZZ). As these files contain the “most” information, they are the recommended ones for further processing.
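For further processing, a lookup over start-end ranges can be sketched with a binary search. The in-memory layout, the `lookup` function, and the sample entries below are assumptions made for illustration; they are neither real database contents nor the exact column layout of the exported files.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Assumed layout: (first address, last address, code), inclusive bounds.
Range = Tuple[int, int, str]


def ip(text: str) -> int:
    """Convert a dotted-quad IPv4 address to its integer value."""
    return int(ipaddress.IPv4Address(text))


def lookup(ranges: List[Range], starts: List[int], address: str) -> Optional[str]:
    """Find the code of the range containing the address, if any.

    `ranges` must be sorted and non-overlapping; `starts` holds the
    first address of each range in the same order.
    """
    value = ip(address)
    index = bisect.bisect_right(starts, value) - 1
    if index >= 0 and value <= ranges[index][1]:
        return ranges[index][2]
    return None


# Hypothetical sample entries using documentation address blocks.
RANGES = [
    (ip("192.0.2.0"), ip("192.0.2.255"), "AU"),
    (ip("198.51.100.0"), ip("198.51.100.127"), "DE"),
]
STARTS = [start for start, _, _ in RANGES]

print(lookup(RANGES, STARTS, "192.0.2.77"))
```

Keeping the list of start addresses separate means it is built once and reused for every lookup.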

Note that multiple netmask-based entries might be needed to represent a single address range. The range format is thus more expressive and leads to fewer “duplicate” consecutive entries (currently 137343 vs. 175817 in total).
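The difference between the two notations can be reproduced with Python's standard ipaddress module. The address range below is an arbitrary example from a documentation block, not an entry from the database.

```python
import ipaddress

# One start-end range that does not align with a single netmask.
first = ipaddress.IPv4Address("192.0.2.0")
last = ipaddress.IPv4Address("192.0.2.130")

# summarize_address_range yields the minimal list of networks covering it.
networks = list(ipaddress.summarize_address_range(first, last))
for network in networks:
    print(network)
```

This single range of 131 addresses needs three networks (a /25, a /31, and a /32), which is why the network-based exports have more entries than the range-based ones.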

Automatically updated database exports are also available for download.

Custom Extension: Region Codes

The resulting “extended” databases cover the whole IPv4 address space, instead of only the networks for which country information is available. This is done by setting at least the continent code as country code fallback, corresponding to the responsible registry for that block. Also, reserved ranges – for example local or multicast networks – are marked as such. Despite additional information, after optimizing/merging results, there are overall fewer entries than in the “original”.

However, the GeoIP.csv and GeoIP.dat “legacy” formats cannot profit from this and still cover the usual concrete countries only: The hardcoded set of country codes (and names) in the underlying libGeoIP library does not allow for using additional continents, regions, or reserved codes. Using any of the other output formats can thus give better results. On the other hand, this ensures that there is no ambiguity when working with Alpha-2 ISO codes, for example regarding AF for either Africa or Afghanistan.

Geofeed: IP Geolocation Feeds

Country codes do not always indicate where an address is actually used, but rather the location of the institution that is responsible for it. In addition, an ISP, for example, can serve multiple regions and have multiple blocks at its disposal, which seems to be relatively common, e.g., in Europe. The location where an IP prefix is provisioned can thus change on short notice, while the internet registry information is relatively static and not optimized for timely dynamic updates.

As defined by RFC 8805, Geofeeds are simple CSV files that assign ISO country codes to IP address prefixes. This lets service providers publish the current state as a plain file that can be downloaded from an arbitrary web server. Per RFC 9092, Geofeed URLs are announced in the remarks or geofeed attributes of RPSL (“Whois”) objects.
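Both parts can be sketched in a few lines: finding a Geofeed URL in an RPSL object and reading the prefix-to-country mapping from a feed. This is a simplified illustration and not the logic of src.fetch_geofeeds or src.parse_geofeeds; the function names, the lenient regular expression, and the sample data are assumptions.

```python
import csv
import ipaddress
import re
from typing import Dict, Iterable, List

# Matches "remarks: Geofeed <url>" as well as "geofeed: <url>" lines.
GEOFEED_URL = re.compile(
    r"^(?:remarks:\s+geofeed\s+|geofeed:\s+)(https://\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_geofeed_urls(rpsl_object: str) -> List[str]:
    """Collect all Geofeed URLs announced in one RPSL object."""
    return GEOFEED_URL.findall(rpsl_object)


def parse_geofeed(lines: Iterable[str]) -> Dict[ipaddress.IPv4Network, str]:
    """Map IPv4 prefixes to country codes; the feed's column order is
    prefix, country, region, city, postal code."""
    result: Dict[ipaddress.IPv4Network, str] = {}
    rows = csv.reader(
        line for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    for row in rows:
        if len(row) < 2 or not row[1].strip():
            continue  # no country given for this prefix
        try:
            network = ipaddress.ip_network(row[0].strip(), strict=False)
        except ValueError:
            continue  # skip malformed prefixes
        if network.version == 4:
            result[network] = row[1].strip().upper()
    return result


# Hypothetical Whois object and feed contents.
rpsl = "inetnum: 192.0.2.0 - 192.0.2.255\nremarks: Geofeed https://example.com/geofeed.csv\n"
feed = ["# sample feed", "192.0.2.0/24,US,US-CA,,", "2001:db8::/32,DE,,,", "198.51.100.0/25,,,,"]

print(find_geofeed_urls(rpsl))
print(parse_geofeed(feed))
```

The sketch keeps only IPv4 prefixes with a country code, since the resulting databases cover IPv4 country mappings; region, city, and postal code columns are ignored.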

The approach that uses Geofeed data for refining geolocation results is as follows:

Code & Download