Matomo GeoIP Dockerfile

Build a Matomo image with GeoIP library and database support, and stream access logs from syslog for live on-premises analytics.

The Matomo Docker image is based on PHP 8.1, which currently lacks built-in GeoIP support. Running docker-php-ext-enable on the PECL package also fails due to PHP 8 incompatibilities.

The Matomo FAQ recommends building the MaxMind library and PHP extension from source. This is easily done in a post-processing Dockerfile (one that builds on top of the official image), which can also download the current GeoIP2 database from DB-IP.

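# Assumption: the official Debian-based FPM image; adjust the tag to the one in use.
FROM matomo:fpm

ENV DBIP_VERSION="2024-01"
ENV LIBMAXMIND_VERSION="1.8.0"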

# Build and install libmaxminddb from the release tarball.
# Absolute paths keep the download and the cleanup independent of WORKDIR.
RUN curl -fsSL -o /libmaxminddb.tar.gz "https://github.com/maxmind/libmaxminddb/releases/download/${LIBMAXMIND_VERSION}/libmaxminddb-${LIBMAXMIND_VERSION}.tar.gz" && \
    mkdir /libmaxminddb && tar -xzf /libmaxminddb.tar.gz --strip-components=1 -C /libmaxminddb && \
    cd /libmaxminddb && ./configure && make && make install && ldconfig && \
    rm -rf /libmaxminddb.tar.gz /libmaxminddb

# Build and install the maxminddb PHP extension from the reader's ext/ directory.
# Absolute paths keep the download and the cleanup independent of WORKDIR.
RUN curl -fsSL -o /maxminddbreader.tar.gz "https://github.com/maxmind/MaxMind-DB-Reader-php/archive/refs/heads/main.tar.gz" && \
    mkdir /maxminddbreader && tar -xzf /maxminddbreader.tar.gz --strip-components=1 -C /maxminddbreader && \
    cd /maxminddbreader/ext && phpize && ./configure && make && make install && \
    rm -rf /maxminddbreader.tar.gz /maxminddbreader

RUN echo "extension=maxminddb.so" > /usr/local/etc/php/conf.d/maxminddb.ini

RUN curl -fsSL -o - "https://download.db-ip.com/free/dbip-city-lite-${DBIP_VERSION}.mmdb.gz" | \
    zcat > /var/www/html/misc/DBIP-City.mmdb

Tested for PHP-FPM on Debian Bookworm.
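
Since DBIP_VERSION is pinned to a fixed month, the value has to be bumped by hand. As a minimal sketch, assuming DB-IP keeps naming the free City Lite file by year and month exactly as in the download URL above (inferred from that URL, not from DB-IP documentation), the matching value and URL for a given date could be computed like this. The helper names dbip_version and dbip_url are made up for illustration:

```python
from datetime import date

# URL pattern taken from the Dockerfile above; only the version part varies.
URL_TEMPLATE = "https://download.db-ip.com/free/dbip-city-lite-{version}.mmdb.gz"


def dbip_version(day):
    """Return the YYYY-MM string used for DBIP_VERSION (hypothetical helper)."""
    return f"{day.year:04d}-{day.month:02d}"


def dbip_url(day):
    """Return the download URL for the month that contains `day`."""
    return URL_TEMPLATE.format(version=dbip_version(day))


if __name__ == "__main__":
    # Print the URL for the current month, e.g. to paste into the Dockerfile.
    print(dbip_url(date.today()))
```

Whether the file for the current month is already published on a given day is not guaranteed here; curl -f in the Dockerfile fails on an HTTP error if it is not.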

Matomo via nginx via syslog

Matomo provides a log analytics script that can import existing access logs, periodically crawl log files, or stream from log input. With the latter, live statistics can be collected without tracking code, using only data that is available anyway.

However, the Rsyslog example provided with it uses legacy functionality that invokes the script once per log line, instead of feeding a single long-running instance (with its “recorder pool”) for reasonable throughput.

An Nginx configuration for sending access logs to syslog could look like:

log_format afmt "$remote_addr $http_host $request_time [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\"";
access_log syslog:server=unix:/dev/log,facility=daemon,severity=info,tag=http,nohostname afmt;

Instead of accepting a single line as an argument, the wrapper script around import_logs.py reads from standard input, which still allows existing logs to be imported initially through a pipe. A more complete example:

#!/bin/bash
export LC_ALL=C
exec python3 /usr/local/bin/import_logs.py "$@" \
     --log-format-regex='^ *(?P<ip>[0-9]\S+) (?P<host>\S+) (?P<generation_time_secs>\S+) \[(?P<date>\S+) (?P<timezone>\S+)\] "(?P<method>[A-Z]+) (?P<path>\S+) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)".*' \
     --log-date-format='%d/%b/%Y:%H:%M:%S' \
     --url 'http://matomo.example.com' --idsite 42 --token-auth 'd41d8cd98f00b204e9800998ecf8427e' \
     --enable-static --enable-bots --enable-http-errors --enable-http-redirects \
     --recorders 2 --recorder-max-payload-size 20 --retry-max-attempts 10 --retry-delay 30 --show-progress-delay 3600 -
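
Before wiring this into Rsyslog, it can help to check the regular expression against a line in the afmt format. The following is a minimal sketch that copies the pattern and date format from the script above. The sample line and the parse_line helper are invented for illustration and are not part of import_logs.py:

```python
import re
from datetime import datetime

# Same pattern as --log-format-regex above, split across lines for readability.
LOG_RE = re.compile(
    r'^ *(?P<ip>[0-9]\S+) (?P<host>\S+) (?P<generation_time_secs>\S+) '
    r'\[(?P<date>\S+) (?P<timezone>\S+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)".*'
)

# Same value as --log-date-format above.
DATE_FORMAT = '%d/%b/%Y:%H:%M:%S'

# Invented sample line; the leading space mimics a typical syslog %msg% payload.
SAMPLE = (
    ' 203.0.113.7 www.example.com 0.004 [10/Jan/2024:13:55:36 +0100] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Parse the timestamp the same way the date format option describes it.
    fields['date'] = datetime.strptime(fields['date'], DATE_FORMAT)
    return fields


if __name__ == '__main__':
    print(parse_line(SAMPLE))
```

One thing this makes visible: the ip group starts with [0-9], so a line whose client address begins with a letter (for example an IPv6 address such as fe80::1) does not match this pattern.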

The Rsyslog configuration (for example in /etc/rsyslog.d/10-matomo.conf) can then use omprog to add an access log pipe to the Matomo live dashboard:

module(load="omprog")
template(
    name="messageOnly"
    type="string"
    string="%msg:::drop-last-lf%\n"
)
if ($programname == 'http') and ($hostname == 'www') then {
    action(
        type="omprog"
        forceSingleInstance="on"
        binary="/usr/local/bin/matomo.sh"
        template="messageOnly"
    )
}
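
To see what Rsyslog actually hands to the program, the binary can be pointed temporarily at a throwaway stand-in instead of matomo.sh. The sketch below is such a stand-in and not part of Matomo. The script name omprog_debug.py, the record_lines helper and the output path are hypothetical. It relies only on omprog writing one message per line to the standard input of a single long-running process:

```python
import sys


def record_lines(lines, sink):
    """Copy each non-empty line to `sink`, flushing per line; return the count."""
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        sink.write(line + "\n")
        # Flush immediately so the file can be followed with tail -f.
        sink.flush()
        count += 1
    return count


# Runs only when an output file is given, e.g. in the omprog action:
#   binary="/usr/local/bin/omprog_debug.py /tmp/omprog-debug.log"
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "a", encoding="utf-8") as sink:
        record_lines(sys.stdin, sink)
```

The recorded lines should look like the afmt format from the Nginx configuration, possibly with a leading space, which the regular expression in the wrapper script tolerates.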

When using a central logging server, this method also allows the whole Matomo installation to be reachable only from an internal network.