GCC Profiling and Coverage

Makefile snippets for branch-probability and call-graph profiling with GCC. The results can be used for performance analysis, profile-guided optimized builds, and coverage QA.

The GCC ecosystem offers several options for analyzing the performance of C/C++ binaries. Two that are widely available and easy to use are callgrind (part of valgrind) and GCC's own -fprofile-generate/-fprofile-use toolchain. Besides manual review of profiles and call-graphs, the results can also guide compiler optimization.

Callgrind Function Profiling

Call-graph screenshot by callgrind

An annotated call-graph helps to quickly identify “hot” paths and bottlenecks. With callgrind (part of the valgrind suite), a program is instrumented at run time to collect function-call data for performance benchmarking; branch-prediction statistics are only collected when requested with --branch-sim=yes.

After running the binary, the resulting profile can be converted into a textual report or a call-graph image. Since dot does the plotting, the image format is flexible, for example png or svg.

Assuming $(NAME) --benchmark is the command to profile and a target exists that builds $(NAME) with CFLAGS and LFLAGS:

.PHONY: callgrind
callgrind: CFLAGS += -Og -g
callgrind: LFLAGS += -Og
callgrind: callgrind.txt callgrind.png
callgrind.out: $(NAME)
    valgrind --tool=callgrind --callgrind-out-file=callgrind.out \
    ./$(<) --benchmark

callgrind.txt: callgrind.out
    callgrind_annotate --inclusive=yes --show-percs=yes --tree=both $(<) > $(@)

callgrind.png: callgrind.out
    gprof2dot --format=callgrind --node-label=total-time-percentage --strip -n 0.1 -e 0.0 $(<) | \
    dot -Tpng -o $(@)

.PHONY: clean
clean:
    @rm -vf -- $(wildcard callgrind.*)
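
The recipe above renders a PNG. As a minimal sketch of the same pipeline outside of make, the helper below selects the dot output format from an argument. The name render_callgraph and the restriction to png and svg are assumptions of this example, not part of valgrind or gprof2dot:

```shell
#!/bin/sh
# Sketch: render a callgrind profile as a call-graph image.
# Assumes gprof2dot and dot are installed, as in the Makefile recipe.
# render_callgraph is a hypothetical helper name, not part of any tool.
render_callgraph() {
    profile=$1
    format=${2:-svg}
    case $format in
        png|svg) ;;
        *) echo "unsupported format: $format" >&2; return 2 ;;
    esac
    # Same gprof2dot options as the callgrind.png recipe; only -T changes.
    gprof2dot --format=callgrind --node-label=total-time-percentage \
        --strip -n 0.1 -e 0.0 "$profile" |
        dot "-T$format" -o "${profile%.out}.$format"
}
```

Calling render_callgraph callgrind.out svg should write callgrind.svg next to the profile. The exit status is that of dot, so a failing gprof2dot is not detected by this sketch.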

gprof GNU Profiler

As an alternative to callgrind, gprof is widely available, simple to use, and also supports call-graph and line-by-line profiling. However, it does not support threads and is generally not recommended for evaluating the performance of new projects nowadays.[citation needed] Compiling and linking with -pg makes the program write its profile data to gmon.out on exit, from which a textual call-graph is generated:

.PHONY: gprof
gprof: CFLAGS += -Og -pg -g
gprof: LFLAGS += -Og -pg
gprof: gmon.txt
gmon.out: $(NAME)
    ./$(<) --benchmark

gmon.txt: gmon.out
    gprof --brief $(NAME) $(<) > $(@)

.PHONY: clean
clean:
    @rm -vf -- $(wildcard gmon.*)

GCC Profiling and Coverage

HTML report screenshot by lcov

GCC features various convenient instrumentation options for benchmarking. The gcov, lcov, and genhtml toolchain built on top of them provides branch-probability, function-profiling, and line-coverage reports.

Apart from the readable annotated HTML output shown above, there are also parseable formats for further evaluation, e.g., in QA pipelines. In addition to performance profiling, the implicitly collected coverage data can also prove useful when enabled during (unit) testing.

Given that $(NAME) --benchmark is the command to profile and a target exists that builds $(NAME) with CFLAGS and LFLAGS, the following produces a corresponding report in profile/.

.PHONY: profile
profile: CFLAGS += -fprofile-generate -fprofile-arcs -ftest-coverage
profile: LFLAGS += -fprofile-generate -fprofile-arcs
profile: $(NAME).gcda $(NAME).cpp.gcov profile/index.html
$(NAME).gcda: $(NAME)
    @rm -f -- *.gcda
    ./$(<) --benchmark

$(NAME).cpp.gcov: $(NAME).gcda
    gcov --branch-probabilities --function-summaries --use-colors --use-hotness-colors --demangled-names --relative-only *.gcda >/dev/null

$(NAME).info: $(NAME).gcda
    lcov --capture --no-external --rc lcov_branch_coverage=1 --directory . --output-file $(@)

profile/index.html: $(NAME).info
    genhtml --legend --branch-coverage --function-coverage --missed --demangle-cpp --output-directory $(dir $(@)) $(<)

.PHONY: clean
clean:
    @rm -vf -- $(wildcard *.gcda) $(wildcard *.gcno) $(wildcard *.gcov) $(wildcard *.info)
    @rm -rf -- ./profile/
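
For the QA pipelines mentioned above, the lcov tracefile ($(NAME).info) is one of the parseable formats: each source-file record carries an LF: entry (instrumented lines found) and an LH: entry (lines hit). The following is a minimal sketch of a coverage gate. The helper names line_coverage and check_coverage are hypothetical, and the threshold is chosen by the caller:

```shell
#!/bin/sh
# Sketch: compute total line coverage from an lcov tracefile.
# line_coverage and check_coverage are hypothetical helper names.

# Print the overall line coverage in percent, e.g. "75.0".
line_coverage() {
    awk -F: '
        $1 == "LF" { found += $2 }   # lines found, per source file
        $1 == "LH" { hit += $2 }     # lines hit, per source file
        END {
            if (found == 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * hit / found
        }' "$1"
}

# Succeed only if the coverage in tracefile $1 is at least $2 percent.
check_coverage() {
    actual=$(line_coverage "$1") || return 2
    echo "line coverage: $actual% (minimum: $2%)"
    awk -v a="$actual" -v m="$2" 'BEGIN { exit !(a + 0 >= m + 0) }'
}
```

In a pipeline this could be used as check_coverage app.info 80 || exit 1, where app.info stands in for the $(NAME).info file produced by the lcov recipe.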

Optimized Profile-Builds

The generated .gcda profile files can also noticeably improve the performance of a subsequent, correspondingly optimized build. Among other things, GCC’s profile optimization options use the recorded branch probabilities and the feedback on identified “hotspots”.

# XXX: beneficial but not widely supported yet: -fprofile-partial-training
.PHONY: profiled
profiled: CFLAGS += -Wno-coverage-mismatch -Wno-missing-profile -fprofile-use -fprofile-correction
profiled: LFLAGS += -fprofile-use -fprofile-correction
profiled: $(NAME)
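
Because make does not notice changed compiler flags, an up-to-date $(NAME) would not be rebuilt when switching between the profile and profiled targets, and the clean target above would also delete the .gcda files that -fprofile-use needs. A hedged sketch of the resulting order of steps follows. The wrapper name pgo_build is hypothetical, GNU make is assumed, and --always-make is this example's way of forcing the rebuilds rather than something the snippets above prescribe:

```shell
#!/bin/sh
# Sketch: instrumented build + benchmark run, then optimized rebuild.
# pgo_build is a hypothetical wrapper; MAKE can be overridden (e.g. for a dry run).
pgo_build() {
    name=$1
    make=${MAKE:-make}
    # 1. Force an instrumented build and run the benchmark; leaves *.gcda behind.
    "$make" NAME="$name" --always-make profile || return
    # 2. Force a rebuild with -fprofile-use, keeping the *.gcda files in place.
    "$make" NAME="$name" --always-make profiled
}
```

For example, pgo_build app would run both steps for a binary named app, while MAKE=echo pgo_build app only prints the arguments that would be passed to make.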