Zero to CMake
Mastery of your build tools is essential for mastery of a programming language. C++ makes this particularly difficult, partly because of the language's age. CMake is often treated as opaque magic, constructed from copy-pasted incantations and half-remembered shell commands. In this article, I will pry open a small part of the inner workings of building a C++ project.
This is not a CMake tutorial. Henry Schreiner's Modern CMake is a better introduction than I could hope to write. Instead, this post is aimed at beginner and intermediate C++ users, who perhaps have only worked on an existing CMake codebase, and who want to understand how their build system comes together from fundamental building blocks.
CMake is a build system generator. That is, CMake does not compile your code directly, but instead generates files used by another build system, which in turn runs the commands to compile your code. This article is organized in reverse, starting with the raw commands used to compile a project, then showing how to hand write one build system (Make), and finally showing the CMake project used to generate those Make files.
Building a Single File
The simplest way to compile C++ is by using a compiler directly. The compiler's job is to take human-readable source code and convert it into the machine code your computer understands. Say you have the following source code:
// main.cpp
#include <cstdlib>
#include <iostream>
#include <string>

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");
  // Exit with no error (EXIT_SUCCESS comes from <cstdlib>)
  return EXIT_SUCCESS;
}
Saving it to main.cpp and using G++ (the GNU C++ compiler), you can compile and run it like so:
$ g++ -o hello main.cpp
$ ./hello
Hello reader.
Here we tell G++ to compile the file main.cpp and write the result (-o) to an executable file named hello. This is fine for a one-file project, but as your program grows, you will probably want to split it into multiple files.
Multiple files
Suppose you wanted to extract the say_hello() function into its own file:
// main.cpp
#include <iostream>

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");
  // Exit with no error
  return EXIT_SUCCESS;
}

// say_hello.cpp
#include <iostream>
#include <string>

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}
g++ lets us compile multiple files at once by listing each one:
$ g++ -o hello main.cpp say_hello.cpp
main.cpp: In function ‘int main(int, char**)’:
main.cpp:6:3: error: ‘say_hello’ was not declared in this scope
    6 |   say_hello("reader");
      |   ^~~~~~~~~
Oh no, it seems the main.cpp file cannot find say_hello(). This is because C++ separates the concepts of definition and declaration. The compiler processes each source file on its own, so while main.cpp can reach say_hello.cpp's definitions (through a process called linking that will be covered later), it cannot see the declarations. Let's add a declaration of say_hello() to main.cpp:
// main.cpp
#include <cstdlib>
#include <string>

void say_hello(const std::string& name);
// ^ Declaration of `say_hello`. A function can have many declarations,
//   but only one definition.

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");
  // Exit with no error
  return EXIT_SUCCESS;
}
When you re-run g++ -o hello main.cpp say_hello.cpp, the program will successfully compile. With only two files and one function, putting the declaration in the file is okay, but as your program grows that would quickly become a burden.
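To make the declaration/definition split concrete, here is a minimal single-file sketch. The names make_greeting and greet_reader are hypothetical and exist only for this illustration; the function returns the string instead of printing it so the result is easy to check.

```cpp
#include <string>

// Declaration: tells the compiler the name, parameters, and return type.
std::string make_greeting(const std::string& name);

// Repeating an identical declaration is allowed.
std::string make_greeting(const std::string& name);

// A caller only needs a declaration to have been seen before the call.
std::string greet_reader() {
  return make_greeting("reader");
}

// Definition: the single place where the body lives.
std::string make_greeting(const std::string& name) {
  return "Hello " + name + ".";
}
```

If you copy the definition a second time, the compiler should reject the file with a redefinition error, whereas the duplicated declaration is accepted silently.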
Header files
A header file, distinguished by a .h extension, is a fragment of a C++ source file containing only declarations. When the compiler encounters the line #include "say_hello.h", that directive is replaced with the contents of the file say_hello.h.
System Headers
We have already seen system headers included such as #include <string>. This looks for a file called string in the system header directory. On my Fedora laptop, it is at /usr/include/c++/14/string, but the file's exact location varies depending on your Linux distribution.
Create a simple header file for say_hello():
// say_hello.h
#pragma once
#include <string>
void say_hello(const std::string& name);
And modify main.cpp to use it:
-void say_hello(const std::string& name);
+#include "say_hello.h"
All this has done is move the declaration of say_hello() into the header file. Semantically, the version of main.cpp using the header file is identical to the one without it. The inclusion of the header file is done by the preprocessor before the compiler attempts to understand the C++ source.
Compile with the same command as before. You will notice that the header file is not listed as an input file on the command line. This is because the header cannot be compiled by itself. It is instead inlined into main.cpp. This is an important difference between header files and source files.
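The #pragma once line at the top of say_hello.h asks the compiler to paste the header at most once per source file. It is widely supported but not part of the C++ standard; the traditional alternative is an include guard. The sketch below simulates including a hypothetical header greeting.h twice by pasting its contents twice into one file. The names Greeting, GREETING_H, and render are made up for this example.

```cpp
#include <string>

// ---- First copy of the hypothetical header greeting.h ----
#ifndef GREETING_H
#define GREETING_H
struct Greeting {
  std::string name;
};
#endif  // GREETING_H

// ---- Second copy: skipped, because GREETING_H is already defined ----
#ifndef GREETING_H
#define GREETING_H
struct Greeting {
  std::string name;
};
#endif  // GREETING_H

// Code after the includes sees exactly one definition of Greeting.
std::string render(const Greeting& greeting) {
  return "Hello " + greeting.name + ".";
}
```

Without the guard, the second struct definition would be a redefinition error. A header holding only function declarations, like say_hello.h, would survive double inclusion anyway, but guarding every header is the usual habit.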
The Source So Far
// say_hello.h
#pragma once
#include <string>

void say_hello(const std::string& name);

// say_hello.cpp
#include <iostream>
#include <string>

// Notice that the header file is also included in the
// file with the definitions. This will help the compiler
// figure out when the declaration and definitions have
// drifted.
#include "say_hello.h"

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}

// main.cpp
#include <cstdlib>

#include "say_hello.h"

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");
  // Exit with no error
  return EXIT_SUCCESS;
}
# Compile and run:
g++ -o hello main.cpp say_hello.cpp
./hello
Linking
As your project grows, your compilation times will grow. C++ is a notoriously tricky language to parse, and compilation times can be slow. Suppose you only want to change the name in main.cpp without changing the implementation of say_hello(). It would be good to do this without recompiling all of say_hello.cpp.
G++ can compile these files separately so that they can be later combined in a process called linking. The first step is to create object files from each source file. These object files contain the compiled machine code for each function in the source file, as well as some information on the name and type signature of each symbol.
You create object files by specifying the -c command line flag:
$ g++ -c -o say_hello.o say_hello.cpp
$ g++ -c -o main.o main.cpp
$ ls *.o
main.o say_hello.o
Disassembly
You can look at the contents of these files using the objdump command. Running objdump -d say_hello.o will print the assembly code generated from the say_hello() function. You do not need to understand this dump; it is only used here as a learning aid to show the purpose of the .o files.
For those not following along at a terminal, here is the disassembly from my machine. You will notice that say_hello has changed its name to _Z9say_helloRKNSt7__... because of name mangling, which C++ uses to distinguish overloaded functions.
$ objdump -d say_hello.o
say_hello.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: be 00 00 00 00 mov $0x0,%esi
11: bf 00 00 00 00 mov $0x0,%edi
16: e8 00 00 00 00 call 1b <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b>
1b: 48 89 c2 mov %rax,%rdx
1e: 48 8b 45 f8 mov -0x8(%rbp),%rax
22: 48 89 c6 mov %rax,%rsi
25: 48 89 d7 mov %rdx,%rdi
28: e8 00 00 00 00 call 2d <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2d>
2d: be 00 00 00 00 mov $0x0,%esi
32: 48 89 c7 mov %rax,%rdi
35: e8 00 00 00 00 call 3a <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x3a>
3a: 90 nop
3b: c9 leave
3c: c3 ret
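The mangled name in the dump can be turned back into a readable signature. The sketch below assumes GCC or Clang, which ship the abi::__cxa_demangle function in <cxxabi.h>; other compilers (MSVC, for example) use a different mangling scheme. The demangle wrapper is a hypothetical helper written for this illustration.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang only)

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
  int status = 0;
  char* readable =
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || readable == nullptr) {
    return mangled;
  }
  std::string result(readable);
  std::free(readable);  // the returned buffer is allocated with malloc
  return result;
}
```

Passing the symbol from the dump, "_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE", should yield a signature beginning with say_hello( and taking a std::string by const reference. The parameter type is encoded in the name, which is how two overloads of say_hello would end up as distinct symbols. At a terminal, objdump -C or the c++filt tool does the same job.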
You can then link these files into an executable. Again, G++ can be used for this:
$ g++ -o hello main.o say_hello.o
$ ./hello
Hello reader.
This way, if you make a change to main.cpp, you only need to recompile main.o and then relink. Likewise for changes to say_hello.cpp. For example, if you change the name "reader" to "C++ expert", you would only run the following:
$ g++ -c -o main.o main.cpp
$ g++ -o hello main.o say_hello.o
$ ./hello
Hello C++ expert.
Disassembly
Linking is the process of combining the symbols (i.e. functions) from a bunch of object files into a single file while updating the pointers in each file to be correct. You can search for say_hello in the final executable to confirm this:
$ objdump -d hello | grep say_hello
40124c: e8 0e 03 00 00 call 40155f <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
000000000040155f <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>:
That first line is the call to say_hello() inside of main(). The second line is the beginning of the definition from the previous disassembly of say_hello.o. However, the address (40155f) is different from before, when it was at address zero. The same 40155f also shows up in the call instruction. This is the essence of linking: ensuring each symbol has a unique address and then updating all the references to the new address.
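Those two steps can be sketched as a toy linker. This is a simplified model for illustration, not how a real linker is implemented: the types ToyFunction, ToyObject, ResolvedCall, and ToyExecutable and the function toy_link are all hypothetical, and real object files carry far more information.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A toy object file: each function has a name, a size in bytes,
// and the names of the functions it calls (unresolved references).
struct ToyFunction {
  std::string name;
  std::size_t size;
  std::vector<std::string> calls;
};
using ToyObject = std::vector<ToyFunction>;

// A patched call: who makes it and the final address it jumps to.
struct ResolvedCall {
  std::string caller;
  std::size_t target_address;
};

struct ToyExecutable {
  std::map<std::string, std::size_t> addresses;  // symbol -> address
  std::vector<ResolvedCall> calls;
};

ToyExecutable toy_link(const std::vector<ToyObject>& objects,
                       std::size_t base_address) {
  ToyExecutable exe;
  std::size_t next = base_address;
  // Step 1: give every symbol a unique address.
  for (const ToyObject& object : objects) {
    for (const ToyFunction& fn : object) {
      if (!exe.addresses.emplace(fn.name, next).second) {
        throw std::runtime_error("multiple definition of " + fn.name);
      }
      next += fn.size;
    }
  }
  // Step 2: update every reference to point at the final address.
  for (const ToyObject& object : objects) {
    for (const ToyFunction& fn : object) {
      for (const std::string& callee : fn.calls) {
        auto found = exe.addresses.find(callee);
        if (found == exe.addresses.end()) {
          throw std::runtime_error("undefined reference to " + callee);
        }
        exe.calls.push_back(ResolvedCall{fn.name, found->second});
      }
    }
  }
  return exe;
}
```

Linking a toy main.o (one function main that calls say_hello) with a toy say_hello.o places the two functions back to back and records the call from main with say_hello's final address. The two exceptions mirror the two classic link errors: a symbol defined twice, and a symbol declared but never defined.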
This may feel like overkill for such a small project, but once you have hundreds of files, you will be thankful for these changes.
Make files
The process of building even this simple program has ballooned. You could put all the commands into a bash script, but then you would lose the advantage of splitting the compiling and linking phases: not having to recompile unchanged files. Luckily, this is exactly what Make was designed for.
A Make file follows a simple pattern, where the command line must be indented with a tab character rather than spaces:
output: dependencies
	command
That says, "When any of dependencies changes, generate the file output by running command". Make detects changes by comparing modification times from the file metadata: if output is missing or older than any of its dependencies, Make reruns the command. It applies this recursively, so any files that depend on the newly generated outputs are regenerated too.
Here is an example Make file for the program:
# Makefile
hello: main.o say_hello.o
	g++ -o hello main.o say_hello.o

main.o: main.cpp
	g++ -c -o main.o main.cpp

say_hello.o: say_hello.cpp
	g++ -c -o say_hello.o say_hello.cpp
Save this to a file called Makefile and run make hello to build the program. You should see a list of the commands run:
$ make hello
g++ -c -o main.o main.cpp
g++ -c -o say_hello.o say_hello.cpp
g++ -o hello main.o say_hello.o
$ ./hello
Hello C++ expert.
Now, if you change main.cpp back to "reader" and rerun make hello, you will see it does not recompile say_hello.o.
$ make hello
g++ -c -o main.o main.cpp
g++ -o hello main.o say_hello.o
$ ./hello
Hello reader.
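The rebuild behavior in the two runs above can be modeled in a few lines. This is a toy sketch, not how Make is implemented: the Rule and ToyMake types are hypothetical, the file system is replaced by an in-memory map of modification times, and commands are logged instead of executed.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One rule of a Make file: "output: dependencies" plus its command.
struct Rule {
  std::vector<std::string> dependencies;
  std::string command;
};

// Hypothetical in-memory stand-in for Make, the file system, and the clock.
struct ToyMake {
  std::map<std::string, Rule> rules;    // output name -> rule
  std::map<std::string, int> modified;  // file name -> modification time
  int clock = 0;
  std::vector<std::string> log;         // commands "run" so far

  // Pretend a file was just created or edited.
  void touch(const std::string& file) { modified[file] = ++clock; }

  void build(const std::string& target) {
    auto rule = rules.find(target);
    if (rule == rules.end()) {
      // No rule, so this has to be a source file that already exists.
      if (modified.count(target) == 0) {
        throw std::runtime_error("no rule to make target " + target);
      }
      return;
    }
    bool stale = modified.count(target) == 0;  // missing outputs are stale
    for (const std::string& dep : rule->second.dependencies) {
      build(dep);  // bring the dependency up to date first
      if (!stale && modified.at(dep) > modified.at(target)) {
        stale = true;  // the dependency is newer than the output
      }
    }
    if (stale) {
      log.push_back(rule->second.command);
      touch(target);  // running the command produces a fresh output
    }
  }
};
```

Loading the three rules from the handwritten Makefile and calling build("hello") logs all three commands the first time. After touch("main.cpp"), a second build logs only the main.o compile and the link, matching the output shown above.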
People who like Make files will cringe at the redundancy here. If you are interested in using Make further, I recommend Prof. Bruce Maxwell's short tutorial as an introduction. However, for this post, the simple version will suffice as CMake will generate Make files for us.
CMake
CMake is a build system generator. It does not call G++ directly but rather generates Make files. This indirection exists because CMake aims to support multiple operating systems, and Make is normally only found on Unix. For example, on Windows CMake typically uses MSVC and MSBuild instead of G++ and Make.
Let's create a simple CMakeLists.txt file for this project:
# Tell CMake the minimum version this project requires
cmake_minimum_required(VERSION 3.30)

# Configure the project
project(
  # Project name
  ZeroToCMake
  # Project version
  VERSION 1.0
  # This is a C++ project
  LANGUAGES CXX
)

# Instruct CMake to create an executable called `hello`
# using the source in `main.cpp` and `say_hello.cpp`
add_executable(hello
  main.cpp say_hello.cpp)
You can then use CMake to generate Make files:
$ cmake -S . -B build
-- The CXX compiler identification is GNU 14.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: <long path>/build
The Make files CMake generates are a lot more complicated, but fundamentally do the same thing as the handwritten file. For example, in build/CMakeFiles/build.make you will find
CMakeFiles/hello.dir/main.cpp.o: CMakeFiles/hello.dir/flags.make
CMakeFiles/hello.dir/main.cpp.o: /home/fred/Documents/myblog/projects/zero-to-cmake/part-7/main.cpp
CMakeFiles/hello.dir/main.cpp.o: CMakeFiles/hello.dir/compiler_depend.ts
@$(CMAKE_COMMAND) -E cmake_echo_color "--switch=$(COLOR)" --green --progress-dir=/home/fred/Documents/myblog/projects/zero-to-cmake/part-7/build/CMakeFiles --progress-num=$(CMAKE_PROGRESS_1) "Building CXX object CMakeFiles/hello.dir/main.cpp.o"
/usr/bin/c++ $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -MD -MT CMakeFiles/hello.dir/main.cpp.o -MF CMakeFiles/hello.dir/main.cpp.o.d -o CMakeFiles/hello.dir/main.cpp.o -c /home/fred/Documents/myblog/projects/zero-to-cmake/part-7/main.cpp
Here you can see instructions to build main.cpp.o using main.cpp (and two other files, flags.make and compiler_depend.ts) by running /usr/bin/c++ (an alias for G++ on my machine). Most of the extra complexity comes from handling updates to header files (something the handwritten Make file could not do) and showing detailed progress to the user.
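To make the idea of a build system generator concrete, here is a toy generator that turns a target description into Make rules shaped like the handwritten Makefile. It is only an illustration of the concept and says nothing about how CMake itself is implemented; generate_makefile and object_name are hypothetical names, and the compiler is hard-coded to g++.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: "main.cpp" -> "main.o".
std::string object_name(const std::string& source) {
  return source.substr(0, source.rfind('.')) + ".o";
}

// Emit Make rules for one executable built from the given sources,
// mirroring the handwritten Makefile: one link rule, then one
// compile rule per source file. Commands are indented with a tab.
std::string generate_makefile(const std::string& executable,
                              const std::vector<std::string>& sources) {
  std::string objects;        // " main.o say_hello.o"
  std::string compile_rules;  // one "x.o: x.cpp" rule per source
  for (const std::string& source : sources) {
    const std::string object = object_name(source);
    objects += " " + object;
    compile_rules += object + ": " + source + "\n";
    compile_rules += "\tg++ -c -o " + object + " " + source + "\n";
  }
  std::string link_rule = executable + ":" + objects + "\n";
  link_rule += "\tg++ -o " + executable + objects + "\n";
  return link_rule + compile_rules;
}
```

Calling generate_makefile("hello", {"main.cpp", "say_hello.cpp"}) plays the role of the add_executable(hello main.cpp say_hello.cpp) line: a short declarative description goes in, and the repetitive rules come out. The real generated files add header tracking, progress output, and per-platform details on top of this.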
You can build and run the generated Make file just like before:
$ cd build && make
[ 33%] Building CXX object CMakeFiles/hello.dir/main.cpp.o
[ 66%] Building CXX object CMakeFiles/hello.dir/say_hello.cpp.o
[100%] Linking CXX executable hello
[100%] Built target hello
$ ./hello
Hello reader.
Like before, you can find the object files under build/CMakeFiles/hello.dir.
TL;DR
- CMake generates Make files during the configure operation.
- Make files specify how to build files given their dependencies and a command. These are run as part of the build operation.
- C++ splits compilation and linking into two discrete steps, and separating them explicitly can improve compile times.
- This knowledge can help you debug. An error during configuration has to be an error in your CMake scripts. A compile-time error is a problem with your source, and a link-time error is most likely a problem with your CMake scripts.