Zero to CMake

Published: 2024-12-10
Reading Time: 12 minutes.
Written By: Fred Buchanan
Edited By: Nat Monahan

Mastery of your build tools is essential for mastery of a programming language. C++ makes this particularly difficult, partly because of the language's age. CMake is often treated as opaque magic, constructed from copy-pasted incantations and half-remembered shell commands. In this article, I will pry open a small part of the inner workings of building a C++ project.

This is not a CMake tutorial. Henry Schreiner's Modern CMake is a better introduction than I could hope to write. Instead, this post is aimed at beginner and intermediate C++ users, who perhaps have only worked on an existing CMake codebase, and who want to understand how their build system comes together from fundamental building blocks.

CMake is a build system generator. That is, CMake does not compile your code directly, but instead generates files used by another build system, which in turn runs the commands to compile your code. This article is organized in reverse, starting with the raw commands used to compile a project, then showing how to hand-write the files for one build system (Make), and finally showing the CMake project used to generate those Make files.

Building a Single File

The simplest way to compile C++ is by using a compiler directly. The compiler's job is to take human-readable source code and convert it into the machine code your computer understands. Say you have the following source code:

// main.cpp
#include <cstdlib>  // EXIT_SUCCESS
#include <iostream>
#include <string>

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");
  return EXIT_SUCCESS;
}

Saving it to main.cpp and using G++ (the GNU C++ compiler), you can compile and run it like so:

$ g++ -o hello main.cpp
$ ./hello
Hello reader.

Here we tell G++ to take the file main.cpp and output it (-o) as an executable in a file named hello. This is fine for a one-file project, but as your program grows, you will probably want to split it into multiple files.

Multiple files

Suppose you wanted to extract the say_hello() function into its own file:

// main.cpp
#include <iostream>

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");

  // Exit with no error
  return EXIT_SUCCESS;
}

// say_hello.cpp
#include <iostream>
#include <string>

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}

G++ lets you compile multiple files at once by listing each one:

$ g++ -o hello main.cpp say_hello.cpp
main.cpp: In function ‘int main(int, char**)’:
main.cpp:6:3: error: ‘say_hello’ was not declared in this scope
    6 |   say_hello("reader");
      |   ^~~~~~~~~

Oh no, it seems main.cpp cannot find say_hello(). This is because C++ separates the concepts of declaration and definition. A declaration tells the compiler that a function exists and what its signature is; a definition provides the function's body. While main.cpp can reach say_hello.cpp's definitions (through a process called linking that will be covered later), it cannot see any declaration of say_hello(). Let's add a declaration of say_hello() to main.cpp:

// main.cpp
#include <cstdlib>
#include <string>

void say_hello(const std::string& name);
// ^ Declaration of `say_hello`. A function can have many declarations,
// but only one definition.

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");

  // Exit with no error
  return EXIT_SUCCESS;
}

When you re-run g++ -o hello main.cpp say_hello.cpp, the program will compile successfully. With only two files and one function, writing the declaration by hand in main.cpp is okay, but as your program grows, repeating declarations in every file that needs them would quickly become a burden.

Header files

A header file, conventionally given a .h (or .hpp) extension, is a fragment of C++ source that typically contains only declarations. When the compiler encounters the line #include "say_hello.h", that directive is replaced with the contents of the file say_hello.h.

System Headers

We have already seen system headers included such as #include <string>. This looks for a file called string in the system header directory. On my Fedora laptop, it is at /usr/include/c++/14/string, but the file's exact location varies depending on your Linux distribution.

Create a simple header file for say_hello():

// say_hello.h
#pragma once
#include <string>

void say_hello(const std::string& name);

And modify main.cpp to use it:

-void say_hello(const std::string& name);
+#include "say_hello.h"

All this has done is move the declaration of say_hello() into the header file. Semantically, the version of main.cpp using the header file is identical to the one without it. The inclusion of the header file is done by the preprocessor before the compiler attempts to understand the C++ source.

Compile with the same command as before. You will notice that the header file is not listed as an input file on the command line. This is because the header is not compiled on its own. Instead, its contents are pasted into main.cpp by the preprocessor. This is an important difference between header files and source files.

The Source So Far

// say_hello.h
#pragma once
#include <string>

void say_hello(const std::string& name);

// say_hello.cpp
#include <iostream>
#include <string>

// Notice that the header file is also included in the
// file with the definitions. This will help the compiler
// figure out when the declaration and definitions have
// drifted.
#include "say_hello.h"

/// Print "Hello {name}." to stdout
void say_hello(const std::string& name) {
  std::cout << "Hello " << name << ".";
}

// main.cpp
#include <cstdlib>

#include "say_hello.h"

int main(int argc, char* argv[]) {
  // Say hi to our reader
  say_hello("reader");

  // Exit with no error
  return EXIT_SUCCESS;
}

# Compile and run:
g++ -o hello main.cpp say_hello.cpp
./hello

Linking

As your project grows, your compilation times will grow. C++ is a notoriously tricky language to parse, and compilation times can be slow. Suppose you only want to change the name in main.cpp without changing the implementation of say_hello(). It would be good to do this without recompiling all of say_hello.cpp.

G++ can compile these files separately so that they can be later combined in a process called linking. The first step is to create an object file from each source file. These object files contain the compiled machine code for each function in the source file, as well as the name and type signature of each symbol.

You create object files by specifying the -c command line flag:

$ g++ -c -o say_hello.o say_hello.cpp
$ g++ -c -o main.o main.cpp
$ ls *.o
main.o say_hello.o

Disassembly

You can look at the contents of these files using the objdump command. Running objdump -d say_hello.o will print the assembly code generated from the say_hello() function. It is not important that you understand this dump; it is only used here as a learning aid to show the purpose of the .o files.

For those not following along at a terminal, here is the disassembly from my machine. You will notice that say_hello has changed its name to _Z9say_helloRKNSt7__.... This is because of name mangling, which C++ uses to encode a function's parameter types into its symbol name so that overloaded functions can be told apart.

$ objdump -d say_hello.o
say_hello.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 83 ec 10          	sub    $0x10,%rsp
   8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
   c:	be 00 00 00 00       	mov    $0x0,%esi
  11:	bf 00 00 00 00       	mov    $0x0,%edi
  16:	e8 00 00 00 00       	call   1b <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b>
  1b:	48 89 c2             	mov    %rax,%rdx
  1e:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  22:	48 89 c6             	mov    %rax,%rsi
  25:	48 89 d7             	mov    %rdx,%rdi
  28:	e8 00 00 00 00       	call   2d <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2d>
  2d:	be 00 00 00 00       	mov    $0x0,%esi
  32:	48 89 c7             	mov    %rax,%rdi
  35:	e8 00 00 00 00       	call   3a <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x3a>
  3a:	90                   	nop
  3b:	c9                   	leave
  3c:	c3                   	ret

You can then link these files into an executable. Again, G++ can be used for this:

$ g++ -o hello main.o say_hello.o
$ ./hello
Hello reader.

This way, if you make a change to main.cpp, you only need to recompile main.o and then relink. Likewise for changes to say_hello.cpp. For example, if you change the name "reader" to "C++ expert", you would only run the following:

$ g++ -c -o main.o main.cpp
$ g++ -o hello main.o say_hello.o
$ ./hello
Hello C++ expert.

Disassembly

Linking is the process of combining the symbols (e.g. functions) from a bunch of object files into a single file while updating the addresses that refer to those symbols so that they are correct. You can search for say_hello in the final executable to confirm this:

$ objdump -d hello | grep say_hello
  40124c:       e8 0e 03 00 00          call   40155f <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
000000000040155f <_Z9say_helloRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>:

That first line is the call to say_hello() inside of main(). The second line is the beginning of the definition from the previous disassembly of say_hello.o. However, the address (the number ending in 40155f) is different from before, when it was at address zero. Also, this 40155f shows up in the call statement. This is the essence of linking: ensuring each symbol has a unique address and then updating all the references to the new address.

This may feel like overkill for such a small project, but once you have hundreds of files, you will be thankful for these changes.

Make files

The process of building even this simple program has ballooned. You could put all the commands into a bash script, but then you would lose the advantage of splitting the compiling and linking phases: not having to recompile unchanged files. Luckily, this is exactly what Make was designed for.

A Make file follows a simple pattern:

output: dependencies
	command

That says, "When any of dependencies changes, generate the file output by running command". Note that the command line must be indented with a tab character, not spaces. Make decides whether a file is out of date by comparing modification times from the file metadata: if the output does not exist, or any dependency is newer than the output, Make reruns the command. It then does this recursively for any files that depend on the newly generated outputs.

Here is an example Make file for the program:

# Makefile

hello: main.o say_hello.o
	g++ -o hello main.o say_hello.o

main.o: main.cpp
	g++ -c -o main.o main.cpp

say_hello.o: say_hello.cpp
	g++ -c -o say_hello.o say_hello.cpp

Save this to a file called Makefile and run make hello to build the program. You should see a list of the commands run:

$ make hello
g++ -c -o main.o main.cpp
g++ -c -o say_hello.o say_hello.cpp
g++ -o hello main.o say_hello.o
$ ./hello
Hello C++ expert.

Now, if you change main.cpp back to "reader" and rerun make hello, you will see it does not recompile say_hello.o.

$ make hello
g++ -c -o main.o main.cpp
g++ -o hello main.o say_hello.o
$ ./hello
Hello reader.

People who like Make files will cringe at the redundancy here. If you are interested in using Make further, I recommend Prof. Bruce Maxwell's short tutorial as an introduction. However, for this post, the simple version will suffice as CMake will generate Make files for us.

CMake

CMake is a build system generator. It does not call G++ directly but instead generates files for another build tool, in this case Make. This indirection exists because CMake aims to support multiple operating systems and toolchains, and Make is normally only found on Unix-like systems. For example, on Windows CMake can instead generate Visual Studio project files, which are built with MSBuild and compiled with MSVC.

Let's create a simple CMakeLists.txt file for this project:

# Tell CMake what version to use
cmake_minimum_required(VERSION 3.30)

# Configure the project
project(
  # Project name
  ZeroToCMake

  # Project version
  VERSION 1.0

  # This is a C++ project
  LANGUAGES CXX
)

# Instruct CMake to create an executable called `hello`
# using the source in `main.cpp` and `say_hello.cpp`
add_executable(hello
  main.cpp say_hello.cpp)

You can then use CMake to generate Make files:

$ cmake -S . -B build
-- The CXX compiler identification is GNU 14.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: <long path>/build

The Make files CMake generates are a lot more complicated, but fundamentally do the same thing as the handwritten file. For example, in build/CMakeFiles/hello.dir/build.make you will find:

CMakeFiles/hello.dir/main.cpp.o: CMakeFiles/hello.dir/flags.make
CMakeFiles/hello.dir/main.cpp.o: /home/fred/Documents/myblog/projects/zero-to-cmake/part-7/main.cpp
CMakeFiles/hello.dir/main.cpp.o: CMakeFiles/hello.dir/compiler_depend.ts
	@$(CMAKE_COMMAND) -E cmake_echo_color "--switch=$(COLOR)" --green --progress-dir=/home/fred/Documents/myblog/projects/zero-to-cmake/part-7/build/CMakeFiles --progress-num=$(CMAKE_PROGRESS_1) "Building CXX object CMakeFiles/hello.dir/main.cpp.o"
	/usr/bin/c++ $(CXX_DEFINES) $(CXX_INCLUDES) $(CXX_FLAGS) -MD -MT CMakeFiles/hello.dir/main.cpp.o -MF CMakeFiles/hello.dir/main.cpp.o.d -o CMakeFiles/hello.dir/main.cpp.o -c /home/fred/Documents/myblog/projects/zero-to-cmake/part-7/main.cpp

Here you can see instructions to build main.cpp.o from main.cpp (and two other files, flags.make and compiler_depend.ts) by running /usr/bin/c++ (on my machine, an alias for G++). Most of the extra complexity comes from handling updates to header files (something the handwritten Make file could not do) and showing detailed progress to the user.

You can build and run the generated Make file just like before:

$ cd build && make
[ 33%] Building CXX object CMakeFiles/hello.dir/main.cpp.o
[ 66%] Building CXX object CMakeFiles/hello.dir/say_hello.cpp.o
[100%] Linking CXX executable hello
[100%] Built target hello
$ ./hello
Hello reader.

Like before, you can find the object files under build/CMakeFiles/hello.dir.

TL;DR

CMake generates Make files during the configure operation.
Make files specify how to build files given their dependencies and a command. These are run as part of the build operation.
C++ splits compilation and linking into two discrete steps, and separating them explicitly lets you recompile only the files that changed, which can improve build times.
This knowledge can help you debug. An error during configuration almost always points to your CMake scripts. A compile-time error is a problem with your source, and a link-time error is most likely a problem with your CMake scripts, such as a source file missing from add_executable().