Cool C++ Features and Weird Details

Some useful and cursed C++ features

C++ has a lot of small features that are either useful, cursed, or both.

This post is just a collection of things I want to remember.

Array Indexing is Symmetric

int arr[5] = {1, 2, 3, 4, 5};

printf("%d\n", arr[3]); // 4
printf("%d\n", 3[arr]); // 4

This works because:

arr[3]

is defined as:

*(arr + 3)

And:

3[arr]

is defined as:

*(3 + arr)

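Adding a pointer and an integer is commutative, so both expressions access the same element. This only applies to built-in arrays and pointers, not to an overloaded operator[] such as the one in std::vector.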

This is valid C++, but please do not write 3[arr] unless you are trying to summon demons.
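To make the equivalence concrete, here is a minimal sketch. The helper names same_element and read_via_pointer are made up for illustration; they just check that both spellings land on the same object.

```cpp
#include <cstddef>

// Hypothetical helper: true when p[i] and i[p] name the same object.
// Both expressions are defined as *(p + i), so the addresses must match.
inline bool same_element(const int* p, std::size_t i) {
    return &p[i] == &i[p];
}

// The same access written out explicitly with pointer arithmetic.
inline int read_via_pointer(const int* p, std::size_t i) {
    return *(i + p);
}
```

Calling same_element(arr, 3) on the array above should return true, and read_via_pointer(arr, 3) reads the same element as arr[3].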

Struct Alignment and Padding

Consider this struct:

struct A {
    char a; // 1 byte
    int  b; // 4 bytes, wants 4-byte alignment
    char c; // 1 byte
};

printf("%zu\n", sizeof(A)); // usually 12

The size is not just:

1 + 4 + 1 = 6

because members need to satisfy alignment requirements.

A typical layout is:

a      padding      b b b b      c      padding
1 byte 3 bytes      4 bytes      1 byte 3 bytes

So the total becomes 12 bytes.

The compiler also pads the end of the struct so that arrays work correctly:

A arr[10];

Each A object must still have proper alignment.

Reordering Members

We can reduce padding by grouping smaller members together:

struct B {
    char a; // 1 byte
    char c; // 1 byte
    int  b; // 4 bytes
};

printf("%zu\n", sizeof(B)); // usually 8

Typical layout:

a      c      padding      b b b b
1 byte 1 byte 2 bytes      4 bytes

So B is smaller than A.

Struct member order can affect memory usage. This matters more when you store millions of objects.
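If you want to inspect the layout instead of guessing, offsetof and alignof help. The sketch below repeats both structs so it stands alone; the constants member_bytes, padding_in_A and padding_in_B are names invented for this example, and the "typically" values assume a common ABI with a 4-byte int.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct A { char a; int b; char c; };
struct B { char a; char c; int b; };

// Guaranteed everywhere: each member sits at a properly aligned offset,
// and the struct size is a multiple of the struct's alignment.
static_assert(offsetof(A, b) % alignof(int) == 0, "b is aligned");
static_assert(sizeof(A) % alignof(A) == 0, "arrays of A stay aligned");

// Padding = total size minus the sum of the member sizes.
constexpr std::size_t member_bytes = sizeof(char) + sizeof(int) + sizeof(char);
constexpr std::size_t padding_in_A = sizeof(A) - member_bytes;  // typically 6
constexpr std::size_t padding_in_B = sizeof(B) - member_bytes;  // typically 2
```

The exact numbers are implementation-defined, which is why the comments say "typically" rather than asserting them.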

Compiler Optimizations

int x = 5;
int y = x * 2;

The compiler may optimize this to:

int y = 10;

because the compiler knows x is 5, so x * 2 can be evaluated at compile time.

Substituting the known value of x is called constant propagation, and evaluating 5 * 2 is called constant folding.

Of course, real compilers do way more than this:

  • remove unused code
  • inline functions
  • simplify expressions
  • unroll loops
  • vectorize loops

The important idea: C++ source code is not a literal list of CPU instructions. The optimizer is allowed to transform your code as long as the observable behavior stays the same.
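Optimizer output is hard to observe from inside the program, but constexpr gives a related, checkable guarantee. It is a different mechanism: instead of hoping the optimizer folds the value, it requires a constant expression. A small sketch, with the function names doubled and doubled_at_runtime chosen only for illustration:

```cpp
// Illustrative function; constexpr allows it to run at compile time.
constexpr int doubled(int v) {
    return v * 2;
}

constexpr int x = 5;
constexpr int y = doubled(x);

// If this compiles, y was computed by the compiler, not at run time.
static_assert(y == 10, "y is a compile-time constant");

// With a run-time argument the same function is an ordinary call,
// which the optimizer may or may not inline and fold.
int doubled_at_runtime(int v) {
    return doubled(v);
}
```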

Unsafe C Library Functions

Some old C functions are very unsafe if used carelessly.

strcpy

char buffer[10];

strcpy(buffer, "This is a long string that exceeds the buffer size!");

strcpy does not check whether the destination buffer is large enough.

If the source string is too long, it writes past the end of the array.

That is a buffer overflow.

Prefer C++ types when possible:

std::string s = "This is safe";
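When a raw char buffer is unavoidable, the size check that strcpy skips has to be done by hand. Here is a minimal sketch of one way to do it. copy_if_fits and copy_as_string are hypothetical helpers, not standard functions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies src into dst only if it fits, including the
// terminating '\0'. On failure it returns false and leaves dst empty.
inline bool copy_if_fits(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return false;
    const std::size_t len = std::strlen(src);
    if (len + 1 > dst_size) {
        dst[0] = '\0';
        return false;
    }
    std::memcpy(dst, src, len + 1);  // len + 1 also copies the '\0'
    return true;
}

// The std::string version needs no size bookkeeping at all.
inline std::string copy_as_string(const char* src) {
    return std::string(src);
}
```

With the 10-byte buffer from above, copy_if_fits rejects the long string instead of overflowing.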

atoi

char str[] = "xyz";
int num = atoi(str);

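atoi has no real error handling. If the input is invalid, it just returns 0, which is indistinguishable from successfully parsing "0". If the value is out of range for int, the behavior is undefined.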

Better alternatives:

std::stoi("123"); // throws std::invalid_argument or std::out_of_range on failure

or, for low-level parsing:

std::from_chars(...);

Old C APIs are powerful, but many of them trust the programmer way too much.
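As a rough sketch of what std::from_chars usage can look like (C++17, header <charconv>): parse_int is a made-up wrapper, and rejecting trailing characters is a design choice of this example, not something from_chars does on its own.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole string as a base-10 int.
// Returns std::nullopt for empty input, garbage, trailing characters,
// or a value that does not fit in int.
inline std::optional<int> parse_int(std::string_view s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Unlike atoi, this can tell "0" apart from "xyz": the first gives an optional holding 0, the second gives an empty optional.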

Digraphs and Trigraphs

C++ has alternative spellings for some symbols.

Symbol   Digraph   Trigraph
{        <%        ??<
}        %>        ??>
[        <:        ??(
]        :>        ??)
#        %:        ??=

Digraph example:

%:include <iostream>

int main() <%
    int a<:3:> = {1, 2, 3};
    return 0;
%>

This is equivalent to:

#include <iostream>

int main() {
    int a[3] = {1, 2, 3};
    return 0;
}

Trigraphs existed for old systems whose character sets or keyboards lacked some of these characters.

Digraphs still exist. Trigraphs were removed in C++17. Either way, do not use them unless you enjoy cursed archaeology.

main is Not the Real Start

We usually think the program starts here:

int main(int argc, char** argv) {
    // ...
}

But the operating system does not directly “start C++” from main.

A lower-level entry point provided by the C runtime, often called _start on Linux, runs first.

Conceptually:

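// Conceptual pseudocode, not real runtime source.
void _start() {
    setup_runtime();                // also prepares argc and argv
    int result = main(argc, argv);
    exit(result);                   // destroys static objects, flushes streams
}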

Before main, the runtime may:

  • set up stack/environment
  • initialize global/static objects
  • initialize libc / C++ runtime
  • prepare argc and argv

After main returns, it also:

  • destroys static objects
  • flushes streams
  • exits the process

main is the C++ entry point. _start is closer to the real OS-level entry point.
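You can observe part of this from inside the language: a global object with a constructor gets initialized by the runtime before main starts (mainstream implementations do this eagerly, although the standard technically allows deferring it). A minimal sketch, where startup_log, Tracer, g_tracer and ran_before_main are all names invented for this example:

```cpp
#include <string>
#include <vector>

// Hypothetical event log. A function-local static is used so the log is
// guaranteed to exist before anything appends to it.
inline std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    Tracer() { startup_log().push_back("global constructed"); }
};

// Dynamic initialization of this object runs as part of program startup.
Tracer g_tracer;

// Call this first thing in main: the log should already be non-empty.
bool ran_before_main() {
    return !startup_log().empty();
}
```

If ran_before_main() returns true at the very top of main, some code clearly ran before main did.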

A Byte is Not Always 8 Bits

In C++:

sizeof(char) == 1

is always true.

But this means:

sizeof(char) == 1 byte

not necessarily:

1 byte == 8 bits

The number of bits in a byte is given by:

#include <climits>

CHAR_BIT

On almost all modern machines:

CHAR_BIT == 8

But the C++ standard does not require this. It only requires CHAR_BIT >= 8.

In normal competitive programming and desktop programming, assuming 8-bit bytes is fine. But technically, C++ only guarantees sizeof(char) == 1 and a byte of at least 8 bits.
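If code really depends on bit counts, it can compute them from CHAR_BIT instead of hard-coding 8. A small sketch; bits_in is a hypothetical helper name.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t
#include <limits>

// Guaranteed by the standard on every implementation.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Portable storage size of a type in bits: bytes times bits per byte.
template <class T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// unsigned char has no padding bits, so its digit count equals CHAR_BIT.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses every bit of the byte");
```

On a typical desktop bits_in<int>() is 32, but the sketch deliberately avoids asserting that.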

Integer Literal Prefixes

auto binary = 0b1010; // binary, 10
auto octal  = 012;    // octal, 10
auto hex    = 0xA;    // hexadecimal, 10

Integer literal prefixes:

Prefix      Base   Example
0b / 0B     2      0b1010
leading 0   8      012
0x / 0X     16     0xA

The octal one is the most dangerous.

int x = 010; // 8, not 10

Leading zero means octal. This is one of the most annoying C/C++ legacy traps.
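The trap can be pinned down with static_assert. The sketch below also contrasts it with run-time parsing: std::stoi defaults to base 10, so it does not treat a leading zero as octal unless you pass base 0. parse_decimal and parse_like_literal are illustrative names.

```cpp
#include <string>

// All three spell the value ten; only the prefix differs.
static_assert(0b1010 == 10, "binary");
static_assert(012 == 10, "octal");
static_assert(0xA == 10, "hexadecimal");

// The trap: zero-padding a literal silently switches it to octal.
static_assert(010 == 8, "not ten");

// std::stoi parses base 10 by default, so the leading zero is ignored.
inline int parse_decimal(const std::string& s) {
    return std::stoi(s);
}

// Base 0 asks std::stoi to detect a 0 or 0x prefix, like a literal would.
inline int parse_like_literal(const std::string& s) {
    return std::stoi(s, nullptr, 0);
}
```

So the string "010" parses as 10 with parse_decimal but as 8 with parse_like_literal.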

Recursive Lambda with Deducing this (C++23)

Before C++23, recursive lambdas often needed tricks like y_combinator or passing self manually.

In C++23, we can write:

auto dfs = [&](this auto&& self, int u) -> void {
    for (int v : graph[u]) {
        self(v);
    }
};

dfs(0);

Here, self refers to the lambda itself.

This makes recursive lambdas much cleaner.

Older style:

auto dfs = [&](auto&& self, int u) -> void {
    for (int v : graph[u]) {
        self(self, v);
    }
};

dfs(dfs, 0);

The C++23 version removes the annoying extra self(self, ...).

This is very nice for DFS-style code, but online judges may not support C++23 yet.
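For judges stuck on C++17, the y_combinator trick mentioned earlier gets most of the way there. This is a minimal sketch along the lines of the commonly circulated version, not a standard library facility; count_reachable and its seen vector are made up to give the lambda something to do.

```cpp
#include <functional>  // std::ref
#include <utility>
#include <vector>

// Minimal y_combinator sketch: stores a callable and passes a reference to
// itself as the first argument, so the callable can recurse via self(...).
template <class F>
class y_combinator {
    F f_;
public:
    explicit y_combinator(F f) : f_(std::move(f)) {}

    template <class... Args>
    decltype(auto) operator()(Args&&... args) {
        return f_(std::ref(*this), std::forward<Args>(args)...);
    }
};

// Hypothetical example: counts the vertices reachable from start.
inline int count_reachable(const std::vector<std::vector<int>>& graph, int start) {
    std::vector<bool> seen(graph.size(), false);
    int count = 0;
    y_combinator dfs{[&](auto&& self, int u) -> void {
        if (seen[u]) return;  // guard against cycles
        seen[u] = true;
        ++count;
        for (int v : graph[u]) {
            self(v);          // no self(self, v) needed
        }
    }};
    dfs(start);
    return count;
}
```

The call site looks the same as the C++23 version: dfs(start), with the wrapper hiding the extra argument.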

Three-Way Comparison <=> (C++20)

The spaceship operator can generate comparisons automatically.

#include <compare>

struct Node {
    int x, y, id;

    auto operator<=>(const Node&) const = default;
};

This compares members in declaration order:

x first, then y, then id

With = default, the compiler rewrites <, <=, >, and >= in terms of <=>, and it also implicitly declares a defaulted operator==.

Custom Ordering

#include <compare>

struct Point {
    int x, y;

    std::strong_ordering operator<=>(const Point& other) const {
        if (auto cmp = x <=> other.x; cmp != 0) {
            return cmp;
        }
        return other.y <=> y; // y descending
    }

    bool operator==(const Point& other) const = default;
};

This sorts by:

x ascending
y descending

A user-provided <=> does not generate operator== for you, so if you write a custom <=>, also default or define operator==.

ranges::sort with Projection (C++20)

Normally, to sort by .second:

sort(a.begin(), a.end(), [](auto const& x, auto const& y) {
    return x.second < y.second;
});

With ranges projection:

ranges::sort(a, {}, [](auto const& p) {
    return p.second;
});

The middle {} means “use the default comparator”.

So this means:

sort by projected key p.second

For structs, member pointer projection is even cleaner:

struct Edge {
    int u, v, w;
};

vector<Edge> e;

ranges::sort(e, {}, &Edge::w);

This sorts edges by weight.

Use auto const& p in the projection if the element is large. auto p copies the element.

if / switch Initializer (C++17)

C++17 lets us declare a variable inside an if condition:

if (auto it = mp.find(x); it != mp.end()) {
    cout << it->second << '\n';
}

The variable it only exists inside the if / else statement.

This avoids leaking temporary variables into the outer scope.

Equivalent older style:

auto it = mp.find(x);
if (it != mp.end()) {
    cout << it->second << '\n';
}

Use case:

if (auto [it, ok] = st.insert(x); ok) {
    // inserted successfully
}

This is useful when the variable is only needed for the condition.
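The same syntax works for switch, which the examples above do not show. A minimal sketch, with describe_size being a hypothetical function:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: n is visible inside every case of the switch,
// but not after the closing brace of the switch statement.
inline std::string describe_size(const std::vector<int>& v) {
    switch (const auto n = v.size(); n) {
        case 0:
            return "empty";
        case 1:
            return "single element";
        default:
            return std::to_string(n) + " elements";
    }
}
```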

<bit> Utilities (C++20)

C++20 added useful bit functions in <bit>.

#include <bit>

std::popcount(x);
std::countl_zero(x);
std::countr_zero(x);

Common ones:

Function                 Meaning
std::popcount(x)         number of set bits
std::countl_zero(x)      leading zero bits
std::countr_zero(x)      trailing zero bits
std::has_single_bit(x)   whether x is a power of two
std::bit_width(x)        number of bits needed to represent x

Example:

unsigned x = 12; // 1100

std::popcount(x);    // 2
std::countr_zero(x); // 2

These functions are safer than compiler builtins like:

__builtin_clz(x);
__builtin_ctz(x);

because the standard functions are well-defined for 0.

std::countr_zero(0u); // OK

But builtins like __builtin_ctz(0) are undefined behavior.

These functions only accept unsigned integer types, and passing a signed value does not compile. Prefer unsigned values when doing bit tricks.

Summary

Useful features:

  • if (init; condition) keeps temporary variables scoped.
  • <bit> gives safe standard bit operations.
  • <=> reduces comparison boilerplate.
  • ranges::sort projections make sorting by key cleaner.
  • C++23 deducing this makes recursive lambdas nicer.

Cursed but useful details:

  • arr[i] and i[arr] are equivalent.
  • struct padding can change object size.
  • main is not the true low-level entry point.
  • sizeof(char) == 1 does not mean one byte is always 8 bits.
  • leading 0 means octal.
This post is licensed under CC BY 4.0 by the author.