Legal Move Generation in Chess Using Bit Manipulation

Apr 20, 2025
17 minute read

This post is part of my series on making Shiro, a chess engine written in Rust.

Recently, watching Sebastian Lague's Chess Coding Adventure video has inspired me to make my own chess engine. I'm naming the engine Shiro (white in Japanese (しろ/白)), after a certain chess-playing anime character. I set my sights set high, aiming to achieve 3000 elo.

To play like a grandmaster though, aside from having strong heuristics and search optimizations, an engine must be fast. Searching millions of positions is no easy task and with every bit of additional computation capacity, an engine grows slightly stronger. Hence, I chose to use Rust due to its speed and rising popularity.

More important than language, the implementation must also be efficient, so I made it a point to not cut corners and use the fastest implementations at each stage of the engine. With goals set in place, we can begin implementing Shiro. We begin with move generation, the first feature of any chess engine.

What is Move Generation?

Move generation is the process of taking in a chess position (board state) and generating the available moves of each piece. Move generation can either be legal (only legal moves allowed) or pseudo-legal (some moves may be illegal, leaving the king in check).

What is counted as move generation?

At the time of writing, I learned that its typically better to do pseudo-legal generation and check legality later due to the nature of alpha-beta pruning combined with move-ordering. However, I did not know about this while writing my move generator, so I chose to generate only legal moves.

There are many ways to perform move generation, but they all boil down to teaching a computer program the rules of chess. The hard part of move generation is going from pseudo-legal moves to legal moves, and the handling of some special moves like enpassant (seriously, why is that in the game) and castles.

One method would be to generate all pseudo-legal moves, then prune out illegal moves by simulating each move and checking whether our king is in check (can be captured by an opponent move). This is extremely slow since generating a single legal move involves generating all the pseudo-legal moves of the next depth as well. There are techniques to improve this method, such as ensuring king safety by checking the squares along axis of the king, which make it viable. With optimizations, this form of move generation can usually search up to 100 million nodes per second (nps).

How do we benchmark move generation?

"Search up to 100 million nps" is a bit misleading. The typical benchmark for move generation is perft, which counts how many possible board configurations there are after x ply. Move generators running perft will usually not explore the last layer of search (leaf nodes). Instead, they opt to simply count up configurations, making a big difference to perceived speed.

Despite this being the most common class of move generators, there is another method to go much much faster. Using a bitboard representation of the game, precomputed lookup maps, and some clever translation of chess rules into bitwise operations, move generators can go incredibly fast. The fastest move generators in the world use the techniques outlined here. The fastest, Gigantua at 1.6 billion nps, has a great explanation very similar to the techniques discussed in this post.

Bitboards

Chess as a computation problem presents unique opportunities for optimization.

Using a single u64 integer as a bitmask, we can represent the occupancy status of all 64 squares on a chess board. So with just 8 u64s, we can encode the positions of all 6 piece types and the 2 colors. Specifically, we encode the positions of all pawns (regardless of color) into a single bitboard, and then bitwise AND the pawn bitboard with a color bitboard to get all pawns of a color.

Not only are bitboards extremely space-efficient, they also lend themselves to bitwise operations to easily alter gamestate and query information. For example, to generate all single pawn pushes for black, we can simply do ((BLACK & PAWNS) >> 8) & EMPTY. This is saying "select pieces that are both black and pawns, move them down a square, and ensure the target square is empty." The resultant bitboard will have a bit set everywhere a black pawn can move to in a single push. In just a few cpu instructions, we generated (pseudo-legal) pawn moves that would otherwise take a different board representation dozens of if-statements and far more memory.

White's occupancy is represented by 0x0000201840014206
  . . . . . . . .  => 0 0
  . . . . . . . .  => 0 0
  . . X . . . . .  => 2 0
  . . . X X . . .  => 1 8
  . X . . . . . .  => 4 0
  . . . . . . . X  => 0 1
  . X . . . . X .  => 4 2
  . . . . . X X .  => 0 6

Using functions like .trailing_zeroes() combined with other bit operations allows us to loop over the set bits, extracting individual piece positions or moves from bitboards.

// Generating and extracting pawn pushes (contrived example)
let mut bitboard: u64 = (BLACK & PAWNS) >> 8;
while bitboard != 0 {
    let pos = 63 - bitboard.trailing_zeroes();
    bitboard &= bitboard - 1;   // turns off rightmost 1

    moves.add(MoveType::PAWN, pos - 8, pos);
}

Leading vs. Trailing Zeroes

Trailing zeroes is non-trivially faster than leading zeroes due to hardware support (TZCNT and BSLR). Note, we subtract from 63 since counting square position starting from the top-left corner is more intuitive.

Hopefully I have convinced you of the speed and elegance bitboards offer. Next, lets see how we can use bitboards to generate moves, or in other words, encode the rules of chess into bitboards.

Pseudo-Legal Move Generation

With bitboards, we can generate pseudo-legal moves very efficiently. The trick is to use precomputed lookup maps to query for the pseudo-legal moves for a given piece. In other words, given a piece's position, we can instantly look up all of its possible move inside an array.

For the knights and kings, these lookup maps look something like:

knight move mask — The 34th entry of the knight move lookup array should be this mask

knight move mask 2 — Some masks will have fewer moves near the edge

king move mask — King moves are encoded the same way

Both black and white pieces can use the same lookup maps since they move the same way. These lookup maps move computation from runtime to compile-time, making pseudo-legal move generation practically free.

The slider pieces (rooks, bishops, queens) work a bit differently since they can be blocked. A rook can move horizontally and vertically, but its attack rays are stopped by blockers (both ally and opponent pieces block line of sight). We could simply store these rays in lookup tables and then check along the rays for presence of blockers, but that's far too slow. We would require 4 additional loops per piece, each having to do condition checks. Doing that would defeat the point of bitboards as it would probably be faster to do this looping using other board representations.

A note on queens

Queen moves are best generated by counting a queen as both a rook and bishop. This is simple, works well, and avoids extra logic.

Luckily, with some cleverness, bitboards can handle slider moves faster than any other representation. To take advantage of bitboards and generate slider moves in the presence of blockers, we can incorporate the presence of blockers into the lookup map. In other words, before, we had a function that mapped piece position to possible moves. Now our goal is to make a function that maps piece position and blocker positions to possible moves.

Notice for any slider piece, only the 14 squares on its attack axes need to be checked for blockers. Since each square is either empty or occupied, there are $2^{14} = 4096$ possible permutations of blockers. If we could somehow perfectly hash every permutation of occupancy down to a range of $[0,4096)$, we could make a lookup map for sliders. Specifically, our goal is this:

static ROOK_MOVES_LOOKUP: [[u64; 4096]; 64] = ...;

pub fn get_rook_moves(rook_pos: u64, occupied: u64) {
    let rook_attack_axes = ORTHOGONAL_RAY_LOOKUP[rook_pos]; // 14 bits will be set
    let blockers = rook_attack_axes & occupied; 
    let hash = perfect_hash(blockers);
    assert!(0 <= hash && hash < 4096);
    return ROOK_MOVES_LOOKUP[rook_pos][hash];
}

A 2MiB static array may seem like a lot, but it's the necessary price we pay for performance. Far more attention-worthy is the implementation of this perfect_hash function. How can we perfectly hash every permutation blockers in reasonable time? Enter PDEP/PEXT instructions...

PEXT Hashing

In my opinion, one of the few great things to come out of CISC instruction sets is the pext/pdep instructions. These machine instructions can rearrange the bits of one register according to the mask provided by another. As a result, we can very quickly and perfectly hash occupancy bitboards.

To visualize how PEXT works, here's a small 16-bit example:

input  : 0b_0010_1010_0011_0101
mask   : 0b_0010_1100_0010_1110
output : 0b_0000_0000_0110_1010

The bits of the input are extracted according the mask and then put together to form the 7 right bits of the output (there are 7 bits set in the mask). PDEP works the exact opposite, instead depositing the least significant bits of the input onto the output.

With these two instructions, we can efficiently hash the blocker mask to generate slider moves.

Architecture support

PEXT and PDEP came along in Intel's bmi2 extension set. It first appeared around 2013 and is widely supported by most modern x86 CPUs today. The instruction(s) are extremely niche but for chess engines, they are the best thing since sliced bread. In the rare case a microarchitecture does not support PEXT, other perfect hash methods exist, like magic number hashing and first rank attacks.

highlighted blockers — Blocker occupancy mask

rook's possible moves — Rook's pseudo-legal moves.

I won't bore you with the construction of these lookup maps; they can be dumb, slow, and simple since they are computed only once (ideally at compile-time).

The result is pseudo-legal move lookups in just a few assembly instructions.

pub fn get_rook_moves(pos: usize, occupied: u64) -> u64 {
    *PEXT_ORTHO_LOOKUP[pos][_pext_u64(occupied, ORTHO_RAY_LOOKUP[pos])]
}

A few things of note though, the bishop lookup map can be 4 times smaller---512KiB---since we don't need to care about the presence of blockers on the edge of the board when it comes to bishops (take a moment to convince yourself of this fact). The rook lookup however, does need the additional squares on the edge since a rook on an edge needs to check for blockers along that edge.

Another interesting implementation detail: I used a build.rs script to generate these lookup maps as literals, since Rust's support for const fn is pretty limited, particularly when it comes to loops and intrinsics. Surprisingly, rustc was able to parse through the additional megabytes of code fairly quickly.

Special Moves

Pawn moves do not use lookup maps; instead, we can use bit shifting to generate pushes, double-pushes, captures, and promotions for all pawns at once.

As a quick example, to generate pseudo-legal pawn captures for white, we do

let left_captures  = ((WHITE & PAWNS & !FILEA) << 9) & BLACK;
let right_captures = ((WHITE & PAWNS & !FILEH) << 7) & BLACK;

To find possible promotions, we can mask out pawn_moves & RANK8.

It turns out, applying chess knowledge as bit manipulation is surprisingly elegant and enables fast pseudo-legal move generation. However, our goal is legal move generation, which is the real challenge.

What about castling and enpassant?

enpassant and castling must be more-or-less hardcoded. However, there are a few caveats so we will revisit these two later.

Legal Move Generation

The next step is to prune out illegal moves. Surprisingly (or maybe not), move legality can also be checked quickly using bit magic. To get from pseudo-legal moves to legal moves, we really just have to check one condition: will this move leave the king in check?

There are 3 types of pseudo-legal moves that leave the king in check:

Not taking care of an existing check (blocking, capturing, or moving king).
Moving a pinned piece.
Moving the king into enemy line of sight.

The core idea to handling all 3 of these cases it to make special masks that mask out illegal bits. When preparing to generate moves for a position, we will generate a check mask, pin mask, and seen mask to handle each of the three cases respectively.

Check Mask

The check mask is used to prune illegal moves when the king is in check. Specifically, we want to keep only moves that block the check or capture the attacking piece. Our goal is to be able to prune pseudo legal moves with a simple & operation:

let legal_moves = pseudo_legal_moves & checkmask;

There are 3 cases to consider:

No checks: If the king is not in check, every bit of the check mask is set (ANDing will do nothing).
One check: This is the interesting case; we will cover this in depth.
Two checks: No piece other than the king can move in double-check, so the check mask is 0.

pseudo legal moves of a knight — The knight's pseudo-legal moves

knight legal moves — AND to get knight's legal moves

Note, if the king were checked by a knight, the check mask would have the knight's position's bit set.

With the check mask, we can prune all illegal moves when the king is in check with a single bitwise AND. There is a little overhead to computing the check mask, but it's only done once per position, and can be done simultaneously with the pin mask, which we will look at next.

Pin Mask

The pin mask is the same idea but there's a bit more nuance. With the pin mask, we want to be able to select all pinned pieces and then generate legal moves of pinned pieces.

The pinned mask is defined as follows:

A bit of the pinned mask is set if it lies on a ray from an opponent slider to the ally king AND that ray contains exactly one blocking piece. The opponent slider is included, and the ally king is excluded.

Let's see an example (black to move):

orthogonal pinmask — Orthogonal pin mask

Observe, a pinned piece can only move along its pinned axis. To select pinned pieces, we can do

let pinned_rooks = ally_mask & ROOKS & pin_mask;

Then to generate its moves (separate from unpinned pieces):

let pinned_rook_moves = get_rook_moves(pinned_rooks, opponent_occupancy) & pin_mask_ortho;

Note, we actually have 2 pin masks: one for orthogonal rays and one for diagonal rays. (The combined pin mask is pin_mask_ortho | pin_mask_diag). This is to prevent cases like re4 being a legal move in the example above.

Pinned knights can never move since they can't move on any pinned axis. Pinned queens don't need any special handling if they are treated as a combined rook and bishop. Pinned pawns are a little trickier, but generally, use pin_mask_ortho for pushes, pin_mask_diag for captures, and for enpassant... we'll talk about it later.

Computing check and pin masks

Computing these masks fast is important. Try to squeeze as much of the computation into lookup tables and bit tricks as possible. My implementation first checks whether an enemy pawn can capture the king, then knights, and finally looping through the rays of the king.

for attacker_sqr in opponent_sliders_ortho:
    let attack_mask = ortho_ray_lookup(attacker_sqr);
    if attack_mask & ally_king != 0 {   // Same axis, potential check found
        let ray = get_ray_between(king_sqr, attacker_sqr); // 64x64 lookup table
        let blockers = ray & occupied;
        match blockers.count_ones() {   // How many blockers along ray?
            0 => {
                checks += 1;    // Count checks to determine if in double check
                checkmask |= ray;
                checkmask.set_bit(attacker_sqr);
            }
            1 => {
                pinmask_hv |= ray;
                pinmask_hv.set_bit(attacker_sqr);
            }
            _ => {} // blocked by > 1 piece (safe)
        }
    }
}

Putting it All Together

With check and pin masks at our disposal, we can generate legal moves for knights, sliders, and pawns! To prove my point, I'll run through some of the trickiest examples:

To generate legal double pushes for white pawns,

let pin_mask_horiz = pin_mask_ortho & rank_of_square(our_king);
let movable_pawns = PAWNS & WHITE & !(pin_mask_diag | pin_mask_horiz);
let single_pushes = (movable_pawns << 8) & empty;
let double_pushes = (single_pushes << 8) & empty & RANK4 & checkmask;

Note, we generate single pushes first, since if we went straight to double pushes, our pawns pawns would be able to jump over other pieces. Additionally, we extract the horizontal pin mask, since horizontally pinned pawns can never move (but vertically pinned ones can).

To generate legal captures for black pawns,

let relevant_pawns = BLACK & PAWNS & !pin_mask_ortho;
let pinned = relevant_pawns & pin_mask_diag;
let unpinned = relevant_pawns & !pin_mask_diag;
let left_captures = (   ((unpinned & !FILEA) >> 7) | 
                        ((pinned & !FILEA) >> 7) & pinmask_diag 
                    ) & WHITE & checkmask
let right_captures = (   ((unpinned & !FILEA) >> 9) | 
                        ((pinned & !FILEA) >> 9) & pinmask_diag 
                    ) & WHITE & checkmask
// bonus: promotions
let promotions = (left_captures | right_captures) & RANK1;

For all moves except pawn pushes, pinned and unpinned pieces should be handled separately.

And finally, to run through an entire example for the sake of completeness,

fn generate_ortho_slider_moves() {
    // get all relevant masks and bitboards
    let sliders_ortho = ally_mask & ROOKS & QUEENS;
    let mut pinned = sliders_ortho & pin_mask_ortho;
    let mut unpinned = sliders_ortho ^ pinned;

    while pinned != 0 {
        let pos = pinned.pop_rightmost_bit();

        let mut moves = get_rook_moves(pos, occupied)
            & (!ally_mask)          // we can land on enemy or empty squares
            & (checkmask)
            & (pinmask_ortho);
        while moves != 0 {
            let target = moves.pop_rightmost_bit();
            add_move(Variant::Normal, pos, target)
        }
    }
    while unpinned != 0 {
        let pos = unpinned.pop_rightmost_bit();

        let mut moves = pext_hv_lookup(pos, occupied) & (!ally_mask) & (checkmask);
        while moves != 0 {
            let target = moves.pop_rightmost();
            add_move(Variant::Normal, pos, target)
        }
    }
}

Notice that in all these examples, the only operations going on are simple bit ops. There's a few branches in the form of bit loops, but they are unavoidable. As a result, move generation is blazingly fast.

King Moves

There's one last move type and mask we haven't talked about yet: king moves and the seen mask. King moves are unique since the king cannot move to a square that is "seen" by an opponent piece. The fastest way to handle king moves is to build a "seen" mask which has a bit set for every square an enemy piece can move to. (This includes pinned enemy pieces; ie, we cannot move into the line of sight of a pinned enemy rook). Then, to generate king moves, we do:

let moves = king_lookup(king_pos) & enemy_or_empty & !seen_by_enemy;

Another benefit of this seen_by_enemy mask is that we can use it to generate castle moves. Since a king cannot castle if the squares between it and the rook are attacked by an opponent, we can AND the castle path with the seen_by_enemy mask to find whether we are allowed to castle.

Computing the seen mask

The seen_by_enemy mask is easily computed using the lookup maps for each piece. Just loop over each piece and OR the result of its lookup map to the seen mask. Pawns can be computed all at once with shifting.

Enpassant

The dreaded enpassant. The good news is, we now have all the tools necessary to efficiently query the board information necessary for enpassant. The bad news is, enpassant logic and edge cases still suck. Honestly, there's no way other than to hard code it.

One important edge case is that enpassant can cause two pieces to be removed off an axis at once:

en passant edge case — Black's enpassant capture...

Our pinmasks do not account for this (2 blockers = no pin detected). Instead, to prevent this edge case, we can remove both pawns from the occupancy, then see if there is an unblocked ray from a rook/queen to the king.

One other consideration that must be made for enpassant is when making a move, the captured piece is on a square different from the one moved to. Enpassant is the only type of move that does this, so the make_move() function must uniquely account for this.

In terms of performance, enpassant handling is very fast---double pushes are rare, and double pushes with an opponent pawn adjacent to it are even rarer. This means we can early-return and avoid enpassant logic in most search paths.

Final Notes

By encoding chess logic with bit manipulation, we can elegantly avoid all unnecessary branches and use the fewest instructions possible to generate moves. Adding a sprinkle of template programming and some efficient make_move() implementation and we have a recipe to compete with the fastest move generators on the planet.

There's some stuff I left out, such as actually making a move by manipulating the board state, but I believe the interesting part and bottle-neck is typically move generation, not move making. However, its worth noting that the ideas of bit manipulation are applicable to move making.

In the next post, I will discuss optimizing bitboard-based move generation with a focus on Rust.