
How Base64 encoding works

Base64 converts arbitrary binary data into a string of printable ASCII characters by re-grouping bits and mapping small values to a 64-character alphabet. This page walks through the encoding algorithm step by step, shows the lookup table, explains padding, and covers the URL-safe variant.

Property               Value
Input unit             3 bytes (24 bits)
Output unit            4 characters (6 bits each)
Size expansion         ~33% (4 chars per 3 bytes)
Alphabet size          64 characters, plus = for padding
URL-safe substitution  + → - and / → _ (RFC 4648 §5)
Algorithm complexity   O(n), one pass over input bytes
Decoding               Exact inverse of encoding, no key needed

The core idea

A byte holds 8 bits, which means 256 possible values (0–255). Most of those values are not printable, safe ASCII characters: ASCII only defines 0–127, and many of those are control codes or have special meaning in protocols. Base64 sidesteps this by using only 6 bits per output character, which gives 2⁶ = 64 possible values, one for each of the 64 characters that make up the alphabet.

Because each output character carries 6 bits instead of 8, Base64 needs 4 characters to represent the same information as 3 bytes (4 × 6 = 24 bits = 3 × 8). This 4:3 expansion is where the ~33% size increase comes from.
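As a quick check of the 4:3 arithmetic, the sketch below computes the padded output length for a given input size and compares it with Python's standard base64 module. The helper name encoded_length is made up for this illustration.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length: 4 characters for every started 3-byte group."""
    groups = (n_bytes + 2) // 3  # integer ceiling of n_bytes / 3
    return 4 * groups

for n in (1, 2, 3, 10, 300):
    payload = bytes(n)  # n zero bytes; the content does not affect the length
    assert len(base64.b64encode(payload)) == encoded_length(n)
    print(f"{n} bytes -> {encoded_length(n)} characters")
```

For small inputs the overhead is higher than 33% because of padding: a single byte still produces four characters.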

The encoding algorithm

Step 1 — Serialize to bytes

Base64 encodes bytes, not characters. If the input is text, it must first be serialized to a byte sequence. UTF-8 is the standard choice: it correctly encodes every Unicode codepoint including emoji, CJK characters, and accented letters.

Input: "hi!" → UTF-8 bytes
h  →  0x68  →  104  →  01101000
i  →  0x69  →  105  →  01101001
!  →  0x21  →  33   →  00100001
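The serialization step can be reproduced in a few lines of Python. This is only an illustration of the table above, not the tool's own code.

```python
text = "hi!"
data = text.encode("utf-8")  # str -> bytes

# Pairing characters with bytes one-to-one only works here because
# every character in "hi!" is ASCII and takes a single UTF-8 byte.
for ch, byte in zip(text, data):
    print(f"{ch}  ->  0x{byte:02x}  ->  {byte:3d}  ->  {byte:08b}")

# Non-ASCII characters expand to several bytes under UTF-8.
rocket = "🚀".encode("utf-8")
print(len(rocket))  # 4
```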

Step 2 — Group into 3-byte blocks

The bytes are processed three at a time (24 bits). If the total byte count is not divisible by three, the last group is padded with zero bits to reach 24 bits.

3 bytes concatenated = 24 bits
01101000 01101001 00100001

Step 3 — Split into 6-bit groups

The 24-bit block is divided into four 6-bit values. Each value is a number between 0 and 63 that will index into the Base64 alphabet.

24 bits → four 6-bit values
011010 | 000110 | 100100 | 100001
  26        6       36      33
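Steps 2 and 3 amount to treating the three bytes as one 24-bit integer and reading it back six bits at a time. A minimal Python sketch, using shifts and a 0x3F mask as one possible way to do the split:

```python
block = bytes([0x68, 0x69, 0x21])  # "hi!" as UTF-8 bytes

# Step 2: concatenate the three bytes into a single 24-bit integer.
combined = int.from_bytes(block, "big")
print(f"{combined:024b}")

# Step 3: extract four 6-bit values, most significant first.
sextets = [(combined >> shift) & 0x3F for shift in (18, 12, 6, 0)]
print(sextets)  # [26, 6, 36, 33]
```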

Step 4 — Look up in the alphabet

Each 6-bit value is mapped to its Base64 character using the lookup table below.

26→a, 6→G, 36→k, 33→h → "aGkh"
26 → a
 6 → G
36 → k
33 → h

Output: aGkh
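Putting the four steps together, a from-scratch encoder might look like the sketch below. The function name encode_base64 is hypothetical, and in practice you would normally call a library routine such as Python's base64.b64encode; the point here is only to make the steps concrete.

```python
import base64
import string

# Values 0-63 map to A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_base64(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0 for full groups, 1 or 2 for the last one
        # Zero-fill a short final group to 24 bits before splitting.
        combined = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(combined >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters that came only from fill bytes become "=".
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(encode_base64("hi!".encode("utf-8")))  # aGkh

# Cross-check against the standard library on inputs of every remainder.
for sample in (b"", b"a", b"ab", b"yes", b"hello world"):
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
```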

The Base64 alphabet

The 64 characters are split into four groups. The full table (value → character):

Values   Range   Characters
0–25     A–Z     ABCDEFGHIJKLMNOPQRSTUVWXYZ
26–51    a–z     abcdefghijklmnopqrstuvwxyz
52–61    0–9     0123456789
62–63    + /     +/

The = character is a padding marker — it is not part of the 64-value alphabet, but is appended to the output to make its length a multiple of 4.
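In code, the table is just a 64-character string, and the reverse lookup needed for decoding is a dictionary built from it. The names ALPHABET and DECODE_MAP below are chosen for this sketch.

```python
import string

# Index in this string == 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse table: character -> 6-bit value.
DECODE_MAP = {ch: value for value, ch in enumerate(ALPHABET)}

# "=" is padding only and has no value of its own.
assert len(ALPHABET) == 64 and "=" not in ALPHABET

print([ALPHABET[v] for v in (26, 6, 36, 33)])  # ['a', 'G', 'k', 'h']
print(DECODE_MAP["+"], DECODE_MAP["/"])        # 62 63
```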

Padding with =

If the total number of input bytes is not a multiple of 3, one or two bytes of zero-padding are added before encoding, and the corresponding output characters are replaced by =:

Input bytes     Remainder mod 3   Padding chars   Example
Multiple of 3   0                 none            "yes" → eWVz
Remainder 1     1                 ==              "a" → YQ==
Remainder 2     2                 =               "ab" → YWI=
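The three rows of the table can be checked with Python's standard base64 module. The helper padding_count is a name invented for this sketch; it derives the number of = characters from the input length alone.

```python
import base64

def padding_count(n_bytes: int) -> int:
    """Number of '=' characters for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

for text in ("yes", "a", "ab"):
    data = text.encode("utf-8")
    encoded = base64.b64encode(data).decode("ascii")
    assert encoded.count("=") == padding_count(len(data))
    print(f"{text!r}: {len(data)} byte(s), remainder {len(data) % 3}, encoded as {encoded}")
```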

Decoding — running the algorithm in reverse

Decoding is the exact inverse of encoding:

  1. Strip any trailing = padding characters.
  2. Look up each character in the alphabet to get its 6-bit value.
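  3. Concatenate groups of four 6-bit values to form 24-bit blocks, then split each block back into three bytes. If padding was stripped, the final group has only two or three characters and yields one or two bytes; its leftover low bits are discarded.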
  4. Decode the resulting byte sequence using the target charset (UTF-8 by default).

If any character in the input is not in the alphabet (and is not a padding =), the decoder should reject the input as invalid.
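The decoding steps can be sketched as a strict decoder in Python. The function name decode_base64 and the choice to raise ValueError are assumptions made for this example; it requires padded input whose length is a multiple of 4, which is stricter than some decoders, and it stops at the byte level (step 4, charset decoding, is left to the caller).

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DECODE_MAP = {ch: value for value, ch in enumerate(ALPHABET)}

def decode_base64(encoded: str) -> bytes:
    # Step 1: strip trailing padding, after checking the overall shape.
    stripped = encoded.rstrip("=")
    if len(encoded) % 4 != 0 or len(encoded) - len(stripped) > 2:
        raise ValueError("invalid length or padding")
    out = bytearray()
    for i in range(0, len(stripped), 4):
        group = stripped[i:i + 4]  # 4 characters, or 2-3 for the final group
        combined = 0
        for ch in group:
            # Step 2: look up each character's 6-bit value.
            if ch not in DECODE_MAP:
                raise ValueError(f"invalid character: {ch!r}")
            combined = (combined << 6) | DECODE_MAP[ch]
        # Step 3: left-align a short final group to 24 bits, then keep only
        # the bytes that carry real data (4 chars -> 3, 3 -> 2, 2 -> 1).
        combined <<= 6 * (4 - len(group))
        out += combined.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)

print(decode_base64("aGkh").decode("utf-8"))  # hi!
```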

URL-safe Base64 (Base64URL)

Standard Base64 uses + (index 62) and / (index 63). Both characters have reserved meanings in URLs: + is commonly decoded as a space in query strings, and / is the path separator. Left unescaped in a URL, they can cause parsing errors or silent corruption.

RFC 4648 §5 defines URL-safe Base64, known as Base64URL, which simply substitutes:

Standard → URL-safe substitutions
+  →  -   (index 62)
/  →  _   (index 63)

All other characters remain the same. JSON Web Tokens (JWTs), OAuth tokens, and many authentication systems use Base64URL. The "URL-safe" toggle in base64tool applies this substitution automatically.
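To see the substitution in action, the sketch below encodes three bytes chosen so that every output character lands on index 62 or 63, using Python's standard base64 module. It illustrates the RFC 4648 §5 alphabet only and is not the tool's implementation.

```python
import base64

# 0xFB 0xFF 0xFE splits into the 6-bit values 62, 63, 63, 62.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard)  # +//+
print(url_safe)  # -__-

# The URL-safe form is the standard form with two characters swapped.
assert standard.translate(str.maketrans("+/", "-_")) == url_safe

# Both forms decode to the same bytes with the matching decoder.
assert base64.urlsafe_b64decode(url_safe) == base64.b64decode(standard) == data
```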


UTF-8, ASCII, and Latin-1

Base64 operates on bytes. The character set (charset) setting controls how the input string is serialized into bytes before encoding, and how bytes are interpreted after decoding:

  • UTF-8 — The default. Correctly encodes every Unicode character, including emoji (🚀), accented letters (é), and CJK characters. Always use UTF-8 unless you have a specific reason not to.
  • ASCII — Characters above codepoint 127 are truncated to their low 7 bits. Suitable only for pure ASCII input.
  • Latin-1 — Maps characters to their ISO 8859-1 byte values. Useful for compatibility with legacy systems that produced Latin-1 encoded Base64.

If you are decoding Base64 that was produced by an older system and the output looks garbled, try switching the charset to Latin-1.
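The effect of the charset is easy to demonstrate with a single accented letter. The Python sketch below uses the standard codecs named "utf-8" and "latin-1"; how base64tool itself handles each charset is not shown here.

```python
import base64

text = "é"

# Same character, different byte sequences, different Base64 output.
utf8_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
latin1_b64 = base64.b64encode(text.encode("latin-1")).decode("ascii")
print(utf8_b64)    # w6k=
print(latin1_b64)  # 6Q==

# Decoding with the wrong charset produces garbled text...
garbled = base64.b64decode(utf8_b64).decode("latin-1")
print(garbled)  # Ã©

# ...or fails outright when the bytes are not valid UTF-8.
try:
    base64.b64decode(latin1_b64).decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```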

Performance characteristics

Base64 encoding and decoding are O(n) in the number of input bytes: each byte is processed exactly once with no look-back or lookahead. Modern browsers execute Base64 operations at hundreds of megabytes per second using native btoa / atob implementations. base64tool adds a UTF-8 serialization step via TextEncoder and TextDecoder, which is also O(n) and implemented natively in all major browsers.

Frequently asked questions

Why does Base64 output increase file size by ~33%?

Base64 maps every 3 bytes (24 bits) to 4 characters (each representing 6 bits). 4 characters ÷ 3 bytes = 1.333… — so any Base64-encoded payload is ~33% larger than the original binary.

Can Base64 encode any type of file?

Yes. Base64 operates on raw bytes and is file-format-agnostic. You can encode images, PDFs, videos, executables, or any arbitrary binary data. The encoded output is always valid printable ASCII regardless of the input content.

What happens when you decode invalid Base64?

A correct decoder should throw an error or return a failure signal. Invalid characters (anything outside the 64-char alphabet plus =) or incorrect padding will cause a decoding failure. base64tool surfaces this as an error in the status strip.
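As an example of what a failure signal can look like, the sketch below uses Python's standard base64.b64decode with validate=True, which raises binascii.Error on characters outside the alphabet or on incorrect padding. The wrapper try_decode is a hypothetical helper, and other languages and tools report errors differently.

```python
import base64
import binascii
from typing import Optional

def try_decode(encoded: str) -> Optional[bytes]:
    """Return the decoded bytes, or None if the input is not valid Base64."""
    try:
        # Without validate=True, Python silently discards characters
        # that are outside the alphabet instead of rejecting them.
        return base64.b64decode(encoded, validate=True)
    except binascii.Error:
        return None

print(try_decode("aGkh"))  # b'hi!'
print(try_decode("a*kh"))  # None (invalid character)
print(try_decode("aGk"))   # None (length is not a multiple of 4)
```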

Is Base64 decoding just the encoding algorithm run backwards?

Yes. Decoding reverses every step: each Base64 character is looked up in the alphabet to get its 6-bit value, groups of four 6-bit values are concatenated to form 24 bits, and those 24 bits are split back into 3 bytes. Padding characters are ignored during this reconstruction.

When should I use URL-safe Base64 instead of standard Base64?

Use URL-safe Base64 (Base64URL) whenever the encoded string will appear in a URL, cookie, HTTP header, filename, or JWT. Standard Base64's + and / characters have special meanings in URLs and need to be percent-encoded (%2B and %2F) if left unchanged — URL-safe Base64 avoids this by substituting - and _ instead.