Encoding Schemes
Understanding different encoding schemes is crucial for system design. From shortening URLs to generating unique IDs, encoding helps us represent data efficiently while meeting specific constraints.
Why Encoding Matters
Common Use Cases
URL Shortening
- Convert long URLs to short codes
- bit.ly: 2VhK8pQ
- YouTube: dQw4w9WgXcQ
Unique ID Generation
- Human-readable identifiers
- Distributed system IDs
- Session tokens & API keys
Common Encoding Schemes
1. Base10 (Decimal)
Standard Decimal
Character Set: 0-9 (10 chars)
Use Cases: Human-readable numbers
Example: 12345678
✅ Universal understanding
✅ Easy validation
❌ Long representation
2. Base16 (Hexadecimal)
Hexadecimal
Character Set: 0-9, A-F (16 chars)
Use Cases: Memory addresses, color codes
Example: 4A3F2B1C
✅ Compact for binary data
✅ Direct byte mapping
⚠️ Case is not significant (4A and 4a are the same value), so normalize case before comparing hex strings
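The byte mapping and the case caveat can be seen with Python's built-in `bytes.hex` and `bytes.fromhex`. This is a minimal sketch; the helper name `same_hex` is made up for illustration.

```python
# Four bytes matching the example value 4A3F2B1C.
data = bytes([0x4A, 0x3F, 0x2B, 0x1C])

# Each byte maps to exactly two hex digits.
hex_lower = data.hex()          # '4a3f2b1c'
hex_upper = hex_lower.upper()   # '4A3F2B1C'

# bytes.fromhex accepts either case, so both spellings decode identically.
assert bytes.fromhex(hex_lower) == bytes.fromhex(hex_upper) == data


def same_hex(a: str, b: str) -> bool:
    """Compare two hex strings by value by normalizing case first."""
    return a.lower() == b.lower()
```

Comparing `hex_lower == hex_upper` directly is `False`, which is the usual source of bugs when hex strings are used as keys.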
3. Base32
Base32 Encoding
Character Set: A-Z, 2-7 (32 chars)
Use Cases: Case-insensitive systems
Example: JBSWY3DPEBLW64TM
✅ No case sensitivity
✅ Avoids ambiguous chars
❌ 20% longer than Base64
📝 Note: The standard (RFC 4648) alphabet omits 0, 1, 8, and 9. The digits 0 and 1 are skipped because they resemble the letters O and I (or lowercase l).
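Python's standard `base64` module implements this alphabet. The sketch below assumes the example string above is the encoding of the 10 bytes `Hello Worl` (10 bytes fill exactly 16 characters, so no `=` padding appears), and shows case-insensitive decoding via `casefold=True`.

```python
import base64

raw = b"Hello Worl"  # 10 bytes = 80 bits = 16 Base32 characters

encoded = base64.b32encode(raw).decode("ascii")  # 'JBSWY3DPEBLW64TM'

# casefold=True lets the decoder accept lowercase input as well.
decoded_upper = base64.b32decode(encoded)
decoded_lower = base64.b32decode(encoded.lower(), casefold=True)
```

Without `casefold=True`, `b32decode` rejects lowercase input, so case-insensitive systems should either pass the flag or upper-case the string first.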
4. Base62
Base62 - The URL Shortener's Choice
Character Set: 0-9, a-z, A-Z (62 chars)
Use Cases: URL shorteners, readable IDs
Example: 3D7xmK9p
✅ URL-safe without encoding
✅ High density
✅ Human-friendly
Why Base62 for URLs?
- No special characters that need URL encoding
- Case-sensitive for maximum density
- 62^7 ≈ 3.5 trillion combinations in just 7 characters
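The keyspace arithmetic is easy to check, and the same calculation gives the shortest code length for a target number of IDs. The function names here are illustrative, not from any library.

```python
def base62_capacity(length: int) -> int:
    """Number of distinct Base62 strings of exactly `length` characters."""
    return 62 ** length


def min_length(id_count: int, base: int = 62) -> int:
    """Smallest code length whose keyspace covers `id_count` distinct IDs."""
    length = 1
    while base ** length < id_count:
        length += 1
    return length


seven_chars = base62_capacity(7)          # 3521614606208, about 3.5 trillion
needed_for_billion = min_length(10 ** 9)  # 6, since 62**5 is just under 1e9
```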
5. Base64
Base64 Encoding
Character Set: A-Z, a-z, 0-9, +, / (64 chars)
Use Cases: Binary data in text format
Example: SGVsbG8gV29ybGQh
✅ Efficient for binary
✅ Standard padding with =
❌ Not URL-safe (+, /)
Base64 Variants
Standard Base64
Uses + and / (requires URL encoding)
URL-Safe Base64
Uses - and _ instead of + and /
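The difference between the two variants shows up only in the last two alphabet positions (indices 62 and 63). The sketch below uses Python's `base64` module with a payload chosen to hit exactly those positions.

```python
import base64

# 0xFB 0xEF 0xFF splits into the 6-bit groups 62, 62, 63, 63.
payload = bytes([0xFB, 0xEF, 0xFF])

standard = base64.b64encode(payload).decode("ascii")          # '++//'
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")  # '--__'

# Both decode back to the same bytes with the matching decoder.
assert base64.b64decode(standard) == payload
assert base64.urlsafe_b64decode(url_safe) == payload

# Padding appears when the input length is not a multiple of 3 bytes.
padded = base64.b64encode(b"Hi").decode("ascii")              # 'SGk='
unpadded = base64.b64encode(b"Hello World!").decode("ascii")  # 'SGVsbG8gV29ybGQh'
```

In a URL, `+` and `/` from the standard variant would need percent-encoding, which is why the URL-safe variant swaps them out.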
Encoding Density Comparison
| Encoding | Bits per Char | 8 Bytes (64 bits) Encoded Length | Efficiency |
|---|---|---|---|
| Base10 | ~3.32 bits | 20 characters | 41.5% |
| Base16 (Hex) | 4 bits | 16 characters | 50% |
| Base32 | 5 bits | 13 characters | 62.5% |
| Base62 | ~5.95 bits | 11 characters | 74.4% |
| Base64 | 6 bits | 11 characters | 75% |
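The figures in the table can be reproduced with a few lines of arithmetic. This sketch assumes "efficiency" means bits carried per character divided by the 8 bits an ASCII character occupies, and that encoded lengths exclude padding; the function names are illustrative.

```python
import math


def bits_per_char(base: int) -> float:
    """Information carried by one character of a base-`base` alphabet."""
    return math.log2(base)


def encoded_length(bits: int, base: int) -> int:
    """Fewest characters needed to represent any `bits`-bit value."""
    length = 1
    while base ** length < 2 ** bits:  # exact integer comparison
        length += 1
    return length


def efficiency_percent(base: int) -> float:
    """Bits per character as a percentage of an 8-bit character."""
    return round(bits_per_char(base) / 8 * 100, 1)


rows = [
    (base, round(bits_per_char(base), 2), encoded_length(64, base), efficiency_percent(base))
    for base in (10, 16, 32, 62, 64)
]
for base, bits, length, eff in rows:
    print(f"Base{base}: {bits} bits/char, {length} chars for 64 bits, {eff}%")
```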
Implementation Examples
Base62 Encoding/Decoding
    class Base62Encoder:
        # Character set for Base62: digits, then lowercase, then uppercase
        CHARSET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        BASE = 62

        @staticmethod
        def encode(num):
            """Convert a non-negative integer to a Base62 string"""
            if num < 0:
                raise ValueError("num must be non-negative")
            if num == 0:
                return "0"
            result = []
            while num > 0:
                remainder = num % Base62Encoder.BASE
                result.append(Base62Encoder.CHARSET[remainder])
                num = num // Base62Encoder.BASE
            # Digits come out least-significant first, so reverse them
            return ''.join(reversed(result))

        @staticmethod
        def decode(encoded):
            """Convert Base62 string back to number"""
            num = 0
            for char in encoded:
                # str.index raises ValueError for characters outside CHARSET
                num = num * Base62Encoder.BASE + Base62Encoder.CHARSET.index(char)
            return num

    # Example usage
    encoder = Base62Encoder()

    # URL shortener use case
    url_id = 125432985  # Database ID
    short_code = encoder.encode(url_id)  # "8uiQF"

    # Decode back
    original_id = encoder.decode(short_code)  # 125432985

Custom Base Encoding
    class CustomBaseEncoder:
        def __init__(self, charset):
            """Create encoder with custom character set"""
            self.charset = charset
            self.base = len(charset)
            # Create reverse lookup for decoding (O(1) per character)
            self.char_to_index = {char: i for i, char in enumerate(charset)}

        def encode(self, num):
            """Convert a non-negative integer to a string in this charset"""
            if num == 0:
                return self.charset[0]
            result = []
            while num > 0:
                result.append(self.charset[num % self.base])
                num //= self.base
            return ''.join(reversed(result))

        def decode(self, encoded):
            """Convert an encoded string back to an integer"""
            num = 0
            for char in encoded:
                # Raises KeyError for characters outside the charset
                num = num * self.base + self.char_to_index[char]
            return num

    # Example: Crockford's Base32 (excludes I, L, O to avoid confusion with
    # 1 and 0, and U to avoid accidental obscenities)
    crockford_charset = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
    crockford = CustomBaseEncoder(crockford_charset)

    # Example: URL-safe Base64 alphabet
    # Note: this encodes integers using that alphabet. It is not the
    # byte-oriented Base64 of RFC 4648, so outputs will not match base64 tools.
    urlsafe_base64_charset = (
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
    )
    urlsafe = CustomBaseEncoder(urlsafe_base64_charset)

Choosing the Right Encoding
🎯 Decision Matrix
Use Base62 when:
- ✓ Building URL shorteners
- ✓ Need human-readable IDs
- ✓ Want maximum density without special characters
- ✓ Case-sensitivity is acceptable
Use Base32 when:
- ✓ Case-insensitive systems
- ✓ Voice/phone communication of codes
- ✓ QR codes or OCR systems
- ✓ Need to avoid ambiguous characters
Use Base64 when:
- ✓ Encoding binary data (images, files)
- ✓ Email attachments (MIME)
- ✓ JWTs (which use the URL-safe variant without padding)
- ✓ Data URIs in web development
Use Hexadecimal when:
- ✓ Debugging binary data
- ✓ Color codes (#FF5733)
- ✓ Memory addresses
- ✓ Cryptographic hashes
Real-World Applications
🔗 TinyURL / bit.ly
Uses Base62 to convert numeric IDs to short codes
ID: 125432985
Base62: "8uiQF" (alphabet 0-9, a-z, A-Z)
URL: https://bit.ly/8uiQF
🎥 YouTube Video IDs
11-character Base64 variant for video IDs
Video ID: dQw4w9WgXcQ
~64 bits of entropy
2^64 possible videos
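As an illustration of why 64 bits lands on 11 characters, the sketch below encodes 8 random bytes with URL-safe Base64 and drops the padding. This is not a description of how YouTube actually generates IDs; the function name is hypothetical.

```python
import base64
import secrets


def random_video_style_id() -> str:
    """Return an 11-character URL-safe ID built from 64 random bits."""
    raw = secrets.token_bytes(8)  # 8 bytes = 64 bits
    # 8 bytes encode to 12 characters, the last of which is '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


video_id = random_video_style_id()
```

Eleven Base64 characters can hold 66 bits, so with a 64-bit payload the final character only ever takes 16 of its 64 possible values.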
🔑 API Keys
Base64 encoding of random bytes
Random: 32 bytes (43 Base64 characters without padding)
Example format (body shortened to 24 characters): sk_live_4eC39HqLyjWDarjtT1zdp7dc
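A key in this style can be sketched with Python's `secrets.token_urlsafe`, which returns unpadded URL-safe Base64 text for the requested number of random bytes. The `sk_demo_` prefix and the function name are placeholders, not any provider's real format.

```python
import secrets


def new_api_key(prefix: str = "sk_demo_", n_bytes: int = 32) -> str:
    """Return prefix + URL-safe Base64 text for n_bytes of CSPRNG output."""
    return prefix + secrets.token_urlsafe(n_bytes)


api_key = new_api_key()
body = api_key[len("sk_demo_"):]  # 43 characters for 32 random bytes
```

A readable prefix makes keys easy to recognize in logs and secret scanners, while the random body carries all of the entropy.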
🎟️ Ticket/Coupon Codes
Base32 for case-insensitive, typo-resistant codes
Code: SAVE-2KQ3-XM9P-7TRY
No 0/O, 1/I confusion
Voice-friendly
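One way to get this typo resistance is to generate codes from Crockford's Base32 alphabet and normalize user input before lookup. This is a minimal sketch; the group sizes and function names are assumptions for illustration.

```python
import secrets

# Crockford's Base32 alphabet: no I, L, O, or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Look-alike letters map to the digit they resemble; hyphens and spaces are dropped.
_NORMALIZE = str.maketrans({"O": "0", "I": "1", "L": "1", "-": None, " ": None})


def new_coupon(groups: int = 3, group_len: int = 4) -> str:
    """Return a random code such as 'XXXX-XXXX-XXXX' from the Crockford alphabet."""
    parts = [
        "".join(secrets.choice(CROCKFORD) for _ in range(group_len))
        for _ in range(groups)
    ]
    return "-".join(parts)


def normalize(code: str) -> str:
    """Canonical form for lookup: upper-case, no separators, look-alikes folded."""
    return code.upper().translate(_NORMALIZE)


typed = normalize("2kq3-xm9p-7try")  # '2KQ3XM9P7TRY'
```

Storing and comparing only the normalized form means a customer who types `o` for `0` or `l` for `1` still redeems the right code.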
Summary
Encoding schemes are fundamental building blocks for many system design problems. Choose based on:
- Density requirements: How short does it need to be?
- Character constraints: URL-safe? Case-sensitive?
- Human factors: Will people type it? Say it over the phone?
- Use case: Binary data? Numeric IDs? Random tokens?
💡 Pro Tip: For distributed ID generation, Base62 offers the best balance of density and usability. For binary data transmission, Base64 is the standard. For human communication, consider Base32 variants.