Step 1 of 6: S - Scope & Requirements
Define what our distributed file storage system needs to do and how well it should perform
📋 Requirements Gathering
💭 Interviewer's Context
"We want to build a file storage and synchronization service similar to Dropbox or Google Drive. Users should be able to upload files, access them from multiple devices, share with others, and collaborate in real-time. Think about both the technical challenges and user experience."
🎯 Functional Requirements
Core Requirements (MVP):
- File Upload/Download: Support files up to several GB
- File Synchronization: Automatically sync files between local and remote storage across multiple devices
- File Sharing: Share files/folders with permissions
- File Versioning: Keep historical versions with rollback
- Folder Management: Create, delete, move, rename folders
- Search: Find files by name and metadata
- Offline Access: Local caching with sync on reconnection
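File versioning with rollback can be sketched as an append-only history. This is a minimal in-memory sketch. It assumes each version is identified by a content hash; `content_hash` and the class names are hypothetical, and a real system would also store metadata and chunk lists. Rolling back re-commits old content as a new version instead of deleting history, so a rollback can itself be undone. The requirements do not mandate this design choice.

```python
from dataclasses import dataclass, field


@dataclass
class FileVersion:
    number: int
    content_hash: str  # hypothetical: identifies the stored content (e.g. a chunk-list hash)


@dataclass
class VersionedFile:
    path: str
    versions: list[FileVersion] = field(default_factory=list)

    def commit(self, content_hash: str) -> int:
        """Record new content as the next version and return its number."""
        number = len(self.versions) + 1
        self.versions.append(FileVersion(number, content_hash))
        return number

    def current(self) -> FileVersion:
        return self.versions[-1]

    def rollback(self, number: int) -> int:
        """Restore an earlier version by re-committing its content as a new version.

        History is append-only, so the rollback itself can be undone later.
        """
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number} for {self.path}")
        return self.commit(self.versions[number - 1].content_hash)


report = VersionedFile("/docs/report.txt")
report.commit("hash-v1")
report.commit("hash-v2")
report.rollback(1)  # creates version 3 with the content of version 1
```

How many versions to keep is one of the open questions for the interviewer. A retention limit would prune `versions` from the front.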
Nice-to-Have Features:
- Collaborative Editing: Real-time document editing
- File Comments: Add comments to files/folders
- Advanced Sharing: Link expiration, password protection
- Smart Sync: Selective sync, cloud-only files
- API Access: Third-party integrations
- Full-text Search: Search inside document content
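Full-text search is commonly built on an inverted index, which maps each word to the files containing it. Below is a toy in-memory sketch. It assumes text has already been extracted from each file, and the file IDs are hypothetical. The tokenizer is deliberately naive (no stemming, no ranking), and at this scale a dedicated search engine would more likely be used.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each token to the set of file IDs whose content contains it."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Naive tokenizer: lowercase alphanumeric runs; no stemming or stop words
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, file_id: str, text: str) -> None:
        for token in self._tokenize(text):
            self._postings[token].add(file_id)

    def search(self, query: str) -> set[str]:
        """Return files containing every query token (AND semantics)."""
        tokens = self._tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = InvertedIndex()
index.add("file-1", "Q3 budget report")
index.add("file-2", "Budget draft for review")
```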
⚡ Non-Functional Requirements
📈 Scale
- 500M total users
- 1M daily active users
- 500 files per user
- 10MB average file size
- 5GB storage per user
🚀 Performance
- Upload: 1-10 MB/s
- Download: 50-100 MB/s
- Sync latency: < 1 second
- Search: < 200 ms
- File access: < 100 ms
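The throughput targets imply long transfers for the multi-GB files in scope. The arithmetic below uses a hypothetical 5 GB file and ignores protocol overhead. It shows why an interrupted upload should resume rather than restart.

```python
MB = 10**6  # decimal megabyte
GB = 10**9  # decimal gigabyte


def transfer_seconds(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Idealized transfer time: size divided by sustained throughput."""
    return size_bytes / rate_bytes_per_s


# Hypothetical 5 GB file at the ends of the stated upload and download ranges
upload_slow = transfer_seconds(5 * GB, 1 * MB)      # 5000 s, about 83 minutes
upload_fast = transfer_seconds(5 * GB, 10 * MB)     # 500 s, about 8 minutes
download_fast = transfer_seconds(5 * GB, 100 * MB)  # 50 s
```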
🔒 Reliability
- 99.99% uptime
- 99.999% durability
- Zero data loss
- Automatic failover
- Disaster recovery
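Availability and durability targets are easier to reason about as concrete budgets. This sketch converts 99.99% uptime into yearly downtime. It also converts 99.999% durability into expected lost files across the 250 billion files in the storage estimate, assuming durability is an annual per-file survival probability (an interpretation, not stated in the requirements). Under that reading the expected loss is in the millions of files, so the durability target and the "zero data loss" goal would need to be reconciled.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget allowed by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


def expected_files_lost_per_year(num_files: int, durability: float) -> float:
    """Assumes durability is an annual per-file survival probability."""
    return num_files * (1 - durability)


downtime = downtime_minutes_per_year(0.9999)                   # ~52.6 minutes
lost = expected_files_lost_per_year(250_000_000_000, 0.99999)  # ~2.5 million files
```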
🔐 Security
- End-to-end encryption
- Access control
- Audit logging
- Data residency
- Compliance (GDPR)
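Access control for shared files and folders is often modeled as role grants inherited down the folder tree. This is a minimal sketch under several assumptions. The role ladder, the path-keyed ACL layout, and the "nearest explicit grant wins" rule are all hypothetical. The last rule lets a subfolder narrow access, and other policies (such as taking the most permissive grant along the path) are equally plausible.

```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical role ladder: each role includes the permissions below it."""
    NONE = 0
    VIEWER = 1
    EDITOR = 2
    OWNER = 3


# Hypothetical ACL layout: path -> {user: role}, grants set on files or folders
Acl = dict[str, dict[str, Role]]


def effective_role(acl: Acl, path: str, user: str) -> Role:
    """Walk from the path up to the root; the nearest explicit grant wins."""
    parts = [p for p in path.strip("/").split("/") if p]
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        grants = acl.get(prefix, {})
        if user in grants:
            return grants[user]
    return Role.NONE


def is_allowed(acl: Acl, path: str, user: str, required: Role) -> bool:
    return effective_role(acl, path, user) >= required


acl: Acl = {
    "/team": {"alice": Role.EDITOR},
    "/team/private": {"alice": Role.VIEWER},  # narrower grant overrides the parent
}
```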
🧮 Back-of-Envelope Estimation
📊 Storage Requirements
Total users: 500M
Storage per user: 5GB
Total storage: 2.5 EB
With 3x replication: 7.5 EB
Files per user: 500 files
Total files: 250 billion
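The storage arithmetic can be checked in a few lines. This sketch reproduces the figures above. It assumes decimal units (1 EB = 10^18 bytes), and binary units would shift the totals slightly.

```python
TOTAL_USERS = 500_000_000
FILES_PER_USER = 500
AVG_FILE_SIZE_BYTES = 10 * 10**6  # 10 MB, decimal units
REPLICATION_FACTOR = 3
EB = 10**18

storage_per_user = FILES_PER_USER * AVG_FILE_SIZE_BYTES  # 5 GB
total_storage = TOTAL_USERS * storage_per_user           # 2.5 EB
replicated_storage = total_storage * REPLICATION_FACTOR  # 7.5 EB
total_files = TOTAL_USERS * FILES_PER_USER               # 250 billion

print(f"{total_storage / EB} EB raw, {replicated_storage / EB} EB replicated, {total_files:,} files")
# 2.5 EB raw, 7.5 EB replicated, 250,000,000,000 files
```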
⚡ Traffic Estimates
Daily active users: 1M
File operations/user/day: 10
Total operations/day: 10M
Operations per second: ~116 QPS (daily average)
Read:Write ratio: 100:1
Data transfer/month: ~3 PB (upper bound: 10M ops/day × 10MB × 30 days)
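The request rates follow directly from the DAU count. These are daily averages. A peak-to-average factor, which the requirements do not give, would raise them.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 1_000_000
ops_per_user_per_day = 10
reads_per_write = 100  # the 100:1 read:write ratio

ops_per_day = daily_active_users * ops_per_user_per_day  # 10,000,000
avg_qps = ops_per_day / SECONDS_PER_DAY                  # ~115.7, quoted as 116
write_qps = avg_qps / (reads_per_write + 1)              # ~1.15 writes/s
read_qps = avg_qps - write_qps                           # ~114.6 reads/s
```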
🖥️ Infrastructure Needs
Application servers: 100+
Database servers: 50+
Cache servers: 20+
Load balancers: 10+
CDN edge locations: 200+
Storage nodes: 1000+
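The storage-node count depends on how much usable disk each node holds, which the estimate does not state. This sketch uses hypothetical per-node capacities to show how the 7.5 EB replicated footprint translates into node counts. Even at 1 PB per node it takes thousands of nodes, so "1000+" is best read as a floor, and no headroom for growth or uneven fill is included.

```python
REPLICATED_STORAGE_BYTES = 75 * 10**17  # 7.5 EB, from the storage estimate
TB = 10**12
PB = 10**15


def storage_nodes_needed(usable_bytes_per_node: int) -> int:
    """Minimum node count to hold the replicated data (integer ceiling division)."""
    return -(-REPLICATED_STORAGE_BYTES // usable_bytes_per_node)


# Hypothetical usable capacities per storage node
for capacity in (100 * TB, 1 * PB):
    print(f"{capacity // TB} TB/node -> {storage_nodes_needed(capacity):,} nodes")
# 100 TB/node -> 75,000 nodes
# 1000 TB/node -> 7,500 nodes
```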
🎯 Key Design Decisions from Requirements
✅ Read-Heavy Optimization
100:1 read/write ratio suggests heavy use of caching layers and a CDN for downloads
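A read-through LRU cache is one way to exploit a read-heavy workload. Hot items are served from memory, and only misses reach storage. This is a single-process sketch. What gets cached (file metadata, hot chunks) and the `fetch_from_storage` loader are hypothetical, and a deployment would use a shared cache tier rather than an in-process dictionary.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    def __init__(self, capacity: int, load: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.load = load  # called only on a cache miss
        self._entries: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self.load(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


calls: list[str] = []


def fetch_from_storage(key: str) -> bytes:  # hypothetical backing store
    calls.append(key)
    return f"contents of {key}".encode()


cache = ReadThroughLRUCache(capacity=2, load=fetch_from_storage)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
```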
✅ File Chunking Required
Large files (GB scale) need chunking for efficient transfer, resume capability, and deduplication
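Chunking with content hashes supports resume and deduplication in one mechanism. The client hashes each fixed-size chunk and uploads only chunks whose hashes the server does not already have. This is a minimal sketch. The 4 MiB chunk size and the `stored_hashes` set (standing in for a hypothetical server-side lookup) are assumptions.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunks


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, str]]:
    """Yield (chunk_index, sha256_hex) for consecutive fixed-size chunks."""
    index = 0
    while data := stream.read(chunk_size):
        yield index, hashlib.sha256(data).hexdigest()
        index += 1


def chunks_to_upload(stream: BinaryIO, stored_hashes: set[str], chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks the server does not already have.

    Skipping known hashes gives deduplication (identical content is stored once)
    and resumability (chunks that arrived before a failure are not resent).
    """
    return [i for i, digest in iter_chunks(stream, chunk_size) if digest not in stored_hashes]


# Usage with tiny 10-byte chunks so the effect is visible
data = b"A" * 10 + b"B" * 10 + b"A" * 10
stored = {hashlib.sha256(b"A" * 10).hexdigest()}
pending = chunks_to_upload(io.BytesIO(data), stored, chunk_size=10)  # [1]
```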
✅ Eventually Consistent
Real-time sync and multi-device access require eventual consistency with conflict resolution
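The requirements call for conflict resolution but do not say how conflicts are detected. One standard technique, swapped in here as an assumption, is a version vector. Each device keeps a counter of its own edits, and two versions conflict when neither vector dominates the other. The device names below are hypothetical.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # a is an ancestor of b
    AFTER = "after"            # b is an ancestor of a
    CONCURRENT = "concurrent"  # neither saw the other's edit: a conflict


def compare(a: dict[str, int], b: dict[str, int]) -> Order:
    devices = a.keys() | b.keys()
    a_le_b = all(a.get(d, 0) <= b.get(d, 0) for d in devices)
    b_le_a = all(b.get(d, 0) <= a.get(d, 0) for d in devices)
    if a_le_b and b_le_a:
        return Order.EQUAL
    if a_le_b:
        return Order.BEFORE
    if b_le_a:
        return Order.AFTER
    return Order.CONCURRENT


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise max: the vector for a version that has seen both histories."""
    return {d: max(a.get(d, 0), b.get(d, 0)) for d in a.keys() | b.keys()}
```

When `compare` returns `CONCURRENT`, the service still needs a resolution policy, such as keeping both copies and letting the user choose. The sketch leaves that product decision open.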
❓ Critical Questions to Ask the Interviewer
📁 File Characteristics
- What's the maximum file size we need to support?
- What file types are most common? (documents, media, code)
- How important is deduplication? (shared files, similar content)
- Do we need to support file previews/thumbnails?
🔄 Sync Behavior
- How quickly should changes sync between devices?
- Should we support offline editing with conflict resolution?
- Do we need real-time collaborative editing?
- How many versions should we keep per file?
🌍 Global Requirements
- Do we need to support global users? (latency requirements)
- Are there data residency/compliance requirements?
- Should files be accessible from web browsers?
- Do we need mobile app support?
💰 Business Constraints
- What's our budget for storage and bandwidth?
- Should we build custom storage or use cloud providers?
- How important is cost optimization vs. performance?
- Do we have existing infrastructure to leverage?