Step 1: Scope & Requirements

Step 1 of 6: S - Scope & Requirements

Define what our distributed file storage system needs to do and how well it should perform

📋 Requirements Gathering

💭 Interviewer's Context

"We want to build a file storage and synchronization service similar to Dropbox or Google Drive. Users should be able to upload files, access them from multiple devices, share them with others, and collaborate in real time. Think about both the technical challenges and the user experience."

🎯 Functional Requirements

Core Requirements (MVP):

  • File Upload/Download: Support files up to several GB
  • File Synchronization: Automatically sync files between local and remote storage across multiple devices
  • File Sharing: Share files/folders with permissions
  • File Versioning: Keep historical versions with rollback
  • Folder Management: Create, delete, move, rename folders
  • Search: Find files by name and metadata
  • Offline Access: Local caching with sync on reconnection

Nice-to-Have Features:

  • Collaborative Editing: Real-time document editing
  • File Comments: Add comments to files/folders
  • Advanced Sharing: Link expiration, password protection
  • Smart Sync: Selective sync, cloud-only files
  • API Access: Third-party integrations
  • Full-text Search: Search inside document content

⚡ Non-Functional Requirements

📈 Scale
  • 500M total users
  • 1M daily active users
  • 500 files per user
  • 10MB average file size
  • 5GB storage per user
🚀 Performance
  • Upload: 1-10 MB/s
  • Download: 50-100 MB/s
  • Sync latency: < 1 second
  • Search: < 200ms
  • File access: < 100ms
🔒 Reliability
  • 99.99% uptime
  • 99.999% durability
  • Zero data loss
  • Automatic failover
  • Disaster recovery
🔐 Security
  • End-to-end encryption
  • Access control
  • Audit logging
  • Data residency
  • Compliance (GDPR)
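
To make the Reliability targets concrete, the sketch below converts each percentage into a yearly budget. It assumes the 99.999% durability figure is an annual per-object probability, which is only one possible reading, and it takes the file count from the Scale list (500M users × 500 files per user). The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# From the Scale list: 500M users x 500 files per user.
TOTAL_FILES = 500_000_000 * 500


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget per year implied by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def expected_annual_object_loss(durability_percent: float, total_objects: int) -> float:
    """Expected objects lost per year, ASSUMING durability is an annual
    per-object probability (an interpretation, not stated in the section)."""
    return (1 - durability_percent / 100) * total_objects


if __name__ == "__main__":
    print(f"Downtime budget: {downtime_minutes_per_year(99.99):.2f} min/year")
    print(f"Expected loss:   {expected_annual_object_loss(99.999, TOTAL_FILES):,.0f} files/year")
```

Under these assumptions, 99.99% uptime allows about 52.6 minutes of downtime per year, while 99.999% per-object durability would imply losing roughly 2.5 million files per year. That sits uneasily with a zero-data-loss goal, which is why object stores typically quote far more nines (Amazon S3, for example, is designed for 99.999999999% durability).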

🧮 Back-of-Envelope Estimation

📊 Storage Requirements

Total users: 500M
Storage per user: 5GB
Total storage: 2.5 EB
With 3x replication: 7.5 EB
Files per user: 500 files
Total files: 250 billion
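
The storage figures above follow directly from the scale assumptions. Here is a minimal Python sketch of the arithmetic, assuming decimal units (1 GB = 1,000 MB, 1 EB = 10^9 GB). The constant and function names are illustrative.

```python
# Inputs taken from the scale assumptions in this section.
TOTAL_USERS = 500_000_000
FILES_PER_USER = 500
AVG_FILE_SIZE_MB = 10
REPLICATION_FACTOR = 3

# Decimal units (an assumption): 1 GB = 1,000 MB, 1 EB = 1,000,000,000 GB.
MB_PER_GB = 1_000
GB_PER_EB = 1_000_000_000


def storage_estimate() -> dict:
    """Derive per-user and fleet-wide storage from the inputs above."""
    storage_per_user_gb = FILES_PER_USER * AVG_FILE_SIZE_MB / MB_PER_GB
    total_storage_eb = TOTAL_USERS * storage_per_user_gb / GB_PER_EB
    return {
        "storage_per_user_gb": storage_per_user_gb,
        "total_storage_eb": total_storage_eb,
        "replicated_storage_eb": total_storage_eb * REPLICATION_FACTOR,
        "total_files": TOTAL_USERS * FILES_PER_USER,
    }


if __name__ == "__main__":
    for name, value in storage_estimate().items():
        print(f"{name}: {value:,}")
```

Writing the estimate as code makes it easy to rerun when the interviewer changes an input, such as the average file size or the replication factor.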

⚡ Traffic Estimates

Daily active users: 1M
File operations/user/day: 10
Total operations/day: 10M
Operations per second: 116 QPS
Read:Write ratio: 100:1
Data transfer/month: ~3 PB (upper bound: 10M operations/day × 10 MB average file × 30 days)
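
The operation rate above can be reproduced, and split into reads and writes, with a few lines of Python. This is a rough sketch with illustrative names. It gives an average rate only; peak load would be higher by a factor this section does not specify.

```python
# Inputs taken from the traffic assumptions in this section.
DAILY_ACTIVE_USERS = 1_000_000
OPS_PER_USER_PER_DAY = 10
READS_PER_WRITE = 100  # the 100:1 read:write ratio
SECONDS_PER_DAY = 24 * 60 * 60


def traffic_estimate() -> dict:
    """Average operations per second, split by the read:write ratio."""
    ops_per_day = DAILY_ACTIVE_USERS * OPS_PER_USER_PER_DAY
    avg_qps = ops_per_day / SECONDS_PER_DAY
    # Out of every (READS_PER_WRITE + 1) operations, one is a write.
    write_qps = avg_qps / (READS_PER_WRITE + 1)
    read_qps = avg_qps - write_qps
    return {
        "ops_per_day": ops_per_day,
        "avg_qps": avg_qps,
        "read_qps": read_qps,
        "write_qps": write_qps,
    }


if __name__ == "__main__":
    for name, value in traffic_estimate().items():
        print(f"{name}: {value:,.2f}")
```

At roughly 116 operations per second on average, only about one operation per second is a write. That is the numerical basis for optimizing the read path first.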

🖥️ Infrastructure Needs

Application servers: 100+
Database servers: 50+
Cache servers: 20+
Load balancers: 10+
CDN edge locations: 200+
Storage nodes: 1000+

🎯 Key Design Decisions from Requirements

✅ Read-Heavy Optimization

A 100:1 read/write ratio calls for caching layers in front of storage and a CDN on the download path
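
As an illustration of what a caching layer does for a read-heavy workload, here is a minimal read-through LRU cache sketch. The class name, the `loader` callback, and the capacity are hypothetical. A real deployment would more likely use a dedicated cache service than an in-process structure.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    """Serve repeated reads from memory; fall back to `loader` on a miss."""

    def __init__(self, capacity: int, loader: Callable[[str], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader  # hypothetical backing-store fetch
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads it."""
        self._entries.pop(key, None)
```

With reads outnumbering writes 100 to 1, most requests for popular files can be answered by `get` without touching the backing store, and the rare write only needs to call `invalidate`.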

✅ File Chunking Required

Large files (GB scale) need chunking for efficient transfer, resume capability, and deduplication
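
One way to get resume capability and deduplication from chunking is to split a file into fixed-size chunks and identify each chunk by a content hash. The sketch below is an illustration under assumptions: the 4 MiB chunk size, the `Chunk` record, and the `chunks_to_upload` helper are hypothetical and not specified by the requirements. Some systems use content-defined chunking instead of fixed-size chunks.

```python
import hashlib
from typing import BinaryIO, Iterator, List, NamedTuple, Set

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB; the section does not fix a size


class Chunk(NamedTuple):
    index: int   # position of the chunk within the file
    digest: str  # SHA-256 of the chunk contents, used as its identity
    size: int    # number of bytes in the chunk


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Read a buffered binary stream in fixed-size pieces and hash each one."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield Chunk(index, hashlib.sha256(data).hexdigest(), len(data))
        index += 1


def chunks_to_upload(manifest: List[Chunk], stored_digests: Set[str]) -> List[Chunk]:
    """Return only chunks the server lacks, skipping repeats within the file."""
    seen = set(stored_digests)
    pending = []
    for chunk in manifest:
        if chunk.digest not in seen:
            pending.append(chunk)
            seen.add(chunk.digest)
    return pending
```

Resuming an interrupted upload then amounts to calling `chunks_to_upload` again with the digests the server already acknowledged. Deduplication falls out of the same check, because identical chunks share a digest.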

✅ Eventually Consistent

Real-time sync across multiple devices, including offline edits, means replicas can diverge, so the design accepts eventual consistency with explicit conflict resolution
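
Conflict resolution first requires detecting that two edits were concurrent. One common technique is a version vector per file, sketched below. This is an assumption for illustration: the section does not prescribe a mechanism, and the device IDs and function names are hypothetical.

```python
from enum import Enum
from typing import Dict

VersionVector = Dict[str, int]  # device ID -> number of edits seen from that device


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # first version is an ancestor of the second
    AFTER = "after"            # first version supersedes the second
    CONCURRENT = "concurrent"  # neither saw the other's edit: a conflict


def compare(a: VersionVector, b: VersionVector) -> Order:
    """Compare two version vectors; missing devices count as zero."""
    devices = a.keys() | b.keys()
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return Order.CONCURRENT
    if a_ahead:
        return Order.AFTER
    if b_ahead:
        return Order.BEFORE
    return Order.EQUAL


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, recorded once a conflict has been resolved."""
    return {d: max(a.get(d, 0), b.get(d, 0)) for d in a.keys() | b.keys()}
```

When `compare` returns `Order.CONCURRENT`, the service has to choose a policy. Keeping both versions, with one saved as a conflicted copy, is a conservative option because it never discards a user's edit.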

Critical Questions to Ask the Interviewer

📁 File Characteristics

  • What's the maximum file size we need to support?
  • What file types are most common? (documents, media, code)
  • How important is deduplication? (shared files, similar content)
  • Do we need to support file previews/thumbnails?

🔄 Sync Behavior

  • How quickly should changes sync between devices?
  • Should we support offline editing with conflict resolution?
  • Do we need real-time collaborative editing?
  • How many versions should we keep per file?

🌍 Global Requirements

  • Do we need to support global users? (latency requirements)
  • Are there data residency/compliance requirements?
  • Should files be accessible from web browsers?
  • Do we need mobile app support?

💰 Business Constraints

  • What's our budget for storage and bandwidth?
  • Should we build custom storage or use cloud providers?
  • How important is cost optimization vs performance?
  • Do we have existing infrastructure to leverage?