Step 2: API Design & Entities
Step 2 of 6: A - API Design & Core Entities
Design the core entities and API endpoints that power our file operations
Core Entities & Relationships

- User: Account information and storage quota tracking
- File: File metadata and version information
- ChunkMetadata: Metadata and S3/GCS URLs for file chunks (actual data in blob storage)
- Device: User's devices for sync coordination
- Permission: File sharing and access control
- Blob Storage: S3/GCS for actual file data; the database only stores URLs
Important Architecture Decision: We store only metadata in our database (file info, chunk checksums, S3/GCS URLs). The actual file data is stored in blob storage services such as AWS S3, Google Cloud Storage, or Azure Blob Storage. This separation allows:
- Cost-effective storage (blob storage is much cheaper per byte than database storage)
- Direct CDN integration for fast downloads
- Horizontal scaling without database bloat
- Leveraging the cloud provider's durability guarantees
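To make the relationships concrete, here is a rough sketch of the entity shapes in TypeScript. Field names are illustrative assumptions, not a final schema; note that only `ChunkMetadata.blobUrl` points at actual bytes in S3/GCS, while everything else is metadata.

```typescript
// Illustrative entity shapes (assumed field names, not a final schema).
interface User {
  id: string;
  email: string;
  storageQuotaBytes: number;     // quota tracking
  storageUsedBytes: number;
}

interface FileRecord {           // "File" renamed to avoid clashing with the DOM File type
  id: string;
  ownerId: string;               // -> User.id
  parentFolderId: string | null;
  name: string;
  mimeType: string;
  sizeBytes: number;
  version: number;               // bumped on every content change
  chunkIds: string[];            // ordered list of ChunkMetadata.id
}

interface ChunkMetadata {
  id: string;
  checksum: string;              // e.g. "sha256:..."; also the deduplication key
  sizeBytes: number;
  blobUrl: string;               // S3/GCS object URL; the bytes live in blob storage
}

interface Device {
  id: string;
  userId: string;                // -> User.id
  lastSyncCursor: string | null; // cursor from the last /api/sync/changes call
}

interface Permission {
  fileId: string;                // -> FileRecord.id
  granteeUserId?: string;        // user-based sharing
  shareLinkToken?: string;       // link-based sharing
  role: 'viewer' | 'editor' | 'owner';
}
```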
REST API Design
Who Uses This API?

- Client applications: desktop apps, mobile apps, web JavaScript, and third-party integrations. These make API calls on behalf of users.
- NOT end users directly: users never see or call these endpoints; they interact with the UI, not raw APIs.
Example Flow: User drags a 5GB file → client app chunks it into 1,280 pieces (4MB each) → makes 1,280 API calls to POST /api/files/upload → shows a single progress bar to the user
File Operations
POST /api/files/upload - Upload file chunks with metadata
Note: Official client apps automatically chunk files (4MB each). Third-party developers can choose to implement chunking or upload smaller files directly.
Request/Response Example
Request:
{
  "name": "document.pdf",
  "size": 5242880,
  "mime_type": "application/pdf",
  "parent_folder_id": "folder_123",
  "chunks": [
    {
      "chunk_number": 1,
      "size": 4194304,
      "checksum": "sha256:abc123..."
    },
    {
      "chunk_number": 2,
      "size": 1048576,
      "checksum": "sha256:def456..."
    }
  ]
}
Response:
{
  "file_id": "file_a1b2c3d4e5",
  "status": "upload_initialized",
  "presigned_urls": [
    {
      "chunk_number": 1,
      "upload_url": "<s3 presigned url>",
      "expires_in": 3600
    }
  ],
  "deduplication_results": {
    "chunk_1": "upload_required",
    "chunk_2": "already_exists"
  },
  "upload_session_id": "session_xyz789"
}
(A sketch of how a client consumes these presigned URLs follows the endpoint list below.)
GET /api/files/{file_id} - Get file metadata and download URL
PUT /api/files/{file_id} - Update file metadata or content
DELETE /api/files/{file_id} - Delete file (move to trash)
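As referenced above, here is a minimal sketch of how a client might consume the `presigned_urls` returned by `POST /api/files/upload`. It assumes the response shape shown in the example and that the presigned URLs accept a direct HTTP PUT (the usual pattern for S3-style storage); the names here are illustrative.

```typescript
// Sketch: upload only the chunks the server reported as missing, straight to blob storage.
interface InitUploadResponse {
  file_id: string;
  presigned_urls: { chunk_number: number; upload_url: string; expires_in: number }[];
  deduplication_results: Record<string, 'upload_required' | 'already_exists'>;
  upload_session_id: string;
}

async function uploadMissingChunks(resp: InitUploadResponse, chunks: Blob[]): Promise<void> {
  for (const { chunk_number, upload_url } of resp.presigned_urls) {
    const chunk = chunks[chunk_number - 1];      // chunk_number is 1-based in the example
    const result = await fetch(upload_url, {     // presigned URL: no auth header needed
      method: 'PUT',
      headers: { 'Content-Type': 'application/octet-stream' },
      body: chunk,
    });
    if (!result.ok) {
      throw new Error(`Chunk ${chunk_number} upload failed: ${result.status}`);
    }
  }
}
```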
Sync Operations
GET /api/sync/changes - Get changes since last sync timestamp
Response Example
{
  "changes": [
    {
      "file_id": "file_456",
      "change_type": "modified",
      "timestamp": 1234567890,
      "version": 3
    }
  ],
  "cursor": "sync_token_789"
}
(A client polling sketch that uses this cursor follows the endpoint list below.)
POST /api/sync/register-device - Register device for sync notifications
GET /api/sync/conflicts - Get unresolved sync conflicts
POST /api/sync/resolve-conflict - Resolve sync conflict with chosen version
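As referenced above, a rough sketch of a device-side sync pass built on these endpoints: fetch changes since the stored cursor, apply them locally, then resolve any conflicts the server reports. The request/response shapes follow the examples above; `applyChange` and `promptUserToResolve` are hypothetical placeholders, and the `cursor` query parameter name is an assumption.

```typescript
// Sketch of a client sync pass using the cursor-based changes feed.
interface ChangeEntry {
  file_id: string;
  change_type: 'created' | 'modified' | 'deleted';
  timestamp: number;
  version: number;
}

// Hypothetical placeholders for local-apply and conflict-resolution UI.
declare function applyChange(change: ChangeEntry): Promise<void>;
declare function promptUserToResolve(conflict: { file_id: string }): Promise<number>;

async function syncOnce(cursor: string | null): Promise<string> {
  const url = '/api/sync/changes' + (cursor ? `?cursor=${encodeURIComponent(cursor)}` : '');
  const resp = await fetch(url);
  if (!resp.ok) throw new Error(`Sync failed: ${resp.status}`);
  const { changes, cursor: nextCursor } =
    await resp.json() as { changes: ChangeEntry[]; cursor: string };

  for (const change of changes) {
    await applyChange(change);                   // download / update / delete local copy
  }

  // Surface and resolve conflicts the server could not merge automatically.
  const conflicts = await (await fetch('/api/sync/conflicts')).json() as { file_id: string }[];
  for (const conflict of conflicts) {
    const chosenVersion = await promptUserToResolve(conflict);
    await fetch('/api/sync/resolve-conflict', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ file_id: conflict.file_id, chosen_version: chosenVersion }),
    });
  }

  return nextCursor;                             // persist for the next sync pass
}
```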
Sharing Operations
POST /api/sharing/create-link - Create shareable link with permissions
POST /api/sharing/invite-user - Invite user to collaborate on folder
GET /api/sharing/permissions/{file_id} - Get current sharing permissions
PUT /api/sharing/permissions/{file_id} - Update sharing permissions
Folder Operations
POST /api/folders - Create new folder
GET /api/folders/{folder_id}/contents - List folder contents
POST /api/folders/{folder_id}/move - Move folder to new location
GET /api/search - Search files and folders
Deep Dive: How Client Apps Handle File Upload
Here's a simplified example of how client applications (desktop app, web interface) handle file uploads with automatic chunking:
// Client-side JavaScript/TypeScript implementation
class FileUploader {
  private CHUNK_SIZE = 4 * 1024 * 1024; // 4MB chunks
  private MAX_PARALLEL_UPLOADS = 4; // Upload 4 chunks simultaneously

  async uploadFile(file: File): Promise<void> {
    // Step 1: Initialize upload and get file ID
    const fileId = await this.initializeUpload(file);

    // Step 2: Calculate total chunks
    const totalChunks = Math.ceil(file.size / this.CHUNK_SIZE);

    // Step 3: Chunk and upload with parallelization
    const uploadPromises: Promise<void>[] = [];
    let chunksUploaded = 0;

    for (let i = 0; i < totalChunks; i++) {
      // Create chunk
      const start = i * this.CHUNK_SIZE;
      const end = Math.min(start + this.CHUNK_SIZE, file.size);
      const chunk = file.slice(start, end);

      // Control parallelization: wait for a slot before starting another chunk
      if (uploadPromises.length >= this.MAX_PARALLEL_UPLOADS) {
        await Promise.race(uploadPromises);
      }

      // Upload chunk and track progress
      const uploadPromise = this.uploadChunk(fileId, chunk, i, totalChunks)
        .then(() => {
          chunksUploaded++;
          this.updateProgressBar(chunksUploaded / totalChunks * 100);
          // Remove completed promise from the in-flight list
          const index = uploadPromises.indexOf(uploadPromise);
          uploadPromises.splice(index, 1);
        });
      uploadPromises.push(uploadPromise);
    }

    // Wait for all remaining chunks
    await Promise.all(uploadPromises);

    // Step 4: Finalize upload
    await this.finalizeUpload(fileId);
  }

  private async uploadChunk(
    fileId: string,
    chunk: Blob,
    chunkNumber: number,
    totalChunks: number
  ): Promise<void> {
    const checksum = await this.calculateSHA256(chunk);

    // Check if chunk already exists (deduplication)
    const exists = await this.checkChunkExists(checksum);
    if (exists) {
      // Just register the chunk, don't upload data
      await this.registerChunk(fileId, chunkNumber, checksum);
      return;
    }

    // Upload with retry logic
    let retries = 3;
    while (retries > 0) {
      try {
        const response = await fetch('/api/files/upload', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/octet-stream',
            'X-File-Id': fileId,
            'X-Chunk-Number': chunkNumber.toString(),
            'X-Total-Chunks': totalChunks.toString(),
            'X-Checksum': checksum,
          },
          body: chunk
        });
        // fetch() only rejects on network errors, so treat HTTP errors as failures too
        if (!response.ok) {
          throw new Error(`Chunk ${chunkNumber} upload failed: ${response.status}`);
        }
        break; // Success
      } catch (error) {
        retries--;
        if (retries === 0) throw error;
        await this.delay(Math.pow(2, 3 - retries) * 1000); // Exponential backoff: 2s, 4s
      }
    }
  }

  private updateProgressBar(percentage: number): void {
    // Update UI - single progress bar for entire file
    document.getElementById('upload-progress')!.style.width = percentage + '%';
    document.getElementById('upload-text')!.innerText =
      `Uploading... ${Math.round(percentage)}%`;
  }

  // initializeUpload, finalizeUpload, calculateSHA256, checkChunkExists,
  // registerChunk, and delay are omitted here for brevity.
}

// Usage - what happens when user drops a file
const dropzone = document.getElementById('dropzone')!; // drop-target element in the page
dropzone.addEventListener('drop', async (e) => {
  const file = e.dataTransfer!.files[0]; // User drops a 5GB video
  const uploader = new FileUploader();
  await uploader.uploadFile(file); // Handles everything transparently
  alert('Upload complete!'); // User sees success
});

Smart Features
- Parallel uploads (4 chunks at once)
- Deduplication check before upload
- Automatic retry with exponential backoff
- Single progress bar for user
Production Enhancements
- Resume from last chunk on failure (sketched below)
- Adaptive chunk size based on network conditions
- Priority queue for small files
- Background upload service
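For the resume enhancement mentioned above, one possible approach (a sketch only; the storage key format and state shape are assumptions) is to persist per-file upload progress locally and skip chunks the server has already acknowledged:

```typescript
// Sketch: persist upload progress so an interrupted upload can resume.
interface UploadState {
  uploadSessionId: string;
  completedChunks: number[];   // chunk numbers already acknowledged by the server
}

function saveUploadState(fileId: string, state: UploadState): void {
  localStorage.setItem(`upload:${fileId}`, JSON.stringify(state));
}

function loadUploadState(fileId: string): UploadState | null {
  const raw = localStorage.getItem(`upload:${fileId}`);
  return raw ? (JSON.parse(raw) as UploadState) : null;
}

// Inside FileUploader.uploadFile, chunks listed in completedChunks would be skipped,
// and saveUploadState() would be called after each successful uploadChunk().
```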
Database Technology Comparison
| Aspect | SQL (PostgreSQL) | NoSQL (DynamoDB) | Hybrid Approach |
|---|---|---|---|
| Metadata Storage | ACID transactions, complex queries | Fast single-item operations | SQL for metadata, NoSQL for chunks |
| Scalability | Vertical scaling, read replicas | Horizontal scaling, auto-sharding | Best of both worlds |
| Consistency | Strong consistency | Eventual consistency | Strong for metadata, eventual for content |
| Query Flexibility | Complex joins and aggregations | Key-value access patterns | Complex queries where needed |
| Use Case | File relationships, permissions | Chunk locations, high-volume ops | Production-ready compromise |
Recommended: Hybrid Approach
Use PostgreSQL for file metadata, user management, and permissions (requires ACID). Use DynamoDB/Cassandra for chunk locations and high-volume operations (requires scale). This gives us strong consistency where it matters and scale where we need it.
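A sketch of how that split might look behind a thin repository layer. `SqlClient` and `KeyValueClient` are hypothetical abstractions standing in for PostgreSQL and DynamoDB/Cassandra drivers, and the table/column names are illustrative, not a prescribed schema:

```typescript
// Sketch: route metadata queries to PostgreSQL, chunk lookups to a key-value store.
interface SqlClient {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}
interface KeyValueClient {
  get<T>(table: string, key: string): Promise<T | null>;
  put<T>(table: string, key: string, value: T): Promise<void>;
}

class StorageRouter {
  constructor(private sql: SqlClient, private kv: KeyValueClient) {}

  // Strong consistency where it matters: file records and permissions in PostgreSQL.
  getFileWithPermissions(fileId: string) {
    return this.sql.query(
      `SELECT f.*, p.role, p.grantee_user_id
         FROM files f
         LEFT JOIN permissions p ON p.file_id = f.id
        WHERE f.id = $1`,
      [fileId],
    );
  }

  // Scale where we need it: chunk locations in DynamoDB/Cassandra, keyed by checksum.
  getChunkLocation(checksum: string) {
    return this.kv.get<{ blobUrl: string; sizeBytes: number }>('chunks', checksum);
  }

  registerChunk(checksum: string, location: { blobUrl: string; sizeBytes: number }) {
    return this.kv.put('chunks', checksum, location);
  }
}
```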
Key Design Decisions
- File Chunking: The client app automatically splits files into 4MB chunks for parallel transfer, deduplication, and resumable uploads. Transparent to users.
- Separate Metadata: Store file metadata separately from content for fast queries and atomic operations.
- Permission Model: Flexible permission system supporting user-based and link-based sharing.
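To illustrate the last point, a small sketch of how user-based and link-based grants could share a single access check. The types and the `canAccess` helper are illustrative assumptions, not a prescribed implementation:

```typescript
// Sketch: one access check covering both user-based and link-based sharing.
type Role = 'viewer' | 'editor' | 'owner';

interface PermissionGrant {
  fileId: string;
  role: Role;
  granteeUserId?: string;    // present for user-based sharing
  shareLinkToken?: string;   // present for link-based sharing
  expiresAt?: number;        // optional expiry, mainly for links
}

function canAccess(
  grants: PermissionGrant[],
  fileId: string,
  requiredRole: Role,
  ctx: { userId?: string; linkToken?: string },
): boolean {
  const rank: Record<Role, number> = { viewer: 1, editor: 2, owner: 3 };
  return grants.some(g =>
    g.fileId === fileId &&
    rank[g.role] >= rank[requiredRole] &&
    (g.expiresAt === undefined || g.expiresAt > Date.now()) &&
    ((g.granteeUserId !== undefined && g.granteeUserId === ctx.userId) ||
     (g.shareLinkToken !== undefined && g.shareLinkToken === ctx.linkToken))
  );
}
```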