Step 2 of 6: API Design & Core Entities

Design the core entities and API endpoints that power our file operations.

šŸ—ļø Core Entities & Relationships

• User: user_id (PK), email, name, storage_used, created_at
• File: file_id (PK), user_id (FK), parent_folder_id, name, size, mime_type, version, created_at
• ChunkMetadata: chunk_id (PK), file_id (FK), chunk_number, checksum, storage_url
• Device: device_id (PK), user_id (FK), device_name, device_type, last_sync
• Permission: permission_id (PK), file_id (FK), user_id (FK), permission_type, granted_at
• Blob Storage (AWS S3 / GCS): actual chunk data, stored as objects, 3x replication, CDN distribution

Relationships: a User owns Files and has Devices; a File splits into ChunkMetadata entries and is shared via Permissions; ChunkMetadata references objects in Blob Storage.

šŸ‘¤ User

Account information and storage quota tracking

šŸ“„ File

File metadata and version information

🧩 ChunkMetadata

Metadata and S3/GCS URLs for file chunks (actual data in blob storage)

šŸ“± Device

User's devices for sync coordination

šŸ” Permission

File sharing and access control

ā˜ļø Blob Storage

S3/GCS for actual file data, database only stores URLs

šŸ“Œ Important Architecture Decision: We store only metadata in our database (file info, chunk checksums, S3/GCS URLs). The actual file data is stored in blob storage services like AWS S3, Google Cloud Storage, or Azure Blob Storage. This separation allows:

• Cost-effective storage (blob storage is cheaper)
• Direct CDN integration for fast downloads
• Horizontal scaling without database bloat
• Leveraging the cloud provider's durability guarantees
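
To make that split concrete, here is a minimal sketch of the two central records as TypeScript types. The fields mirror the entity list above; the type names themselves are illustrative, and storage_url is the only pointer from our database to the actual bytes in blob storage:

// Metadata lives in our database; note that no entity holds file bytes.
interface FileRecord {
  file_id: string;              // PK
  user_id: string;              // FK -> User
  parent_folder_id: string | null;
  name: string;
  size: number;                 // bytes
  mime_type: string;
  version: number;
  created_at: Date;
}

// One row per 4MB chunk; the chunk bytes themselves live in S3/GCS.
interface ChunkMetadata {
  chunk_id: string;             // PK
  file_id: string;              // FK -> FileRecord
  chunk_number: number;         // position within the file
  checksum: string;             // e.g. "sha256:abc123...", also used for deduplication
  storage_url: string;          // S3/GCS object URL, the only pointer to real data
}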

šŸ”Œ REST API Design

šŸŽÆ Who Uses This API?

āœ… Client Applications

Desktop apps, mobile apps, web JavaScript, third-party integrations

These make API calls on behalf of users

āŒ NOT End Users Directly

Users never see or call these endpoints

Users interact with UI, not raw APIs

Example Flow: User drags in a 5GB file → client app splits it into 1,280 chunks (4MB each) → makes 1,280 API calls to POST /api/files/upload → shows a single progress bar to the user (see the client-side deep dive below)

šŸ“ File Operations

POST /api/files/upload

Upload file chunks with metadata

šŸ“Œ Note: Official client apps automatically chunk files (4MB each). Third-party developers can choose to implement chunking or upload smaller files directly.

Request/Response Example

Request:

{
  "name": "document.pdf",
  "size": 5242880,
  "mime_type": "application/pdf",
  "parent_folder_id": "folder_123",
  "chunks": [
    {
      "chunk_number": 1,
      "size": 4194304,
      "checksum": "sha256:abc123..."
    },
    {
      "chunk_number": 2,
      "size": 1048576,
      "checksum": "sha256:def456..."
    }
  ]
}

Response:

{
  "file_id": "file_a1b2c3d4e5",
  "status": "upload_initialized",
  "presigned_urls": [
    {
      "chunk_number": 1,
      "upload_url": "<s3 presigned url>",
      "expires_in": 3600
    }
  ],
  "deduplication_results": {
    "chunk_1": "upload_required",
    "chunk_2": "already_exists"
  },
  "upload_session_id": "session_xyz789"
}
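
Note that the response only issues a presigned URL for chunk 1; chunk 2 was deduplicated away. The client then PUTs each required chunk's bytes directly to blob storage, bypassing our API servers entirely. A minimal sketch, assuming the response shape above (the UploadInitResponse type is just that shape transcribed):

interface UploadInitResponse {
  file_id: string;
  presigned_urls: { chunk_number: number; upload_url: string; expires_in: number }[];
}

const CHUNK_SIZE = 4 * 1024 * 1024; // matches the 4MB chunking scheme

// PUT each still-needed chunk straight to S3/GCS via its presigned URL.
async function uploadRequiredChunks(file: File, init: UploadInitResponse): Promise<void> {
  for (const { chunk_number, upload_url } of init.presigned_urls) {
    const start = (chunk_number - 1) * CHUNK_SIZE; // chunk numbers are 1-based
    const chunk = file.slice(start, Math.min(start + CHUNK_SIZE, file.size));
    const res = await fetch(upload_url, { method: 'PUT', body: chunk });
    if (!res.ok) throw new Error(`Chunk ${chunk_number} upload failed: ${res.status}`);
  }
}
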
GET /api/files/{file_id}

Get file metadata and download URL

PUT /api/files/{file_id}

Update file metadata or content

DELETE /api/files/{file_id}

Delete file (move to trash)

šŸ”„ Sync Operations

GET /api/sync/changes

Get changes since last sync timestamp

Response Example
{
  "changes": [
    {
      "file_id": "file_456",
      "change_type": "modified",
      "timestamp": 1234567890,
      "version": 3
    }
  ],
  "cursor": "sync_token_789"
}
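
Clients typically persist the returned cursor and send it back on the next poll. A minimal sketch of that loop; the query parameter name is an assumption, and applying changes locally is left out:

interface SyncChange {
  file_id: string;
  change_type: string;
  timestamp: number;
  version: number;
}

// Poll for changes since the last sync, persisting the cursor for next time.
async function pullChanges(): Promise<SyncChange[]> {
  const lastCursor = localStorage.getItem('sync_cursor');
  const url = lastCursor
    ? `/api/sync/changes?cursor=${encodeURIComponent(lastCursor)}`
    : '/api/sync/changes';
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Sync failed: ${res.status}`);
  const body: { changes: SyncChange[]; cursor: string } = await res.json();
  localStorage.setItem('sync_cursor', body.cursor); // resume point for the next poll
  return body.changes;
}
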
POST /api/sync/register-device

Register device for sync notifications

GET /api/sync/conflicts

Get unresolved sync conflicts

POST /api/sync/resolve-conflict

Resolve sync conflict with chosen version
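
For illustration only, a resolve-conflict request might name the winning version and what to do with the loser (e.g., keep it as a renamed copy); these field names are assumptions, not a confirmed contract:

Request:

{
  "file_id": "file_456",
  "winning_version": 3,
  "losing_version_action": "keep_as_copy"
}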

šŸ¤ Sharing Operations

POST /api/sharing/create-link

Create shareable link with permissions
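
An illustrative exchange (all field names here are assumptions):

Request:

{
  "file_id": "file_456",
  "permission_type": "viewer",
  "expires_at": 1735689600
}

Response:

{
  "share_url": "https://example.com/s/abc123",
  "permission_type": "viewer",
  "expires_at": 1735689600
}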

POST /api/sharing/invite-user

Invite user to collaborate on folder

GET /api/sharing/permissions/{file_id}

Get current sharing permissions

PUT /api/sharing/permissions/{file_id}

Update sharing permissions

šŸ“‚ Folder Operations

POST /api/folders

Create new folder

GET /api/folders/{folder_id}/contents

List folder contents

PUT /api/folders/{folder_id}/move

Move folder to new location

GET /api/search

Search files and folders

šŸ‘Øā€šŸ’»Deep Dive: How Client Apps Handle File Uploadā–¼

Here's a simplified example of how client applications (desktop app, web interface) handle file uploads with automatic chunking:

// Client-side TypeScript implementation
class FileUploader {
  private CHUNK_SIZE = 4 * 1024 * 1024; // 4MB chunks
  private MAX_PARALLEL_UPLOADS = 4;     // Upload 4 chunks simultaneously

  async uploadFile(file: File): Promise<void> {
    // Step 1: Initialize upload and get file ID
    const fileId = await this.initializeUpload(file);

    // Step 2: Calculate total chunks
    const totalChunks = Math.ceil(file.size / this.CHUNK_SIZE);

    // Step 3: Chunk and upload with parallelization
    const uploadPromises: Promise<void>[] = [];
    let chunksUploaded = 0;

    for (let i = 0; i < totalChunks; i++) {
      // Slice the next chunk out of the file
      const start = i * this.CHUNK_SIZE;
      const end = Math.min(start + this.CHUNK_SIZE, file.size);
      const chunk = file.slice(start, end);

      // Cap the number of in-flight uploads
      if (uploadPromises.length >= this.MAX_PARALLEL_UPLOADS) {
        await Promise.race(uploadPromises);
      }

      // Upload chunk and track progress
      const uploadPromise = this.uploadChunk(fileId, chunk, i, totalChunks)
        .then(() => {
          chunksUploaded++;
          this.updateProgressBar((chunksUploaded / totalChunks) * 100);
          // Remove the completed promise from the in-flight list
          const index = uploadPromises.indexOf(uploadPromise);
          uploadPromises.splice(index, 1);
        });

      uploadPromises.push(uploadPromise);
    }

    // Wait for all remaining chunks
    await Promise.all(uploadPromises);

    // Step 4: Finalize upload
    await this.finalizeUpload(fileId);
  }

  private async uploadChunk(
    fileId: string,
    chunk: Blob,
    chunkNumber: number,
    totalChunks: number
  ): Promise<void> {
    const checksum = await this.calculateSHA256(chunk);

    // Check if the chunk already exists (deduplication)
    const exists = await this.checkChunkExists(checksum);
    if (exists) {
      // Just register the chunk, don't upload data
      await this.registerChunk(fileId, chunkNumber, checksum);
      return;
    }

    // Upload with retry logic
    let retries = 3;
    while (retries > 0) {
      try {
        const response = await fetch('/api/files/upload', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/octet-stream',
            'X-File-Id': fileId,
            'X-Chunk-Number': chunkNumber.toString(),
            'X-Total-Chunks': totalChunks.toString(),
            'X-Checksum': checksum,
          },
          body: chunk,
        });
        // fetch only rejects on network errors, so surface HTTP errors explicitly
        if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
        break; // Success
      } catch (error) {
        retries--;
        if (retries === 0) throw error;
        await this.delay(Math.pow(2, 3 - retries) * 1000); // Exponential backoff: 2s, 4s
      }
    }
  }

  private updateProgressBar(percentage: number): void {
    // Update UI - a single progress bar for the entire file
    const bar = document.getElementById('upload-progress');
    const label = document.getElementById('upload-text');
    if (bar) bar.style.width = `${percentage}%`;
    if (label) label.innerText = `Uploading... ${Math.round(percentage)}%`;
  }

  // Compute a hex SHA-256 digest for a chunk using the Web Crypto API
  private async calculateSHA256(chunk: Blob): Promise<string> {
    const digest = await crypto.subtle.digest('SHA-256', await chunk.arrayBuffer());
    return 'sha256:' + Array.from(new Uint8Array(digest))
      .map((b) => b.toString(16).padStart(2, '0'))
      .join('');
  }

  private delay(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }

  // Server interactions stubbed out for brevity; these call the REST API above
  private async initializeUpload(file: File): Promise<string> { return 'file_id_stub'; }
  private async finalizeUpload(fileId: string): Promise<void> {}
  private async checkChunkExists(checksum: string): Promise<boolean> { return false; }
  private async registerChunk(fileId: string, chunkNumber: number, checksum: string): Promise<void> {}
}

// Usage - what happens when the user drops a file
const dropzone = document.getElementById('dropzone')!;
dropzone.addEventListener('drop', async (e: DragEvent) => {
  e.preventDefault();
  const file = e.dataTransfer!.files[0]; // User drops a 5GB video
  const uploader = new FileUploader();
  await uploader.uploadFile(file);       // Handles everything transparently
  alert('Upload complete!');             // User sees success
});

āœ… Smart Features

• Parallel uploads (4 chunks at once)
• Deduplication check before upload
• Automatic retry with exponential backoff
• Single progress bar for user

šŸ”§ Production Enhancements

• Resume from last chunk on failure (see the sketch below)
• Adaptive chunk size based on network conditions
• Priority queue for small files
• Background upload service
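
As a sketch of the first enhancement, the client can record acknowledged chunk numbers per upload session (the session ID comes from the upload-initialization response) and skip them after a restart. The storage key and helper names are illustrative:

// Persist which chunk numbers the server has acknowledged for this session.
function loadCompletedChunks(sessionId: string): Set<number> {
  const raw = localStorage.getItem(`upload:${sessionId}`);
  return new Set(raw ? (JSON.parse(raw) as number[]) : []);
}

function markChunkCompleted(sessionId: string, chunkNumber: number, done: Set<number>): void {
  done.add(chunkNumber);
  localStorage.setItem(`upload:${sessionId}`, JSON.stringify([...done]));
}

// Inside uploadFile's loop, resume by skipping acknowledged chunks:
//   if (completed.has(i)) { chunksUploaded++; continue; }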

šŸ—„ļø Database Technology Comparison

| Aspect | SQL (PostgreSQL) | NoSQL (DynamoDB) | Hybrid Approach |
|---|---|---|---|
| Metadata Storage | ACID transactions, complex queries | Fast single-item operations | SQL for metadata, NoSQL for chunks |
| Scalability | Vertical scaling, read replicas | Horizontal scaling, auto-sharding | Best of both worlds |
| Consistency | Strong consistency | Eventual consistency | Strong for metadata, eventual for content |
| Query Flexibility | Complex joins and aggregations | Key-value access patterns | Complex queries where needed |
| Use Case | File relationships, permissions | Chunk locations, high-volume ops | Production-ready compromise |

āœ… Recommended: Hybrid Approach

Use PostgreSQL for file metadata, user management, and permissions, which need ACID guarantees. Use DynamoDB or Cassandra for chunk locations and other high-volume operations, which need horizontal scale. This gives us strong consistency where it matters and scale where we need it, as sketched below.
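
A sketch of what that split looks like at the data-access layer; the client interfaces and table/key names are placeholders, and the point is simply which store each access pattern hits:

// Placeholder client interfaces, stand-ins for a real Postgres driver and DynamoDB SDK.
interface SqlClient { query(text: string, params: unknown[]): Promise<unknown[]>; }
interface KvClient { query(table: string, key: Record<string, string>): Promise<unknown[]>; }

class DataAccessLayer {
  constructor(private pg: SqlClient, private dynamo: KvClient) {}

  // PostgreSQL: relational metadata that needs ACID guarantees and joins
  getFileWithPermissions(fileId: string): Promise<unknown[]> {
    return this.pg.query(
      `SELECT f.*, p.permission_type
         FROM files f JOIN permissions p ON p.file_id = f.file_id
        WHERE f.file_id = $1`,
      [fileId]
    );
  }

  // DynamoDB/Cassandra: chunk locations, simple key access at very high volume
  getChunkLocations(fileId: string): Promise<unknown[]> {
    return this.dynamo.query('chunk_metadata', { file_id: fileId });
  }
}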

šŸŽÆ Key Design Decisions

āœ… File Chunking

Client app automatically splits files into 4MB chunks for parallel transfer, deduplication, and resumable uploads. Transparent to users.

āœ… Separate Metadata

Store file metadata separately from content for fast queries and atomic operations

āœ… Permission Model

Flexible permission system supporting user-based and link-based sharing