Step 2 of 6: API Design & Core Entities

Design the core entities and API endpoints that power our file operations.

šŸ—ļø Core Entities & Relationships

• User: user_id (PK), email, name, storage_used, created_at
• File: file_id (PK), user_id (FK), parent_folder_id, name, size, mime_type, version, created_at
• ChunkMetadata: chunk_id (PK), file_id (FK), chunk_number, checksum, storage_url
• Device: device_id (PK), user_id (FK), device_name, device_type, last_sync
• Permission: permission_id (PK), file_id (FK), user_id (FK), permission_type, granted_at
• Blob Storage (AWS S3 / GCS): actual chunk data, stored as objects, 3x replication, CDN distribution

Relationships: a User owns Files and has Devices; a File splits into ChunkMetadata entries and is shared via Permissions; ChunkMetadata references objects in Blob Storage.

šŸ‘¤ User

Account information and storage quota tracking

šŸ“„ File

File metadata and version information

🧩 ChunkMetadata

Metadata and S3/GCS URLs for file chunks (actual data in blob storage)

šŸ“± Device

User's devices for sync coordination

šŸ” Permission

File sharing and access control

ā˜ļø Blob Storage

S3/GCS for actual file data, database only stores URLs

šŸ“Œ Important Architecture Decision: We store only metadata in our database (file info, chunk checksums, S3/GCS URLs). The actual file data is stored in blob storage services like AWS S3, Google Cloud Storage, or Azure Blob Storage. This separation allows:

• Cost-effective storage (blob storage is cheaper)
• Direct CDN integration for fast downloads
• Horizontal scaling without database bloat
• Leveraging the cloud provider's durability guarantees
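
To make that split concrete, here is a minimal sketch of the two central records as TypeScript types. The fields mirror the entity list above; the type names themselves are illustrative, and storage_url is the only pointer from our database to the actual bytes in blob storage:

// Metadata lives in our database; note that no entity holds file bytes.
interface FileRecord {
  file_id: string;              // PK
  user_id: string;              // FK -> User
  parent_folder_id: string | null;
  name: string;
  size: number;                 // bytes
  mime_type: string;
  version: number;
  created_at: Date;
}

// One row per 4MB chunk; the chunk bytes themselves live in S3/GCS.
interface ChunkMetadata {
  chunk_id: string;             // PK
  file_id: string;              // FK -> FileRecord
  chunk_number: number;         // position within the file
  checksum: string;             // e.g. "sha256:abc123...", also used for deduplication
  storage_url: string;          // S3/GCS object URL, the only pointer to real data
}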

šŸ”Œ REST API Design

šŸŽÆ Who Uses This API?

āœ… Client Applications

Desktop apps, mobile apps, web JavaScript, third-party integrations

These make API calls on behalf of users

āŒ NOT End Users Directly

Users never see or call these endpoints

Users interact with UI, not raw APIs

Example Flow: User drags in a 5GB file → client app splits it into 1,280 chunks (4MB each) → makes 1,280 API calls to POST /api/files/upload → shows a single progress bar to the user (see the client-side deep dive below)

šŸ“ File Operations

POST /api/files/upload

Upload file chunks with metadata

šŸ“Œ Note: Official client apps automatically chunk files (4MB each). Third-party developers can choose to implement chunking or upload smaller files directly.

Request/Response Example

Request:

{
  "name": "document.pdf",
  "size": 5242880,
  "mime_type": "application/pdf",
  "parent_folder_id": "folder_123",
  "chunks": [
    {
      "chunk_number": 1,
      "size": 4194304,
      "checksum": "sha256:abc123..."
    },
    {
      "chunk_number": 2,
      "size": 1048576,
      "checksum": "sha256:def456..."
    }
  ]
}

Response:

{
  "file_id": "file_a1b2c3d4e5",
  "status": "upload_initialized",
  "presigned_urls": [
    {
      "chunk_number": 1,
      "upload_url": "<s3 presigned url>",
      "expires_in": 3600
    }
  ],
  "deduplication_results": {
    "chunk_1": "upload_required",
    "chunk_2": "already_exists"
  },
  "upload_session_id": "session_xyz789"
}
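
Note that the response only issues a presigned URL for chunk 1; chunk 2 was deduplicated away. The client then PUTs each required chunk's bytes directly to blob storage, bypassing our API servers entirely. A minimal sketch, assuming the response shape above (the UploadInitResponse type is just that shape transcribed):

interface UploadInitResponse {
  file_id: string;
  presigned_urls: { chunk_number: number; upload_url: string; expires_in: number }[];
}

const CHUNK_SIZE = 4 * 1024 * 1024; // matches the 4MB chunking scheme

// PUT each still-needed chunk straight to S3/GCS via its presigned URL.
async function uploadRequiredChunks(file: File, init: UploadInitResponse): Promise<void> {
  for (const { chunk_number, upload_url } of init.presigned_urls) {
    const start = (chunk_number - 1) * CHUNK_SIZE; // chunk numbers are 1-based
    const chunk = file.slice(start, Math.min(start + CHUNK_SIZE, file.size));
    const res = await fetch(upload_url, { method: 'PUT', body: chunk });
    if (!res.ok) throw new Error(`Chunk ${chunk_number} upload failed: ${res.status}`);
  }
}
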
GET /api/files/{file_id}

Get file metadata and download URL

PUT /api/files/{file_id}

Update file metadata or content

DELETE /api/files/{file_id}

Delete file (move to trash)

šŸ”„ Sync Operations

GET /api/sync/changes

Get changes since last sync timestamp

Response Example
{
  "changes": [
    {
      "file_id": "file_456",
      "change_type": "modified",
      "timestamp": 1234567890,
      "version": 3
    }
  ],
  "cursor": "sync_token_789"
}
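
Clients typically persist the returned cursor and send it back on the next poll. A minimal sketch of that loop; the query parameter name is an assumption, and applying changes locally is left out:

interface SyncChange {
  file_id: string;
  change_type: string;
  timestamp: number;
  version: number;
}

// Poll for changes since the last sync, persisting the cursor for next time.
async function pullChanges(): Promise<SyncChange[]> {
  const lastCursor = localStorage.getItem('sync_cursor');
  const url = lastCursor
    ? `/api/sync/changes?cursor=${encodeURIComponent(lastCursor)}`
    : '/api/sync/changes';
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Sync failed: ${res.status}`);
  const body: { changes: SyncChange[]; cursor: string } = await res.json();
  localStorage.setItem('sync_cursor', body.cursor); // resume point for the next poll
  return body.changes;
}
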
POST /api/sync/register-device

Register device for sync notifications

GET /api/sync/conflicts

Get unresolved sync conflicts

POST /api/sync/resolve-conflict

Resolve sync conflict with chosen version
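
For illustration only, a resolve-conflict request might name the winning version and what to do with the loser (e.g., keep it as a renamed copy); these field names are assumptions, not a confirmed contract:

Request:

{
  "file_id": "file_456",
  "winning_version": 3,
  "losing_version_action": "keep_as_copy"
}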

šŸ¤ Sharing Operations

POST /api/sharing/create-link

Create shareable link with permissions
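
An illustrative exchange (all field names here are assumptions):

Request:

{
  "file_id": "file_456",
  "permission_type": "viewer",
  "expires_at": 1735689600
}

Response:

{
  "share_url": "https://example.com/s/abc123",
  "permission_type": "viewer",
  "expires_at": 1735689600
}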

POST /api/sharing/invite-user

Invite user to collaborate on folder

GET /api/sharing/permissions/{file_id}

Get current sharing permissions

PUT /api/sharing/permissions/{file_id}

Update sharing permissions

šŸ“‚ Folder Operations

POST /api/folders

Create new folder

GET /api/folders/{folder_id}/contents

List folder contents

PUT /api/folders/{folder_id}/move

Move folder to new location

GET /api/search

Search files and folders

šŸ‘Øā€šŸ’»Deep Dive: How Client Apps Handle File Uploadā–¼

Here's a simplified example of how client applications (desktop app, web interface) handle file uploads with automatic chunking:

// Client-side TypeScript implementation
class FileUploader {
  private CHUNK_SIZE = 4 * 1024 * 1024; // 4MB chunks
  private MAX_PARALLEL_UPLOADS = 4;     // Upload 4 chunks simultaneously

  async uploadFile(file: File): Promise<void> {
    // Step 1: Initialize upload and get file ID
    const fileId = await this.initializeUpload(file);

    // Step 2: Calculate total chunks
    const totalChunks = Math.ceil(file.size / this.CHUNK_SIZE);

    // Step 3: Chunk and upload with parallelization
    const uploadPromises: Promise<void>[] = [];
    let chunksUploaded = 0;

    for (let i = 0; i < totalChunks; i++) {
      // Slice the next chunk out of the file
      const start = i * this.CHUNK_SIZE;
      const end = Math.min(start + this.CHUNK_SIZE, file.size);
      const chunk = file.slice(start, end);

      // Cap the number of in-flight uploads
      if (uploadPromises.length >= this.MAX_PARALLEL_UPLOADS) {
        await Promise.race(uploadPromises);
      }

      // Upload chunk and track progress
      const uploadPromise = this.uploadChunk(fileId, chunk, i, totalChunks)
        .then(() => {
          chunksUploaded++;
          this.updateProgressBar((chunksUploaded / totalChunks) * 100);
          // Remove the completed promise from the in-flight list
          const index = uploadPromises.indexOf(uploadPromise);
          uploadPromises.splice(index, 1);
        });

      uploadPromises.push(uploadPromise);
    }

    // Wait for all remaining chunks
    await Promise.all(uploadPromises);

    // Step 4: Finalize upload
    await this.finalizeUpload(fileId);
  }

  private async uploadChunk(
    fileId: string,
    chunk: Blob,
    chunkNumber: number,
    totalChunks: number
  ): Promise<void> {
    const checksum = await this.calculateSHA256(chunk);

    // Check if the chunk already exists (deduplication)
    const exists = await this.checkChunkExists(checksum);
    if (exists) {
      // Just register the chunk, don't upload data
      await this.registerChunk(fileId, chunkNumber, checksum);
      return;
    }

    // Upload with retry logic
    let retries = 3;
    while (retries > 0) {
      try {
        const response = await fetch('/api/files/upload', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/octet-stream',
            'X-File-Id': fileId,
            'X-Chunk-Number': chunkNumber.toString(),
            'X-Total-Chunks': totalChunks.toString(),
            'X-Checksum': checksum,
          },
          body: chunk,
        });
        // fetch only rejects on network errors, so surface HTTP errors explicitly
        if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
        break; // Success
      } catch (error) {
        retries--;
        if (retries === 0) throw error;
        await this.delay(Math.pow(2, 3 - retries) * 1000); // Exponential backoff: 2s, 4s
      }
    }
  }

  private updateProgressBar(percentage: number): void {
    // Update UI - a single progress bar for the entire file
    const bar = document.getElementById('upload-progress');
    const label = document.getElementById('upload-text');
    if (bar) bar.style.width = `${percentage}%`;
    if (label) label.innerText = `Uploading... ${Math.round(percentage)}%`;
  }

  // Compute a hex SHA-256 digest for a chunk using the Web Crypto API
  private async calculateSHA256(chunk: Blob): Promise<string> {
    const digest = await crypto.subtle.digest('SHA-256', await chunk.arrayBuffer());
    return 'sha256:' + Array.from(new Uint8Array(digest))
      .map((b) => b.toString(16).padStart(2, '0'))
      .join('');
  }

  private delay(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }

  // Server interactions stubbed out for brevity; these call the REST API above
  private async initializeUpload(file: File): Promise<string> { return 'file_id_stub'; }
  private async finalizeUpload(fileId: string): Promise<void> {}
  private async checkChunkExists(checksum: string): Promise<boolean> { return false; }
  private async registerChunk(fileId: string, chunkNumber: number, checksum: string): Promise<void> {}
}

// Usage - what happens when the user drops a file
const dropzone = document.getElementById('dropzone')!;
dropzone.addEventListener('drop', async (e: DragEvent) => {
  e.preventDefault();
  const file = e.dataTransfer!.files[0]; // User drops a 5GB video
  const uploader = new FileUploader();
  await uploader.uploadFile(file);       // Handles everything transparently
  alert('Upload complete!');             // User sees success
});

āœ… Smart Features

• Parallel uploads (4 chunks at once)
• Deduplication check before upload
• Automatic retry with exponential backoff
• Single progress bar for user

šŸ”§ Production Enhancements

• Resume from last chunk on failure (see the sketch below)
• Adaptive chunk size based on network conditions
• Priority queue for small files
• Background upload service
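
As a sketch of the first enhancement, the client can record acknowledged chunk numbers per upload session (the session ID comes from the upload-initialization response) and skip them after a restart. The storage key and helper names are illustrative:

// Persist which chunk numbers the server has acknowledged for this session.
function loadCompletedChunks(sessionId: string): Set<number> {
  const raw = localStorage.getItem(`upload:${sessionId}`);
  return new Set(raw ? (JSON.parse(raw) as number[]) : []);
}

function markChunkCompleted(sessionId: string, chunkNumber: number, done: Set<number>): void {
  done.add(chunkNumber);
  localStorage.setItem(`upload:${sessionId}`, JSON.stringify([...done]));
}

// Inside uploadFile's loop, resume by skipping acknowledged chunks:
//   if (completed.has(i)) { chunksUploaded++; continue; }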

šŸ—„ļø Database Technology Comparison

| Aspect | SQL (PostgreSQL) | NoSQL (DynamoDB) | Hybrid Approach |
|---|---|---|---|
| Metadata Storage | ACID transactions, complex queries | Fast single-item operations | SQL for metadata, NoSQL for chunks |
| Scalability | Vertical scaling, read replicas | Horizontal scaling, auto-sharding | Best of both worlds |
| Consistency | Strong consistency | Eventual consistency | Strong for metadata, eventual for content |
| Query Flexibility | Complex joins and aggregations | Key-value access patterns | Complex queries where needed |
| Use Case | File relationships, permissions | Chunk locations, high-volume ops | Production-ready compromise |

āœ… Recommended: Hybrid Approach

Use PostgreSQL for file metadata, user management, and permissions, which need ACID guarantees. Use DynamoDB or Cassandra for chunk locations and other high-volume operations, which need horizontal scale. This gives us strong consistency where it matters and scale where we need it, as sketched below.
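
A sketch of what that split looks like at the data-access layer; the client interfaces and table/key names are placeholders, and the point is simply which store each access pattern hits:

// Placeholder client interfaces, stand-ins for a real Postgres driver and DynamoDB SDK.
interface SqlClient { query(text: string, params: unknown[]): Promise<unknown[]>; }
interface KvClient { query(table: string, key: Record<string, string>): Promise<unknown[]>; }

class DataAccessLayer {
  constructor(private pg: SqlClient, private dynamo: KvClient) {}

  // PostgreSQL: relational metadata that needs ACID guarantees and joins
  getFileWithPermissions(fileId: string): Promise<unknown[]> {
    return this.pg.query(
      `SELECT f.*, p.permission_type
         FROM files f JOIN permissions p ON p.file_id = f.file_id
        WHERE f.file_id = $1`,
      [fileId]
    );
  }

  // DynamoDB/Cassandra: chunk locations, simple key access at very high volume
  getChunkLocations(fileId: string): Promise<unknown[]> {
    return this.dynamo.query('chunk_metadata', { file_id: fileId });
  }
}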

šŸŽÆ Key Design Decisions

āœ… File Chunking

Client app automatically splits files into 4MB chunks for parallel transfer, deduplication, and resumable uploads. Transparent to users.

āœ… Separate Metadata

Store file metadata separately from content for fast queries and atomic operations

āœ… Permission Model

Flexible permission system supporting user-based and link-based sharing