Antar Bulk CSV Upload - Technical Implementation Specification

๐Ÿ—๏ธ Project Initialization & Structure

Project Bootstrap Commands

# Create project directories
mkdir -p backend/route-optimizer/cmd/api
mkdir -p backend/route-optimizer/internal/handlers
mkdir -p backend/route-optimizer/internal/models
mkdir -p backend/route-optimizer/internal/services
mkdir -p backend/route-optimizer/pkg/validation
mkdir -p backend/route-optimizer/scripts

# Initialize Go module
cd backend/route-optimizer
go mod init github.com/dolpheyn/antar/route-optimizer

Project Structure

backend/route-optimizer/
│
├── cmd/
│   └── api/
│       └── main.go          # Application entry point
│
├── internal/
│   ├── handlers/
│   │   └── csv_upload.go    # HTTP request handlers
│   ├── models/
│   │   └── delivery.go      # Data models
│   └── services/
│       └── csv_processor.go # Business logic
│
├── pkg/
│   └── validation/
│       └── csv_validator.go # Reusable validation logic
│
├── scripts/
│   ├── setup.sh             # Development environment setup
│   └── deploy.sh            # Deployment scripts
│
├── config.yaml              # Configuration management
├── go.mod
└── go.sum

Dependency Management

# Install core dependencies
go get -u github.com/gin-gonic/gin                # Web framework
go get -u github.com/spf13/viper                  # Configuration management
go get -u go.uber.org/zap                         # Structured logging
go get -u github.com/go-playground/validator/v10  # Struct validation
go get -u github.com/stretchr/testify             # Testing utilities
go get -u github.com/swaggo/swag                  # API documentation
go get -u github.com/google/uuid                  # UUID generation

Configuration Management

# config.yaml
server:
  port: 8080
  max_upload_size: 5242880  # 5MB in bytes

validation:
  max_entries: 200
  strict_mode: true

logging:
  level: info
  output_paths:
    - stdout
    - /var/log/antar/route-optimizer.log
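
How this configuration gets loaded is not specified above; below is a minimal sketch using the viper dependency. The config package location and struct layout are assumptions mirroring the keys in config.yaml.

package config

import "github.com/spf13/viper"

// Config mirrors config.yaml; field names are assumptions matching the keys above.
type Config struct {
    Server struct {
        Port          int   `mapstructure:"port"`
        MaxUploadSize int64 `mapstructure:"max_upload_size"`
    } `mapstructure:"server"`
    Validation struct {
        MaxEntries int  `mapstructure:"max_entries"`
        StrictMode bool `mapstructure:"strict_mode"`
    } `mapstructure:"validation"`
}

// Load reads and unmarshals config.yaml from the given path.
func Load(path string) (*Config, error) {
    viper.SetConfigFile(path)
    if err := viper.ReadInConfig(); err != nil {
        return nil, err
    }
    var cfg Config
    if err := viper.Unmarshal(&cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}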

Justfile for Common Tasks

# Build the application
build:
    go build -o bin/api cmd/api/main.go

# Run the application
run:
    go run cmd/api/main.go

# Run tests
test:
    go test ./... -cover

# Run linter
lint:
    golangci-lint run

# Generate API documentation
generate-docs:
    swag init -g cmd/api/main.go

🚀 Technical Architecture Overview

Design Philosophy

  • Low-Allocation Performance: Minimize garbage-collection overhead
  • Streaming Processing: Handle large files without loading them fully into memory
  • Robust Error Handling: Comprehensive validation with minimal performance penalty

Technology Stack

  • Language: Go 1.21+ (generics, profile-guided optimization)
  • Web Framework: github.com/gin-gonic/gin (high-performance HTTP router)
  • CSV Parsing: Standard library encoding/csv with a thin streaming layer
  • Validation: github.com/go-playground/validator/v10
  • Logging: go.uber.org/zap for structured, performant logging

🔧 Core Services & Components

CSV Processing Service

The CSVProcessorService will be responsible for handling the core logic of processing bulk CSV uploads. Key responsibilities include:

  1. Stream Processing
     • Handle large file uploads without loading the entire file into memory
     • Support chunked processing for memory efficiency
     • Implement streaming validation and parsing

  2. Validation Strategy
     • Validate CSV structure and data integrity
     • Perform type checking and constraint validation
     • Generate detailed validation error reports

  3. Error Handling
     • Provide granular error tracking
     • Support partial success scenarios
     • Generate comprehensive error logs

Validation Approach

type ValidationError struct {
    Row     int      `json:"row"`
    Field   string   `json:"field"`
    Message string   `json:"message"`
}

func validateDeliveryRecord(row int, record DeliveryRecord) []ValidationError {
    var errors []ValidationError

    // Example validation rules
    if record.Weight <= 0 {
        errors = append(errors, ValidationError{
            Row:     row,
            Field:   "Weight",
            Message: "Weight must be positive",
        })
    }

    if len(record.DestinationAddress) > 255 {
        errors = append(errors, ValidationError{
            Row:     row,
            Field:   "DestinationAddress",
            Message: "Address exceeds maximum length",
        })
    }

    return errors
}

Performance Considerations

  • Use buffered channels for concurrent processing (see the sketch after this list)
  • Implement rate limiting to prevent system overload
  • Use efficient memory management techniques
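
A minimal sketch of the buffered-channel approach, reusing the DeliveryRecord, ValidationError, and validateDeliveryRecord definitions above; the worker count and channel buffer sizes are tuning assumptions, not measured values.

// validateConcurrently fans records out to a fixed pool of workers over
// buffered channels and gathers their validation errors.
func validateConcurrently(records []DeliveryRecord, numWorkers int) []ValidationError {
    jobs := make(chan int, numWorkers) // buffered so the feeder rarely blocks
    results := make(chan []ValidationError, numWorkers)

    var wg sync.WaitGroup
    for w := 0; w < numWorkers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := range jobs {
                results <- validateDeliveryRecord(i+1, records[i])
            }
        }()
    }

    // Feed indices, then close channels once every worker has finished.
    go func() {
        for i := range records {
            jobs <- i
        }
        close(jobs)
        wg.Wait()
        close(results)
    }()

    var all []ValidationError
    for errs := range results {
        all = append(all, errs...)
    }
    return all
}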

Logging & Monitoring

type ProcessingMetrics struct {
    TotalRecords     int
    ValidRecords     int
    InvalidRecords   int
    ProcessingTime   time.Duration
    MemoryAllocated  uint64
}

func (s *CSVProcessorService) logProcessingMetrics(metrics ProcessingMetrics) {
    s.logger.Info("CSV Processing Completed",
        zap.Int("total_records", metrics.TotalRecords),
        zap.Int("valid_records", metrics.ValidRecords),
        zap.Int("invalid_records", metrics.InvalidRecords),
        zap.Duration("processing_time", metrics.ProcessingTime),
        zap.Uint64("memory_allocated", metrics.MemoryAllocated),
    )
}
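
The spec does not show how ProcessingMetrics is populated. One plausible sketch, assuming a runtime.ReadMemStats delta is an acceptable approximation for MemoryAllocated:

// captureMetrics builds a ProcessingMetrics snapshot; memBefore must be
// sampled with runtime.ReadMemStats before processing starts.
func captureMetrics(total, valid int, start time.Time, memBefore runtime.MemStats) ProcessingMetrics {
    var memAfter runtime.MemStats
    runtime.ReadMemStats(&memAfter)
    return ProcessingMetrics{
        TotalRecords:    total,
        ValidRecords:    valid,
        InvalidRecords:  total - valid,
        ProcessingTime:  time.Since(start),
        MemoryAllocated: memAfter.TotalAlloc - memBefore.TotalAlloc, // cumulative bytes allocated
    }
}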

🚀 API Design Implementation

Upload Endpoint

func (h *DeliveryUploadHandler) UploadCSV(c *gin.Context) {
    file, err := c.FormFile("csv_file")
    if err != nil {
        c.JSON(http.StatusBadRequest, gin.H{
            "error": "Invalid file upload",
        })
        return
    }

    records, validationErrors := h.processCSVStream(file)

    // strict_mode (config.yaml) rejects the whole batch when any row fails validation
    if len(validationErrors) > 0 {
        c.JSON(http.StatusUnprocessableEntity, gin.H{
            "errors": validationErrors,
        })
        return
    }

    // Process valid records
    processingResult := h.service.ProcessDeliveries(records)

    c.JSON(http.StatusOK, processingResult)
}
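
For reference, a request against this handler might look like the following; the route path is an assumption, since router wiring is not part of this spec:

curl -X POST http://localhost:8080/api/v1/deliveries/upload \
  -F "csv_file=@deliveries.csv"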

🔒 Security Considerations

  • Implement file size limits (see the sketch after this list)
  • Validate file type and extension (covered in the same sketch)
  • Use secure file handling techniques
  • Implement rate limiting on upload endpoint
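
A minimal sketch of the first two items, assuming the gin setup above; maxUploadSize would come from server.max_upload_size in config.yaml, and the helper names are illustrative:

// limitUpload rejects oversized request bodies before the handler reads them.
func limitUpload(maxUploadSize int64) gin.HandlerFunc {
    return func(c *gin.Context) {
        // Reads past the cap now fail instead of buffering an arbitrarily large body.
        c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, maxUploadSize)
        c.Next()
    }
}

// isCSVFilename is the extension check; sniffing the first 512 bytes of
// content would be a stricter follow-up.
func isCSVFilename(filename string) bool {
    return strings.EqualFold(filepath.Ext(filename), ".csv")
}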

🧪 Testing Strategy

  1. Unit Tests
     • Validate individual component logic
     • Test validation rules
     • Mock external dependencies

  2. Integration Tests
     • Test end-to-end CSV processing
     • Verify error handling
     • Performance benchmarking

  3. Load Testing
     • Simulate bulk upload scenarios
     • Test system resilience under high load

📊 Monitoring & Observability

  • Implement Prometheus metrics (see the sketch after this list)
  • Create Grafana dashboards
  • Use distributed tracing
  • Set up centralized logging
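
A minimal sketch of the Prometheus wiring, assuming github.com/prometheus/client_golang (not in the dependency list above) alongside the gin router:

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// recordsProcessed counts processed records by validation outcome.
var recordsProcessed = promauto.NewCounterVec(prometheus.CounterOpts{
    Name: "csv_records_processed_total",
    Help: "Records processed by the bulk upload endpoint.",
}, []string{"status"}) // status: "valid" or "invalid"

// registerMetrics exposes the standard Prometheus scrape endpoint on the gin router.
func registerMetrics(r *gin.Engine) {
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))
}

The upload handler would then increment the counter after each batch, e.g. recordsProcessed.WithLabelValues("valid").Add(float64(len(records))).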

🔮 Future Improvements

  • Support multiple file formats
  • Implement more advanced validation
  • Add machine learning-based data cleaning
  • Create real-time processing dashboard

🔍 Detailed Component Design

1. CSV Upload Handler

type DeliveryUploadHandler struct {
    validator *validator.Validate
    logger    *zap.Logger
    storage   FileStorageService
}

type DeliveryRecord struct {
    DeliveryID       string  `validate:"required,alphanum"`
    PickupLat        float64 `validate:"required,latitude"`
    PickupLon        float64 `validate:"required,longitude"`
    DropoffLat       float64 `validate:"required,latitude"`
    DropoffLon       float64 `validate:"required,longitude"`
}
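
For reference, a CSV row satisfying this schema might look like the following; the header names and coordinate values are illustrative assumptions:

delivery_id,pickup_lat,pickup_lon,dropoff_lat,dropoff_lon
DLV12345,3.1390,101.6869,3.0738,101.5183

Note that the strict parser below treats every row identically, so a header row would either need to be skipped explicitly or would surface as a row-1 validation error.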

func (h *DeliveryUploadHandler) HandleBulkUploadCSV(c *gin.Context) {
    // High-performance file handling
    file, err := c.FormFile("csv_file")
    if err != nil {
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid file upload"})
        return
    }

    // Stream-based processing; invalid rows are skipped, not fatal
    records, validationErrors := h.processCSVStream(file)

    c.JSON(http.StatusOK, UploadResponse{
        UploadID:         generateUniqueID(),
        TotalEntries:     len(records) + len(validationErrors),
        ProcessedEntries: len(records),
        ValidationErrors: validationErrors,
    })
}

2. CSV Streaming Processor

func (h *DeliveryUploadHandler) processCSVStream(file *multipart.FileHeader) ([]DeliveryRecord, []ValidationError) {
    src, err := file.Open()
    if err != nil {
        return nil, []ValidationError{{Message: "unable to open uploaded file"}}
    }
    defer src.Close()

    reader := csv.NewReader(src)
    reader.FieldsPerRecord = 5 // Strict column count
    reader.ReuseRecord = true  // Reuse the record buffer to reduce allocations

    var records []DeliveryRecord
    var validationErrors []ValidationError

    // Streaming, low-allocation CSV parsing with per-row error tracking
    for row := 1; ; row++ {
        record, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            // Surface parsing errors (e.g. wrong column count) to the client
            validationErrors = append(validationErrors, ValidationError{
                Row:     row,
                Message: err.Error(),
            })
            continue
        }

        deliveryRecord := parseRecord(record)
        if err := h.validator.Struct(deliveryRecord); err != nil {
            validationErrors = append(validationErrors, mapValidationError(row, err))
            continue
        }

        records = append(records, deliveryRecord)
    }

    return records, validationErrors
}
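
processCSVStream references a parseRecord helper that this spec does not define. A minimal sketch, assuming the five-column layout enforced by FieldsPerRecord above:

// parseRecord converts one raw CSV row into a DeliveryRecord. Parse
// failures leave the zero value, which the `required` validator on the
// struct then rejects, so errors need no separate handling here.
func parseRecord(fields []string) DeliveryRecord {
    pickupLat, _ := strconv.ParseFloat(fields[1], 64)
    pickupLon, _ := strconv.ParseFloat(fields[2], 64)
    dropoffLat, _ := strconv.ParseFloat(fields[3], 64)
    dropoffLon, _ := strconv.ParseFloat(fields[4], 64)
    return DeliveryRecord{
        DeliveryID: fields[0],
        PickupLat:  pickupLat,
        PickupLon:  pickupLon,
        DropoffLat: dropoffLat,
        DropoffLon: dropoffLon,
    }
}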

3. Performance Optimization Techniques

  • Low-Copy Parsing: Reuse the CSV reader's record buffer (ReuseRecord) to minimize allocations
  • Streaming Processing: Process files without loading them entirely into memory
  • Concurrent Validation: Use goroutines for parallel record validation
  • Preallocated Slices: Reduce memory fragmentation (sketched after this list)
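
A sketch of the preallocation point, assuming cfg carries the parsed config.yaml (validation.max_entries = 200):

// Preallocate to the configured cap so append never reallocates for a
// maximum-size upload; the error-slice capacity is a guess.
records := make([]DeliveryRecord, 0, cfg.Validation.MaxEntries)
validationErrors := make([]ValidationError, 0, 16)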

4. Error Handling Strategy

type ValidationError struct {
    Row     int    `json:"row"`
    Column  string `json:"column"`
    Message string `json:"message"`
}

// mapValidationError converts validator errors into our structured,
// client-facing format with clear, actionable feedback.
func mapValidationError(row int, err error) ValidationError {
    if verrs, ok := err.(validator.ValidationErrors); ok && len(verrs) > 0 {
        return ValidationError{
            Row:     row,
            Column:  verrs[0].Field(),
            Message: verrs[0].Error(),
        }
    }
    return ValidationError{Row: row, Message: err.Error()}
}

🛡️ Security Considerations

  • File Size Limit: 5MB hard cap
  • Sanitization: Strict type conversion
  • Temporary Storage: Secure, ephemeral file handling
  • Rate Limiting: Implemented via middleware (sketched below)
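
A sketch of that middleware using golang.org/x/time/rate, which is an assumption (it is not in the dependency list); a production version would key limiters per client rather than use one global limiter:

// uploadLimiter allows 5 requests per second with a burst of 10.
var uploadLimiter = rate.NewLimiter(rate.Limit(5), 10)

func rateLimit() gin.HandlerFunc {
    return func(c *gin.Context) {
        if !uploadLimiter.Allow() {
            c.AbortWithStatusJSON(http.StatusTooManyRequests, gin.H{
                "error": "rate limit exceeded",
            })
            return
        }
        c.Next()
    }
}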

🧪 Testing Strategy

  • Unit Tests: Validate individual component behaviors
  • Integration Tests: End-to-end upload scenarios
  • Benchmark Tests: Ensure performance under load
  • Chaos Testing: Validate error handling

Performance Benchmarks

func BenchmarkCSVUpload(b *testing.B) {
    // Simulate a representative upload and measure processing time,
    // memory allocations, and CPU usage across b.N iterations.
    payload := buildSampleCSV(200) // hypothetical helper producing a 200-row CSV
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        processCSVBytes(payload) // hypothetical wrapper around the streaming processor
    }
}

🚧 Potential Improvements

  • Machine learning-based anomaly detection
  • Advanced geospatial validation
  • Support for more complex CSV structures
  • Real-time validation metrics

💡 Interesting Tech Choices

  • Using gin for its raw performance
  • Custom validation rules layered on top of validator/v10
  • Stream-based processing philosophy

🔬 Monitoring & Observability

  • Prometheus metrics endpoint
  • Distributed tracing support
  • Detailed performance logging

📊 Expected Performance Characteristics

  • Latency: < 50ms for a 200-entry CSV
  • Memory Usage: < 10MB per request
  • CPU Overhead: Minimal; a single linear parsing pass per upload

#golang #systemdesign #performance #backend