Skip to content

Visual Testing Infrastructure: Network Navigation Challenges

Problem Statement

Our visual testing infrastructure is experiencing consistent timeout and navigation failures when attempting to capture screenshots of documentation pages.

Observed Symptoms

  • Pages fail to load within the specified timeout (30-45 seconds)
  • Intermittent network-related errors
  • Inconsistent page capture behavior

Diagnostic Approach

Immediate Investigation Priorities

  1. Local Server Diagnostics
  2. Verify local server (localhost:8000) is running correctly
  3. Check server response times
  4. Validate server configuration

  5. Network Environment Analysis

  6. Examine network interface configurations
  7. Investigate localhost resolution
  8. Check for potential network interference

  9. Playwright Configuration

  10. Review browser context and page navigation settings
  11. Analyze timeout and wait conditions
  12. Investigate potential browser-specific issues

Proposed Investigation Workflow

Phase 1: Basic Diagnostics

  • Manually verify all pages load correctly in a web browser
  • Check local server startup process
  • Validate mkdocs build process
  • Run network diagnostics script

Phase 2: Detailed Network Analysis

  • Implement comprehensive network logging
  • Capture detailed request/response metadata
  • Analyze network request patterns
  • Identify potential bottlenecks

Phase 3: Browser and Playwright Configuration

  • Experiment with different wait conditions
  • Adjust timeout strategies
  • Test with different browser types (Chromium, Firefox, WebKit)
  • Implement more granular error handling

Timeout Strategy Investigation

Hypothesis Development

  1. Minimal Timeout Hypothesis
  2. Objective: Understand page load characteristics under strict time constraints
  3. Proposed Timeout: 3 seconds
  4. Key Questions:
    • Which pages load within 3 seconds?
    • What are the common characteristics of quickly loading pages?
    • Are there systemic bottlenecks in page generation?

Diagnostic Information Collection

Page Load Metrics

  • Capture page load times
  • Record successful vs. failed page captures
  • Categorize pages by load time complexity

Detailed Metrics to Track

Page Path | Load Time | Status | Network Requests | Error Details
---------------------------------------------------------------------------
/index.html | 0.5s | Success | 12 requests | None
/tech/arch.html | 3.2s | Timeout | 45 requests | Network idle not reached

Experimental Parameters

  • Base Timeout: 3 seconds
  • Wait Condition: 'networkidle'
  • Capture Strategy:
  • Sequential captures
  • Detailed logging
  • Minimal retry mechanism

Investigative Workflow

Phase 1: Baseline Measurement

  1. Run visual tests with 3-second timeout
  2. Collect comprehensive logs
  3. Analyze initial results
  4. Identify percentage of pages loading
  5. Categorize load time distributions

Phase 2: Comparative Analysis

  • Compare results with manual browser loading
  • Investigate discrepancies between automated and manual loading
  • Identify potential environmental factors

Potential Investigation Vectors

Network Layer

  • Verify local server configuration
  • Check network interface performance
  • Analyze DNS resolution
  • Investigate potential network bottlenecks

Page Generation

  • Examine mkdocs build process
  • Check for complex page generation logic
  • Identify pages with heavy dynamic content

Browser Automation

  • Test with different wait conditions
  • Experiment with alternative navigation strategies
  • Compare performance across browser types

Logging and Tracing Strategy

Enhanced Logging Configuration

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('visual_test_diagnostics.log'),
        logging.StreamHandler()
    ]
)

Expected Outcomes

  1. Comprehensive understanding of page load characteristics
  2. Identification of systemic loading issues
  3. Data-driven approach to timeout and loading strategies

Risks and Limitations

  • 3-second timeout is extremely aggressive
  • May not capture full page loading complexity
  • Potential for false negative results

Mitigation Strategies

  • Implement flexible timeout mechanisms
  • Develop adaptive loading strategies
  • Create fallback capture methods

Decision Tree

Is page loading within 3s?
├── Yes → Capture Screenshot
└── No  → Investigate
    ├── Network Issues
    │   ├── DNS Resolution
    │   ├── Server Performance
    │   └── Network Latency
    └── Page Generation
        ├── Complex Dynamic Content
        ├── Large Asset Dependencies
        └── Mkdocs Generation Overhead

Documentation Tracking

  • Investigation Start: 2024-12-22
  • Initial Timeout: 3 seconds
  • Primary Goal: Understand page load characteristics

MkDocs Site Generation and Path Mapping

Source to Build Path Transformation

MkDocs converts Markdown source files to HTML using a specific path transformation:

Path Mapping Examples

Source Path (Markdown) Generated HTML Path
docs/index.md site/index.html
docs/notebook/index.md site/notebook/index.html
docs/notebook/features/bulk-upload/index.md site/notebook/features/bulk-upload/index.html
docs/tech/architecture.md site/tech/architecture/index.html

Key Transformation Rules

  1. .md files are converted to index.html
  2. Subdirectory structure is preserved
  3. Root docs/ directory maps to site/
  4. Markdown filenames become index.html

Capture Strategy Considerations

  • Always use index.html for directory paths
  • Preserve full subdirectory structure
  • Handle potential nested directory scenarios

Path Normalization Example

def normalize_page_path(page_path):
    """
    Convert source Markdown path to generated HTML path

    Args:
        page_path (str): Original Markdown file path

    Returns:
        str: Corresponding HTML file path in site directory
    """
    # Remove .md extension
    html_path = page_path.replace('.md', '')

    # Ensure index.html for directories
    if not html_path.endswith('/index'):
        html_path += '/index'

    return f"{html_path}.html"

Potential Capture Paths

  • notebook/features/bulk-upload/index.mdsite/notebook/features/bulk-upload/index.html
  • tech/architecture.mdsite/tech/architecture/index.html

Debugging Path Issues

  1. Verify source Markdown file existence
  2. Check MkDocs build configuration
  3. Validate site directory structure
  4. Use logging to track path transformations

MkDocs Path Mapping Guide

MkDocs Site Directory Structure

Directory Overview

The site/ directory is generated by MkDocs and contains the fully built static documentation site. Its structure mirrors the source documentation layout:

site/
├── index.html                  # Root landing page
├── 404.html                    # Custom 404 error page
├── assets/                     # Static assets (CSS, JS)
├── notebook/                   # Notebook section
│   ├── index.html
│   ├── ai-workspace/           # Specific workspace documentation
│   ├── features/               # Feature documentation
│   └── process/                # Process documentation
├── tech/                       # Technical documentation section
├── product/                    # Product-related pages
├── we-and-ai/                  # AI-related documentation
├── search/                     # Search index and functionality
└── stylesheets/                # Custom styling

Capture Strategy Considerations

  • Each HTML file represents a rendered documentation page
  • Subdirectories reflect the documentation's logical structure
  • Static site generation ensures consistent rendering

Filename Mapping

When capturing screenshots, we'll use a consistent mapping: - /notebook/ai-workspace/index.htmlnotebook_ai-workspace_index.png - /tech/architecture.htmltech_architecture.png

This approach ensures: - Unique filename generation - Easy traceability - Filesystem-friendly naming

Parallel Site Directory Screenshot Capture Strategy

Overview

The parallel site directory screenshot capture is a robust method for generating visual tests across a complex documentation site structure. It addresses the challenges of capturing screenshots from a multi-level, statically generated documentation site.

Key Design Principles

  1. Parallel Processing: Utilize asynchronous processing to capture screenshots concurrently
  2. Semaphore-based Concurrency: Limit simultaneous browser instances to prevent resource exhaustion
  3. Preserve Directory Structure: Maintain the original site directory hierarchy in screenshot storage
  4. Comprehensive Coverage: Capture screenshots for all HTML files recursively

Implementation Strategy

Directory Traversal

  • Start from the site/ root directory
  • Recursively walk through all subdirectories
  • Identify and process .html files
  • Skip non-HTML files and specific directories (e.g., assets/)

Capture Process

  1. File Discovery
  2. Use os.walk() to traverse directory tree
  3. Filter for .html files
  4. Preserve relative path information

  5. Screenshot Generation

  6. Use Playwright to render each HTML file
  7. Capture full-page screenshots
  8. Generate unique filenames based on original path

  9. Concurrency Management

  10. Implement asyncio Semaphore to control:
    • Maximum concurrent browser instances
    • Prevent system resource overload
    • Ensure predictable performance

Potential Challenges and Mitigations

  • Large Directory Structures: Use configurable concurrency limits
  • Rendering Variations: Standardize viewport and browser settings
  • Performance Overhead: Implement intelligent filtering and caching mechanisms

Pseudocode Concept

async def capture_site_screenshots(site_dir, max_concurrent=5):
    screenshots = []
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_html_file(file_path):
        async with semaphore:
            # Capture screenshot logic
            pass

    for root, _, files in os.walk(site_dir):
        html_files = [f for f in files if f.endswith('.html')]
        tasks = [process_html_file(os.path.join(root, file)) for file in html_files]
        await asyncio.gather(*tasks)
VisualTester(
    site_directory='site/',
    max_concurrent_captures=10,
    viewport_size=(1280, 720),
    output_directory='.cascade/visual-test'
)

Performance Considerations

  • Adjust max_concurrent_captures based on system capabilities
  • Monitor memory and CPU usage during large site captures
  • Implement logging and metrics collection

Future Enhancements

  • Intelligent file filtering
  • Incremental screenshot updates
  • Differential visual testing

Workaround Strategies

Timeout Modification

# Increased timeout with more detailed error handling
await page.goto(url, 
    wait_until='networkidle', 
    timeout=3000  # 3 seconds
)

Sequential Page Capture

  • Implement a sequential capture strategy instead of concurrent
  • Reduce parallel processing to minimize network contention

Retry Mechanism

async def capture_with_retry(page_path, max_retries=3):
    for attempt in range(max_retries):
        try:
            # Capture logic
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

Comprehensive Workaround Strategy

Workaround: Static HTML Snapshot Capture

Core Workaround Strategy

When network-based page loading fails, we'll implement a filesystem-driven screenshot approach that bypasses traditional network rendering.

Proposed Implementation

async def capture_page_snapshot(page_path):
    """
    Capture page screenshot using alternative methods

    Workaround Strategies:
    - Direct file reading
    - Static HTML rendering
    - Fallback screenshot mechanisms
    """
    try:
        # Strategy 1: Direct HTML File Reading
        html_path = os.path.join('site', page_path)
        if os.path.exists(html_path):
            # Use a headless browser to render static HTML
            async with async_playwright() as p:
                browser = await p.chromium.launch(headless=True)
                context = await browser.new_context()
                page = await context.new_page()

                # Load file directly from filesystem
                await page.goto(f'file://{os.path.abspath(html_path)}')

                # Immediate screenshot, bypass network loading
                screenshot_path = f'.cascade/visual-test/{page_path.replace("/", "_")}.png'
                await page.screenshot(path=screenshot_path)

                await browser.close()
                return screenshot_path

        # Strategy 2: Fallback Rendering
        console.print(f"⚠️ Could not capture {page_path}")
        return None

    except Exception as e:
        console.print(f"❌ Snapshot capture failed: {e}")
        return None

Workaround Objectives

  1. Bypass Network Constraints
  2. Render pages directly from filesystem
  3. Eliminate network-related loading issues
  4. Ensure consistent page capture

  5. Robust Error Handling

  6. Graceful failure modes
  7. Comprehensive error logging
  8. Minimal performance overhead

Implementation Advantages

  • Works with static site generators
  • Independent of network conditions
  • Consistent rendering environment
  • Low resource consumption

Potential Limitations

  • May not capture dynamically generated content
  • Requires pre-built site directory
  • Potential rendering differences from live site

Fallback Mechanism Flowchart

Start Page Capture
├── Filesystem HTML Exists?
│   ├── Yes → Render Static HTML
│   │         └── Take Screenshot
│   │
│   └── No  → Log Capture Failure
│             └── Continue to Next Page

Diagnostic Information

  • Capture Method: Filesystem-based static rendering
  • Rendering Engine: Playwright Chromium
  • Viewport: Consistent browser context
  • Error Handling: Comprehensive logging

Future Improvements

  • Add multiple rendering engine support
  • Implement more sophisticated fallback strategies
  • Create comprehensive capture reports

Adaptive Workaround for Documentation Screenshot Challenges

Reasoning and Decision Process

Root Cause Analysis

  1. Network Loading Challenges
  2. Consistent timeout issues during page capture
  3. Unpredictable page rendering times
  4. Complex interaction between mkdocs and browser automation

  5. Current Limitations

  6. Network-dependent screenshot generation
  7. High variability in page load times
  8. Lack of consistent capture mechanism

Analytical Reasoning

  • Traditional network-based rendering introduces:
  • Unpredictable latency
  • Complex dependency chains
  • Non-deterministic loading behaviors

  • Static site generators like mkdocs pre-generate HTML

  • HTML files already exist on filesystem
  • No need for dynamic network loading
  • Consistent, reproducible page structure

Decision Framework

Capture Strategy Evaluation
├── Network-Based Rendering
│   └── ❌ Unreliable
└── Filesystem-Based Rendering
    ├── ✅ Consistent
    ├── ✅ Predictable
    └── ✅ Low Overhead

Final Decision

Implement Filesystem-Based Static HTML Snapshot Capture

Rationale: - Bypasses network loading complexities - Leverages pre-generated HTML - Provides consistent, reproducible screenshots - Minimal performance overhead - Works independently of server state

Action Items

Immediate Implementation

  1. Create .cascade/visual-test directory for screenshots Directory already exists
  2. Implement capture_page_snapshot() function
  3. Use Playwright for rendering
  4. Read directly from site/ directory
  5. Generate consistent screenshots

Configuration Tasks

  • Update pyproject.toml with new dependencies

    [tool.poetry.dependencies]
    playwright = "^1.x.x"
    

  • Install Playwright browsers

    playwright install chromium
    

Testing Strategy

  1. Validate screenshot generation for 10 sample pages
  2. Compare filesystem rendering with live site
  3. Create comprehensive error logging
  4. Implement fallback mechanisms

Monitoring and Improvement

  • Track screenshot success rate
  • Log any rendering discrepancies
  • Develop alternative rendering strategies

Documentation

  • Update visual testing documentation
  • Create runbook for screenshot generation
  • Document known limitations

Timeline

  • Start: Immediate
  • Initial Implementation: Within 1 day
  • First Review: 3 days after implementation

Strategic Pivot: From Network Dependency to Filesystem Rendering

Debugging Checklist

  • Verify local server configuration
  • Check network interfaces
  • Validate mkdocs build process
  • Test individual page captures
  • Monitor system resources during testing

Potential Root Causes

  1. Network configuration issues
  2. Local server performance limitations
  3. Playwright/browser compatibility
  4. Resource constraints
  5. Mkdocs site generation complexity

Next Steps

  1. Run comprehensive diagnostics
  2. Collect detailed logs
  3. Analyze and categorize failures
  4. Implement targeted fixes
  5. Develop robust error recovery mechanism
  • netstat for network connection analysis
  • lsof to check port usage
  • Browser developer tools
  • Playwright trace viewer

Logging Recommendations

# Enhanced logging strategy
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    filename='visual_test_diagnostics.log'
)

Conclusion

This is an iterative debugging process. Each investigation provides insights to refine our testing infrastructure.


Last Updated: 2024-12-22 Status: Active Investigation


Ongoing Investigation - Adaptive Strategies Required