Visual Testing Infrastructure: Network Navigation Challenges¶
Problem Statement¶
Our visual testing infrastructure is experiencing consistent timeout and navigation failures when attempting to capture screenshots of documentation pages.
Observed Symptoms¶
- Pages fail to load within the specified timeout (30-45 seconds)
- Intermittent network-related errors
- Inconsistent page capture behavior
Diagnostic Approach¶
Immediate Investigation Priorities¶
- Local Server Diagnostics
- Verify local server (localhost:8000) is running correctly
- Check server response times
-
Validate server configuration
-
Network Environment Analysis
- Examine network interface configurations
- Investigate localhost resolution
-
Check for potential network interference
-
Playwright Configuration
- Review browser context and page navigation settings
- Analyze timeout and wait conditions
- Investigate potential browser-specific issues
Proposed Investigation Workflow¶
Phase 1: Basic Diagnostics¶
- Manually verify all pages load correctly in a web browser
- Check local server startup process
- Validate mkdocs build process
- Run network diagnostics script
Phase 2: Detailed Network Analysis¶
- Implement comprehensive network logging
- Capture detailed request/response metadata
- Analyze network request patterns
- Identify potential bottlenecks
Phase 3: Browser and Playwright Configuration¶
- Experiment with different wait conditions
- Adjust timeout strategies
- Test with different browser types (Chromium, Firefox, WebKit)
- Implement more granular error handling
Timeout Strategy Investigation¶
Hypothesis Development¶
- Minimal Timeout Hypothesis
- Objective: Understand page load characteristics under strict time constraints
- Proposed Timeout: 3 seconds
- Key Questions:
- Which pages load within 3 seconds?
- What are the common characteristics of quickly loading pages?
- Are there systemic bottlenecks in page generation?
Diagnostic Information Collection¶
Page Load Metrics¶
- Capture page load times
- Record successful vs. failed page captures
- Categorize pages by load time complexity
Detailed Metrics to Track¶
Page Path | Load Time | Status | Network Requests | Error Details
---------------------------------------------------------------------------
/index.html | 0.5s | Success | 12 requests | None
/tech/arch.html | 3.2s | Timeout | 45 requests | Network idle not reached
Experimental Parameters¶
- Base Timeout: 3 seconds
- Wait Condition: 'networkidle'
- Capture Strategy:
- Sequential captures
- Detailed logging
- Minimal retry mechanism
Investigative Workflow¶
Phase 1: Baseline Measurement¶
- Run visual tests with 3-second timeout
- Collect comprehensive logs
- Analyze initial results
- Identify percentage of pages loading
- Categorize load time distributions
Phase 2: Comparative Analysis¶
- Compare results with manual browser loading
- Investigate discrepancies between automated and manual loading
- Identify potential environmental factors
Potential Investigation Vectors¶
Network Layer¶
- Verify local server configuration
- Check network interface performance
- Analyze DNS resolution
- Investigate potential network bottlenecks
Page Generation¶
- Examine mkdocs build process
- Check for complex page generation logic
- Identify pages with heavy dynamic content
Browser Automation¶
- Test with different wait conditions
- Experiment with alternative navigation strategies
- Compare performance across browser types
Logging and Tracing Strategy¶
Enhanced Logging Configuration¶
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('visual_test_diagnostics.log'),
logging.StreamHandler()
]
)
Expected Outcomes¶
- Comprehensive understanding of page load characteristics
- Identification of systemic loading issues
- Data-driven approach to timeout and loading strategies
Risks and Limitations¶
- 3-second timeout is extremely aggressive
- May not capture full page loading complexity
- Potential for false negative results
Mitigation Strategies¶
- Implement flexible timeout mechanisms
- Develop adaptive loading strategies
- Create fallback capture methods
Decision Tree¶
Is page loading within 3s?
│
├── Yes → Capture Screenshot
│
└── No → Investigate
│
├── Network Issues
│ ├── DNS Resolution
│ ├── Server Performance
│ └── Network Latency
│
└── Page Generation
├── Complex Dynamic Content
├── Large Asset Dependencies
└── Mkdocs Generation Overhead
Documentation Tracking¶
- Investigation Start: 2024-12-22
- Initial Timeout: 3 seconds
- Primary Goal: Understand page load characteristics
MkDocs Site Generation and Path Mapping¶
Source to Build Path Transformation¶
MkDocs converts Markdown source files to HTML using a specific path transformation:
Path Mapping Examples¶
Source Path (Markdown) | Generated HTML Path |
---|---|
docs/index.md |
site/index.html |
docs/notebook/index.md |
site/notebook/index.html |
docs/notebook/features/bulk-upload/index.md |
site/notebook/features/bulk-upload/index.html |
docs/tech/architecture.md |
site/tech/architecture/index.html |
Key Transformation Rules¶
.md
files are converted toindex.html
- Subdirectory structure is preserved
- Root
docs/
directory maps tosite/
- Markdown filenames become
index.html
Capture Strategy Considerations¶
- Always use
index.html
for directory paths - Preserve full subdirectory structure
- Handle potential nested directory scenarios
Path Normalization Example¶
def normalize_page_path(page_path):
"""
Convert source Markdown path to generated HTML path
Args:
page_path (str): Original Markdown file path
Returns:
str: Corresponding HTML file path in site directory
"""
# Remove .md extension
html_path = page_path.replace('.md', '')
# Ensure index.html for directories
if not html_path.endswith('/index'):
html_path += '/index'
return f"{html_path}.html"
Potential Capture Paths¶
notebook/features/bulk-upload/index.md
→site/notebook/features/bulk-upload/index.html
tech/architecture.md
→site/tech/architecture/index.html
Debugging Path Issues¶
- Verify source Markdown file existence
- Check MkDocs build configuration
- Validate site directory structure
- Use logging to track path transformations
MkDocs Path Mapping Guide
MkDocs Site Directory Structure¶
Directory Overview¶
The site/
directory is generated by MkDocs and contains the fully built static documentation site. Its structure mirrors the source documentation layout:
site/
├── index.html # Root landing page
├── 404.html # Custom 404 error page
├── assets/ # Static assets (CSS, JS)
├── notebook/ # Notebook section
│ ├── index.html
│ ├── ai-workspace/ # Specific workspace documentation
│ ├── features/ # Feature documentation
│ └── process/ # Process documentation
├── tech/ # Technical documentation section
├── product/ # Product-related pages
├── we-and-ai/ # AI-related documentation
├── search/ # Search index and functionality
└── stylesheets/ # Custom styling
Capture Strategy Considerations¶
- Each HTML file represents a rendered documentation page
- Subdirectories reflect the documentation's logical structure
- Static site generation ensures consistent rendering
Filename Mapping¶
When capturing screenshots, we'll use a consistent mapping:
- /notebook/ai-workspace/index.html
→ notebook_ai-workspace_index.png
- /tech/architecture.html
→ tech_architecture.png
This approach ensures: - Unique filename generation - Easy traceability - Filesystem-friendly naming
Parallel Site Directory Screenshot Capture Strategy¶
Overview¶
The parallel site directory screenshot capture is a robust method for generating visual tests across a complex documentation site structure. It addresses the challenges of capturing screenshots from a multi-level, statically generated documentation site.
Key Design Principles¶
- Parallel Processing: Utilize asynchronous processing to capture screenshots concurrently
- Semaphore-based Concurrency: Limit simultaneous browser instances to prevent resource exhaustion
- Preserve Directory Structure: Maintain the original site directory hierarchy in screenshot storage
- Comprehensive Coverage: Capture screenshots for all HTML files recursively
Implementation Strategy¶
Directory Traversal¶
- Start from the
site/
root directory - Recursively walk through all subdirectories
- Identify and process
.html
files - Skip non-HTML files and specific directories (e.g.,
assets/
)
Capture Process¶
- File Discovery
- Use
os.walk()
to traverse directory tree - Filter for
.html
files -
Preserve relative path information
-
Screenshot Generation
- Use Playwright to render each HTML file
- Capture full-page screenshots
-
Generate unique filenames based on original path
-
Concurrency Management
- Implement asyncio Semaphore to control:
- Maximum concurrent browser instances
- Prevent system resource overload
- Ensure predictable performance
Potential Challenges and Mitigations¶
- Large Directory Structures: Use configurable concurrency limits
- Rendering Variations: Standardize viewport and browser settings
- Performance Overhead: Implement intelligent filtering and caching mechanisms
Pseudocode Concept¶
async def capture_site_screenshots(site_dir, max_concurrent=5):
screenshots = []
semaphore = asyncio.Semaphore(max_concurrent)
async def process_html_file(file_path):
async with semaphore:
# Capture screenshot logic
pass
for root, _, files in os.walk(site_dir):
html_files = [f for f in files if f.endswith('.html')]
tasks = [process_html_file(os.path.join(root, file)) for file in html_files]
await asyncio.gather(*tasks)
Recommended Configuration¶
VisualTester(
site_directory='site/',
max_concurrent_captures=10,
viewport_size=(1280, 720),
output_directory='.cascade/visual-test'
)
Performance Considerations¶
- Adjust
max_concurrent_captures
based on system capabilities - Monitor memory and CPU usage during large site captures
- Implement logging and metrics collection
Future Enhancements¶
- Intelligent file filtering
- Incremental screenshot updates
- Differential visual testing
Workaround Strategies¶
Timeout Modification¶
# Increased timeout with more detailed error handling
await page.goto(url,
wait_until='networkidle',
timeout=3000 # 3 seconds
)
Sequential Page Capture¶
- Implement a sequential capture strategy instead of concurrent
- Reduce parallel processing to minimize network contention
Retry Mechanism¶
async def capture_with_retry(page_path, max_retries=3):
for attempt in range(max_retries):
try:
# Capture logic
return result
except Exception as e:
if attempt == max_retries - 1:
raise
await asyncio.sleep(2 ** attempt) # Exponential backoff
Comprehensive Workaround Strategy¶
Workaround: Static HTML Snapshot Capture¶
Core Workaround Strategy¶
When network-based page loading fails, we'll implement a filesystem-driven screenshot approach that bypasses traditional network rendering.
Proposed Implementation¶
async def capture_page_snapshot(page_path):
"""
Capture page screenshot using alternative methods
Workaround Strategies:
- Direct file reading
- Static HTML rendering
- Fallback screenshot mechanisms
"""
try:
# Strategy 1: Direct HTML File Reading
html_path = os.path.join('site', page_path)
if os.path.exists(html_path):
# Use a headless browser to render static HTML
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True)
context = await browser.new_context()
page = await context.new_page()
# Load file directly from filesystem
await page.goto(f'file://{os.path.abspath(html_path)}')
# Immediate screenshot, bypass network loading
screenshot_path = f'.cascade/visual-test/{page_path.replace("/", "_")}.png'
await page.screenshot(path=screenshot_path)
await browser.close()
return screenshot_path
# Strategy 2: Fallback Rendering
console.print(f"⚠️ Could not capture {page_path}")
return None
except Exception as e:
console.print(f"❌ Snapshot capture failed: {e}")
return None
Workaround Objectives¶
- Bypass Network Constraints
- Render pages directly from filesystem
- Eliminate network-related loading issues
-
Ensure consistent page capture
-
Robust Error Handling
- Graceful failure modes
- Comprehensive error logging
- Minimal performance overhead
Implementation Advantages¶
- Works with static site generators
- Independent of network conditions
- Consistent rendering environment
- Low resource consumption
Potential Limitations¶
- May not capture dynamically generated content
- Requires pre-built site directory
- Potential rendering differences from live site
Fallback Mechanism Flowchart¶
Start Page Capture
│
├── Filesystem HTML Exists?
│ ├── Yes → Render Static HTML
│ │ └── Take Screenshot
│ │
│ └── No → Log Capture Failure
│ └── Continue to Next Page
Diagnostic Information¶
- Capture Method: Filesystem-based static rendering
- Rendering Engine: Playwright Chromium
- Viewport: Consistent browser context
- Error Handling: Comprehensive logging
Future Improvements¶
- Add multiple rendering engine support
- Implement more sophisticated fallback strategies
- Create comprehensive capture reports
Adaptive Workaround for Documentation Screenshot Challenges
Reasoning and Decision Process¶
Root Cause Analysis¶
- Network Loading Challenges
- Consistent timeout issues during page capture
- Unpredictable page rendering times
-
Complex interaction between mkdocs and browser automation
-
Current Limitations
- Network-dependent screenshot generation
- High variability in page load times
- Lack of consistent capture mechanism
Analytical Reasoning¶
- Traditional network-based rendering introduces:
- Unpredictable latency
- Complex dependency chains
-
Non-deterministic loading behaviors
-
Static site generators like mkdocs pre-generate HTML
- HTML files already exist on filesystem
- No need for dynamic network loading
- Consistent, reproducible page structure
Decision Framework¶
Capture Strategy Evaluation
│
├── Network-Based Rendering
│ └── ❌ Unreliable
│
└── Filesystem-Based Rendering
├── ✅ Consistent
├── ✅ Predictable
└── ✅ Low Overhead
Final Decision¶
Implement Filesystem-Based Static HTML Snapshot Capture
Rationale: - Bypasses network loading complexities - Leverages pre-generated HTML - Provides consistent, reproducible screenshots - Minimal performance overhead - Works independently of server state
Action Items¶
Immediate Implementation¶
-
CreateDirectory already exists.cascade/visual-test
directory for screenshots - Implement
capture_page_snapshot()
function - Use Playwright for rendering
- Read directly from
site/
directory - Generate consistent screenshots
Configuration Tasks¶
-
Update
pyproject.toml
with new dependencies -
Install Playwright browsers
Testing Strategy¶
- Validate screenshot generation for 10 sample pages
- Compare filesystem rendering with live site
- Create comprehensive error logging
- Implement fallback mechanisms
Monitoring and Improvement¶
- Track screenshot success rate
- Log any rendering discrepancies
- Develop alternative rendering strategies
Documentation¶
- Update visual testing documentation
- Create runbook for screenshot generation
- Document known limitations
Timeline¶
- Start: Immediate
- Initial Implementation: Within 1 day
- First Review: 3 days after implementation
Strategic Pivot: From Network Dependency to Filesystem Rendering
Debugging Checklist¶
- Verify local server configuration
- Check network interfaces
- Validate mkdocs build process
- Test individual page captures
- Monitor system resources during testing
Potential Root Causes¶
- Network configuration issues
- Local server performance limitations
- Playwright/browser compatibility
- Resource constraints
- Mkdocs site generation complexity
Next Steps¶
- Run comprehensive diagnostics
- Collect detailed logs
- Analyze and categorize failures
- Implement targeted fixes
- Develop robust error recovery mechanism
Recommended Tools¶
netstat
for network connection analysislsof
to check port usage- Browser developer tools
- Playwright trace viewer
Logging Recommendations¶
# Enhanced logging strategy
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
filename='visual_test_diagnostics.log'
)
Conclusion¶
This is an iterative debugging process. Each investigation provides insights to refine our testing infrastructure.
Last Updated: 2024-12-22 Status: Active Investigation
Ongoing Investigation - Adaptive Strategies Required