# Testing Guide for NetIntel-OCR v0.1.18.1

## Overview

NetIntel-OCR v0.1.18.1 introduces a comprehensive testing framework designed to ensure reliability, performance, and quality across all components. This guide covers testing strategies, running tests, and interpreting results.

> **New in v0.1.18.1:** Complete test framework with Docker Compose environment, quality metrics, and CI/CD integration!

## Test Framework Architecture

### Test Categories
| Category | Purpose | Coverage Goal | Typical Duration |
|---|---|---|---|
| Unit Tests | Individual functions/methods | 85%+ | < 5 minutes |
| Integration Tests | Component interactions | 70%+ | 10-15 minutes |
| System Tests | End-to-end workflows | 60%+ | 15-20 minutes |
| Performance Tests | Speed and resource usage | Critical paths | 10-15 minutes |
| Regression Tests | Previous bug fixes | All fixed issues | 5-10 minutes |
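
The category split maps directly onto pytest markers, so one suite can be filtered per category. Below is a minimal sketch, assuming the suite registers one marker per category in a shared `conftest.py`; the hook usage is standard pytest, but the marker names and file are illustrative rather than taken from the NetIntel-OCR source tree:

```python
# tests/conftest.py -- illustrative marker registration, not the shipped config
def pytest_configure(config):
    # Register one marker per category so `pytest -m unit` (etc.) runs
    # without "unknown marker" warnings.
    for category in ("unit", "integration", "system", "performance", "regression"):
        config.addinivalue_line(
            "markers", f"{category}: {category}-level tests (see the category table)"
        )
```

A test tagged `@pytest.mark.performance` can then be selected with `pytest -m performance` or excluded with `pytest -m "not performance"`.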
### Quality Metrics

```text
Target Metrics:
  Line Coverage: ≥ 85%
  Branch Coverage: ≥ 70%
  Cyclomatic Complexity: ≤ 10
  Maintainability Index: ≥ 20
  Security Score: ≥ 85/100

Performance:
  PDF Processing: < 30s median
  API Latency: < 100ms p95
  KG Queries: < 500ms median
  Vector Search: < 200ms median
```
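
These targets can be checked mechanically rather than by eye. The sketch below is an assumption, not part of NetIntel-OCR: it reads the `coverage.xml` written by `pytest --cov --cov-report=xml` (coverage.py exposes `line-rate` and `branch-rate` attributes on the root element) and exits non-zero when the line or branch targets are missed:

```python
# check_coverage_gate.py -- hypothetical local quality gate for the targets above
import sys
import xml.etree.ElementTree as ET

LINE_TARGET = 0.85    # Line Coverage >= 85%
BRANCH_TARGET = 0.70  # Branch Coverage >= 70%

root = ET.parse("coverage.xml").getroot()
line_rate = float(root.attrib["line-rate"])
branch_rate = float(root.attrib.get("branch-rate", "0"))

print(f"line coverage:   {line_rate:.1%}")
print(f"branch coverage: {branch_rate:.1%}")

if line_rate < LINE_TARGET or branch_rate < BRANCH_TARGET:
    sys.exit("Coverage is below the target thresholds")
```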
## Quick Start

### Running All Tests

```bash
# Run complete test suite
./run-tests.sh

# Run with coverage report
./run-tests.sh --coverage

# Run specific category
./run-tests.sh --category unit
./run-tests.sh --category integration
./run-tests.sh --category performance

# Run in Docker environment
docker-compose -f tests/docker-compose.test.yml up --abort-on-container-exit
```
### Running Specific Tests

```bash
# Unit tests only
pytest tests/unit/ -v

# Integration tests with real PDFs
pytest tests/integration/ -v --pdf-fixtures

# Performance benchmarks
pytest tests/performance/ -v --benchmark-only

# API v2 tests
pytest tests/api/v2/ -v

# Multi-model tests (new in v0.1.18.1)
pytest tests/integration/test_multimodel.py -v
```
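
Note that `--pdf-fixtures` and `--benchmark-only` are suite- or plugin-specific options rather than built-in pytest flags (`--benchmark-only` comes from pytest-benchmark). If you want a similar opt-in switch in your own tests, pytest's `pytest_addoption` hook is the standard mechanism; the wiring below is a hypothetical sketch, not the actual NetIntel-OCR conftest:

```python
# tests/conftest.py -- hypothetical wiring for an opt-in --pdf-fixtures flag
import pytest


def pytest_addoption(parser):
    # Expose `pytest --pdf-fixtures` on the command line.
    parser.addoption(
        "--pdf-fixtures",
        action="store_true",
        default=False,
        help="run integration tests against the real PDF fixtures",
    )


@pytest.fixture
def pdf_fixtures_enabled(request):
    # Tests that need real PDFs depend on this fixture and are skipped otherwise.
    if not request.config.getoption("--pdf-fixtures"):
        pytest.skip("requires --pdf-fixtures")
    return True
```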
## Testing Multi-Model Features

### Test Multi-Model Processing

```python
# tests/integration/test_multimodel.py
import pytest
from netintel_ocr.hybrid_processor import process_pdf_hybrid


def test_multimodel_processing():
    """Test multi-model configuration."""
    result = process_pdf_hybrid(
        pdf_path="tests/fixtures/cisco-sdwan.pdf",
        output_dir="/tmp/test",
        model="nanonets-ocr-s",
        network_model="qwen2.5vl",
        flow_model="custom-flow",
        pages="1-5"
    )
    assert result.text_extracted
    assert result.diagrams_found
    assert result.tables_extracted
```
### Test API v2 Feature Parity

```python
# tests/api/v2/test_feature_parity.py
import pytest
from netintel_ocr.api.client import APIClient


def test_api_feature_parity():
    """Verify that CLI options work in the API."""
    client = APIClient(base_url="http://localhost:8000")
    # Exercise a representative subset of the 30+ processing options
    result = client.process_document(
        file_path="test.pdf",
        model="nanonets-ocr-s",
        network_model="qwen2.5vl",
        confidence=0.8,
        fast_extraction=True,
        table_method="hybrid",
        vector_format="milvus",  # Default
        chunk_strategy="semantic",
        with_kg=True
    )
    assert result['status'] == 'completed'
    assert 'embeddings_count' in result
```
## Docker Compose Test Environment

### Starting Test Services

```bash
# Start all test services
docker-compose -f tests/docker-compose.test.yml up -d

# Services included:
# - Ollama (port 11434)
# - Milvus (port 19530)
# - FalkorDB (port 6379)
# - MinIO (port 9000)
# - Test API server (port 8000)
```
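
Integration tests fail with confusing connection errors if they start before these containers accept connections. One guard (a sketch only, not part of the shipped suite; the hosts and ports are simply the defaults listed above) is a small readiness probe run before pytest:

```python
# wait_for_services.py -- illustrative readiness probe for the test stack
import socket
import sys
import time

SERVICES = {
    "ollama": ("localhost", 11434),
    "milvus": ("localhost", 19530),
    "falkordb": ("localhost", 6379),
    "minio": ("localhost", 9000),
    "test-api": ("localhost", 8000),
}


def wait_for(host, port, timeout=120.0):
    """Poll until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)
    return False


for name, (host, port) in SERVICES.items():
    if not wait_for(host, port):
        sys.exit(f"{name} is not reachable on {host}:{port}")
    print(f"{name} ready on {host}:{port}")
```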
### Running Tests in Docker

```yaml
# tests/docker-compose.test.yml
version: '3.8'

services:
  test-runner:
    build:
      context: ..
      dockerfile: tests/Dockerfile
    environment:
      - OLLAMA_HOST=ollama:11434
      - MILVUS_HOST=milvus
      - FALKORDB_HOST=falkordb
    volumes:
      - ../tests:/tests
      - test-results:/results
    command: pytest -v --cov --html=/results/report.html
```
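
Inside the test-runner container the services are reached through the `OLLAMA_HOST`, `MILVUS_HOST`, and `FALKORDB_HOST` variables above rather than `localhost`. A session-scoped fixture can centralise that lookup so the same tests run both locally and in Docker; this is a sketch under that assumption, not the project's actual conftest:

```python
# tests/conftest.py -- hypothetical endpoint resolution from the compose env vars
import os

import pytest


@pytest.fixture(scope="session")
def service_endpoints():
    # Fall back to local defaults when the compose environment is not present.
    return {
        "ollama": os.environ.get("OLLAMA_HOST", "localhost:11434"),
        "milvus": os.environ.get("MILVUS_HOST", "localhost"),
        "falkordb": os.environ.get("FALKORDB_HOST", "localhost"),
    }
```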
## Performance Testing

### Benchmark Suite

```bash
# Run performance benchmarks
python -m pytest tests/performance/ --benchmark-only

# Generate performance report
python tests/scripts/generate-performance-report.py

# Expected results:
# ┌────────────────────────┬────────┬───────┬───────┐
# │ Operation              │ Median │ P95   │ P99   │
# ├────────────────────────┼────────┼───────┼───────┤
# │ PDF Processing (10 pg) │ 12.3s  │ 18s   │ 22s   │
# │ Text Extraction        │ 2.1s   │ 3.2s  │ 4.1s  │
# │ Diagram Detection      │ 4.5s   │ 6.1s  │ 7.8s  │
# │ Table Extraction       │ 1.8s   │ 2.4s  │ 3.1s  │
# │ Vector Generation      │ 0.9s   │ 1.2s  │ 1.5s  │
# │ KG Entity Extraction   │ 3.2s   │ 4.5s  │ 5.8s  │
# │ Milvus Insert (1k)     │ 120ms  │ 180ms │ 220ms │
# │ Milvus Search          │ 45ms   │ 68ms  │ 95ms  │
# └────────────────────────┴────────┴───────┴───────┘
```
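
The numbers above come from benchmarks driven by the pytest-benchmark plugin (the `--benchmark-only` flag used earlier). Its `benchmark` fixture calls a function repeatedly and aggregates the median and percentile timings. A minimal illustrative benchmark, with the function under test stubbed out so the example is self-contained (the test name and stub are assumptions, not existing suite code):

```python
# tests/performance/test_benchmark_example.py -- illustrative pytest-benchmark usage
def extract_text_stub(path):
    # Stand-in for the real text-extraction call.
    return f"text from {path}"


def test_text_extraction_speed(benchmark):
    # pytest-benchmark runs the callable many times and records the timings.
    result = benchmark(extract_text_stub, "tests/fixtures/cisco-sdwan.pdf")
    assert result
```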
### Load Testing

```python
# tests/performance/test_load.py
import asyncio

import aiohttp
import pytest


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_concurrent_processing():
    """Test system behaviour under load."""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(100):  # 100 concurrent requests
            # process_document_async is a helper defined elsewhere in the suite
            task = process_document_async(
                session,
                f"test_{i}.pdf"
            )
            tasks.append(task)
        results = await asyncio.gather(*tasks)

    success_rate = sum(1 for r in results if r['status'] == 'success') / 100
    assert success_rate >= 0.95  # at least 95% success rate
    assert max(r['duration'] for r in results) < 60  # max 60s per request
```
## Quality Assurance

### Code Coverage

```bash
# Generate coverage report
pytest --cov=netintel_ocr --cov-report=html --cov-report=term

# View coverage report
open htmlcov/index.html

# Expected coverage:
# ├── netintel_ocr/
# │   ├── hybrid_processor.py   92%
# │   ├── api/v2/               89%
# │   ├── cli_v2/               87%
# │   ├── mcp/                  85%
# │   └── Overall:              86%
```
### Security Testing

```bash
# Run security scan
bandit -r netintel_ocr/ -f json -o security-report.json

# Check dependencies
safety check --json > safety-report.json

# SAST analysis
semgrep --config=auto netintel_ocr/
```
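
The 85/100 security-score target from the quality metrics has to be derived from these reports. As an illustration only (the weighting below is invented for this sketch, not NetIntel-OCR's scoring formula), bandit's JSON output can be reduced to a single number by penalising each finding by severity:

```python
# score_security_report.py -- illustrative scoring of the bandit JSON report
import json

# Penalty per finding by severity; these weights are an assumption for the sketch.
PENALTY = {"LOW": 1, "MEDIUM": 5, "HIGH": 15}

with open("security-report.json") as fh:
    report = json.load(fh)

score = 100
for issue in report.get("results", []):
    score -= PENALTY.get(issue.get("issue_severity", "LOW"), 1)

score = max(score, 0)
print(f"Security score: {score}/100")
```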
### Linting and Type Checking

```bash
# Run all quality checks
./scripts/quality-check.sh

# Individual tools:
black netintel_ocr/ --check
isort netintel_ocr/ --check
flake8 netintel_ocr/
mypy netintel_ocr/
pylint netintel_ocr/
```
## CI/CD Integration

### GitHub Actions Workflow

```yaml
# .github/workflows/test.yml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install -e ".[test]"
      - name: Run tests
        run: |
          pytest --cov --junitxml=results.xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
```
## Testing Best Practices

### 1. Test Data Management

```python
# Use fixtures for test data
import pytest


@pytest.fixture
def sample_pdf():
    return "tests/fixtures/cisco-sdwan.pdf"


@pytest.fixture
def mock_ollama_response():
    return {"text": "Sample extracted text"}
```
### 2. Mock External Services

```python
# Mock the Ollama API so tests run without a live model server
from unittest.mock import patch


@patch('netintel_ocr.ollama.client')
def test_with_mock_ollama(mock_client):
    mock_client.generate.return_value = {
        "response": "Mocked response"
    }
    result = process_document("test.pdf")
    assert result is not None
```
### 3. Test Isolation

```python
# Use temporary directories so tests never share output state
from netintel_ocr.hybrid_processor import process_pdf_hybrid


def test_processing(tmp_path):
    output_dir = tmp_path / "output"
    output_dir.mkdir()
    result = process_pdf_hybrid(
        pdf_path="test.pdf",
        output_dir=str(output_dir)
    )
    assert output_dir.exists()
```
## Troubleshooting Tests

### Common Issues

| Issue | Solution |
|---|---|
| Ollama connection failed | Ensure Ollama is running: `docker-compose up ollama` |
| Milvus timeout | Check Milvus status: `docker-compose ps milvus` |
| PDF fixtures missing | Download test PDFs: `./scripts/download-fixtures.sh` |
| Coverage below threshold | Enforce the gate locally with `pytest --cov --cov-fail-under=85` and add tests for the uncovered paths |
| Performance regression | Compare with the baseline: `pytest-benchmark compare` |
### Debug Mode

```bash
# Run tests with debug output
pytest -vvv --log-cli-level=DEBUG

# Run a single test with pdb
pytest -k test_multimodel --pdb

# Profile test performance
pytest --profile --profile-svg
```
## Test Reports

### Generate Comprehensive Report

```bash
# Run full test suite with reporting
./tests/scripts/generate-test-report.sh

# Output includes:
# - HTML coverage report
# - JUnit XML results
# - Performance benchmarks
# - Security scan results
# - Quality metrics
```
### View Test Dashboard

```bash
# Start the test dashboard server
python -m http.server 8080 --directory tests/reports/

# Open in browser
open http://localhost:8080
```
## Summary

The v0.1.18.1 test framework provides:

- ✅ Complete Coverage: All components and features tested
- ✅ Docker Environment: Isolated, reproducible test environment
- ✅ Quality Gates: Automated checks for coverage, performance, security
- ✅ CI/CD Ready: GitHub Actions integration
- ✅ Performance Benchmarks: Track and prevent regressions
- ✅ Real PDF Testing: Integration tests with actual documents
For more information, see:

- API Testing Guide
- Performance Guide
- Troubleshooting