# NetIntel-OCR v0.1.18.1

## Enterprise Document Intelligence Platform with Complete Feature Parity

NetIntel-OCR v0.1.18.1 is a comprehensive enterprise platform that transforms unstructured technical documentation into structured, searchable knowledge. This release achieves 100% feature parity between the CLI and API v2, includes a complete test framework, and delivers production-ready document intelligence capabilities for building a Semantic Configuration Management Database (CMDB).
## Project Goal

Create an intelligent document-processing platform that achieves 94% accuracy through hybrid Knowledge GraphRAG, extracting network architectures, security configurations, and operational workflows from enterprise documentation to populate a Semantic CMDB.
## Performance Objectives

The system targets the following accuracy and performance outcomes through its hybrid Knowledge GraphRAG retrieval:

| Metric | Traditional | NetIntel-OCR Hybrid KG | Improvement |
|---|---|---|---|
| Accuracy | 62% | 94% | +52% |
| Query Speed | 2.3-8.7 s | 180-410 ms | 5-20x faster |
| Storage Efficiency | 20 GB | 11.3 GB | 43.5% savings |
| Dependency Analysis | 2.3 s | 180 ms | 12.8x faster |
| Compliance Check | 5.1 s | 290 ms | 17.6x faster |
| Incident Correlation | 8.7 s | 410 ms | 21.2x faster |

By combining graph traversal, vector similarity, and knowledge graph embeddings, NetIntel-OCR delivers enterprise-grade accuracy with sub-second response times.
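
To make the hybrid idea concrete, the sketch below fuses a graph-traversal score with a vector-similarity score using a weighted sum. The names, weights, and data structures are illustrative assumptions, not NetIntel-OCR's internal API; in the real system the query intent classifier (see the Knowledge Graph section below) decides how the two signals are combined.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    graph_score: float   # e.g. normalized path relevance from graph traversal
    vector_score: float  # e.g. cosine similarity from the vector index

def hybrid_rank(candidates, graph_weight=0.4, vector_weight=0.6):
    """Rank candidates by a weighted fusion of graph and vector scores.

    The weights are illustrative; in practice they would be tuned or chosen
    per query by the intent classifier.
    """
    return sorted(
        candidates,
        key=lambda c: graph_weight * c.graph_score + vector_weight * c.vector_score,
        reverse=True,
    )

# Example: two candidate documents scored by both retrieval paths
ranked = hybrid_rank([
    Candidate("design-doc-01", graph_score=0.82, vector_score=0.64),
    Candidate("runbook-07",    graph_score=0.35, vector_score=0.91),
])
print([c.doc_id for c in ranked])
```
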
## Key Use Cases

### Network Domain
- Architecture Discovery: Extract network topologies from design documents
- Configuration Mapping: Parse firewall rules and routing configurations
- Dependency Analysis: Identify component relationships and data flows
- Change Impact: Track architecture evolution across document versions
### Security Domain
- Zone Identification: Detect DMZ, trust boundaries, and security zones
- Compliance Mapping: Extract security controls and policy implementations
- Risk Assessment: Identify exposed services and attack surfaces
- Audit Trail: Document security architecture decisions and rationale
## What's New in v0.1.18.1

### Complete Feature Parity
- 100% CLI-API Parity: All 30+ CLI options now available in API v2
- Multi-Model Support: Complete implementation of `--model`, `--network-model`, and `--flow-model` (see the request sketch after this list)
- Enhanced Processing: All diagram, table, and vector options fully supported
- Milvus Default: Vector operations now default to Milvus instead of LanceDB
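
For example, a document submission through API v2 can carry the same per-task model choices as the CLI flags above. The endpoint path and field names below are assumptions for illustration only; consult the API v2 Complete Guide for the actual contract.

```python
import requests

# Hypothetical endpoint and field names -- check the API v2 guide for the real
# contract; only the option semantics mirror the CLI flags listed above.
API = "http://localhost:8000/api/v2"

with open("network-architecture.pdf", "rb") as f:
    resp = requests.post(
        f"{API}/documents",
        files={"file": f},
        data={
            "model": "Nanonets-OCR-s:latest",   # mirrors --model
            "network_model": "qwen2.5vl:7b",    # mirrors --network-model
            "flow_model": "qwen2.5vl:7b",       # mirrors --flow-model
        },
        timeout=300,
    )
resp.raise_for_status()
print(resp.json())
```
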
### Comprehensive Test Framework
- Docker Compose Environment: Complete containerized testing infrastructure
- Test Categories: Unit, integration, system, performance, and regression testing
- Quality Metrics: Code coverage, complexity analysis, security scoring
- CI/CD Integration: GitHub Actions with matrix testing across Python versions
- Real PDF Fixtures: Integration testing with actual technical documents
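
As a rough illustration of how such test categories are commonly expressed, the snippet below tags tests with pytest markers so they can be selected per category (e.g. `pytest -m unit`); the marker names and test bodies are placeholders, not the project's actual suite.

```python
import hashlib

import pytest

@pytest.mark.unit
def test_md5_digest_is_deterministic():
    # Fast, dependency-free unit check (placeholder assertion).
    assert hashlib.md5(b"same bytes").hexdigest() == hashlib.md5(b"same bytes").hexdigest()

@pytest.mark.integration
def test_real_pdf_fixture_processing():
    # Would exercise a real PDF fixture against the Docker Compose services
    # (Ollama, Milvus, FalkorDB); skipped here because it needs that stack.
    pytest.skip("requires the containerized test environment")
```
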
### API v2 Enhancements (from v0.1.18.0)

- RESTful API: Complete `/api/v2` endpoints with full feature support
- GraphQL Support: Full schema with queries, mutations, and subscriptions
- WebSocket: Real-time updates for document processing and search results
- Streaming Upload: Chunked upload for files up to 5GB with resume capability
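
As a sketch of the chunked-upload flow: split the file into parts, upload them in order, and resume from the last acknowledged chunk after an interruption. The endpoint path, header usage, and resume protocol below are assumptions for illustration.

```python
import os
import requests

API = "http://localhost:8000/api/v2"   # assumed base URL
CHUNK = 8 * 1024 * 1024                 # 8 MiB parts (illustrative size)

def upload_resumable(path, upload_id, start_chunk=0):
    """Upload `path` in chunks; pass the last acknowledged chunk index to resume."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(start_chunk * CHUNK)
        index = start_chunk
        while True:
            part = f.read(CHUNK)
            if not part:
                break
            resp = requests.put(
                f"{API}/uploads/{upload_id}/chunks/{index}",   # hypothetical route
                data=part,
                headers={"Content-Range": f"bytes {index * CHUNK}-{index * CHUNK + len(part) - 1}/{size}"},
                timeout=120,
            )
            resp.raise_for_status()
            index += 1
    return index  # next chunk to send if the transfer is interrupted again
```
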
### MCP (Model Context Protocol) Integration
- 15 MCP Tools: Document operations, Milvus management, Knowledge Graph queries
- 6 Interactive Resources: Document explorer, topology visualizer, KG explorer
- 5 Contextual Prompts: Analysis, synthesis, troubleshooting, security audit
- LLM-Ready: Seamless integration with Claude, ChatGPT, and other AI assistants
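
The snippet below shows the general shape of calling the MCP tools listed above from the official `mcp` Python SDK; the server launch command and tool name are assumptions, so check the MCP Integration Guide for the actual entry point and tool catalog.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command for the NetIntel-OCR MCP server (illustrative only).
server = StdioServerParameters(command="netintel-ocr", args=["mcp", "serve"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])       # list the available tools
            # Hypothetical document-search tool call:
            result = await session.call_tool("search_documents", {"query": "DMZ firewall rules"})
            print(result)

asyncio.run(main())
```
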
### Milvus Vector Database
- Collection Management: Full CRUD operations with dynamic schemas
- Advanced Search: Vector, hybrid, and expression-based queries
- Result Reranking: Cross-encoder, feature-based, and RRF strategies
- Index Optimization: IVF_FLAT, IVF_SQ8, HNSW, and more
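
A minimal `pymilvus` sketch of the operations listed above, creating a collection and running a vector search with an expression filter; the connection URI, collection name, and dimension are assumptions, and NetIntel-OCR normally manages these collections for you (see the Milvus Vector Database Guide).

```python
import random

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")   # assumed Milvus endpoint

# Create a collection with a simple dynamic schema (name and dimension illustrative).
client.create_collection(collection_name="netintel_docs", dimension=768)

# Insert a document-chunk embedding; the vector would come from the embedding model.
client.insert(
    collection_name="netintel_docs",
    data=[{"id": 1, "vector": [random.random() for _ in range(768)], "text": "DMZ firewall rules"}],
)

# Vector search combined with an expression filter.
hits = client.search(
    collection_name="netintel_docs",
    data=[[random.random() for _ in range(768)]],
    limit=5,
    filter="id > 0",
    output_fields=["text"],
)
print(hits)
```
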
### Enterprise Features
- Deduplication: MD5, SimHash, and CDC-based duplicate detection
- Performance Monitoring: Real-time metrics and benchmarking
- Batch Processing: Parallel processing with checkpoint/resume
- Module Management: Dynamic module configuration and optimization
- Configuration Templates: Pre-defined profiles for different deployments
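
To illustrate the first of the deduplication strategies above, the sketch below flags exact duplicates by MD5 content hash; SimHash and CDC extend this to near-duplicates and content-defined chunks. The function is an illustration, not NetIntel-OCR's implementation.

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(paths):
    """Group files whose MD5 content hashes collide (exact-duplicate detection)."""
    seen = {}
    for path in map(Path, paths):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        seen.setdefault(digest, []).append(path)
    return [group for group in seen.values() if len(group) > 1]

# Example: scan a directory of ingested PDFs for byte-identical copies
for group in find_exact_duplicates(Path("docs").glob("*.pdf")):
    print("duplicate set:", [p.name for p in group])
```
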
### Production Ready
- OAuth2/OIDC: Enterprise authentication with JWT tokens
- RBAC System: Full role-based access control
- Multi-tier Caching: Memory, Redis, and hybrid caching
- Rate Limiting: Multiple strategies for API protection
- Health Monitoring: Comprehensive health checks and metrics
- Audit Logging: Complete audit trail for compliance
- Distributed Tracing: OpenTelemetry support
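
A sketch of calling the API with OAuth2 bearer authentication; the identity-provider URL, client credentials, and health route below are assumptions, so see the Authentication & Security Guide for the supported flow.

```python
import requests

API = "http://localhost:8000"   # assumed deployment URL

# Obtain a JWT via an OAuth2 client-credentials grant (issuer URL is assumed).
token = requests.post(
    "https://idp.example.com/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "netintel-client",
        "client_secret": "change-me",
    },
    timeout=30,
).json()["access_token"]

# Authenticated request; a 429 response would indicate the rate limiter kicked in.
resp = requests.get(
    f"{API}/api/v2/health",                     # hypothetical health route
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```
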
## Core Features

### Intelligent Detection
- Network diagram recognition with 90%+ accuracy
- Flow chart and process diagram extraction
- Multi-diagram page processing
- Context-aware interpretation using surrounding text
### Mermaid Generation
- Automatic conversion to Mermaid.js syntax
- Syntax validation and auto-correction
- Support for complex network topologies
- Preserves component relationships
### Context Extraction
- Analyzes diagrams with document context
- Identifies critical components and data flows
- Provides security analysis and recommendations
- Generates architecture summaries
### Vector Search
- Semantic search across processed documents
- Milvus integration for scalable retrieval
- Component and relationship queries
- Cross-document knowledge linking
### Knowledge Graph (v0.1.17)
- Automatic entity and relationship extraction
- FalkorDB graph storage with PyKEEN embeddings
- 8 embedding models (TransE, RotatE, ComplEx, etc.)
- Hybrid retrieval combining graph and vector search
- Query intent classification for optimal routing
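
A minimal PyKEEN sketch of training one of the embedding models listed above (TransE) on extracted triples; the triples and hyperparameters are toy illustrations, and NetIntel-OCR wires this up internally against entities stored in FalkorDB.

```python
import numpy as np
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Toy (head, relation, tail) triples of the kind extracted from a network document.
triples = np.array([
    ["web-tier", "connects_to", "app-tier"],
    ["app-tier", "connects_to", "db-tier"],
    ["firewall-01", "protects", "web-tier"],
    ["dmz", "contains", "web-tier"],
], dtype=str)

tf = TriplesFactory.from_labeled_triples(triples)

# Train a TransE model; other supported models (RotatE, ComplEx, ...) can be
# swapped in via the `model` argument.
result = pipeline(
    training=tf,
    testing=tf,                      # toy setup: reuse the training triples
    model="TransE",
    training_kwargs=dict(num_epochs=50),
    random_seed=42,
)
entity_embeddings = result.model.entity_representations[0]().detach()
print(entity_embeddings.shape)
```
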
## Architecture

```mermaid
graph TB
PDF[PDF Documents] --> OCR[OCR Engine]
OCR --> Detection[Diagram Detection]
Detection --> Network[Network Processor]
Detection --> Flow[Flow Processor]
Network --> Mermaid[Mermaid Generator]
Flow --> Mermaid
Mermaid --> Context[Context Extractor]
Context --> KG[Knowledge Graph]
Context --> Vector[Vector Store]
KG --> FalkorDB[FalkorDB Storage]
KG --> Embeddings[PyKEEN Embeddings]
Vector --> CMDB[Semantic CMDB]
FalkorDB --> CMDB
Embeddings --> CMDB
```
## What's New in v0.1.17.1

### Modular Installation
- Reduced base size: From 2.5GB to just 500MB
- Choose your features: Install only what you need
- 7 optional modules: kg, vector, api, mcp, performance, dev
- Quick install: `pip install netintel-ocr[kg]` for Knowledge Graph
### Enhanced Version Display
- Complete visibility: See all installed and available modules
- Real-time status: Check service connections instantly
- Installation hints: Get exact commands for missing features
- JSON output: Programmatic access to version info

```bash
# Check what's installed and available
netintel-ocr --version

# Output shows:
#   - Installed modules with versions
#   - Available modules with install commands
#   - Active service connections
```
## Getting Started

### Quick Installation

```bash
# Choose your installation:

# Option 1: Minimal (500MB) - Core OCR only
pip install netintel-ocr

# Option 2: With Knowledge Graph (2GB) - Recommended
pip install "netintel-ocr[kg]"

# Option 3: Production (2.3GB) - KG + Vector + API
pip install "netintel-ocr[production]"

# Option 4: Everything (2.5GB)
pip install "netintel-ocr[all]"
```
### Processing Documents
NetIntel-OCR v0.1.17+ uses a hierarchical CLI structure:

```bash
# Process a network architecture document (v0.1.17+ syntax)
netintel-ocr process pdf network-architecture.pdf \
  --model Nanonets-OCR-s:latest \
  --network-model qwen2.5vl:7b
```
See the Quick Start Guide for installation and basic usage.
## Documentation

### v0.1.18.0 - API v2 & Enterprise Features

#### Core API & Integration
- API v2 Complete Guide - Comprehensive RESTful API, GraphQL, WebSocket
- MCP Integration Guide - 15 tools, 6 resources, 5 prompts for LLM integration
- Milvus Vector Database Guide - Advanced vector operations & search
- Enterprise Features Guide - Deduplication, monitoring, batch processing
#### Security & Production (Coming Soon)
- Authentication & Security Guide - OAuth2, RBAC, audit logging
- Production Deployment Guide - High availability, scaling, monitoring
- GraphQL API Guide - Schema, queries, mutations, subscriptions
- WebSocket Real-time Guide - Real-time updates & notifications
### Core Guides
- Quick Start Guide - Installation and first steps
- Installation Guide - Detailed setup and configuration
- Knowledge Graph Guide - KG extraction and querying
- Deployment Guide - Docker and Kubernetes setup
- Customization Guide - Prompt engineering and tuning
- Configuration Guide - Complete configuration reference
- Migration Guide - Migrating to modular architecture
- CLI Reference - Complete CLI command reference
- Batch Processing Guide - Large-scale document processing
- MiniRAG Guide - RAG-powered question answering
- Multi-Model Guide - Using multiple AI models
- Performance Guide - Optimization and tuning
- Monitoring Guide - System monitoring and metrics
## Requirements
- Python 3.10+
- Ollama for LLM inference
- 8GB+ RAM for processing
- GPU recommended for faster inference
## Resources
- Documentation: https://visionml.net/docs
- PyPI Package: https://pypi.org/project/netintel-ocr/
- GitHub: https://github.com/VisionMLNet/NetIntelOCR
- Discord Community: https://discord.gg/netintel-ocr