NetIntel-OCR v0.1.18.1

Enterprise Document Intelligence Platform with Complete Feature Parity

NetIntel-OCR v0.1.18.1 is a comprehensive enterprise platform that transforms unstructured technical documentation into structured, searchable knowledge. This release achieves 100% feature parity between CLI and API v2, includes a complete test framework, and delivers production-ready document intelligence capabilities for building Semantic Configuration Management Databases (CMDB).

Project Goal

Build an intelligent document processing platform that targets 94% accuracy through Hybrid Knowledge GraphRAG, extracting network architectures, security configurations, and operational workflows from enterprise documentation to populate a Semantic CMDB.

Performance Objectives

The table below compares a traditional approach with NetIntel-OCR's hybrid Knowledge Graph (KG) retrieval across accuracy, speed, and storage targets:

| Metric | Traditional | NetIntel-OCR Hybrid KG | Improvement |
|---|---|---|---|
| Accuracy | 62% | 94% | +52% relative (+32 points) |
| Query Speed | 2.3-8.7 s | 180-410 ms | 12.8-21.2x faster |
| Storage Efficiency | 20 GB | 11.3 GB | 43.5% savings |
| Dependency Analysis | 2.3 s | 180 ms | 12.8x faster |
| Compliance Check | 5.1 s | 290 ms | 17.6x faster |
| Incident Correlation | 8.7 s | 410 ms | 21.2x faster |

By combining graph traversal, vector similarity, and knowledge graph embeddings, NetIntel-OCR delivers enterprise-grade accuracy with sub-second response times.
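
The improvement figures in the table above follow from simple ratios. The short check below recomputes them from the published before/after values; the function names are illustrative and are not part of NetIntel-OCR.

```python
# Latencies copied from the performance table, in seconds (before, after).
BENCHMARKS = {
    "Dependency Analysis": (2.3, 0.180),
    "Compliance Check": (5.1, 0.290),
    "Incident Correlation": (8.7, 0.410),
}


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the new latency is."""
    return before_s / after_s


def relative_gain(before: float, after: float) -> float:
    """Return the improvement as a percentage of the baseline value."""
    return (after - before) / before * 100


def storage_savings(before_gb: float, after_gb: float) -> float:
    """Return the storage reduction as a percentage of the baseline size."""
    return (before_gb - after_gb) / before_gb * 100


for name, (before_s, after_s) in BENCHMARKS.items():
    print(f"{name}: {speedup(before_s, after_s):.1f}x faster")
print(f"Accuracy: +{relative_gain(62, 94):.0f}% relative")
print(f"Storage: {storage_savings(20, 11.3):.1f}% savings")
```

Note that the accuracy gain is relative to the 62% baseline; in absolute terms it is 32 percentage points.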

Key Use Cases

Network Domain

  • Architecture Discovery: Extract network topologies from design documents
  • Configuration Mapping: Parse firewall rules and routing configurations
  • Dependency Analysis: Identify component relationships and data flows
  • Change Impact: Track architecture evolution across document versions

Security Domain

  • Zone Identification: Detect DMZ, trust boundaries, and security zones
  • Compliance Mapping: Extract security controls and policy implementations
  • Risk Assessment: Identify exposed services and attack surfaces
  • Audit Trail: Document security architecture decisions and rationale

🆕 What's New in v0.1.18.1

Complete Feature Parity

  • 100% CLI-API Parity: All 30+ CLI options now available in API v2
  • Multi-Model Support: Complete implementation of --model, --network-model, --flow-model
  • Enhanced Processing: All diagram, table, and vector options fully supported
  • Milvus Default: Vector operations now default to Milvus instead of LanceDB

Comprehensive Test Framework

  • Docker Compose Environment: Complete containerized testing infrastructure
  • Test Categories: Unit, integration, system, performance, and regression testing
  • Quality Metrics: Code coverage, complexity analysis, security scoring
  • CI/CD Integration: GitHub Actions with matrix testing across Python versions
  • Real PDF Fixtures: Integration testing with actual technical documents

API v2 Enhancements (from v0.1.18.0)

  • RESTful API: Complete /api/v2 endpoints with full feature support
  • GraphQL Support: Full schema with queries, mutations, and subscriptions
  • WebSocket: Real-time updates for document processing and search results
  • Streaming Upload: Chunked upload for files up to 5GB with resume capability
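
Chunked upload with resume usually comes down to reading a file in fixed-size pieces and restarting from the last acknowledged byte offset. The sketch below shows only that client-side chunking logic. It does not call the NetIntel-OCR API, whose upload endpoint and resume protocol are not described here; `iter_chunks` and `CHUNK_SIZE` are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical default: 8 MiB per chunk


def iter_chunks(
    stream: BinaryIO, start_offset: int = 0, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs, starting at start_offset.

    Passing the last acknowledged offset lets an interrupted upload
    resume without re-sending earlier chunks.
    """
    stream.seek(start_offset)
    offset = start_offset
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


# Demo on an in-memory payload with a tiny chunk size.
payload = io.BytesIO(b"abcdefghij")
first_pass = list(iter_chunks(payload, chunk_size=4))
# Pretend the transfer failed after byte 8 and resume from there.
resumed = list(iter_chunks(payload, start_offset=8, chunk_size=4))
print(first_pass)
print(resumed)
```

In a real client, each `(offset, data)` pair would be sent to the server, and the offset of the last confirmed chunk would be persisted so a restart can pick it up.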

MCP (Model Context Protocol) Integration

  • 15 MCP Tools: Document operations, Milvus management, Knowledge Graph queries
  • 6 Interactive Resources: Document explorer, topology visualizer, KG explorer
  • 5 Contextual Prompts: Analysis, synthesis, troubleshooting, security audit
  • LLM-Ready: Seamless integration with Claude, ChatGPT, and other AI assistants

Milvus Vector Database

  • Collection Management: Full CRUD operations with dynamic schemas
  • Advanced Search: Vector, hybrid, and expression-based queries
  • Result Reranking: Cross-encoder, feature-based, and RRF strategies
  • Index Optimization: IVF_FLAT, IVF_SQ8, HNSW, and more
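
Of the reranking strategies listed above, Reciprocal Rank Fusion (RRF) is the simplest to illustrate: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The following is a minimal standalone sketch of that formula, not NetIntel-OCR's or Milvus's implementation; the document IDs are made up, and k = 60 is the commonly used constant rather than a documented project default.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a graph query.
vector_hits = ["doc-a", "doc-b", "doc-c"]
graph_hits = ["doc-b", "doc-d", "doc-a"]
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(fused)
```

Because RRF uses only rank positions, it can merge result lists whose raw scores are not comparable, such as vector distances and graph-traversal scores.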

Enterprise Features

  • Deduplication: MD5, SimHash, and CDC-based duplicate detection
  • Performance Monitoring: Real-time metrics and benchmarking
  • Batch Processing: Parallel processing with checkpoint/resume
  • Module Management: Dynamic module configuration and optimization
  • Configuration Templates: Pre-defined profiles for different deployments
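
As an illustration of the simplest deduplication mode listed above, the sketch below groups documents by MD5 digest to find byte-identical copies. It is a standalone example under assumed names (`find_exact_duplicates` is hypothetical) and says nothing about how NetIntel-OCR implements deduplication internally.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def find_exact_duplicates(
    documents: Iterable[Tuple[str, bytes]]
) -> Dict[str, List[str]]:
    """Group document names by MD5 digest and keep groups with 2+ members."""
    by_digest: Dict[str, List[str]] = {}
    for name, content in documents:
        # MD5 serves as a content fingerprint here, not as a security measure.
        digest = hashlib.md5(content, usedforsecurity=False).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


# Made-up documents: the first two have identical bytes.
docs = [
    ("design-v1.pdf", b"%PDF-1.7 core network"),
    ("design-copy.pdf", b"%PDF-1.7 core network"),
    ("firewall.pdf", b"%PDF-1.7 firewall rules"),
]
duplicates = find_exact_duplicates(docs)
print(list(duplicates.values()))
```

A digest match only catches exact copies. Near-duplicates, such as a re-exported PDF with changed metadata, are the case that similarity techniques like SimHash are designed for.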

Production Ready

  • OAuth2/OIDC: Enterprise authentication with JWT tokens
  • RBAC System: Full role-based access control
  • Multi-tier Caching: Memory, Redis, and hybrid caching
  • Rate Limiting: Multiple strategies for API protection
  • Health Monitoring: Comprehensive health checks and metrics
  • Audit Logging: Complete audit trail for compliance
  • Distributed Tracing: OpenTelemetry support
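
The rate-limiting strategies are not enumerated above. A token bucket is one common choice and is sketched here purely as an illustration; the `TokenBucket` class, its parameters, and the injected clock are assumptions for the example, not NetIntel-OCR code.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time elapsed, never exceeding the capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Demo with a fake clock so the behaviour is deterministic.
fake_time = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0  # one second later, one token has been refilled
after_wait = [bucket.allow(), bucket.allow()]
print(burst, after_wait)
```

Injecting the clock keeps the limiter testable without sleeping; in production the default `time.monotonic` would be used.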

Core Features

🎯 Intelligent Detection

  • Network diagram recognition with 90%+ accuracy
  • Flow chart and process diagram extraction
  • Multi-diagram page processing
  • Context-aware interpretation using surrounding text

🔄 Mermaid Generation

  • Automatic conversion to Mermaid.js syntax
  • Syntax validation and auto-correction
  • Support for complex network topologies
  • Preserves component relationships
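
To make the target format concrete, the sketch below renders a list of component relationships as a Mermaid flowchart. It is a simplified illustration, not the project's generator: `to_mermaid` is a hypothetical helper, and it does not escape labels that contain brackets or quotes.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def to_mermaid(
    edges: Iterable[Tuple[str, str]],
    labels: Optional[Dict[str, str]] = None,
    direction: str = "TB",
) -> str:
    """Render directed edges as Mermaid flowchart text.

    A node's label is emitted only the first time the node appears,
    which matches how Mermaid diagrams are usually written by hand.
    """
    labels = labels or {}
    declared: Set[str] = set()

    def node(node_id: str) -> str:
        if node_id in labels and node_id not in declared:
            declared.add(node_id)
            return f"{node_id}[{labels[node_id]}]"
        return node_id

    lines = [f"graph {direction}"]
    for src, dst in edges:
        lines.append(f"    {node(src)} --> {node(dst)}")
    return "\n".join(lines)


diagram = to_mermaid(
    edges=[("PDF", "OCR"), ("OCR", "Detection")],
    labels={
        "PDF": "PDF Documents",
        "OCR": "OCR Engine",
        "Detection": "Diagram Detection",
    },
)
print(diagram)
```

The output can be pasted into any Mermaid renderer to check that the relationships were preserved.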

🧠 Context Extraction

  • Analyzes diagrams with document context
  • Identifies critical components and data flows
  • Provides security analysis and recommendations
  • Generates architecture summaries
  • Semantic search across processed documents
  • Milvus integration for scalable retrieval
  • Component and relationship queries
  • Cross-document knowledge linking

🕸️ Knowledge Graph (v0.1.17)

  • Automatic entity and relationship extraction
  • FalkorDB graph storage with PyKEEN embeddings
  • 8 embedding models (TransE, RotatE, ComplEx, etc.)
  • Hybrid retrieval combining graph and vector search
  • Query intent classification for optimal routing

Architecture

graph TB
    PDF[PDF Documents] --> OCR[OCR Engine]
    OCR --> Detection[Diagram Detection]
    Detection --> Network[Network Processor]
    Detection --> Flow[Flow Processor]
    Network --> Mermaid[Mermaid Generator]
    Flow --> Mermaid
    Mermaid --> Context[Context Extractor]
    Context --> KG[Knowledge Graph]
    Context --> Vector[Vector Store]
    KG --> FalkorDB[FalkorDB Storage]
    KG --> Embeddings[PyKEEN Embeddings]
    Vector --> CMDB[Semantic CMDB]
    FalkorDB --> CMDB
    Embeddings --> CMDB
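
One way to read the diagram above is to list every route a document can take from PDF input to the Semantic CMDB. The sketch below transcribes the diagram's edges into a dictionary and enumerates those routes. It is an aid for understanding the data flow, not code from the project.

```python
from typing import Dict, List

# Edges transcribed from the architecture diagram (node IDs only).
PIPELINE: Dict[str, List[str]] = {
    "PDF": ["OCR"],
    "OCR": ["Detection"],
    "Detection": ["Network", "Flow"],
    "Network": ["Mermaid"],
    "Flow": ["Mermaid"],
    "Mermaid": ["Context"],
    "Context": ["KG", "Vector"],
    "KG": ["FalkorDB", "Embeddings"],
    "Vector": ["CMDB"],
    "FalkorDB": ["CMDB"],
    "Embeddings": ["CMDB"],
    "CMDB": [],
}


def all_paths(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Enumerate every path from start to goal (the graph must be acyclic)."""
    if start == goal:
        return [[goal]]
    paths: List[List[str]] = []
    for successor in graph.get(start, []):
        for tail in all_paths(graph, successor, goal):
            paths.append([start] + tail)
    return paths


paths = all_paths(PIPELINE, "PDF", "CMDB")
for path in paths:
    print(" -> ".join(path))
```

The enumeration yields six routes: two diagram processors (network or flow) multiplied by three storage branches (FalkorDB, PyKEEN embeddings, or the vector store).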

What's New in v0.1.17.1

🎯 Modular Installation

  • Reduced base size: From 2.5GB to just 500MB
  • Choose your features: Install only what you need
  • 7 optional modules, including kg, vector, api, mcp, performance, and dev
  • Quick install: pip install "netintel-ocr[kg]" for Knowledge Graph

📊 Enhanced Version Display

  • Complete visibility: See all installed and available modules
  • Real-time status: Check service connections instantly
  • Installation hints: Get exact commands for missing features
  • JSON output: Programmatic access to version info

# Check what's installed and available
netintel-ocr --version

# Output shows:
# ✓ Installed modules with versions
# ✗ Available modules with install commands
# ✓ Active service connections

Getting Started

Quick Installation

# Choose your installation:

# Option 1: Minimal (500MB) - Core OCR only
pip install netintel-ocr

# Option 2: With Knowledge Graph (2GB) - Recommended
pip install "netintel-ocr[kg]"

# Option 3: Production (2.3GB) - KG + Vector + API
pip install "netintel-ocr[production]"

# Option 4: Everything (2.5GB)
pip install "netintel-ocr[all]"

Processing Documents

NetIntel-OCR v0.1.17+ uses a hierarchical CLI structure:

# Process network architecture document (v0.1.17+ syntax)
netintel-ocr process pdf network-architecture.pdf \
             --model Nanonets-OCR-s:latest \
             --network-model qwen2.5vl:7b

See the Quick Start Guide for installation and basic usage.
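
When the CLI is driven from a script, it is safer to build the command as an argument list than to concatenate a shell string. The helper below is a hypothetical wrapper (`build_process_command` is not part of the package). It uses only the subcommand and the `--model`, `--network-model`, and `--flow-model` options mentioned in this document, and it builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_process_command(
    pdf_path: str,
    model: Optional[str] = None,
    network_model: Optional[str] = None,
    flow_model: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `netintel-ocr process pdf ...`."""
    command = ["netintel-ocr", "process", "pdf", pdf_path]
    if model:
        command += ["--model", model]
    if network_model:
        command += ["--network-model", network_model]
    if flow_model:
        command += ["--flow-model", flow_model]
    return command


command = build_process_command(
    "network-architecture.pdf",
    model="Nanonets-OCR-s:latest",
    network_model="qwen2.5vl:7b",
)
# Print a shell-quoted form for logging or copy-paste.
print(shlex.join(command))
```

To execute it, pass the list to `subprocess.run(command, check=True)`; this assumes `netintel-ocr` is installed and on the `PATH`.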

Documentation

🆕 v0.1.18.0 - API v2 & Enterprise Features

Core API & Integration

Security & Production (Coming Soon)

Core Guides

Requirements

  • Python 3.10+
  • Ollama for LLM inference
  • 8GB+ RAM for processing
  • GPU recommended for faster inference

Resources

  • Documentation: https://visionml.net/docs
  • PyPI Package: https://pypi.org/project/netintel-ocr/
  • GitHub: https://github.com/VisionMLNet/NetIntelOCR
  • Discord Community: https://discord.gg/netintel-ocr