# NetIntel-OCR v0.1.18.1

## Enterprise Document Intelligence Platform with Complete Feature Parity

NetIntel-OCR v0.1.18.1 is a comprehensive enterprise platform that transforms unstructured technical documentation into structured, searchable knowledge. This release achieves 100% feature parity between the CLI and API v2, includes a complete test framework, and delivers production-ready document intelligence capabilities for building a Semantic Configuration Management Database (CMDB).
## Project Goal

Create an intelligent document-processing platform that achieves 94% accuracy through hybrid Knowledge GraphRAG, extracting network architectures, security configurations, and operational workflows from enterprise documentation to populate a Semantic CMDB.
## Performance Objectives

The system targets the following accuracy and performance outcomes through its hybrid Knowledge GraphRAG retrieval:

| Metric | Traditional | NetIntel-OCR Hybrid KG | Improvement |
|---|---|---|---|
| Accuracy | 62% | 94% | +52% |
| Query Speed | 2.3-8.7 s | 180-410 ms | 5-20x faster |
| Storage Efficiency | 20 GB | 11.3 GB | 43.5% savings |
| Dependency Analysis | 2.3 s | 180 ms | 12.8x faster |
| Compliance Check | 5.1 s | 290 ms | 17.6x faster |
| Incident Correlation | 8.7 s | 410 ms | 21.2x faster |

By combining graph traversal, vector similarity, and knowledge graph embeddings, NetIntel-OCR delivers enterprise-grade accuracy with sub-second response times.
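
To make the hybrid idea concrete, the sketch below fuses a graph-traversal score with a vector-similarity score using a weighted sum. The names, weights, and data structures are illustrative assumptions, not NetIntel-OCR's internal API; in the real system the query intent classifier (see the Knowledge Graph section below) decides how the two signals are combined.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    graph_score: float   # e.g. normalized path relevance from graph traversal
    vector_score: float  # e.g. cosine similarity from the vector index

def hybrid_rank(candidates, graph_weight=0.4, vector_weight=0.6):
    """Rank candidates by a weighted fusion of graph and vector scores.

    The weights are illustrative; in practice they would be tuned or chosen
    per query by the intent classifier.
    """
    return sorted(
        candidates,
        key=lambda c: graph_weight * c.graph_score + vector_weight * c.vector_score,
        reverse=True,
    )

# Example: two candidate documents scored by both retrieval paths
ranked = hybrid_rank([
    Candidate("design-doc-01", graph_score=0.82, vector_score=0.64),
    Candidate("runbook-07",    graph_score=0.35, vector_score=0.91),
])
print([c.doc_id for c in ranked])
```
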
## Key Use Cases

### Network Domain
- Architecture Discovery: Extract network topologies from design documents
- Configuration Mapping: Parse firewall rules and routing configurations
- Dependency Analysis: Identify component relationships and data flows
- Change Impact: Track architecture evolution across document versions
### Security Domain
- Zone Identification: Detect DMZ, trust boundaries, and security zones
- Compliance Mapping: Extract security controls and policy implementations
- Risk Assessment: Identify exposed services and attack surfaces
- Audit Trail: Document security architecture decisions and rationale
## What's New in v0.1.18.1

### Complete Feature Parity
- 100% CLI-API Parity: All 30+ CLI options now available in API v2
- Multi-Model Support: Complete implementation of `--model`, `--network-model`, and `--flow-model` (see the request sketch after this list)
- Enhanced Processing: All diagram, table, and vector options fully supported
- Milvus Default: Vector operations now default to Milvus instead of LanceDB
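
For example, a document submission through API v2 can carry the same per-task model choices as the CLI flags above. The endpoint path and field names below are assumptions for illustration only; consult the API v2 Complete Guide for the actual contract.

```python
import requests

# Hypothetical endpoint and field names -- check the API v2 guide for the real
# contract; only the option semantics mirror the CLI flags listed above.
API = "http://localhost:8000/api/v2"

with open("network-architecture.pdf", "rb") as f:
    resp = requests.post(
        f"{API}/documents",
        files={"file": f},
        data={
            "model": "Nanonets-OCR-s:latest",   # mirrors --model
            "network_model": "qwen2.5vl:7b",    # mirrors --network-model
            "flow_model": "qwen2.5vl:7b",       # mirrors --flow-model
        },
        timeout=300,
    )
resp.raise_for_status()
print(resp.json())
```
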
### Comprehensive Test Framework
- Docker Compose Environment: Complete containerized testing infrastructure
- Test Categories: Unit, integration, system, performance, and regression testing
- Quality Metrics: Code coverage, complexity analysis, security scoring
- CI/CD Integration: GitHub Actions with matrix testing across Python versions
- Real PDF Fixtures: Integration testing with actual technical documents
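
As a rough illustration of how such test categories are commonly expressed, the snippet below tags tests with pytest markers so they can be selected per category (e.g. `pytest -m unit`); the marker names and test bodies are placeholders, not the project's actual suite.

```python
import hashlib

import pytest

@pytest.mark.unit
def test_md5_digest_is_deterministic():
    # Fast, dependency-free unit check (placeholder assertion).
    assert hashlib.md5(b"same bytes").hexdigest() == hashlib.md5(b"same bytes").hexdigest()

@pytest.mark.integration
def test_real_pdf_fixture_processing():
    # Would exercise a real PDF fixture against the Docker Compose services
    # (Ollama, Milvus, FalkorDB); skipped here because it needs that stack.
    pytest.skip("requires the containerized test environment")
```
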
### API v2 Enhancements (from v0.1.18.0)

- RESTful API: Complete `/api/v2` endpoints with full feature support
- GraphQL Support: Full schema with queries, mutations, and subscriptions
- WebSocket: Real-time updates for document processing and search results
- Streaming Upload: Chunked upload for files up to 5GB with resume capability
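
As a sketch of the chunked-upload flow: split the file into parts, upload them in order, and resume from the last acknowledged chunk after an interruption. The endpoint path, header usage, and resume protocol below are assumptions for illustration.

```python
import os
import requests

API = "http://localhost:8000/api/v2"   # assumed base URL
CHUNK = 8 * 1024 * 1024                 # 8 MiB parts (illustrative size)

def upload_resumable(path, upload_id, start_chunk=0):
    """Upload `path` in chunks; pass the last acknowledged chunk index to resume."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(start_chunk * CHUNK)
        index = start_chunk
        while True:
            part = f.read(CHUNK)
            if not part:
                break
            resp = requests.put(
                f"{API}/uploads/{upload_id}/chunks/{index}",   # hypothetical route
                data=part,
                headers={"Content-Range": f"bytes {index * CHUNK}-{index * CHUNK + len(part) - 1}/{size}"},
                timeout=120,
            )
            resp.raise_for_status()
            index += 1
    return index  # next chunk to send if the transfer is interrupted again
```
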
### MCP (Model Context Protocol) Integration
- 15 MCP Tools: Document operations, Milvus management, Knowledge Graph queries
- 6 Interactive Resources: Document explorer, topology visualizer, KG explorer
- 5 Contextual Prompts: Analysis, synthesis, troubleshooting, security audit
- LLM-Ready: Seamless integration with Claude, ChatGPT, and other AI assistants
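
The snippet below shows the general shape of calling the MCP tools listed above from the official `mcp` Python SDK; the server launch command and tool name are assumptions, so check the MCP Integration Guide for the actual entry point and tool catalog.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command for the NetIntel-OCR MCP server (illustrative only).
server = StdioServerParameters(command="netintel-ocr", args=["mcp", "serve"])

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])       # list the available tools
            # Hypothetical document-search tool call:
            result = await session.call_tool("search_documents", {"query": "DMZ firewall rules"})
            print(result)

asyncio.run(main())
```
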
### Milvus Vector Database
- Collection Management: Full CRUD operations with dynamic schemas
- Advanced Search: Vector, hybrid, and expression-based queries
- Result Reranking: Cross-encoder, feature-based, and RRF strategies
- Index Optimization: IVF_FLAT, IVF_SQ8, HNSW, and more
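
A minimal `pymilvus` sketch of the operations listed above, creating a collection and running a vector search with an expression filter; the connection URI, collection name, and dimension are assumptions, and NetIntel-OCR normally manages these collections for you (see the Milvus Vector Database Guide).

```python
import random

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")   # assumed Milvus endpoint

# Create a collection with a simple dynamic schema (name and dimension illustrative).
client.create_collection(collection_name="netintel_docs", dimension=768)

# Insert a document-chunk embedding; the vector would come from the embedding model.
client.insert(
    collection_name="netintel_docs",
    data=[{"id": 1, "vector": [random.random() for _ in range(768)], "text": "DMZ firewall rules"}],
)

# Vector search combined with an expression filter.
hits = client.search(
    collection_name="netintel_docs",
    data=[[random.random() for _ in range(768)]],
    limit=5,
    filter="id > 0",
    output_fields=["text"],
)
print(hits)
```
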
### Enterprise Features
- Deduplication: MD5, SimHash, and CDC-based duplicate detection
- Performance Monitoring: Real-time metrics and benchmarking
- Batch Processing: Parallel processing with checkpoint/resume
- Module Management: Dynamic module configuration and optimization
- Configuration Templates: Pre-defined profiles for different deployments
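
To illustrate the first of the deduplication strategies above, the sketch below flags exact duplicates by MD5 content hash; SimHash and CDC extend this to near-duplicates and content-defined chunks. The function is an illustration, not NetIntel-OCR's implementation.

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(paths):
    """Group files whose MD5 content hashes collide (exact-duplicate detection)."""
    seen = {}
    for path in map(Path, paths):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        seen.setdefault(digest, []).append(path)
    return [group for group in seen.values() if len(group) > 1]

# Example: scan a directory of ingested PDFs for byte-identical copies
for group in find_exact_duplicates(Path("docs").glob("*.pdf")):
    print("duplicate set:", [p.name for p in group])
```
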
### Production Ready
- OAuth2/OIDC: Enterprise authentication with JWT tokens
- RBAC System: Full role-based access control
- Multi-tier Caching: Memory, Redis, and hybrid caching
- Rate Limiting: Multiple strategies for API protection
- Health Monitoring: Comprehensive health checks and metrics
- Audit Logging: Complete audit trail for compliance
- Distributed Tracing: OpenTelemetry support
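
A sketch of calling the API with OAuth2 bearer authentication; the identity-provider URL, client credentials, and health route below are assumptions, so see the Authentication & Security Guide for the supported flow.

```python
import requests

API = "http://localhost:8000"   # assumed deployment URL

# Obtain a JWT via an OAuth2 client-credentials grant (issuer URL is assumed).
token = requests.post(
    "https://idp.example.com/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "netintel-client",
        "client_secret": "change-me",
    },
    timeout=30,
).json()["access_token"]

# Authenticated request; a 429 response would indicate the rate limiter kicked in.
resp = requests.get(
    f"{API}/api/v2/health",                     # hypothetical health route
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```
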
## Core Features

### Intelligent Detection
- Network diagram recognition with 90%+ accuracy
- Flow chart and process diagram extraction
- Multi-diagram page processing
- Context-aware interpretation using surrounding text
### Mermaid Generation
- Automatic conversion to Mermaid.js syntax
- Syntax validation and auto-correction
- Support for complex network topologies
- Preserves component relationships
### Context Extraction
- Analyzes diagrams with document context
- Identifies critical components and data flows
- Provides security analysis and recommendations
- Generates architecture summaries
### Vector Search
- Semantic search across processed documents
- Milvus integration for scalable retrieval
- Component and relationship queries
- Cross-document knowledge linking
### Knowledge Graph (v0.1.17)
- Automatic entity and relationship extraction
- FalkorDB graph storage with PyKEEN embeddings
- 8 embedding models (TransE, RotatE, ComplEx, etc.)
- Hybrid retrieval combining graph and vector search
- Query intent classification for optimal routing
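
A minimal PyKEEN sketch of training one of the embedding models listed above (TransE) on extracted triples; the triples and hyperparameters are toy illustrations, and NetIntel-OCR wires this up internally against entities stored in FalkorDB.

```python
import numpy as np
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Toy (head, relation, tail) triples of the kind extracted from a network document.
triples = np.array([
    ["web-tier", "connects_to", "app-tier"],
    ["app-tier", "connects_to", "db-tier"],
    ["firewall-01", "protects", "web-tier"],
    ["dmz", "contains", "web-tier"],
], dtype=str)

tf = TriplesFactory.from_labeled_triples(triples)

# Train a TransE model; other supported models (RotatE, ComplEx, ...) can be
# swapped in via the `model` argument.
result = pipeline(
    training=tf,
    testing=tf,                      # toy setup: reuse the training triples
    model="TransE",
    training_kwargs=dict(num_epochs=50),
    random_seed=42,
)
entity_embeddings = result.model.entity_representations[0]().detach()
print(entity_embeddings.shape)
```
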
## Architecture

```mermaid
graph TB
PDF[PDF Documents] --> OCR[OCR Engine]
OCR --> Detection[Diagram Detection]
Detection --> Network[Network Processor]
Detection --> Flow[Flow Processor]
Network --> Mermaid[Mermaid Generator]
Flow --> Mermaid
Mermaid --> Context[Context Extractor]
Context --> KG[Knowledge Graph]
Context --> Vector[Vector Store]
KG --> FalkorDB[FalkorDB Storage]
KG --> Embeddings[PyKEEN Embeddings]
Vector --> CMDB[Semantic CMDB]
FalkorDB --> CMDB
Embeddings --> CMDB
```
## What's New in v0.1.17.1

### Modular Installation
- Reduced base size: From 2.5GB to just 500MB
- Choose your features: Install only what you need
- 7 optional modules: kg, vector, api, mcp, performance, dev
- Quick install: `pip install netintel-ocr[kg]` for Knowledge Graph
### Enhanced Version Display
- Complete visibility: See all installed and available modules
- Real-time status: Check service connections instantly
- Installation hints: Get exact commands for missing features
- JSON output: Programmatic access to version info

```bash
# Check what's installed and available
netintel-ocr --version

# Output shows:
#   - Installed modules with versions
#   - Available modules with install commands
#   - Active service connections
```
## Getting Started

### Quick Installation

```bash
# Choose your installation:

# Option 1: Minimal (500MB) - Core OCR only
pip install netintel-ocr

# Option 2: With Knowledge Graph (2GB) - Recommended
pip install "netintel-ocr[kg]"

# Option 3: Production (2.3GB) - KG + Vector + API
pip install "netintel-ocr[production]"

# Option 4: Everything (2.5GB)
pip install "netintel-ocr[all]"
```
### Processing Documents
NetIntel-OCR v0.1.17+ uses a hierarchical CLI structure:

```bash
# Process a network architecture document (v0.1.17+ syntax)
netintel-ocr process pdf network-architecture.pdf \
  --model Nanonets-OCR-s:latest \
  --network-model qwen2.5vl:7b
```
See the Quick Start Guide for installation and basic usage.
## Documentation

### v0.1.18.0 - API v2 & Enterprise Features

#### Core API & Integration
- API v2 Complete Guide - Comprehensive RESTful API, GraphQL, WebSocket
- MCP Integration Guide - 15 tools, 6 resources, 5 prompts for LLM integration
- Milvus Vector Database Guide - Advanced vector operations & search
- Enterprise Features Guide - Deduplication, monitoring, batch processing
#### Security & Production (Coming Soon)
- Authentication & Security Guide - OAuth2, RBAC, audit logging
- Production Deployment Guide - High availability, scaling, monitoring
- GraphQL API Guide - Schema, queries, mutations, subscriptions
- WebSocket Real-time Guide - Real-time updates & notifications
### Core Guides
- Quick Start Guide - Installation and first steps
- Installation Guide - Detailed setup and configuration
- Knowledge Graph Guide - KG extraction and querying
- Deployment Guide - Docker and Kubernetes setup
- Customization Guide - Prompt engineering and tuning
- Configuration Guide - Complete configuration reference
- Migration Guide - Migrating to modular architecture
- CLI Reference - Complete CLI command reference
- Batch Processing Guide - Large-scale document processing
- MiniRAG Guide - RAG-powered question answering
- Multi-Model Guide - Using multiple AI models
- Performance Guide - Optimization and tuning
- Monitoring Guide - System monitoring and metrics
## Requirements
- Python 3.10+
- Ollama for LLM inference
- 8GB+ RAM for processing
- GPU recommended for faster inference
## Resources
- Documentation: https://visionml.net/docs
- PyPI Package: https://pypi.org/project/netintel-ocr/
- GitHub: https://github.com/VisionMLNet/NetIntelOCR
- Discord Community: https://discord.gg/netintel-ocr