Changelog
Version 0.1.17 - Latest Release - Hierarchical CLI & Knowledge Graph
Released: 2025-09-12
🎯 Major Feature - Hierarchical CLI Structure
- NEW Command Structure: Complete redesign with 8 command groups for better organization
- Command Groups:
- process - Document processing (pdf, batch, watch)
- server - Server operations (api, mcp, worker, all, dev, health)
- db - Database management (query, merge, stats, cleanup, import, migrate)
- kg - Knowledge Graph (18+ commands for graph operations)
- model - Model management (list, set-default, preload, ollama)
- project - Project initialization (templates: small, medium, large, enterprise)
- config - Configuration management (profiles, templates, environment variables)
- system - System utilities (check, diagnose, version, health, metrics)
- Breaking Change: Old syntax netintel-ocr document.pdf → New syntax netintel-ocr process pdf document.pdf
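The group/command layout described above can be sketched with the standard library's argparse. This is only an illustration of the two-level structure, not the project's implementation (the release notes say the real CLI is built on Click); attaching --no-kg to the process pdf command is an assumption, and only two of the eight groups are shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level parser: netintel-ocr <group> <command> [args]."""
    parser = argparse.ArgumentParser(prog="netintel-ocr")
    groups = parser.add_subparsers(dest="group", required=True)

    # "process" group with its "pdf" command (other groups omitted for brevity).
    process = groups.add_parser("process", help="Document processing")
    process_cmds = process.add_subparsers(dest="command", required=True)
    pdf = process_cmds.add_parser("pdf", help="Process a single PDF")
    pdf.add_argument("path")
    # Assumption: the --no-kg flag is accepted by this command.
    pdf.add_argument("--no-kg", action="store_true", help="Disable KG extraction")

    # "kg" group with one illustrative command.
    kg = groups.add_parser("kg", help="Knowledge Graph operations")
    kg_cmds = kg.add_subparsers(dest="command", required=True)
    query = kg_cmds.add_parser("query", help="Run a graph query")
    query.add_argument("cypher")
    return parser


# New-style invocation: netintel-ocr process pdf document.pdf
args = build_parser().parse_args(["process", "pdf", "document.pdf"])
print(args.group, args.command, args.path, args.no_kg)
```

With a layout like this, the old single-argument form (netintel-ocr document.pdf) is rejected by the parser, which is why scripts need to be updated to the new syntax.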
🧠 Knowledge Graph System (Major Enhancement)
Core KG Features
- FalkorDB Integration: Redis-based graph database for storing entities and relationships
- Automatic Entity Extraction: Identifies network components, flow elements, and their relationships
- PyKEEN Embeddings: 8 state-of-the-art models for knowledge graph embeddings (200-dim):
- TransE (fast, simple relationships)
- RotatE (complex relationships, default)
- ComplEx (symmetric and antisymmetric relationships)
- DistMult, ConvE, TuckER, HolE, RESCAL
- Default Enabled: KG features are active by default in v0.1.17; use --no-kg to disable them
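To give a feel for what a KG embedding model scores, here is a minimal sketch of the TransE scoring function (one of the listed models) in plain Python. It is not PyKEEN code, and the 3-dimensional vectors and entity names are made up for illustration; the release itself uses 200-dimensional embeddings.

```python
import math
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """Return the TransE plausibility score -||h + r - t||_2 (higher is more plausible)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Toy vectors, hypothetical values chosen so that router + connects_to == switch.
router = [0.5, 0.0, 0.25]
connects_to = [0.5, 0.5, 0.0]
switch = [1.0, 0.5, 0.25]
firewall = [-1.0, 2.0, 0.0]

print(transe_score(router, connects_to, switch))    # exact translation, best possible score
print(transe_score(router, connects_to, firewall))  # poor fit, strictly lower score
```

TransE treats a relation as a translation in embedding space, which is why it is fast but struggles with more complex relation patterns that models such as RotatE are designed for.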
Hybrid Retrieval System
- 4 Retrieval Strategies:
- Vector-first: Start with Milvus, expand with graph
- Graph-first: Start with FalkorDB, enhance with vectors
- Parallel: Run both searches simultaneously and merge the results with RRF
- Adaptive: Auto-select based on query classification
- Query Intent Classification: 6 query types for optimal routing:
- Entity-centric, Relational, Topological
- Semantic, Analytical, Exploratory
- Reciprocal Rank Fusion (RRF): Advanced result merging for parallel search
- Performance Metrics:
- 92% query accuracy (vs 72% vector-only)
- <150ms response time for hybrid queries
- 25% storage reduction with unified storage
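The parallel strategy relies on Reciprocal Rank Fusion to merge the vector and graph result lists. A minimal sketch of standard RRF follows; the constant k=60 is the value commonly used in the literature, not one confirmed for this project, and the document IDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc-a", "doc-b", "doc-c"]  # placeholder results from the vector store
graph_hits = ["doc-b", "doc-d", "doc-a"]   # placeholder results from the graph store
print(reciprocal_rank_fusion([vector_hits, graph_hits]))
# ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Because RRF only uses ranks, it can combine results whose raw scores are not comparable, such as vector distances and graph traversal relevance.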
Enhanced MiniRAG Integration
- 3 Query Modes:
- minirag_only: Traditional RAG with vector search
- kg_embedding_only: Pure KG embedding similarity
- hybrid: Combined graph + vector context
- Context Enrichment: Graph traversal adds related entities to context
- Answer Generation: LLM with graph-aware context for better accuracy
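The three query modes differ mainly in which context reaches the LLM. The sketch below illustrates that selection step only; build_context and its arguments are hypothetical names, not part of the project's API, and the mode strings are the ones listed above.

```python
from typing import List

QUERY_MODES = ("minirag_only", "kg_embedding_only", "hybrid")


def build_context(mode: str, vector_chunks: List[str], graph_facts: List[str]) -> List[str]:
    """Select the context passed to the LLM for a given query mode (hypothetical helper)."""
    if mode == "minirag_only":
        return list(vector_chunks)
    if mode == "kg_embedding_only":
        return list(graph_facts)
    if mode == "hybrid":
        # Combine both sources, dropping duplicates while keeping order.
        return list(dict.fromkeys(vector_chunks + graph_facts))
    raise ValueError(f"unknown mode {mode!r}; expected one of {QUERY_MODES}")


chunks = ["Firewall FW-1 filters DMZ traffic."]
facts = ["FW-1 CONNECTS_TO Router-A"]
print(build_context("hybrid", chunks, facts))
```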
KG CLI Commands (18+ new commands)
- Initialization: kg init, kg check-requirements
- Processing: kg process, kg train-embeddings
- Querying: kg query, kg rag-query, kg hybrid-search
- Analysis: kg path-find, kg find-similar, kg cluster
- Visualization: kg visualize, kg embedding-stats
- Management: kg export, kg batch-query, kg stats
✨ New Features
- Configuration Templates: 6 pre-built templates (minimal, development, staging, production, enterprise, cloud)
- Profile Management: Multiple configuration profiles with easy switching
- Environment Variables: Complete configuration override capability
- 18+ KG Commands: Including kg init, kg train-embeddings, kg hybrid-search, kg path-find
- Visualization Tools: 2D/3D embedding visualization and clustering
- Batch KG Processing: Automatic KG extraction during batch operations
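The environment variable override listed above can be pictured as a simple overlay on top of the active profile. This is a sketch under stated assumptions: the NETINTEL_OCR_ prefix, the key names, and resolve_config are all hypothetical and are not taken from the project documentation.

```python
import os
from typing import Dict, Mapping, Optional

ENV_PREFIX = "NETINTEL_OCR_"  # hypothetical prefix, not confirmed by the project docs


def resolve_config(profile: Mapping[str, str],
                   environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Overlay prefixed environment variables on top of a profile's values."""
    env = os.environ if environ is None else environ
    resolved = dict(profile)
    for name, value in env.items():
        if name.startswith(ENV_PREFIX):
            # NETINTEL_OCR_KG_MODEL -> kg_model
            resolved[name[len(ENV_PREFIX):].lower()] = value
    return resolved


profile = {"kg_model": "RotatE", "milvus_host": "localhost"}  # hypothetical keys
print(resolve_config(profile, {"NETINTEL_OCR_KG_MODEL": "TransE"}))
# {'kg_model': 'TransE', 'milvus_host': 'localhost'}
```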
🛠️ Technical Improvements
- 50+ New Commands: Organized into intuitive hierarchical structure
- Click Framework: Modern CLI framework for better command organization
- Template System: Pre-configured templates for different deployment scenarios
- Configuration Validation: Comprehensive validation with helpful error messages
- Better Error Handling: Improved error messages and recovery
📚 Documentation
- Complete CLI reference with all new commands
- Migration guide from v0.1.16 to v0.1.17
- Configuration template documentation
- Knowledge Graph implementation guides
- Hybrid retrieval architecture documentation
- PyKEEN model selection guide
🔄 KG Implementation Details
What Gets Extracted
- Network Components: Routers, switches, firewalls, servers, load balancers
- Flow Elements: Process steps, decision points, data stores
- Relationships: CONNECTS_TO, DEPENDS_ON, ROUTES_THROUGH, CONTAINS
- Properties: IP addresses, VLANs, protocols, ports, bandwidth
- Context: Security zones, business services, applications
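As an illustration of how the extracted relationships can be used, the sketch below models them as typed edges and finds a shortest path with breadth-first search, similar in spirit to what kg path-find offers. It is an in-memory toy, not the FalkorDB-backed implementation; the Relationship class, find_path, and the intermediate node names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional

# Relationship types named in the release notes.
RELATION_TYPES = {"CONNECTS_TO", "DEPENDS_ON", "ROUTES_THROUGH", "CONTAINS"}


@dataclass(frozen=True)
class Relationship:
    source: str
    relation: str
    target: str

    def __post_init__(self) -> None:
        if self.relation not in RELATION_TYPES:
            raise ValueError(f"unsupported relation type: {self.relation}")


def find_path(edges: List[Relationship], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest directed path from start to goal."""
    adjacency: Dict[str, List[str]] = {}
    for edge in edges:
        adjacency.setdefault(edge.source, []).append(edge.target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


edges = [
    Relationship("Router-A", "CONNECTS_TO", "Firewall-1"),
    Relationship("Firewall-1", "ROUTES_THROUGH", "Core-Switch"),
    Relationship("Core-Switch", "CONNECTS_TO", "Database-Server"),
]
print(find_path(edges, "Router-A", "Database-Server"))
# ['Router-A', 'Firewall-1', 'Core-Switch', 'Database-Server']
```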
Storage Architecture
- FalkorDB: Graph structure + 200D KG embeddings as node properties
- Milvus: 4096D text embeddings for semantic search
- Unified Interface: Single query API for both graph and vector search
Example Usage
# Process with KG (default in v0.1.17)
netintel-ocr process pdf network-architecture.pdf
# Query the knowledge graph
netintel-ocr kg query "MATCH (n:NetworkDevice) RETURN n"
# Natural language query with MiniRAG
netintel-ocr kg rag-query "What are the security vulnerabilities?"
# Find paths between entities
netintel-ocr kg path-find "Router-A" "Database-Server"
# Visualize embeddings
netintel-ocr kg visualize --method tsne --output network-graph.html
Version 0.1.16.15
Released: 2025-09-01
🐛 Bug Fixes
- Fixed DEFAULT token parser error by replacing 'default' with 'DefaultZone'
- Enhanced connection handling to replace 'default' in arrow connections
- Improved keyword conflict resolution for Mermaid reserved words
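The kind of rewrite described in these fixes can be sketched as a small preprocessing pass. This is a naive illustration, not the project's fixer: rename_default_ids is a hypothetical name, and the pass would also rewrite the word "default" inside node labels, which a real fixer would need to avoid.

```python
import re

_DEFAULT_ID = re.compile(r"\bdefault\b")
# Statements where "default" is legitimate Mermaid syntax and must be kept.
_KEEP_PREFIXES = ("classDef", "linkStyle")


def rename_default_ids(mermaid: str, replacement: str = "DefaultZone") -> str:
    """Replace the identifier 'default' in subgraph and arrow lines."""
    fixed = []
    for line in mermaid.splitlines():
        if line.strip().startswith(_KEEP_PREFIXES):
            fixed.append(line)
        else:
            fixed.append(_DEFAULT_ID.sub(replacement, line))
    return "\n".join(fixed)


source = "\n".join([
    "graph TD",
    "  subgraph default",
    "    A[Router]",
    "  end",
    "  A --> default",
    "  classDef default fill:#fff",
])
print(rename_default_ids(source))
```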
Version 0.1.16.14
Released: 2025-09-01
🐛 Bug Fixes
- Fixed flow diagram parse errors with node-subgraph concatenation patterns
- Enhanced Mermaid fixer to handle 'default' keyword issues in subgraphs
- Improved preprocessing to separate concatenated node and subgraph definitions
Version 0.1.16.13
Released: 2025-09-01
✨ Features
- Applied Mermaid validation fixes to flow diagrams
- Added context extraction to flow diagrams using surrounding text
- Enhanced flow processor with RobustMermaidValidator for auto-correction
Version 0.1.16.12
Released: 2025-09-01
✨ Features
- Added context extraction for diagrams using surrounding text paragraphs
- Enhanced validation to auto-correct LLM-generated Mermaid syntax issues
🐛 Bug Fixes
- Fixed Mermaid diagram parsing errors with malformed subgraph/zone syntax
Version 0.1.16 - Major Release
Released: 2025-08-31
🎯 Major Features
- Unified Diagram Detection: Automatic detection for network/flow/hybrid diagrams
- Comprehensive Flow Processing: Full Mermaid generation for flow diagrams
- Context-Aware Analysis: Uses surrounding text (2 paragraphs before/after)
- Prompt Management System: Full customization without code changes
- Default Model Update: NetIntelOCR-7B-0925 as default vision model
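The context window mentioned above (2 paragraphs before and after a diagram) amounts to a bounded slice around the diagram's position. A minimal sketch, assuming the page has already been split into a list of paragraphs; surrounding_context and the sample data are hypothetical.

```python
from typing import List, Tuple


def surrounding_context(paragraphs: List[str], diagram_index: int,
                        window: int = 2) -> Tuple[List[str], List[str]]:
    """Return up to `window` paragraphs before and after the diagram position."""
    if not 0 <= diagram_index < len(paragraphs):
        raise IndexError("diagram_index is outside the paragraph list")
    before = paragraphs[max(0, diagram_index - window):diagram_index]
    after = paragraphs[diagram_index + 1:diagram_index + 1 + window]
    return before, after


page = ["intro", "topology overview", "<diagram>", "zone description", "routing notes", "appendix"]
print(surrounding_context(page, 2))
# (['intro', 'topology overview'], ['zone description', 'routing notes'])
```

Near the start or end of a document the slices simply come back shorter, so the caller never needs to special-case page boundaries.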
✨ New Capabilities
- Flow diagram element extraction and Mermaid generation
- Context extraction using document surrounding text
- Complete prompt import/export system
- Enhanced syntax validation and auto-correction
Version 0.1.15
Released: 2025-08-30
🚀 Performance Improvements
- Milvus Integration: 20-60x faster search, 70% less memory usage
- Qwen3-8B Embeddings: 4096-dimensional vectors via Ollama
- Binary Vectors: Enhanced deduplication with SimHash
- Simplified Deployment: One-command initialization with scales
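For readers unfamiliar with SimHash, the sketch below shows the basic idea behind binary-fingerprint deduplication in pure Python. It is not the project's C++ core; the whitespace tokenization, SHA-256 token hashing, and 64-bit width are assumptions chosen to keep the example self-contained.

```python
import hashlib
from typing import Iterable


def simhash(tokens: Iterable[str], bits: int = 64) -> int:
    """Compute a SimHash fingerprint: similar token sets tend to give nearby bit patterns."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


doc1 = "core router connects to the dmz firewall".split()
doc2 = "core router connects to the dmz switch".split()
print(hamming_distance(simhash(doc1), simhash(doc2)))
```

Documents whose fingerprints differ in only a few bits are treated as near-duplicates; the threshold is a tuning choice and is not specified in these notes.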
🔧 Infrastructure
- IVF_SQ8 indexing for CPU-optimized search
- Distributed architecture support
- Enhanced C++ deduplication core with AVX2 SIMD
Version 0.1.13
Released: 2025-08-25
✨ New Features
- REST API Server: Full API mode with the --api flag
- MCP Server: Model Context Protocol support with --mcp
- All-in-One Mode: Combined services with --all-in-one
- Deployment Scales: Small/medium/large/enterprise configurations
- Kubernetes Support: Helm charts and manifests generation
Version 0.1.12
Released: 2025-08-20
🎯 Major Features
- Centralized Database: Unified LanceDB management
- Advanced Query Engine: Multi-field filtering and reranking
- Parallel Batch Processing: Progress tracking and resumability
- Cloud Storage: S3/MinIO integration
- Enhanced Embeddings: Multiple providers with intelligent caching
Version 0.1.10
Released: 2025-08-15
✨ Features
- Hybrid Detection: Automatic network/flow diagram classification
- Improved Accuracy: Enhanced component extraction algorithms
- Better Error Handling: Graceful fallbacks for processing failures
Version 0.1.7
Released: 2025-08-10
🎯 Major Features
- Vector Database: Automatic LanceDB file generation
- RAG Optimization: Minimal metadata for optimal search
- Chunk Management: Intelligent document chunking
Version 0.1.4
Released: 2025-08-01
✨ New Features
- Multi-Model Support: Different models for different tasks
- Model Optimization: Task-specific model selection
- Performance Modes: Fast/balanced/accurate processing
Version 0.1.0
Released: 2025-07-01
🎉 Initial Release
- Network diagram detection and extraction
- Mermaid.js generation
- PDF processing with OCR
- Basic CLI interface
- Ollama integration
Upcoming Features
Version 0.2.0 (Planned)
- Web UI interface
- Real-time collaboration
- Custom model training
- Enterprise SSO integration
- Advanced analytics dashboard
Version 0.3.0 (Planned)
- AutoML for model selection
- Federated learning support
- Multi-language support
- Graph database integration
- Compliance reporting
Migration Guides
From 0.1.15 to 0.1.16
- Update default model to NetIntelOCR-7B-0925
- Export and update prompts using new management system
- Test flow diagram processing with new validator
From 0.1.12 to 0.1.15
- Migrate from LanceDB to Milvus
- Update embedding dimensions to 4096
- Regenerate vector indices
From 0.1.7 to 0.1.12
- Update batch processing scripts
- Configure cloud storage backends
- Migrate to centralized database
Deprecation Notices
Deprecated in 0.1.16
- Old flow diagram processor (use enhanced version)
- Manual prompt editing in code (use prompt management)
Deprecated in 0.1.15
- LanceDB backend (use Milvus)
- 768-dimension embeddings (use 4096)
Will be removed in 0.2.0
- Legacy CLI arguments
- Old configuration format
- Direct Ollama API calls
Support
For issues and questions:
- GitHub: https://github.com/VisionMLNet/NetIntelOCR/issues
- Documentation: https://visionml.net/docs
- PyPI: https://pypi.org/project/netintel-ocr/
- Discord: https://discord.gg/netintel-ocr