Knowledge Graph Processing¶
Overview¶
NetIntel-OCR v0.1.17 introduces Knowledge Graph (KG) capabilities that automatically extract and structure relationships from your network documentation. The system builds semantic graphs from network diagrams, flowcharts, and technical text, which enables graph queries and relationship analysis.
Model Categories
NetIntel-OCR uses two distinct categories of models:
INGESTION MODELS (for PDF processing):
- qwen2.5vl:7b - Network diagram analysis
- Nanonets-OCR-s:latest - OCR text extraction
MINIRAG MODELS (for Q&A after ingestion):
- gemma3:4b-it-qat - Answer generation
- qwen3-embedding:8b - Semantic search
These are separate model sets: ingestion models process documents, while MiniRAG models answer questions after ingestion.
New in v0.1.17
Knowledge Graph processing is enabled by default in v0.1.17. No additional flags needed!
Quick Start¶
Basic Usage¶
# Process with KG enabled (default)
netintel-ocr process pdf document.pdf
# Explicitly disable KG if not needed
netintel-ocr process pdf document.pdf --no-kg
What Gets Extracted¶
The Knowledge Graph system automatically identifies and extracts:
- Network Components: Routers, switches, firewalls, servers, load balancers
- Relationships: Connections, data flows, dependencies, configurations
- Attributes: IP addresses, VLANs, protocols, ports, bandwidths
- Topologies: Network paths, redundancy patterns, hierarchies
- Business Context: Services, applications, security zones
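To make the extracted categories concrete, here is a minimal sketch of how components, attributes, and relationships could be held in memory and turned into Cypher statements. The `Entity` and `Relationship` classes and the `to_cypher` helper are hypothetical names for illustration and are not part of NetIntel-OCR; the `NetworkDevice` label and `CONNECTS_TO` edge type are borrowed from examples elsewhere on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    label: str                      # e.g. NetworkDevice, Service, Zone
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    kind: str                       # e.g. CONNECTS_TO, DEPENDS_ON
    target: str


def to_cypher(entities, relationships):
    """Render MERGE statements (illustration only: values are not escaped)."""
    lines = []
    for e in entities:
        props = {"name": e.name, **e.attributes}
        body = ", ".join(f"{k}: {v!r}" for k, v in props.items())
        lines.append(f"MERGE (:{e.label} {{{body}}})")
    for r in relationships:
        lines.append(
            f"MATCH (a {{name: {r.source!r}}}), (b {{name: {r.target!r}}}) "
            f"MERGE (a)-[:{r.kind}]->(b)"
        )
    return lines


entities = [
    Entity("Router-A", "NetworkDevice", {"ip": "10.0.0.1"}),
    Entity("Server-DB", "NetworkDevice", {"vlan": 20}),
]
relationships = [Relationship("Router-A", "CONNECTS_TO", "Server-DB")]
statements = to_cypher(entities, relationships)
for statement in statements:
    print(statement)
```

Real input would need proper escaping or parameterized queries; the sketch only shows the shape of the data.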
Architecture¶
Components¶
graph LR
A[PDF Document] --> B[OCR Engine]
B --> C[KG Constructor]
C --> D[FalkorDB]
C --> E[PyKEEN Embeddings]
D --> F[Graph Queries]
E --> F
F --> G[Hybrid Retrieval]
Storage Layers¶
- FalkorDB: Graph database for storing entities and relationships
- Milvus: Vector database for text embeddings (4096D)
- KG Embeddings: 200D knowledge graph embeddings stored as properties
Configuration¶
KG Model Selection¶
Choose from 8 different embedding models based on your use case:
# TransE - Fast, good for simple relationships
netintel-ocr process pdf document.pdf --kg-model TransE
# RotatE - Best for complex relationships (default)
netintel-ocr process pdf document.pdf --kg-model RotatE
# ComplEx - Handles both symmetric and antisymmetric relationships
netintel-ocr process pdf document.pdf --kg-model ComplEx
# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL
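For intuition about how three of these models differ, the sketch below implements their published scoring functions on tiny hand-made vectors. It uses only the standard library and is not how NetIntel-OCR or PyKEEN compute scores internally; the function names are illustrative, and the RotatE distance uses a sum of per-coordinate moduli, which is one common choice of norm.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rotate_score(h, phases, t):
    """RotatE: the relation rotates each complex coordinate by a phase."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product <h, r, conj(t)>."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# TransE: a perfect translation scores 0, the best possible value.
print(transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0]))

# RotatE: rotating h by 90 degrees lands (almost exactly) on t.
h = [1 + 0j, 0 + 1j]
t = [0 + 1j, -1 + 0j]
print(rotate_score(h, [math.pi / 2, math.pi / 2], t))

# ComplEx: a real-valued relation is symmetric, an imaginary one antisymmetric.
a, b = [1 + 2j], [3 - 1j]
print(complex_score(a, [2 + 0j], b), complex_score(b, [2 + 0j], a))
print(complex_score(a, [1j], b), complex_score(b, [1j], a))
```

The last two lines show why ComplEx can represent both relation kinds: swapping head and tail leaves the score unchanged for a real relation vector and flips its sign for a purely imaginary one.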
Training Parameters¶
# Customize training epochs (default: 100)
netintel-ocr process pdf document.pdf --kg-epochs 200
# Adjust batch size (default: 256)
netintel-ocr kg train-embeddings --batch-size 512
# Combined configuration
netintel-ocr process pdf document.pdf \
--kg-model RotatE \
--kg-epochs 150 \
--kg-batch-size 384
External Services Configuration¶
# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"
# Verify Ollama models are available
curl $OLLAMA_HOST/api/tags | jq '.models[].name'
# Models for different purposes:
# INGESTION: qwen2.5vl:7b, Nanonets-OCR-s:latest
# MINIRAG: gemma3:4b-it-qat, qwen3-embedding:8b
# Custom FalkorDB host/port
netintel-ocr process pdf document.pdf \
--falkordb-host 192.168.1.100 \
--falkordb-port 6379
# Using environment variables
export FALKORDB_HOST=falkordb.local
export FALKORDB_PORT=6379
export OLLAMA_HOST="http://192.168.1.100:11434"
netintel-ocr process pdf document.pdf
Processing Modes¶
Network Diagrams¶
When processing network diagrams, the KG system:
- Identifies network components from visual elements
- Extracts connection relationships from lines/arrows
- Preserves spatial layout information
- Links related text annotations
# Process network-only with KG
netintel-ocr process pdf document.pdf --network-only
# Output includes:
# - network_topology.json (graph structure)
# - kg_embeddings.npy (learned embeddings)
# - relationships.cypher (import queries)
Flow Diagrams¶
For flow diagrams and process charts:
- Extracts process steps as entities
- Maps flow direction as relationships
- Captures decision points and branches
- Associates metadata and conditions
# Process with flow detection (uses INGESTION model)
netintel-ocr process pdf document.pdf --flow-model qwen2.5vl:7b # Ingestion model, NOT MiniRAG
Hybrid Processing¶
Combines multiple extraction methods:
# Full hybrid processing (default)
netintel-ocr process pdf document.pdf
# This enables:
# - Network diagram KG extraction
# - Flow diagram relationship mapping
# - Table structure preservation
# - Text entity recognition
Knowledge Graph CLI Commands¶
Check System Requirements¶
# Check if all KG requirements are installed
netintel-ocr kg check-requirements
# Check with verbose output
netintel-ocr kg check-requirements --verbose
Initialize KG System¶
# Initialize FalkorDB indices and schema
netintel-ocr kg init
netintel-ocr kg init --falkordb-host localhost --falkordb-port 6379
# With authentication
netintel-ocr kg init --password your_password --graph-name custom_kg
Process Documents with KG¶
# Process a document with KG generation
netintel-ocr kg process document.pdf
netintel-ocr kg process --model RotatE --epochs 100 document.pdf
# Process with specific configuration
netintel-ocr kg process \
--kg-model ComplEx \
--batch-size 512 \
--force-retrain \
document.pdf
View Statistics¶
# Display Knowledge Graph statistics
netintel-ocr kg stats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table
# Display embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed
Train KG Embeddings¶
# Train embeddings with PyKEEN
netintel-ocr kg train-embeddings
netintel-ocr kg train-embeddings --model RotatE --epochs 150
# Force retrain existing embeddings
netintel-ocr kg train-embeddings --force --model ComplEx
# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL
Query the Knowledge Graph¶
# Execute Cypher queries
netintel-ocr kg query "MATCH (n:NetworkDevice) RETURN n LIMIT 10"
netintel-ocr kg query --format json "MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 5"
# Find paths between entities
netintel-ocr kg path-find "Router-A" "Server-DB"
netintel-ocr kg path-find --max-depth 5 --bidirectional "DMZ" "Internal"
# Get entity context
netintel-ocr kg entity-context "Firewall-Main"
netintel-ocr kg entity-context --expand-depth 2 --include-embeddings "Router-Core"
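As a rough mental model for depth-limited path finding, the sketch below runs a breadth-first search over a plain adjacency dictionary. It is an assumption-laden illustration rather than the tool's implementation: `find_path` and the toy `topology` are hypothetical, and only the idea of a maximum depth is taken from the `--max-depth` option above.

```python
from collections import deque


def find_path(graph, start, goal, max_depth=5):
    """Return the shortest path as a list of node names, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # this path already uses max_depth hops
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Toy directed topology using entity names from the examples above.
topology = {
    "Router-A": ["Switch-Core"],
    "Switch-Core": ["Firewall-Main"],
    "Firewall-Main": ["Server-DB"],
}
path = find_path(topology, "Router-A", "Server-DB")
too_short = find_path(topology, "Router-A", "Server-DB", max_depth=2)
print(path)
print(too_short)
```

The second call returns `None` because the only route needs three hops, which exceeds the limit of two.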
Similarity and Clustering¶
# Find similar entities
netintel-ocr kg find-similar "Router-A"
netintel-ocr kg find-similar --limit 10 --threshold 0.7 "Switch-Core"
# Compute similarity between entities
netintel-ocr kg similarity "Router-A" "Router-B"
netintel-ocr kg similarity --method cosine "Server-1" "Server-2"
# Cluster entities by embeddings
netintel-ocr kg cluster
netintel-ocr kg cluster --n-clusters 5 --method kmeans
netintel-ocr kg cluster --min-samples 3 --eps 0.5 --method dbscan
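The following sketch shows what a cosine-based similarity lookup with a limit and a threshold amounts to. The two-dimensional vectors are made up for readability (the real KG embeddings are described above as 200-dimensional), and `cosine_similarity` and `find_similar` are hypothetical helpers, not NetIntel-OCR functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(name, embeddings, limit=10, threshold=0.7):
    """Rank other entities by similarity, keeping those at or above threshold."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:limit]


embeddings = {
    "Router-A": [1.0, 0.0],
    "Router-B": [0.9, 0.1],
    "Server-DB": [0.0, 1.0],
}
similar = find_similar("Router-A", embeddings)
print(similar)
```

With these toy vectors only `Router-B` clears the 0.7 threshold; `Server-DB` is orthogonal to `Router-A` and is filtered out.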
Advanced Retrieval¶
# Classify query intent
netintel-ocr kg classify-query "What connects to the firewall?"
netintel-ocr kg classify-query --verbose "Show network topology"
# Hybrid search with multiple strategies
netintel-ocr kg hybrid-search "security vulnerabilities in DMZ"
netintel-ocr kg hybrid-search \
--strategy adaptive \
--limit 20 \
--expand-hops 3 \
"database connections"
# Compare retrieval strategies
netintel-ocr kg compare-strategies "network redundancy paths"
netintel-ocr kg compare-strategies --detailed --format json "firewall rules"
# RAG-enhanced query
netintel-ocr kg rag-query "What are the security implications?"
netintel-ocr kg rag-query \
--mode hybrid \
--context-depth 2 \
--temperature 0.7 \
"explain the network architecture"
Batch Operations¶
# Process batch queries
netintel-ocr kg batch-query queries.txt
netintel-ocr kg batch-query --output results.json --parallel 4 queries.txt
# Format for queries.txt:
# What connects to Router-A?
# Find path from DMZ to Database
# Show similar devices to Firewall-1
Visualization¶
# Visualize embeddings
netintel-ocr kg visualize
netintel-ocr kg visualize --method tsne --dimensions 2
netintel-ocr kg visualize --method pca --dimensions 3 --output embeddings.html
netintel-ocr kg visualize --color-by type --save-plot embeddings.png
Export and Import¶
# Export Knowledge Graph
netintel-ocr kg export --format cypher --output network.cypher
netintel-ocr kg export --format json --output graph.json
netintel-ocr kg export --format graphml --output network.graphml
# Include embeddings in export
netintel-ocr kg export --include-embeddings --format json --output full_graph.json
Query Types¶
The system supports 6 query types:
- Entity-Centric: Information about specific components
- Relational: Connection and dependency queries
- Topological: Path finding and network analysis
- Semantic: Content-based similarity search
- Analytical: Aggregations and statistics
- Exploratory: Pattern discovery
Example Python Usage¶
# Python API usage
import asyncio
from netintel_ocr.kg import HybridSystem, FalkorDBManager, HybridRetriever

async def main():
    # Initialize system
    manager = FalkorDBManager(host="localhost", port=6379)
    hybrid = HybridSystem(manager)
    # Process document
    results = await hybrid.process_document("document.pdf")
    # Initialize retriever
    retriever = HybridRetriever(manager)
    # Perform searches
    entity_results = await retriever.hybrid_search(
        query="Router-Core-1",
        strategy="graph_first"
    )
    path_results = await retriever.hybrid_search(
        query="path from DMZ-Switch to Internal-DB",
        strategy="adaptive"
    )

# await is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())
Batch Processing with KG¶
Process Multiple Documents¶
# Batch process with KG (enabled by default)
netintel-ocr process batch *.pdf
# Batch with custom KG settings
netintel-ocr process batch \
--kg-model ComplEx \
--kg-epochs 200 \
--max-parallel 4 \
*.pdf
Building Unified Knowledge Base¶
# Ingest to shared knowledge graph (in bash, ** only recurses after: shopt -s globstar)
netintel-ocr process batch \
--collection enterprise_kg \
--kg-merge-strategy union \
/docs/**/*.pdf
Integration with MiniRAG¶
Enhanced Retrieval¶
The KG system enhances MiniRAG (Retrieval Augmented Generation) with:
- Graph-aware context: Include related entities in context
- Path-based retrieval: Follow relationships for comprehensive answers
- Hybrid scoring: Combine vector similarity with graph distance
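One way to picture hybrid scoring is a weighted blend of a vector similarity and a graph-proximity term derived from hop distance. The sketch below is an assumption about the general idea, not the formula NetIntel-OCR uses: `hop_distance`, `hybrid_score`, the `alpha` weight, and the `1 / (1 + hops)` proximity term are all hypothetical.

```python
from collections import deque


def hop_distance(graph, source, target):
    """Breadth-first search; returns the hop count or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def hybrid_score(vector_similarity, hops, alpha=0.5):
    """Blend similarity with graph proximity; unreachable nodes add nothing."""
    proximity = 0.0 if hops is None else 1.0 / (1 + hops)
    return alpha * vector_similarity + (1 - alpha) * proximity


graph = {
    "Firewall-Main": ["DMZ-Switch"],
    "DMZ-Switch": ["Firewall-Main", "Web-Server"],
    "Web-Server": ["DMZ-Switch"],
    "Backup-NAS": [],
}
near = hybrid_score(0.8, hop_distance(graph, "Firewall-Main", "Web-Server"))
far = hybrid_score(0.9, hop_distance(graph, "Firewall-Main", "Backup-NAS"))
print(near, far)
```

In this toy case the candidate two hops from the firewall outranks a textually closer candidate that has no graph connection at all, which is the behavior graph-aware retrieval is meant to provide.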
MiniRAG Models
MiniRAG uses its own models (gemma3:4b-it-qat, qwen3-embedding:8b) for Q&A,
which are separate from the ingestion models used during PDF processing.
# Process document with KG enabled (default)
netintel-ocr process pdf document.pdf
# Query with Enhanced MiniRAG
netintel-ocr kg rag-query "What are the dependencies of Service-A?"
# RAG query with specific options
netintel-ocr kg rag-query \
--mode hybrid \
--context-depth 2 \
--temperature 0.7 \
"explain the network topology"
Retrieval Strategies¶
# Use hybrid search with different strategies
# Vector-first (fast, good for content)
netintel-ocr kg hybrid-search --strategy vector_first "security policies"
# Graph-first (accurate for relationships)
netintel-ocr kg hybrid-search --strategy graph_first "what connects to firewall"
# Parallel (balanced approach)
netintel-ocr kg hybrid-search --strategy parallel "network redundancy"
# Adaptive (query-dependent, default)
netintel-ocr kg hybrid-search --strategy adaptive "database vulnerabilities"
# Compare all strategies for a query
netintel-ocr kg compare-strategies "network topology analysis"
Performance Optimization¶
Memory Management¶
# Limit graph size for large documents
netintel-ocr process pdf document.pdf \
--kg-max-entities 10000 \
--kg-max-relations 50000
# Stream processing for very large graphs
netintel-ocr process pdf large_document.pdf \
--kg-streaming \
--kg-chunk-size 1000
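The idea behind chunked streaming can be sketched in a few lines: instead of materializing every extracted triple at once, consume them in fixed-size batches. This is a generic illustration under the assumption that the chunk size counts triples; the `chunked` helper and the synthetic triples are hypothetical and do not reflect the tool's internals.

```python
from itertools import islice


def chunked(items, chunk_size=1000):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# A generator keeps memory flat: only one chunk of triples exists at a time.
triples = (
    (f"device-{i}", "CONNECTS_TO", f"device-{i + 1}") for i in range(2500)
)
sizes = [len(chunk) for chunk in chunked(triples, chunk_size=1000)]
print(sizes)
```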
GPU Acceleration¶
# Enable GPU for embeddings training
netintel-ocr process pdf document.pdf \
--kg-gpu \
--kg-device cuda:0
# Multi-GPU training
netintel-ocr process pdf document.pdf \
--kg-gpu \
--kg-device cuda:0,cuda:1 \
--kg-distributed
Docker Deployment¶
Quick Start with Docker Compose¶
# docker-compose.kg.yml
version: '3.8'
services:
  falkordb:
    image: falkordb/falkordb:latest
    ports:
      - "6379:6379"
    volumes:
      - falkordb_data:/data
  milvus:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
    volumes:
      - milvus_data:/var/lib/milvus
  netintel-ocr:
    image: visionml/netintel-ocr:v0.1.17
    environment:
      - FALKORDB_HOST=falkordb
      - MILVUS_HOST=milvus:19530
      - OLLAMA_HOST=http://your-ollama-server:11434 # External Ollama
    volumes:
      - ./documents:/documents
      - ./output:/output
volumes:
  falkordb_data:
  milvus_data:
Start the stack:
docker compose -f docker-compose.kg.yml up -d
Kubernetes Deployment¶
Helm Installation¶
# Add NetIntel-OCR helm repo
helm repo add netintel https://visionml.net/helm
helm repo update
# Install with KG enabled and external Ollama
helm install netintel-ocr netintel/netintel-ocr \
--set kg.enabled=true \
--set falkordb.enabled=true \
--set milvus.enabled=true \
--set ollama.host="http://your-ollama-server:11434"
Custom Values¶
# values-kg.yaml
kg:
  enabled: true
  model: RotatE
  epochs: 150
  batchSize: 384
ollama:
  host: "http://your-ollama-server:11434" # External Ollama server
falkordb:
  enabled: true
  persistence:
    size: 10Gi
milvus:
  enabled: true
  persistence:
    size: 50Gi
Monitoring & Analytics¶
KG Statistics¶
# View graph statistics
netintel-ocr kg stats
# Detailed statistics in different formats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table
netintel-ocr kg stats --format summary
# View embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed
# Example output:
# Graph Statistics:
# Total nodes: 1,247
# Total edges: 3,892
# Node types: NetworkDevice(156), Service(89), Zone(12)
# Edge types: CONNECTS_TO(2341), DEPENDS_ON(893), CONTAINS(658)
# Average degree: 6.2
# Connected components: 3
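The figures in the example output can be cross-checked with a little arithmetic. The snippet below assumes the usual undirected convention, where each edge contributes to the degree of both endpoints, so the average degree is 2E / N; whether `kg stats` uses exactly this convention is an assumption.

```python
nodes = 1247
edges = 3892
edge_types = {"CONNECTS_TO": 2341, "DEPENDS_ON": 893, "CONTAINS": 658}

# Each edge touches two nodes, hence the factor of 2.
average_degree = 2 * edges / nodes
print(round(average_degree, 1))  # 6.2

# The per-type edge counts add up to the reported total.
print(sum(edge_types.values()) == edges)  # True
```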
Training Monitoring¶
# Train with progress monitoring
netintel-ocr kg train-embeddings \
--model RotatE \
--epochs 150 \
--verbose
# View training history
netintel-ocr kg embedding-stats --show-history
Troubleshooting¶
Common Issues¶
KG processing is slow:
# Reduce epochs for faster processing
netintel-ocr process pdf document.pdf --kg-epochs 50
# Or disable if not needed
netintel-ocr process pdf document.pdf --no-kg
Out of memory errors:
# Reduce batch size
netintel-ocr kg train-embeddings --batch-size 128
# Enable streaming mode
netintel-ocr process pdf document.pdf --kg-streaming
FalkorDB connection issues:
# Check FalkorDB status
redis-cli -h localhost -p 6379 ping
# Verify graph module
redis-cli MODULE LIST
Debug Mode¶
# Enable debug output
netintel-ocr --debug process pdf document.pdf --kg-verbose
# Save intermediate results
netintel-ocr process pdf document.pdf \
--kg-save-intermediate \
--output-dir ./debug
API Reference¶
Python API¶
import os
from netintel_ocr.kg import KnowledgeGraphSystem

# Configure external Ollama
os.environ['OLLAMA_HOST'] = "http://your-ollama-server:11434"

# Initialize KG system
kg_system = KnowledgeGraphSystem(
    falkordb_host="localhost",
    falkordb_port=6379,
    model="RotatE",
    epochs=100,
    ollama_host=os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
)

# Process document
graph = kg_system.process_document("document.pdf")

# Query graph
results = kg_system.query(
    query_type="entity_centric",
    entity="Router-A"
)

# Export graph
kg_system.export(
    format="cypher",
    output="network_graph.cypher"
)
REST API¶
# Process with KG (the form field name "file" is an assumption)
curl -X POST http://localhost:8000/process \
-F "file=@document.pdf" \
-F "enable_kg=true" \
-F "kg_model=RotatE"
# Query KG (-G sends the -d values as URL query parameters instead of a request body)
curl -G http://localhost:8000/kg/query \
-d "entity=Router-A" \
-d "hops=2"
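If the query endpoint reads `entity` and `hops` from the URL query string (an assumption, since the request format is not documented here), the same GET request can be prepared from Python with the standard library. The `build_kg_query` helper is hypothetical; only the URL path and parameter names come from the curl example.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # host and port taken from the curl examples


def build_kg_query(entity, hops):
    """Prepare (but do not send) a GET request for the KG query endpoint."""
    query_string = urlencode({"entity": entity, "hops": hops})
    return Request(f"{BASE_URL}/kg/query?{query_string}", method="GET")


request = build_kg_query("Router-A", 2)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running server.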
Best Practices¶
- Model Selection:
  - Use TransE for simple, hierarchical networks
  - Use RotatE (default) for complex topologies
  - Use ComplEx for bidirectional relationships
- Performance:
  - Start with 100 epochs, increase if needed
  - Use GPU for documents > 50 pages
  - Enable streaming for very large graphs
- Integration:
  - Always persist graphs to FalkorDB for reuse
  - Combine with vector search for best results
  - Use batch processing for document sets
Migration from v0.1.16¶
If upgrading from v0.1.16:
- KG is now enabled by default - no flags needed
- Dependencies are included - no separate install required
- Use --no-kg to disable if you want v0.1.16 behavior
# v0.1.16 behavior (no KG)
netintel-ocr process pdf document.pdf --no-kg
# v0.1.17 default (with KG)
netintel-ocr process pdf document.pdf
Additional Resources¶
Complete KG Command Reference¶
All Available Commands¶
| Command | Description | Example |
|---|---|---|
| check-requirements | Check if all requirements are installed | netintel-ocr kg check-requirements |
| init | Initialize FalkorDB indices and schema | netintel-ocr kg init |
| stats | Display Knowledge Graph statistics | netintel-ocr kg stats --format json |
| process | Process document with KG generation | netintel-ocr kg process document.pdf |
| query | Execute Cypher query on the graph | netintel-ocr kg query "MATCH (n) RETURN n" |
| train-embeddings | Train PyKEEN KG embeddings | netintel-ocr kg train-embeddings --model RotatE |
| embedding-stats | Display embedding statistics | netintel-ocr kg embedding-stats |
| similarity | Compute similarity between entities | netintel-ocr kg similarity "A" "B" |
| find-similar | Find similar entities | netintel-ocr kg find-similar "Router-A" |
| visualize | Visualize embeddings in 2D/3D | netintel-ocr kg visualize --method tsne |
| cluster | Cluster entities by embeddings | netintel-ocr kg cluster --n-clusters 5 |
| path-find | Find paths between entities | netintel-ocr kg path-find "A" "B" |
| entity-context | Get rich context for entity | netintel-ocr kg entity-context "Server-1" |
| rag-query | Query using Enhanced MiniRAG | netintel-ocr kg rag-query "explain topology" |
| classify-query | Classify query intent | netintel-ocr kg classify-query "what connects?" |
| hybrid-search | Hybrid search with strategies | netintel-ocr kg hybrid-search "security" |
| compare-strategies | Compare retrieval strategies | netintel-ocr kg compare-strategies "query" |
| batch-query | Process batch queries | netintel-ocr kg batch-query queries.txt |
| export | Export Knowledge Graph | netintel-ocr kg export --format json |
Quick Reference Card¶
# Essential Setup
netintel-ocr kg check-requirements # Verify installation
netintel-ocr kg init # Initialize KG system
netintel-ocr kg stats # Check system status
# Document Processing
netintel-ocr process pdf document.pdf # Process with KG (default)
netintel-ocr process pdf document.pdf --no-kg # Process without KG
netintel-ocr kg process document.pdf # Explicit KG processing
# Training & Embeddings
netintel-ocr kg train-embeddings # Train with defaults
netintel-ocr kg train-embeddings --force # Force retrain
netintel-ocr kg embedding-stats # View embedding info
# Querying
netintel-ocr kg query "MATCH (n) RETURN n" # Cypher query
netintel-ocr kg rag-query "explain this" # Natural language query
netintel-ocr kg hybrid-search "topic" # Hybrid search
# Analysis
netintel-ocr kg find-similar "entity" # Find similar entities
netintel-ocr kg path-find "A" "B" # Find paths
netintel-ocr kg cluster # Cluster entities
# Export
netintel-ocr kg export --format json # Export as JSON
netintel-ocr kg export --format cypher # Export as Cypher
Support¶
For KG-related issues:
# View available commands and options
netintel-ocr kg --help
# Get help for specific command
netintel-ocr kg init --help
netintel-ocr kg train-embeddings --help
# Check system status
netintel-ocr kg stats --format json
Common Troubleshooting Commands¶
# Check all requirements first
netintel-ocr kg check-requirements --verbose
# Verify FalkorDB connection
netintel-ocr kg init
# Check if embeddings exist
netintel-ocr kg embedding-stats
# Test with simple query
netintel-ocr kg query "MATCH (n) RETURN count(n)"
# Verify MiniRAG models (separate from ingestion)
curl $OLLAMA_HOST/api/tags | grep -E "gemma3|qwen3-embedding"
Contact support with diagnostic output for faster resolution.