Knowledge Graph Processing¶

Overview¶

NetIntel-OCR v0.1.17 introduces Knowledge Graph (KG) capabilities that automatically extract and structure relationships from network documentation. The system builds semantic graphs from network diagrams, flowcharts, and technical text, which can then be queried and analyzed for relationships.

Model Categories

NetIntel-OCR uses two distinct categories of models:

INGESTION MODELS (for PDF processing):

qwen2.5vl:7b - Network diagram analysis
Nanonets-OCR-s:latest - OCR text extraction

MINIRAG MODELS (for Q&A after ingestion):

gemma3:4b-it-qat - Answer generation
qwen3-embedding:8b - Semantic search

These are separate model sets: ingestion models process documents, while MiniRAG models answer questions after ingestion.

New in v0.1.17

Knowledge Graph processing is enabled by default in v0.1.17. No additional flags needed!

Quick Start¶

Basic Usage¶

# Process with KG enabled (default)
netintel-ocr process pdf document.pdf

# Explicitly disable KG if not needed
netintel-ocr process pdf document.pdf --no-kg

What Gets Extracted¶

The Knowledge Graph system automatically identifies and extracts:

Network Components: Routers, switches, firewalls, servers, load balancers
Relationships: Connections, data flows, dependencies, configurations
Attributes: IP addresses, VLANs, protocols, ports, bandwidths
Topologies: Network paths, redundancy patterns, hierarchies
Business Context: Services, applications, security zones

Architecture¶

Components¶

graph LR
    A[PDF Document] --> B[OCR Engine]
    B --> C[KG Constructor]
    C --> D[FalkorDB]
    C --> E[PyKEEN Embeddings]
    D --> F[Graph Queries]
    E --> F
    F --> G[Hybrid Retrieval]

Storage Layers¶

FalkorDB: Graph database for storing entities and relationships
Milvus: Vector database for text embeddings (4096D)
KG Embeddings: 200D knowledge graph embeddings stored as properties

Configuration¶

KG Model Selection¶

Choose from 8 different embedding models based on your use case:

# TransE - Fast, good for simple relationships
netintel-ocr process pdf document.pdf --kg-model TransE

# RotatE - Handles symmetric, antisymmetric, inverse, and composed relationships (default)
netintel-ocr process pdf document.pdf --kg-model RotatE

# ComplEx - Handles both symmetric and antisymmetric relationships
netintel-ocr process pdf document.pdf --kg-model ComplEx

# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL
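
To make the difference between models concrete, here is a minimal sketch of the TransE and RotatE scoring functions in plain Python. The function names (`transe_score`, `rotate_score`) and the toy vectors are illustrative assumptions, not part of NetIntel-OCR or PyKEEN; real training learns the embeddings instead of setting them by hand. A score closer to zero means the triple fits better.

```python
import cmath
import math

def transe_score(head, relation, tail):
    """TransE: a relation is a translation, so head + relation should land on tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rotate_score(head, phases, tail):
    """RotatE: a relation rotates each complex coordinate of head by a phase angle."""
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return -math.sqrt(sum(abs(x - t) ** 2 for x, t in zip(rotated, tail)))

# Toy 2-dimensional embeddings for two devices joined by a symmetric link.
router = [1.0, 2.0]
switch = [3.0, 1.0]

# TransE: the translation that fits router -> switch cannot also fit switch -> router.
link = [s - r for r, s in zip(router, switch)]
forward_transe = transe_score(router, link, switch)
backward_transe = transe_score(switch, link, router)

# RotatE: a rotation by pi (multiplication by -1) fits both directions at once.
router_c = [complex(1.0, 2.0), complex(3.0, 1.0)]
switch_c = [-z for z in router_c]
half_turn = [math.pi, math.pi]
forward_rotate = rotate_score(router_c, half_turn, switch_c)
backward_rotate = rotate_score(switch_c, half_turn, router_c)

print(f"TransE forward={forward_transe:.3f} backward={backward_transe:.3f}")
print(f"RotatE forward={forward_rotate:.3f} backward={backward_rotate:.3f}")
```

The TransE backward score is clearly negative while both RotatE scores are essentially zero, which is why rotation-based models suit link types that hold in both directions.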

Training Parameters¶

# Customize training epochs (default: 100)
netintel-ocr process pdf document.pdf --kg-epochs 200

# Adjust batch size (default: 256)
netintel-ocr kg train-embeddings --batch-size 512

# Combined configuration
netintel-ocr process pdf document.pdf \
  --kg-model RotatE \
  --kg-epochs 150 \
  --kg-batch-size 384
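
When these options are set from a script, it can help to validate them before launching the CLI. The following is a minimal sketch: `build_process_command` is a hypothetical helper, and it only uses the flags, defaults, and model names shown in this guide.

```python
import shlex

# The eight model names listed in this guide.
KG_MODELS = {"TransE", "RotatE", "ComplEx", "DistMult", "ConvE", "TuckER", "HolE", "RESCAL"}

def build_process_command(pdf_path, model="RotatE", epochs=100, batch_size=256):
    """Return the argument list for `netintel-ocr process pdf` with KG options."""
    if model not in KG_MODELS:
        raise ValueError(f"unknown KG model: {model!r}")
    if epochs < 1 or batch_size < 1:
        raise ValueError("epochs and batch_size must be positive")
    return [
        "netintel-ocr", "process", "pdf", pdf_path,
        "--kg-model", model,
        "--kg-epochs", str(epochs),
        "--kg-batch-size", str(batch_size),
    ]

command = build_process_command("document.pdf", model="RotatE", epochs=150, batch_size=384)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.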

External Services Configuration¶

# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"

# Verify Ollama models are available
curl -s "$OLLAMA_HOST/api/tags" | jq -r '.models[].name'

# Models for different purposes:
# INGESTION: qwen2.5vl:7b, Nanonets-OCR-s:latest
# MINIRAG: gemma3:4b-it-qat, qwen3-embedding:8b

# Custom FalkorDB host/port
netintel-ocr process pdf document.pdf \
  --falkordb-host 192.168.1.100 \
  --falkordb-port 6379

# Using environment variables
export FALKORDB_HOST=falkordb.local
export FALKORDB_PORT=6379
export OLLAMA_HOST="http://192.168.1.100:11434"
netintel-ocr process pdf document.pdf
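
The model check above can also be scripted. This is a minimal sketch that assumes the Ollama `/api/tags` response has the shape `{"models": [{"name": ...}, ...]}`; the helper `missing_models` is hypothetical, and the sample payload stands in for a real server response.

```python
import json

REQUIRED_MODELS = {
    "ingestion": ["qwen2.5vl:7b", "Nanonets-OCR-s:latest"],
    "minirag": ["gemma3:4b-it-qat", "qwen3-embedding:8b"],
}

def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return {category: [model names absent from the /api/tags payload]}."""
    available = {entry["name"] for entry in json.loads(tags_payload).get("models", [])}
    missing = {}
    for category, names in required.items():
        absent = [name for name in names if name not in available]
        if absent:
            missing[category] = absent
    return missing

# Sample payload in the shape returned by GET $OLLAMA_HOST/api/tags.
sample = json.dumps({"models": [{"name": "qwen2.5vl:7b"}, {"name": "gemma3:4b-it-qat"}]})
print(missing_models(sample))
```

Against a live server, the payload could be fetched with `urllib.request.urlopen(os.environ["OLLAMA_HOST"] + "/api/tags").read()`.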

Processing Modes¶

Network Diagrams¶

When processing network diagrams, the KG system:

Identifies network components from visual elements
Extracts connection relationships from lines/arrows
Preserves spatial layout information
Links related text annotations

# Process network-only with KG
netintel-ocr process pdf document.pdf --network-only

# Output includes:
# - network_topology.json (graph structure)
# - kg_embeddings.npy (learned embeddings)
# - relationships.cypher (import queries)

Flow Diagrams¶

For flow diagrams and process charts:

Extracts process steps as entities
Maps flow direction as relationships
Captures decision points and branches
Associates metadata and conditions

# Process with flow detection (uses INGESTION model)
netintel-ocr process pdf document.pdf --flow-model qwen2.5vl:7b  # Ingestion model, NOT MiniRAG

Hybrid Processing¶

Combines multiple extraction methods:

# Full hybrid processing (default)
netintel-ocr process pdf document.pdf

# This enables:
# - Network diagram KG extraction
# - Flow diagram relationship mapping
# - Table structure preservation
# - Text entity recognition

Knowledge Graph CLI Commands¶

Check System Requirements¶

# Check if all KG requirements are installed
netintel-ocr kg check-requirements

# Check with verbose output
netintel-ocr kg check-requirements --verbose

Initialize KG System¶

# Initialize FalkorDB indices and schema
netintel-ocr kg init
netintel-ocr kg init --falkordb-host localhost --falkordb-port 6379

# With authentication
netintel-ocr kg init --password your_password --graph-name custom_kg

Process Documents with KG¶

# Process a document with KG generation
netintel-ocr kg process document.pdf
netintel-ocr kg process --model RotatE --epochs 100 document.pdf

# Process with specific configuration
netintel-ocr kg process \
  --kg-model ComplEx \
  --batch-size 512 \
  --force-retrain \
  document.pdf

View Statistics¶

# Display Knowledge Graph statistics
netintel-ocr kg stats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table

# Display embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed

Train KG Embeddings¶

# Train embeddings with PyKEEN
netintel-ocr kg train-embeddings
netintel-ocr kg train-embeddings --model RotatE --epochs 150

# Force retrain existing embeddings
netintel-ocr kg train-embeddings --force --model ComplEx

# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL

Query the Knowledge Graph¶

# Execute Cypher queries
netintel-ocr kg query "MATCH (n:NetworkDevice) RETURN n LIMIT 10"
netintel-ocr kg query --format json "MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 5"

# Find paths between entities
netintel-ocr kg path-find "Router-A" "Server-DB"
netintel-ocr kg path-find --max-depth 5 --bidirectional "DMZ" "Internal"

# Get entity context
netintel-ocr kg entity-context "Firewall-Main"
netintel-ocr kg entity-context --expand-depth 2 --include-embeddings "Router-Core"
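
Conceptually, depth-limited path finding is a breadth-first search over the graph. The sketch below illustrates that idea on a hypothetical adjacency list; it is not necessarily how `kg path-find` is implemented, and the entity names are made up.

```python
from collections import deque

def find_path(adjacency, start, goal, max_depth=5):
    """Breadth-first search: shortest path from start to goal within max_depth hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # this path already uses the allowed number of hops
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical topology for the example.
topology = {
    "Router-A": ["Switch-Core", "Firewall-Main"],
    "Switch-Core": ["Server-App"],
    "Firewall-Main": ["Server-DB"],
    "Server-App": ["Server-DB"],
}

print(find_path(topology, "Router-A", "Server-DB"))
print(find_path(topology, "Router-A", "Server-DB", max_depth=1))
```

The first call finds the two-hop route through `Firewall-Main`; the second returns `None` because no route fits within one hop.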

Similarity and Clustering¶

# Find similar entities
netintel-ocr kg find-similar "Router-A"
netintel-ocr kg find-similar --limit 10 --threshold 0.7 "Switch-Core"

# Compute similarity between entities
netintel-ocr kg similarity "Router-A" "Router-B"
netintel-ocr kg similarity --method cosine "Server-1" "Server-2"

# Cluster entities by embeddings
netintel-ocr kg cluster
netintel-ocr kg cluster --n-clusters 5 --method kmeans
netintel-ocr kg cluster --min-samples 3 --eps 0.5 --method dbscan
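
The `--method cosine`, `--limit`, and `--threshold` options correspond to a standard nearest-neighbor lookup over embeddings. Here is a minimal sketch of that technique with hand-written 3-dimensional vectors; the function names and embedding values are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_similar(name, embeddings, limit=10, threshold=0.7):
    """Return up to `limit` (entity, score) pairs scoring at least `threshold`."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:limit]

# Hypothetical embeddings; real ones would come from the trained KG model.
embeddings = {
    "Router-A": [1.0, 0.0, 1.0],
    "Router-B": [0.9, 0.1, 1.1],
    "Server-1": [0.0, 1.0, 0.0],
}
print(find_similar("Router-A", embeddings))
```

Only `Router-B` passes the 0.7 threshold here, since `Server-1` is orthogonal to `Router-A`.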

Advanced Retrieval¶

# Classify query intent
netintel-ocr kg classify-query "What connects to the firewall?"
netintel-ocr kg classify-query --verbose "Show network topology"

# Hybrid search with multiple strategies
netintel-ocr kg hybrid-search "security vulnerabilities in DMZ"
netintel-ocr kg hybrid-search \
  --strategy adaptive \
  --limit 20 \
  --expand-hops 3 \
  "database connections"

# Compare retrieval strategies
netintel-ocr kg compare-strategies "network redundancy paths"
netintel-ocr kg compare-strategies --detailed --format json "firewall rules"

# RAG-enhanced query
netintel-ocr kg rag-query "What are the security implications?"
netintel-ocr kg rag-query \
  --mode hybrid \
  --context-depth 2 \
  --temperature 0.7 \
  "explain the network architecture"

Batch Operations¶

# Process batch queries
netintel-ocr kg batch-query queries.txt
netintel-ocr kg batch-query --output results.json --parallel 4 queries.txt

# Format for queries.txt:
# What connects to Router-A?
# Find path from DMZ to Database
# Show similar devices to Firewall-1

Visualization¶

# Visualize embeddings
netintel-ocr kg visualize
netintel-ocr kg visualize --method tsne --dimensions 2
netintel-ocr kg visualize --method pca --dimensions 3 --output embeddings.html
netintel-ocr kg visualize --color-by type --save-plot embeddings.png

Export and Import¶

# Export Knowledge Graph
netintel-ocr kg export --format cypher --output network.cypher
netintel-ocr kg export --format json --output graph.json
netintel-ocr kg export --format graphml --output network.graphml

# Include embeddings in export
netintel-ocr kg export --include-embeddings --format json --output full_graph.json

Query Types¶

The system supports 6 query types:

Entity-Centric: Information about specific components
Relational: Connection and dependency queries
Topological: Path finding and network analysis
Semantic: Content-based similarity search
Analytical: Aggregations and statistics
Exploratory: Pattern discovery

Example Python Usage¶

# Python API usage
import asyncio

from netintel_ocr.kg import HybridSystem, FalkorDBManager, HybridRetriever


async def main():
    # Initialize system
    manager = FalkorDBManager(host="localhost", port=6379)
    hybrid = HybridSystem(manager)

    # Process document
    results = await hybrid.process_document("document.pdf")

    # Initialize retriever
    retriever = HybridRetriever(manager)

    # Perform searches
    entity_results = await retriever.hybrid_search(
        query="Router-Core-1",
        strategy="graph_first"
    )

    path_results = await retriever.hybrid_search(
        query="path from DMZ-Switch to Internal-DB",
        strategy="adaptive"
    )
    return results, entity_results, path_results


# `await` is only valid inside a coroutine, so run the calls through asyncio
asyncio.run(main())

Batch Processing with KG¶

Process Multiple Documents¶

# Batch process with KG (enabled by default)
netintel-ocr process batch *.pdf

# Batch with custom KG settings
netintel-ocr process batch \
  --kg-model ComplEx \
  --kg-epochs 200 \
  --max-parallel 4 \
  *.pdf

Building Unified Knowledge Base¶

# Ingest to shared knowledge graph
netintel-ocr process batch \
  --collection enterprise_kg \
  --kg-merge-strategy union \
  /docs/**/*.pdf

Integration with MiniRAG¶

Enhanced Retrieval¶

The KG system enhances MiniRAG (Retrieval Augmented Generation) with:

Graph-aware context: Include related entities in context
Path-based retrieval: Follow relationships for comprehensive answers
Hybrid scoring: Combine vector similarity with graph distance
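
One way to picture hybrid scoring is a weighted blend of vector similarity and graph proximity. The sketch below is an assumption about how such a blend could look, not the formula NetIntel-OCR uses: the weight `alpha`, the `1 / (1 + hops)` proximity term, and the candidate values are all hypothetical.

```python
def hybrid_score(vector_similarity, graph_hops, alpha=0.6):
    """Blend a similarity in [0, 1] with a graph proximity of 1 / (1 + hops).

    graph_hops is None when the candidate is not connected to the query entity.
    """
    proximity = 0.0 if graph_hops is None else 1.0 / (1.0 + graph_hops)
    return alpha * vector_similarity + (1.0 - alpha) * proximity

# (entity, vector similarity to the query, hops from the query entity)
candidates = [
    ("Firewall-Main", 0.62, 1),
    ("Server-DB", 0.80, 4),
    ("Printer-Lobby", 0.85, None),
]
ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]), reverse=True)
for name, similarity, hops in ranked:
    print(name, round(hybrid_score(similarity, hops), 3))
```

With these numbers the directly connected `Firewall-Main` ranks first even though its text similarity is the lowest, which is the effect graph-aware retrieval is meant to produce.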

MiniRAG Models

MiniRAG uses its own models (gemma3:4b-it-qat, qwen3-embedding:8b) for Q&A, which are separate from the ingestion models used during PDF processing.

# Process document with KG enabled (default)
netintel-ocr process pdf document.pdf

# Query with Enhanced MiniRAG
netintel-ocr kg rag-query "What are the dependencies of Service-A?"

# RAG query with specific options
netintel-ocr kg rag-query \
  --mode hybrid \
  --context-depth 2 \
  --temperature 0.7 \
  "explain the network topology"

Retrieval Strategies¶

# Use hybrid search with different strategies
# Vector-first (fast, good for content)
netintel-ocr kg hybrid-search --strategy vector_first "security policies"

# Graph-first (accurate for relationships)
netintel-ocr kg hybrid-search --strategy graph_first "what connects to firewall"

# Parallel (balanced approach)
netintel-ocr kg hybrid-search --strategy parallel "network redundancy"

# Adaptive (query-dependent, default)
netintel-ocr kg hybrid-search --strategy adaptive "database vulnerabilities"

# Compare all strategies for a query
netintel-ocr kg compare-strategies "network topology analysis"

Performance Optimization¶

Memory Management¶

# Limit graph size for large documents
netintel-ocr process pdf document.pdf \
  --kg-max-entities 10000 \
  --kg-max-relations 50000

# Stream processing for very large graphs
netintel-ocr process pdf large_document.pdf \
  --kg-streaming \
  --kg-chunk-size 1000
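
Streaming with a chunk size generally means handling a fixed number of items at a time instead of holding the whole graph in memory. A minimal sketch of that pattern follows; the `chunked` helper and the generated triples are hypothetical and only illustrate the idea behind `--kg-chunk-size`.

```python
from itertools import islice

def chunked(items, chunk_size=1000):
    """Yield lists of at most chunk_size items without materializing the input."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

# Hypothetical stream of (head, relation, tail) triples.
triples = ((f"device-{i}", "CONNECTS_TO", f"device-{i + 1}") for i in range(2500))
sizes = [len(chunk) for chunk in chunked(triples, chunk_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Because `triples` is a generator, only one chunk is held in memory at any moment.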

GPU Acceleration¶

# Enable GPU for embeddings training
netintel-ocr process pdf document.pdf \
  --kg-gpu \
  --kg-device cuda:0

# Multi-GPU training
netintel-ocr process pdf document.pdf \
  --kg-gpu \
  --kg-device cuda:0,cuda:1 \
  --kg-distributed

Docker Deployment¶

Quick Start with Docker Compose¶

# docker-compose.kg.yml
version: '3.8'

services:
  falkordb:
    image: falkordb/falkordb:latest
    ports:
      - "6379:6379"
    volumes:
      - falkordb_data:/data

  milvus:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
    volumes:
      - milvus_data:/var/lib/milvus

  netintel-ocr:
    image: visionml/netintel-ocr:v0.1.17
    environment:
      - FALKORDB_HOST=falkordb
      - MILVUS_HOST=milvus:19530
      - OLLAMA_HOST=http://your-ollama-server:11434  # External Ollama
    volumes:
      - ./documents:/documents
      - ./output:/output

volumes:
  falkordb_data:
  milvus_data:

Start the stack:

docker-compose -f docker-compose.kg.yml up -d

Kubernetes Deployment¶

Helm Installation¶

# Add NetIntel-OCR helm repo
helm repo add netintel https://visionml.net/helm
helm repo update

# Install with KG enabled and external Ollama
helm install netintel-ocr netintel/netintel-ocr \
  --set kg.enabled=true \
  --set falkordb.enabled=true \
  --set milvus.enabled=true \
  --set ollama.host="http://your-ollama-server:11434"

Custom Values¶

# values-kg.yaml
kg:
  enabled: true
  model: RotatE
  epochs: 150
  batchSize: 384

ollama:
  host: "http://your-ollama-server:11434"  # External Ollama server

falkordb:
  enabled: true
  persistence:
    size: 10Gi

milvus:
  enabled: true
  persistence:
    size: 50Gi

Monitoring & Analytics¶

KG Statistics¶

# View graph statistics
netintel-ocr kg stats

# Detailed statistics in different formats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table
netintel-ocr kg stats --format summary

# View embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed

# Example output:
# Graph Statistics:
#   Total nodes: 1,247
#   Total edges: 3,892
#   Node types: NetworkDevice(156), Service(89), Zone(12)
#   Edge types: CONNECTS_TO(2341), DEPENDS_ON(893), CONTAINS(658)
#   Average degree: 6.2
#   Connected components: 3
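
The figures in the example output can be cross-checked by hand. The sketch below assumes the average degree counts both endpoints of every edge, which is consistent with the numbers shown.

```python
total_nodes = 1247
total_edges = 3892

# Each edge touches two nodes, so it contributes 2 to the sum of degrees.
average_degree = 2 * total_edges / total_nodes
print(round(average_degree, 1))  # 6.2

# The listed edge-type counts add up to the edge total.
edge_types = {"CONNECTS_TO": 2341, "DEPENDS_ON": 893, "CONTAINS": 658}
print(sum(edge_types.values()) == total_edges)  # True
```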

Training Monitoring¶

# Train with progress monitoring
netintel-ocr kg train-embeddings \
  --model RotatE \
  --epochs 150 \
  --verbose

# View training history
netintel-ocr kg embedding-stats --show-history

Troubleshooting¶

Common Issues¶

KG processing is slow:

# Reduce epochs for faster processing
netintel-ocr process pdf document.pdf --kg-epochs 50

# Or disable if not needed
netintel-ocr process pdf document.pdf --no-kg

Out of memory errors:

# Reduce batch size
netintel-ocr kg train-embeddings --batch-size 128

# Enable streaming mode
netintel-ocr process pdf document.pdf --kg-streaming

FalkorDB connection issues:

# Check FalkorDB status
redis-cli -h localhost -p 6379 ping

# Verify graph module
redis-cli MODULE LIST
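
If `redis-cli` is not installed on the machine running NetIntel-OCR, basic reachability can be checked from Python. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and it confirms that the TCP port accepts connections, not that the graph module is loaded.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("FalkorDB reachable:", can_connect("localhost", 6379))
```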

Debug Mode¶

# Enable debug output
netintel-ocr --debug process pdf document.pdf --kg-verbose

# Save intermediate results
netintel-ocr process pdf document.pdf \
  --kg-save-intermediate \
  --output-dir ./debug

API Reference¶

Python API¶

import os
from netintel_ocr.kg import KnowledgeGraphSystem

# Configure external Ollama
os.environ['OLLAMA_HOST'] = "http://your-ollama-server:11434"

# Initialize KG system
kg_system = KnowledgeGraphSystem(
    falkordb_host="localhost",
    falkordb_port=6379,
    model="RotatE",
    epochs=100,
    ollama_host=os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
)

# Process document
graph = kg_system.process_document("document.pdf")

# Query graph
results = kg_system.query(
    query_type="entity_centric",
    entity="Router-A"
)

# Export graph
kg_system.export(
    format="cypher",
    output="network_graph.cypher"
)

REST API¶

# Process with KG (the upload field name "file" is an assumption)
curl -X POST http://localhost:8000/process \
  -F "file=@document.pdf" \
  -F "enable_kg=true" \
  -F "kg_model=RotatE"

# Query KG (-G sends the -d values as URL query parameters)
curl -G http://localhost:8000/kg/query \
  -d "entity=Router-A" \
  -d "hops=2"
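
Entity names that contain spaces or slashes need URL encoding when passed as query parameters. The following is a minimal sketch, assuming the `/kg/query` endpoint reads `entity` and `hops` from the query string; `kg_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def kg_query_url(base_url, entity, hops=2):
    """Build the KG query URL with safely encoded parameters."""
    query = urlencode({"entity": entity, "hops": hops})
    return f"{base_url.rstrip('/')}/kg/query?{query}"

print(kg_query_url("http://localhost:8000", "Router A/1"))
# http://localhost:8000/kg/query?entity=Router+A%2F1&hops=2
```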

Best Practices¶

Model Selection:
  Use TransE for simple, hierarchical networks
  Use RotatE (default) for complex topologies
  Use ComplEx for bidirectional relationships

Performance:
  Start with 100 epochs, increase if needed
  Use GPU for documents > 50 pages
  Enable streaming for very large graphs

Integration:
  Always persist graphs to FalkorDB for reuse
  Combine with vector search for best results
  Use batch processing for document sets

Migration from v0.1.16¶

If upgrading from v0.1.16:

KG is now enabled by default - no flags needed
Dependencies are included - no separate install required
Use --no-kg to disable if you want v0.1.16 behavior

# v0.1.16 behavior (no KG)
netintel-ocr process pdf document.pdf --no-kg

# v0.1.17 default (with KG)
netintel-ocr process pdf document.pdf

Additional Resources¶

Complete KG Command Reference¶

All Available Commands¶

Command	Description	Example
`check-requirements`	Check if all requirements are installed	`netintel-ocr kg check-requirements`
`init`	Initialize FalkorDB indices and schema	`netintel-ocr kg init`
`stats`	Display Knowledge Graph statistics	`netintel-ocr kg stats --format json`
`process`	Process document with KG generation	`netintel-ocr kg process document.pdf`
`query`	Execute Cypher query on the graph	`netintel-ocr kg query "MATCH (n) RETURN n"`
`train-embeddings`	Train PyKEEN KG embeddings	`netintel-ocr kg train-embeddings --model RotatE`
`embedding-stats`	Display embedding statistics	`netintel-ocr kg embedding-stats`
`similarity`	Compute similarity between entities	`netintel-ocr kg similarity "A" "B"`
`find-similar`	Find similar entities	`netintel-ocr kg find-similar "Router-A"`
`visualize`	Visualize embeddings in 2D/3D	`netintel-ocr kg visualize --method tsne`
`cluster`	Cluster entities by embeddings	`netintel-ocr kg cluster --n-clusters 5`
`path-find`	Find paths between entities	`netintel-ocr kg path-find "A" "B"`
`entity-context`	Get rich context for entity	`netintel-ocr kg entity-context "Server-1"`
`rag-query`	Query using Enhanced MiniRAG	`netintel-ocr kg rag-query "explain topology"`
`classify-query`	Classify query intent	`netintel-ocr kg classify-query "what connects?"`
`hybrid-search`	Hybrid search with strategies	`netintel-ocr kg hybrid-search "security"`
`compare-strategies`	Compare retrieval strategies	`netintel-ocr kg compare-strategies "query"`
`batch-query`	Process batch queries	`netintel-ocr kg batch-query queries.txt`
`export`	Export Knowledge Graph	`netintel-ocr kg export --format json`

Quick Reference Card¶

# Essential Setup
netintel-ocr kg check-requirements            # Verify installation
netintel-ocr kg init                          # Initialize KG system
netintel-ocr kg stats                         # Check system status

# Document Processing
netintel-ocr process pdf document.pdf                     # Process with KG (default)
netintel-ocr process pdf document.pdf --no-kg            # Process without KG
netintel-ocr kg process document.pdf                      # Explicit KG processing

# Training & Embeddings
netintel-ocr kg train-embeddings             # Train with defaults
netintel-ocr kg train-embeddings --force     # Force retrain
netintel-ocr kg embedding-stats              # View embedding info

# Querying
netintel-ocr kg query "MATCH (n) RETURN n"   # Cypher query
netintel-ocr kg rag-query "explain this"     # Natural language query
netintel-ocr kg hybrid-search "topic"        # Hybrid search

# Analysis
netintel-ocr kg find-similar "entity"        # Find similar entities
netintel-ocr kg path-find "A" "B"           # Find paths
netintel-ocr kg cluster                      # Cluster entities

# Export
netintel-ocr kg export --format json         # Export as JSON
netintel-ocr kg export --format cypher       # Export as Cypher

Support¶

For KG-related issues:

# View available commands and options
netintel-ocr kg --help

# Get help for specific command
netintel-ocr kg init --help
netintel-ocr kg train-embeddings --help

# Check system status
netintel-ocr kg stats --format json

Common Troubleshooting Commands¶

# Check all requirements first
netintel-ocr kg check-requirements --verbose

# Verify FalkorDB connection
netintel-ocr kg init

# Check if embeddings exist
netintel-ocr kg embedding-stats

# Test with simple query
netintel-ocr kg query "MATCH (n) RETURN count(n)"

# Verify MiniRAG models (separate from ingestion)
curl -s "$OLLAMA_HOST/api/tags" | grep -E "gemma3|qwen3-embedding"

Contact support with diagnostic output for faster resolution.