Knowledge Graph Processing

Overview

NetIntel-OCR v0.1.17 introduces powerful Knowledge Graph (KG) capabilities that automatically extract and structure relationships from your network documentation. The system creates semantic graphs from network diagrams, flow charts, and technical text, enabling advanced querying and relationship analysis.

Model Categories

NetIntel-OCR uses two distinct categories of models:

INGESTION MODELS (for PDF processing):

  • qwen2.5vl:7b - Network diagram analysis
  • Nanonets-OCR-s:latest - OCR text extraction

MINIRAG MODELS (for Q&A after ingestion):

  • gemma3:4b-it-qat - Answer generation
  • qwen3-embedding:8b - Semantic search

These are separate model sets: ingestion models process documents, while MiniRAG models answer questions over the ingested content.

New in v0.1.17

Knowledge Graph processing is enabled by default in v0.1.17. No additional flags needed!

Quick Start

Basic Usage

# Process with KG enabled (default)
netintel-ocr process pdf document.pdf

# Explicitly disable KG if not needed
netintel-ocr process pdf document.pdf --no-kg

What Gets Extracted

The Knowledge Graph system automatically identifies and extracts:

  • Network Components: Routers, switches, firewalls, servers, load balancers
  • Relationships: Connections, data flows, dependencies, configurations
  • Attributes: IP addresses, VLANs, protocols, ports, bandwidths
  • Topologies: Network paths, redundancy patterns, hierarchies
  • Business Context: Services, applications, security zones
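
As a rough illustration of how the categories above could fit together, the sketch below models extracted items as plain Python dataclasses. The class and field names (`Entity`, `Relationship`, `zone`, and so on) are hypothetical and do not describe NetIntel-OCR's internal schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A network component such as a router or firewall (hypothetical schema)."""
    name: str
    kind: str                                        # e.g. "Router", "Firewall"
    attributes: dict = field(default_factory=dict)   # IP address, VLAN, ports, ...
    zone: Optional[str] = None                       # business context, e.g. "DMZ"


@dataclass
class Relationship:
    """A directed edge between two entities (hypothetical schema)."""
    source: str
    target: str
    kind: str                                        # e.g. "CONNECTS_TO", "DEPENDS_ON"
    attributes: dict = field(default_factory=dict)   # protocol, bandwidth, ...


firewall = Entity("Firewall-Main", "Firewall", {"ip": "10.0.0.1"}, zone="DMZ")
router = Entity("Router-Core", "Router", {"vlan": 10})
link = Relationship(router.name, firewall.name, "CONNECTS_TO", {"protocol": "BGP"})

print(link)
```

Topologies then fall out of the set of `Relationship` edges, while attributes and business context stay attached to the individual nodes and edges.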

Architecture

Components

graph LR
    A[PDF Document] --> B[OCR Engine]
    B --> C[KG Constructor]
    C --> D[FalkorDB]
    C --> E[PyKEEN Embeddings]
    D --> F[Graph Queries]
    E --> F
    F --> G[Hybrid Retrieval]

Storage Layers

  1. FalkorDB: Graph database for storing entities and relationships
  2. Milvus: Vector database for text embeddings (4096D)
  3. KG Embeddings: 200D knowledge graph embeddings stored as properties

Configuration

KG Model Selection

Choose from 8 different embedding models based on your use case:

# TransE - Fast, good for simple relationships
netintel-ocr process pdf document.pdf --kg-model TransE

# RotatE - Best for complex relationships (default)
netintel-ocr process pdf document.pdf --kg-model RotatE

# ComplEx - Handles both symmetric and antisymmetric relationships
netintel-ocr process pdf document.pdf --kg-model ComplEx

# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL
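
To give a feel for what distinguishes these models, here is a minimal sketch of the TransE and RotatE scoring functions as they are usually formulated. It uses toy, hand-picked vectors rather than trained embeddings, and it is not PyKEEN's or NetIntel-OCR's implementation; the entity and relation names are illustrative only.

```python
import cmath
import math


def transe_score(head, relation, tail):
    """TransE: a relation is a translation, so head + relation should land near tail."""
    distance = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))
    return -distance  # closer to 0 means more plausible


def rotate_score(head, phases, tail):
    """RotatE: a relation rotates each complex coordinate of head by a phase angle."""
    distance = sum(abs(h * cmath.exp(1j * p) - t) for h, p, t in zip(head, phases, tail))
    return -distance  # closer to 0 means more plausible


# Toy 2D embeddings (made-up numbers, not trained values)
router, switch = [0.0, 1.0], [1.0, 1.0]
connects_to = [1.0, 0.0]
print(transe_score(router, connects_to, switch))  # exact translation
print(transe_score(switch, connects_to, router))  # reversed triple scores worse

# A symmetric relation in RotatE: a rotation by pi maps a to b and b back to a
a, b = [1 + 0j, 0 + 1j], [-1 + 0j, 0 - 1j]
peer_of = [math.pi, math.pi]
print(rotate_score(a, peer_of, b), rotate_score(b, peer_of, a))
```

The last two lines show why a rotation-based model can represent symmetric relations, which a pure translation cannot do unless the relation vector is zero.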

Training Parameters

# Customize training epochs (default: 100)
netintel-ocr process pdf document.pdf --kg-epochs 200

# Adjust batch size (default: 256)
netintel-ocr kg train-embeddings --batch-size 512

# Combined configuration
netintel-ocr process pdf document.pdf \
  --kg-model RotatE \
  --kg-epochs 150 \
  --kg-batch-size 384

External Services Configuration

# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"

# Verify Ollama models are available
curl $OLLAMA_HOST/api/tags | jq '.models[].name'

# Models for different purposes:
# INGESTION: qwen2.5vl:7b, Nanonets-OCR-s:latest
# MINIRAG: gemma3:4b-it-qat, qwen3-embedding:8b

# Custom FalkorDB host/port
netintel-ocr process pdf document.pdf \
  --falkordb-host 192.168.1.100 \
  --falkordb-port 6379

# Using environment variables
export FALKORDB_HOST=falkordb.local
export FALKORDB_PORT=6379
export OLLAMA_HOST="http://192.168.1.100:11434"
netintel-ocr process pdf document.pdf
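
The same settings can be read from a script. The sketch below collects the environment variables shown above and checks an Ollama `/api/tags` response body for required models. The function names and the fallback values are assumptions for illustration, not documented NetIntel-OCR defaults.

```python
import json
import os


def load_service_config(env=os.environ):
    """Read connection settings from the environment variables named above.

    The fallback values are assumptions for local development.
    """
    return {
        "falkordb_host": env.get("FALKORDB_HOST", "localhost"),
        "falkordb_port": int(env.get("FALKORDB_PORT", "6379")),
        "ollama_host": env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
    }


def missing_models(tags_payload, required):
    """Return the required model names absent from an /api/tags response body."""
    available = {model["name"] for model in json.loads(tags_payload).get("models", [])}
    return [name for name in required if name not in available]


config = load_service_config({"FALKORDB_HOST": "falkordb.local", "FALKORDB_PORT": "6379"})
sample = '{"models": [{"name": "qwen2.5vl:7b"}, {"name": "gemma3:4b-it-qat"}]}'
print(config)
print(missing_models(sample, ["qwen2.5vl:7b", "qwen3-embedding:8b"]))
```

In practice the payload would come from an HTTP GET against `$OLLAMA_HOST/api/tags`; a literal sample is used here so the sketch runs without a server.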

Processing Modes

Network Diagrams

When processing network diagrams, the KG system:

  1. Identifies network components from visual elements
  2. Extracts connection relationships from lines/arrows
  3. Preserves spatial layout information
  4. Links related text annotations

# Process network-only with KG
netintel-ocr process pdf document.pdf --network-only

# Output includes:
# - network_topology.json (graph structure)
# - kg_embeddings.npy (learned embeddings)
# - relationships.cypher (import queries)

Flow Diagrams

For flow diagrams and process charts:

  1. Extracts process steps as entities
  2. Maps flow direction as relationships
  3. Captures decision points and branches
  4. Associates metadata and conditions

# Process with flow detection (uses INGESTION model)
netintel-ocr process pdf document.pdf --flow-model qwen2.5vl:7b  # Ingestion model, NOT MiniRAG

Hybrid Processing

Combines multiple extraction methods:

# Full hybrid processing (default)
netintel-ocr process pdf document.pdf

# This enables:
# - Network diagram KG extraction
# - Flow diagram relationship mapping
# - Table structure preservation
# - Text entity recognition

Knowledge Graph CLI Commands

Check System Requirements

# Check if all KG requirements are installed
netintel-ocr kg check-requirements

# Check with verbose output
netintel-ocr kg check-requirements --verbose

Initialize KG System

# Initialize FalkorDB indices and schema
netintel-ocr kg init
netintel-ocr kg init --falkordb-host localhost --falkordb-port 6379

# With authentication
netintel-ocr kg init --password your_password --graph-name custom_kg

Process Documents with KG

# Process a document with KG generation
netintel-ocr kg process document.pdf
netintel-ocr kg process --model RotatE --epochs 100 document.pdf

# Process with specific configuration
netintel-ocr kg process \
  --kg-model ComplEx \
  --batch-size 512 \
  --force-retrain \
  document.pdf

View Statistics

# Display Knowledge Graph statistics
netintel-ocr kg stats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table

# Display embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed

Train KG Embeddings

# Train embeddings with PyKEEN
netintel-ocr kg train-embeddings
netintel-ocr kg train-embeddings --model RotatE --epochs 150

# Force retrain existing embeddings
netintel-ocr kg train-embeddings --force --model ComplEx

# Available models: TransE, RotatE, ComplEx, DistMult, ConvE, TuckER, HolE, RESCAL

Query the Knowledge Graph

# Execute Cypher queries
netintel-ocr kg query "MATCH (n:NetworkDevice) RETURN n LIMIT 10"
netintel-ocr kg query --format json "MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 5"

# Find paths between entities
netintel-ocr kg path-find "Router-A" "Server-DB"
netintel-ocr kg path-find --max-depth 5 --bidirectional "DMZ" "Internal"

# Get entity context
netintel-ocr kg entity-context "Firewall-Main"
netintel-ocr kg entity-context --expand-depth 2 --include-embeddings "Router-Core"
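
Conceptually, path finding with a depth limit is a bounded graph search. The sketch below shows one way to do it with breadth-first search over an in-memory adjacency map. It only illustrates the idea behind `--max-depth`, assuming the limit counts hops; the actual command queries FalkorDB and may work differently.

```python
from collections import deque


def find_path(adjacency, start, goal, max_depth=5):
    """Breadth-first search: shortest path with at most max_depth hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # hop budget used up on this branch
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Toy topology with made-up device names
topology = {
    "Router-A": ["Switch-Core"],
    "Switch-Core": ["Firewall-Main"],
    "Firewall-Main": ["Server-DB"],
}
print(find_path(topology, "Router-A", "Server-DB"))
print(find_path(topology, "Router-A", "Server-DB", max_depth=2))
```

A bidirectional search would additionally follow edges in reverse, for example by adding each edge to the adjacency map in both directions.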

Similarity and Clustering

# Find similar entities
netintel-ocr kg find-similar "Router-A"
netintel-ocr kg find-similar --limit 10 --threshold 0.7 "Switch-Core"

# Compute similarity between entities
netintel-ocr kg similarity "Router-A" "Router-B"
netintel-ocr kg similarity --method cosine "Server-1" "Server-2"

# Cluster entities by embeddings
netintel-ocr kg cluster
netintel-ocr kg cluster --n-clusters 5 --method kmeans
netintel-ocr kg cluster --min-samples 3 --eps 0.5 --method dbscan
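
For intuition, cosine similarity with a threshold and a result limit can be sketched in a few lines. The embeddings below are made-up three-dimensional vectors and the function names are hypothetical; the defaults merely mirror the `--limit` and `--threshold` values used in the examples above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_similar(embeddings, name, limit=10, threshold=0.7):
    """Rank other entities by similarity to `name`, keeping scores >= threshold."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:limit]


# Toy embeddings (illustrative values only)
embeddings = {
    "Router-A": [1.0, 0.0, 0.2],
    "Router-B": [0.9, 0.1, 0.2],
    "Server-1": [0.0, 1.0, 0.0],
}
print(find_similar(embeddings, "Router-A"))
```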

Advanced Retrieval

# Classify query intent
netintel-ocr kg classify-query "What connects to the firewall?"
netintel-ocr kg classify-query --verbose "Show network topology"

# Hybrid search with multiple strategies
netintel-ocr kg hybrid-search "security vulnerabilities in DMZ"
netintel-ocr kg hybrid-search \
  --strategy adaptive \
  --limit 20 \
  --expand-hops 3 \
  "database connections"

# Compare retrieval strategies
netintel-ocr kg compare-strategies "network redundancy paths"
netintel-ocr kg compare-strategies --detailed --format json "firewall rules"

# RAG-enhanced query
netintel-ocr kg rag-query "What are the security implications?"
netintel-ocr kg rag-query \
  --mode hybrid \
  --context-depth 2 \
  --temperature 0.7 \
  "explain the network architecture"

Batch Operations

# Process batch queries
netintel-ocr kg batch-query queries.txt
netintel-ocr kg batch-query --output results.json --parallel 4 queries.txt

# Format for queries.txt:
# What connects to Router-A?
# Find path from DMZ to Database
# Show similar devices to Firewall-1
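
The sketch below shows how a one-query-per-line file could be parsed and fanned out over a small worker pool. It assumes blank lines are ignored and uses a stub in place of the real query backend; `parse_queries` and `run_batch` are hypothetical helpers, not part of NetIntel-OCR.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_queries(text):
    """One query per line; surrounding whitespace and blank lines are dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_batch(queries, answer, parallel=4):
    """Apply `answer` to every query with a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return dict(zip(queries, pool.map(answer, queries)))


sample = """What connects to Router-A?

Find path from DMZ to Database
Show similar devices to Firewall-1
"""
queries = parse_queries(sample)
# The lambda stands in for whatever actually answers a query
results = run_batch(queries, answer=lambda q: f"(stub answer for: {q})")
print(len(results))
```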

Visualization

# Visualize embeddings
netintel-ocr kg visualize
netintel-ocr kg visualize --method tsne --dimensions 2
netintel-ocr kg visualize --method pca --dimensions 3 --output embeddings.html
netintel-ocr kg visualize --color-by type --save-plot embeddings.png

Export and Import

# Export Knowledge Graph
netintel-ocr kg export --format cypher --output network.cypher
netintel-ocr kg export --format json --output graph.json
netintel-ocr kg export --format graphml --output network.graphml

# Include embeddings in export
netintel-ocr kg export --include-embeddings --format json --output full_graph.json
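
To show what a Cypher export conceptually contains, the following sketch turns nodes and edges into `MERGE` statements. The helper names are hypothetical, labels and property keys are assumed to be safe identifiers, and the exact statements NetIntel-OCR writes may differ.

```python
def cypher_literal(value):
    """Render a Python value as a Cypher literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def node_statement(label, name, properties=None):
    """MERGE a node identified by its name plus any extra properties."""
    props = {"name": name, **(properties or {})}
    body = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in props.items())
    return f"MERGE (:{label} {{{body}}});"


def edge_statement(source, kind, target):
    """MERGE a directed relationship between two existing nodes matched by name."""
    return (
        f"MATCH (a {{name: {cypher_literal(source)}}}), (b {{name: {cypher_literal(target)}}}) "
        f"MERGE (a)-[:{kind}]->(b);"
    )


print(node_statement("NetworkDevice", "Router-A", {"vlan": 10}))
print(edge_statement("Router-A", "CONNECTS_TO", "Server-DB"))
```

Using `MERGE` rather than `CREATE` keeps a re-import from duplicating nodes that already exist.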

Query Types

The system supports 6 query types:

  1. Entity-Centric: Information about specific components
  2. Relational: Connection and dependency queries
  3. Topological: Path finding and network analysis
  4. Semantic: Content-based similarity search
  5. Analytical: Aggregations and statistics
  6. Exploratory: Pattern discovery
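
As a toy illustration of routing a question to one of these six types, the sketch below uses an enum and a handful of keyword hints. The keyword lists and most enum values are invented for this example; the real classifier behind `classify-query` is not described here and is likely more sophisticated.

```python
from enum import Enum


class QueryType(Enum):
    ENTITY_CENTRIC = "entity_centric"
    RELATIONAL = "relational"
    TOPOLOGICAL = "topological"
    SEMANTIC = "semantic"
    ANALYTICAL = "analytical"
    EXPLORATORY = "exploratory"


# Hypothetical keyword hints, checked in order
KEYWORD_HINTS = [
    (QueryType.TOPOLOGICAL, ("path", "topology", "hops")),
    (QueryType.RELATIONAL, ("connect", "depend")),
    (QueryType.ANALYTICAL, ("how many", "count", "average")),
    (QueryType.SEMANTIC, ("similar", "about")),
    (QueryType.EXPLORATORY, ("pattern", "discover")),
]


def classify_query(text):
    """Return the first query type whose keyword appears in the text."""
    lowered = text.lower()
    for query_type, keywords in KEYWORD_HINTS:
        if any(keyword in lowered for keyword in keywords):
            return query_type
    return QueryType.ENTITY_CENTRIC  # fall back to a lookup of the named entity


print(classify_query("What connects to the firewall?"))
print(classify_query("Show network topology"))
```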

Example Python Usage

# Python API usage
import asyncio

from netintel_ocr.kg import HybridSystem, FalkorDBManager, HybridRetriever


async def main():
    # Initialize system
    manager = FalkorDBManager(host="localhost", port=6379)
    hybrid = HybridSystem(manager)

    # Process document
    results = await hybrid.process_document("document.pdf")

    # Initialize retriever
    retriever = HybridRetriever(manager)

    # Perform searches
    entity_results = await retriever.hybrid_search(
        query="Router-Core-1",
        strategy="graph_first"
    )

    path_results = await retriever.hybrid_search(
        query="path from DMZ-Switch to Internal-DB",
        strategy="adaptive"
    )
    return results, entity_results, path_results


# `await` is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())

Batch Processing with KG

Process Multiple Documents

# Batch process with KG (enabled by default)
netintel-ocr process batch *.pdf

# Batch with custom KG settings
netintel-ocr process batch \
  --kg-model ComplEx \
  --kg-epochs 200 \
  --max-parallel 4 \
  *.pdf

Building Unified Knowledge Base

# Ingest to shared knowledge graph
netintel-ocr process batch \
  --collection enterprise_kg \
  --kg-merge-strategy union \
  /docs/**/*.pdf

Integration with MiniRAG

Enhanced Retrieval

The KG system enhances MiniRAG (Retrieval Augmented Generation) with:

  • Graph-aware context: Include related entities in context
  • Path-based retrieval: Follow relationships for comprehensive answers
  • Hybrid scoring: Combine vector similarity with graph distance
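
One simple way to picture hybrid scoring is a weighted blend of vector similarity and graph proximity. The formula below, including the `1 / (1 + hops)` decay and the `alpha` weight, is an illustrative assumption and not NetIntel-OCR's actual scoring function.

```python
def hybrid_score(vector_similarity, graph_hops, alpha=0.6):
    """Blend vector similarity (0..1) with graph proximity.

    graph_hops is the shortest-path length from the query's anchor entity;
    None means the candidate is unreachable in the graph.
    """
    proximity = 0.0 if graph_hops is None else 1.0 / (1.0 + graph_hops)
    return alpha * vector_similarity + (1.0 - alpha) * proximity


# Made-up candidates: similarity from vector search, hops from the graph
candidates = [
    {"id": "chunk-1", "similarity": 0.82, "hops": 3},
    {"id": "chunk-2", "similarity": 0.74, "hops": 0},
    {"id": "chunk-3", "similarity": 0.90, "hops": None},
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["similarity"], c["hops"]),
    reverse=True,
)
print([c["id"] for c in ranked])
```

With these numbers the candidate that sits directly on the anchor entity outranks the one with the highest raw vector similarity, which is the effect graph-aware retrieval is meant to have.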

MiniRAG Models

MiniRAG uses its own models (gemma3:4b-it-qat, qwen3-embedding:8b) for Q&A, which are separate from the ingestion models used during PDF processing.

# Process document with KG enabled (default)
netintel-ocr process pdf document.pdf

# Query with Enhanced MiniRAG
netintel-ocr kg rag-query "What are the dependencies of Service-A?"

# RAG query with specific options
netintel-ocr kg rag-query \
  --mode hybrid \
  --context-depth 2 \
  --temperature 0.7 \
  "explain the network topology"

Retrieval Strategies

# Use hybrid search with different strategies
# Vector-first (fast, good for content)
netintel-ocr kg hybrid-search --strategy vector_first "security policies"

# Graph-first (accurate for relationships)
netintel-ocr kg hybrid-search --strategy graph_first "what connects to firewall"

# Parallel (balanced approach)
netintel-ocr kg hybrid-search --strategy parallel "network redundancy"

# Adaptive (query-dependent, default)
netintel-ocr kg hybrid-search --strategy adaptive "database vulnerabilities"

# Compare all strategies for a query
netintel-ocr kg compare-strategies "network topology analysis"

Performance Optimization

Memory Management

# Limit graph size for large documents
netintel-ocr process pdf document.pdf \
  --kg-max-entities 10000 \
  --kg-max-relations 50000

# Stream processing for very large graphs
netintel-ocr process pdf large_document.pdf \
  --kg-streaming \
  --kg-chunk-size 1000
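
The idea behind chunked streaming is to handle a bounded number of items at a time instead of holding the whole graph in memory. A minimal generator-based sketch, assuming the chunk size counts triples (the tool's actual unit is not stated here):

```python
from itertools import islice


def chunked(items, chunk_size=1000):
    """Yield successive lists of at most chunk_size items without materializing the input."""
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


# A lazy stream of made-up (head, relation, tail) triples
triples = ((f"device-{i}", "CONNECTS_TO", f"device-{i + 1}") for i in range(2500))
sizes = [len(batch) for batch in chunked(triples, chunk_size=1000)]
print(sizes)
```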

GPU Acceleration

# Enable GPU for embeddings training
netintel-ocr process pdf document.pdf \
  --kg-gpu \
  --kg-device cuda:0

# Multi-GPU training
netintel-ocr process pdf document.pdf \
  --kg-gpu \
  --kg-device cuda:0,cuda:1 \
  --kg-distributed

Docker Deployment

Quick Start with Docker Compose

# docker-compose.kg.yml
version: '3.8'

services:
  falkordb:
    image: falkordb/falkordb:latest
    ports:
      - "6379:6379"
    volumes:
      - falkordb_data:/data

  milvus:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
    volumes:
      - milvus_data:/var/lib/milvus

  netintel-ocr:
    image: visionml/netintel-ocr:v0.1.17
    environment:
      - FALKORDB_HOST=falkordb
      - MILVUS_HOST=milvus:19530
      - OLLAMA_HOST=http://your-ollama-server:11434  # External Ollama
    volumes:
      - ./documents:/documents
      - ./output:/output

volumes:
  falkordb_data:
  milvus_data:

Start the stack:

docker-compose -f docker-compose.kg.yml up -d

Kubernetes Deployment

Helm Installation

# Add NetIntel-OCR helm repo
helm repo add netintel https://visionml.net/helm
helm repo update

# Install with KG enabled and external Ollama
helm install netintel-ocr netintel/netintel-ocr \
  --set kg.enabled=true \
  --set falkordb.enabled=true \
  --set milvus.enabled=true \
  --set ollama.host="http://your-ollama-server:11434"

Custom Values

# values-kg.yaml
kg:
  enabled: true
  model: RotatE
  epochs: 150
  batchSize: 384

ollama:
  host: "http://your-ollama-server:11434"  # External Ollama server

falkordb:
  enabled: true
  persistence:
    size: 10Gi

milvus:
  enabled: true
  persistence:
    size: 50Gi

Monitoring & Analytics

KG Statistics

# View graph statistics
netintel-ocr kg stats

# Detailed statistics in different formats
netintel-ocr kg stats --format json
netintel-ocr kg stats --format table
netintel-ocr kg stats --format summary

# View embedding statistics
netintel-ocr kg embedding-stats
netintel-ocr kg embedding-stats --detailed

# Example output:
# Graph Statistics:
#   Total nodes: 1,247
#   Total edges: 3,892
#   Node types: NetworkDevice(156), Service(89), Zone(12)
#   Edge types: CONNECTS_TO(2341), DEPENDS_ON(893), CONTAINS(658)
#   Average degree: 6.2
#   Connected components: 3
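
Two of these figures can be reproduced by hand. Assuming degree counts both endpoints of every edge, the average degree is 2E / N, which is consistent with the sample output above. Connected components can be counted with a union-find structure; the node names in the second example are made up.

```python
def average_degree(node_count, edge_count):
    """Each edge adds to the degree of two nodes, so the mean is 2E / N."""
    return 2 * edge_count / node_count


def connected_components(nodes, edges):
    """Count components with union-find, treating edges as undirected."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in nodes})


print(round(average_degree(1247, 3892), 1))
print(connected_components(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))
```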

Training Monitoring

# Train with progress monitoring
netintel-ocr kg train-embeddings \
  --model RotatE \
  --epochs 150 \
  --verbose

# View training history
netintel-ocr kg embedding-stats --show-history

Troubleshooting

Common Issues

KG processing is slow:

# Reduce epochs for faster processing
netintel-ocr process pdf document.pdf --kg-epochs 50

# Or disable if not needed
netintel-ocr process pdf document.pdf --no-kg

Out of memory errors:

# Reduce batch size
netintel-ocr kg train-embeddings --batch-size 128

# Enable streaming mode
netintel-ocr process pdf document.pdf --kg-streaming

FalkorDB connection issues:

# Check FalkorDB status
redis-cli -h localhost -p 6379 ping

# Verify graph module
redis-cli MODULE LIST

Debug Mode

# Enable debug output
netintel-ocr --debug process pdf document.pdf --kg-verbose

# Save intermediate results
netintel-ocr process pdf document.pdf \
  --kg-save-intermediate \
  --output-dir ./debug

API Reference

Python API

import os
from netintel_ocr.kg import KnowledgeGraphSystem

# Configure external Ollama
os.environ['OLLAMA_HOST'] = "http://your-ollama-server:11434"

# Initialize KG system
kg_system = KnowledgeGraphSystem(
    falkordb_host="localhost",
    falkordb_port=6379,
    model="RotatE",
    epochs=100,
    ollama_host=os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
)

# Process document
graph = kg_system.process_document("document.pdf")

# Query graph
results = kg_system.query(
    query_type="entity_centric",
    entity="Router-A"
)

# Export graph
kg_system.export(
    format="cypher",
    output="network_graph.cypher"
)

REST API

# Process with KG (the upload field name "file" is an assumption)
curl -X POST http://localhost:8000/process \
  -F "file=@document.pdf" \
  -F "enable_kg=true" \
  -F "kg_model=RotatE"

# Query KG (-G sends the -d pairs as URL query parameters)
curl -G http://localhost:8000/kg/query \
  -d "entity=Router-A" \
  -d "hops=2"
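
The query call can also be built from Python with the standard library alone. This sketch assumes the `/kg/query` endpoint accepts `entity` and `hops` as query-string parameters; `build_kg_query` is a hypothetical helper, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_kg_query(base_url, entity, hops=2):
    """Build a GET request for the KG query endpoint with URL-encoded parameters."""
    query = urlencode({"entity": entity, "hops": hops})
    return Request(f"{base_url.rstrip('/')}/kg/query?{query}", method="GET")


request = build_kg_query("http://localhost:8000", "Router-A")
print(request.full_url)
# To send it, pass `request` to urllib.request.urlopen() with the API server running
```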

Best Practices

  1. Model Selection:
       • Use TransE for simple, hierarchical networks
       • Use RotatE (default) for complex topologies
       • Use ComplEx for bidirectional relationships

  2. Performance:
       • Start with 100 epochs, increase if needed
       • Use GPU for documents > 50 pages
       • Enable streaming for very large graphs

  3. Integration:
       • Always persist graphs to FalkorDB for reuse
       • Combine with vector search for best results
       • Use batch processing for document sets

Migration from v0.1.16

If upgrading from v0.1.16:

  1. KG is now enabled by default - no flags needed
  2. Dependencies are included - no separate install required
  3. Use --no-kg to disable if you want v0.1.16 behavior

# v0.1.16 behavior (no KG)
netintel-ocr process pdf document.pdf --no-kg

# v0.1.17 default (with KG)
netintel-ocr process pdf document.pdf

Additional Resources

Complete KG Command Reference

All Available Commands

| Command | Description | Example |
|---|---|---|
| check-requirements | Check if all requirements are installed | netintel-ocr kg check-requirements |
| init | Initialize FalkorDB indices and schema | netintel-ocr kg init |
| stats | Display Knowledge Graph statistics | netintel-ocr kg stats --format json |
| process | Process document with KG generation | netintel-ocr kg process document.pdf |
| query | Execute Cypher query on the graph | netintel-ocr kg query "MATCH (n) RETURN n" |
| train-embeddings | Train PyKEEN KG embeddings | netintel-ocr kg train-embeddings --model RotatE |
| embedding-stats | Display embedding statistics | netintel-ocr kg embedding-stats |
| similarity | Compute similarity between entities | netintel-ocr kg similarity "A" "B" |
| find-similar | Find similar entities | netintel-ocr kg find-similar "Router-A" |
| visualize | Visualize embeddings in 2D/3D | netintel-ocr kg visualize --method tsne |
| cluster | Cluster entities by embeddings | netintel-ocr kg cluster --n-clusters 5 |
| path-find | Find paths between entities | netintel-ocr kg path-find "A" "B" |
| entity-context | Get rich context for entity | netintel-ocr kg entity-context "Server-1" |
| rag-query | Query using Enhanced MiniRAG | netintel-ocr kg rag-query "explain topology" |
| classify-query | Classify query intent | netintel-ocr kg classify-query "what connects?" |
| hybrid-search | Hybrid search with strategies | netintel-ocr kg hybrid-search "security" |
| compare-strategies | Compare retrieval strategies | netintel-ocr kg compare-strategies "query" |
| batch-query | Process batch queries | netintel-ocr kg batch-query queries.txt |
| export | Export Knowledge Graph | netintel-ocr kg export --format json |

Quick Reference Card

# Essential Setup
netintel-ocr kg check-requirements            # Verify installation
netintel-ocr kg init                          # Initialize KG system
netintel-ocr kg stats                         # Check system status

# Document Processing
netintel-ocr process pdf document.pdf                     # Process with KG (default)
netintel-ocr process pdf document.pdf --no-kg            # Process without KG
netintel-ocr kg process document.pdf                      # Explicit KG processing

# Training & Embeddings
netintel-ocr kg train-embeddings             # Train with defaults
netintel-ocr kg train-embeddings --force     # Force retrain
netintel-ocr kg embedding-stats              # View embedding info

# Querying
netintel-ocr kg query "MATCH (n) RETURN n"   # Cypher query
netintel-ocr kg rag-query "explain this"     # Natural language query
netintel-ocr kg hybrid-search "topic"        # Hybrid search

# Analysis
netintel-ocr kg find-similar "entity"        # Find similar entities
netintel-ocr kg path-find "A" "B"           # Find paths
netintel-ocr kg cluster                      # Cluster entities

# Export
netintel-ocr kg export --format json         # Export as JSON
netintel-ocr kg export --format cypher       # Export as Cypher

Support

For KG-related issues:

# View available commands and options
netintel-ocr kg --help

# Get help for specific command
netintel-ocr kg init --help
netintel-ocr kg train-embeddings --help

# Check system status
netintel-ocr kg stats --format json

Common Troubleshooting Commands

# Check all requirements first
netintel-ocr kg check-requirements --verbose

# Verify FalkorDB connection
netintel-ocr kg init

# Check if embeddings exist
netintel-ocr kg embedding-stats

# Test with simple query
netintel-ocr kg query "MATCH (n) RETURN count(n)"

# Verify MiniRAG models (separate from ingestion)
curl $OLLAMA_HOST/api/tags | grep -E "gemma3|qwen3-embedding"

Contact support with diagnostic output for faster resolution.