MiniRAG - Enhanced Retrieval with Knowledge Graphs
Overview
MiniRAG is NetIntel-OCR's advanced Retrieval-Augmented Generation system that combines traditional vector search with Knowledge Graph context for more accurate, explainable, and context-aware answers. It leverages both structured graph data and unstructured text to provide comprehensive responses to complex queries.
MiniRAG vs Ingestion Models
MiniRAG models are used AFTER document ingestion for Q&A and retrieval:
- gemma3:4b-it-qat - For generating answers to questions
- qwen3-embedding:8b - For semantic search during retrieval
Ingestion models are used DURING PDF processing:
- qwen2.5vl:7b - For analyzing network diagrams
- Nanonets-OCR-s:latest - For OCR text extraction
These are completely separate model sets serving different purposes!
Architecture
┌────────────────────────────────────────┐
│               User Query               │
└────────────┬───────────────────────────┘
             │
┌────────────▼───────────────────────────┐
│        Query Intent Classifier         │
│  (Determines optimal retrieval path)   │
└────────────┬───────────────────────────┘
             │
     ┌───────┴────┬────────────┬────────────┐
     ▼            ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ FalkorDB │ │  PyKEEN  │ │  Milvus  │ │  Ollama  │
│  Graph   │ │Embeddings│ │ Vectors  │ │   LLM    │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
     │            │            │            │
     └───────┬────┴────────────┴────────────┘
             │
┌────────────▼───────────────────────────┐
│          Result Fusion (RRF)           │
│  (Combines multiple retrieval paths)   │
└────────────┬───────────────────────────┘
             │
┌────────────▼───────────────────────────┐
│       Context-Enhanced Response        │
└────────────────────────────────────────┘
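The diagram names Reciprocal Rank Fusion (RRF) as the step that merges the retrieval paths, but it does not document the parameters. The sketch below shows the standard RRF formula, score(d) = Σ 1/(k + rank(d)), with the commonly used constant k = 60. The constant, the function name, and the document IDs are illustrative assumptions, not MiniRAG's actual implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked lists of IDs with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, with ranks starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the graph path and the vector path
graph_hits = ["core-router-01", "fw-prod-01", "upgrade-guide-p12"]
vector_hits = ["upgrade-guide-p12", "core-router-01", "network-sop-p45"]
for doc_id, score in reciprocal_rank_fusion([graph_hits, vector_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Because RRF uses only rank positions, it can combine graph results and vector results whose raw scores live on incomparable scales.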
Installation and Setup
Prerequisites
# Ensure NetIntel-OCR v0.1.17+ is installed
# (quote the version specifier so the shell does not treat > as a redirect)
pip install "netintel-ocr>=0.1.17"
# Configure external Ollama server
export OLLAMA_HOST="http://your-ollama-server:11434"
# Verify Ollama has MiniRAG models (NOT ingestion models)
curl $OLLAMA_HOST/api/tags | jq '.models[].name'
# MiniRAG MODELS ONLY (for Q&A after ingestion):
# - qwen3-embedding:8b (for semantic search embeddings)
# - gemma3:4b-it-qat (for answer generation)
# Pull MiniRAG models if missing
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"qwen3-embedding:8b"}'
curl -X POST $OLLAMA_HOST/api/pull -d '{"name":"gemma3:4b-it-qat"}'
# Note: Ingestion models (qwen2.5vl:7b, Nanonets-OCR-s) are separate!
# Start required graph/vector services
docker-compose up -d falkordb milvus
# Verify MiniRAG components
netintel-ocr rag check
# Output:
# ✅ FalkorDB: Connected
# ✅ Milvus: Connected
# ✅ Ollama: Connected (http://your-ollama-server:11434)
# ✅ MiniRAG: Ready
# ✅ Available Models: qwen3-embedding:8b, gemma3:4b-it-qat
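The same model check can be scripted against Ollama's /api/tags endpoint, which the curl command above also queries. This is a minimal standard-library sketch; the helper names are hypothetical, and it checks only for the two MiniRAG models listed above.

```python
import json
import urllib.request

# MiniRAG Q&A models from the list above (ingestion models are intentionally excluded)
REQUIRED_MODELS = {"qwen3-embedding:8b", "gemma3:4b-it-qat"}


def fetch_tags(host):
    """Return the decoded JSON from GET {host}/api/tags on an Ollama server."""
    url = f"{host.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required model names absent from an /api/tags response, sorted."""
    available = {model.get("name") for model in tags_payload.get("models", [])}
    return sorted(required - available)
```

For example, missing_models(fetch_tags(os.environ["OLLAMA_HOST"])) returns an empty list when both models are present, and otherwise names the ones to pull.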
Initialize MiniRAG
# Initialize MiniRAG with Q&A models (NOT ingestion models):
#   --llm-model        MiniRAG answer generation
#   --embedding-model  MiniRAG semantic search
netintel-ocr rag init \
--collection network_docs \
--llm-model gemma3:4b-it-qat \
--embedding-model qwen3-embedding:8b
# Advanced initialization with custom settings (same MiniRAG models, NOT ingestion models)
netintel-ocr rag init \
--collection production_infrastructure \
--llm-model gemma3:4b-it-qat \
--embedding-model qwen3-embedding:8b \
--chunk-size 512 \
--chunk-overlap 50 \
--temperature 0.7 \
--max-tokens 2000 \
--ollama-host $OLLAMA_HOST
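The --chunk-size and --chunk-overlap options suggest a sliding-window splitter. The documentation does not say whether sizes are counted in tokens or characters, so this sketch assumes tokens and uses the values from the advanced example (512 and 50); chunk_tokens is a hypothetical helper, not part of NetIntel-OCR.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into fixed-size windows that overlap by chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Placeholder tokens stand in for real tokenizer output in this sketch
tokens = [f"t{i}" for i in range(1200)]
print([len(chunk) for chunk in chunk_tokens(tokens)])
```

Each window advances by chunk_size - chunk_overlap tokens (462 here), so any passage no longer than the overlap that straddles a boundary appears intact in at least one chunk.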
Basic Usage
Simple Question Answering
# Ask a question about your processed telecom infrastructure
netintel-ocr rag query \
--question "What are the main components of our 5G network architecture?" \
--collection telecom_docs
# Output:
# Question: What are the main components of our 5G network architecture?
#
# Answer: Based on the analyzed documentation, your 5G network architecture consists of:
#
# RAN Infrastructure:
# • 12,000 gNodeBs (5G base stations) across 3 frequency bands
# • 8,000 Small cells for urban densification
# • 450 Distributed Units (DUs) for real-time baseband processing at the edge
# • 45 Centralized Units (CUs) for higher-layer (PDCP/RRC) processing
#
# 5G Core Network:
# • AMF (Access and Mobility Management Function) - 4 instances
# • SMF (Session Management Function) - 6 instances
# • UPF (User Plane Function) - 24 edge instances
# • PCF (Policy Control Function) - 2 instances
# • UDM (Unified Data Management) - 2 instances
#
# Transport Network:
# • MPLS backbone with 100G/400G links
# • Segment Routing for network slicing
# • Edge compute nodes at 150 locations
#
# Sources: network-topology.pdf (pages 3-5), infrastructure-design.pdf (pages 12-14)
# Confidence: 0.94
Query with Context
# Include surrounding context for better answers
netintel-ocr rag query \
--question "How does traffic flow from a UE to the internet in our 5G network?" \
--include-context \
--context-window 3 \
--collection telecom_docs
# Output includes:
# - Direct answer with traffic flow path
# - Network diagram visualization
# - Security checkpoints along the path
# - Relevant firewall rules
# - Performance metrics for each hop
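The documentation does not define --context-window precisely. One plausible reading, assumed here, is that each matching chunk is returned together with up to N neighboring chunks on each side, so the LLM sees the surrounding steps of a traffic path rather than an isolated fragment. expand_with_neighbors is a hypothetical helper.

```python
def expand_with_neighbors(hit_indices, num_chunks, window=3):
    """Return sorted chunk indices covering each hit plus `window` neighbors on each side."""
    selected = set()
    for idx in hit_indices:
        first = max(0, idx - window)               # clamp at the start of the document
        last = min(num_chunks - 1, idx + window)   # clamp at the end of the document
        selected.update(range(first, last + 1))
    return sorted(selected)


# Chunks 4 and 20 matched the question in a 22-chunk document
print(expand_with_neighbors([4, 20], num_chunks=22, window=3))
```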
Query Modes
1. Graph Mode - Structured Data Queries
Best for queries about relationships, dependencies, and network topology.
# Query using only Knowledge Graph
netintel-ocr rag query \
--mode graph \
--question "List all network functions in the 5G Core" \
--collection telecom_infrastructure
# Output:
# Network Functions in 5G Core (from Knowledge Graph):
#
# Control Plane Functions:
# • AMF-01, AMF-02 (Access & Mobility Management)
# • SMF-01 to SMF-06 (Session Management)
# • PCF-01, PCF-02 (Policy Control)
# • UDM-01, UDM-02 (Unified Data Management)
# • AUSF-01 (Authentication Server)
# • NSSF-01 (Network Slice Selection)
#
# User Plane Functions:
# • UPF-EDGE-01 to UPF-EDGE-24 (Edge locations)
# • UPF-CORE-01 to UPF-CORE-04 (Core locations)
#
# Supporting Functions:
# • NRF-01, NRF-02 (Network Repository)
# • NEF-01 (Network Exposure Function)
#
# Total: 45 network function instances
# Graph traversal time: 12ms
2. Vector Mode - Unstructured Text Queries
Best for policy questions, procedures, and descriptive content.
# Query using only vector search
netintel-ocr rag query \
--mode vector \
--question "What are the password policy requirements?" \
--collection security_policies
# Output:
# Password Policy Requirements (from document search):
#
# According to the Security Policy document (v2.3):
#
# 1. Minimum Length: 12 characters
# 2. Complexity Requirements:
# • At least one uppercase letter
# • At least one lowercase letter
# • At least one number
# • At least one special character
# 3. Password History: Cannot reuse last 12 passwords
# 4. Expiration: 90 days
# 5. Account Lockout: 5 failed attempts
# 6. Multi-Factor Authentication: Required for privileged accounts
#
# Source: security-policy.pdf (page 23)
# Relevance Score: 0.96
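Vector mode ranks text chunks by embedding similarity. This brute-force sketch assumes cosine similarity (the metric MiniRAG configures in Milvus is not stated here) and uses toy three-dimensional vectors; real embedding vectors are much longer, and at scale Milvus answers the same query with an approximate nearest-neighbor index instead of a full scan. The chunk IDs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Rank chunk IDs by cosine similarity to the query vector, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


chunks = {
    "security-policy-p23": [0.9, 0.1, 0.0],
    "vpn-guide-p4": [0.4, 0.8, 0.1],
    "capacity-report-p2": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], chunks, k=2))
```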
3. Hybrid Mode - Combined Intelligence
Best for complex queries requiring both structured and unstructured data.
# Query using hybrid retrieval (default)
netintel-ocr rag query \
--mode hybrid \
--question "What would be the impact of upgrading the core router firmware?" \
--collection network_docs
# Output:
# Impact Analysis for Core Router Firmware Upgrade:
#
# From Knowledge Graph Analysis:
# • Affected Devices: 47 directly connected systems
# • Service Dependencies: 23 critical services rely on core router
# • Redundancy: Core-Router-02 can handle traffic during upgrade
# • Estimated Affected Users: 2,500
#
# From Documentation:
# • Upgrade Window: 2-4 hours (per upgrade guide)
# • Required Downtime: 15-30 minutes for failover
# • Rollback Procedure: Available (documented in section 4.3)
# • Known Issues: Memory leak fixed in new version (CVE-2024-1234)
#
# Risk Assessment: MEDIUM
# Recommendation: Schedule during maintenance window with failover testing
#
# Sources:
# - Knowledge Graph: 47 entities, 156 relationships analyzed
# - Documents: upgrade-guide.pdf (p.12), network-sop.pdf (p.45)
# Confidence: 0.91
4. Embedding Mode - Similarity Queries
Best for finding similar configurations or patterns.
# Query using KG embeddings
netintel-ocr rag query \
--mode embeddings \
--question "Find all firewalls with similar configurations to FW-PROD-01" \
--similarity-threshold 0.85 \
--collection network_configs
# Output:
# Firewalls Similar to FW-PROD-01:
#
# 1. FW-PROD-02 (Similarity: 0.98)
# • Location: Data Center 1
# • Config Match: 98% identical rules
# • Differences: NAT pool ranges
#
# 2. FW-DR-01 (Similarity: 0.92)
# • Location: DR Site
# • Config Match: 85% identical rules
# • Differences: IP ranges, VLAN assignments
#
# 3. FW-PROD-03 (Similarity: 0.87)
# • Location: Data Center 2
# • Config Match: 82% identical rules
# • Differences: Additional DMZ rules
#
# Embedding Model: RotatE (200D)
# Comparison Time: 34ms
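The Query Intent Classifier in the architecture diagram chooses among these four modes automatically. Its actual logic is not documented; the keyword heuristic below is a hypothetical stand-in that routes each example question above to the mode used for it and falls back to hybrid, the default mode.

```python
import re

# Hypothetical keyword rules, checked in order; MiniRAG's real classifier is not documented here
ROUTES = [
    ("embeddings", re.compile(r"\b(similar|resembl\w*|alike)\b", re.I)),
    ("graph", re.compile(r"\b(list|connected|depends?|dependenc\w*|topology|path|upstream|downstream)\b", re.I)),
    ("vector", re.compile(r"\b(polic\w*|procedures?|requirements?|guidelines?)\b", re.I)),
]


def route_query(question, default="hybrid"):
    """Return the mode of the first rule whose pattern matches, else the default mode."""
    for mode, pattern in ROUTES:
        if pattern.search(question):
            return mode
    return default


print(route_query("Find all firewalls with similar configurations to FW-PROD-01"))
```

A production classifier would more likely use an LLM or a trained model, but the control flow (classify, then dispatch to a retrieval path) is the same.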
Advanced Features
Multi-Hop Reasoning
# Complex reasoning across multiple relationships
netintel-ocr rag query \
--question "If the primary 5G Core AMF fails, what services are affected and what's the recovery plan?" \
--reasoning-depth 4 \
--include-alternatives \
--collection telecom_infrastructure
# Output:
# Multi-Hop Analysis for AMF Failure:
#
# Immediate Impact (1 hop):
# • UE Registration → Failed for new devices
# • Mobility Management → Handover failures
# • Session Management → Cannot establish new PDU sessions
#
# Cascading Impact (2-3 hops):
# • Voice Services (VoNR) → New calls fail
# • Network Slicing → Slice selection unavailable
# • Roaming Services → Inbound roamers affected
#
# Recovery Plan (from runbooks):
# 1. Automatic failover to the standby AMF in the same AMF Set (RTO: 5 min)
# 2. If failover fails:
# a. Start manual recovery procedure
# b. Restore the AMF instance from its configuration backup (RPO: 1 hour)
# c. Trigger re-registration of affected UEs
# 3. Notify stakeholders per escalation matrix
#
# Alternative Paths:
# • Route new registrations to AMF-02 via NRF discovery
# • Keep established PDU sessions active on the SMF/UPF
# • Defer non-urgent signaling until the AMF recovers
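Multi-hop impact analysis amounts to a breadth-first traversal of the dependency graph, with --reasoning-depth capping the number of hops (an assumption about that flag's meaning). In this sketch the dependency edges are hypothetical, loosely modeled on the AMF example, and impact_by_hop is not part of the NetIntel-OCR API.

```python
from collections import deque


def impact_by_hop(dependents, failed, max_depth=4):
    """Breadth-first walk from a failed node, grouping affected nodes by hop distance.

    `dependents` maps each node to the nodes that depend on it directly.
    """
    hops = {}
    seen = {failed}
    queue = deque([(failed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested reasoning depth
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                hops.setdefault(depth + 1, []).append(child)
                queue.append((child, depth + 1))
    return hops


# Hypothetical dependency edges
dependents = {
    "AMF-01": ["UE Registration", "Mobility Management", "Session Management"],
    "UE Registration": ["Network Slicing", "Roaming Services"],
    "Session Management": ["Voice Services (VoNR)"],
}
for hop, nodes in sorted(impact_by_hop(dependents, "AMF-01").items()):
    print(hop, nodes)
```

Tracking `seen` ensures each service is reported once, at its shortest hop distance, even when several paths lead to it.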
Comparative Analysis
# Compare configurations or architectures
netintel-ocr rag compare \
--entities "gNodeB-NYC-001,gNodeB-NYC-002" \
--aspects "config,performance,capacity" \
--collection telecom_configs
# Output:
# Comparative Analysis: gNodeB-NYC-001 vs gNodeB-NYC-002
#
# ┌──────────────────┬────────────────┬────────────────┐
# │ Aspect           │ gNodeB-NYC-001 │ gNodeB-NYC-002 │
# ├──────────────────┼────────────────┼────────────────┤
# │ Frequency Bands  │ n77, n41, n5   │ n77, n41       │
# │ MIMO Config      │ 64T64R         │ 32T32R         │
# │ Max UEs          │ 4,000          │ 2,000          │
# │ Active UEs       │ 3,421          │ 1,876          │
# │ Throughput       │ 8.2 Gbps       │ 4.7 Gbps       │
# │ PRB Utilization  │ 78%            │ 82%            │
# │ Handover Success │ 99.2%          │ 98.7%          │
# │ CPU Usage        │ 62%            │ 71%            │
# │ Power Output     │ 40W            │ 20W            │
# │ Last Config      │ 2024-01-10     │ 2024-01-09     │
# └──────────────────┴────────────────┴────────────────┘
#
# Key Differences:
# • gNodeB-NYC-001 adds band n5 and uses 64T64R MIMO (vs 32T32R)
# • gNodeB-NYC-001 serves about 1.8x the active UEs and 1.7x the throughput
# • gNodeB-NYC-002 shows higher PRB utilization (82%) and CPU usage (71%) despite the lighter load
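Comparing two entities reduces to diffing their attribute maps and deriving a few ratios. The sketch below uses values from the comparison table above; the attribute keys and compare_entities are illustrative, not the fields MiniRAG actually stores.

```python
def compare_entities(a, b):
    """Return {aspect: (value_a, value_b)} for every aspect whose values differ."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if a.get(key) != b.get(key)}


# A few rows copied from the comparison table above
gnb_001 = {"bands": ("n77", "n41", "n5"), "mimo": "64T64R", "max_ues": 4000, "active_ues": 3421}
gnb_002 = {"bands": ("n77", "n41"), "mimo": "32T32R", "max_ues": 2000, "active_ues": 1876}

for aspect, (v1, v2) in compare_entities(gnb_001, gnb_002).items():
    print(f"{aspect}: {v1} vs {v2}")
print("Bands only on gNodeB-NYC-001:", sorted(set(gnb_001["bands"]) - set(gnb_002["bands"])))
# Derived metric: share of UE capacity in use on each site
for name, gnb in (("gNodeB-NYC-001", gnb_001), ("gNodeB-NYC-002", gnb_002)):
    print(f"{name} UE load: {gnb['active_ues'] / gnb['max_ues']:.1%}")
```

A derived ratio such as UE load (about 85.5% versus 93.8%) is consistent with gNodeB-NYC-002 showing the higher PRB utilization and CPU usage in the table.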
Temporal Queries
# Query with time context
netintel-ocr rag query \
--question "What changes were made to the network in the last 30 days?" \
--temporal \
--lookback-days 30 \
--include-changelog \
--collection network_docs
# Output:
# Network Changes (Last 30 Days):
#
# Week 1 (Jan 1-7):
# • Added VLAN 245 for new development team
# • Updated firewall rules for cloud migration
# • Replaced Switch-Access-12 (hardware failure)
#
# Week 2 (Jan 8-14):
# • Implemented new QoS policies
# • Added redundant link between DC1 and DC2
# • Updated routing tables for new subnet
#
# Week 3 (Jan 15-21):
# • Patched 15 devices for CVE-2024-1234
# • Migrated 3 services to cloud
# • Decommissioned legacy mail server
#
# Week 4 (Jan 22-28):
# • Upgraded core router firmware
# • Added new load balancer for web tier
# • Implemented zero-trust policies in DMZ
#
# Total Changes: 23
# Change Frequency: Increasing trend (+35%)
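A temporal query first filters dated changelog entries to the lookback window and then groups them, as in the weekly breakdown above. This sketch assumes each entry carries a date and buckets by ISO week; changes_in_window and the sample entries are hypothetical.

```python
from datetime import date, timedelta


def changes_in_window(changes, today, lookback_days=30):
    """Keep (date, summary) entries within the lookback window, bucketed by (ISO year, ISO week)."""
    cutoff = today - timedelta(days=lookback_days)
    by_week = {}
    for changed_on, summary in sorted(changes):
        if cutoff < changed_on <= today:
            year, week, _ = changed_on.isocalendar()
            by_week.setdefault((year, week), []).append(summary)
    return by_week


changes = [
    (date(2024, 1, 3), "Added VLAN 245"),
    (date(2024, 1, 10), "Implemented new QoS policies"),
    (date(2024, 1, 25), "Upgraded core router firmware"),
    (date(2023, 11, 20), "Old change outside the window"),
]
for (year, week), items in changes_in_window(changes, today=date(2024, 1, 30)).items():
    print(f"{year}-W{week:02d}: {items}")
```

In 2024, ISO week 1 starts on Monday, January 1, so these buckets line up with the Week 1 to Week 4 ranges shown above.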
Compliance and Security Queries
Compliance Checking
# Check compliance with specific framework
netintel-ocr rag compliance-check \
--question "Does our network segmentation meet PCI-DSS requirements?" \
--framework PCI-DSS-v4.0 \
--include-evidence \
--generate-report \
--collection compliance_docs
# Output:
# PCI-DSS Network Segmentation Compliance Check:
#
# ✅ Requirement 1.1: Network diagram documented
# Evidence: network-topology.pdf, last updated 2024-01-15
#
# ✅ Requirement 1.2: Firewall configuration standards
# Evidence: 234 rules reviewed, all follow standard
#
# ⚠️ Requirement 1.3: DMZ implementation
# Issue: Direct route found between DMZ and Internal
# Risk: Medium
# Remediation: Add deny rule on FW-Internal
#
# ✅ Requirement 1.4: Personal firewall software
# Evidence: Endpoint protection policy enforced
#
# ❌ Requirement 1.5: Security policy review
# Issue: Policy last reviewed 13 months ago (requires annual)
# Risk: High
# Remediation: Schedule immediate policy review
#
# Overall Compliance: 78%
# Critical Issues: 1
# Warnings: 1
#
# Report saved to: pci_compliance_report_2024-01-30.pdf
Security Analysis
# Analyze security posture
# (quote MITRE-ATT&CK: an unquoted & ends the command and runs it in the background)
netintel-ocr rag security-analysis \
--question "What are the potential attack vectors to our database?" \
--threat-model "MITRE-ATT&CK" \
--include-mitigations \
--collection security_docs
# Output:
# Attack Vector Analysis for Database Access:
#
# Identified Attack Vectors:
#
# 1. External Network Path (High Risk)
# Path: Internet → Firewall → DMZ → App Server → Database
# MITRE Techniques: T1190 (Exploit Public-Facing Application)
# Current Mitigations:
# • WAF in place
# • IPS monitoring
# • Rate limiting enabled
# Gaps: No API gateway authentication
#
# 2. Lateral Movement (Medium Risk)
# Path: Compromised Workstation → Internal Network → Database
# MITRE Techniques: T1021 (Remote Services)
# Current Mitigations:
# • Network segmentation
# • MFA on privileged accounts
# Gaps: Some service accounts without MFA
#
# 3. Insider Threat (Medium Risk)
# Path: Direct database access via admin credentials
# MITRE Techniques: T1078 (Valid Accounts)
# Current Mitigations:
# • Audit logging
# • Privileged access management
# Gaps: No behavior analytics
#
# Recommended Actions:
# 1. Implement API gateway with authentication
# 2. Enable MFA for all service accounts
# 3. Deploy user behavior analytics
# 4. Regular penetration testing
Batch Processing
Process Multiple Questions
# Create batch query file
cat > queries.txt << EOF
What is our current network capacity?
Which systems have no redundancy?
What are the critical single points of failure?
How many firewall rules allow any-to-any traffic?
What is the backup retention policy?
EOF
# Run batch queries
netintel-ocr rag batch \
--input queries.txt \
--collection network_docs \
--output results.json \
--parallel 4 \
--format json
# View results
cat results.json | jq '.queries[0]'
# {
# "question": "What is our current network capacity?",
# "answer": "Current network capacity: Core: 40Gbps (60% utilized)...",
# "confidence": 0.92,
# "sources": ["capacity-report.pdf", "network-metrics.xlsx"],
# "response_time_ms": 234
# }
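The --parallel 4 option fans the questions out to concurrent workers. A minimal Python equivalent with a thread pool is sketched below; answer() is a hypothetical placeholder for a single MiniRAG query, and only the top-level {"queries": [...]} shape mirrors the results.json shown above.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def answer(question):
    """Hypothetical placeholder for one MiniRAG query (e.g. an HTTP call to the RAG API)."""
    return {"question": question, "answer": f"(answer to: {question})", "confidence": None}


def parse_questions(text):
    """One question per non-blank line, as in queries.txt."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_batch(questions, parallel=4):
    """Answer questions concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(answer, questions))
    return {"queries": results}


batch = run_batch(parse_questions("What is our current network capacity?\nWhich systems have no redundancy?\n"))
print(json.dumps(batch, indent=2))
```

Threads suit this workload because each query mostly waits on network I/O to the LLM server. Reading queries.txt into parse_questions and writing the result with json.dump reproduces the file-to-file flow of the CLI example.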
Interactive Session
# Start interactive RAG session
netintel-ocr rag interactive \
--collection network_docs \
--history-file session.log \
--context-memory 5
# Interactive prompt appears:
# MiniRAG Interactive Mode
# Type 'help' for commands, 'exit' to quit
#
# rag> what is our primary data center location?
# Answer: The primary data center is located in Dallas, TX...
#
# rag> how many servers are there?
# Answer: Based on the context, there are 47 servers total...
#
# rag> show graph
# [Displays interactive network graph visualization]
#
# rag> export session
# Session exported to: rag_session_2024-01-30.md
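The --context-memory 5 option lets a follow-up such as "how many servers are there?" be resolved against earlier turns. Assuming it keeps the last five question/answer pairs, a bounded deque is enough; ConversationMemory and its prompt format are hypothetical.

```python
from collections import deque


class ConversationMemory:
    """Keep the last `size` question/answer turns to prepend to follow-up prompts."""

    def __init__(self, size=5):
        self.turns = deque(maxlen=size)  # the oldest turn drops out automatically

    def add(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, question):
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        return f"{history}\nQ: {question}" if history else f"Q: {question}"


memory = ConversationMemory(size=5)
memory.add("what is our primary data center location?", "Dallas, TX")
print(memory.build_prompt("how many servers are there?"))
```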
Performance Optimization
Caching Configuration
# Enable response caching
netintel-ocr rag config \
--enable-cache \
--cache-size 1000 \
--cache-ttl 3600 \
--collection network_docs
# Query with cache
netintel-ocr rag query \
--question "What is the network topology?" \
--use-cache \
--collection network_docs
# Clear cache
netintel-ocr rag cache-clear --collection network_docs
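Assuming --cache-size counts entries and --cache-ttl is in seconds (the units are not stated above), the response cache behaves like an LRU map whose entries also expire. This is a hypothetical sketch; the injectable clock exists only to make expiry testable.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire `ttl` seconds after insertion."""

    def __init__(self, max_size=1000, ttl=3600, clock=time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl, clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

A cache key that combines the collection and a normalized question (for example, stripped and lower-cased) lets trivially different phrasings of the same question share an entry.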
Retrieval Tuning
# Optimize retrieval parameters
netintel-ocr rag tune \
--test-queries evaluation_set.txt \
--optimize-for accuracy \
--collection network_docs
# Output:
# Optimization Results:
#
# Best Parameters:
# • Chunk Size: 512
# • Chunk Overlap: 64
# • Top-K: 8
# • Temperature: 0.7
# • Retrieval Strategy: hybrid
# • Vector Weight: 0.4
# • Graph Weight: 0.6
#
# Performance Improvement:
# • Accuracy: 87% → 94% (+7 percentage points)
# • Latency: 380ms → 290ms (-24%)
# • Relevance: 0.81 → 0.93 (+15%)
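The tuning output reports a Vector Weight of 0.4 and a Graph Weight of 0.6 without saying how they are applied. One common approach, assumed here, is to min-max normalize each retriever's scores and take a weighted sum; unlike the rank-based RRF named in the architecture diagram, this uses the score values themselves. weighted_fusion is hypothetical.

```python
def weighted_fusion(graph_scores, vector_scores, graph_weight=0.6, vector_weight=0.4):
    """Blend per-document scores from two retrievers after min-max normalizing each."""
    def normalize(scores):
        if not scores:
            return {}
        low, high = min(scores.values()), max(scores.values())
        span = high - low
        # A single distinct score carries no ranking signal, so map it to 1.0
        return {doc: (s - low) / span if span else 1.0 for doc, s in scores.items()}

    g, v = normalize(graph_scores), normalize(vector_scores)
    combined = {doc: graph_weight * g.get(doc, 0.0) + vector_weight * v.get(doc, 0.0)
                for doc in g.keys() | v.keys()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


print(weighted_fusion({"core-router-01": 12.0, "fw-prod-01": 4.0},
                      {"upgrade-guide-p12": 0.91, "core-router-01": 0.55}))
```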
Monitoring and Metrics
# View RAG performance metrics
netintel-ocr rag metrics \
--period 7d \
--collection network_docs
# Output:
# MiniRAG Performance Metrics (7 days):
#
# Query Statistics:
# • Total Queries: 1,847
# • Unique Questions: 423
# • Avg Response Time: 312ms
# • P95 Response Time: 780ms
# • Cache Hit Rate: 67%
#
# Retrieval Performance:
# • Graph Queries: 34% (avg 120ms)
# • Vector Queries: 28% (avg 290ms)
# • Hybrid Queries: 38% (avg 410ms)
#
# Accuracy Metrics:
# • User Satisfaction: 92%
# • Answer Relevance: 0.89
# • Source Accuracy: 94%
#
# Top Query Categories:
# 1. Configuration (34%)
# 2. Troubleshooting (28%)
# 3. Compliance (22%)
# 4. Capacity Planning (16%)
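The metrics report gives an average latency, a P95 latency, and a cache hit rate. The percentile method is not stated; this sketch assumes the nearest-rank definition and that every query records its latency and whether it was served from cache. Both helpers are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, cache_hits):
    """Aggregate non-empty per-query latencies and cache-hit flags into report fields."""
    return {
        "total_queries": len(latencies_ms),
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "cache_hit_rate": sum(cache_hits) / len(cache_hits),
    }


print(summarize([120, 290, 410, 780, 312], [True, False, True, True, False]))
```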
Export and Integration
Export Conversations
# Export Q&A as documentation
netintel-ocr rag export \
--format markdown \
--include-sources \
--include-confidence \
--output network_qa.md \
--collection network_docs
# Export as JSON for API integration
netintel-ocr rag export \
--format json \
--schema openapi \
--output rag_api.json \
--collection network_docs
Generate Knowledge Base
# Build comprehensive KB from queries
netintel-ocr rag build-kb \
--min-confidence 0.8 \
--categories "network,security,operations" \
--format html \
--output knowledge_base.html \
--collection network_docs
# Generate FAQ
netintel-ocr rag generate-faq \
--top-questions 50 \
--group-by-category \
--output faq.md \
--collection network_docs
API Integration
REST API Usage
import requests
# Query via API
response = requests.post(
    "http://localhost:8000/rag/query",
    json={
        "question": "What is the database connection string?",
        "collection": "network_docs",
        "mode": "hybrid",
        "include_sources": True,
    },
    timeout=60,  # answer generation can be slow; never wait forever
)
response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error body
result = response.json()
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Sources: {result['sources']}")
Python SDK
import os
from netintel_ocr.rag import MiniRAG
# Configure external Ollama
os.environ['OLLAMA_HOST'] = "http://your-ollama-server:11434"
# Initialize MiniRAG
rag = MiniRAG(
    collection="network_docs",
    llm_model="gemma3:4b-it-qat",
    embedding_model="qwen3-embedding:8b",
    retrieval_strategy="adaptive",
    ollama_host=os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
)
# Query
result = rag.query(
    question="What are the backup procedures?",
    include_context=True,
    max_tokens=500
)
print(result.answer)
print(f"Retrieved from: {result.sources}")
print(f"Confidence: {result.confidence}")
# Batch processing
questions = [
    "What is the network capacity?",
    "How many servers do we have?",
    "What is the DR strategy?"
]
results = rag.batch_query(questions)
for q, r in zip(questions, results):
    print(f"Q: {q}")
    print(f"A: {r.answer}\n")
Troubleshooting
Common Issues
# Debug slow queries
netintel-ocr rag debug \
--question "Your question here" \
--show-retrieval \
--show-timing \
--show-reasoning \
--collection network_docs
# Output:
# Query Debug Information:
#
# 1. Query Classification (12ms)
# Type: entity_centric
# Strategy: graph-first
#
# 2. Graph Retrieval (45ms)
# Entities found: 12
# Relationships: 34
#
# 3. Vector Retrieval (89ms)
# Documents: 5
# Chunks: 15
#
# 4. Context Building (23ms)
# Context size: 2048 tokens
#
# 5. LLM Generation (234ms)
# Model: gemma3:4b-it-qat
# Tokens: 450
#
# Total Time: 403ms
# Bottleneck: LLM Generation (58%)
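The Total Time and Bottleneck lines follow directly from the per-stage timings, assuming the stages run one after another (the stage times do sum to the reported 403ms). timing_report is a hypothetical helper that reproduces that arithmetic, which is useful when comparing debug runs before and after tuning.

```python
def timing_report(stages):
    """Sum per-stage timings and return (total, slowest stage, its rounded % share)."""
    total = sum(stages.values())
    name, ms = max(stages.items(), key=lambda item: item[1])
    return total, name, round(100 * ms / total)


# Stage timings copied from the debug output above
stages = {
    "Query Classification": 12,
    "Graph Retrieval": 45,
    "Vector Retrieval": 89,
    "Context Building": 23,
    "LLM Generation": 234,
}
total, bottleneck, share = timing_report(stages)
print(f"Total Time: {total}ms")                # Total Time: 403ms
print(f"Bottleneck: {bottleneck} ({share}%)")  # Bottleneck: LLM Generation (58%)
```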
Improve Answer Quality
# Analyze answer quality
netintel-ocr rag analyze \
--question "Your question" \
--answer "Generated answer" \
--check-hallucination \
--check-completeness \
--collection network_docs
# Re-index for better retrieval
netintel-ocr rag reindex \
--optimize-embeddings \
--update-graph \
--collection network_docs
Best Practices
- Choose the Right Mode:
  - Use graph mode for structural queries
  - Use vector mode for policy/procedure questions
  - Use hybrid mode for complex analysis
  - Use embeddings mode for similarity searches
- Optimize for Your Use Case:
  - Tune chunk size based on document types
  - Adjust temperature for creativity vs accuracy
  - Use caching for frequently asked questions
  - Enable monitoring to track performance
- Maintain Quality:
  - Regularly update your knowledge graph
  - Re-index after major document changes
  - Monitor confidence scores
  - Collect user feedback for improvements
Command Reference
# Essential MiniRAG commands
netintel-ocr rag init # Initialize MiniRAG
netintel-ocr rag query # Ask a question
netintel-ocr rag batch # Process multiple questions
netintel-ocr rag interactive # Start interactive session
netintel-ocr rag compare # Compare entities
netintel-ocr rag compliance-check # Check compliance
netintel-ocr rag metrics # View performance metrics
netintel-ocr rag export # Export Q&A pairs
netintel-ocr rag debug # Debug queries
netintel-ocr rag tune # Optimize parameters