Loki¶

Last verified: v2.0

Loki is the log aggregation engine behind MEHO's log investigation capabilities. MEHO's Loki connector translates natural-language questions about logs into structured LogQL queries -- operators can search, filter, and analyze log data without learning LogQL syntax or knowing which labels are available.

All 4 observability connectors (Prometheus, Loki, Tempo, Alertmanager) share the same ObservabilityHTTPConnector base, meaning identical auth setup and consistent behavior across your monitoring stack.

Authentication¶

All observability connectors use the shared ObservabilityHTTPConnector authentication model:

Method	Credential Fields	Notes
None	--	Direct access to internal Loki (e.g., in-cluster)
Basic Auth	`username`, `password`	HTTP Basic Auth via reverse proxy (nginx, Apache)
Bearer Token	`token`	OAuth2 proxy, service mesh, or API gateway

Setup:

No auth (default): Point MEHO at your Loki URL (e.g., http://loki:3100). Common for in-cluster access.
Basic auth: Used when Loki is behind a reverse proxy requiring HTTP basic credentials.
Bearer token: Used when Loki is behind an OAuth2 proxy or service mesh.

Multi-tenancy: For multi-tenant Loki deployments (e.g., Grafana Cloud Loki), the org_id is configured at the connector level. MEHO will include the X-Scope-OrgID header on all requests automatically.

Optional: Set skip_tls_verification: true if Loki uses a self-signed certificate.

Operations¶

MEHO registers 8 pre-defined operations for Loki, organized into three categories:

Log Search (5 operations)¶

Operation	Trust Level	Description
`search_logs`	READ	Search logs with label filters (namespace, pod, service, container), severity, and text filter. Returns summary stats plus structured log lines (timestamp, severity, source, message).
`get_error_logs`	READ	Retrieve error and warning logs. Shortcut for search_logs pre-filtered to error/warn/fatal severity levels.
`get_log_context`	READ	Retrieve log lines surrounding a specific timestamp for incident context. Returns lines before and after the target timestamp.
`get_log_volume`	READ	Query log volume statistics over time. Returns counts/rates bucketed by time interval -- detects log spikes, outage windows, and volume trends.
`get_log_patterns`	READ	Detect repeating log patterns with occurrence counts. Groups similar log lines by structural pattern -- useful for identifying systemic errors vs one-off events.

Discovery (2 operations)¶

Operation	Trust Level	Description
`list_labels`	READ	Discover available log labels in Loki (namespace, pod, container, service_name, level, job, etc.)
`list_label_values`	READ	Get all values for a specific log label -- e.g., list all namespaces, pods, or services available in Loki

Query (1 operation)¶

Operation	Trust Level	Description
`query_logql`	WRITE	Execute an arbitrary LogQL query (log queries or metric queries like `count_over_time`, `rate`). Requires operator approval.

LogQL Escape Hatch

The query_logql operation requires WRITE trust level because it executes arbitrary queries against your log data. MEHO will present the query to the operator for approval. For standard log investigation, the 7 pre-defined READ operations cover the vast majority of diagnostic scenarios.

Example Queries¶

Ask MEHO questions like:

"Show me error logs from the payment service in the last 15 minutes"
"What log lines contain 'timeout' across all services?"
"Are there any OOMKilled errors in the production namespace?"
"Show me the log context around when the crash happened at 10:30 AM"
"What's the log volume trend for the checkout service over the last 24 hours?"
"Are there repeating error patterns in the API gateway logs?"
"What labels are available in Loki for filtering?"
"List all namespaces that have logs in Loki"
"Show me warning and error logs from pod nginx-abc123 in the last hour"
"Has there been a spike in error logs recently?"

MEHO automatically builds the LogQL label selectors and filters from your natural-language question. For complex queries (e.g., multi-stage pipelines, metric aggregations), MEHO may propose a LogQL query via query_logql for your approval.

Topology¶

Loki is a query-only connector -- it does not discover topology entities. Loki stores log data about infrastructure components that other connectors (Kubernetes, Prometheus) already track as topology entities.

The connection between Loki logs and infrastructure happens through shared labels: the namespace, pod, and node labels in Loki match the Kubernetes entities discovered by the K8s connector, enabling cross-system correlation.

Cross-System Observability¶

Loki is most powerful when combined with the other observability connectors:

Loki + Prometheus: When Prometheus shows a metric spike (e.g., elevated error rate from get_red_metrics), use Loki to find the corresponding error logs in the same time window. The shared namespace and service labels make correlation seamless.
Loki + Tempo: Error logs often contain trace IDs. MEHO can extract a trace ID from a Loki log line and retrieve the full distributed trace from Tempo to understand the request flow that triggered the error.
Loki + Kubernetes: When get_error_logs surfaces pod crashes, cross-reference with Kubernetes pod status, events, and restart counts to determine if it's a recurring issue or an isolated incident.

In a single MEHO conversation, you can ask "Why are we seeing 500 errors in the checkout service?" and MEHO will check Loki for error logs, correlate with Prometheus request metrics, and if a trace ID is found, pull the full trace from Tempo.

Troubleshooting¶

Large Time Range Queries Return Slowly or Timeout¶

Symptom: Operations like search_logs with time_range='7d' are very slow or timeout. Cause: Loki is optimized for recent data. Large time ranges scan massive amounts of log data. Fix: Narrow the time range to the relevant window (e.g., 1h, 6h). Use get_log_volume first to identify the interesting time period, then search within that window.

No Logs Found for a Known Service¶

Symptom: search_logs(service='my-service') returns empty results. Cause: The label name for your service may differ. Loki labels are configured by the log shipper (Promtail, Grafana Agent, etc.) and may use service_name, app, or job instead of service. Fix: Use list_labels to discover available labels, then list_label_values to find the correct label and value for your service.

Label Cardinality Issues¶

Symptom: Queries with many label filters return unexpected results or errors. Cause: High-cardinality labels (e.g., unique request IDs) can cause performance issues in Loki. Fix: Filter by low-cardinality labels first (namespace, service) and use text_filter for high-cardinality matching. Avoid using high-cardinality labels as primary filters.

Log Pattern Detection Misses¶

Symptom: get_log_patterns returns fewer patterns than expected. Cause: Pattern detection works best on structured/semi-structured logs. Highly variable log formats may not cluster well. Fix: Narrow the scope with namespace and service filters to improve pattern quality. Consider using search_logs with a text filter for specific error messages instead.

Natural Language to LogQL Mismatch¶

Symptom: MEHO searches for the wrong labels or filters. Cause: Ambiguous phrasing -- "service errors" could mean different things depending on label structure. Fix: Be specific about the dimension you want to filter: "error logs from namespace payments, pod checkout-abc" rather than just "checkout errors". If unsure about available labels, ask "What labels are available in Loki?" first.

Connector type: Observability (ObservabilityHTTPConnector) Operations: 8 (7 READ, 1 WRITE) Topology entities: None (query-only)