Troubleshooting¶

Last verified: v2.0

Common issues, their causes, and how to fix them, organized by problem category.

Connection Issues¶

Connector can't connect -- authentication failure¶

Symptom

Connector test returns 401 Unauthorized or 403 Forbidden. MEHO reports "Authentication failed" when trying to use a connector.

Cause: Credentials are incorrect, expired, or have insufficient permissions.

Fix:

Verify the credentials in Connectors > [connector name] > Edit
For token-based connectors (Kubernetes, ArgoCD, GitHub), check that the token hasn't expired
For Atlassian connectors (Jira, Confluence), verify you're using an API token, not your account password
For observability connectors (Prometheus, Loki, Tempo, Alertmanager), check if authentication is required -- many installations run without auth behind a reverse proxy

Quick test

Use the Test Connection button on the connector edit page. It performs a lightweight health check that validates auth without running full operations.
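
For tokens that happen to be JWTs (Kubernetes service account tokens are; GitHub and Atlassian API tokens generally are not), the expiry can be read directly from the token. The following is a minimal sketch and not part of MEHO: `jwt_exp` is a hypothetical helper name, `TOKEN` is an assumed variable, and the code assumes GNU `base64`. It decodes the payload without verifying the signature, so it is only suitable for inspection. Tokens without an `exp` claim produce no output.

```shell
# Hypothetical helper: print the "exp" claim (Unix seconds) of a JWT.
# The signature is NOT verified; this is for inspection only.
jwt_exp() {
  local payload
  # Take the middle segment and map base64url characters back to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (TOKEN is an assumed variable holding the connector's token):
#   exp=$(jwt_exp "$TOKEN")
#   [ -n "$exp" ] && [ "$exp" -lt "$(date +%s)" ] && echo "token expired"
```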

Connector can't connect -- network/SSL errors¶

Symptom

Connector test returns ConnectionError, SSLError, or TimeoutError. MEHO reports "Could not reach [system]".

Cause: The MEHO backend container can't reach the target system, typically because of a DNS resolution failure, firewall rules, SSL certificate issues, or a target system that sits on a private network.

Fix:

Verify the target URL is reachable from the MEHO backend container:

docker exec meho-meho-1 curl -v https://your-system-url/health

For self-signed certificates, ensure the CA certificate is mounted in the backend container
For private networks, ensure the Docker network has access (may need network_mode: host or additional Docker network configuration)
Check that the port is correct -- some systems use non-standard ports (e.g., Kubernetes API on 6443, not 443)
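
The exit code of the curl command above helps narrow down which of these causes applies. As a rough sketch (the function name `explain_curl_exit` is made up for this example; the exit codes are curl's documented ones), the mapping looks like this:

```shell
# Hypothetical helper: translate a curl exit code into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "DNS resolution failed" ;;
    7)  echo "connection failed (check port and firewall)" ;;
    28) echo "timed out (check firewall and routing)" ;;
    35) echo "TLS handshake failed" ;;
    60) echo "certificate not trusted (check the mounted CA certificate)" ;;
    *)  echo "curl exit code $1 (see EXIT CODES in the curl man page)" ;;
  esac
}

# Usage: docker exec passes through the exit code of the command it ran.
#   docker exec meho-meho-1 curl -sS -o /dev/null --max-time 10 https://your-system-url/health
#   explain_curl_exit $?
```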

Connector connects but returns empty results¶

Symptom

Connector test succeeds, but queries return no data. MEHO says "No results found" for queries that should return data.

Cause: The connector credentials have limited scope (e.g., read access to only certain namespaces, projects, or repositories).

Fix:

Check the permissions of the service account or API token used by the connector
For Kubernetes: verify the service account is bound through a ClusterRoleBinding (not just a namespace-scoped RoleBinding) if it needs to read across namespaces
For GitHub: verify the token has repo scope for private repositories
For Prometheus/Loki: check if the connector is pointed at the correct datasource or tenant
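
For the Kubernetes case, `kubectl auth can-i` can confirm what the service account is allowed to do. A sketch under assumptions: the namespace `meho` and account name `meho-reader` are placeholders, both function names are hypothetical, and the check requires kubectl access with permission to impersonate the service account.

```shell
# Build the impersonation user name for a service account.
sa_user() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Ask the API server whether the service account may list pods in all
# namespaces. kubectl prints "yes" or "no".
can_list_pods_everywhere() {
  kubectl auth can-i list pods --all-namespaces --as="$(sa_user "$1" "$2")"
}

# Usage (placeholder namespace and account name):
#   can_list_pods_everywhere meho meho-reader
```

A "no" here while the connector test succeeds would be consistent with the empty-results symptom.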

Agent Issues¶

Agent not using the right connector¶

Symptom

MEHO queries the wrong system or uses a connector you didn't intend. For example, asking about "production pods" hits a staging Kubernetes cluster.

Cause: Multiple connectors of the same type exist, and the agent selected the wrong one. The agent uses connector names and descriptions to determine which to query.

Fix:

Give connectors descriptive names: "Production Kubernetes" vs "Staging Kubernetes" rather than "K8s 1" and "K8s 2"
Use @connector_name mentions in your message to explicitly target a connector
Update connector descriptions to clearly state their scope (environment, region, team)

Agent makes unexpected tool calls¶

Symptom

MEHO queries systems you didn't ask about, or performs operations that seem unrelated to your question.

Cause: The agent's ReAct reasoning loop determined that additional context from other systems would help answer your question. This is by design -- cross-system reasoning is MEHO's core value.

Fix:

If the additional queries are helpful but slow, this is expected behavior. MEHO traces problems across systems automatically.
If the queries are genuinely irrelevant, be more specific in your question. Instead of "what's wrong?", try "what's the CPU usage on the checkout-service pods?"
Use Ask mode (toggle in chat input) for simple questions that don't need investigation. Ask mode queries the knowledge base without invoking connectors.

Agent hits context limits¶

Symptom

MEHO's response is cut short or it says "I've reached my context limit". Investigations with many data-heavy queries may hit this.

Cause: The combined data from multiple connector queries exceeded the LLM's context window, even after reduction.

Fix:

Ask more specific questions to reduce the amount of data returned
Break complex investigations into smaller steps: first identify the problem area, then drill down
Start a new session if the current one has accumulated too much context -- MEHO persists session state, so the next session can reference previous findings through the knowledge base

Context monitoring

The chat input area shows a context usage indicator. When it approaches the limit, MEHO automatically compacts earlier messages to free space.

Data Issues¶

Stale or cached data¶

Symptom

MEHO returns data that doesn't match what you see in the source system. Results seem outdated.

Cause: MEHO caches connector responses in DuckDB/Parquet for SQL reduction within a session. If the source data changed after the initial query, the cache still holds the old data.

Fix:

Ask MEHO to "refresh" or "re-query" the data -- it will make a new API call instead of using the cache
Start a new session for a fresh investigation
Note that some connectors have built-in time ranges (e.g., Prometheus queries default to the last hour). Specify explicit time ranges in your question if needed.

Large response handling¶

Symptom

Queries that return very large datasets (thousands of pods, millions of log lines) are slow or cause errors.

Cause: The target system returned more data than expected. MEHO's data pipeline handles this, but very large responses take longer to normalize and reduce.

Fix:

Add filters to your question: "show me pods in the checkout namespace" instead of "show me all pods"
Use time ranges for log queries: "logs from the last 30 minutes" instead of "show me the logs"
For Prometheus, prefer instant queries over range queries when you only need current values
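
To see why instant queries are lighter: a range query returns one sample per step for every series, while an instant query returns a single sample per series. The arithmetic below is a sketch based on the Prometheus HTTP API, not on MEHO internals; `range_points` is a hypothetical helper and `PROM` is an assumed base URL.

```shell
# Samples per series for a range query: one evaluation at start,
# start+step, start+2*step, ... up to end (all values in seconds).
range_points() {
  local start=$1 end=$2 step=$3
  echo $(( (end - start) / step + 1 ))
}

# Usage: one hour at a 15-second step
#   range_points 0 3600 15    # prints 241
#
# The two kinds of request against the Prometheus HTTP API:
#   curl -sG "$PROM/api/v1/query" --data-urlencode 'query=up'
#   curl -sG "$PROM/api/v1/query_range" --data-urlencode 'query=up' \
#        --data-urlencode 'start=1700000000' --data-urlencode 'end=1700003600' \
#        --data-urlencode 'step=15'
```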

Authentication Issues¶

Keycloak token expiry -- 401 responses¶

Symptom

The MEHO UI suddenly starts showing 401 errors. All API calls fail. The page may redirect to the Keycloak login screen.

Cause: The Keycloak access token has expired and automatic token refresh failed. This can happen if Keycloak is temporarily unreachable or if the refresh token has also expired (default: 30 minutes idle).

Fix:

Refresh the browser page -- keycloak-js will attempt to re-authenticate
If the login page appears, log in again. Your session data (chat history, investigation state) is persisted and will be restored
If Keycloak itself is down, check the container: docker logs meho-keycloak

CORS errors in browser console¶

Symptom

Browser console shows Access-Control-Allow-Origin errors. API calls from the frontend fail with CORS rejection.

Cause: The frontend URL is not in the backend's allowed origins list.

Fix:

Check .env for CORS_ORIGINS -- it must include the frontend URL (default: ["http://localhost:5173"])
If running the frontend on a different port or domain, update CORS_ORIGINS accordingly
Restart the backend after changing CORS settings: ./scripts/dev-env.sh restart meho
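
To confirm what the backend actually returns after the restart, send a request with an `Origin` header and look for `Access-Control-Allow-Origin` in the response headers. This is a sketch under assumptions: `http://localhost:8000/` as the backend address is a guess based on the port list on this page, and `allowed_origin` is a hypothetical helper.

```shell
# Read HTTP response headers on stdin and print the value of the
# Access-Control-Allow-Origin header, if present.
allowed_origin() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "access-control-allow-origin" { print $2 }'
}

# Usage (backend URL is an assumption; adjust to your setup):
#   curl -s -o /dev/null -D - -H 'Origin: http://localhost:5173' http://localhost:8000/ | allowed_origin
```

Empty output suggests the origin you sent is not in the allowed list.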

403 Forbidden on specific operations¶

Symptom

MEHO can read data from a connector but fails when attempting write operations. Error: "Forbidden" or "Insufficient permissions".

Cause: The connector's credentials have read-only access. Write and destructive operations require elevated permissions.

Fix:

This is often intentional -- many organizations configure read-only credentials for safety
If write access is needed, update the connector credentials with a service account that has appropriate permissions
Check the connector's documentation page for the exact permissions required for each operation

Deployment Issues¶

Docker Compose startup failures¶

Symptom

./scripts/dev-env.sh up fails. Containers crash on startup or fail health checks.

Cause: Missing environment variables, port conflicts, or insufficient resources.

Fix:

Missing .env file: Copy env.example to .env and set the required secrets:

cp env.example .env
# Edit .env: set ANTHROPIC_API_KEY, VOYAGE_API_KEY, CREDENTIAL_ENCRYPTION_KEY

Port conflicts: Check that ports 5432, 6379, 8000, 8080, 5173, 9000, 5341 are not in use:
```
lsof -i :8000  # Check if port is occupied
```
Insufficient memory: The full stack requires approximately 4GB of RAM. The Docker Desktop default is often 2GB.
- macOS/Windows: Docker Desktop > Settings > Resources > Memory > Set to 6GB+
Keycloak slow startup: Keycloak can take 60-90 seconds to initialize on first run. The health check has a 90-second start period, but on slow machines it may need longer. Check logs: docker logs meho-keycloak
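
The port check above can be extended to every port in the list. A minimal sketch, assuming `lsof` is installed; `port_in_use` and `busy_ports` are hypothetical helper names. Without elevated privileges, `lsof` may only see sockets owned by your own user, so an empty result is not a guarantee.

```shell
# True if something is listening on the given TCP port (requires lsof).
port_in_use() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Print each port from the argument list that is already taken.
busy_ports() {
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "$port"
    fi
  done
}

# Usage:
#   busy_ports 5432 6379 8000 8080 5173 9000 5341
```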

Database migration errors¶

Symptom

The backend starts but API calls fail with database errors. Logs show "relation does not exist" or "column not found".

Cause: Database migrations haven't run, or they failed silently.

Fix:

Always use ./scripts/dev-env.sh up instead of raw docker compose up -- the helper script runs migrations automatically
If migrations need to be run manually:
```
./scripts/run-migrations-monolith.sh
```

If migrations fail with version conflicts, check for stale Alembic version entries:

docker exec meho-postgres-1 psql -U meho -c "SELECT * FROM alembic_version_meho_knowledge;"

For a clean slate (destroys all data):

./scripts/dev-env.sh down --volumes
./scripts/dev-env.sh up

Backend crashes on startup¶

Symptom

The meho container exits immediately or enters a restart loop. Logs show import errors or configuration errors.

Cause: Missing or invalid environment variables, or a Python dependency issue.

Fix:

Check backend logs for the specific error:
```
docker logs meho-meho-1
```
Common causes:
- ANTHROPIC_API_KEY not set or invalid
- CREDENTIAL_ENCRYPTION_KEY not set or malformed (must be a valid Fernet key: 32 random bytes, URL-safe base64-encoded, which comes to 44 characters)
- DATABASE_URL pointing to wrong host (should be postgres inside Docker network, not localhost)
Rebuild the image if dependencies changed:
```
./scripts/dev-env.sh up --build
```
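
If CREDENTIAL_ENCRYPTION_KEY is the suspect, the key format can be checked without starting the backend. A Fernet key is 32 random bytes in URL-safe base64. The sketch below assumes GNU coreutils; `new_fernet_key` and `is_fernet_key` are hypothetical helpers, not MEHO scripts.

```shell
# Generate a key: 32 random bytes, URL-safe base64 encoded (44 characters).
new_fernet_key() {
  head -c 32 /dev/urandom | base64 | tr '/+' '_-'
}

# Succeed if the argument decodes to exactly 32 bytes.
is_fernet_key() {
  local decoded_len
  decoded_len=$(printf '%s' "$1" | tr '_-' '/+' | base64 -d 2>/dev/null | wc -c | tr -d ' ')
  [ "$decoded_len" -eq 32 ]
}

# Usage (assumes the variable is exported in the current shell):
#   is_fernet_key "$CREDENTIAL_ENCRYPTION_KEY" || echo "not a valid Fernet key"
```

Changing the key on an existing installation would presumably make previously encrypted credentials unreadable, so generate a new one only for a fresh setup.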

Getting Help¶

If you encounter an issue not covered here:

Check the logs: ./scripts/dev-env.sh logs shows all service logs. Add a service name for filtered output: ./scripts/dev-env.sh logs meho
Check connector-specific pages: Each connector documentation page includes a troubleshooting section for connector-specific issues
Check the API docs: The API Reference documents all endpoints and their expected responses