Troubleshooting
Last verified: v2.0
Common issues, their causes, and how to fix them, organized by problem category.
Connection Issues
Connector can't connect -- authentication failure
Symptom
Connector test returns 401 Unauthorized or 403 Forbidden. MEHO reports "Authentication failed" when trying to use a connector.
Cause: Credentials are incorrect, expired, or have insufficient permissions.
Fix:
- Verify the credentials in Connectors > [connector name] > Edit
- For token-based connectors (Kubernetes, ArgoCD, GitHub), check that the token hasn't expired
- For Atlassian connectors (Jira, Confluence), verify you're using an API token, not your account password
- For observability connectors (Prometheus, Loki, Tempo, Alertmanager), check if authentication is required -- many installations run without auth behind a reverse proxy
Quick test
Use the Test Connection button on the connector edit page. It performs a lightweight health check that validates auth without running full operations.
Connector can't connect -- network/SSL errors
Symptom
Connector test returns ConnectionError, SSLError, or TimeoutError. MEHO reports "Could not reach [system]".
Cause: The MEHO backend container can't reach the target system. Common causes: DNS resolution failure, firewall rules, SSL certificate issues, or the target system is on a private network.
Fix:
- Verify the target URL is reachable from the MEHO backend container:
- For self-signed certificates, ensure the CA certificate is mounted in the backend container
- For private networks, ensure the Docker network has access (this may need `network_mode: host` or additional Docker network configuration)
- Check that the port is correct -- some systems use non-standard ports (e.g., Kubernetes API on 6443, not 443)
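The three failure modes above (DNS, network path, and certificate trust) can be told apart with a short standard-library Python script run inside the backend container. This is a sketch that assumes the image ships a Python interpreter, which is likely for a Python backend but not confirmed here; the `diagnose` helper and its `cafile` parameter are illustrative names, not part of MEHO. Point `cafile` at the CA bundle you mounted to test a self-signed setup.

```python
import socket
import ssl
from typing import Optional
from urllib.parse import urlsplit


def diagnose(url: str, cafile: Optional[str] = None, timeout: float = 5.0) -> str:
    """Report which step (DNS, TCP, TLS) fails when connecting to `url`."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return "invalid URL: no hostname"
    # Use the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # Step 1: DNS resolution.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS lookup for {host} failed: {exc}"

    # Step 2: TCP connection (firewall rules, wrong port, unreachable network).
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # covers refused connections and timeouts
        return f"TCP connect to {host}:{port} failed: {exc}"

    with sock:
        if parts.scheme != "https":
            return f"TCP connect to {host}:{port} OK (no TLS for {parts.scheme!r})"
        # Step 3: TLS handshake, verifying the certificate against `cafile`
        # (or the system trust store when cafile is None).
        context = ssl.create_default_context(cafile=cafile)
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return f"TLS handshake with {host}:{port} OK"
        except OSError as exc:  # ssl.SSLError (bad certificate) is an OSError subclass
            return f"TLS to {host}:{port} failed: {exc}"
```

For example, `diagnose("https://k8s.example.internal:6443", cafile="/path/to/ca.crt")` reports which of the three steps fails; the host and path are placeholders.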
Connector connects but returns empty results
Symptom
Connector test succeeds, but queries return no data. MEHO says "No results found" for queries that should return data.
Cause: The connector credentials have limited scope (e.g., read access to only certain namespaces, projects, or repositories).
Fix:
- Check the permissions of the service account or API token used by the connector
- For Kubernetes: verify the service account has `ClusterRole` bindings (not just a namespace-scoped `Role`)
- For GitHub: verify the token has the `repo` scope for private repositories
- For Prometheus/Loki: check if the connector is pointed at the correct datasource or tenant
Agent Issues
Agent not using the right connector
Symptom
MEHO queries the wrong system or uses a connector you didn't intend. For example, asking about "production pods" hits a staging Kubernetes cluster.
Cause: Multiple connectors of the same type exist, and the agent selected the wrong one. The agent uses connector names and descriptions to determine which to query.
Fix:
- Give connectors descriptive names: "Production Kubernetes" vs "Staging Kubernetes" rather than "K8s 1" and "K8s 2"
- Use `@connector_name` mentions in your message to target a connector explicitly
- Update connector descriptions to state their scope clearly (environment, region, team)
Agent makes unexpected tool calls
Symptom
MEHO queries systems you didn't ask about, or performs operations that seem unrelated to your question.
Cause: The agent's ReAct reasoning loop determined that additional context from other systems would help answer your question. This is by design -- cross-system reasoning is MEHO's core value.
Fix:
- If the additional queries are helpful but slow, this is expected behavior. MEHO traces problems across systems automatically.
- If the queries are genuinely irrelevant, be more specific in your question. Instead of "what's wrong?", try "what's the CPU usage on the checkout-service pods?"
- Use Ask mode (toggle in chat input) for simple questions that don't need investigation. Ask mode queries the knowledge base without invoking connectors.
Agent hits context limits
Symptom
MEHO's response is cut short, or it says "I've reached my context limit". This is most likely in investigations that run many data-heavy queries.
Cause: The combined data from multiple connector queries exceeded the LLM's context window, even after reduction.
Fix:
- Ask more specific questions to reduce the amount of data returned
- Break complex investigations into smaller steps: first identify the problem area, then drill down
- Start a new session if the current one has accumulated too much context -- MEHO persists session state, so the next session can reference previous findings via the knowledge base
Context monitoring
The chat input area shows a context usage indicator. When it approaches the limit, MEHO automatically compacts earlier messages to free space.
Data Issues
Stale or cached data
Symptom
MEHO returns data that doesn't match what you see in the source system. Results seem outdated.
Cause: MEHO caches connector responses in DuckDB/Parquet for SQL reduction within a session. If the source data changed after the initial query, the cache still holds the old data.
Fix:
- Ask MEHO to "refresh" or "re-query" the data -- it will make a new API call instead of using the cache
- Start a new session for a fresh investigation
- Note that some connectors have built-in time ranges (e.g., Prometheus queries default to the last hour). Specify explicit time ranges in your question if needed.
Large response handling
Symptom
Queries that return very large datasets (thousands of pods, millions of log lines) are slow or cause errors.
Cause: The target system returned more data than expected. MEHO's data pipeline handles this, but very large responses take longer to normalize and reduce.
Fix:
- Add filters to your question: "show me pods in the checkout namespace" instead of "show me all pods"
- Use time ranges for log queries: "logs from the last 30 minutes" instead of "show me the logs"
- For Prometheus, prefer instant queries over range queries when you only need current values
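The cost difference is visible in the Prometheus HTTP API: an instant query (`/api/v1/query`) returns one sample per series, while a range query (`/api/v1/query_range`) returns one sample per series for every `step` between `start` and `end`. The sketch below only builds the request URLs and counts points per series; it assumes the connector issues standard Prometheus HTTP API requests, which this page doesn't state, and the base URL and expression are placeholders.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """GET /api/v1/query: evaluates `promql` at one timestamp (now, if `time` is omitted)."""
    return f"{base_url}/api/v1/query?" + urlencode({"query": promql})


def range_query_url(base_url: str, promql: str, start: float, end: float, step: float) -> str:
    """GET /api/v1/query_range: evaluates `promql` every `step` seconds from `start` to `end`."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base_url}/api/v1/query_range?" + urlencode(params)


def points_per_series(start: float, end: float, step: float) -> int:
    """Samples a range query returns per series (timestamps start, start+step, ... <= end)."""
    return int((end - start) // step) + 1
```

Over one hour at a 15-second step, `points_per_series(0, 3600, 15)` gives 241 samples per series, multiplied by every series the expression matches; an instant query returns a single sample per series.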
Authentication Issues
Keycloak token expiry -- 401 responses
Symptom
The MEHO UI suddenly starts showing 401 errors. All API calls fail. The page may redirect to the Keycloak login screen.
Cause: The Keycloak access token has expired and automatic token refresh failed. This can happen if Keycloak is temporarily unreachable or if the refresh token has also expired (default: 30 minutes idle).
Fix:
- Refresh the browser page -- keycloak-js will attempt to re-authenticate
- If the login page appears, log in again. Your session data (chat history, investigation state) is persisted and will be restored
- If Keycloak itself is down, check the container:
docker logs meho-keycloak
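Keycloak access tokens are JWTs, so you can confirm that expiry is the cause by reading the `exp` claim of a token copied from a failing request (for example, the Authorization header shown in the browser's developer tools). The sketch below decodes the payload without verifying the signature, which is acceptable for diagnosis only; never base authorization decisions on unverified claims. The helper names are illustrative.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (diagnostics only)."""
    payload = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds until the `exp` claim; a negative result means the token has expired."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now
```

An expired access token on its own is normal, since Keycloak access tokens are short-lived by default; the UI only fails when the refresh also fails, as described above.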
CORS errors in browser console
Symptom
Browser console shows Access-Control-Allow-Origin errors. API calls from the frontend fail with CORS rejection.
Cause: The frontend URL is not in the backend's allowed origins list.
Fix:
- Check `.env` for `CORS_ORIGINS` -- it must include the frontend URL (default: `["http://localhost:5173"]`)
- If running the frontend on a different port or domain, update `CORS_ORIGINS` accordingly
- Restart the backend after changing CORS settings:
./scripts/dev-env.sh restart meho
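An origin is the exact combination of scheme, host, and port, so `http://127.0.0.1:5173` and `http://localhost:5173` are different origins, and with exact-string matching (typical of CORS middleware) a trailing slash in the configured value will not match what the browser sends. The sketch below sanity-checks a value; it assumes `CORS_ORIGINS` holds a JSON array, as the default suggests, and how MEHO actually parses the variable is not confirmed here.

```python
import json
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_origin(url: str) -> str:
    """Reduce a URL to scheme://host[:port], the form a browser sends in the Origin header."""
    parts = urlsplit(url.strip())
    origin = f"{parts.scheme}://{parts.hostname}"
    # Browsers omit the port when it is the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        origin += f":{parts.port}"
    return origin


def origin_allowed(cors_origins: str, frontend_url: str) -> bool:
    """True if the frontend's origin appears verbatim in the CORS_ORIGINS value."""
    configured = json.loads(cors_origins)  # assumption: a JSON array of strings
    wanted = normalize_origin(frontend_url)
    for entry in configured:
        if entry == wanted:
            return True
        if normalize_origin(entry) == wanted:
            # Same origin, but a trailing slash or path defeats exact matching.
            print(f"near miss: {entry!r} should be written as {wanted!r}")
    return False
```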
403 Forbidden on specific operations
Symptom
MEHO can read data from a connector but fails when attempting write operations. Error: "Forbidden" or "Insufficient permissions".
Cause: The connector's credentials have read-only access. Write and destructive operations require elevated permissions.
Fix:
- This is often intentional -- many organizations configure read-only credentials for safety
- If write access is needed, update the connector credentials with a service account that has appropriate permissions
- Check the connector's documentation page for the exact permissions required for each operation
Deployment Issues
Docker Compose startup failures
Symptom
./scripts/dev-env.sh up fails. Containers crash on startup or fail health checks.
Cause: Missing environment variables, port conflicts, or insufficient resources.
Fix:
- Missing `.env` file: Copy `env.example` to `.env` and set the required secrets:
- Port conflicts: Check that ports 5432, 6379, 8000, 8080, 5173, 9000, and 5341 are not in use:
- Insufficient memory: The full stack requires approximately 4GB of RAM, but the Docker Desktop default is often 2GB.
  - macOS/Windows: Docker Desktop > Settings > Resources > Memory, then set it to 6GB or more
- Keycloak slow startup: Keycloak can take 60-90 seconds to initialize on first run. The health check has a 90-second start period, but on slow machines it may need longer. Check the logs:
docker logs meho-keycloak
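To see which of the listed ports is already taken without extra tools, a small standard-library script can try to bind each one, which is the same operation Docker performs when it publishes a port. This is a sketch: it checks the machine it runs on and binds to all interfaces (`0.0.0.0`) by assumption; the helper names are illustrative.

```python
import socket

# Host ports the stack expects to publish, taken from the list above.
REQUIRED_PORTS = [5432, 6379, 8000, 8080, 5173, 9000, 5341]


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails, i.e. another process already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


def busy_ports(ports=REQUIRED_PORTS) -> list:
    return [p for p in ports if port_in_use(p)]


if __name__ == "__main__":
    print("ports in use:", busy_ports() or "none")
```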
Database migration errors
Symptom
The backend starts but API calls fail with database errors. Logs show "relation does not exist" or "column not found".
Cause: Database migrations haven't run or failed silently.
Fix:
- Always use `./scripts/dev-env.sh up` instead of raw `docker compose up` -- the helper script runs migrations automatically
- If migrations need to be run manually:
- If migrations fail with version conflicts, check for stale Alembic version entries:
- For a clean slate (destroys all data):
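Alembic records the database's current revision in the `alembic_version` table (column `version_num`), and each migration script declares its `down_revision`. A "stale" entry is a `version_num` that no script in the current codebase defines, which typically happens after switching branches or deleting a migration. The check can be sketched as follows; the helpers and revision IDs are hypothetical, and the sketch assumes a linear history (merge revisions, whose `down_revision` is a tuple, are not handled).

```python
from typing import Dict, Optional, Set

# Hypothetical revision graph: revision id -> down_revision (None for the first migration).
RevisionMap = Dict[str, Optional[str]]


def heads(revisions: RevisionMap) -> Set[str]:
    """Revisions that no other revision lists as its down_revision."""
    parents = {down for down in revisions.values() if down is not None}
    return set(revisions) - parents


def classify(db_version: Optional[str], revisions: RevisionMap) -> str:
    """Compare the value stored in alembic_version.version_num with the scripts on disk."""
    if db_version is None:
        return "empty: no migration has been applied"
    if db_version not in revisions:
        return "stale: the database references a revision that no script defines"
    if db_version in heads(revisions):
        return "up to date"
    return "behind: newer migrations have not been applied"
```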
Backend crashes on startup
Symptom
The meho container exits immediately or enters a restart loop. Logs show import errors or configuration errors.
Cause: Missing or invalid environment variables, or a Python dependency issue.
Fix:
- Check backend logs for the specific error:
- Common causes:
  - `ANTHROPIC_API_KEY` not set or invalid
  - `CREDENTIAL_ENCRYPTION_KEY` not set or not a valid Fernet key (32 bytes, URL-safe base64-encoded, which is a 44-character string)
  - `DATABASE_URL` pointing to the wrong host (inside the Docker network it should be `postgres`, not `localhost`)
- Rebuild the image if dependencies changed:
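The common causes listed above can be checked before the backend starts with a short script run against the same environment the container receives. This is a sketch: the variable names come from this page, the Fernet check mirrors the one in the `cryptography` package (the key must decode to exactly 32 bytes) using only the standard library, and MEHO's own validation may differ. The API key is checked for presence only; validity would need a real API call.

```python
import base64
import binascii
import os
from typing import Dict, List
from urllib.parse import urlsplit


def is_valid_fernet_key(key: str) -> bool:
    """A Fernet key is 32 bytes encoded as URL-safe base64 (44 characters)."""
    try:
        return len(base64.urlsafe_b64decode(key.encode())) == 32
    except (binascii.Error, ValueError):
        return False


def check_env(env: Dict[str, str]) -> List[str]:
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):  # presence only
        problems.append("ANTHROPIC_API_KEY is not set")
    if not is_valid_fernet_key(env.get("CREDENTIAL_ENCRYPTION_KEY", "")):
        problems.append("CREDENTIAL_ENCRYPTION_KEY is missing or not a valid Fernet key")
    host = urlsplit(env.get("DATABASE_URL", "")).hostname
    if host is None:
        problems.append("DATABASE_URL is not set or has no host")
    elif host in ("localhost", "127.0.0.1"):
        problems.append("DATABASE_URL points at localhost; inside Docker use the postgres host")
    return problems


def new_fernet_key() -> str:
    # Same construction as cryptography's Fernet.generate_key(), stdlib only.
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


if __name__ == "__main__":
    print(check_env(dict(os.environ)) or "no problems found")
```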
Getting Help
If you encounter an issue not covered here:
- Check the logs: `./scripts/dev-env.sh logs` shows all service logs. Add a service name for filtered output: `./scripts/dev-env.sh logs meho`
- Check connector-specific pages: Each connector documentation page includes a troubleshooting section for connector-specific issues
- Check the API docs: The API Reference documents all endpoints and their expected responses