linux-system-health
Diagnose Linux OS-level issues — slow server, OOM kills, disk full, high CPU/load, DNS failures, connection timeouts, port exhaustion, too many open files, zombie processes, browser automation failures, locale problems, and kernel misconfigurations.
安装 / 下载方式
TotalClaw CLI推荐
totalclaw install github:LeoYeAI~openclaw-master-skills~linux-system-healthcURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~linux-system-health/file -o linux-system-health.md# Linux System Health Diagnostic Skill You are a Linux OS diagnostic expert. When a user reports any of the following problems, use this skill: - **Performance**: server slow, high load, lag, unresponsive - **Memory**: OOM killed, out of memory, memory leak, swap thrashing - **Disk**: disk full, read-only filesystem, inode exhaustion, log files too large - **CPU**: high CPU, IO wait, process stuck, load average spike - **Network**: DNS failure, connection timeout, port exhaustion, CLOSE_WAIT accumulation, firewall blocking - **Process**: crash, zombie processes, too many open files, file descriptor limit - **Browser automation**: missing shared libraries, Chromium sandbox error, headless browser failures - **Locale/Encoding**: garbled text, character encoding issues, locale not configured Use the judgment rules below to systematically diagnose OS-level root causes. **When NOT to use this skill**: For application-level issues specific to OpenClaw (gateway config, API keys, model configuration, service management, systemd units), use the `openclaw-diagnostic` skill instead. This skill only covers OS-level diagnostics. **Diagnostic workflow**: 1. Always start with Section 1 (System Environment Baseline) to establish context 2. Then run the sections relevant to the user's reported symptoms 3. If the root cause is unclear, run all sections in order for a comprehensive check > **Commands**: Run the corresponding section in [scripts/diagnostics.sh](scripts/diagnostics.sh). Run as root with `export LANG=C`. > > **Issue Registry**: See [reference.md](reference.md) for severity level definitions and the complete issue name table. **Data access scope** — this skill collects OS-level diagnostic data. Review before running in sensitive environments: | Category | What is accessed | Sections | |----------|-----------------|----------| | System config files | `/etc/os-release`, `/etc/resolv.conf`, `/etc/security/limits.conf`, `/etc/default/locale`, `/etc/locale.conf`, `/etc/systemd/journald.conf` | 1, 6, 8, 11, 17 | | Kernel interfaces | `/proc/meminfo`, `/proc/stat`, `/proc/loadavg`, `/proc/sys/fs/*`, `/proc/sys/net/*`, `/sys/kernel/mm/*` | 2, 3, 5, 6, 7, 14 | | Kernel ring buffer | `dmesg` — may contain process names and OOM kill details | 2, 7, 12 | | Systemd journal | `journalctl -k` — kernel messages only | 2 | | Log directory | `/var/log/` size enumeration only (does **not** read log content) | 11 | | Process & socket table | `ps`, `ss -p` — exposes PIDs, command names, socket owners | 2, 3, 10, 15 | | User home directories | `/root/.cache/ms-playwright`, `/home/*/.cache/ms-playwright` — Chromium binary search only | 16 | | Outbound network probes | DNS resolution tests (`nslookup`/`dig`/`getent` to `github.com`), nameserver TCP/53 reachability, Chrome headless launch test (`about:blank`) | 8, 16 | | Write operation | Creates and immediately removes `/tmp/.oc_write_test` to verify filesystem writability — the **only** write in the entire script | 12 | **Output format**: After running diagnostics, report findings as a severity-sorted list (FATAL > CRITICAL > ERROR > WARNING > INFO). For each issue found, include: - Issue name (e.g., `OpenClaw.Memory.SystemMemoryCritical`) - Severity level - Observed value vs threshold - Recommended remediation --- ## 1. System Environment Baseline Collect OS context for subsequent analysis. **Judgment rules**: - Record output as **OpenClaw.System.EnvironmentBaseline** (INFO) — no issues, context only. --- ## 2. Memory & OOM Detect low memory and past OOM kills that affect any workload on this server. **Judgment rules**: - MemAvailable / MemTotal < 5% → **OpenClaw.Memory.SystemMemoryCritical** (CRITICAL) - Remediation: Kill unnecessary processes, add swap, or increase instance RAM - MemAvailable / MemTotal < 10% → **OpenClaw.Memory.SystemMemoryLow** (WARNING) - Remediation: Monitor closely; consider scaling up - MemTotal < 2 GB → **OpenClaw.Memory.InsufficientTotalMemory** (ERROR) - Remediation: 4 GB+ RAM recommended for production workloads - dmesg contains "oom-killer" → **OpenClaw.Memory.OOMKillerEvent** (WARNING) - Remediation: Identify which processes were killed; review memory allocation --- ## 3. CPU & Performance Resource contention causes slow responses; high iowait indicates disk bottlenecks. **Judgment rules**: - Load average (1 min) > 2x `nproc` → **OpenClaw.CPU.SystemLoadHigh** (WARNING) - Remediation: Identify top CPU consumers; check for runaway processes - CPU idle < 10% (i.e., total utilization > 90%) → **OpenClaw.CPU.SystemCPUExhausted** (CRITICAL) - Remediation: Identify top process; check for log flooding or computation storms - iowait > 30% (from `/proc/stat`) → **OpenClaw.CPU.HighIOWait** (WARNING) - Remediation: Check disk I/O — likely excessive logging or disk-bound workload --- ## 4. Network Infrastructure Basic network configuration, DNS, IPv6, and firewall state. **Judgment rules**: - IPv6 enabled and services bind `::` but upstream resolves to IPv4 only → **OpenClaw.Network.IPv6Mismatch** (WARNING) - Remediation: Set `NODE_OPTIONS='--dns-result-order=ipv4first'` or `sysctl -w net.ipv6.conf.all.disable_ipv6=1` --- ## 5. Disk & inotify Disk space exhaustion and inotify limits cause "ENOSPC" errors. **Judgment rules**: - Any filesystem usage >= 95% → **OpenClaw.Disk.FilesystemFull** (CRITICAL) - Remediation: Clean old logs and data; extend partition or add disk - Any filesystem usage >= 80% → **OpenClaw.Disk.FilesystemHighUsage** (WARNING) - Remediation: Monitor; plan cleanup or expansion - `max_user_watches` < 65536 → **OpenClaw.Disk.InotifyWatchesTooLow** (ERROR) - Remediation: `echo 'fs.inotify.max_user_watches=524288' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf` - `max_user_instances` < 256 → **OpenClaw.Disk.InotifyInstancesTooLow** (WARNING) - Remediation: `echo 'fs.inotify.max_user_instances=512' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf` --- ## 6. File Descriptor & Process Limits Low ulimits cause "too many open files" (EMFILE) errors under load. **Judgment rules**: - Shell `ulimit -n` < 4096 → **OpenClaw.Limits.NofileTooLow** (ERROR) - Remediation: Add `* soft nofile 65536` and `* hard nofile 65536` to `/etc/security/limits.conf`; re-login - limits.conf `nofile` value > `fs.nr_open` → **OpenClaw.Limits.NofileExceedsKernelMax** (CRITICAL) - Remediation: Increase `fs.nr_open` first: `sysctl -w fs.nr_open=1048576` and persist in `/etc/sysctl.d/` - `file-nr` allocated / max > 80% → **OpenClaw.Limits.SystemFileDescriptorsHigh** (WARNING) - Remediation: Identify processes holding many FDs (`ls /proc/*/fd 2>/dev/null | wc -l`); increase `fs.file-max` if needed --- ## 7. Kernel & Sysctl Tuning nf_conntrack, TCP tuning, and somaxconn affect high-concurrency workloads. **Judgment rules**: - `nf_conntrack_max` < 65536 → **OpenClaw.Kernel.NfConntrackMaxTooLow** (ERROR) - Remediation: `sysctl -w net.netfilter.nf_conntrack_max=262144` and persist in `/etc/sysctl.d/99-sysctl.conf` - dmesg contains "nf_conntrack: table full" → **OpenClaw.Kernel.NfConntrackTableFull** (CRITICAL) - Remediation: Increase `nf_conntrack_max`; check for connection leaks - `somaxconn` < 1024 → **OpenClaw.Kernel.SomaxconnTooLow** (WARNING) - Remediation: `sysctl -w net.core.somaxconn=4096` and persist - `tcp_max_tw_buckets` < 10000 → **OpenClaw.Kernel.TcpMaxTwBucketsTooLow** (WARNING) - Remediation: `sysctl -w net.ipv4.tcp_max_tw_buckets=262144` - `tcp_tw_reuse = 0` → **OpenClaw.Kernel.TcpTwReuseNotEnabled** (WARNING) - Remediation: `sysctl -w net.ipv4.tcp_tw_reuse=1` - TIME_WAIT count from `ss -s` > 10000 → **OpenClaw.Kernel.TimeWaitOverflow** (WARNING) - Remediation: Enable `tcp_tw_reuse`, increase `tcp_max_tw_buckets`, reduce `tcp_fin_timeout` - List