-
Incident report
-
Resolution: Fixed
-
Blocker
-
3.0.1
Suppose we start the agent and request proc.cpu.util[]:
$ bin/zabbix_get -s 127.0.0.1 -p 23050 -k proc.cpu.util[zabbix_agentd]
zabbix_get [29276]: Check access restrictions in Zabbix agent configuration
Investigation shows that the agent crashes with the following debug log:
29078:20160405:225645.401 Requested [proc.cpu.util[zabbix_agentd]]
29078:20160405:225645.403 In procstat_add()
29078:20160405:225645.403 In zbx_dshm_realloc() shmid:-1 size:21704
29078:20160405:225645.403 In procstat_copy_data()
29078:20160405:225645.403 End of procstat_copy_data()
29078:20160405:225645.403 End of zbx_dshm_realloc():SUCCEED shmid:605
29078:20160405:225645.404 End of procstat_add()
29077:20160405:225645.750 __zbx_zbx_setproctitle() title:'collector [processing data]'
29077:20160405:225645.750 In update_cpustats()
29077:20160405:225645.750 End of update_cpustats()
...
29077:20160405:225645.766 DEBUG: util_local = ffbee948
29077:20160405:225645.766 DEBUG: procstat_snapshot = 0
29077:20160405:225645.767 DEBUG: procstat_snapshot_num = 0
29077:20160405:225645.767 DEBUG: sizeof(zbx_procstat_util_t) = 32
29077:20160405:225645.767 DEBUG: inside procstat_util_compare, u1 = ffbee948, u2 = 0
29077:20160405:225645.767 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0]. Crashing ...
29077:20160405:225645.767 ====== Fatal information: ======
29077:20160405:225645.767 program counter not available for this architecture
29077:20160405:225645.768 === Registers: ===
29077:20160405:225645.768 register dump not available for this architecture
29077:20160405:225645.768 === Backtrace: ===
29077:20160405:225645.768 backtrace not available for this platform
29077:20160405:225645.768 === Memory map: ===
29077:20160405:225645.768 memory map not available for this platform
29077:20160405:225645.769 ================================
29076:20160405:225645.773 One child process died (PID:29077,exitcode/signal:1). Exiting ...
The crash happens in procstat_calculate_cpu_util_for_queries() in the second call to bsearch:
/* find the process utilization data in last snapshot */
putil = (zbx_procstat_util_t *)bsearch(&util_local, procstat_snapshot, procstat_snapshot_num,
		sizeof(zbx_procstat_util_t), procstat_util_compare);
As the debug log above shows, the "procstat_snapshot" variable is NULL and "procstat_snapshot_num" is 0. The following small program shows that on Solaris 8, when the second argument to bsearch() is NULL, the comparison function is still called with one of its arguments being NULL. Our procstat_util_compare() function does not handle this case and crashes:
#include <stdio.h>
#include <stdlib.h>

static int compare(const void *p1, const void *p2)
{
	printf("p1 = %p, p2 = %p\n", p1, p2);
	return 0;
}

int main()
{
	int a;
	void *p;

	p = bsearch(&a, NULL, 0, sizeof(a), compare);
	printf("p = %p\n", p);

	return 0;
}
$ gcc bsearch.c -o bsearch -Wall -Wextra
$ ./bsearch
p1 = ffbefb4c, p2 = 0
p = 0
Compiling the same program on Linux gives the following warning (note that the comparison function is not called at runtime):
$ gcc bsearch.c -o bsearch -Wall -Wextra
bsearch.c: In function 'main':
bsearch.c:15:2: warning: null argument where non-null required (argument 2) [-Wnonnull]
  p = bsearch(&a, NULL, 0, sizeof(a), compare);
  ^
$ ./bsearch
p = (nil)
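Since the behaviour differs between platforms, one possible way to avoid the crash is to skip bsearch() entirely when there is nothing to search. The following is only a minimal sketch of that idea, not necessarily the fix applied in Zabbix: safe_bsearch() and compare_int() are hypothetical names introduced here for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of Zabbix): do not call bsearch() at all */
/* when the array is missing or empty, so that the comparison function    */
/* can never be invoked with a NULL element pointer.                      */
static void	*safe_bsearch(const void *key, const void *base, size_t num, size_t size,
		int (*compare)(const void *, const void *))
{
	if (NULL == base || 0 == num)
		return NULL;

	return bsearch(key, base, num, size, compare);
}

/* Hypothetical comparison function for ints, used only to exercise the helper. */
static int	compare_int(const void *p1, const void *p2)
{
	int	a = *(const int *)p1, b = *(const int *)p2;

	return (a > b) - (a < b);
}
```

With this guard, a call such as safe_bsearch(&key, NULL, 0, sizeof(int), compare_int) returns NULL without invoking compare_int(), while a search in a non-empty sorted array behaves like plain bsearch().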