Tracking down an interrupt-count surge on SUSE Linux 3.0.101

Background material for this article can be found at: http://man7.org/linux/man-pages/man5/proc.5.html

During a call-load test, we ran into the following problem:

 13  10  65   0   0  11| 512k  116k| 412k 5964M|   0     0 | 694k  317k|91.9G 85.3M  125G 34.9G|17-09 12:54:50
 14  10  65   0   0  11|   0    11M| 465k 5962M|   0     0 | 696k  319k|91.8G 85.3M  125G 34.9G|17-09 12:54:51
 13  11  64   0   0  11| 652k    0 | 463k 5964M|   0     0 | 695k  319k|91.9G 85.3M  125G 34.9G|17-09 12:54:52
 13  10  66   0   0  11|   0   444k| 512k 5962M|   0     0 | 695k  318k|91.8G 85.5M  125G 34.9G|17-09 12:54:53
  6  18  16   0   0  60|1168k  116k|1394k 9420M|   0     0 |5324k  200k|91.9G 85.5M  125G 34.9G|17-09 12:54:56 missed 2 ticks
  5  19   2   0   0  73| 512k  168k|1074k 5632M|   0     0 |3683k   18k|91.9G 85.5M  125G 34.8G|17-09 12:54:58 missed 3 ticks
  4  19   1   0   0  75| 504k   28k| 732k 4115M|   0     0 |3035k   14k|91.9G 85.5M  125G 34.8G|17-09 12:54:59
 10  14   9   1   0  68|   0     0 | 186k 1171M|   0     0 | 435k 5876 |91.9G 85.5M  125G 34.8G|17-09 12:55:00
  4  17   4   0   0  75| 512k  168k| 347k 2303M|   0     0 |1367k 7744 |91.9G 85.6M  125G 34.8G|17-09 12:55:01
 44  20   6   0   0  31|1388k  132k| 738k 2844M|   0     0 | 763k  149k|91.9G 85.6M  125G 34.8G|17-09 12:55:02
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage----- ----system----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free|  date/time
 59  17  10   0   0  15| 892k   11M| 489k 5034M|   0     0 | 689k  220k|91.9G 85.7M  125G 34.9G|17-09 12:55:03
 59  19   7   0   0  14|1308k    0 | 421k 5376M|   0     0 | 664k  175k|91.9G 85.7M  125G 34.9G|17-09 12:55:04
 52  19   8   7   0  13|  91M  908k| 355k 5876M|   0     0 | 641k  153k|91.8G 85.8M  125G 34.9G|17-09 12:55:05

dstat reported missed ticks. From the dstat source, this means dstat itself was not scheduled in time: we told dstat to print once per second, but more than one interval elapsed between two of its samples.
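For reference, here is a minimal C sketch of that missed-tick logic, an assumption-level simplification of what dstat does, not its actual code: if more than one 1-second interval elapses between two wakeups, the process was not scheduled in time, and the skipped intervals are reported.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const double interval = 1.0;   /* seconds, the period we gave dstat */
    struct timespec prev, now;

    clock_gettime(CLOCK_MONOTONIC, &prev);
    for (;;) {
        sleep(1);
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (now.tv_sec - prev.tv_sec)
                       + (now.tv_nsec - prev.tv_nsec) / 1e9;
        /* e.g. a 3 s gap on a 1 s period means 2 intervals were skipped */
        int missed = (int)(elapsed / interval + 0.5) - 1;
        if (missed > 0)
            printf("missed %d ticks\n", missed);
        prev = now;
    }
}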

The reason it was not scheduled looks simple enough: interrupt handling was occupying too much CPU. In the output above, the siq column jumps from about 11% to 60-75% exactly where the ticks are missed.

With the softirq share that high, is the number of softirqs up, or is the average time spent per softirq up? At the same time I noticed that the int column was also growing abnormally. As a matter of experience, hard interrupts come first, because hard interrupts are what raise softirqs. The relevant dstat source:

        self.open('/proc/stat')
        self.nick = ('int', 'csw')
        self.vars = ('intr', 'ctxt')

So dstat's int column is the per-second increment of the intr line in /proc/stat. Let's open that file:

cat /proc/stat |grep intr
intr 47707586836 370 0 0 0 1 0 0 0 25 177598 0 0 0 0 0 0 28 0 0 0 0 0 0 1356 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1248707 468171 236579 176923 1170510 258380 159936 138092 530318 179431 91093 91510 280776 84516 56632 46697 36605213 35396119 35150883 34212455 38451700 33671705 33605226 33923255 35373206 34823352 33881464 33572313 33225628 33575328 33564049 33483439 48526090 31620756 31692209 31814970 37979582 31340408 35108730 35197180 31659179 31661422 31824906 31656542 31340968 31337621 32226389 32660585 68 0 0 0 0 0 3 145 26 0 0 0 0 0 0 0 0 40014 40011 40011 40011 289824 40011 40011 40011 0 281826 40010 48260 40010 40010 40010 40011 40010 1 126344624 126233116 126102333 126155796 125971361 125958799 125895678 126178395 125654706 125440091 125365894 125518332 125883325 125625767 125411124 125527414 126177373 126107958 126212141 126149061 126050788 125843579 125961728 125752186 125589100 125266469 125694246 125501280 125748115 125565481 125423098 125276755 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 126507670 126280095 126189402 126205112 125922420 125625016 125958255 126058710 125710424 125216721 125444496 125439279 126009130 125328091 125438053 125536600 126157597 126249532 126042545 125723573 125715245 125903608 126097244 126006117 125560251 125709113 125654717 125996542 125638530 125566086 125691010 125252384 0 0 0 0 0 0 0 0 0 1 126421233 126433835 126435321 126525194 126305072 125889781 126068745 126402998 126085737 125708631 126680780 126623419 126965471 126743290 126635221 126091525 126021199 126130766 126176355 126151884 126431828 126009996 126311465 126268854 125935492 126034100 126218627 126215903 126133163 126006176 125883158 125869545 0 0 0 0 0 0 0 0 0 1 126691251 126773763 127364244 127402903 127260477 127358544 127290335 126775548 126392100 125867604 126007931 126094960 126330237 125788662 126054528 126195794 126174122 126425201 126464283 126458496 126804673 126540346 126811751 126536333 126009152 126090225 126065548 126357832 126689730 126436428 126161384 126048528 0 0 0 0 0 0 0 0 0 1 126578959 126547207 126686202 126505811 126125152 126113317 126247447 126585076 126452229 126230249 126505672 126266767 126308709 126268385 126156051 126211253 126436814 126601250 126533676 126528006 126401463 126280581 126519142 126353625 126485456 125964210 127544533 127302961 127552600 127385163 127382441 126594460 0 0 0 0 0 0 0 0 0 1 126970450 126429473 126800538 126507515 126541913 126451700 126588962 126563801 126471310 126171451 126062262 126282493 126435348 126128593 126445527 126167075 126645932 126615724 127508532 127250737 127200195 127289443 127597082 126671576 126489577 126026115 126237104 126815134 126429267 126541299 126599759 126219467 0 0 0 0 0 0 0 0 0 1 126719730 126473411 126290746 126332090 125812647 126240310 126159004 126359675 125926755 125707002 125503285 125491268 125985954 125547345 125649502 125528125 126580688 126267282 126350848 126427742 126213100 125896233 126237397 126169032 125553721 125655478 125600793 125801810 125909219 125794605 125383595 125282098 0 0 0 0 0 0 0 0 0 1 126569698 126574645 126495792 126381945 126316239 126211986 126182383 126231547 126241938 125785404 125741930 125549171 125918126 125974566 126061975 125623911 126309970 126624986 126118018 126538395 126714010 126111781 126333862 126225663 125899759 125745009 125888492 126055081 126380930 125590031 125962452 125763054 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

At first I assumed the first number after intr, 47707586836, was the sum of all the columns after it, so I naturally diffed the per-IRQ columns over time. None of them was surging, yet the total was.

Could the per-IRQ columns really disagree with the total? Adding them up shows they genuinely don't match:

 cat /proc/stat |grep intr|awk '{b[NR]=$0; for(i=3;i<=NF;i++)a[NR]+=$i;}END{print $1,$2, a[NR]}'
intr 47688559516 33389988695
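The same check as a standalone C program, for anyone wary of awk field handling (a sketch; intrsum.c is a name I made up): it reads the intr line, prints the first field, the sum of every column after it, and the difference.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    static char line[1 << 20];            /* the intr line can be very long */
    unsigned long long total = 0, sum = 0;
    char *tok;
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "intr ", 5))
            continue;
        tok = strtok(line + 5, " \n");    /* first field: the grand total */
        total = strtoull(tok, NULL, 10);
        while ((tok = strtok(NULL, " \n")) != NULL)
            sum += strtoull(tok, NULL, 10);   /* numbered-IRQ columns */
        break;
    }
    fclose(f);
    printf("total=%llu sum=%llu diff=%llu\n", total, sum, total - sum);
    return 0;
}

As we will see next, the diff is exactly what the man page calls unnumbered interrupts.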

What does the official documentation say?

intr 1462898
                     This line shows counts of interrupts serviced since
                     boot time, for each of the possible system interrupts.
                     The first column is the total of all interrupts ser‐
                     viced including unnumbered architecture specific inter‐
                     rupts; each subsequent column is the total for that
                     particular numbered interrupt.  Unnumbered interrupts
                     are not shown, only summed into the total.

So the numbered-IRQ columns are allowed to add up to far less than the first value: unnumbered interrupts are only folded into the total. Let's verify against the kernel source.

Here is show_stat(), the code that renders /proc/stat:

static int show_stat(struct seq_file *p, void *v)
{
    int i, j;
    unsigned long jif;
    cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
    cputime64_t guest, guest_nice;
    u64 sum = 0;                              /* hard-interrupt total */
    u64 sum_softirq = 0;                      /* softirq total */
    unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
    struct timespec boottime;

    user = nice = system = idle = iowait =
        irq = softirq = steal = cputime64_zero;
    guest = guest_nice = cputime64_zero;
    getboottime(&boottime);
    jif = boottime.tv_sec;

    for_each_possible_cpu(i) {
        user = cputime64_add(user, kstat_cpu(i).cpustat.user);
        nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
        system = cputime64_add(system, kstat_cpu(i).cpustat.system);
        idle = cputime64_add(idle, get_idle_time(i));
        iowait = cputime64_add(iowait, get_iowait_time(i));
        irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
        softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
        steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
        guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
        guest_nice = cputime64_add(guest_nice,
            kstat_cpu(i).cpustat.guest_nice);
        sum += kstat_cpu_irqs_sum(i);         /* sum += (1): per-IRQ counts */
        sum += arch_irq_stat_cpu(i);          /* sum += (2): arch per-cpu counts */

        for (j = 0; j < NR_SOFTIRQS; j++) {
            unsigned int softirq_stat = kstat_softirqs_cpu(j, i);

            per_softirq_sums[j] += softirq_stat;
            sum_softirq += softirq_stat;
        }
    }
    sum += arch_irq_stat();                   /* sum += (3): global arch counts */
......
    /* sum again ? it could be updated? */
    for_each_irq_nr(j)                        /* walks only 0 .. nr_irqs-1 */
        seq_printf(p, " %u", kstat_irqs(j));  /* prints each IRQ's own count;
                                                 read slightly later than sum,
                                                 hence a small skew, but that
                                                 is not the issue here */
......

So sum is built from three sources: kstat_cpu_irqs_sum(), arch_irq_stat_cpu(), and arch_irq_stat(). Let's look at the implementation of each in turn:

static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
{
    return kstat_cpu(cpu).irqs_sum;
}

Whenever the per-cpu counter of a numbered IRQ (including dynamically requested ones) is bumped, the per-cpu irqs_sum is bumped along with it:

#define kstat_incr_irqs_this_cpu(irqno, DESC)        \
do {                            \
    __this_cpu_inc(*(DESC)->kstat_irqs);        \
    __this_cpu_inc(kstat.irqs_sum);            \
} while (0)
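Where does that macro get called? From what I recall of the 3.0-era generic IRQ layer (so treat this excerpt as approximate), the flow handlers in kernel/irq/chip.c invoke it once per handled hardware interrupt, e.g.:

/* kernel/irq/chip.c, abridged: every edge-triggered interrupt that is
 * handled bumps both the per-IRQ counter and the per-cpu irqs_sum. */
void handle_edge_irq(unsigned int irq, struct irq_desc *desc)
{
	raw_spin_lock(&desc->lock);
	/* ... spurious/disabled checks elided ... */
	kstat_incr_irqs_this_cpu(irq, desc);
	/* ... ack the irq and run the irqaction chain ... */
	raw_spin_unlock(&desc->lock);
}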

The definition above is the CONFIG_GENERIC_HARDIRQS variant of the macro, which is the one that applies here, since our kernel is built with that option:

grep CONFIG_GENERIC_HARDIRQS /boot/config-3.0.101-0.47.52-default

CONFIG_GENERIC_HARDIRQS=y

The second function, arch_irq_stat_cpu(), covers the architecture-specific interrupts. Its implementation (x86):

/*
 * /proc/stat helpers
 */
u64 arch_irq_stat_cpu(unsigned int cpu)
{
    u64 sum = irq_stats(cpu)->__nmi_count;

#ifdef CONFIG_X86_LOCAL_APIC
    sum += irq_stats(cpu)->apic_timer_irqs;
    sum += irq_stats(cpu)->irq_spurious_count;
    sum += irq_stats(cpu)->apic_perf_irqs;
    sum += irq_stats(cpu)->apic_irq_work_irqs;
#endif
    if (x86_platform_ipi_callback)
        sum += irq_stats(cpu)->x86_platform_ipis;
#ifdef CONFIG_SMP
    sum += irq_stats(cpu)->irq_resched_count;
    sum += irq_stats(cpu)->irq_call_count;
    sum += irq_stats(cpu)->irq_tlb_count;
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
    sum += irq_stats(cpu)->irq_thermal_count;
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
    sum += irq_stats(cpu)->irq_threshold_count;
#endif
#ifdef CONFIG_X86_MCE
    sum += per_cpu(mce_exception_count, cpu);
    sum += per_cpu(mce_poll_count, cpu);
#endif
    return sum;
}

One special thing to note: in /proc/stat, the counts from this second function never show up among the per-IRQ columns; they are folded into the total only.

The third function is arch_irq_stat():

u64 arch_irq_stat(void)
{
    u64 sum = atomic_read(&irq_err_count);
    return sum;
}

It does nothing but read the irq_err_count counter.

That counter typically grows in smp_error_interrupt():

/*
 * This interrupt should never happen with our APIC/SMP architecture
 */
void smp_error_interrupt(struct pt_regs *regs)
{
    u32 v0, v1;
    u32 i = 0;
    static const char * const error_interrupt_reason[] = {
        "Send CS error",        /* APIC Error Bit 0 */
        "Receive CS error",        /* APIC Error Bit 1 */
        "Send accept error",        /* APIC Error Bit 2 */
        "Receive accept error",        /* APIC Error Bit 3 */
        "Redirectable IPI",        /* APIC Error Bit 4 */
        "Send illegal vector",        /* APIC Error Bit 5 */
        "Received illegal vector",    /* APIC Error Bit 6 */
        "Illegal register address",    /* APIC Error Bit 7 */
    };

    exit_idle();
    irq_enter();
    /* First tickle the hardware, only then report what went on. -- REW */
    v0 = apic_read(APIC_ESR);
    apic_write(APIC_ESR, 0);
    v1 = apic_read(APIC_ESR);
    ack_APIC_irq();
    atomic_inc(&irq_err_count);
......

Now that we know how sum differs from the per-IRQ counts printed after it, is there anywhere that does match sum? There is: /proc/interrupts carries exactly the information that adds up to sum.

I tallied it with the following script (this box has 32 CPUs, hence fields 2 through 33):

linux:~ # cat /proc/interrupts |grep -v CPU |while read line; do echo $line| awk  '{b[NR]=$0; for(i=2;i<=33;i++)a[NR]+=$i;}END{print $1, a[NR]}' ;    done |awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' && cat /proc/stat |grep -i intr|awk '{print $2}'
59196317851
59196355872
linux:~ # cat /proc/interrupts |grep -v CPU |while read line; do echo $line| awk  '{b[NR]=$0; for(i=2;i<=33;i++)a[NR]+=$i;}END{print $1, a[NR]}' ;    done |awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' && cat /proc/stat |grep -i intr|awk '{print $2}'
59198390517
59198427650
linux:~ # cat /proc/interrupts |grep -v CPU |while read line; do echo $line| awk  '{b[NR]=$0; for(i=2;i<=33;i++)a[NR]+=$i;}END{print $1, a[NR]}' ;    done |awk 'BEGIN{sum=0}{sum+=$2}END{print sum}' && cat /proc/stat |grep -i intr|awk '{print $2}'
59198773937
59198811543

Because the two pipelines read the files at slightly different moments, the two totals differ a little, but they grow in lockstep: between run one and run two the /proc/interrupts tally grows by 2,072,666 and the /proc/stat total by 2,071,778; between run two and run three, by 383,420 versus 383,893. The gap itself stays at roughly 37-38 thousand interrupts, which is just what fires while the first pipeline runs. So /proc/interrupts accounts for the same grand total as the intr column in /proc/stat.

Of course, we can also confirm this from the source:

int show_interrupts(struct seq_file *p, void *v)
{
    static int prec;

    unsigned long flags, any_count = 0;
    int i = *(loff_t *) v, j;
    struct irqaction *action;
    struct irq_desc *desc;

    if (i > ACTUAL_NR_IRQS)
        return 0;

    if (i == ACTUAL_NR_IRQS)
        return arch_show_interrupts(p, prec);  /* prints the arch-specific interrupt counts */

    /* print header and calculate the width of the first column */
    if (i == 0) {
        for (prec = 3, j = 1000; prec < 10 && j <= nr_irqs; ++prec)
            j *= 10;

        seq_printf(p, "%*s", prec + 8, "");
        for_each_online_cpu(j)
            seq_printf(p, "CPU%-8d", j);
        seq_putc(p, '\n');
    }

    desc = irq_to_desc(i);
    if (!desc)
        return 0;

    raw_spin_lock_irqsave(&desc->lock, flags);
    for_each_online_cpu(j)
        any_count |= kstat_irqs_cpu(i, j);
    action = desc->action;
    if (!action && !any_count)
        goto out;

    seq_printf(p, "%*d: ", prec, i);
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));---------------统计各个中断的各个cpu的触发数量。sum里面第一个函数的统计
。。。。。

And the arch_show_interrupts() it calls:

/*
 * /proc/interrupts printing for arch specific interrupts
 */
int arch_show_interrupts(struct seq_file *p, int prec)  /* prints contribution (2), the arch_irq_stat_cpu() counters */
{
    int j;

    seq_printf(p, "%*s: ", prec, "NMI");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
    seq_printf(p, "  Non-maskable interrupts\n");
#ifdef CONFIG_X86_LOCAL_APIC
    seq_printf(p, "%*s: ", prec, "LOC");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
    seq_printf(p, "  Local timer interrupts\n");

    seq_printf(p, "%*s: ", prec, "SPU");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_spurious_count);
    seq_printf(p, "  Spurious interrupts\n");
    seq_printf(p, "%*s: ", prec, "PMI");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
    seq_printf(p, "  Performance monitoring interrupts\n");
    seq_printf(p, "%*s: ", prec, "IWI");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
    seq_printf(p, "  IRQ work interrupts\n");
#endif
    if (x86_platform_ipi_callback) {
        seq_printf(p, "%*s: ", prec, "PLT");
        for_each_online_cpu(j)
            seq_printf(p, "%10u ", irq_stats(j)->x86_platform_ipis);
        seq_printf(p, "  Platform interrupts\n");
    }
#ifdef CONFIG_SMP
    seq_printf(p, "%*s: ", prec, "RES");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_resched_count);
    seq_printf(p, "  Rescheduling interrupts\n");
    seq_printf(p, "%*s: ", prec, "CAL");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_call_count);
    seq_printf(p, "  Function call interrupts\n");
    seq_printf(p, "%*s: ", prec, "TLB");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
    seq_printf(p, "  TLB shootdowns\n");
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
    seq_printf(p, "%*s: ", prec, "TRM");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
    seq_printf(p, "  Thermal event interrupts\n");
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
    seq_printf(p, "%*s: ", prec, "THR");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
    seq_printf(p, "  Threshold APIC interrupts\n");
#endif
#ifdef CONFIG_X86_MCE
    seq_printf(p, "%*s: ", prec, "MCE");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
    seq_printf(p, "  Machine check exceptions\n");
    seq_printf(p, "%*s: ", prec, "MCP");
    for_each_online_cpu(j)
        seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
    seq_printf(p, "  Machine check polls\n");
#endif
    seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));------------------------这个就是sum里面最后一个函数的统计
#if defined(CONFIG_X86_IO_APIC)
    seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));
#endif
    return 0;
}

So it is settled by the source:

The grand total of the interrupt counts in /proc/interrupts is consistent with the first column of the intr line in /proc/stat.

From here it's straightforward: watch the counters with a script and see which one explodes (a sketch of such a watcher follows the output below). What we caught was this row:

 LOC:  525581694  514172958  466070724  423266312  404025702  385033843  379205558  372825646  483824260  489761020  441090851  404241012  383122024  365739399  358834335  352940307  466060977  454682754  416203259  384527958  367710479  354374016  348030881  344457326  434930494  435822175  381035648  356172741  343146180  333338249  328147660  329348394   Local timer interrupts
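The watcher itself does not need to be fancy; our actual script is not shown here, but its job can be sketched in C (irqwatch.c is illustrative, not what we ran): take two snapshots of /proc/interrupts one second apart and print the per-row growth, and the runaway row, LOC here, stands out immediately.

/* irqwatch.c (sketch): diff two snapshots of /proc/interrupts.
 * Assumes the set of rows does not change between snapshots. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAXROWS 512

static int snapshot(char labels[][32], unsigned long long *sums, int max)
{
    static char line[8192];
    FILE *f = fopen("/proc/interrupts", "r");
    int n = 0;

    if (!f)
        return -1;
    fgets(line, sizeof(line), f);            /* skip the CPUx header */
    while (n < max && fgets(line, sizeof(line), f)) {
        char *tok = strtok(line, " \t\n");   /* row label, e.g. "LOC:" */
        unsigned long long sum = 0;

        if (!tok)
            continue;
        snprintf(labels[n], 32, "%s", tok);
        while ((tok = strtok(NULL, " \t\n")) != NULL) {
            char *end;
            unsigned long long v = strtoull(tok, &end, 10);
            if (end == tok)
                break;                       /* reached the text description */
            sum += v;
        }
        sums[n++] = sum;
    }
    fclose(f);
    return n;
}

int main(void)
{
    static char labels[MAXROWS][32];
    static unsigned long long a[MAXROWS], b[MAXROWS];
    int i, n = snapshot(labels, a, MAXROWS);

    sleep(1);
    snapshot(labels, b, MAXROWS);
    for (i = 0; i < n; i++)
        if (b[i] > a[i])
            printf("%-8s +%llu/s\n", labels[i], b[i] - a[i]);
    return 0;
}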

Walking through our own code: we arm an hrtimer to expire at now + timer_offset. We found an error branch on which timer_offset becomes 0, which makes the hrtimer fire almost immediately; after it fires, unless some other code path intervenes, the offset is set to 0 again, and it only returns to a sane value once a separate condition is met. Since hrtimers in high-resolution mode are driven by the local APIC timer, every one of those premature expiries is a LOC interrupt.
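Our product code cannot be pasted here, but the pattern is easy to show. A minimal sketch under those assumptions (hypothetical module; only the name timer_offset comes from our code, everything else is illustrative): an hrtimer re-armed with a zero offset expires again as fast as the timer subsystem allows, and every expiry costs one local timer interrupt, which is exactly the LOC storm above.

/* Sketch of the bug pattern: re-arming an hrtimer with a 0 ns offset
 * makes it re-expire immediately. Hypothetical module, not our code. */
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer t;
static u64 timer_offset;    /* ns; the broken error branch leaves this 0 */

static enum hrtimer_restart cb(struct hrtimer *timer)
{
	/* ... work; on the bad branch timer_offset is set to 0 again ... */
	hrtimer_forward_now(timer, ns_to_ktime(timer_offset));
	return HRTIMER_RESTART;    /* offset 0 => immediate re-expiry */
}

static int __init demo_init(void)
{
	hrtimer_init(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	t.function = cb;
	hrtimer_start(&t, ns_to_ktime(timer_offset), HRTIMER_MODE_REL);
	return 0;
}

static void __exit demo_exit(void)
{
	hrtimer_cancel(&t);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");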

To summarize: a colleague asked why, when both the siq share and the int count went abnormal, we did not look at softirqs first but went after hard interrupts. That is really a matter of experience.

By and large, hard interrupts are affected little by softirqs, whereas softirqs get processed on the back of hard-interrupt handling. So whenever the two rise together, starting from the hard-interrupt side is always a safe bet.
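The kernel itself encodes that ordering: pending softirqs are run on the exit path of every hard interrupt, so a hard-interrupt storm drags softirq work along with it. Abridged from kernel/softirq.c of that era (treat the elisions as a sketch):

/* kernel/softirq.c (3.0 era, abridged): when the last hard-interrupt
 * context is left, pending softirqs are handled right away. */
void irq_exit(void)
{
	account_system_vtime(current);
	trace_hardirq_exit();
	sub_preempt_count(IRQ_EXIT_OFFSET);
	if (!in_interrupt() && local_softirq_pending())
		invoke_softirq();    /* -> __do_softirq() */
	/* ... idle-tick handling elided ... */
}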

After correcting that flow, one problem from this investigation remains: with the int count back to normal, dstat still shows missed ticks and the siq share still goes abnormal, while the hard-interrupt count stays essentially normal.

 13  13  41  16   0  17|4827M   17M| 378k 4969M|   0     0 | 703k  312k|20-09 16:53:28| 117G  128M 95.9G 38.7G
  5  31  10   4   0  50|5089M   48k| 844k 6143M|   0     0 | 726k  149k|20-09 16:53:30| 117G  128M 95.6G 38.7G missed 2 ticks
 33  25   5  11   0  26|3332M  316k| 532k 2035M|   0     0 | 513k  172k|20-09 16:53:31| 117G  128M 95.3G 39.0G
 44  16   1  25   0  13|4959M  304k| 470k 3855M|   0     0 | 707k  259k|20-09 16:53:32| 117G  128M 95.3G 39.0G
 44  14   2  25   0  15|4896M  276k| 315k 4846M|   0     0 | 715k  266k|20-09 16:53:33| 117G  128M 95.3G 39.1G

That wraps up the hard-interrupt surge; the softirq CPU-share problem is left for the next post.