背景:公司线上服务器,FullGC耗时超过1秒报警。
2019-09-16T11:01:25.287+0800: 9566486.997: [GC (Allocation Failure) 2019-09-16
T11:01:25.288+0800: 9566486.997: [ParNew: 1683966K->4564K(1887488K), 0.0238815
secs] 31513 45K->1472607K(3984640K), 0.0245668 secs] [Times: user=0.09 sys=0.00,
real=0.02 secs] 2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial
Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600
secs] [Times: user=0.02 sys=0.0 0, real=0.02 secs] 2019-09-16T11:01:25.330+0800:
9566487.039: [CMS-concurrent-mark-start] 2019-09-16T11:01:26.007+0800:
9566487.716: [CMS-concurrent-mark: 0.666/0.677 secs] [Times: user=0.95 sys=0.09,
real=0.67 secs] 2019-09-16T11:01:26.007+0800: 9566487.717: [CMS-concurrent-
preclean-start] 2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-
preclean: 0.015/0.016 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 2019-09-
16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-abortable-preclean-start] CMS
: abort preclean due to time 2019-09-16T11:01:31.115+0800: 9566492.825: [CMS-
concurrent-abortable-preclean: 2.679/5.092 secs] [Times: user=3.46 sys=0.32,
real=5.09 secs] 2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark
) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829:
[GC (CMS Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew:
422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K),
0.0268209 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2019-09-16T11:01:
31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:
31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09- 16T11:
01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:
32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:
33.0 11+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark:
1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=
0.00, real=1.93 s ecs] 2019-09-16T11:01:33.059+0800: 9566494.768: [CMS-
concurrent-sweep-start] 2019-09-16T11:01:34.235+0800: 9566495.945: [CMS-
concurrent-sweep: 1.056/1.176 secs] [Times: user=1.26 sys=0.00, real=1.18 secs]
2019-09-16T11:01:34.236+0800: 9566495.945: [CMS-concurrent-reset-start] 2019-09-
16T11:01:34.259+0800: 9566495.968: [CMS-concurrent-reset: 0.023/0.023 secs] [
Times: user=0.02 sys=0.00, real=0.02 secs] 2019-09-16T11:03:37.270+0800:
9566618.979: [GC (Allocation Failure) 2019-09-16T11:03:37.270+0800: 9566618.979:
[ParNew: 1681090K->4156K(1887488K), 0.0252135 secs] 25335 50K->856782K(3984640K)
, 0.0259454 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
CMS收集器,是基于“标记-清除”算法实现的,运作过程分为4个步骤,包括:
初始标记(CMS initial Mark)
并发标记(CMS concurrent Mark)
重新标记(CMS Remark)
并发清除(CMS concurrent sweep)
其中,初始标记、重新标记这两个步骤是“Stop The World”。
报警内容提示:FullGC 耗时时长 1953ms,根据以上日志,你可以计算出1953的数值么?可以先思考一下。
这里想先引申一个问题:jstat -gcutil
$jstat -gcutil 1837 1000 Warning: Unresolved Symbol: sun.gc.generation.2.space.
0.capacity substituted NaN Warning: Unresolved Symbol: sun.gc.generation.2.space
.0.used substituted NaN Warning: Unresolved Symbol: sun.gc.generation.2.space.0.
capacity substituted NaN S0 S1 E O P YGC YGCT FGC FGCT GCT1.24 0.00 7.22 4.64 �
103180 3024.968 9 6.840 3031.809 1.24 0.00 16.24 4.64 � 103180 3024.968 9 6.840
3031.809 1.24 0.00 24.95 4.64 � 103180 3024.968 9 6.840 3031.809 1.24 0.00 25.43
4.64 � 103180 3024.968 9 6.840 3031.809 1.24 0.00 25.55 4.64 � 103180 3024.968 9
6.840 3031.809 1.24 0.00 25.97 4.64 � 103180 3024.968 9 6.840 3031.809 1.24 0.00
26.37 4.64 � 103180 3024.968 9 6.840 3031.809
根据上图,我们可以看到FGC次数是9次,而我看应用启动之后的FullGC次数只有3次,执行jmap
-dump:live,format=b,file=heap-dump.bin <pid> dump文件引发的,与三次CMS。
最后查资料发现一个比较合理的说法,这边的FGC统计的是JVM STW的次数,而一次CMS会在Initial Mark 与Remark
有两次STW。所以最终的FGC = CMS*2 + FullGC-dump。
回归到上面的问题:CMS-FGC时长:GC日志中我们可以看到:
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-
initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times:
user=0.02 sys=0.0 0, real=0.02 secs]
其中
[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]
意思就是,CMS Initial Mark 初始标记“Stop The World” 0.0139600 secs
初始化标记我们已经获得时间了,那么CMS remark呢。
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy
: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS
Final Remark) 2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K
(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times
: user=0.07 sys=0.00, real=0.02 secs] 2019-09-16T11:01:31.146+0800: 9566492.856:
[Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [
weak refs processing, 1.6619564 secs]2019-09- 16T11:01:32.819+0800: 9566494.528:
[class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [
scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.0 11+0800: 9566494.720: [
scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(
3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 s ecs]
这边我们看到[GC (CMS Final Remark)...,但是这一行却没有标记耗时,这是为什么呢?
这边就涉及到CMS重新标记的设计了,简单说:
* 重新标记之前因为会需要扫描年轻代,为什么会需要扫描年轻代?
* 因为年老代对象,会存在被年轻代GC Root可达的情况。(后期待补充)
* 扫描年轻代的话,会因为年轻代对象比较多,而耗时较高,所以在扫描之前,最好是可以进行一次年轻代的回收。实际上,在这以前还会做一次
CMS-concurrent-abortable-preclean。什么是CMS-concurrent-abortable-preclean?(后期待补充)
* 所以,[GC (CMS Final Remark)...仅仅只是代表了,我即将要做重新标记了,而真正重新标记是发生在[Rescan (parallel)
这一行中了。 [1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]
那么这里我们可以获取到,CMS Remark 重新标记“Stop The World” 1.9392917 secs。
总计:CMS Initial Mark(0.0139600 secs) + CMS Remark(1.9392917 secs)=1953.2517ms