A clock never keeps perfect time

Leap second

A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time, or UT1. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation. Since this system of correction was implemented in 1972, 27 leap seconds have been inserted, the most recent on December 31, 2016 at 23:59:60 UTC. (from Wikipedia)

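During the most recent leap second (December 31, 2016 at 23:59:60 UTC), I took the opportunity to study how several other companies got their many servers and services through it smoothly.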

What surprised me was how much care JD Finance (京东金融) put into its plan this time.

Cloudflare caught out by much expected leap second

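At midnight UTC on New Year's Day, RRDNS, one of Cloudflare's in-house services, was hit by the leap second and the program crashed. The failure affected a small portion of the CNAME records served from Cloudflare's 102 data centers: 0.2% of all DNS queries were affected, and 1% of HTTP requests failed. Let's look at the details.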

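RRDNS is written in Go and uses time.Now() from the standard library. Unfortunately, the time this function returns is not guaranteed to be monotonic; Go does not currently support monotonic time. RRDNS contains the following code: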


// Update upstream sRTT on UDP queries, penalize it if it fails
if !start.IsZero() {
    rtt := time.Now().Sub(start)
    if success && rcode != dns.RcodeServerFailure {
        s.updateRTT(rtt)
    } else {
        // The penalty should be a multiple of actual timeout
        // as we don't know when the good message was supposed to arrive,
        // but it should not put server to backoff instantly
        s.updateRTT(TimeoutPenalty * s.timeout)
    }
}

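In the code above, rtt is negative whenever time.Now() returns a time earlier than start. As long as the clock only moves forward the code is fine, but if the clock jumps backwards, as it can during a leap second, rtt goes negative. Interestingly, this rtt value eventually reaches Go's rand.Int63n(), which panics when given a negative argument, and that panic is what caused the leap second outage.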

The fix is simple: check whether rtt is negative. Cloudflare also deserves credit for how quickly it responded.

2017-01-01 00:00 UTC Impact starts 
2017-01-01 00:10 UTC Escalated to engineers 
2017-01-01 00:34 UTC Issue confirmed 
2017-01-01 00:55 UTC Mitigation deployed to one canary node and confirmed 
2017-01-01 01:03 UTC Mitigation deployed to canary data center and confirmed 
2017-01-01 01:23 UTC Fix deployed in most impacted data center 
2017-01-01 01:45 UTC Fix being deployed to major data centers 
2017-01-01 01:48 UTC Fix being deployed everywhere 
2017-01-01 02:50 UTC Fix rolled out to most of the affected data centers 
2017-01-01 06:45 UTC Impact ends

Monotonic Elapsed Time Measurements

“The wall clock is for telling time. The monotonic clock is for measuring time.”

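The current time API in the Go standard library was designed by Rob and rsc in 2011. Go defines the type time.Time: time.Now() returns the current time, and t.Sub(u) computes the difference between two times. These values are wall clock times, yet Go programs use them widely to measure elapsed time. Underneath, these functions read the system wall clock rather than the monotonic clock, so if the system clock is reset or jumps, the measurement is wrong.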

Go was originally designed for Google's internal production environment, where the wall clock never resets.

Design

  • overload time.Time
  • provide a separate API for accessing the monotonic clock

Those are the two possible approaches. Most languages chose the second, but to avoid breaking existing code (see https://golang.org/doc/go1compat), the dev team intends to use the first.

The current time.Time:

type Time struct {
    sec  int64     // seconds since Jan 1, year 1 00:00:00 UTC
    nsec int32     // nanoseconds, in [0, 999999999]
    loc  *Location // location, for minute, hour, month, day, year
}

The revised time.Time:

type Time struct {
    wall uint64    // wall time: 1-bit flag, 33-bit sec since 1950, 30-bit nsec
    ext  int64     // extended time information
    loc  *Location // location
}
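To illustrate how a single uint64 can hold the three fields named in the wall comment, here is a rough sketch of the bit packing. The names packWall and unpackWall and the exact bit positions (flag in the top bit, nanoseconds in the low 30 bits) are assumptions made for illustration, not the standard library's code.

```go
package main

import "fmt"

const (
	nsecBits     = 30              // low 30 bits: nanoseconds
	secBits      = 33              // next 33 bits: seconds since the epoch
	nsecMask     = 1<<nsecBits - 1 // mask for the nanosecond field
	secMask      = 1<<secBits - 1  // mask for the seconds field
	hasMonotonic = 1 << 63         // top bit: monotonic reading present
)

// packWall (hypothetical) packs flag, seconds and nanoseconds into one word.
func packWall(mono bool, sec uint64, nsec uint32) uint64 {
	w := (sec&secMask)<<nsecBits | uint64(nsec)&nsecMask
	if mono {
		w |= hasMonotonic
	}
	return w
}

// unpackWall (hypothetical) reverses packWall.
func unpackWall(w uint64) (mono bool, sec uint64, nsec uint32) {
	return w&hasMonotonic != 0, (w >> nsecBits) & secMask, uint32(w & nsecMask)
}

func main() {
	w := packWall(true, 123456789, 999999999)
	mono, sec, nsec := unpackWall(w)
	fmt.Println(mono, sec, nsec)
}
```

The 30-bit field is wide enough because the largest nanosecond value, 999999999, is below 2^30.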

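A survey of the 100 most-starred Go projects on GitHub found that roughly 70% of time calls currently have no need for a monotonic clock. The remaining 30% do.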

Basic counts:

4910 time.Now()
1511 time.Now().Add
  45 time.Now().AddDate
  69 time.Now().After
  77 time.Now().Before
   4 time.Now().Date
   5 time.Now().Day
   1 time.Now().Equal
 130 time.Now().Format
  23 time.Now().In
   8 time.Now().Local
   4 time.Now().Location
   1 time.Now().MarshalBinary
   2 time.Now().MarshalText
   2 time.Now().Minute
  68 time.Now().Nanosecond
  14 time.Now().Round
  22 time.Now().Second
  37 time.Now().String
 370 time.Now().Sub
  28 time.Now().Truncate
 570 time.Now().UTC
 582 time.Now().Unix
8067 time.Now().UnixNano
  17 time.Now().Year
   2 time.Now().Zone

Unaffected:

  45 time.Now().AddDate
   4 time.Now().Date
   5 time.Now().Day
 130 time.Now().Format
  23 time.Now().In
   8 time.Now().Local
   4 time.Now().Location
   1 time.Now().MarshalBinary
   2 time.Now().MarshalText
   2 time.Now().Minute
  68 time.Now().Nanosecond
  14 time.Now().Round
  22 time.Now().Second
  37 time.Now().String
  28 time.Now().Truncate
 570 time.Now().UTC
 582 time.Now().Unix
8067 time.Now().UnixNano
  17 time.Now().Year
   2 time.Now().Zone
9631 TOTAL

Affected:

4910 time.Now()
1511 time.Now().Add
  69 time.Now().After
  77 time.Now().Before
   1 time.Now().Equal
 370 time.Now().Sub
6938 TOTAL

The future design might look like this:

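  • time.Now() returns a wall+monotonic time
  • t.Add(d) keeps the kind of t: if t is a wall+monotonic time, so is the result; if t is wall-only, so is the result
  • time.Date, time.Unix, t.AddDate, t.In, t.Local, t.Round, t.Truncate, and t.UTC all return a wall-only time
  • t.Sub(u) returns a duration: if t and u are both wall+monotonic times, it is computed from the monotonic readings; otherwise it is computed from the wall readings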
  • t.After(u), t.Before(u), t.Equal(u) compare monotonics if available (just like t.Sub(u)), otherwise walls.
  • all the other functions that operate on time.Times use the wall time only. These include: t.Day, t.Format, t.Month, t.Unix, t.UnixNano, t.Year, and so on

Suppose this code runs just before a leap second:

t1 := time.Now()
// ... 10 ms of work
t2 := time.Now()
// ... 10 ms of work
t3 := time.Now()
// ... 10 ms of work
const f = "15:04:05.000"
fmt.Println(t1.Format(f), t2.Sub(t1), t2.Format(f), t3.Sub(t2), t3.Format(f))

The new output looks like this:

23:59:59.985 10ms 23:59:59.995 10ms 23:59:59.005

That is far more correct than the following output:

23:59:59.985 10ms 23:59:59.995 -990ms 23:59:59.005

The fact is that 10ms elapsed between t2 and t3, not -990ms.