Go checksum database

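I had gone to bed with my laptop, ready to sleep, when I came across an article online, "Let's Encrypt launches its own Certificate Transparency log", so I decided to settle down and write this post.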

Go module

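Go introduced Go modules in version 1.11 to address the package dependency problem that people have complained about for years, and I was lucky enough to take part in the prototype development of this feature. Because some pieces are still unfinished, the feature is not enabled by default in 1.11 or 1.12, nor will it be in 1.13, which is due out in August. Each release moves the work forward, though: from off in 1.11 to auto1 in 1.12, from auto1 in 1.12 to auto2 in 1.13, and finally from auto2 in 1.13 to on in 1.14. What I can tell you is that the feature is basically production-ready. Set the environment variable GO111MODULE=on to enable it; for usage details, see the official blog.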

Security

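It has to be said that the Go team takes security more seriously than ever. With the introduction of Go modules, the security of package dependencies caught the Go team's attention, and after heated discussion they decided to launch a global log for dependency packages, modeled on Certificate Transparency (CT) logs. CT logs were first used for certificates: some CAs issued certificates carelessly, or security incidents led to many problematic but trusted certificates being issued, which caused serious security problems. For reasons I would rather not go into, I won't give examples. CT makes every issued certificate public and transparent, so it is easy to spot someone maliciously issuing trusted certificates to steal users' encrypted data. And of course, Chrome has supported CT for a long time.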

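Coming back to Go: to make sure that developers' dependencies are not maliciously hijacked or tampered with, the Go team launched the Go module checksum database. The server address is sum.golang.org. When you change your dependencies locally (updating or adding one), Go automatically verifies the data against this server, ensuring that the code you downloaded is the same as what everyone else in the world downloaded. If something is wrong, you get a big, prominent security warning. All of this is built into Go, so developers don't need to do anything extra.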

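Below is the official documentation, quoted in the original English: translating it is tiring, and the original wording is more precise anyway.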

The go command tries to authenticate every downloaded module,
checking that the bits downloaded for a specific module version today
match bits downloaded yesterday. This ensures repeatable builds
and detects introduction of unexpected changes, malicious or not.

In each module's root, alongside go.mod, the go command maintains
a file named go.sum containing the cryptographic checksums of the
module's dependencies.

The form of each line in go.sum is three fields:

	<module> <version>[/go.mod] <hash>

Each known module version results in two lines in the go.sum file.
The first line gives the hash of the module version's file tree.
The second line appends "/go.mod" to the version and gives the hash
of only the module version's (possibly synthesized) go.mod file.
The go.mod-only hash allows downloading and authenticating a
module version's go.mod file, which is needed to compute the
dependency graph, without also downloading all the module's source code.
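
To make the three-field format concrete, here is a minimal sketch of a go.sum line parser. The `sumEntry` type, the `parseSumLine` function, the module path, and the hash strings are all hypothetical names and placeholders chosen for illustration; this is not how the go command itself is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// sumEntry is a hypothetical type for one parsed go.sum line.
type sumEntry struct {
	Module  string // module path
	Version string // version with any "/go.mod" suffix removed
	GoMod   bool   // true if the hash covers only the go.mod file
	Hash    string // hash including its algorithm prefix, e.g. "h1:..."
}

// parseSumLine splits one go.sum line into its three fields:
// <module> <version>[/go.mod] <hash>
func parseSumLine(line string) (sumEntry, error) {
	f := strings.Fields(line)
	if len(f) != 3 {
		return sumEntry{}, fmt.Errorf("malformed go.sum line: %q", line)
	}
	e := sumEntry{Module: f[0], Version: f[1], Hash: f[2]}
	if strings.HasSuffix(f[1], "/go.mod") {
		e.Version = strings.TrimSuffix(f[1], "/go.mod")
		e.GoMod = true
	}
	return e, nil
}

func main() {
	// Placeholder lines: the hashes below are not real checksums.
	lines := []string{
		"example.com/mod v1.2.3 h1:PLACEHOLDERTREE=",
		"example.com/mod v1.2.3/go.mod h1:PLACEHOLDERGOMOD=",
	}
	for _, l := range lines {
		e, err := parseSumLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", e)
	}
}
```

The two placeholder lines correspond to the two entries that each module version produces: one for the whole file tree and one for go.mod alone.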

The hash begins with an algorithm prefix of the form "h<N>:".
The only defined algorithm prefix is "h1:", which uses SHA-256.
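
As a rough illustration of what "h1:" means, the sketch below follows my understanding of the scheme used by the golang.org/x/mod/sumdb/dirhash package: hash each file with SHA-256, write one summary line per file in sorted name order, then hash that summary and base64-encode the result. The `hash1` function and the file names are hypothetical; treat this as a sketch under that assumption rather than the reference implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 computes an "h1:"-style hash over an in-memory set of files.
// files maps a file name to its contents.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the result must not depend on map iteration order

	h := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One summary line per file: hex digest, two spaces, file name.
		fmt.Fprintf(h, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical module contents, for illustration only.
	files := map[string][]byte{
		"example.com/mod@v1.0.0/go.mod": []byte("module example.com/mod\n"),
		"example.com/mod@v1.0.0/mod.go": []byte("package mod\n"),
	}
	fmt.Println(hash1(files))
}
```

Because the summary covers both names and contents, renaming a file or changing a single byte yields a different hash.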

Module authentication failures

The go command maintains a cache of downloaded packages and computes
and records the cryptographic checksum of each package at download time.
In normal operation, the go command checks the main module's go.sum file
against these precomputed checksums instead of recomputing them on
each command invocation. The 'go mod verify' command checks that
the cached copies of module downloads still match both their recorded
checksums and the entries in go.sum.

In day-to-day development, the checksum of a given module version
should never change. Each time a dependency is used by a given main
module, the go command checks its local cached copy, freshly
downloaded or not, against the main module's go.sum. If the checksums
don't match, the go command reports the mismatch as a security error
and refuses to run the build. When this happens, proceed with caution:
code changing unexpectedly means today's build will not match
yesterday's, and the unexpected change may not be beneficial.

If the go command reports a mismatch in go.sum, the downloaded code
for the reported module version does not match the one used in a
previous build of the main module. It is important at that point
to find out what the right checksum should be, to decide whether
go.sum is wrong or the downloaded code is wrong. Usually go.sum is right:
you want to use the same code you used yesterday.

If a downloaded module is not yet included in go.sum and it is a publicly
available module, the go command consults the Go checksum database to fetch
the expected go.sum lines. If the downloaded code does not match those
lines, the go command reports the mismatch and exits. Note that the
database is not consulted for module versions already listed in go.sum.

If a go.sum mismatch is reported, it is always worth investigating why
the code downloaded today differs from what was downloaded yesterday.

The GOSUMDB environment variable identifies the name of checksum database
to use and optionally its public key and URL, as in:

	GOSUMDB="sum.golang.org"
	GOSUMDB="sum.golang.org+<publickey>"
	GOSUMDB="sum.golang.org+<publickey> https://sum.golang.org"

The go command knows the public key of sum.golang.org; use of any other
database requires giving the public key explicitly. The URL defaults to
"https://" followed by the database name.

GOSUMDB defaults to "sum.golang.org" when GOPROXY="https://proxy.golang.org"
and otherwise defaults to "off". NOTE: The GOSUMDB will later default to
"sum.golang.org" unconditionally.

If GOSUMDB is set to "off", or if "go get" is invoked with the -insecure flag,
the checksum database is never consulted, but at the cost of giving up the
security guarantee of verified repeatable downloads for all modules.
A better way to bypass the checksum database for specific modules is
to use the GONOSUMDB environment variable.
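
The GOSUMDB forms described above (a bare name, a name with a public key, an optional URL, and "off") can be sketched as a small parser. The `sumdbConfig` type, the `parseGOSUMDB` function, and the example.org values are hypothetical; the real go command also validates the key, which this simplified sketch does not attempt.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sumdbConfig is a hypothetical struct holding the parts of a GOSUMDB value.
type sumdbConfig struct {
	Off  bool   // true when GOSUMDB=off
	Name string // database name, e.g. "sum.golang.org"
	Key  string // the full "name+<publickey>" field, or "" if only a name was given
	URL  string // where to reach the database
}

// parseGOSUMDB splits a GOSUMDB value into name, key, and URL.
// It does not verify that the key is well-formed.
func parseGOSUMDB(value string) (sumdbConfig, error) {
	if value == "off" {
		return sumdbConfig{Off: true}, nil
	}
	fields := strings.Fields(value)
	if len(fields) == 0 {
		return sumdbConfig{}, errors.New("empty GOSUMDB")
	}
	if len(fields) > 2 {
		return sumdbConfig{}, errors.New("invalid GOSUMDB: too many fields")
	}
	cfg := sumdbConfig{Name: fields[0]}
	// The name is everything before the first "+"; the key may contain "+" itself.
	if i := strings.Index(fields[0], "+"); i >= 0 {
		cfg.Name, cfg.Key = fields[0][:i], fields[0]
	}
	// The URL defaults to "https://" followed by the database name.
	cfg.URL = "https://" + cfg.Name
	if len(fields) == 2 {
		cfg.URL = fields[1]
	}
	return cfg, nil
}

func main() {
	values := []string{
		"sum.golang.org",
		"sum.example.org+0123abcd+PLACEHOLDERKEY https://mirror.example.org/sumdb",
		"off",
	}
	for _, v := range values {
		cfg, err := parseGOSUMDB(v)
		fmt.Printf("%q -> %+v (err: %v)\n", v, cfg, err)
	}
}
```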

The GONOSUMDB environment variable is a comma-separated list of
patterns (in the syntax of Go's path.Match) of module path prefixes
that should not be compared against the checksum database.
For example,

	GONOSUMDB=*.corp.example.com,rsc.io/private

disables checksum database lookups for modules with path prefixes matching
either pattern, including "git.corp.example.com/xyzzy", "rsc.io/private",
and "rsc.io/private/quux".
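
The prefix matching just described can be sketched as follows. The approach mirrors my understanding of how golang.org/x/mod matches such patterns: a pattern containing n slashes is compared, using path.Match, against the first n+1 path elements of the module path. The function name `matchPrefixPatterns` is used here for illustration only.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether a leading path prefix of target
// matches any of the comma-separated glob patterns in globs.
func matchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		if glob == "" {
			continue
		}
		// A pattern with n slashes is matched against the first
		// n+1 slash-separated elements of target.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			// target has fewer path elements than the pattern.
			continue
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "*.corp.example.com,rsc.io/private"
	for _, m := range []string{
		"git.corp.example.com/xyzzy",
		"rsc.io/private",
		"rsc.io/private/quux",
		"rsc.io/public",
	} {
		fmt.Printf("%-28s skip checksum db: %v\n", m, matchPrefixPatterns(patterns, m))
	}
}
```

With the example patterns, the first three module paths match and "rsc.io/public" does not.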

The secret that can't be told

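For reasons everyone knows, the golang servers are hosted by Google, so we cannot reach this service reliably. How, then, can we benefit from a service that is built into 1.13? In my spare time I wrote a service, gosum.io, which is basically a community implementation. You are welcome to try it out when the time comes; configure it by setting an environment variable: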

Bash:

export GOSUMDB=gosum.io+ce6e7565+AY5qEHUk/qmHc5btzW45JVoENfazw8LielDsaI+lEbq6

PowerShell:

$env:GOSUMDB = "gosum.io+ce6e7565+AY5qEHUk/qmHc5btzW45JVoENfazw8LielDsaI+lEbq6"

Disable:

export GOSUMDB=off

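One thing to note: Google uses the fancy Spanner as its backend data store, whereas my implementation uses CockroachDB, because Spanner really is too expensive: two days of use cost me more than HK$400. For security reasons, Google has always wanted to run the only such service in the world, but the community does not agree that the service should stay in Google's hands forever; after all, some of Google's competitors use Go too. For those of us on the far shore, the main problem is simply that we can't reach it. I should also say that although the gosum.io service I run follows Google's user privacy policy, for security reasons (the server side could be hijacked) you should still use the official Google service if you can reach it (though of course that could be hijacked too).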

Goproxy

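The new version of Go will also bring a series of improvements to goproxy.

  • The GOPROXY variable will accept a list: if the first proxy is unreachable, the second one is used, and so on.
  • GONOPROXY is added so that users (enterprise users in particular) can flexibly configure repositories that should not go through the proxy, which protects user privacy.
  • goproxy supports proxying the Go checksum database, which goes some way toward solving the problem of being blocked.

Although the new version of Go has not been released yet, goproxy.io has already added support for proxying sum.golang.org.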

More

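The Go team can actually see the problems we run into, and they know our pain points, but some problems cannot be solved by a few people, a team, or even a company. Still, they haven't given up: rsc recently open-sourced versions of sumdb and goproxy, one after the other. I hope organizations and individuals in the community will step up and help each other.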
