GCP Cloud NAT + Golang’s HTTP client = 🤯

Sebastian Kunze
4 min readOct 14, 2020

--

» I sense a soul in search of answers… «
Adria (Diablo I)

Skip to TL;DR

It simply does not become boring in the tech area. Yesterday we had a funny problem with an actually easy task:

Execute 10.006.525 PUT requests with different parameters against an API

Easy right? I thought so too. But the sequential execution of 10 million requests would certainly take some time, and I like to build things and not watch them run. Let’s do it in Golang and use some goroutines for that. Said and done, and a few moments later I had my program running on my local machine. go build and scp the whole thing to a worker virtual machine running in the Google Cloud and execute:

INFO[2020-10-13T15:32:51Z] Request #1 finished
INFO[2020-10-13T15:32:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T15:32:51Z] Request #62 finished
INFO[2020-10-13T15:32:51Z] Request #63 finished
INFO[2020-10-13T15:32:52Z] Request #64 finished

hmm why did it stop after 64 request? I checked my code, checked it again but i could not find anything unusual. If it is not the developer’s fault, it must be the bloody infrastructure. As always …

So i reached out to David Gleich, one of my company’s Google Cloud experts, and asked him for advise. The first thing we did was to build a simplified dummy without any bells and whistles to rule out a issue with the goroutines:

package main

import (
"log"
"net/http"
)

func main() {
cnt := 0
client := &http.Client{}

for {
cnt++
resp, err := client.Get("https://example.com/")
if err != nil {
log.Printf("Error on request #%d: %s", cnt, err)
continue
}
resp.Body.Close()
log.Printf("Request #%d finished", cnt)
}
}

But we ran into the same problem:

INFO[2020-10-13T15:48:51Z] Request #1 finished
INFO[2020-10-13T15:48:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T15:48:51Z] Request #62 finished
INFO[2020-10-13T15:48:51Z] Request #63 finished
INFO[2020-10-13T15:48:52Z] Request #64 finished

Do you use Cloud NAT with this machine? David asked. Yes, because this virtual machine does not need to be reachable from the outside and therefore does not have an external IP address, but needs Cloud NAT to reach the Internet.

OK lets check the Cloud NAT setup:

64, that sounds familiar. The tooltip describes Minimum ports per VM instances as:

By default, a VM instance has 64 ports, which means it can open 64 connections to the same destination IP and port. A VM instance can open up to 64,000 connections across all protocols.

Aha big brain time. Lets change this Cloud NAT setting to 128.

INFO[2020-10-13T16:05:51Z] Request #1 finished
INFO[2020-10-13T16:05:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T16:05:58Z] Request #126 finished
INFO[2020-10-13T16:05:58Z] Request #127 finished
INFO[2020-10-13T16:05:58Z] Request #128 finished

So it seems that there is some kind of correlation 😲 Bloody infrastructure, i told you … But why does the program not close or reuse the TCP connection? We already had the usual resp.Body.Close(). After some try and error we found this Stack Overflow post: Why is Go HTTPS Client not reusing connections? and there Matt Self answered:

The default client does reuse connections.

Be sure to close the response.

Callers should close resp.Body when done reading from it. If resp.Body is not closed, the Client’s underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent “keep-alive” request.

Additionally, I’ve found that I also needed to read until the response was complete before calling Close().

e.g.

res, _ := client.Do(req)
io.Copy(ioutil.Discard, res.Body)
res.Body.Close()

Let’s give it a try:

package main

import (
"io"
"io/ioutil"
"log"
"net/http"
)

func main() {
cnt := 0
client := &http.Client{}

for {
cnt++
resp, err := client.Get("https://example.com/")
if err != nil {
log.Printf("Error on request #%d: %s", cnt, err)
continue
}
io.Copy(ioutil.Discard, resp.Body)
resp.Body.Close()
log.Printf("Request #%d finished", cnt)
}
}

go build, scp and fingers crossed:

INFO[2020-10-13T16:31:51Z] Request #1 finished
INFO[2020-10-13T16:31:51Z] Request #2 finished
-- snip --
INFO[2020-10-13T16:31:51Z] Request #126 finished
INFO[2020-10-13T16:31:51Z] Request #127 finished
INFO[2020-10-13T16:31:51Z] Request #128 finished
INFO[2020-10-13T16:31:51Z] Request #129 finished
INFO[2020-10-13T16:31:51Z] Request #130 finished
INFO[2020-10-13T16:31:51Z] Request #131 finished
-- and they lived happily ever after --

Very nice.

We could have stopped right there, but that is a little unsatisfactory. I assumed that the almightyresp.Body.Close() takes care of everything and looking at Golang’s code it looks like it already does. So the new question is:

Why did it not work in our setup?

Right now David Gleich and I are digging a little deeper and hopefully find something on our way through the debugger. I will update this article as soon as there is something new. If you have any idea please drop a comment below.

TL;DR

You need to fully read the body of the response before calling Close().

resp, _ := client.Do(req)
io.Copy(ioutil.Discard, resp.Body)
resp.Body.Close()

Source: Why is Go HTTPS Client not reusing connections? by Matt Self.

Disclaimer

  1. Part of this material might be labeled beta or even alpha by Google. These products or features are in a pre-release state and might change or have limited support. For more information, see the product launch stages. I will do my best to keep this guide up to date when something changes.
  2. It’s the year 2020 and actually I should not have to mention it. Please do not blindly copy & paste stuff from the Internet into your shell.

--

--