The Go Programming Language. Notes.

2024-04-06 01:45

Tutorial
    Hello, World
    Command-Line Arguments
    Finding Duplicate Lines
    A Web Server
    Loose Ends
Program Structure
    Names
    Declarations
    Variables
    Assignments
    Type Declarations
    Packages and Files
    Scope
Basic Data Types
    Integers
    Floating-Point Numbers
    Booleans
    Strings
        Strings and Byte Slices
Composite Types
    Arrays
    Slices
    Maps
    Structs
    JSON
Functions
    Function Declarations
    Recursion
    Multiple Return Values
    Errors
        Error-Handling Strategies
    Function Values
    Anonymous Functions
    Variadic Functions
    Deferred Function Calls
    Panic
Methods
    Method Declarations
    Methods with a Pointer Receiver
        Nil Is a Valid Receiver Value
    Composing Types by Struct Embedding
    Method Values and Expressions
    Example: Bit Vector Type
    Encapsulation
Interfaces
    Interfaces as Contracts
    Interface Types
    Interface Satisfaction
    Parsing Flags with flag.Value
    Interface Values
        Caveat: An Interface Containing a Nil Pointer Is Non-Nil
    Sorting with sort.Interface
    The error Interface
    Type Assertions
    Type Switches
    A Few Words of Advice
Goroutines and Channels
    Goroutines
    Example: Concurrent Clock Server
    Example: Concurrent Echo Server
    Channels
        Unbuffered Channels
        Pipelines
        Unidirectional Channel Types
        Buffered Channels
    Multiplexing with select
    Cancellation
Concurrency with Shared Variables
    Race Conditions
    Mutual Exclusion: sync.Mutex
    Read/Write Mutexes: sync.RWMutex
    Memory Synchronization
    Lazy Initialization: sync.Once
    The Race Detector
    Example: Concurrent Non-Blocking Cache
    Goroutines and Threads
        Growable Stacks
        Goroutine Scheduling
        GOMAXPROCS
        Goroutines Have No Identity
Packages and the Go Tool
    Introduction
    Import Paths
    The Package Declaration
    Import Declarations
    Blank Imports
    Packages and Naming
    The Go Tool
        Workspace Organization
        Downloading Packages
        Documenting Packages
        Internal Packages
        Querying Packages
Testing
    The go test Tool
    Test Functions
        Randomized Testing
        White-Box Testing
        Writing Effective Tests
        Avoiding Brittle Tests
    Coverage
    Benchmark Functions
    Profiling
    Example Functions
Reflection
    Why Reflection?
    reflect.Type and reflect.Value
    Setting Variables with reflect.Value
    A Word of Caution
Low-Level Programming
    unsafe.Sizeof, Alignof, and Offsetof
    unsafe.Pointer
    Deep Equivalence
    Calling C Code with cgo
    Another Word of Caution

Tutorial

When you're learning a new language, there's a natural tendency to write code as you would have written it in a language you already know. Be aware of this bias as you learn Go and try to avoid it.

Hello, World

Package main is special. It defines a standalone executable program, not a library. Within package main the function main is also special -- it's where execution of the program begins. Whatever main does is what the program does.

Go does not require semicolons at the ends of statements or declarations, except where two or more appear on the same line. In effect, newlines following certain tokens are converted into semicolons, so where newlines are placed matters to proper parsing of Go code.

Command-Line Arguments

A variable can be initialized as part of its declaration. If it is not explicitly initialized, it is implicitly initialized to the zero value for its type, which is 0 for numeric types and the empty string "" for strings.

func main() {
    s, sep := "", ""
    for _, arg := range os.Args[1:] {
        s += sep + arg
        sep = " "
    }
    fmt.Println(s)
}

Each time around the loop, the string s gets completely new contents. The += statement makes a new string by concatenating the old string, a space character, and the next argument, then assigns the new string to s. The old contents of s are no longer in use, so they will be garbage-collected in due course.

If the amount of data involved is large, this could be costly. A simpler and more efficient solution would be to use the Join function from the strings package:

func main() {
    fmt.Println(strings.Join(os.Args[1:], " "))
}

Finding Duplicate Lines

The order of map iteration is not specified, but in practice it is random, varying from one run to another. This design is intentional, since it prevents programs from relying on any particular ordering where none is guaranteed.
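
When a stable order is needed, the usual approach is to collect the keys, sort them, and then index the map. The following is a minimal sketch of that idea; the helper name sortedKeys and the sample data are illustrative assumptions, not part of the original notes.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of counts in ascending order.
// (Hypothetical helper, introduced only for this example.)
func sortedKeys(counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	counts := map[string]int{"pear": 2, "apple": 3, "fig": 1}
	// Ranging over the sorted keys gives the same order on every run.
	for _, k := range sortedKeys(counts) {
		fmt.Printf("%s\t%d\n", k, counts[k])
	}
}
```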

A map is a reference to the data structure created by make. When a map is passed to a function, the function receives a copy of the reference, so any changes the called function makes to the underlying data structure will be visible through the caller's map reference too.
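
A small sketch of this behavior, assuming a hypothetical helper countWords: the function updates the map it receives, and the caller sees the updates without any return value.

```go
package main

import "fmt"

// countWords increments counts[w] for every word in words.
// It receives a copy of the map reference, so the caller's map is updated.
// (Hypothetical helper, introduced only for this example.)
func countWords(counts map[string]int, words []string) {
	for _, w := range words {
		counts[w]++
	}
}

func main() {
	counts := make(map[string]int)
	countWords(counts, []string{"a", "b", "a"})
	fmt.Println(counts["a"], counts["b"]) // 2 1
}
```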

A Web Server

Go allows a simple statement such as a local variable declaration to precede the if condition, which is particularly useful for error handling in this example.

if err := r.ParseForm(); err != nil {
    log.Print(err)
}

We could have written it as

err := r.ParseForm()
if err != nil {
    log.Print(err)
}

but combining the statements is shorter and reduces the scope of the variable err, which is good practice.

Loose Ends

The switch statement is a multi-way branch. Cases are evaluated from top to bottom, so the first matching one is executed. The optional default case matches if none of the other cases does; it may be placed anywhere. Cases do not fall through from one to the next as in C-like languages.
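
The two functions below sketch these rules. The names describe and signum are illustrative; describe puts default first to show that its position does not matter, and signum uses a tagless switch, which behaves like switch true.

```go
package main

import "fmt"

// describe shows a switch on a value. The default case is listed first,
// yet it is chosen only when no other case matches.
func describe(n int) string {
	switch n {
	default:
		return "many"
	case 0:
		return "none"
	case 1, 2: // several values may share one case; there is no fall-through
		return "few"
	}
}

// signum uses a tagless switch: each case is a boolean expression.
func signum(x int) int {
	switch {
	case x > 0:
		return +1
	case x < 0:
		return -1
	default:
		return 0
	}
}

func main() {
	fmt.Println(describe(0), describe(2), describe(7)) // none few many
	fmt.Println(signum(-5), signum(0), signum(9))      // -1 0 1
}
```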

Program Structure

Names

If an entity is declared within a function, it is local to that function. If declared outside of a function, however, it is visible in all files of the package to which it belongs. The case of the first letter of a name determines its visibility across package boundaries. If the name begins with an upper-case letter, it is exported, which means that it is visible and accessible outside of its own package and may be referred to by other parts of the program, as with Printf in the fmt package. Package names themselves are always in lower case.

Declarations

A declaration names a program entity and specifies some or all of its properties. There are four major kinds of declarations: var, const, type, and func.

Variables

A var declaration has the general form

var name type = expression

Either the type or the = expression part may be omitted, but not both. If the type is omitted, it is determined by the initializer expression. If the expression is omitted, the initial value is the zero value for the type, which is 0 for numbers, false for booleans, "" for strings, and nil for interfaces and reference types (slice, pointer, map, channel, function). The zero value of an aggregate type like an array or a struct has the zero value of all of its elements or fields. The zero-value mechanism ensures that a variable always holds a well-defined value of its type; in Go there is no such thing as an uninitialized variable.
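
A quick way to see the zero values is to declare variables without initializers and print them. This is only a sketch; the point type and the zeroPoint helper are made up for the example.

```go
package main

import "fmt"

// point is a hypothetical struct used to show an aggregate zero value.
type point struct {
	X, Y int
	Name string
}

// zeroPoint returns a point declared without an initializer.
func zeroPoint() point {
	var p point
	return p
}

func main() {
	var i int
	var f float64
	var b bool
	var s string
	var p *int
	var sl []int
	var m map[string]int
	var arr [3]int

	fmt.Println(i, f, b, s == "", p == nil, sl == nil, m == nil) // 0 0 false true true true true
	fmt.Printf("%+v %v\n", zeroPoint(), arr)                     // {X:0 Y:0 Name:} [0 0 0]
}
```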

A var declaration tends to be reserved for local variables that need an explicit type that differs from that of the initializer expression, or for when the variable will be assigned a value later and its initial value is unimportant.

Keep in mind that := is a declaration, whereas = is assignment. A multi-variable declaration 「i, j := 0, 1」 should not be confused with a tuple assignment, in which each variable on the left-hand side is assigned the corresponding value from the right-hand side:

i, j = j, i // swap values of i and j

It is perfectly safe for a function to return the address of a local variable. For instance, in the code below, the local variable v created by this particular call to f will remain in existence even after the call has returned, and the pointer p will still refer to it:

var p = f()

func f() *int {
    v := 1
    return &v
}

Each call of f returns a distinct value: 「fmt.Println(f() == f()) // "false"」

Since new is a predeclared function, not a keyword, it's possible to redefine the name for something else within a function, for example: 「func delta(old, new int) int { return new - old }」

How does the garbage collector know that a variable's storage can be reclaimed? The full story is much more detailed than we need here, but the basic idea is that every package-level variable, and every local variable of each currently active function, can potentially be the start or root of a path to the variable in question, following pointers and other kinds of references that ultimately lead to the variable. If no such path exists, the variable has become unreachable, so it can no longer affect the rest of the computation.

Because the lifetime of a variable is determined only by whether or not it is reachable, a local variable may outlive a single iteration of the enclosing loop. It may continue to exist even after its enclosing function has returned.

A compiler may choose to allocate local variables on the heap or on the stack but, perhaps surprisingly, this choice is not determined by whether var or new was used to declare the variable.

var global *int

func f() {
    var x int
    x = 1
    global = &x
}

func g() {
    y := new(int)
    *y = 1
}

Here, x must be heap-allocated because it is still reachable from the variable global after f has returned, despite being declared as a local variable; we say x escapes from f. Conversely, when g returns, the variable *y becomes unreachable and can be recycled. Since *y does not escape from g, it's safe for the compiler to allocate *y on the stack, even though it was allocated with new. In any case, the notion of escaping is not something that you need to worry about in order to write correct code, though it's good to keep in mind during performance optimization, since each variable that escapes requires an extra memory allocation.

Garbage collection is a tremendous help in writing correct programs, but it does not relieve you of the burden of thinking about memory. You don't need to explicitly allocate and free memory, but to write efficient programs you still need to be aware of the lifetime of variables. For example, keeping unnecessary pointers to short-lived objects within long-lived objects, especially global variables, will prevent the garbage collector from reclaiming the short-lived objects.

Assignments

Another form of assignment, known as tuple assignment, allows several variables to be assigned at once. All of the right-hand side expressions are evaluated before any of the variables are updated, making this form most useful when some of the variables appear on both sides of the assignment, as happens, for example, when swapping the values of two variables:

x, y = y, x
a[i], a[j] = a[j], a[i]

or when computing the greatest common divisor (GCD) of two integers:

func gcd(x, y int) int {
    for y != 0 {
        x, y = y, x%y
    }
    return x
}

or when computing the n-th Fibonacci number iteratively:

func fib(n int) int {
    x, y := 0, 1
    for i := 0; i < n; i++ {
        x, y = y, x+y
    }
    return x
}

Type Declarations

A type declaration defines a new named type that has the same underlying type as an existing type. The named type provides a way to separate different and perhaps incompatible uses of the underlying type so that they can't be mixed unintentionally.

type name underlying-type

Type declarations most often appear at package level, where the named type is visible throughout the package, and if the name is exported (it starts with an upper-case letter), it's accessible from other packages as well.

For every type T, there is a corresponding conversion operation T(x) that converts the value x to type T.

Named types also make it possible to define new behaviors for values of the type. These behaviors are expressed as a set of functions associated with the type, called the type's methods.

The declaration below, in which the Celsius parameter c appears before the function name, associates with the Celsius type a method named String that returns c's numeric value followed by ℃:

func (c Celsius) String() string { return fmt.Sprintf("%g℃", c) }

Many types declare a String method of this form because it controls how values of the type appear when printed as a string by the fmt package.
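
To tie the pieces of this section together, here is a minimal sketch with two named types sharing the underlying type float64. The Fahrenheit type and the CToF function are assumptions added for the example; only Celsius and its String method appear in the notes above.

```go
package main

import "fmt"

type Celsius float64
type Fahrenheit float64 // assumed second type, to show that named types don't mix

// CToF converts a Celsius temperature to Fahrenheit.
// The explicit conversion Fahrenheit(...) is required: the two named
// types are distinct even though both are float64 underneath.
func CToF(c Celsius) Fahrenheit { return Fahrenheit(c*9/5 + 32) }

// String controls how a Celsius value is printed by the fmt package.
func (c Celsius) String() string { return fmt.Sprintf("%g℃", float64(c)) }

func main() {
	c := Celsius(100)
	fmt.Println(c)        // 100℃ (Println calls String)
	fmt.Printf("%v\n", c) // 100℃
	fmt.Printf("%g\n", c) // 100 (%g does not call String)
	fmt.Println(CToF(c))  // 212
	// var f Fahrenheit = c // compile error: mismatched named types
}
```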

Packages and Files

Packages also let us hide information by controlling which names are visible outside the package, or exported. In Go, a simple rule governs which identifiers are exported and which are not: exported identifiers start with an upper-case letter.

Any file may contain any number of functions whose declaration is just

func init() { /* ... */ }

Such init functions can't be called or referenced, but otherwise they are normal functions. Within each file, init functions are automatically executed when the program starts, in the order in which they are declared.

One package is initialized at a time, in the order of imports in the program, dependencies first, so a package p importing q can be sure that q is fully initialized before p's initialization begins. Initialization proceeds from the bottom up; the main package is the last to be initialized. In this manner, all packages are fully initialized before the application's main function begins.
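
The order within a single file can be observed with a small sketch. The variables table and events are hypothetical names chosen for this example: package-level variables are set up first, then the init functions run in declaration order, and only then does main start.

```go
package main

import "fmt"

// Package-level variables are initialized before any init function runs.
var table [4]int
var events []string

// The first init fills in a lookup table of squares.
func init() {
	for i := range table {
		table[i] = i * i
	}
	events = append(events, "first init")
}

// A file may declare more than one init; they run in declaration order.
func init() {
	events = append(events, "second init")
}

func main() {
	events = append(events, "main")
	fmt.Println(table)  // [0 1 4 9]
	fmt.Println(events) // [first init second init main]
}
```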

Scope

Don't confuse scope with lifetime. The scope of a declaration is a region of the program text; it is a compile-time property. The lifetime of a variable is the range of time during execution when the variable can be referred to by other parts of the program; it is a run-time property.

When the compiler encounters a reference to a name, it looks for a declaration, starting with the innermost enclosing lexical block and working up to the universe block. If the compiler finds no declaration, it reports an "undeclared name" error. If a name is declared in both an outer block and an inner block, the inner declaration will be found first. In that case, the inner declaration is said to shadow or hide the outer one, making it inaccessible.
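
A short sketch of shadowing, using a hypothetical function shadow that records which x each reference resolves to:

```go
package main

import "fmt"

var x = "package" // declared in the package block

// shadow records the value of x seen at each point in the function.
func shadow() []string {
	var seen []string
	seen = append(seen, x) // no local x yet: refers to the package-level x

	x := "function" // from here on, this x shadows the package-level one
	seen = append(seen, x)
	{
		x := "block" // a new x in an inner block shadows the function's x
		seen = append(seen, x)
	}
	seen = append(seen, x) // the inner x is out of scope again
	return seen
}

func main() {
	fmt.Println(shadow()) // [package function block function]
	fmt.Println(x)        // package
}
```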

Basic Data Types

Go's types fall into four categories: basic types, aggregate types, reference types, and interface types.

Integers

The type rune is a synonym for int32 and conventionally indicates that a value is a Unicode code point. The two names may be used interchangeably. Similarly, the type byte is a synonym for uint8, and emphasizes that the value is a piece of raw data rather than a small numeric quantity.

There is an unsigned integer type uintptr, whose width is not specified but is sufficient to hold all the bits of a pointer value. The uintptr type is used only for low-level programming, such as at the boundary of a Go program with a C library or an operating system.

The &^ operator is bit clear (AND NOT): in the expression z = x &^ y, each bit of z is 0 if the corresponding bit of y is 1; otherwise it equals the corresponding bit of x.
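
The bitwise operators are easiest to follow on small bit patterns. In this sketch the helper clearBits is an illustrative name; the printed values use %08b to show all eight bits of a uint8.

```go
package main

import "fmt"

// clearBits returns x with every bit that is set in mask cleared.
// (Hypothetical helper wrapping the &^ operator.)
func clearBits(x, mask uint8) uint8 { return x &^ mask }

func main() {
	var x uint8 = 1<<1 | 1<<5 // 00100010
	var y uint8 = 1<<1 | 1<<2 // 00000110

	fmt.Printf("%08b\n", x&y)             // 00000010, bits set in both
	fmt.Printf("%08b\n", x|y)             // 00100110, bits set in either
	fmt.Printf("%08b\n", x^y)             // 00100100, bits set in exactly one
	fmt.Printf("%08b\n", clearBits(x, y)) // 00100000, bits of x not set in y
}
```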

Arithmetically, a left shift x<<n is equivalent to multiplication by 2^n and a right shift x>>n is equivalent to the floor of division by 2^n.

Left shifts fill the vacated bits with zeros, as do right shifts of unsigned numbers, but right shifts of signed numbers fill the vacated bits with copies of the sign bit. For this reason, it is important to use unsigned arithmetic when you're treating an integer as a bit pattern.
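
The difference shows up when the same bit pattern is shifted as a signed and as an unsigned value. The helper names shrSigned and shrUnsigned below are assumptions made for this sketch.

```go
package main

import "fmt"

// shrSigned shifts a signed value right; vacated bits copy the sign bit.
func shrSigned(x int8, n uint) int8 { return x >> n }

// shrUnsigned shifts an unsigned value right; vacated bits are zero.
func shrUnsigned(x uint8, n uint) uint8 { return x >> n }

func main() {
	// 0x80 and -128 share the bit pattern 10000000.
	fmt.Printf("%08b\n", shrUnsigned(0x80, 2)) // 00100000
	fmt.Println(shrSigned(-128, 2))            // -32, the sign bit is replicated

	// For negative numbers the shift rounds toward negative infinity,
	// while integer division truncates toward zero.
	var n int8 = -7
	fmt.Println(shrSigned(n, 1), n/2) // -4 -3
}
```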

Float to integer conversion discards any fractional part, truncating toward zero. You should avoid conversions in which the operand is out of range for the target type, because the behavior depends on the implementation.

Rune literals are written as a character within single quotes. The simplest example is an ASCII character like 'a', but it's possible to write any Unicode code point either directly or with numeric escapes. Runes are printed with %c, or with %q if quoting is desired:

ascii := 'a'
unicode := '国'
newline := '\n'
fmt.Printf("%d %[1]c %[1]q\n", ascii) // 97 a 'a'
fmt.Printf("%d %[1]c %[1]q\n", unicode) // 22269 国 '国'
fmt.Printf("%d %[1]q\n", newline) // 10 '\n'

Floating-Point Numbers

A float32 provides approximately six decimal digits of precision, whereas a float64 provides about 15 digits; float64 should be preferred for most purposes because float32 computations accumulate error rapidly unless one is quite careful, and the smallest positive integer that cannot be exactly represented as a float32 is not large:

var f float32 = 16777216 // 1 << 24
fmt.Println(f == f+1) // "true"!

Booleans

Boolean values can be combined with the &&(AND) and ||(OR) operators, which have short-circuit behavior: if the answer is already determined by the value of the left operand, the right operand is not evaluated, making it safe to write expressions like this: if s != "" && s[0] == 'x', where s[0] would panic if applied to an empty string.

Since && has higher precedence than ||, no parentheses are required for conditions of this form:

if 'a' <= c && c <= 'z' ||
    'A' <= c && c <= 'Z' ||
    '0' <= c && c <= '9' {
    // ...ASCII letter or digit...
}

Strings

The substring operation s[i:j] yields a new string consisting of the bytes of the original string starting at index i and continuing up to, but not including, the byte at index j. The result contains j-i bytes.

Either or both of the i and j operands may be omitted, in which case the default values of 0 (the start of the string) and len(s) (its end) are assumed, respectively.

The + operator makes a new string by concatenating two strings.

Strings may be compared with comparison operators like == and <; the comparison is done byte by byte, so the result is the natural lexicographic ordering.

A string is an immutable sequence of bytes: the byte sequence contained in a string value can never be changed, though of course we can assign a new value to a string variable. To append one string to another, for instance, we can write

s := "left foot"
t := s
s += ", right foot"

This does not modify the string that s originally held but causes s to hold the new string formed by the += statement; meanwhile, t still contains the old string.

fmt.Println(s) // "left foot, right foot"
fmt.Println(t) // "left foot"

Since strings are immutable, constructions that try to modify a string's data in place are not allowed: s[0] = 'L' // compile error: cannot assign to s[0]

Immutability means that it is safe for two copies of a string to share the same underlying memory, making it cheap to copy strings of any length. Similarly, a string s and a substring like s[7:] may safely share the same data, so the substring operation is also cheap. No new memory is allocated in either case.

Because Go source files are always encoded in UTF-8 and Go text strings are conventionally interpreted as UTF-8, we can include Unicode code points in string literals.

A raw string literal is written `...`, using backquotes instead of double quotes. Within a raw string literal, no escape sequences are processed; the contents are taken literally, including backslashes and newlines, so a raw string literal may spread over several lines in the program source. The only processing is that carriage returns are deleted so that the value of the string is the same on all platforms, including those that conventionally put carriage returns in text files. Raw string literals are a convenient way to write regular expressions, which tend to have lots of backslashes. They are also useful for HTML templates, JSON literals, command usage messages, and the like, which often extend over multiple lines.

const GoUsage = `Go is a tool for managing Go source code.

Usage:
    go command [arguments]
...`

Thanks to the nice properties of UTF-8, many string operations don't require decoding. We can test whether one string contains another as a prefix, or as a suffix, or as a substring:

func HasPrefix(s, prefix string) bool {
    return len(s) >= len(prefix) && s[:len(prefix)] == prefix
}

func HasSuffix(s, suffix string) bool {
    return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
}

func Contains(s, substr string) bool {
    for i := 0; i < len(s); i++ {
        if HasPrefix(s[i:], substr) {
            return true
        }
    }
    return false
}

using the same logic for UTF-8-encoded text as for raw bytes. This is not true for other encodings.

Strings and Byte Slices

Four standard packages are particularly important for manipulating strings: bytes, strings, strconv, and unicode. The strings package provides many functions for searching, replacing, comparing, trimming, splitting, and joining strings.

The bytes package has similar functions for manipulating slices of bytes, of type []byte, which share some properties with strings. Because strings are immutable, building up strings incrementally can involve a lot of allocation and copying. In such cases, it's more efficient to use the bytes.Buffer type.

The strconv package provides functions for converting boolean, integer, and floating-point values to and from their string representations, and functions for quoting and unquoting strings.

The unicode package provides functions like IsDigit, IsLetter, IsUpper, and IsLower for classifying runes. Each function takes a single rune argument and returns a boolean. Conversion functions like ToUpper and ToLower convert a rune into the given case if it is a letter. All these functions use the Unicode standard categories for letters, digits, and so on. The strings package has similar functions, also called ToUpper and ToLower, that return a new string with the specified transformation applied to each character of the original string.
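
A brief tour of these packages, as a sketch only: the helper countDigits is a made-up name, while every library call used (strconv.Atoi, strconv.Itoa, strconv.Quote, strings.ToUpper, unicode.IsDigit, unicode.ToUpper) is part of the standard library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// countDigits returns the number of runes in s that are Unicode digits.
// (Hypothetical helper, introduced only for this example.)
func countDigits(s string) int {
	n := 0
	for _, r := range s {
		if unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func main() {
	// strconv: conversions between values and their string forms.
	n, err := strconv.Atoi("123")
	fmt.Println(n, err)                 // 123 <nil>
	fmt.Println(strconv.Itoa(45) + "6") // 456
	fmt.Println(strconv.Quote("a\tb"))  // "a\tb"

	// unicode works on single runes, strings on whole strings.
	fmt.Println(string(unicode.ToUpper('g')), strings.ToUpper("gopher")) // G GOPHER
	fmt.Println(countDigits("a1b22"))                                    // 3
}
```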

The basename function below was inspired by the Unix shell utility of the same name. In our version, basename(s) removes any prefix of s that looks like a file system path with components separated by slashes, and it removes any suffix that looks like a file type:

fmt.Println(basename("a/b/c.go")) // "c"
fmt.Println(basename("c.d.go")) // "c.d"
fmt.Println(basename("abc")) // "abc"

The first version of basename does all the work without the help of libraries:

// basename removes directory components and a .suffix.
// e.g., a => a, a.go => a, a/b/c.go => c, a/b.c.go => b.c
func basename(s string) string {
    // Discard last '/' and everything before.
    for i := len(s) - 1; i >= 0; i-- {
        if s[i] == '/' {
            s = s[i+1:]
            break
        }
    }
    // Preserve everything before last '.'.
    for i := len(s) - 1; i >= 0; i-- {
        if s[i] == '.' {
            s = s[:i]
            break
        }
    }
    return s
}

A simpler version uses the strings.LastIndex library function:

func basename(s string) string {
    slash := strings.LastIndex(s, "/") // -1 if "/" not found
    s = s[slash+1:]
    if dot := strings.LastIndex(s, "."); dot >= 0 {
        s = s[:dot]
    }
    return s
}

Let's continue with another substring example. The task is to take a string representation of an integer, such as "12345", and insert commas every three places, as in "12,345". This version only works for integers:

// comma inserts commas in a non-negative decimal integer string.
func comma(s string) string {
    n := len(s)
    if n <= 3 {
        return s
    }
    return comma(s[:n-3]) + "," + s[n-3:]
}

A string contains an array of bytes that, once created, is immutable. By contrast, the elements of a byte slice can be freely modified.

Strings can be converted to byte slices and back again:

s := "abc"
b := []byte(s)
s2 := string(b)

Conceptually, the []byte(s) conversion allocates a new byte array holding a copy of the bytes of s, and yields a slice that references the entirety of that array. An optimizing compiler may be able to avoid the allocation and copying in some cases, but in general copying is required to ensure that the bytes of s remain unchanged even if those of b are subsequently modified. The conversion from byte slice back to string with string(b) also makes a copy, to ensure immutability of the resulting string s2.
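
Because each conversion copies, a byte slice obtained from a string can be modified without affecting the string. The function upperFirst below is a hypothetical example of this pattern; it handles only ASCII lower-case letters.

```go
package main

import "fmt"

// upperFirst returns s with its first byte upper-cased if that byte is an
// ASCII lower-case letter. It edits a private copy of the bytes of s.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // copy: changes to b cannot affect s
	if 'a' <= b[0] && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // copy again, so the result is immutable
}

func main() {
	s := "abc"
	t := upperFirst(s)
	fmt.Println(s, t) // abc Abc
}
```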

The bytes package provides the Buffer type for efficient manipulation of byte slices. A Buffer starts out empty but grows as data of types like string, byte, and []byte are written to it. As the example below shows, a bytes.Buffer variable requires no initialization because its zero value is usable:

// intsToString is like fmt.Sprint(values) but adds commas.
func intsToString(values []int) string {
    var buf bytes.Buffer
    buf.WriteByte('[')
    for i, v := range values {
        if i > 0 {
            buf.WriteString(", ")
        }
        fmt.Fprintf(&buf, "%d", v)
    }
    buf.WriteByte(']')
    return buf.String()
}

func main() {
    fmt.Println(intsToString([]int{1, 2, 3})) // "[1, 2, 3]"
}

When appending the UTF-8 encoding of an arbitrary rune to a bytes.Buffer, it's best to use bytes.Buffer's WriteRune method, but WriteByte is fine for ASCII characters such as '[' and ']'.

The bytes.Buffer type is extremely versatile, and when we discuss interfaces, we'll see how it may be used as a replacement for a file whenever an I/O function requires a sink for bytes (io.Writer) as Fprintf does above, or a source of bytes (io.Reader).

Composite Types

In this chapter, we'll talk about four composite types: arrays, slices, maps, and structs.

Arrays and structs are aggregate types; their values are concatenations of other values in memory. Arrays are homogeneous — their elements all have the same type — whereas structs are heterogeneous. Both arrays and structs are fixed size. In contrast, slices and maps are dynamic data structures that grow as values are added.

Arrays

In an array literal, if an ellipsis "..." appears in place of the length, the array length is determined by the number of initializers. A definition such as var q [3]int = [3]int{1, 2, 3} can be simplified to

q := [...]int{1, 2, 3}
fmt.Printf("%T\n", q) // [3]int

The size of an array is part of its type, so [3]int and [4]int are different types. The size must be a constant expression, that is, an expression whose value can be computed as the program is being compiled.

If an array's element type is comparable then the array type is comparable too, so we may directly compare two arrays of that type using the == operator, which reports whether all corresponding elements are equal. The != operator is its negation.

a := [2]int{1, 2}
b := [...]int{1, 2}
c := [2]int{1, 3}
fmt.Println(a == b, a == c, b == c) // true false false
d := [3]int{1, 2}
fmt.Println(a == d) // compile error: mismatched types [2]int and [3]int

Slices

A slice has three components: a pointer, a length, and a capacity. The pointer points to the first element of the array that is reachable through the slice. The length is the number of slice elements; it can't exceed the capacity, which is usually the number of elements between the start of the slice and the end of the underlying array. The built-in functions len and cap return those values.

Multiple slices can share the same underlying array and may refer to overlapping parts of that array.

Slicing beyond cap(s) causes a panic, but slicing beyond len(s) extends the slice, so the result may be longer than the original.
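
The sketch below shows a slice being extended past its length but within its capacity. The months array follows the spirit of the book's example, and the helper extend is a hypothetical wrapper that checks the capacity instead of panicking.

```go
package main

import "fmt"

// extend returns s[:n] when n is within the capacity of s.
// It reports false instead of panicking when n is out of range.
func extend(s []string, n int) ([]string, bool) {
	if n < 0 || n > cap(s) {
		return nil, false
	}
	return s[:n], true
}

func main() {
	months := [...]string{"", "Jan", "Feb", "Mar", "Apr", "May", "Jun"}
	spring := months[3:5]
	fmt.Println(spring, len(spring), cap(spring)) // [Mar Apr] 2 4

	// Slicing beyond len(spring) but within cap(spring) extends the slice.
	longer, ok := extend(spring, 4)
	fmt.Println(longer, ok) // [Mar Apr May Jun] true

	// spring[:5] would panic, because 5 exceeds cap(spring).
	_, ok = extend(spring, 5)
	fmt.Println(ok) // false
}
```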

Since a slice contains a pointer to an element of an array, passing a slice to a function permits the function to modify the underlying array elements.
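
For instance, a function that reverses a slice in place changes the array the caller sees. This is a minimal sketch of that effect:

```go
package main

import "fmt"

// reverse reverses the elements of s in place.
// The caller's underlying array is modified; nothing is returned.
func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	a := [...]int{0, 1, 2, 3, 4, 5}
	reverse(a[:]) // a[:] is a slice that refers to the whole array
	fmt.Println(a) // [5 4 3 2 1 0]
}
```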

Unlike arrays, slices are not comparable, so we cannot use == to test whether two slices contain the same elements. The standard library provides the highly optimized bytes.Equal function for comparing two slices of bytes ([]byte), but for other types of slice, we must do the comparison ourselves:

func equal(x, y []string) bool {
    if len(x) != len(y) {
        return false
    }
    for i := range x {
        if x[i] != y[i] {
            return false
        }
    }
    return true
}

The zero value of a slice type is nil. A nil slice has no underlying array. The nil slice has length and capacity zero, but there are also non-nil slices of length and capacity zero, such as []int{} or make([]int, 3)[3:]. As with any type that can have nil values, the nil value of a particular slice type can be written using a conversion expression such as []int(nil).

var s []int // len(s) == 0, s == nil
s = nil // len(s) == 0, s == nil
s = []int(nil) // len(s) == 0, s == nil
s = []int{} // len(s) == 0, s != nil

So if you need to test whether a slice is empty, use len(s) == 0, not s == nil.
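
A small sketch of why the length test is the robust one; isEmpty is an illustrative helper name:

```go
package main

import "fmt"

// isEmpty reports whether s has no elements, whether or not it is nil.
func isEmpty(s []int) bool { return len(s) == 0 }

func main() {
	var a []int             // nil slice
	b := []int{}            // empty but non-nil
	c := make([]int, 3)[3:] // empty but non-nil

	fmt.Println(a == nil, b == nil, c == nil)       // true false false
	fmt.Println(isEmpty(a), isEmpty(b), isEmpty(c)) // true true true
}
```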

The built-in function make creates a slice of a specified element type, length, and capacity. The capacity argument may be omitted, in which case the capacity equals the length.

make([]T, len)
make([]T, len, cap) // same as make([]T, cap)[:len]

Under the hood, make creates an unnamed array variable and returns a slice of it; the array is accessible only through the returned slice. In the first form, the slice is a view of the entire array. In the second, the slice is a view of only the array's first len elements, but its capacity includes the entire array. The additional elements are set aside for future growth.

The built-in append function appends items to slices:

var runes []rune
for _, r := range "Hello, 世界" {
    runes = append(runes, r)
}
fmt.Printf("%q\n", runes) // ['H' 'e' 'l' 'l' 'o' ',' ' ' '世' '界']

The append function is crucial to understanding how slices work, so let's take a look at what is going on. Here's a version called appendInt that is specialized for []int slices:

func appendInt(x []int, y int) []int {
    var z []int
    zlen := len(x) + 1
    if zlen <= cap(x) {
        // There is room to grow. Extend the slice.
        z = x[:zlen]
    } else {
        // There is insufficient space. Allocate a new array.
        // Grow by doubling, for amortized linear complexity.
        zcap := zlen
        if zcap < 2*len(x) {
            zcap = 2 * len(x)
        }
        z = make([]int, zlen, zcap)
        copy(z, x) // a built-in function; see text
    }
    z[len(x)] = y
    return z
}

For efficiency, the new array is usually somewhat larger than the minimum needed to hold x and y. Expanding the array by doubling its size at each expansion avoids an excessive number of allocations and ensures that appending a single element takes constant time on average. This program demonstrates the effect:

func main() {
    var x, y []int
    for i := 0; i < 10; i++ {
        y = appendInt(x, i)
        fmt.Printf("%d\tcap=%d\t%v\n", i, cap(y), y)
        x = y
    }
}

Each change in capacity indicates an allocation and a copy:

0       cap=1   [0]
1       cap=2   [0 1]
2       cap=4   [0 1 2]
3       cap=4   [0 1 2 3]
4       cap=8   [0 1 2 3 4]
5       cap=8   [0 1 2 3 4 5]
6       cap=8   [0 1 2 3 4 5 6]
7       cap=8   [0 1 2 3 4 5 6 7]
8       cap=16  [0 1 2 3 4 5 6 7 8]
9       cap=16  [0 1 2 3 4 5 6 7 8 9]

The built-in append function may use a more sophisticated growth strategy than appendInt's simplistic one. Usually we don't know whether a given call to append will cause a reallocation, so we can't assume that the original slice refers to the same array as the resulting slice, nor that it refers to a different one. Similarly, we must not assume that operations on elements of the old slice will (or will not) be reflected in the new slice. As a result, it's usual to assign the result of a call to append to the same slice variable whose value we passed to append: runes = append(runes, r)
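
The following sketch makes both outcomes visible by choosing the capacity by hand. The helper sharesArray is an assumption of this example; it compares the addresses of the first elements to tell whether two slices start at the same array element.

```go
package main

import "fmt"

// sharesArray reports whether x and y begin at the same array element.
// (Hypothetical helper; both slices must be non-empty.)
func sharesArray(x, y []int) bool {
	return len(x) > 0 && len(y) > 0 && &x[0] == &y[0]
}

func main() {
	a := make([]int, 3, 4) // length 3, one spare slot

	b := append(a, 1) // fits in the spare slot: no reallocation
	b[0] = 99
	fmt.Println(sharesArray(a, b), a[0]) // true 99

	c := append(b, 2) // len(b) == cap(b): a new array is allocated
	c[0] = -1
	fmt.Println(sharesArray(b, c), b[0]) // false 99
}
```

Writing through b changed a, but writing through c did not change b, which is why code should not depend on either behavior.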

Updating the slice variable is required not just when calling append, but for any function that may change the length or capacity of a slice or make it refer to a different underlying array. To use slices correctly, it's important to bear in mind that although the elements of the underlying array are indirect, the slice's pointer, length, and capacity are not. To update them requires an assignment like the one above. In this respect, slices are not "pure" reference types but resemble an aggregate type such as this struct:

type IntSlice struct {
    ptr *int
    len, cap int
}
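The consequence can be seen in a small sketch: a function receives its own copy of the pointer, length, and capacity, so element writes are visible to the caller but a change of length is not. The function names below are hypothetical.

```go
package main

import "fmt"

// appendInPlace appends to its own copy of the slice header;
// the caller's length and capacity are unaffected.
func appendInPlace(s []int) {
    s = append(s, 99)
    _ = s
}

// setFirst writes through the shared pointer, so the caller sees the change.
func setFirst(s []int) {
    s[0] = 99
}

func main() {
    s := []int{1, 2, 3}

    appendInPlace(s)
    fmt.Println(s, len(s)) // [1 2 3] 3

    setFirst(s)
    fmt.Println(s, len(s)) // [99 2 3] 3
}
```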

Our appendInt function adds a single element to a slice, but the built-in append lets us add more than one new element, or even a whole slice of them.

var x []int
x = append(x, 1)
x = append(x, 2, 3)
x = append(x, 4, 5, 6)
x = append(x, x...) // append the slice x
fmt.Println(x) // [1 2 3 4 5 6 1 2 3 4 5 6]

With the small modification shown below, we can match the behavior of the built-in append. The ellipsis "..." in the declaration of appendInt makes the function variadic: it accepts any number of final arguments. The corresponding ellipsis in the call above to append shows how to supply a list of arguments from a slice. We'll explain the mechanism in detail in the next chapter.

func appendInt(x []int, y ...int) []int {
    var z []int
    zlen := len(x) + len(y)
    // ...expand z to at least zlen...
    copy(z[len(x):], y)
    return z
}

The logic to expand z's underlying array remains unchanged and is not shown.

Let's see more examples of functions that modify the elements of a slice in place. Given a list of strings, the nonempty function returns the non-empty ones:

// Nonempty is an example of an in-place slice algorithm.
package main

import (
    "fmt"
)

// nonempty returns a slice holding only the non-empty strings.
// The underlying array is modified during the call.
func nonempty(strings []string) []string {
    i := 0
    for _, s := range strings {
        if s != "" {
            strings[i] = s
            i++
        }
    }
    return strings[:i]
}

The subtle part is that the input slice and the output slice share the same underlying array. This avoids the need to allocate another array, though of course the contents of data are partly overwritten, as evidenced by the second print statement:

data := []string{"one", "", "three"}
fmt.Printf("%q\n", nonempty(data)) // ["one" "three"]
fmt.Printf("%q\n", data) // ["one" "three" "three"]

Thus we would usually write: data = nonempty(data)

The nonempty function can also be written using append:

func nonempty2(strings []string) []string {
    out := strings[:0] // zero-length slice of original
    for _, s := range strings {
        if s != "" {
            out = append(out, s)
        }
    }
    return out
}

A slice can be used to implement a stack. Given an initially empty slice stack, we can push a new value onto the end of the slice with append:

stack = append(stack, v) // push v

The top of the stack is the last element:

top := stack[len(stack)-1] // top of stack

and shrinking the stack by popping that element is

stack = stack[:len(stack)-1] // pop

To remove an element from the middle of a slice, preserving the order of the remaining elements, use copy to slide the higher-numbered elements down by one to fill the gap:

func remove(slice []int, i int) []int {
    copy(slice[i:], slice[i+1:])
    return slice[:len(slice)-1]
}

func main() {
    s := []int{5, 6, 7, 8, 9}
    fmt.Println(remove(s, 2)) // [5 6 8 9]
}

And if we don't need to preserve the order, we can just move the last element into the gap:

func remove2(slice []int, i int) []int {
    slice[i] = slice[len(slice)-1]
    return slice[:len(slice)-1]
}
func main() {
    s := []int{5, 6, 7, 8, 9}
    fmt.Println(remove2(s, 2)) // [5 6 9 8]
}

Maps

The built-in function make can be used to create a map:

ages := make(map[string]int) // mapping from strings to ints

We can also use a map literal to create a new map populated with some initial key/value pairs:

ages := map[string]int{
    "alice": 31,
    "charlie": 34,
}

This is equivalent to

ages := make(map[string]int) // mapping from strings to ints
ages["alice"] = 31
ages["charlie"] = 34

so an alternative expression for a new empty map is map[string]int{}.

Map elements are accessed through the usual subscript notation, and removed with the built-in function delete:

delete(ages, "alice") // remove element ages["alice"]

All of these operations are safe even if the element isn't in the map; a map lookup using a key that isn't present returns the zero value for its type, so, for instance, the following works even when "bob" is not yet a key in the map because the value of ages["bob"] will be 0.

ages["bob"] = ages["bob"] + 1

The shorthand assignment forms x += y and x++ also work for map elements, so we can rewrite the statement above as ages["bob"] += 1, or even more concisely as ages["bob"]++.

Accessing a map element by subscripting always yields a value. If the key is present in the map, you get the corresponding value; if not, you get the zero value for the element type, as we saw with ages["bob"]. For many purposes that's fine, but sometimes you need to know whether the element was really there or not. For example, if the element type is numeric, you might have to distinguish between a nonexistent element and an element that happens to have the value zero, using a test like this:

age, ok := ages["bob"]
if !ok { /* "bob" is not a key in this map; age == 0. */ }

You'll often see these two statements combined, like this:

if age, ok := ages["bob"]; !ok { /* ... */ }

Subscripting a map in this context yields two values; the second is a boolean that reports whether the element was present. The boolean variable is often called ok, especially if it is immediately used in an if condition.
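As a small illustration, the sketch below uses the two-value form to tell a stored zero apart from a missing key. The describe function and the sample data are hypothetical.

```go
package main

import "fmt"

// describe reports the age recorded for name, distinguishing
// "present but zero" from "not present".
func describe(ages map[string]int, name string) string {
    if age, ok := ages[name]; ok {
        return fmt.Sprintf("%s is %d", name, age)
    }
    return name + " is unknown"
}

func main() {
    ages := map[string]int{"alice": 31, "bob": 0}
    fmt.Println(describe(ages, "bob"))   // bob is 0
    fmt.Println(describe(ages, "carol")) // carol is unknown
}
```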

The order of map iteration is unspecified, and different implementations might use a different hash function, leading to a different ordering. In practice, the order is random. This is intentional; making the sequence vary helps force programs to be robust across implementations.

The statement below creates a slice that is initially empty but has sufficient capacity to hold all the keys of the ages map:

names := make([]string, 0, len(ages))

As with slices, maps cannot be compared to each other; the only legal comparison is with nil. To test whether two maps contain the same keys and the same associated values, we must write a loop:

func equal(x, y map[string]int) bool {
    if len(x) != len(y) {
        return false
    }
    for k, xv := range x {
        if yv, ok := y[k]; !ok || yv != xv {
            return false
        }
    }
    return true
}

Observe how we use !ok to distinguish the "missing" and "present but zero" cases.

Go does not provide a set type, but since the keys of a map are distinct, a map can serve this purpose. To illustrate, the program below reads a sequence of lines and prints only the first occurrence of each distinct line. The program uses a map whose keys represent the set of lines that have already appeared to ensure that subsequent occurrences are not printed.

func main() {
    seen := make(map[string]bool) // a set of strings

    input := bufio.NewScanner(os.Stdin)
    for input.Scan() {
        line := input.Text()
        if !seen[line] {
            seen[line] = true
            fmt.Println(line)
        }
    }

    if err := input.Err(); err != nil {
        fmt.Fprintf(os.Stderr, "dedup: %v\n", err)
        os.Exit(1)
    }
}

Here's another example of maps in action, a program that counts the occurrences of each distinct Unicode code point in its input. Since there are a large number of possible characters, only a small fraction of which would appear in any particular document, a map is a natural way to keep track of just the ones that have been seen and their corresponding counts.

// Charcount computes counts of Unicode characters.
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "unicode"
    "unicode/utf8"
)

func main() {
    counts := make(map[rune]int) // counts of Unicode characters
    var utflen [utf8.UTFMax + 1]int // count of lengths of UTF-8 encodings
    invalid := 0 // count of invalid UTF-8 characters

    in := bufio.NewReader(os.Stdin)
    for {
        r, n, err := in.ReadRune() // returns rune, nbytes, error
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Fprintf(os.Stderr, "charcount: %v\n", err)
            os.Exit(1)
        }
        if r == unicode.ReplacementChar && n == 1 {
            invalid++
            continue
        }
        counts[r]++
        utflen[n]++
    }

    fmt.Printf("rune\tcount\n")
    for c, n := range counts {
        fmt.Printf("%q\t%d\n", c, n)
    }
    fmt.Print("\nlen\tcount\n")
    for i, n := range utflen {
        if i > 0 {
            fmt.Printf("%d\t%d\n", i, n)
        }
    }
    if invalid > 0 {
        fmt.Printf("\n%d invalid UTF-8 characters\n", invalid)
    }
}

Structs

The name of a struct field is exported if it begins with a capital letter; this is Go's main access control mechanism. A struct type may contain a mixture of exported and unexported fields.

Let's see how to use a binary tree to implement an insertion sort:

package main

import (
    "fmt"
)

type tree struct {
    value int
    left, right *tree
}

// Sort sorts values in place.
func Sort(values []int) {
    var root *tree
    for _, v := range values {
        root = add(root, v)
    }
    appendValues(values[:0], root)
}

// appendValues appends the elements of t to values in order
// and returns the resulting slice.
func appendValues(values []int, t *tree) []int {
    if t != nil {
        values = appendValues(values, t.left)
        values = append(values, t.value)
        values = appendValues(values, t.right)
    }
    return values
}

func add(t *tree, value int) *tree {
    if t == nil {
        // Equivalent to return &tree{value: value}.
        t = new(tree)
        t.value = value
        return t
    }
    if value < t.value {
        t.left = add(t.left, value)
    } else {
        t.right = add(t.right, value)
    }
    return t
}

func main() {
    arr := []int{1, 2, 0, -3, 5, 9, 8, 8, 3, 3, 1}
    Sort(arr)
    fmt.Println(arr) // [-3 0 1 1 2 3 3 5 8 8 9]
}

A value of a struct type can be written using a struct literal that specifies values for its fields.

type Point struct { X, Y int }
p := Point{1, 2}

There are two forms of struct literal. The first form, shown above, requires that a value be specified for every field, in the right order. It burdens the writer (and reader) with remembering exactly what the fields are, and it makes the code fragile should the set of fields later grow or be reordered.

More often, the second form is used, in which a struct value is initialized by listing some or all of the field names and their corresponding values:

p := Point{Y: 2, X: 1}

If a field is omitted in this kind of literal, it is set to the zero value for its type. Because names are provided, the order of fields doesn't matter.

The two forms cannot be mixed in the same literal.

Struct values can be passed as arguments to functions and returned from them. For instance, this function scales a Point by a specified factor:

func Scale(p Point, factor int) Point {
    return Point{p.X * factor, p.Y * factor}
}
fmt.Println(Scale(Point{1, 2}, 5)) // {5 10}

For efficiency, larger struct types are usually passed to or returned from functions indirectly using a pointer,

func Bonus(e *Employee, percent int) int {
    return e.Salary * percent / 100
}

and this is required if the function must modify its argument, since in a call-by-value language like Go, the called function receives only a copy of an argument, not a reference to the original argument.

Because structs are so commonly dealt with through pointers, it's possible to use this shorthand notation to create and initialize a struct variable and obtain its address:

pp := &Point{1, 2}

It is exactly equivalent to

pp := new(Point)
*pp = Point{1, 2}

but &Point{1, 2} can be used directly within an expression, such as a function call.

If all the fields of a struct are comparable, the struct itself is comparable, so two expressions of that type may be compared using == or !=. Comparable struct types, like other comparable types, may be used as the key type of a map.

type address struct {
    hostname string
    port int
}
hits := make(map[address]int)
hits[address{"golang.org", 443}]++

JSON

A JSON object is a mapping from strings to values, written as a sequence of name:value pairs separated by commas and surrounded by braces.

Consider an application that gathers movie reviews and offers recommendations. Its Movie data type and a typical list of values are declared below.

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type Movie struct {
    Title string
    Year int `json:"released"`
    Color bool `json:"color,omitempty"`
    Actors []string
}

var movies = []Movie{
    {Title: "Casablanca", Year: 1942, Color: false,
        Actors: []string{"Humphrey Bogart", "Ingrid Bergman"}},
    {Title: "Cool Hand Luke", Year: 1967, Color: true,
        Actors: []string{"Paul Newman"}},
    {Title: "Bullitt", Year: 1968, Color: true,
        Actors: []string{"Steve McQueen", "Jacqueline Bisset"}},
}

func main() {
    // data, err := json.Marshal(movies)
    // if err != nil {
    //  log.Fatalf("JSON marshaling failed: %s", err)
    // }
    // fmt.Printf("%s\n", data) // [{"Title":"Casablanca","released":1942,"Actors":["Humphrey Bogart","Ingrid Bergman"]},{"Title":"Cool Hand Luke","released":1967,"color":true,"Actors":["Paul Newman"]},{"Title":"Bullitt","released":1968,"color":true,"Actors":["Steve McQueen","Jacqueline Bisset"]}]

    data, err := json.MarshalIndent(movies, "", "\t")
    if err != nil {
        log.Fatalf("JSON marshaling failed: %s", err)
    }
    fmt.Printf("%s\n", data)
    /*
    [
            {
                    "Title": "Casablanca",
                    "released": 1942,
                    "Actors": [
                            "Humphrey Bogart",
                            "Ingrid Bergman"
                    ]
            },
            {
                    "Title": "Cool Hand Luke",
                    "released": 1967,
                    "color": true,
                    "Actors": [
                            "Paul Newman"
                    ]
            },
            {
                    "Title": "Bullitt",
                    "released": 1968,
                    "color": true,
                    "Actors": [
                            "Steve McQueen",
                            "Jacqueline Bisset"
                    ]
            }
    ]
    */
}

Data structures like this are an excellent fit for JSON, and it's easy to convert in both directions. Converting a Go data structure like movies to JSON is called marshaling. Marshal produces a byte slice containing a very long string with no extraneous white space. This compact representation contains all the information but it's hard to read. For human consumption, a variant called json.MarshalIndent produces neatly indented output. Two additional arguments define a prefix for each line of output and a string for each level of indentation. Marshaling uses the Go struct field names as the field names for the JSON objects (through reflection). Only exported fields are marshaled, which is why we chose capitalized names for all the Go field names.

You may have noticed that the name of the Year field changed to released in the output, and Color changed to color. That's because of the field tags. A field tag is a string of metadata associated at compile time with the field of a struct. A field tag may be any literal string, but it is conventionally interpreted as a space-separated list of key:"value" pairs; since they contain double quotation marks, field tags are usually written with raw string literals. The json key controls the behavior of the encoding/json package, and other encoding/... packages follow this convention. The first part of the json field tag specifies an alternative JSON name for the Go field. Field tags are often used to specify an idiomatic JSON name like total_count for a Go field named TotalCount. The tag for Color has an additional option, omitempty, which indicates that no JSON output should be produced if the field has the zero value for its type (false, here) or is otherwise empty. Sure enough, the JSON output for Casablanca, a black-and-white movie, has no color field.

The inverse operation to marshaling, decoding JSON and populating a Go data structure, is called unmarshaling, and it is done by json.Unmarshal. The code below unmarshals the JSON movie data into a slice of structs whose only field is Title. By defining suitable Go data structures in this way, we can select which parts of the JSON input to decode and which to discard. When Unmarshal returns, it has filled in the slice with the Title information; other names in the JSON are ignored.

var titles []struct{ Title string }
if err := json.Unmarshal(data, &titles); err != nil {
    log.Fatalf("JSON unmarshaling failed: %s", err)
}
fmt.Println(titles) // [{Casablanca} {Cool Hand Luke} {Bullitt}]

Many web services provide a JSON interface: make a request with HTTP and back comes the desired information in JSON format. To illustrate, let's query the GitHub issue tracker using its web-service interface. The example consists of two files, github.go and issues.go:

// Package github provides a Go API for the GitHub issue tracker.
// See https://developer.github.com/v3/search/#search-issues.
package github

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
    "strings"
    "time"
)

const IssuesURL = "https://api.github.com/search/issues"

type IssuesSearchResult struct {
    TotalCount int `json:"total_count"`
    Items []*Issue
}

type Issue struct {
    Number int
    HTMLURL string `json:"html_url"`
    Title string
    State string
    User *User
    CreatedAt time.Time `json:"created_at"`
    Body string // in Markdown format
}

type User struct {
    Login string
    HTMLURL string `json:"html_url"`
}

// SearchIssues queries the GitHub issue tracker.
func SearchIssues(terms []string) (*IssuesSearchResult, error) {
    q := url.QueryEscape(strings.Join(terms, " "))
    resp, err := http.Get(IssuesURL + "?q=" + q)
    if err != nil {
        return nil, err
    }
    // we must close resp.Body on all execution paths.
    // (Chapter 5 presents 'defer', which makes this simpler.)
    if resp.StatusCode != http.StatusOK {
        resp.Body.Close()
        return nil, fmt.Errorf("Search query failed: %s", resp.Status)
    }

    var result IssuesSearchResult
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        resp.Body.Close()
        return nil, err
    }
    resp.Body.Close()
    return &result, nil
}

// Issues prints a table of GitHub issues matching the search terms.
package main

import (
    "fmt"
    "log"
    "os"

    "./github"
)

func main() {
    result, err := github.SearchIssues(os.Args[1:])
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%d issues:\n", result.TotalCount)
    for _, item := range result.Items {
        fmt.Printf("#%-5d %9.9s %.55s\n",
            item.Number, item.User.Login, item.Title)
    }
}
/*
Star-Wars:ch4 Jiang_Hu$ go run issues.go is:open json decoder
5407 issues:
#851   supercool JSON Parser and Decoder
#21    abraithwa Support `json.RawMessage` and `json.Decoder.UseNumber`
#461    mbrucher json.decoder.JSONDecodeError: Invalid control character
#5        miikka Decoder options
#170     durack1 JSON decoder timestamp error
#5     MazeChaZe Custom decoder
#36    billchenx json.decoder.JSONDecodeError
#10    jimfulton Make JSON encoder/decoder configurable.
#31        kozo2 JSONDecodeError with Python36 json decoder
#22        amitu automatic Json.Decode.Decoder out of schema?
#15    MazeChaZe Reimplement object decoder using TypeScript mapped type
#3569  stolowski snapd, snapctl: use json Decoder instead of Unmarshall
#25000  tvolkert JSON decoder should support lenient parsing
#23     keleshev JSON decoder example
#3       szarsti consider jsone as json decoder
#287   NausJessy Adds a Double decoder.
#16    vipulasri json.decoder.JSONDecodeError:
#111     marrony Adds java encoder/decoder for user types
#93       twhume JSON decoder error on startup, fixed by restarting serv
#23    Guillaum- Add BMT/FMT decoder elements (fitter, banks, ccdb confi
#20567 schmichae encoding/json: reduce allocations by Decoder for \uXXXX
#2          si14 JSON encoder/decoder behaviour
#4        vbraun Crash in JSON decoder
#4     beatgammi Streaming decoder
#1     HelloW0r1 json.decoder.JSONDecodeError: Expecting value: line 1 c
#2416  ArunNairI jupyter notebook list - json.decoder.JSONDecodeError:
#77    binary132 RTMEvent Decoder?
#28     teodorlu `original` decoder uses `Json.Decode.Pipeline` without
#2     sinisters Make JSON Decoder more lenient
#1532  perfectwe decoder error
Star-Wars:ch4 Jiang_Hu$
*/

Functions

Function Declarations

A function declaration has a name, a list of parameters, an optional list of results, and a body:

func name(parameter-list) (result-list) {
    body
}

A function that has a result list must end with a return statement unless execution clearly cannot reach the end of the function, perhaps because the function ends with a call to panic or an infinite for loop with no break.

The type of a function is sometimes called its signature. Two functions have the same type or signature if they have the same sequence of parameter types and the same sequence of result types. The names of parameters and results don't affect the type, nor does whether or not they were declared using the factored form.

Every function call must provide an argument for each parameter, in the order in which the parameters were declared. Go has no concept of default parameter values, nor any way to specify arguments by name, so the names of parameters and results don't matter to the caller except as documentation.

Arguments are passed by value, so the function receives a copy of each argument; modifications to the copy do not affect the caller. However, if the argument contains some kind of reference, like a pointer, slice, map, function, or channel, then the caller may be affected by any modifications the function makes to variables indirectly referred to by the argument.
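A brief sketch of the distinction, using an array (copied in full) and a map (a reference to shared data). The function names are hypothetical.

```go
package main

import "fmt"

// clearFirst receives a copy of the whole array, so the caller's
// array is not affected.
func clearFirst(a [3]int) {
    a[0] = 0
}

// clearKey receives a copy of the map reference; both copies refer
// to the same data, so the caller sees the update.
func clearKey(m map[string]int, key string) {
    m[key] = 0
}

func main() {
    arr := [3]int{1, 2, 3}
    clearFirst(arr)
    fmt.Println(arr) // [1 2 3]

    m := map[string]int{"x": 1}
    clearKey(m, "x")
    fmt.Println(m) // map[x:0]
}
```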

You may occasionally encounter a function declaration without a body, indicating that the function is implemented in a language other than Go. Such a declaration defines the function signature.

package math
func Sin(x float64) float64 // implemented in assembly language

Recursion

Functions may be recursive, that is, they may call themselves, either directly or indirectly. Recursion is a powerful technique for many problems, and of course it's essential for processing recursive data structures. In this section, we'll use it for processing HTML documents. Let's see two programs, fetch.go and findlinks1.go:

// Fetch prints the content found at a URL.
package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
)

func main() {
    for _, url := range os.Args[1:] {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
            os.Exit(1)
        }
        b, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Fprintf(os.Stderr, "fetch: reading %s: %v\n", url, err)
            os.Exit(1)
        }
        fmt.Printf("%s", b)
    }
}

// Findlinks1 prints the links in an HTML document read from standard input.
package main

import (
    "fmt"
    "os"

    "golang.org/x/net/html"
)

func main() {
    doc, err := html.Parse(os.Stdin)
    if err != nil {
        fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
        os.Exit(1)
    }
    for _, link := range visit(nil, doc) {
        fmt.Println(link)
    }
}

// visit appends to links each link found in n and returns the result.
func visit(links []string, n *html.Node) []string {
    if n.Type == html.ElementNode && n.Data == "a" {
        for _, a := range n.Attr {
            if a.Key == "href" {
                links = append(links, a.Val)
            }
        }
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        links = visit(links, c)
    }
    return links
}

To descend the tree for a node n, visit recursively calls itself for each of n's children, which are held in the FirstChild linked list.

Let's run findlinks1 on the Go home page, piping the output of fetch to the input of findlinks1:

Star-Wars:gopl Jiang_Hu$ ./fetch https://golang.org | ./findlinks1
/
/
#
/doc/
/pkg/
/project/
/help/
/blog/
http://play.golang.org/
#
//tour.golang.org/
https://golang.org/dl/
//blog.golang.org/
https://developers.google.com/site-policies#restrictions
/LICENSE
/doc/tos.html
http://www.google.com/intl/en/policies/privacy/

The next program uses recursion over the HTML node tree to print the structure of the tree in outline. As it encounters each element, it pushes the element's tag onto a stack (a slice of strings named stack), then prints the stack. The program outline.go:

package main

import (
    "fmt"
    "os"

    "golang.org/x/net/html"
)

func main() {
    doc, err := html.Parse(os.Stdin)
    if err != nil {
        fmt.Fprintf(os.Stderr, "outline: %v\n", err)
        os.Exit(1)
    }
    outline(nil, doc)
}

func outline(stack []string, n *html.Node) {
    if n.Type == html.ElementNode {
        stack = append(stack, n.Data) // push tag
        fmt.Println(stack)
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        outline(stack, c)
    }
}

Note one subtlety: although outline "pushes" an element on stack, there is no corresponding pop. When outline calls itself recursively, the callee receives a copy of stack. Although the callee may append elements to this slice, modifying its underlying array and perhaps even allocating a new array, it doesn't modify the initial elements that are visible to the caller, so when the function returns, the caller's stack is as it was before the call.

Here's the outline of https://golang.org, again edited for brevity:

Star-Wars:gopl Jiang_Hu$ ./fetch https://golang.org | ./outline
[html]
[html head]
[html head meta]
[html head meta]
[html head meta]
[html head title]
[html head link]
[html head link]
[html head link]
[html head script]
[html head script]
[html body]
[html body div]
[html body div]
[html body div div]
[html body div div div]
[html body div div div a]
...

As you can see by experimenting with outline, most HTML documents can be processed with only a few levels of recursion, but it's not hard to construct pathological web pages that require extremely deep recursion.

Many programming language implementations use a fixed-size function call stack; sizes from 64KB to 2MB are typical. Fixed-size stacks impose a limit on the depth of recursion, so one must be careful to avoid a stack overflow when traversing large data structures recursively; fixed-size stacks may even pose a security risk. In contrast, typical Go implementations use variable-size stacks that start small and grow as needed up to a limit on the order of a gigabyte. This lets us use recursion safely and without worrying about overflow.

Multiple Return Values

The program below is a variation of findlinks that makes the HTTP request itself so that we no longer need to run fetch. Because the HTTP and parsing operations can fail, findLinks declares two results: the list of discovered links and an error. Incidentally, the HTML parser can usually recover from bad input and construct a document containing error nodes, so Parse rarely fails; when it does, it's typically due to underlying I/O errors.

package main

import (
    "fmt"
    "net/http"
    "os"

    "golang.org/x/net/html"
)

func main() {
    for _, url := range os.Args[1:] {
        links, err := findLinks(url)
        if err != nil {
            fmt.Fprintf(os.Stderr, "findlinks2: %v\n", err)
            continue
        }
        for _, link := range links {
            fmt.Println(link)
        }
    }
}

// findLinks performs an HTTP GET request for url, parses the
// response as HTML, and extracts and returns the links.
func findLinks(url string) ([]string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    if resp.StatusCode != http.StatusOK {
        resp.Body.Close()
        return nil, fmt.Errorf("getting %s: %s", url, resp.Status)
    }
    doc, err := html.Parse(resp.Body)
    resp.Body.Close()
    if err != nil {
        return nil, fmt.Errorf("parsing %s as HTML: %v", url, err)
    }
    return visit(nil, doc), nil
}

// visit appends to links each link found in n and returns the result.
func visit(links []string, n *html.Node) []string {
    if n.Type == html.ElementNode && n.Data == "a" {
        for _, a := range n.Attr {
            if a.Key == "href" {
                links = append(links, a.Val)
            }
        }
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        links = visit(links, c)
    }
    return links
}

We must ensure that resp.Body is closed so that network resources are properly released even in case of error. Go's garbage collector recycles unused memory, but do not assume it will release unused operating system resources like open files and network connections. They should be closed explicitly.

Well-chosen names can document the significance of a function's results. Names are particularly valuable when a function returns multiple results of the same type. But it's not always necessary to name multiple results solely for documentation. For instance, convention dictates that a final bool result indicates success; an error result often needs no explanation.

In a function with named results, the operands of a return statement may be omitted. This is called a bare return.

// CountWordsAndImages does an HTTP GET request for the HTML
// document url and returns the number of words and images in it.
func CountWordsAndImages(url string) (words, images int, err error) {
    resp, err := http.Get(url)
    if err != nil {
        return
    }
    doc, err := html.Parse(resp.Body)
    resp.Body.Close()
    if err != nil {
        err = fmt.Errorf("parsing HTML: %s", err)
        return
    }
    words, images = countWordsAndImages(doc)
    return
}
func countWordsAndImages(n *html.Node) (words, images int) { /* ... */ }

A bare return is a shorthand way to return each of the named result variables in order, so in the function above, each return statement is equivalent to return words, images, err.

In functions like this one, with many return statements and several results, bare returns can reduce code duplication, but they rarely make code easier to understand. For instance, it's not obvious at first glance that the two early returns are equivalent to return 0, 0, err (because the result variables words and images are initialized to their zero values) and that the final return is equivalent to return words, images, nil. For this reason, bare returns are best used sparingly.

Errors

For many functions, even in a well-written program, success is not assured because it depends on factors beyond the programmer's control. Any function that does I/O, for example, must confront the possibility of error, and only a naive programmer believes a simple read or write cannot fail. Indeed, it's when the most reliable operations fail unexpectedly that we most need to know why.

Errors are thus an important part of a package's API or an application's user interface, and failure is just one of several expected behaviors. This is the approach Go takes to error handling.

A function for which failure is an expected behavior returns an additional result, conventionally the last one. If the failure has only one possible cause, the result is a boolean, usually called ok.

More often, and especially for I/O, the failure may have a variety of causes for which the caller will need an explanation. In such cases, the type of the additional result is error.

The built-in type error is an interface type. For now it's enough to know that an error may be nil or non-nil, that nil implies success and non-nil implies failure, and that a non-nil error has an error message string which we can obtain by calling its Error method or print by calling fmt.Println(err) or fmt.Printf("%v", err).

Usually when a function returns a non-nil error, its other results are undefined and should be ignored. However, a few functions may return partial results in error cases.
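As a minimal sketch of these conventions, the hypothetical helper parsePort below (not part of the original notes) returns an error as its last result; the caller tests it against nil and prints it with fmt.Println, which calls the Error method for us.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it converts s to a TCP port number
// and reports failure through an ordinary error result, returned last.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parsing port %q: %v", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return n, nil // a nil error means success
}

func main() {
	for _, s := range []string{"8080", "http", "70000"} {
		port, err := parsePort(s)
		if err != nil {
			fmt.Println("error:", err) // Println uses err.Error()
			continue
		}
		fmt.Println("port:", port)
	}
}
```

On failure the sketch returns 0 for the port, but by the convention above the caller should not rely on that value.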

Go's approach sets it apart from many other languages in which failures are reported using exceptions, not ordinary values. Although Go does have an exception mechanism of sorts, it is used only for reporting truly unexpected errors that indicate a bug, not the routine errors that a robust program should be built to expect.

The reason for this design is that exceptions tend to entangle the description of an error with the control flow required to handle it, often leading to an undesirable outcome: routine errors are reported to the end user in the form of an incomprehensible stack trace, full of information about the structure of the program but lacking intelligible context about what went wrong.

By contrast, Go programs use ordinary control-flow mechanisms like if and return to respond to errors. This style undeniably demands that more attention be paid to error-handling logic, but that is precisely the point.

Error-Handling Strategies

When a function call returns an error, it's the caller's responsibility to check it and take appropriate action. Depending on the situation, there may be a number of possibilities. Let's take a look at five of them.

First, and most common, is to propagate the error, so that a failure in a subroutine becomes a failure of the calling routine. We saw examples of this in the findLinks function of Section Functions/Multiple Return Values. If the call to http.Get fails, findLinks returns the HTTP error to the caller without further ado:

resp, err := http.Get(url)
if err != nil {
    return nil, err
}

In contrast, if the call to html.Parse fails, findLinks does not return the HTML parser's error directly because it lacks two crucial pieces of information: that the error occurred in the parser, and the URL of the document that was being parsed. In this case, findLinks constructs a new error message that includes both pieces of information as well as the underlying parse error:

doc, err := html.Parse(resp.Body)
resp.Body.Close()
if err != nil {
    return nil, fmt.Errorf("parsing %s as HTML: %v", url, err)
}

The fmt.Errorf function formats an error message using fmt.Sprintf and returns a new error value. We use it to build descriptive errors by successively prefixing additional context information to the original error message. When the error is ultimately handled by the program's main function, it should provide a clear causal chain from the root problem to the overall failure, reminiscent of a NASA accident investigation:

genesis: crashed: no parachute: G-switch failed: bad relay orientation

Because error messages are frequently chained together, message strings should not be capitalized and newlines should be avoided. The resulting errors may be long, but they will be self-contained when found by tools like grep.

When designing error messages, be deliberate, so that each one is a meaningful description of the problem with sufficient and relevant detail, and be consistent, so that errors returned by the same function or by a group of functions in the same package are similar in form and can be dealt with in the same way.

For example, the os package guarantees that every error returned by a file operation, such as os.Open or the Read, Write, or Close methods of an open file, describes not just the nature of the failure (permission denied, no such directory, and so on) but also the name of the file, so the caller needn't include this information in the error message it constructs.

In general, the call f(x) is responsible for reporting the attempted operation f and the argument value x as they relate to the context of the error. The caller is responsible for adding further information that it has but the call f(x) does not, such as the URL in the call to html.Parse above.
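The division of responsibility can be sketched with three hypothetical functions (checkRelay, deployParachute, and land are invented here for illustration). Each caller prefixes only the context that it alone knows, and the resulting message reads as a causal chain.

```go
package main

import (
	"errors"
	"fmt"
)

// checkRelay is the hypothetical root cause of the failure.
func checkRelay() error {
	return errors.New("bad relay orientation")
}

// deployParachute adds the context it knows: which component failed.
func deployParachute() error {
	if err := checkRelay(); err != nil {
		return fmt.Errorf("G-switch failed: %v", err)
	}
	return nil
}

// land adds the craft name and the overall consequence.
func land(craft string) error {
	if err := deployParachute(); err != nil {
		return fmt.Errorf("%s: crashed: no parachute: %v", craft, err)
	}
	return nil
}

func main() {
	if err := land("genesis"); err != nil {
		fmt.Println(err)
		// genesis: crashed: no parachute: G-switch failed: bad relay orientation
	}
}
```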

Let's move on to the second strategy for handling errors. For errors that represent transient or unpredictable problems, it may make sense to retry the failed operation, possibly with a delay between tries, and perhaps with a limit on the number of attempts or the time spent trying before giving up entirely.

// WaitForServer attempts to contact the server of a URL.
// It tries for one minute using exponential back-off.
// It reports an error if all attempts fail.
func WaitForServer(url string) error {
    const timeout = 1 * time.Minute
    deadline := time.Now().Add(timeout)
    for tries := 0; time.Now().Before(deadline); tries++ {
        _, err := http.Head(url)
        if err == nil {
            return nil // success
        }
        log.Printf("server not responding (%s); retrying...", err)
        time.Sleep(time.Second << uint(tries)) // exponential back-off
    }
    return fmt.Errorf("server %s failed to respond after %s", url, timeout)
}

Third, if progress is impossible, the caller can print the error and stop the program gracefully, but this course of action should generally be reserved for the main package of a program. Library functions should usually propagate errors to the caller, unless the error is a sign of an internal inconsistency—that is, a bug.

// (In function main.)
if err := WaitForServer(url); err != nil {
    fmt.Fprintf(os.Stderr, "Site is down: %v\n", err)
    os.Exit(1)
}

A more convenient way to achieve the same effect is to call log.Fatalf. As with all the log functions, by default it prefixes the time and date to the error message.

// (In function main.)
if err := WaitForServer(url); err != nil {
    log.Fatalf("Site is down: %v\n", err)
}

The default format is helpful in a long-running server, but less so for an interactive tool:

2017/09/20 15:04:05 Site is down: no such domain: bad.gopl.io

For a more attractive output, we can set the prefix used by the log package to the name of the command, and suppress the display of the date and time:

log.SetPrefix("wait: ")
log.SetFlags(0)
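To see the effect without touching the standard logger, the sketch below builds a separate logger with log.New using the same prefix and flags, and captures what it writes. The helper name formatWithPrefix is an assumption made for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// formatWithPrefix logs a message through a logger configured with the
// given prefix and no date/time flags, and returns exactly what was written.
func formatWithPrefix(prefix, cause string) string {
	var buf bytes.Buffer
	logger := log.New(&buf, prefix, 0) // flags 0: no date or time
	logger.Printf("Site is down: %s", cause)
	return buf.String()
}

func main() {
	fmt.Print(formatWithPrefix("wait: ", "no such domain: bad.gopl.io"))
	// wait: Site is down: no such domain: bad.gopl.io
}
```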

Fourth, in some cases, it's sufficient just to log the error and then continue, perhaps with reduced functionality. Again there's a choice between using the log package, which adds the usual prefix:

if err := Ping(); err != nil {
    log.Printf("ping failed: %v; networking disabled", err)
}

and printing directly to the standard error stream:

if err := Ping(); err != nil {
    fmt.Fprintf(os.Stderr, "ping failed: %v; networking disabled\n", err)
}

(All log functions append a newline if one is not already present.)

And fifth and finally, in rare cases we can safely ignore an error entirely:

dir, err := ioutil.TempDir("", "scratch")
if err != nil {
    return fmt.Errorf("failed to create temp dir: %v", err)
}
// ...use temp dir...
os.RemoveAll(dir) // ignore errors; $TMPDIR is cleaned periodically

The call to os.RemoveAll may fail, but the program ignores it because the operating system periodically cleans out the temporary directory. In this case, discarding the error was intentional, but the program logic would be the same had we forgotten to deal with it. Get into the habit of considering errors after every function call, and when you deliberately ignore one, document your intention clearly.

Error handling in Go has a particular rhythm. After checking an error, failure is usually dealt with before success. If failure causes the function to return, the logic for success is not indented within an else block but follows at the outer level. Functions tend to exhibit a common structure, with a series of initial checks to reject errors, followed by the substance of the function at the end, minimally indented.

Usually, the variety of errors that a function may return is interesting to the end user but not to the intervening program logic. On occasion, however, a program must take different actions depending on the kind of error that has occurred. Consider an attempt to read n bytes of data from a file. If n is chosen to be the length of the file, any error represents a failure. On the other hand, if the caller repeatedly tries to read fixed-size chunks until the file is exhausted, the caller must respond differently to an end-of-file condition than it does to all other errors. For this reason, the io package guarantees that any read failure caused by an end-of-file condition is always reported by a distinguished error, io.EOF, which is defined as follows:

package io
import "errors"
// EOF is the error returned by Read when no more input is available.
var EOF = errors.New("EOF")

The caller can detect this condition using a simple comparison, as in the loop below, which reads runes from the standard input. (The charcount program in Section 4.3 provides a more complete example.)

in := bufio.NewReader(os.Stdin)
for {
    r, _, err := in.ReadRune()
    if err == io.EOF {
        break // finished reading
    }
    if err != nil {
        return fmt.Errorf("read failed: %v", err)
    }
    // ...use r...
}

Since in an end-of-file condition there is no information to report besides the fact of it, io.EOF has a fixed error message, "EOF". For other errors, we may need to report both the quality and quantity of the error, so to speak, so a fixed error value will not do. In chapter 7, we'll present a more systematic way to distinguish certain error values from others.
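A runnable variant of the loop above, reading from a string instead of the standard input so that the result is predictable. The function countRunes is a hypothetical name used only for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countRunes reads runes from s until io.EOF and returns how many it saw.
// Any other read error is returned to the caller with added context.
func countRunes(s string) (int, error) {
	in := bufio.NewReader(strings.NewReader(s))
	n := 0
	for {
		_, _, err := in.ReadRune()
		if err == io.EOF {
			return n, nil // end of input is the normal way out of the loop
		}
		if err != nil {
			return n, fmt.Errorf("read failed: %v", err)
		}
		n++
	}
}

func main() {
	n, err := countRunes("go 世界")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(n) // 5
}
```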

Function Values

Functions are first-class values in Go: like other values, function values have types, and they may be assigned to variables or passed to or returned from functions.

The zero value of a function type is nil. Calling a nil function value causes a panic:

var f func(int) int
f(3) // panic: call of nil function

Function values may be compared with nil:

var f func(int) int 
if f != nil {
    f(3)
}

but they are not comparable, so they may not be compared against each other or used as keys in a map.

Function values let us parameterize our functions over not just data, but behavior too.
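For instance, strings.Map applies a function to each rune of a string, so the same traversal can be reused with different behavior. In this sketch, transform, add1, and sub1 are illustrative names rather than anything defined elsewhere in these notes.

```go
package main

import (
	"fmt"
	"strings"
)

// add1 and sub1 shift a rune to the next or previous code point.
func add1(r rune) rune { return r + 1 }
func sub1(r rune) rune { return r - 1 }

// transform applies f to every rune of s; the behavior is supplied
// by the caller as a function value.
func transform(s string, f func(rune) rune) string {
	return strings.Map(f, s)
}

func main() {
	fmt.Println(transform("HAL-9000", add1)) // IBM.:111
	fmt.Println(transform("VMS", add1))      // WNT
	fmt.Println(transform("IBM.:111", sub1)) // HAL-9000
}
```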

Anonymous Functions

Named functions can be declared only at the package level, but we can use a function literal to denote a function value within any expression. A function literal is written like a function declaration, but without a name following the func keyword. It is an expression, and its value is called an anonymous function.

More importantly, functions defined in this way have access to the entire lexical environment, so the inner function can refer to variables from the enclosing function, as this example shows:

// squares returns a function that returns
// the next square number each time it is called.
func squares() func() int {
    var x int
    return func() int {
        x++
        return x * x
    }
}

func main() {
    f := squares()
    fmt.Println(f(), f(), f(), f()) // 1 4 9 16
}

The function squares returns another function, of type func() int. A call to squares creates a local variable x and returns an anonymous function that, each time it is called, increments x and returns its square. A second call to squares would create a second variable x and return a new anonymous function which increments that variable.

The squares example demonstrates that function values are not just code but can have state. The anonymous inner function can access and update the local variables of the enclosing function squares. These hidden variable references are why we classify functions as reference types and why function values are not comparable. Function values like these are implemented using a technique called closures, and Go programmers often use this term for function values.

Here again we see an example where the lifetime of a variable is not determined by its scope: the variable x exists after squares has returned within main, even though x is hidden inside f.
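A short sketch makes the per-call state visible: two calls to squares yield two closures, each with its own x, so advancing one does not affect the other.

```go
package main

import "fmt"

// squares returns a function that returns
// the next square number each time it is called.
func squares() func() int {
	var x int
	return func() int {
		x++
		return x * x
	}
}

func main() {
	f := squares()
	g := squares() // g captures its own, separate x

	fmt.Println(f(), f(), f()) // 1 4 9
	fmt.Println(g(), g())      // 1 4
	fmt.Println(f())           // 16
}
```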

As a somewhat academic example of anonymous functions, consider the problem of computing a sequence of computer science courses that satisfies the prerequisite requirements of each one. The prerequisites are given in the prereqs table below, which is a mapping from each course to the list of courses that must be completed before it.

// prereqs maps computer science courses to their prerequisites.
var prereqs = map[string][]string{
    "algorithm": {"data structures"},
    "calculus": {"linear algebra"},
    "compilers": {
        "data structures",
        "formal languages",
        "computer organization",
    },
    "data structures": {"discrete math"},
    "databases": {"data structures"},
    "discrete math": {"intro to programming"},
    "formal languages": {"discrete math"},
    "networks": {"operating systems"},
    "operating systems": {"data structures", "computer organization"},
    "programming language": {"data structures", "computer organization"},
}

func main() {
    for i, course := range topoSort(prereqs) {
        fmt.Printf("%d:\t%s\n", i+1, course)
    }
}

func topoSort(m map[string][]string) []string {
    var order []string
    seen := make(map[string]bool)
    var visitAll func(items []string)
    visitAll = func(items []string) {
        for _, item := range items {
            if !seen[item] {
                seen[item] = true
                visitAll(m[item])
                order = append(order, item)
            }
        }
    }
    var keys []string
    for key := range m {
        keys = append(keys, key)
    }
    sort.Strings(keys)
    visitAll(keys)
    return order
}

This kind of problem is known as topological sorting. Conceptually, the prerequisite information forms a directed graph with a node for each course and edges from each course to the courses that it depends on. The graph is acyclic: there is no path from a course that leads back to itself. We can compute a valid sequence using depth-first search through the graph with the code above.

When an anonymous function requires recursion, as in this example, we must first declare a variable, and then assign the anonymous function to the variable. Had these two steps been combined in the declaration, the function literal would not be within the scope of the variable visitAll so it would have no way to call itself recursively:

visitAll := func(items []string) {
    // ...
    visitAll(m[item]) // compile error: undefined: visitAll
    // ...
}

The output of the toposort program is shown below. It is deterministic, an often-desirable property that doesn't always come for free. Here, the values of the prereqs map are slices, not more maps, so their iteration order is deterministic, and we sorted the keys of prereqs before making the initial call to visitAll.

1:      intro to programming
2:      discrete math
3:      data structures
4:      algorithm
5:      linear algebra
6:      calculus
7:      formal languages
8:      computer organization
9:      compilers
10:    databases
11:    operating systems
12:    networks
13:    programming language

Let's return to our findLinks example. Extract uses forEachNode to handle the traversal. Since Extract needs only the pre function, it passes nil for the post argument.

The links.go:

// Package links provides a link-extraction function.
package links

import (
    "fmt"
    "net/http"

    "golang.org/x/net/html"
)

// Extract makes an HTTP GET request to the specified URL, parses
// the response as HTML, and returns the links in the HTML document.
func Extract(url string) ([]string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    if resp.StatusCode != http.StatusOK {
        resp.Body.Close()
        return nil, fmt.Errorf("getting %s: %s", url, resp.Status)
    }

    doc, err := html.Parse(resp.Body)
    resp.Body.Close()
    if err != nil {
        return nil, fmt.Errorf("parsing %s as HTML: %v", url, err)
    }

    var links []string
    visitNode := func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key != "href" {
                    continue
                }
                link, err := resp.Request.URL.Parse(a.Val)
                if err != nil {
                    continue // ignore bad URLs
                }
                links = append(links, link.String())
            }
        }
    }
    forEachNode(doc, visitNode, nil)
    return links, nil
}

// forEachNode calls the functions pre(x) and post(x) for each node
// x in the tree rooted at n. Both functions are optional.
// pre is called before the children are visited (preorder) and
// post is called after (postorder).
func forEachNode(n *html.Node, pre, post func(n *html.Node)) {
    if pre != nil {
        pre(n)
    }

    for c := n.FirstChild; c != nil; c = c.NextSibling {
        forEachNode(c, pre, post)
    }

    if post != nil {
        post(n)
    }
}

and the findlinks3.go:

package main

import (
    "fmt"
    "links"
    "log"
    "os"
)

// breadthFirst calls f for each item in the worklist.
// Any items returned by f are added to the worklist.
func breadthFirst(f func(item string) []string, worklist []string) {
    seen := make(map[string]bool)
    for len(worklist) > 0 {
        items := worklist
        worklist = nil
        for _, item := range items {
            if !seen[item] {
                seen[item] = true
                worklist = append(worklist, f(item)...)
            }
        }
    }
}

func crawl(url string) []string {
    fmt.Println(url)
    list, err := links.Extract(url)
    if err != nil {
        log.Print(err)
    }
    return list
}

func main() {
    // Crawl the web breadth-first,
    // starting from the command-line arguments.
    breadthFirst(crawl, os.Args[1:])
}

Crawling the web is, at its heart, a problem of graph traversal. The topoSort example showed a depth-first traversal; for our web crawler, we'll use breadth-first traversal, at least initially.

The breadthFirst function encapsulates the essence of a breadth-first traversal. The caller provides an initial list worklist of items to visit and a function value f to call for each item. Each item is identified by a string. The function f returns a list of new items to append to the worklist. The breadthFirst function returns when all items have been visited. It maintains a set of strings to ensure that no item is visited twice.

As we explained in passing, the argument "f(item)..." causes all the items in the list returned by f to be appended to the worklist.
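Because breadthFirst knows nothing about URLs, it can be exercised without a network. The sketch below runs it over a small in-memory graph; visitOrder and the graph contents are assumptions made for illustration.

```go
package main

import "fmt"

// breadthFirst calls f for each item in the worklist.
// Any items returned by f are added to the worklist.
func breadthFirst(f func(item string) []string, worklist []string) {
	seen := make(map[string]bool)
	for len(worklist) > 0 {
		items := worklist
		worklist = nil
		for _, item := range items {
			if !seen[item] {
				seen[item] = true
				worklist = append(worklist, f(item)...)
			}
		}
	}
}

// visitOrder records the order in which breadthFirst visits the
// nodes of a hypothetical in-memory link graph, starting at start.
func visitOrder(graph map[string][]string, start string) []string {
	var order []string
	breadthFirst(func(item string) []string {
		order = append(order, item)
		return graph[item] // the neighbors become new work items
	}, []string{start})
	return order
}

func main() {
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d", "a"}, // the cycle back to "a" is skipped via seen
	}
	fmt.Println(visitOrder(graph, "a")) // [a b c d]
}
```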

In our crawler, items are URLs. The crawl function we'll supply to breadthFirst prints the URL, extracts its links, and returns them so that they too are visited.

To start the crawler off, we'll use the command-line arguments as the initial URLs.

Let's crawl the web starting from https://golang.org. Here are some of the resulting links:

$ go run findlinks3.go https://golang.org
https://golang.org/
https://golang.org/doc/
https://golang.org/pkg/
https://golang.org/project/
https://code.google.com/p/go-tour/
https://golang.org/doc/code.html
https://www.youtube.com/watch?v=XCsL89YtqCs
http://research.swtch.com/gotour
https://vimeo.com/53221560
...

The process ends when all reachable web pages have been crawled or the memory of the computer is exhausted.

Variadic Functions

A variadic function is one that can be called with varying numbers of arguments. The most familiar examples are fmt.Printf and its variants. Printf requires one fixed argument at the beginning, then accepts any number of subsequent arguments.

To declare a variadic function, the type of the final parameter is preceded by an ellipsis, "...", which indicates that the function may be called with any number of arguments of this type.

func sum(vals ...int) int {
    total := 0
    for _, val := range vals {
        total += val
    }
    return total
}

The sum function above returns the sum of zero or more int arguments. Within the body of the function, the type of vals is an []int slice. When sum is called, any number of values may be provided for its vals parameter.

fmt.Println(sum()) // 0
fmt.Println(sum(3)) // 3
fmt.Println(sum(1, 2, 3, 4)) // 10

Implicitly, the caller allocates an array, copies the arguments into it, and passes a slice of the entire array to the function. The last call above thus behaves the same as the call below, which shows how to invoke a variadic function when the arguments are already in a slice: place an ellipsis after the final argument.

values := []int{1, 2, 3, 4}
fmt.Println(sum(values...)) // 10

Although the ...int parameter behaves like a slice within the function body, the type of a variadic function is distinct from the type of a function with an ordinary slice parameter.

func f(...int) {}
func g([]int) {}
fmt.Printf("%T\n", f) // "func(...int)"
fmt.Printf("%T\n", g) // "func([]int)"

Variadic functions are often used for string formatting. The errorf function below constructs a formatted error message with a line number at the beginning. The suffix f is a widely followed naming convention for variadic functions that accept a Printf-style format string.

func errorf(linenum int, format string, args ...interface{}) {
    fmt.Fprintf(os.Stderr, "Line %d: ", linenum)
    fmt.Fprintf(os.Stderr, format, args...)
    fmt.Fprintln(os.Stderr)
}
linenum, name := 12, "count"
errorf(linenum, "undefined: %s", name) // Line 12: undefined: count

The interface{} type means that this function can accept any values at all for its final arguments, as we'll explain in Chapter 7.

Deferred Function Calls

Syntactically, a defer statement is an ordinary function or method call prefixed by the keyword defer. The function and argument expressions are evaluated when the statement is executed, but the actual call is deferred until the function that contains the defer statement has finished, whether normally, by executing a return statement or falling off the end, or abnormally, by panicking. Any number of calls may be deferred; they are executed in the reverse of the order in which they were deferred.
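Both rules, early evaluation of arguments and last-in-first-out execution, can be observed in a small sketch. The helpers record and run are hypothetical names for this illustration.

```go
package main

import "fmt"

// record appends n to the trace so that we can see the call order.
func record(trace *[]int, n int) {
	*trace = append(*trace, n)
}

// run defers three calls and then makes one ordinary call.
func run(trace *[]int) {
	for i := 1; i <= 3; i++ {
		// The argument i is evaluated now, when the defer statement
		// executes; the call itself happens when run returns.
		defer record(trace, i)
	}
	record(trace, 0) // runs before any of the deferred calls
}

func main() {
	var trace []int
	run(&trace)
	fmt.Println(trace) // [0 3 2 1]
}
```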

A defer statement is often used with paired operations like open and close, connect and disconnect, or lock and unlock to ensure that resources are released in all cases, no matter how complex the control flow. The right place for a defer statement that releases a resource is immediately after the resource has been successfully acquired.
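As one possible illustration of the lock and unlock pairing, the sketch below guards a map with a sync.Mutex. The names cache, store, and lookup are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical shared map guarded by mu.
var (
	mu    sync.Mutex
	cache = make(map[string]int)
)

// store records value under key while holding the lock.
func store(key string, value int) {
	mu.Lock()
	defer mu.Unlock() // deferred right after the lock is acquired
	cache[key] = value
}

// lookup reports the value stored under key, if any.
func lookup(key string) (int, bool) {
	mu.Lock()
	defer mu.Unlock() // released on every return path
	v, ok := cache[key]
	return v, ok
}

func main() {
	store("answer", 42)
	fmt.Println(lookup("answer"))  // 42 true
	fmt.Println(lookup("missing")) // 0 false
}
```

If either function forgot to unlock, the next call would block forever; the deferred Unlock rules that out regardless of how the function returns.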

The defer statement can also be used to pair "on entry" and "on exit" actions when debugging a complex function. The bigSlowOperation function below calls trace immediately, which does the "on entry" action and then returns a function value that, when called, does the corresponding "on exit" action. By deferring a call to the returned function in this way, we can instrument the entry point and all exit points of a function in a single statement and even pass values, like the start time, between the two actions. But don't forget the final parentheses in the defer statement, or the "on entry" action will happen on exit and the on-exit action won't happen at all!

func bigSlowOperation() {
    defer trace("bigSlowOperation")() // don't forget the extra parentheses
    // ... lots of work ...
    time.Sleep(10 * time.Second) // simulate slow operation by sleeping
}

func trace(msg string) func() {
    start := time.Now()
    log.Printf("enter %s", msg)
    return func() { log.Printf("exit %s (%s)", msg, time.Since(start)) }
}

Each time bigSlowOperation is called, it logs its entry and exit and the elapsed time between them. (We used time.Sleep to simulate a slow operation.)

2017/09/21 11:35:15 enter bigSlowOperation
2017/09/21 11:35:25 exit bigSlowOperation (10.003859765s)

Deferred functions run after return statements have updated the function's result variables. Because an anonymous function can access its enclosing function's variables, including named results, a deferred anonymous function can observe the function's results.

Consider the function double:

func double(x int) int {
    return x + x
}

By naming its result variable and adding a defer statement, we can make the function print its arguments and results each time it is called.

func double(x int) (result int) {
    defer func() { fmt.Printf("double(%d) = %d\n", x, result) }()
    return x + x
}
_ = double(4) // Output: double(4) = 8

This trick is overkill for a function as simple as double but may be useful in functions with many return statements.

A deferred anonymous function can even change the values that the enclosing function returns to its caller:

func triple(x int) (result int) {
    defer func() { result += x }()
    return double(x)
}

fmt.Println(triple(4)) // 12

Because deferred functions aren't executed until the very end of a function's execution, a defer statement in a loop deserves extra scrutiny. The code below could run out of file descriptors since no file will be closed until all files have been processed:

func test() error {
    for _, filename := range filenames {
        f, err := os.Open(filename)
        if err != nil {
            return err
        }
        defer f.Close() // NOTE: risky; could run out of file descriptors
        // ... process f ...
    }
    return nil
}

One solution is to move the loop body, including the defer statement, into another function that is called on each iteration.

for _, filename := range filenames {
    if err := doFile(filename); err != nil {
        return err
    }
}

func doFile(filename string) error {
    f, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer f.Close()
    // ... process f ...
    return nil
}

Panic

Go's type system catches many mistakes at compile time, but others, like an out-of-bounds array access or nil pointer dereference, require checks at run time. When the Go runtime detects these mistakes, it panics.

During a typical panic, normal execution stops, all deferred function calls in that goroutine are executed, and the program crashes with a log message. This log message includes the panic value, which is usually an error message of some sort, and, for each goroutine, a stack trace showing the stack of function calls that were active at the time of the panic. This log message often has enough information to diagnose the root cause of the problem without running the program again, so it should always be included in a bug report about a panicking program.

Not all panics come from the runtime. The built-in panic function may be called directly; it accepts any value as an argument. A panic is often the best thing to do when some "impossible" situation happens, for instance, execution reaches a case that logically can't happen.
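A typical "impossible" case is a switch that is meant to cover every value of a type. In this sketch the Suit type and suitColor function are illustrative names; the final panic is reached only if the function is called with a value outside the declared constants.

```go
package main

import "fmt"

// Suit is a small enumeration of card suits.
type Suit int

const (
	Spades Suit = iota
	Hearts
	Diamonds
	Clubs
)

// suitColor returns the color of a suit. Reaching the end of the
// function means s is not one of the four declared suits, which
// indicates a bug in the caller, so we panic.
func suitColor(s Suit) string {
	switch s {
	case Spades, Clubs:
		return "black"
	case Hearts, Diamonds:
		return "red"
	}
	panic(fmt.Sprintf("invalid suit %d", s))
}

func main() {
	fmt.Println(suitColor(Hearts)) // red
	fmt.Println(suitColor(Clubs))  // black
}
```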

It's good practice to assert that the preconditions of a function hold, but this can easily be done to excess. Unless you can provide a more informative error message or detect an error sooner, there is no point asserting a condition that the runtime will check for you.

func Reset(x *Buffer) {
    if x == nil {
        panic("x is nil") // unnecessary!
    }
    x.elements = nil
}

Although Go's panic mechanism resembles exceptions in other languages, the situations in which panic is used are quite different. Since a panic causes the program to crash, it is generally used for grave errors, such as a logical inconsistency in the program; diligent programmers consider any crash to be proof of a bug in their code. In a robust program, "expected" errors, the kind that arise from incorrect input, misconfiguration, or failing I/O, should be handled gracefully; they are best dealt with using error values.

Consider the function regexp.Compile, which compiles a regular expression into an efficient form for matching. It returns an error if called with an ill-formed pattern, but checking this error is unnecessary and burdensome if the caller knows that a particular call cannot fail. In such cases, it's reasonable for the caller to handle an error by panicking, since it is believed to be impossible.

Since most regular expressions are literals in the program source code, the regexp package provides a wrapper function regexp.MustCompile that does this check:

package regexp

func Compile(expr string) (*Regexp, error) { /* ... */ }

func MustCompile(expr string) *Regexp {
    re, err := Compile(expr)
    if err != nil {
        panic(err)
    }
    return re
}

The wrapper function makes it convenient for clients to initialize a package-level variable with a compiled regular expression, like this:

var httpSchemeRE = regexp.MustCompile(`^https?:`) // "http:" or "https:"

Of course, MustCompile should not be called with untrusted input values. The Must prefix is a common naming convention for functions of this kind.

When a panic occurs, all deferred functions are run in reverse order, starting with those of the topmost function on the stack and proceeding up to main, as the program below demonstrates:

package main

import (
    "fmt"
)

func main() {
    f(3)
}

func f(x int) {
    fmt.Printf("f(%d)\n", x+0/x) // panics if x == 0
    defer fmt.Printf("defer %d\n", x)
    f(x - 1)
}

When run, the program prints the following to the standard output:

f(3)
f(2)
f(1)
defer 1
defer 2
defer 3

A panic occurs during the call f(0), causing the three deferred calls to fmt.Printf to run. Then the runtime terminates the program, printing the panic message and a stack dump to the standard error stream:

panic: runtime error: integer divide by zero

goroutine 1 [running]:
main.f(0x0)
        /Users/Jiang_Hu/Desktop/all/gopl/ch5/src/defer1.go:12 +0x1b1
main.f(0x1)
        /Users/Jiang_Hu/Desktop/all/gopl/ch5/src/defer1.go:14 +0x180
main.f(0x2)
        /Users/Jiang_Hu/Desktop/all/gopl/ch5/src/defer1.go:14 +0x180
main.f(0x3)
        /Users/Jiang_Hu/Desktop/all/gopl/ch5/src/defer1.go:14 +0x180
main.main()
        /Users/Jiang_Hu/Desktop/all/gopl/ch5/src/defer1.go:8 +0x2a
exit status 2

Methods

Although there is no universally accepted definition of object-oriented programming, for our purposes, an object is simply a value or variable that has methods, and a method is a function associated with a particular type. An object-oriented program is one that uses methods to express the properties and operations of each data structure so that clients need not access the object's representation directly.

We'll cover two key principles of object-oriented programming, encapsulation and composition.

Method Declarations

A method is declared with a variant of the ordinary function declaration in which an extra parameter appears before the function name. The parameter attaches the function to the type of that parameter.

The extra parameter is called the method's receiver, a legacy from early object-oriented languages that described calling a method as "sending a message to an object".

In Go, we don't use a special name like this or self for the receiver; we choose receiver names just as we would for any other parameter.

Methods may be declared on any named type defined in the same package, so long as its underlying type is neither a pointer nor an interface.

All methods of a given type must have unique names, but different types can use the same name for a method. This is the first benefit of using methods over ordinary functions: method names can be shorter. The benefit is magnified for calls originating outside the package, since they can use the shorter name and omit the package name.
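A minimal sketch of a method declaration, assuming a Point type with X and Y fields: the same computation is written once as a package-level function and once as a method, and the two names do not conflict because the method lives in the namespace of the Point type.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a location in the plane.
type Point struct{ X, Y float64 }

// Distance as a traditional function.
func Distance(p, q Point) float64 {
	return math.Hypot(q.X-p.X, q.Y-p.Y)
}

// Distance as a method of the Point type; p is the receiver.
func (p Point) Distance(q Point) float64 {
	return math.Hypot(q.X-p.X, q.Y-p.Y)
}

func main() {
	p := Point{1, 2}
	q := Point{4, 6}
	fmt.Println(Distance(p, q)) // 5, function call
	fmt.Println(p.Distance(q))  // 5, method call
}
```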

Methods with a Pointer Receiver

Because calling a function makes a copy of each argument value, if a function needs to update a variable, or if an argument is so large that we wish to avoid copying it, we must pass the address of the variable using a pointer. The same goes for methods that need to update the receiver variable: we attach them to the pointer type.

To avoid ambiguities, method declarations are not permitted on named types that are themselves pointer types:

type P *int
func (P) f() {} // compile error: invalid receiver type

If the receiver argument t is a variable of type T but the method requires a *T receiver, we can use the shorthand t.f(), and the compiler will perform an implicit &t on the variable.

Conversely, if the receiver argument has type *T and the receiver parameter has type T, the compiler implicitly dereferences the receiver, in other words, loads its value.
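Both implicit conversions can be seen in a short sketch, again assuming a Point type with a value-receiver method Distance and a pointer-receiver method ScaleBy.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a location in the plane.
type Point struct{ X, Y float64 }

// Distance has a value receiver: it works on a copy of the Point.
func (p Point) Distance(q Point) float64 {
	return math.Hypot(q.X-p.X, q.Y-p.Y)
}

// ScaleBy has a pointer receiver because it updates the Point.
func (p *Point) ScaleBy(factor float64) {
	p.X *= factor
	p.Y *= factor
}

func main() {
	p := Point{3, 4}
	p.ScaleBy(2)   // variable of type Point: implicit (&p).ScaleBy(2)
	fmt.Println(p) // {6 8}

	pptr := &p
	// pointer of type *Point: implicit (*pptr).Distance(...)
	fmt.Println(pptr.Distance(Point{0, 0})) // 10
}
```

The implicit &p works only for addressable variables; an expression such as Point{3, 4}.ScaleBy(2) does not compile because the address of a literal cannot be taken this way.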

If all the methods of a named type T have a receiver type of T itself (not *T), it is safe to copy instances of that type; calling any of its methods necessarily makes a copy. For example, time.Duration values are liberally copied, including as arguments to functions. But if any method has a pointer receiver, you should avoid copying instances of T because doing so may violate internal invariants. For example, copying an instance of bytes.Buffer would cause the original and the copy to alias the same underlying array of bytes. Subsequent method calls would have unpredictable effects.

Nil Is a Valid Receiver Value

Just as some functions allow nil pointers as arguments, so do some methods for their receiver, especially if nil is a meaningful zero value of the type, as with maps and slices. In this simple linked list of integers, nil represents the empty list:

// An IntList is a linked list of integers.
// A nil *IntList represents the empty list.
type IntList struct {
    Value int
    Tail *IntList
}

// Sum returns the sum of the list elements.
func (list *IntList) Sum() int {
    if list == nil {
        return 0
    }
    return list.Value + list.Tail.Sum()
}

When you define a type whose methods allow nil as a receiver value, it's worth pointing this out explicitly in its documentation comment, as we did above.
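
To see the nil receiver in action, here is a small usage sketch that repeats the IntList declaration so that it runs on its own. The sample values are arbitrary.

```go
package main

import "fmt"

// An IntList is a linked list of integers.
// A nil *IntList represents the empty list.
type IntList struct {
	Value int
	Tail  *IntList
}

// Sum returns the sum of the list elements.
func (list *IntList) Sum() int {
	if list == nil {
		return 0
	}
	return list.Value + list.Tail.Sum()
}

func main() {
	var empty *IntList
	fmt.Println(empty.Sum()) // 0: calling a method on a nil pointer is fine here

	list := &IntList{1, &IntList{2, &IntList{3, nil}}}
	fmt.Println(list.Sum()) // 6
}
```

The recursion ends precisely because Sum accepts a nil receiver: the final Tail is nil and contributes 0.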

Composing Types by Struct Embedding

A struct type may have more than one anonymous field. For example, had we declared ColoredPoint as

type ColoredPoint struct {
    Point
    color.RGBA
}

then a value of this type would have all the methods of Point, all the methods of RGBA, and any additional methods declared on ColoredPoint directly. When the compiler resolves a selector such as cp.ScaleBy to a method, it first looks for a directly declared method named ScaleBy, then for methods promoted once from ColoredPoint's embedded fields, then for methods promoted twice from embedded fields within Point and RGBA, and so on. The compiler reports an error if the selector was ambiguous because two methods were promoted from the same rank.

Methods can be declared only on named types (like Point) and pointers to them (*Point), but thanks to embedding, it's possible and sometimes useful for unnamed struct types to have methods too.

Here's a nice trick to illustrate. This example shows part of a simple cache implemented using two package-level variables, a mutex and the map that it guards:

var (
    mu sync.Mutex // guards mapping
    mapping = make(map[string]string)
)

func Lookup(key string) string {
    mu.Lock()
    v := mapping[key]
    mu.Unlock()
    return v
}

The version below is functionally equivalent but groups together the two related variables in a single package-level variable, cache:

var cache = struct {
    sync.Mutex
    mapping map[string]string
}{
    mapping: make(map[string]string),
}

func Lookup2(key string) string {
    cache.Lock()
    v := cache.mapping[key]
    cache.Unlock()
    return v
}

The new variable gives more expressive names to the variables related to the cache, and because the sync.Mutex field is embedded within it, its Lock and Unlock methods are promoted to the unnamed struct type, allowing us to lock the cache with a self-explanatory syntax.

Method Values and Expressions

A method expression, written T.f or (*T).f where T is a type, yields a function value with a regular first parameter taking the place of the receiver, so it can be called in the usual way.
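
As a minimal sketch, the program below forms one method expression from a value-receiver method and one from a pointer-receiver method, then prints their function types. The Distance and ScaleBy methods are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

type Point struct{ X, Y float64 }

func (p Point) Distance(q Point) float64 {
	return math.Hypot(q.X-p.X, q.Y-p.Y)
}

func (p *Point) ScaleBy(factor float64) {
	p.X *= factor
	p.Y *= factor
}

func main() {
	p := Point{1, 2}
	q := Point{4, 6}

	distance := Point.Distance   // the receiver becomes the first parameter
	fmt.Println(distance(p, q))  // 5
	fmt.Printf("%T\n", distance) // func(main.Point, main.Point) float64

	scale := (*Point).ScaleBy // the first parameter is a *Point
	scale(&p, 2)
	fmt.Println(p)            // {2 4}
	fmt.Printf("%T\n", scale) // func(*main.Point, float64)
}
```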

Method expressions can be helpful when you need a value to represent a choice among several methods belonging to the same type so that you can call the chosen method with many different receivers. In the following example, the variable op represents either the addition or the subtraction method of type Point, and Path.TranslateBy calls it for each point in the Path:

type Point struct{ X, Y float64 }

func (p Point) Add(q Point) Point { return Point{p.X + q.X, p.Y + q.Y} }
func (p Point) Sub(q Point) Point { return Point{p.X - q.X, p.Y - q.Y} }

type Path []Point

func (path Path) TranslateBy(offset Point, add bool) {
    var op func(p, q Point) Point
    if add {
        op = Point.Add
    } else {
        op = Point.Sub
    }
    for i := range path {
        // Call either path[i].Add(offset) or path[i].Sub(offset)
        path[i] = op(path[i], offset)
    }
}

Example: Bit Vector Type

Sets in Go are usually implemented as a map[T]bool, where T is the element type. A set represented by a map is very flexible but, for certain problems, a specialized representation may outperform it. For example, in domains such as dataflow analysis where set elements are small non-negative integers, sets have many elements, and set operations like union and intersection are common, a bit vector is ideal.

A bit vector uses a slice of unsigned integer values or "words", each bit of which represents a possible element of the set. The set contains i if the i-th bit is set. The following program demonstrates a simple bit vector type:

// An IntSet is a set of small non-negative integers.
// Its zero value represents the empty set.
type IntSet struct {
    words []uint64
}

// Has reports whether the set contains the non-negative value x.
func (s *IntSet) Has(x int) bool {
    word, bit := x/64, uint(x%64)
    return word < len(s.words) && s.words[word]&(1<<bit) != 0
}

// Add adds the non-negative value x to the set.
func (s *IntSet) Add(x int) {
    word, bit := x/64, uint(x%64)
    for word >= len(s.words) {
        s.words = append(s.words, 0)
    }
    s.words[word] |= 1 << bit
}

// UnionWith sets s to the union of s and t.
func (s *IntSet) UnionWith(t *IntSet) {
    for i, tword := range t.words {
        if i < len(s.words) {
            s.words[i] |= tword
        } else {
            s.words = append(s.words, tword)
        }
    }
}

// String returns the set as a string of the form "{1 2 3}".
func (s *IntSet) String() string {
    var buf bytes.Buffer
    buf.WriteByte('{')
    for i, word := range s.words {
        if word == 0 {
            continue
        }
        for j := 0; j < 64; j++ {
            if word&(1<<uint(j)) != 0 {
                if buf.Len() > len("{") {
                    buf.WriteByte(' ')
                }
                fmt.Fprintf(&buf, "%d", 64*i+j)
            }
        }
    }
    buf.WriteByte('}')
    return buf.String()
}

func main() {
    var x, y IntSet
    x.Add(1)
    x.Add(144)
    x.Add(9)
    fmt.Println(x.String()) // {1 9 144}

    y.Add(9)
    y.Add(42)
    fmt.Println(y.String()) // {9 42}

    x.UnionWith(&y)
    fmt.Println(x.String()) // {1 9 42 144}

    fmt.Println(x.Has(9), x.Has(123)) // true false
}

A word of caution: we declared String and Has as methods of the pointer type *IntSet not out of necessity, but for consistency with the other two methods, which need a pointer receiver because they assign to s.words. Consequently, an IntSet value does not have a String method, occasionally leading to surprises like this:

fmt.Println(&x) // {1 9 42 144}
fmt.Println(x.String()) // {1 9 42 144}
fmt.Println(x) // {[4398046511618 0 65536]}

In the first case, we print an *IntSet pointer, which does have a String method. In the second case, we call String() on an IntSet variable; the compiler inserts the implicit & operation, giving us a pointer, which has the String method. But in the third case, because the IntSet value does not have a String method, fmt.Println prints the representation of the struct instead. It's important not to forget the & operator. Making String a method of IntSet, not *IntSet, might be a good idea, but this is a case-by-case judgment.

Encapsulation

A variable or method of an object is said to be encapsulated if it is inaccessible to clients of the object. Encapsulation, sometimes called information hiding, is a key aspect of object-oriented programming.

Go has only one mechanism to control the visibility of names: capitalized identifiers are exported from the package in which they are defined, and uncapitalized names are not. The same mechanism that limits access to members of a package also limits access to the fields of a struct or the methods of a type. As a consequence, to encapsulate an object, we must make it a struct.

That's the reason the IntSet type from the previous section was declared as a struct type even though it has only a single field:

type IntSet struct {
    words []uint64
}

We could instead define IntSet as a slice type as follows, though of course we'd have to replace each occurrence of s.words by *s in its methods:

type IntSet []uint64

Although this version of IntSet would be essentially equivalent, it would allow clients from other packages to read and modify the slice directly. Put another way, whereas the expression *s could be used in any package, s.words may appear only in the package that defines IntSet.

Another consequence of this name-based mechanism is that the unit of encapsulation is the package, not the type as in many other languages. The fields of a struct type are visible to all code within the same package. Whether the code appears in a function or a method makes no difference.

Encapsulation provides three benefits. First, because clients cannot directly modify the object's variables, one need inspect fewer statements to understand the possible values of those variables.

Second, hiding implementation details prevents clients from depending on things that might change, which gives the designer greater freedom to evolve the implementation without breaking API compatibility.

The third benefit of encapsulation, and in many cases the most important, is that it prevents clients from setting an object's variables arbitrarily. Because the object's variables can be set only by functions in the same package, the author of that package can ensure that all those functions maintain the object's internal invariants. For example, the Counter type below permits clients to increment the counter or to reset it to zero, but not to set it to some arbitrary value:

type Counter struct{ n int }

func (c *Counter) N() int { return c.n }
func (c *Counter) Increment() { c.n++ }
func (c *Counter) Reset() { c.n = 0 }

Functions that merely access or modify internal values of a type, such as the methods of the Logger type from the log package, below, are called getters and setters. However, when naming a getter method, we usually omit the Get prefix. This preference for brevity extends to all methods, not just field accessors, and to other redundant prefixes as well, such as Fetch, Find, and Lookup.

package log

type Logger struct {
    flag int
    prefix string
    // ...
}

func (l *Logger) Flags() int
func (l *Logger) SetFlags(flag int)
func (l *Logger) Prefix() string
func (l *Logger) SetPrefix(prefix string)

In this chapter, we learned how to associate methods with named types, and how to call those methods. Although methods are crucial to object-oriented programming, they're only half the picture. To complete it, we need interfaces, the subject of the next chapter.

Interfaces

Interface types express generalizations or abstractions about the behaviors of other types. By generalizing, interfaces let us write functions that are more flexible and adaptable because they are not tied to the details of one particular implementation.

Many object-oriented languages have some notion of interfaces, but what makes Go's interfaces so distinctive is that they are satisfied implicitly. In other words, there's no need to declare all the interfaces that a given concrete type satisfies; simply possessing the necessary methods is enough. This design lets you create new interfaces that are satisfied by existing concrete types without changing the existing types, which is particularly useful for types defined in packages that you don't control.

Interfaces as Contracts

There is another kind of type in Go called an interface type. An interface is an abstract type. It doesn't expose the representation or internal structure of its values, or the set of basic operations they support; it reveals only some of their methods. When you have a value of an interface type, you know nothing about what it is; you know only what it can do, or more precisely, what behaviors are provided by its methods.

Interface Types

An interface type specifies a set of methods that a concrete type must possess to be considered an instance of that interface.

The io.Writer type is one of the most widely used interfaces because it provides an abstraction of all the types to which bytes can be written, which includes files, memory buffers, network connections, HTTP clients, archivers, hashers, and so on. The io package defines many other useful interfaces. A Reader represents any type from which you can read bytes, and a Closer is any value that you can close, such as a file or a network connection. (By now you've probably noticed the naming convention for many of Go's single-method interfaces.)

package io

type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}

type Closer interface {
    Close() error
}

Looking farther, we find declarations of new interface types as combinations of existing ones. Here are two examples:

type ReadWriter interface {
    Reader
    Writer
}

type ReadWriteCloser interface {
    Reader
    Writer
    Closer
}

The syntax used above, which resembles struct embedding, lets us name another interface as a shorthand for writing out all of its methods. This is called embedding an interface. We could have written io.ReadWriter without embedding, albeit less succinctly, like this:

type ReadWriter interface {
    Read(p []byte) (n int, err error)
    Write(p []byte) (n int, err error)
}

or even using a mixture of the two styles:

type ReadWriter interface {
    Read(p []byte) (n int, err error)
    Writer
}

All three declarations have the same effect. The order in which the methods appear is immaterial. All that matters is the set of methods.

Interface Satisfaction

A type satisfies an interface if it possesses all the methods the interface requires. For example, an *os.File satisfies io.Reader, Writer, Closer, and ReadWriter. A *bytes.Buffer satisfies Reader, Writer, and ReadWriter, but does not satisfy Closer because it does not have a Close method. As a shorthand, Go programmers often say that a concrete type "is a" particular interface type, meaning that it satisfies the interface. For example, a *bytes.Buffer is an io.Writer; an *os.File is an io.ReadWriter.

The assignability rule for interfaces is very simple: an expression may be assigned to an interface only if its type satisfies the interface. So:

var w io.Writer
w = os.Stdout // OK: *os.File has Write method
w = new(bytes.Buffer) // OK: *bytes.Buffer has Write method
w = time.Second // compile error: time.Duration lacks Write method

var rwc io.ReadWriteCloser
rwc = os.Stdout // OK: *os.File has Read, Write, Close methods
rwc = new(bytes.Buffer) // compile error: *bytes.Buffer lacks Close method

This rule applies even when the right-hand side is itself an interface:

w = rwc // OK: io.ReadWriteCloser has Write method
rwc = w // compile error: io.Writer lacks Close method

Before we go further, we should explain one subtlety in what it means for a type to have a method. For each named concrete type T, some of its methods have a receiver of type T itself whereas others require a *T pointer. Recall also that it is legal to call a *T method on an argument of type T so long as the argument is a variable; the compiler implicitly takes its address. But this is mere syntactic sugar: a value of type T does not possess all the methods that a *T pointer does, and as a result it might satisfy fewer interfaces.

An example will make this clear. The String method of the IntSet type requires a pointer receiver, so we cannot call that method on a non-addressable IntSet value:

type IntSet struct { /* ... */ }
func (*IntSet) String() string

var _ = IntSet{}.String() // compile error: String requires *IntSet receiver

but we can call it on an IntSet variable:

var s IntSet
var _ = s.String() // OK: s is a variable and &s has a String method

However, since only *IntSet has a String method, only *IntSet satisfies the fmt.Stringer interface:

var _ fmt.Stringer = &s // OK
var _ fmt.Stringer = s // compile error: IntSet lacks String method

What does the type interface{}, which has no methods at all, tell us about the concrete types that satisfy it? That's right: nothing. This may seem useless, but in fact the type interface{}, which is called the empty interface type, is indispensable. Because the empty interface type places no demands on the types that satisfy it, we can assign any value to the empty interface.

var any interface{}
any = true
any = 12.34
any = "hello"
any = map[string]int{"one": 1}
any = new(bytes.Buffer)

But we can do nothing directly to the value it holds, since the interface has no methods. We need a way to get the value back out again. We'll see how to do that using a type assertion later.

A concrete type may satisfy many unrelated interfaces.

Each grouping of concrete types based on their shared behaviors can be expressed as an interface type. Unlike class-based languages, in which the set of interfaces satisfied by a class is explicit, in Go we can define new abstractions or groupings of interest when we need them, without modifying the declaration of the concrete type. This is particularly useful when the concrete type comes from a package written by a different author.

Parsing Flags with flag.Value

In this section, we'll see how another standard interface, flag.Value, helps us define new notations for command-line flags.

It's easy to define new flag notations for our data types. We need only define a type that satisfies the flag.Value interface, whose declaration is below:

package flag
// Value is the interface to the value stored in a flag
type Value interface {
    String() string
    Set(string) error
}

The String method formats the flag's value for use in command-line help messages; thus every flag.Value is also a fmt.Stringer. The Set method parses its string argument and updates the flag value. In effect, the Set method is the inverse of the String method, and it is good practice for them to use the same notation.

Let's define a celsiusFlag type that allows a temperature to be specified in Celsius, or in Fahrenheit with an appropriate conversion. Notice that celsiusFlag embeds a Celsius, thereby getting a String method for free. To satisfy flag.Value, we need only declare the Set method:

// *celsiusFlag satisfies the flag.Value interface.
package main

import (
    "flag"
    "fmt"
)

type Celsius float64
type Fahrenheit float64

func (c Celsius) String() string { return fmt.Sprintf("%g°C", c) }

// FToC converts a Fahrenheit temperature to Celsius.
func FToC(f Fahrenheit) Celsius { return Celsius(f-32) * 5 / 9 }

type celsiusFlag struct{ Celsius }

func (f *celsiusFlag) Set(s string) error {
    var unit string
    var value float64
    fmt.Sscanf(s, "%f%s", &value, &unit) // no error check needed
    switch unit {
    case "C", "°C":
        f.Celsius = Celsius(value)
        return nil
    case "F", "°F":
        f.Celsius = FToC(Fahrenheit(value))
        return nil
    }
    return fmt.Errorf("invalid temperature %q", s)
}

// CelsiusFlag defines a Celsius flag with the specified name,
// default value, and usage, and returns the address of the flag variable.
// The flag argument must have a quantity and a unit, e.g., "100C".
func CelsiusFlag(name string, value Celsius, usage string) *Celsius {
    f := celsiusFlag{value}
    flag.CommandLine.Var(&f, name, usage)
    return &f.Celsius
}

var temp = CelsiusFlag("temp", 20.0, "the temperature")

func main() {
    flag.Parse()
    fmt.Println(*temp)
}

Here's a typical session:

$ ./tempflag
20°C
$ ./tempflag -temp -18C
-18°C
$ ./tempflag -temp 212°F
100°C
$ ./tempflag -temp 273.15K
invalid value "273.15K" for flag -temp: invalid temperature "273.15K"
Usage of ./tempflag:
  -temp value
        the temperature (default 20°C)
$ ./tempflag -help
Usage of ./tempflag:
  -temp value
        the temperature (default 20°C)

Interface Values

Conceptually, a value of an interface type, or interface value, has two components, a concrete type and a value of that type. These are called the interface's dynamic type and dynamic value.

For a statically typed language like Go, types are a compile-time concept, so a type is not a value. In our conceptual model, a set of values called type descriptors provide information about each type, such as its name and methods. In an interface value, the type component is represented by the appropriate type descriptor.

In the four statements below, the variable w takes on three values. (The initial and final values are the same.)

var w io.Writer
w = os.Stdout
w = new(bytes.Buffer)
w = nil

Let's take a closer look at the value and dynamic behavior of w after each statement. The first statement declares w:

var w io.Writer

In Go, variables are always initialized to a well-defined value, and interfaces are no exception. The zero value for an interface has both its type and value components set to nil.

An interface value is described as nil or non-nil based on its dynamic type, so this is a nil interface value. You can test whether an interface value is nil using w == nil or w != nil.

Calling any method of a nil interface value causes a panic:

w.Write([]byte("hello")) // panic: nil pointer dereference

The second statement assigns a value of type *os.File to w:

w = os.Stdout

This assignment involves an implicit conversion from a concrete type to an interface type, and is equivalent to the explicit conversion io.Writer(os.Stdout). A conversion of this kind, whether explicit or implicit, captures the type and the value of its operand. The interface value's dynamic type is set to the type descriptor for the pointer type *os.File, and its dynamic value holds a copy of os.Stdout, which is a pointer to the os.File variable representing the standard output of the process.

Calling the Write method on an interface value containing an *os.File pointer causes the (*os.File).Write method to be called. The call prints "hello".

w.Write([]byte("hello")) // "hello"

In general, we cannot know at compile time what the dynamic type of an interface value will be, so a call through an interface must use dynamic dispatch. Instead of a direct call, the compiler must generate code to obtain the address of the method named Write from the type descriptor, then make an indirect call to that address. The receiver argument for the call is a copy of the interface's dynamic value, os.Stdout. The effect is as if we had made this call directly:

os.Stdout.Write([]byte("hello")) // "hello"

The third statement assigns a value of type *bytes.Buffer to the interface value:

w = new(bytes.Buffer)

The dynamic type is now *bytes.Buffer and the dynamic value is a pointer to the newly allocated buffer.

A call to the Write method uses the same mechanism as before:

w.Write([]byte("hello")) // writes "hello" to the bytes.Buffer

This time, the type descriptor is *bytes.Buffer, so the (*bytes.Buffer).Write method is called, with the address of the buffer as the value of the receiver parameter. The call appends "hello" to the buffer.

Finally, the fourth statement assigns a nil to the interface value:

w = nil

This resets both its components to nil, restoring w to the same state as when it was declared.

An interface value can hold arbitrarily large dynamic values. For example, the time.Time type, which represents an instant in time, is a struct type with several unexported fields. If we create an interface value from it,

var x interface{} = time.Now()

the dynamic type is time.Time and the dynamic value is the entire struct.

Conceptually, the dynamic value always fits inside the interface value, no matter how large its type. (This is only a conceptual model; a realistic implementation is quite different.)

Interface values may be compared using == and !=. Two interface values are equal if both are nil, or their dynamic types are identical and their dynamic values are equal according to the usual behavior of == for that type. Because interface values are comparable, they may be used as the keys of a map or as the operand of a switch statement.

However, if two interface values are compared and have the same dynamic type, but that type is not comparable (a slice, for instance), then the comparison fails with a panic:

var x interface{} = []int{1, 2, 3}
fmt.Println(x == x) // panic: comparing uncomparable type []int

In this respect, interface types are unusual. Other types are either safely comparable (like basic types and pointers) or not comparable at all (like slices, maps, and functions), but when comparing interface values or aggregate types that contain interface values, we must be aware of the potential for a panic. A similar risk exists when using interfaces as map keys or switch operands. Only compare interface values if you are certain that they contain dynamic values of comparable types.
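
When the dynamic types are not known in advance, one defensive option is to wrap the comparison and recover from the panic. The sketch below is only one possible approach; safeEqual is a hypothetical helper, not a standard-library function.

```go
package main

import "fmt"

// safeEqual compares two interface values. It reports ok=false,
// instead of panicking, when both operands share a dynamic type
// that is not comparable.
func safeEqual(a, b interface{}) (equal, ok bool) {
	defer func() {
		if r := recover(); r != nil {
			equal, ok = false, false
		}
	}()
	return a == b, true
}

func main() {
	fmt.Println(safeEqual(1, 1))               // true true
	fmt.Println(safeEqual(1, "1"))             // false true: dynamic types differ
	fmt.Println(safeEqual([]int{1}, []int{1})) // false false: []int is not comparable
}
```

Note that the second comparison does not panic: values with different dynamic types are simply unequal, so the risk arises only when the dynamic types are identical.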

When handling errors, or during debugging, it is often helpful to report the dynamic type of an interface value. For that, we use the fmt package's %T verb:

var w io.Writer
fmt.Printf("%T\n", w) // "<nil>"
w = os.Stdout
fmt.Printf("%T\n", w) // "*os.File"
w = new(bytes.Buffer)
fmt.Printf("%T\n", w) // "*bytes.Buffer"

Internally, fmt uses reflection to obtain the name of the interface's dynamic type. We'll look at reflection in a later chapter.

Caveat: An Interface containing a Nil Pointer Is Non-Nil

A nil interface value, which contains no value at all, is not the same as an interface value containing a pointer that happens to be nil. This subtle distinction creates a trap into which every Go programmer has stumbled.

Consider the program below. With debug set to true, the main function collects the output of the function f in a bytes.Buffer.

const debug = true

func main() {
    var buf *bytes.Buffer
    if debug {
        buf = new(bytes.Buffer) // enable collection of output
    }
    f(buf) // NOTE: subtly incorrect!
    if debug {
        // ...use buf...
        fmt.Println(buf)
    }
}

// If out is non-nil, output will be written to it.
func f(out io.Writer) {
    // ...do something...
    if out != nil {
        out.Write([]byte("done!\n"))
    }
}

We might expect that changing debug to false would disable the collection of the output, but in fact it causes the program to panic during the out.Write call:

if out != nil {
    out.Write([]byte("done!\n")) // panic: nil pointer dereference
}

When main calls f, it assigns a nil pointer of type *bytes.Buffer to the out parameter, so the dynamic value of out is nil. However, its dynamic type is *bytes.Buffer, meaning that out is a non-nil interface containing a nil pointer value, so the defensive check out != nil is still true.

The problem is that although a nil *bytes.Buffer pointer has the methods needed to satisfy the interface, it doesn't satisfy the behavioral requirements of the interface. In particular, the call violates the implicit precondition of (*bytes.Buffer).Write that its receiver is not nil, so assigning the nil pointer to the interface was a mistake. The solution is to change the type of buf in main to io.Writer, thereby avoiding the assignment of the dysfunctional value to the interface in the first place:

var buf io.Writer
if debug {
    buf = new(bytes.Buffer) // enable collection of output
}
f(buf) // ok

Sorting with sort.Interface

Go's sort.Sort function assumes nothing about the representation of either the sequence or its elements. Instead, it uses an interface, sort.Interface, to specify the contract between the generic sort algorithm and each sequence type that may be sorted. An implementation of this interface determines both the concrete representation of the sequence, which is often a slice, and the desired ordering of its elements.

An in-place sort algorithm needs three things—the length of the sequence, a means of comparing two elements, and a way to swap two elements—so they are the three methods of sort.Interface:

package sort

type Interface interface {
    Len() int
    Less(i, j int) bool // i, j are indices of sequence elements
    Swap(i, j int)
}

To sort any sequence, we need to define a type that implements these three methods, then apply sort.Sort to an instance of that type.

By the way, when sorting a slice of structs, it is often better for each element to be indirect, that is, a pointer to a struct. Although the sort would work if we stored the structs directly, the sort function will swap many pairs of elements, so it will run faster if each element is a pointer, which is a single machine word, instead of an entire struct, which might be eight words or more.

The error Interface

The error type is an interface type with a single method that returns an error message:

type error interface {
    Error() string
}

The simplest way to create an error is by calling errors.New, which returns a new error for a given error message. The entire errors package is only four lines long:

package errors
func New(text string) error { return &errorString{text} }
type errorString struct { text string }
func (e *errorString) Error() string { return e.text }

The underlying type of errorString is a struct, not a string, to protect its representation from inadvertent (or premeditated) updates. And the reason that the pointer type *errorString, not errorString alone, satisfies the error interface is so that every call to New allocates a distinct error instance that is equal to no other. We would not want a distinguished error such as io.EOF to compare equal to one that merely happened to have the same message.

fmt.Println(errors.New("EOF") == errors.New("EOF")) // "false"

Calls to errors.New are relatively infrequent because there's a convenient wrapper function, fmt.Errorf, that does string formatting too:

package fmt
import "errors"

func Errorf(format string, args ...interface{}) error {
    return errors.New(Sprintf(format, args...))
}

Type Assertions

A type assertion is an operation applied to an interface value. Syntactically, it looks like x.(T), where x is an expression of an interface type and T is a type, called the asserted type. A type assertion checks that the dynamic type of its operand matches the asserted type.

There are two possibilities. First, if the asserted type T is a concrete type, then the type assertion checks whether x's dynamic type is identical to T. If this check succeeds, the result of the type assertion is x's dynamic value, whose type is of course T. In other words, a type assertion to a concrete type extracts the concrete value from its operand. If the check fails, then the operation panics.

Second, if instead the asserted type T is an interface type, then the type assertion checks whether x's dynamic type satisfies T. If this check succeeds, the dynamic value is not extracted; the result is still an interface value with the same type and value components, but the result has the interface type T. In other words, a type assertion to an interface type changes the type of the expression, making a different (and usually larger) set of methods accessible, but it preserves the dynamic type and value components inside the interface value.

Often we're not sure of the dynamic type of an interface value, and we'd like to test whether it is some particular type. If the type assertion appears in an assignment in which two results are expected, such as the following declarations, the operation does not panic on failure but instead returns an additional second result, a boolean indicating success:

var w io.Writer = os.Stdout
f, ok := w.(*os.File) // success: ok, f == os.Stdout
b, ok := w.(*bytes.Buffer) // failure: !ok, b == nil

The second result is conventionally assigned to a variable named ok. If the operation failed, ok is false, and the first result is equal to the zero value of the asserted type, which in this example is a nil *bytes.Buffer.

The ok result is often immediately used to decide what to do next. The extended form of the if statement makes this quite compact:

if f, ok := w.(*os.File); ok {
    // ...use f...
}

When the operand of a type assertion is a variable, rather than invent another name for the new local variable, you'll sometimes see the original name reused, shadowing the original, like this:

if w, ok := w.(*os.File); ok {
    // ...use w...
}

Type Switches

Interfaces are used in two distinct styles. In the first style, exemplified by io.Reader, io.Writer, fmt.Stringer, sort.Interface, http.Handler, and error, an interface's methods express the similarities of the concrete types that satisfy the interface but hide the representation details and intrinsic operations of those concrete types. The emphasis is on the methods, not on the concrete types.

The second style exploits the ability of an interface value to hold values of a variety of concrete types and considers the interface to be the union of those types. Type assertions are used to discriminate among these types dynamically and treat each case differently. In this style, the emphasis is on the concrete types that satisfy the interface, not on the interface's methods (if indeed it has any), and there is no hiding of information. We'll describe interfaces used this way as discriminated unions.

A switch statement simplifies an if-else chain that performs a series of value equality tests. An analogous type switch statement simplifies an if-else chain of type assertions.

In its simplest form, a type switch looks like an ordinary switch statement in which the operand is x.(type) (that's literally the keyword type) and each case has one or more types. A type switch enables a multi-way branch based on the interface value's dynamic type. The nil case matches if x == nil, and the default case matches if no other case does. A type switch looks like this:

switch x.(type) {
case nil: // ...
case int, uint: // ...
case bool: // ...
case string: // ...
default: // ...
}

Often the logic of a case, such as the bool and string cases above, needs access to the value extracted by the type assertion. Since this is typical, the type switch statement has an extended form that binds the extracted value to a new variable within each case:

switch x := x.(type) { /* ... */ }
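
To make the extended form concrete, here is a small sketch. The function describe and its output strings are invented for illustration. Within a single-type case the new x has that case's type; in a case listing several types, x keeps the type of the switch operand.

```go
package main

import "fmt"

// describe is a hypothetical helper that reports on x's dynamic type.
func describe(x interface{}) string {
	switch x := x.(type) {
	case nil:
		return "nil"
	case int, uint:
		// Several types in one case: x still has type interface{}.
		return fmt.Sprintf("integer %v", x)
	case bool:
		if x { // x has type bool here
			return "true"
		}
		return "false"
	case string:
		return fmt.Sprintf("string of length %d", len(x)) // x is a string
	default:
		return fmt.Sprintf("unexpected type %T", x)
	}
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe(42))
	fmt.Println(describe(true))
	fmt.Println(describe("go"))
	fmt.Println(describe(3.5))
}
```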

A Few Words of Advice

When designing a new package, novice Go programmers often start by creating a set of interfaces and only later define the concrete types that satisfy them. This approach results in many interfaces, each of which has only a single implementation. Don't do that. Such interfaces are unnecessary abstractions; they also have a run-time cost. You can restrict which methods of a type or fields of a struct are visible outside a package using the export mechanism. Interfaces are only needed when there are two or more concrete types that must be dealt with in a uniform way.

We make an exception to this rule when an interface is satisfied by a single concrete type but that type cannot live in the same package as the interface because of its dependencies. In that case, an interface is a good way to decouple two packages.

Because interfaces are used in Go only when they are satisfied by two or more types, they necessarily abstract away from the details of any particular implementation. The result is smaller interfaces with fewer, simpler methods, often just one as with io.Writer or fmt.Stringer. Small interfaces are easier to satisfy when new types come along. A good rule of thumb for interface design is: ask only for what you need.

This concludes our tour of methods and interfaces. Go has great support for the object-oriented style of programming, but this does not mean you need to use it exclusively. Not everything need be an object; standalone functions have their place, as do unencapsulated data types.

Goroutines and Channels

Go enables two styles of concurrent programming. This chapter presents goroutines and channels, which support communicating sequential processes or CSP, a model of concurrency in which values are passed between independent activities (goroutines) but variables are for the most part confined to a single activity. The next chapter covers some aspects of the more traditional model of shared memory multithreading, which will be familiar if you've used threads in other mainstream languages.

Goroutines

In Go, each concurrently executing activity is called a goroutine.

The differences between threads and goroutines are essentially quantitative, not qualitative.

When a program starts, its only goroutine is the one that calls the main function, so we call it the main goroutine. New goroutines are created by the go statement. Syntactically, a go statement is an ordinary function or method call prefixed by the keyword go. A go statement causes the function to be called in a newly created goroutine. The go statement itself completes immediately:

f() // call f(); wait for it to return
go f() // create a new goroutine that calls f(); don't wait

In the example below, the main goroutine computes the 45th Fibonacci number. Since it uses the terribly inefficient recursive algorithm, it runs for an appreciable time, during which we'd like to provide the user with a visual indication that the program is still running, by displaying an animated textual "spinner".

package main

import (
    "fmt"
    "time"
)

func main() {
    go spinner(100 * time.Millisecond)
    const n = 45
    fibN := fib(n) // slow
    fmt.Printf("\rFibonacci(%d) = %d\n", n, fibN)
}

func spinner(delay time.Duration) {
    for {
        for _, r := range `-\|/` {
            fmt.Printf("\r%c", r)
            time.Sleep(delay)
        }
    }
}

func fib(x int) int {
    if x < 2 {
        return x
    }
    return fib(x-1) + fib(x-2)
}

After several seconds of animation, the fib(45) call returns and the main function prints its result: Fibonacci(45) = 1134903170.

The main function then returns. When this happens, all goroutines are abruptly terminated and the program exits. Other than by returning from main or exiting the program, there is no programmatic way for one goroutine to stop another, but as we will see later, there are ways to communicate with a goroutine to request that it stop itself.

Notice how the program is expressed as the composition of two autonomous activities, spinning and Fibonacci computation. Each is written as a separate function but both make progress concurrently.

Example: Concurrent Clock Server

Networking is a natural domain in which to use concurrency since servers typically handle many connections from their clients at once, each client being essentially independent of the others. In this section, we'll introduce the net package, which provides the components for building networked client and server programs that communicate over TCP, UDP, or Unix domain sockets.

Our first example is a sequential clock server that writes the current time to the client once per second:

// Clock1 is a TCP server that periodically writes the time
package main

import (
    "io"
    "log"
    "net"
    "time"
)

func main() {
    listener, err := net.Listen("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Print(err) // e.g., connection aborted
            continue
        }
        handleConn(conn) // handle one connection at a time
    }
}

func handleConn(c net.Conn) {
    defer c.Close()
    for {
        _, err := io.WriteString(c, time.Now().Format("15:04:05\n"))
        if err != nil {
            return // e.g., client disconnected
        }
        time.Sleep(1 * time.Second)
    }
}

The Listen function creates a net.Listener, an object that listens for incoming connections on a network port, in this case TCP port localhost:8000. The listener's Accept method blocks until an incoming connection request is made, then returns a net.Conn object representing the connection.

The handleConn function handles one complete client connection. In a loop, it writes the current time, time.Now(), to the client. Since net.Conn satisfies the io.Writer interface, we can write directly to it. The loop ends when the write fails, most likely because the client has disconnected, at which point handleConn closes its side of the connection using a deferred call to Close and returns, so that main goes back to waiting for another connection request.

To connect to the server, we'll need a client program such as nc ("netcat"), a standard utility program for manipulating network connections:

Star-Wars:ch8 Jiang_Hu$ go run clock1.go &
[1] 25398
Star-Wars:ch8 Jiang_Hu$ nc localhost 8000
15:37:13
15:37:14
15:37:15
15:37:16
15:37:17
15:37:18
^C
Star-Wars:ch8 Jiang_Hu$

The client displays the time sent by the server each second until we interrupt the client with Control-C, which on Unix systems is echoed as ^C by the shell. If nc or netcat is not installed on your system, you can use telnet or this simple Go version of netcat that uses net.Dial to connect to a TCP server:

// Netcat1 is a read-only TCP client.
package main

import (
    "io"
    "log"
    "net"
    "os"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    mustCopy(os.Stdout, conn)
}

func mustCopy(dst io.Writer, src io.Reader) {
    if _, err := io.Copy(dst, src); err != nil {
        log.Fatal(err)
    }
}

This program reads data from the connection and writes it to the standard output until an end-of-file condition or an error occurs. The mustCopy function is a utility used in several examples in this section. Let's run two clients at the same time on different terminals, one shown to the left and one to the right:

$ ./netcat1
13:58:54                                    $ ./netcat1
13:58:55
13:58:56
^C       
                                            13:58:58
                                            13:58:59
                                            13:59:00
                                            ^C

$ killall clock1

The killall command is a Unix utility that kills all processes with the given name.

The second client must wait until the first client is finished because the server is sequential; it deals with only one client at a time. Just one small change is needed to make the server concurrent: adding the go keyword to the call to handleConn causes each call to run in its own goroutine.

for {
    conn, err := listener.Accept()
    if err != nil {
        log.Print(err) // e.g., connection aborted
        continue
    }
    go handleConn(conn) // handle connections concurrently
}

Now, multiple clients can receive the time at once.

Example: Concurrent Echo Server

The clock server used one goroutine per connection. In this section, we'll build an echo server that uses multiple goroutines per connection. Most echo servers merely write whatever they read, which can be done with this trivial version of handleConn:

func handleConn(c net.Conn) {
    io.Copy(c, c) // NOTE: ignoring errors
    c.Close()
}

A more interesting echo server might simulate the reverberations of a real echo, with the response loud at first ("HELLO!"), then moderate ("Hello!") after a delay, then quiet ("hello!") before fading to nothing, as in this version of handleConn, see reverb1.go:

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
    "strings"
    "time"
)

func echo(c net.Conn, shout string, delay time.Duration) {
    fmt.Fprintln(c, "\t", strings.ToUpper(shout))
    time.Sleep(delay)
    fmt.Fprintln(c, "\t", shout)
    time.Sleep(delay)
    fmt.Fprintln(c, "\t", strings.ToLower(shout))
}

func handleConn(c net.Conn) {
    input := bufio.NewScanner(c)
    for input.Scan() {
        echo(c, input.Text(), 1*time.Second)
    }
    // NOTE: ignoring potential errors from input.Err()
    c.Close()
}

func main() {
    l, err := net.Listen("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := l.Accept()
        if err != nil {
            log.Print(err) // e.g., connection aborted
            continue       // conn is not usable here; wait for the next connection
        }
        go handleConn(conn)
    }
}

We'll need to upgrade our client program so that it sends terminal input to the server while also copying the server response to the output, which presents another opportunity to use concurrency in netcat2.go:

// Netcat2 is a TCP client that sends standard input to the server
// and copies the server's responses to standard output.
package main

import (
    "io"
    "log"
    "net"
    "os"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    go mustCopy(os.Stdout, conn)
    mustCopy(conn, os.Stdin)
}

func mustCopy(dst io.Writer, src io.Reader) {
    if _, err := io.Copy(dst, src); err != nil {
        log.Fatal(err)
    }
}

While the main goroutine reads the standard input and sends it to the server, a second goroutine reads and prints the server's response.

In the session below, the client's input is left-aligned and the server's responses are indented.

The client shouts at the echo server three times:

$ ./reverb1 &
$ ./netcat2
Hello?
    HELLO?
    Hello?
    hello?
Is there anybody there?
    IS THERE ANYBODY THERE?
Yooo-hooo!
    Is there anybody there?
    is there anybody there?
    YOOO-HOOO!
    Yooo-hooo!
    yooo-hooo!
^D
$ killall reverb1

Notice that the third shout (Yooo-hooo!) from the client is not dealt with until the second shout has petered out, which is not very realistic. A real echo would consist of the composition of the three independent shouts. To simulate it, we'll need more goroutines. Again, all we need to do is add the go keyword, this time to the call to echo:

func handleConn(c net.Conn) {
    input := bufio.NewScanner(c)
    for input.Scan() {
        go echo(c, input.Text(), 1*time.Second)
    }
    // NOTE: ignoring potential errors from input.Err()
    c.Close()
}

The arguments to the function started by go are evaluated when the go statement itself is executed; thus input.Text() is evaluated in the goroutine that runs handleConn, before the new goroutine starts.
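
The evaluation rule can be checked in isolation. In this sketch (the function name argAtGoStatement is invented for illustration), the argument is captured when the go statement runs, so a later assignment to x does not affect the value the new goroutine sees.

```go
package main

import "fmt"

// argAtGoStatement returns the value seen by the new goroutine and the
// final value of x in the calling goroutine.
func argAtGoStatement() (seen, final int) {
	x := 1
	result := make(chan int)
	go func(v int) { result <- v }(x) // x is evaluated here, so v == 1
	x = 2                             // too late to change v
	return <-result, x
}

func main() {
	fmt.Println(argAtGoStatement()) // 1 2
}
```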

Now the echoes are concurrent and overlap in time:

$ ./reverb2 &
$ ./netcat2
Is there anybody there?
    IS THERE ANYBODY THERE?
Yooo-hooo!
    Is there anybody there?
    YOOO-HOOO!
    is there anybody there?
    Yooo-hooo!
    yooo-hooo!
^D

All that was required to make the server use concurrency, not just to handle connections from multiple clients but even within a single connection, was the insertion of two go keywords.

However, in adding these keywords, we had to consider carefully that it is safe to call methods of net.Conn concurrently, which is not true for most types. We'll discuss the crucial concept of concurrency safety in the next chapter.

Channels

If goroutines are the activities of a concurrent Go program, channels are the connections between them. A channel is a communication mechanism that lets one goroutine send values to another goroutine. Each channel is a conduit for values of a particular type, called the channel's element type. The type of a channel whose elements have type int is written chan int.

To create a channel, we use the built-in make function:

ch := make(chan int) // ch has type 'chan int'

As with maps, a channel is a reference to the data structure created by make. When we copy a channel or pass one as an argument to a function, we are copying a reference, so caller and callee refer to the same data structure. As with other reference types, the zero value of a channel is nil.

Two channels of the same type may be compared using ==. The comparison is true if both are references to the same channel data structure. A channel may also be compared to nil.
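
A short sketch of reference semantics and comparison. The names compare, sendOne, and viaCopy are invented for illustration.

```go
package main

import "fmt"

// sendOne receives a copy of the channel reference; the send is visible
// to whoever holds the original.
func sendOne(ch chan int) { ch <- 1 }

// compare reports the results of three channel comparisons.
func compare() (same, different, zeroIsNil bool) {
	a := make(chan int)
	b := a               // copies the reference, not the channel
	c := make(chan int)  // a distinct channel of the same type
	var z chan int       // zero value of a channel type
	return a == b, a != c, z == nil
}

// viaCopy shows that caller and callee share one channel.
func viaCopy() int {
	ch := make(chan int)
	go sendOne(ch)
	return <-ch
}

func main() {
	fmt.Println(compare()) // true true true
	fmt.Println(viaCopy()) // 1
}
```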

A channel has two principal operations, send and receive, collectively known as communications. A send statement transmits a value from one goroutine, through the channel, to another goroutine executing a corresponding receive expression. Both operations are written using the <- operator. In a send statement, the <- separates the channel and value operands. In a receive expression, <- precedes the channel operand. A receive expression whose result is not used is a valid statement.

ch <- x // a send statement

x = <-ch // a receive expression in an assignment statement
<-ch // a receive statement; result is discarded

Channels support a third operation, close, which sets a flag indicating that no more values will ever be sent on this channel; subsequent attempts to send will panic. Receive operations on a closed channel yield the values that have been sent until no more values are left; any receive operations thereafter complete immediately and yield the zero value of the channel's element type.

To close a channel, we call the built-in close function:

close(ch)

A channel created with a simple call to make is called an unbuffered channel, but make accepts an optional second argument, an integer called the channel's capacity. If the capacity is non-zero, make creates a buffered channel.

ch = make(chan int) // unbuffered channel
ch = make(chan int, 0) // unbuffered channel
ch = make(chan int, 3) // buffered channel with capacity 3
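
The behavior of receives on a closed channel is easy to observe with a buffered channel, since a single goroutine can send, close, and then receive. This is a sketch only; the function name drain is invented for illustration.

```go
package main

import "fmt"

// drain sends two values on a buffered channel, closes it, and then
// receives four times.
func drain() []int {
	ch := make(chan int, 3)
	ch <- 10
	ch <- 20
	close(ch)

	var got []int
	for i := 0; i < 4; i++ {
		got = append(got, <-ch) // the sent values first, then zero values
	}
	return got
}

func main() {
	fmt.Println(drain()) // [10 20 0 0]
}
```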

Unbuffered Channels

A send operation on an unbuffered channel blocks the sending goroutine until another goroutine executes a corresponding receive on the same channel, at which point the value is transmitted and both goroutines may continue. Conversely, if the receive operation was attempted first, the receiving goroutine is blocked until another goroutine performs a send on the same channel.

Communication over an unbuffered channel causes the sending and receiving goroutines to synchronize. Because of this, unbuffered channels are sometimes called synchronous channels. When a value is sent on an unbuffered channel, the receipt of the value happens before the reawakening of the sending goroutine.

In discussions of concurrency, when we say x happens before y, we don't mean merely that x occurs earlier in time than y; we mean that it is guaranteed to do so and that all its prior effects, such as updates to variables, are complete and that you may rely on them.

When x neither happens before y nor after y, we say that x is concurrent with y. This doesn't mean that x and y are necessarily simultaneous, merely that we cannot assume anything about their ordering. As we'll see in the next chapter, it's necessary to order certain events during the program's execution to avoid the problems that arise when two goroutines access the same variable concurrently.
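
As a sketch of relying on this ordering: the background goroutine below writes msg and then sends on an unbuffered channel, and the caller reads msg only after the receive completes. Because the write precedes the send and the send happens before the receive completes, the read is guaranteed to see the write. The names handoff, msg, and ready are invented for illustration.

```go
package main

import "fmt"

// handoff passes ownership of msg from a background goroutine to the
// caller, using an unbuffered channel purely for synchronization.
func handoff() string {
	var msg string
	ready := make(chan struct{})

	go func() {
		msg = "written before send" // ordered before the send below
		ready <- struct{}{}
	}()

	<-ready    // completes only once the send has taken place
	return msg // safe: the write to msg happens before this read
}

func main() {
	fmt.Println(handoff())
}
```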

The client program in netcat2.go copies input to the server in its main goroutine, so the client program terminates as soon as the input stream closes, even if the background goroutine is still working. To make the program wait for the background goroutine to complete before exiting, we use a channel to synchronize the two goroutines:

func main() {
    conn, err := net.Dial("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    done := make(chan struct{})
    go func() {
        io.Copy(os.Stdout, conn) // NOTE: ignoring errors
        log.Println("done")
        done <- struct{}{} // signal the main goroutine
    }()
    mustCopy(conn, os.Stdin)
    conn.Close()
    <-done // wait for background goroutine to finish
}

When the user closes the standard input stream, mustCopy returns and the main goroutine calls conn.Close(), closing both halves of the network connection. Closing the write half of the connection causes the server to see an end-of-file condition. Closing the read half causes the background goroutine's call to io.Copy to return a "read from closed connection" error.

Before it returns, the background goroutine logs a message, then sends a value on the done channel. The main goroutine waits until it has received this value before returning. As a result, the program always logs the "done" message before exiting.

Messages sent over channels have two important aspects. Each message has a value, but sometimes the fact of communication and the moment at which it occurs are just as important. We call messages events when we wish to stress this aspect. When the event carries no additional information, that is, its sole purpose is synchronization, we'll emphasize this by using a channel whose element type is struct{}, though it's common to use a channel of bool or int for the same purpose since done <- 1 is shorter than done <- struct{}{}.

Pipelines

Channels can be used to connect goroutines together so that the output of one is the input to another. This is called a pipeline. The program below consists of three goroutines connected by two channels, as shown schematically below:

Counter --naturals--> Squarer --squares--> Printer

The corresponding program:

// Pipeline1 demonstrates an infinite 3-stage pipeline.
package main

import (
    "fmt"
)

func main() {
    naturals := make(chan int)
    squares := make(chan int)

    // Counter
    go func() {
        for x := 0; ; x++ {
            naturals <- x
        }
    }()

    // Squarer
    go func() {
        for {
            x := <-naturals
            squares <- x * x
        }
    }()

    // Printer (in main goroutine)
    for {
        fmt.Println(<-squares)
    }
}

As you might expect, the program prints the infinite series of squares 0, 1, 4, 9 and so on. But what if we want to send only a finite number of values through the pipeline?

If the sender knows that no further values will ever be sent on a channel, it is useful to communicate this fact to the receiver goroutines so that they can stop waiting. This is accomplished by closing the channel using the built-in close function: close(naturals).

After a channel has been closed, any further send operations on it will panic. After the closed channel has been drained, that is, after the last sent element has been received, all subsequent receive operations will proceed without blocking but will yield a zero value. Closing the naturals channel above would cause the squarer's loop to spin as it receives a never-ending stream of zero values, and to send these zeros to the printer.

There is no way to test directly whether a channel has been closed, but there is a variant of the receive operation that produces two results: the received channel element, plus a boolean value, conventionally called ok, which is true for a successful receive and false for a receive on a closed and drained channel. Using this feature, we can modify the squarer's loop to stop when the naturals channel is drained and close the squares channel in turn.

// Squarer
go func() {
    for {
        x, ok := <-naturals
        if !ok {
            break // channel was closed and drained
        }
        squares <- x * x
    }
    close(squares)
}()

Because the syntax above is clumsy and this pattern is common, the language lets us use a range loop to iterate over channels too. This is a more convenient syntax for receiving all the values sent on a channel and terminating the loop after the last one.

In the pipeline below, when the counter goroutine finishes its loop after 100 elements, it closes the naturals channel, causing the squarer to finish its loop and close the squares channel. (In a more complex program, it might make sense for the counter and squarer functions to defer the calls to close at the outset.) Finally, the main goroutine finishes its loop and the program exits.

package main

import (
    "fmt"
)

func main() {
    naturals := make(chan int)
    squares := make(chan int)

    // Counter
    go func() {
        for x := 0; x < 100; x++ {
            naturals <- x
        }
        close(naturals)
    }()

    // Squarer
    go func() {
        for x := range naturals {
            squares <- x * x
        }
        close(squares)
    }()

    // Printer (in main goroutine)
    for x := range squares {
        fmt.Println(x)
    }
}

You needn't close every channel when you've finished with it. It's only necessary to close a channel when it is important to tell the receiving goroutines that all data have been sent. A channel that the garbage collector determines to be unreachable will have its resources reclaimed whether or not it is closed. (Don't confuse this with the close operation for open files. It is important to call the Close method on every file when you've finished with it.)

Attempting to close an already-closed channel causes a panic, as does closing a nil channel.
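
Both panics can be observed with recover. This is a small sketch; the helper closePanics is invented for illustration.

```go
package main

import "fmt"

// closePanics reports whether close(ch) panics.
func closePanics(ch chan int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	close(ch)
	return false
}

func main() {
	ch := make(chan int)
	fmt.Println(closePanics(ch))  // false: the first close succeeds
	fmt.Println(closePanics(ch))  // true: ch is already closed
	fmt.Println(closePanics(nil)) // true: closing a nil channel
}
```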

Unidirectional Channel Types

The Go type system provides unidirectional channel types that expose only one or the other of the send and receive operations. The type chan<- int, a send-only channel of int, allows sends but not receives. Conversely, the type <-chan int, a receive-only channel of int, allows receives but not sends. (The position of the <- arrow relative to the chan keyword is a mnemonic.) Violations of this discipline are detected at compile time.

Since the close operation asserts that no more sends will occur on a channel, only the sending goroutine is in a position to call it, and for this reason it is a compile-time error to attempt to close a receive-only channel.

Here's the squaring pipeline once more, this time with unidirectional channel types:

package main

import (
    "fmt"
)

func main() {
    naturals := make(chan int)
    squares := make(chan int)

    go counter(naturals)
    go squarer(squares, naturals)
    printer(squares)
}

func counter(out chan<- int) {
    for x := 0; x < 100; x++ {
        out <- x
    }
    close(out)
}

func squarer(out chan<- int, in <-chan int) {
    for v := range in {
        out <- v * v
    }
    close(out)
}

func printer(in <-chan int) {
    for v := range in {
        fmt.Println(v)
    }
}

The call counter(naturals) implicitly converts naturals, a value of type chan int, to the type of the parameter, chan<- int. The printer(squares) call does a similar implicit conversion to <-chan int. Conversions from bidirectional to unidirectional channel types are permitted in any assignment. There is no going back, however: once you have a value of a unidirectional type such as chan<- int, there is no way to obtain from it a value of type chan int that refers to the same channel data structure.
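
The same conversions can be written as ordinary assignments. In this sketch (the function name roundTrip is invented for illustration), the commented-out lines are the operations the compiler would reject.

```go
package main

import "fmt"

// roundTrip sends v through one channel viewed via two unidirectional
// types and returns what comes out.
func roundTrip(v int) int {
	ch := make(chan int, 1) // buffered so one goroutine can send then receive

	var out chan<- int = ch // implicit conversion to send-only
	var in <-chan int = ch  // implicit conversion to receive-only

	out <- v
	// <-out     // compile error: receive from a send-only channel
	// in <- v   // compile error: send to a receive-only channel
	// close(in) // compile error: close of a receive-only channel
	return <-in
}

func main() {
	fmt.Println(roundTrip(7)) // 7
}
```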

Buffered Channels

A buffered channel has a queue of elements. A send operation on a buffered channel inserts an element at the back of the queue, and a receive operation removes an element from the front. If the channel is full, the send operation blocks its goroutine until space is made available by another goroutine's receive. Conversely, if the channel is empty, a receive operation blocks until a value is sent by another goroutine.
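
A minimal sketch of the queue behavior, using the built-in len and cap functions to inspect the buffer. The function name bufferedDemo is invented for illustration.

```go
package main

import "fmt"

// bufferedDemo fills a channel of capacity 3, receives twice, and reports
// the order of receipt along with the remaining length and the capacity.
func bufferedDemo() (order []string, remaining, capacity int) {
	ch := make(chan string, 3)
	ch <- "A"
	ch <- "B"
	ch <- "C" // the buffer is now full; a fourth send would block

	order = append(order, <-ch) // "A": receives come from the front
	order = append(order, <-ch) // "B"
	return order, len(ch), cap(ch)
}

func main() {
	fmt.Println(bufferedDemo()) // [A B] 1 3
}
```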

Novices are sometimes tempted to use buffered channels within a single goroutine as a queue, lured by their pleasingly simple syntax, but this is a mistake. Channels are deeply connected to goroutine scheduling, and without another goroutine receiving from the channel, a sender—and perhaps the whole program—risks becoming blocked forever. If all you need is a simple queue, make one using a slice.
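
For the single-goroutine case, a slice-based queue might look like the following sketch. The type queue and its methods push and pop are invented for illustration; it is not concurrency-safe and is not meant to be.

```go
package main

import "fmt"

// queue is a hypothetical FIFO queue of ints backed by a slice.
type queue []int

// push appends v at the back of the queue.
func (q *queue) push(v int) { *q = append(*q, v) }

// pop removes and returns the front element; ok is false if q is empty.
func (q *queue) pop() (v int, ok bool) {
	if len(*q) == 0 {
		return 0, false // unlike a channel receive, this never blocks
	}
	v = (*q)[0]
	*q = (*q)[1:]
	return v, true
}

func main() {
	var q queue
	q.push(1)
	q.push(2)
	v, _ := q.pop()
	fmt.Println(v, len(q)) // 1 1
}
```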

Multiplexing with select

The general form of a select statement resembles a switch statement: it has a number of cases and an optional default. Each case specifies a communication (a send or receive operation on some channel) and an associated block of statements.

A select waits until a communication for some case is ready to proceed. It then performs that communication and executes the case's associated statements; the other communications do not happen. A select with no cases, select{}, waits forever.

If multiple cases are ready, select picks one at random, which ensures that every channel has an equal chance of being selected.
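
Here is a small sketch of select in which exactly one case is ready on each iteration, so the result is deterministic. The channel has a buffer of size 1, so it is alternately empty (only the send can proceed) and full (only the receive can proceed). The function name alternate is invented for illustration.

```go
package main

import "fmt"

// alternate loops ten times over a select with one send case and one
// receive case on the same channel of capacity 1.
func alternate() []int {
	ch := make(chan int, 1)
	var got []int
	for i := 0; i < 10; i++ {
		select {
		case x := <-ch: // ready only when the buffer is full
			got = append(got, x)
		case ch <- i: // ready only when the buffer is empty
		}
	}
	return got
}

func main() {
	fmt.Println(alternate()) // [0 2 4 6 8]
}
```

With a larger buffer, both cases could be ready at once and the output would no longer be deterministic.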

The zero value for a channel is nil. Perhaps surprisingly, nil channels are sometimes useful. Because send and receive operations on a nil channel block forever, a case in a select statement whose channel is nil is never selected. This lets us use nil to enable or disable cases that correspond to features like handling timeouts or cancellation, responding to other input events, or emitting output.
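
One way to use this, sketched under the assumption that we want to read from two channels until both are closed: once a channel is closed, set the local variable to nil so that its case is never selected again. Without that step, the closed channel's case would always be ready. The function name sumBoth is invented for illustration.

```go
package main

import "fmt"

// sumBoth adds up every value received from a and b, returning once both
// channels have been closed and drained.
func sumBoth(a, b <-chan int) int {
	sum := 0
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // disable this case from now on
				continue
			}
			sum += v
		case v, ok := <-b:
			if !ok {
				b = nil // disable this case from now on
				continue
			}
			sum += v
		}
	}
	return sum
}

func main() {
	a := make(chan int, 2)
	b := make(chan int, 1)
	a <- 1
	a <- 2
	b <- 10
	close(a)
	close(b)
	fmt.Println(sumBoth(a, b)) // 13
}
```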

Cancellation

Sometimes we need to instruct a goroutine to stop what it is doing, for example, in a web server performing a computation on behalf of a client that has disconnected.

There is no way for one goroutine to terminate another directly, since that would leave all its shared variables in undefined states.

Recall that after a channel has been closed and drained of all sent values, subsequent receive operations proceed immediately, yielding zero values. We can exploit this to create a broadcast mechanism: don't send a value on the channel, close it.
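
A minimal sketch of the broadcast: every worker blocks on a receive from done, and a single close releases all of them. The function name broadcast and the stopped channel are invented for illustration.

```go
package main

import "fmt"

// broadcast starts n goroutines that wait for cancellation, closes the
// done channel once, and returns how many goroutines observed it.
func broadcast(n int) int {
	done := make(chan struct{})
	stopped := make(chan int)

	for i := 0; i < n; i++ {
		go func(id int) {
			<-done        // blocks until done is closed
			stopped <- id // report that this worker saw the cancellation
		}(i)
	}

	close(done) // one close wakes every receiver

	count := 0
	for i := 0; i < n; i++ {
		<-stopped
		count++
	}
	return count
}

func main() {
	fmt.Println(broadcast(5)) // 5
}
```

A single send on done would have released only one of the workers; closing it releases them all, including any that start receiving later.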

Concurrency with Shared Variables

In this chapter, we'll take a closer look at the mechanics of concurrency. In particular, we'll point out some of the problems associated with sharing variables among multiple goroutines, the analytical techniques for recognizing those problems, and the patterns for solving them. Finally, we'll explain some of the technical differences between goroutines and operating system threads.

Race Conditions

In general we don't know whether an event x in one goroutine happens before an event y in another goroutine, or happens after it, or is simultaneous with it. When we cannot confidently say that one event happens before the other, then the events x and y are concurrent.

Consider a function that works correctly in a sequential program. That function is concurrency-safe if it continues to work correctly even when called concurrently, that is, from two or more goroutines with no additional synchronization. We can generalize this notion to a set of collaborating functions, such as the methods and operations of a particular type. A type is concurrency-safe if all its accessible methods and operations are concurrency-safe.

We avoid concurrent access to most variables either by confining them to a single goroutine or by maintaining a higher-level invariant of mutual exclusion. In contrast, exported package-level functions are generally expected to be concurrency-safe. Since package-level variables cannot be confined to a single goroutine, functions that modify them must enforce mutual exclusion.

There are many reasons a function might not work when called concurrently, including deadlock, livelock, and resource starvation. We don't have space to discuss all of them, so we'll focus on the most important one, the race condition.

One particular kind of race condition is called a data race. A data race occurs whenever two goroutines access the same variable concurrently and at least one of the accesses is a write. It follows from this definition that there are three ways to avoid a data race.

The first way is not to write the variable. Data structures that are never modified or are immutable are inherently concurrency-safe and need no synchronization.

The second way to avoid a data race is to avoid accessing the variable from multiple goroutines.

The third way to avoid a data race is to allow many goroutines to access the variable, but only one at a time. This approach is known as mutual exclusion and is the subject of the next section.
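
The second way, confinement, can be sketched with a goroutine that owns the variable and serves requests over channels. The type bank, its teller method, and depositConcurrently are names invented for illustration; the teller goroutine is never stopped, which is acceptable only in a sketch.

```go
package main

import "fmt"

// bank is a hypothetical account whose balance is confined to one goroutine.
type bank struct {
	deposits chan int // amounts to add
	balances chan int // current balance, on request
}

func newBank() *bank {
	b := &bank{deposits: make(chan int), balances: make(chan int)}
	go b.teller()
	return b
}

// teller is the only goroutine that ever reads or writes balance.
func (b *bank) teller() {
	balance := 0
	for {
		select {
		case amount := <-b.deposits:
			balance += amount
		case b.balances <- balance:
		}
	}
}

func (b *bank) Deposit(amount int) { b.deposits <- amount }
func (b *bank) Balance() int       { return <-b.balances }

// depositConcurrently makes n deposits of 1 from n goroutines and
// returns the final balance.
func depositConcurrently(n int) int {
	b := newBank()
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			b.Deposit(1)
			done <- struct{}{}
		}()
	}
	for i := 0; i < n; i++ {
		<-done
	}
	return b.Balance()
}

func main() {
	fmt.Println(depositConcurrently(100)) // 100
}
```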

Mutual Exclusion: sync.Mutex

A buffered channel can serve as a counting semaphore. With a capacity of 1, it ensures that at most one goroutine accesses a shared variable at a time. A semaphore that counts only to 1 is called a binary semaphore.

var (
    sema = make(chan struct{}, 1) // a binary semaphore guarding balance
    balance int
)

func Deposit(amount int) {
    sema <- struct{}{} // acquire token
    balance = balance + amount
    <-sema // release token
}

func Balance() int {
    sema <- struct{}{} // acquire token
    b := balance
    <-sema // release token
    return b
}

This pattern of mutual exclusion is so useful that it is supported directly by the Mutex type from the sync package. Its Lock method acquires the token (called a lock) and its Unlock method releases it:

var (
    mu sync.Mutex // guards balance
    balance int
)

func Deposit(amount int) {
    mu.Lock()
    balance = balance + amount
    mu.Unlock()
}

func Balance() int {
    mu.Lock()
    b := balance
    mu.Unlock()
    return b
}

Each function acquires a mutex lock at the beginning and releases it at the end, thereby ensuring that the shared variables are not accessed concurrently.

In more complex critical sections, especially those in which errors must be dealt with by returning early, it can be hard to tell that calls to Lock and Unlock are strictly paired on all paths. Go's defer statement comes to the rescue: by deferring a call to Unlock, the critical section implicitly extends to the end of the current function, freeing us from having to remember to insert Unlock calls in one or more places far from the call to Lock.

func Balance() int {
    mu.Lock()
    defer mu.Unlock()
    return balance
}

In the example above, the Unlock executes after the return statement has read the value of balance, so the Balance function is concurrency-safe. As a bonus, we no longer need the local variable b.

Furthermore, a deferred Unlock will run even if the critical section panics, which may be important in programs that make use of recover. A defer is marginally more expensive than an explicit call to Unlock, but not enough to justify less clear code. As always with concurrent programs, favor clarity and resist premature optimization. Where possible, use defer and let critical sections extend to the end of a function.

Go's mutexes are not re-entrant: it's not possible to lock a mutex that's already locked. A goroutine that tries to do so blocks forever, which leads to a deadlock where nothing can proceed.
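
A common way to live with non-re-entrant mutexes is to split a function in two: an exported function that acquires the lock, and an unexported helper that assumes the lock is already held. The sketch below applies this to the bank example. The Withdraw function and the lowercase deposit helper are additions made for illustration and do not appear in the code above.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.Mutex // guards balance
	balance int
)

// deposit requires that the caller already holds mu.
func deposit(amount int) { balance += amount }

// Deposit acquires the lock and delegates to deposit.
func Deposit(amount int) {
	mu.Lock()
	defer mu.Unlock()
	deposit(amount)
}

// Withdraw must not call Deposit while holding mu: Deposit would try
// to lock mu a second time and block forever. It calls deposit instead.
func Withdraw(amount int) bool {
	mu.Lock()
	defer mu.Unlock()
	deposit(-amount)
	if balance < 0 {
		deposit(amount) // roll back: insufficient funds
		return false
	}
	return true
}

func Balance() int {
	mu.Lock()
	defer mu.Unlock()
	return balance
}

func main() {
	Deposit(200)
	fmt.Println(Withdraw(50), Balance())
	fmt.Println(Withdraw(1000), Balance())
}
```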

Encapsulation, by reducing unexpected interactions in a program, helps us maintain data structure invariants. For the same reason, encapsulation also helps us maintain concurrency invariants. When you use a mutex, make sure that both it and the variables it guards are not exported, whether they are package-level variables or the fields of a struct.

Read/Write Mutexes: sync.RWMutex

Since the Balance function only needs to read the state of the variable, it would in fact be safe for multiple Balance calls to run concurrently, so long as no Deposit call is running. In this scenario we need a special kind of lock that allows read-only operations to proceed in parallel with each other, but write operations to have fully exclusive access. This lock is called a multiple readers, single writer lock, and in Go it's provided by sync.RWMutex:

var (
    mu sync.RWMutex
    balance int
)

func Balance() int {
    mu.RLock() // readers lock
    defer mu.RUnlock()
    return balance
}

The Balance function now calls the RLock and RUnlock methods to acquire and release a readers or shared lock. The Deposit function, which is unchanged, calls the mu.Lock and mu.Unlock methods to acquire and release a writer or exclusive lock.

It's only profitable to use an RWMutex when most of the goroutines that acquire the lock are readers, and the lock is under contention, that is, goroutines routinely have to wait to acquire it. An RWMutex requires more complex internal bookkeeping, making it slower than a regular mutex for uncontended locks.
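
For completeness, here is a sketch of both functions sharing one RWMutex, exercised by concurrent readers and writers. The depositConcurrently helper is a hypothetical driver added for this example.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.RWMutex // guards balance
	balance int
)

// Deposit takes the writer (exclusive) lock.
func Deposit(amount int) {
	mu.Lock()
	defer mu.Unlock()
	balance += amount
}

// Balance takes the readers (shared) lock, so many calls may overlap.
func Balance() int {
	mu.RLock()
	defer mu.RUnlock()
	return balance
}

// depositConcurrently starts n writers and n readers and waits for all of them.
func depositConcurrently(n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); Deposit(1) }()
		go func() { defer wg.Done(); _ = Balance() }()
	}
	wg.Wait()
}

func main() {
	depositConcurrently(100)
	fmt.Println(Balance())
}
```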

Memory Synchronization

Synchronization is about more than just the order of execution of multiple goroutines; synchronization also affects memory.

In a modern computer there may be dozens of processors, each with its own local cache of the main memory. For efficiency, writes to memory are buffered within each processor and flushed out to main memory only when necessary. They may even be committed to main memory in a different order than they were written by the writing goroutine. Synchronization primitives like channel communications and mutex operations cause the processor to flush out and commit all its accumulated writes so that the effects of goroutine execution up to that point are guaranteed to be visible to goroutines running on other processors.

If two goroutines execute on different CPUs, each with its own cache, writes by one goroutine are not visible to the other until the caches are synchronized with main memory.

All these concurrency problems can be avoided by the consistent use of simple, established patterns. Where possible, confine variables to a single goroutine; for all other variables, use mutual exclusion.
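
As a small illustration of a synchronization point, the sketch below has a worker goroutine write a variable and then close a channel. The compute function and the value 42 are made up for the example.

```go
package main

import "fmt"

// compute writes result in a new goroutine and closes done afterwards.
// The close is a synchronization event: a goroutine that observes it
// is guaranteed to also observe the earlier write to result.
func compute() int {
	var result int
	done := make(chan struct{})
	go func() {
		result = 42 // write by the worker goroutine
		close(done) // publish: everything written so far becomes visible
	}()
	<-done // wait for the close before reading
	return result
}

func main() {
	fmt.Println(compute())
}
```

Without the receive from done, the read of result would race with the write in the worker goroutine.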

Lazy Initialization: sync.Once

It is good practice to defer an expensive initialization step until the moment it is needed. Initializing a variable up front increases the start-up latency of a program and is unnecessary if execution doesn't always reach the part of the program that uses that variable. For example, a program that needs a set of images only on some code paths might load them with lazy initialization:

var icons map[string]image.Image

func loadIcons() {
    icons = map[string]image.Image{
        "spades.png":   loadIcon("spades.png"),
        "hearts.png":   loadIcon("hearts.png"),
        "diamonds.png": loadIcon("diamonds.png"),
        "clubs.png":    loadIcon("clubs.png"),
    }
}

// NOTE: not concurrency-safe!
func Icon(name string) image.Image {
    if icons == nil {
        loadIcons() // one-time initialization
    }
    return icons[name]
}

For a variable accessed by only a single goroutine, we can use the pattern above, but this pattern is not safe if Icon is called concurrently. For example, a goroutine finding icons to be non-nil may not assume that initialization of the variable is complete.

The simplest correct way to ensure that all goroutines observe the effects of loadIcons is to synchronize them using a mutex:

var mu sync.Mutex // guards icons
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
    mu.Lock()
    defer mu.Unlock()
    if icons == nil {
        loadIcons()
    }
    return icons[name]
}

However, the cost of enforcing mutually exclusive access to icons is that two goroutines cannot access the variable concurrently, even once the variable has been safely initialized and will never be modified again. This suggests a multiple-readers lock:

var mu sync.RWMutex // guards icons
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
    mu.RLock()
    if icons != nil {
        icon := icons[name]
        mu.RUnlock()
        return icon
    }
    mu.RUnlock()

    // acquire an exclusive lock
    mu.Lock()
    if icons == nil { // NOTE: must recheck for nil
        loadIcons()
    }
    icon := icons[name]
    mu.Unlock()
    return icon
}

There is no way to upgrade a shared lock to an exclusive one without first releasing the shared lock, so we must recheck the icons variable in case another goroutine already initialized it in the interim.

The pattern above gives us greater concurrency but is complex and thus error-prone. Fortunately, the sync package provides a specialized solution to the problem of one-time initialization: sync.Once. Conceptually, a Once consists of a mutex and a boolean variable that records whether initialization has taken place; the mutex guards both the boolean and the client's data structures. The sole method, Do, accepts the initialization function as its argument. Let's use Once to simplify the Icon function:

var loadIconsOnce sync.Once
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
    loadIconsOnce.Do(loadIcons)
    return icons[name]
}

Each call to Do(loadIcons) locks the mutex and checks the boolean variable. In the first call, in which the variable is false, Do calls loadIcons and sets the variable to true. Subsequent calls do nothing, but the mutex synchronization ensures that the effects of loadIcons on memory (specifically, icons) become visible to all goroutines. Using sync.Once in this way, we can avoid sharing variables with other goroutines until they have been properly constructed.
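
To see the "exactly once" behavior in a self-contained program, the sketch below swaps the icon map for a small settings map and counts how often the initializer runs. The names settings, loadSettings, Setting, and lookupConcurrently are all hypothetical stand-ins chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	loadOnce  sync.Once
	settings  map[string]string
	loadCalls int // written only inside loadSettings, which runs under the Once
)

// loadSettings is the expensive one-time initialization step.
func loadSettings() {
	loadCalls++
	settings = map[string]string{"theme": "dark", "lang": "en"}
}

// Setting is concurrency-safe: Do runs loadSettings at most once.
func Setting(name string) string {
	loadOnce.Do(loadSettings)
	return settings[name]
}

// lookupConcurrently calls Setting from n goroutines and reports
// how many times loadSettings has run in total.
func lookupConcurrently(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = Setting("theme") }()
	}
	wg.Wait()
	return loadCalls
}

func main() {
	fmt.Println(lookupConcurrently(50), Setting("theme"))
}
```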

The Race Detector

Even with the greatest of care, it's all too easy to make concurrency mistakes. Fortunately, the Go runtime and toolchain are equipped with a sophisticated and easy-to-use dynamic analysis tool, the race detector.

Just add the -race flag to your go build, go run, or go test command. This causes the compiler to build a modified version of your application or test with additional instrumentation that effectively records all accesses to shared variables that occurred during execution, along with the identity of the goroutine that read or wrote the variable.
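
A small program to try the flag on might look like the sketch below. The racyCount and safeCount functions are invented for this example. As written, main calls only the safe version. Swapping in racyCount and running with go run -race should make the detector report the conflicting accesses to count.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines with no
// synchronization. This is a deliberate data race, kept for experimentation.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); count++ }() // unsynchronized read-modify-write
	}
	wg.Wait()
	return count
}

// safeCount does the same work but guards the counter with a mutex.
func safeCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(safeCount(1000))
}
```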

Example: Concurrent Non-Blocking Cache

In this section, we'll build a concurrent non-blocking cache, an abstraction that solves a problem that arises often in real-world concurrent programs but is not well addressed by existing libraries. This is the problem of memoizing a function, that is, caching the result of a function so that it need be computed only once. Our solution will be concurrency-safe and will avoid the contention associated with designs based on a single lock for the whole cache.

The whole program has three files: memo.go, memotest.go, and memo_test.go.

memo.go:

package memo

import "sync"

// Func is the type of the function to memoize.
type Func func(string) (interface{}, error)

type result struct {
    value interface{}
    err error
}

type entry struct {
    res result
    ready chan struct{} // closed when res is ready
}

func New(f Func) *Memo {
    return &Memo{f: f, cache: make(map[string]*entry)}
}

type Memo struct {
    f Func
    mu sync.Mutex // guards cache
    cache map[string]*entry
}

func (memo *Memo) Get(key string) (value interface{}, err error) {
    memo.mu.Lock()
    e := memo.cache[key]
    if e == nil {
        // This is the first request for this key.
        // This goroutine becomes responsible for computing
        // the value and broadcasting the ready condition.
        e = &entry{ready: make(chan struct{})}
        memo.cache[key] = e
        memo.mu.Unlock()

        e.res.value, e.res.err = memo.f(key)

        close(e.ready) // broadcast ready condition
    } else {
        // This is a repeat request for this key.
        memo.mu.Unlock()

        <-e.ready // wait for ready condition
    }
    return e.res.value, e.res.err
}

memotest.go:

package memotest

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "testing"
    "time"
)

func httpGetBody(url string) (interface{}, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}

var HTTPGetBody = httpGetBody

func incomingURLs() <-chan string {
    ch := make(chan string)
    go func() {
        for _, url := range []string{
            "https://golang.org",
            "https://godoc.org",
            "https://play.golang.org",
            "http://gopl.io",
            "https://golang.org",
            "https://godoc.org",
            "https://play.golang.org",
            "http://gopl.io",
        } {
            ch <- url
        }
        close(ch)
    }()
    return ch
}

type M interface {
    Get(key string) (interface{}, error)
}

func Sequential(t *testing.T, m M) {
    for url := range incomingURLs() {
        start := time.Now()
        value, err := (m).Get(url)
        if err != nil {
            log.Print(err)
            continue
        }
        fmt.Printf("%s, %s, %d bytes\n",
            url, time.Since(start), len(value.([]byte)))
    }
}

func Concurrent(t *testing.T, m M) {
    var n sync.WaitGroup
    for url := range incomingURLs() {
        n.Add(1)
        go func(url string) {
            defer n.Done()
            start := time.Now()
            value, err := m.Get(url)
            if err != nil {
                log.Print(err)
                return
            }
            fmt.Printf("%s, %s, %d bytes\n",
                url, time.Since(start), len(value.([]byte)))
        }(url)
    }
    n.Wait()
}

memo_test.go:

package memo_test

import (
    "testing"

    "../memo"
    "../memotest"
)

var httpGetBody = memotest.HTTPGetBody

func Test(t *testing.T) {
    m := memo.New(httpGetBody)
    memotest.Sequential(t, m)
}

// Memo.Get is concurrency-safe, so this test passes, even under -race.
func TestConcurrent(t *testing.T) {
    m := memo.New(httpGetBody)
    memotest.Concurrent(t, m)
}

The output is:

Star-Wars:memo Jiang_Hu$ go test -v -race memo_test.go
=== RUN Test
https://golang.org, 2.682292625s, 6172 bytes
https://godoc.org, 1.575156858s, 6835 bytes
https://play.golang.org, 1.605538777s, 5821 bytes
http://gopl.io, 3.815152336s, 4154 bytes
https://golang.org, 4.315µs, 6172 bytes
https://godoc.org, 2.212µs, 6835 bytes
https://play.golang.org, 2.242µs, 5821 bytes
http://gopl.io, 1.705µs, 4154 bytes
--- PASS: Test (9.68s)
=== RUN TestConcurrent
https://play.golang.org, 555.878343ms, 5821 bytes
https://play.golang.org, 555.255645ms, 5821 bytes
https://golang.org, 556.575656ms, 6172 bytes
https://golang.org, 555.946009ms, 6172 bytes
https://godoc.org, 557.376009ms, 6835 bytes
https://godoc.org, 556.288988ms, 6835 bytes
http://gopl.io, 1.115608955s, 4154 bytes
http://gopl.io, 1.11506062s, 4154 bytes
--- PASS: TestConcurrent (1.12s)
PASS
ok command-line-arguments 11.832s
Star-Wars:memo Jiang_Hu$

In memotest.go, we use the httpGetBody function as an example of the type of function we might want to memoize. It makes an HTTP GET request and reads the response body. Calls to this function are relatively expensive, so we'd like to avoid repeating them unnecessarily.

A Memo instance holds the function f to memoize, of type Func, and the cache, which is a mapping from strings to entries. Each entry holds a result, which is simply the pair of results returned by a call to f: a value and an error.

We use the testing package to systematically investigate the effect of memoization.

Since HTTP requests are a great opportunity for parallelism, let's use the Concurrent function in memotest.go, which makes all requests concurrently. The test case uses a sync.WaitGroup to wait until the last request is complete before returning.

We'd like to avoid the redundant work. This feature is sometimes called duplicate suppression. In memo.go, each map element is a pointer to an entry struct. Each entry contains the memoized result of a call to the function f, as before, but it additionally contains a channel called ready. Just after the entry's result has been set, this channel will be closed, to broadcast to any other goroutines that it is now safe for them to read the result from the entry.

A call to Get involves acquiring the mutex lock that guards the cache map, looking in the map for a pointer to an existing entry, allocating and inserting a new entry if none was found, then releasing the lock. If there was an existing entry, its value is not necessarily ready yet—another goroutine could still be calling the slow function f—so the calling goroutine must wait for the entry's "ready" condition before it reads the entry's result. It does this by reading a value from the ready channel, since this operation blocks until the channel is closed.

If there was no existing entry, then by inserting a new "not ready" entry into the map, the current goroutine becomes responsible for invoking the slow function, updating the entry, and broadcasting the readiness of the new entry to any other goroutines that might (by then) be waiting for it.

Notice that the variables e.res.value and e.res.err in the entry are shared among multiple goroutines. The goroutine that creates the entry sets their values, and other goroutines read their values once the "ready" condition has been broadcast. Despite being accessed by multiple goroutines, no mutex lock is necessary. The closing of the ready channel happens before any other goroutine receives the broadcast event, so the write to those variables in the first goroutine happens before they are read by subsequent goroutines. There is no data race.

Our concurrent, duplicate-suppressing, non-blocking cache is complete.
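
Duplicate suppression is easy to check without the network. The sketch below repeats a condensed copy of the Memo type so that it is self-contained, and replaces httpGetBody with a slow stand-in that counts its own invocations. The countCalls helper, the 10ms delay, and the key "hello" are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type Func func(key string) (interface{}, error)

type entry struct {
	value interface{}
	err   error
	ready chan struct{} // closed when value and err are set
}

type Memo struct {
	f     Func
	mu    sync.Mutex // guards cache
	cache map[string]*entry
}

func New(f Func) *Memo { return &Memo{f: f, cache: make(map[string]*entry)} }

func (m *Memo) Get(key string) (interface{}, error) {
	m.mu.Lock()
	e := m.cache[key]
	if e == nil {
		// First request for this key: compute and broadcast.
		e = &entry{ready: make(chan struct{})}
		m.cache[key] = e
		m.mu.Unlock()
		e.value, e.err = m.f(key)
		close(e.ready)
	} else {
		// Repeat request: wait for the first caller to finish.
		m.mu.Unlock()
		<-e.ready
	}
	return e.value, e.err
}

// countCalls issues n concurrent Gets for the same key against a slow
// function and returns how many times that function was invoked.
func countCalls(n int) int64 {
	var calls int64
	slow := func(key string) (interface{}, error) {
		atomic.AddInt64(&calls, 1)
		time.Sleep(10 * time.Millisecond) // simulate expensive work
		return len(key), nil
	}
	m := New(slow)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); m.Get("hello") }()
	}
	wg.Wait()
	return atomic.LoadInt64(&calls)
}

func main() {
	fmt.Println(countCalls(20))
}
```

Only the first goroutine to acquire the lock finds no entry for the key, so the slow function runs once regardless of n.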

Goroutines and Threads

Although the differences between them are essentially quantitative, a big enough quantitative difference becomes a qualitative one, and so it is with goroutines and threads. The time has now come to distinguish them.

Growable Stacks

Each OS thread has a fixed-size block of memory (often as large as 2MB) for its stack, the work area where it saves the local variables of function calls that are in progress or temporarily suspended while another function is called. This fixed-size stack is simultaneously too much and too little. A 2MB stack would be a huge waste of memory for a little goroutine, such as one that merely waits for a WaitGroup then closes a channel. It's not uncommon for a Go program to create hundreds of thousands of goroutines at one time, which would be impossible with stacks this large. Yet despite their size, fixed-size stacks are not always big enough for the most complex and deeply recursive of functions. Changing the fixed size can improve space efficiency and allow more threads to be created, or it can enable more deeply recursive functions, but it cannot do both.

In contrast, a goroutine starts life with a small stack, typically 2KB. A goroutine's stack, like the stack of an OS thread, holds the local variables of active and suspended function calls, but unlike an OS thread, a goroutine's stack is not fixed; it grows and shrinks as needed. The size limit for a goroutine stack may be as much as 1GB, orders of magnitude larger than a typical fixed-size thread stack, though of course few goroutines use that much.
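
A rough way to watch a stack grow is to recurse deeply inside a freshly started goroutine. In the sketch below, the depth and deepInGoroutine functions and the figure of 200,000 levels are arbitrary choices for illustration. The frames involved need far more than the small initial stack, yet the call is expected to complete because the runtime enlarges the stack as needed.

```go
package main

import "fmt"

// depth recurses n levels deep; each level needs its own stack frame.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// deepInGoroutine runs the recursion in a new goroutine, which starts
// life with a small stack, and returns the depth reached.
func deepInGoroutine(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

func main() {
	fmt.Println(deepInGoroutine(200000))
}
```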

Goroutine Scheduling

OS threads are scheduled by the OS kernel. Every few milliseconds, a hardware timer interrupts the processor, which causes a kernel function called the scheduler to be invoked. This function suspends the currently executing thread and saves its registers in memory, looks over the list of threads and decides which one should run next, restores that thread's registers from memory, then resumes the execution of that thread. Because OS threads are scheduled by the kernel, passing control from one thread to another requires a full context switch, that is, saving the state of one user thread to memory, restoring the state of another, and updating the scheduler's data structures. This operation is slow, due to its poor locality and the number of memory accesses required, and has historically only gotten worse as the number of CPU cycles required to access memory has increased.

The Go runtime contains its own scheduler that uses a technique known as m:n scheduling, because it multiplexes (or schedules) m goroutines on n OS threads. The job of the Go scheduler is analogous to that of the kernel scheduler, but it is concerned only with the goroutines of a single Go program.

Unlike the operating system's thread scheduler, the Go scheduler is not invoked periodically by a hardware timer, but implicitly by certain Go language constructs. For example, when a goroutine calls time.Sleep or blocks in a channel or mutex operation, the scheduler puts it to sleep and runs another goroutine until it is time to wake the first one up. Because it doesn't need a switch to kernel context, rescheduling a goroutine is much cheaper than rescheduling a thread.

GOMAXPROCS

The Go scheduler uses a parameter called GOMAXPROCS to determine how many OS threads may be actively executing Go code simultaneously. Its default value is the number of CPUs on the machine, so on a machine with 8 CPUs, the scheduler will schedule Go code on up to 8 OS threads at once. (GOMAXPROCS is the n in m:n scheduling.) Goroutines that are sleeping or blocked in a communication do not need a thread at all. Goroutines that are blocked in I/O or other system calls, or that are calling non-Go functions, do need an OS thread, but GOMAXPROCS need not account for them.

You can explicitly control this parameter using the GOMAXPROCS environment variable or the runtime.GOMAXPROCS function. We can see the effect of GOMAXPROCS on this little program, which prints an endless stream of zeros and ones:

for {
    go fmt.Print(0)
    fmt.Print(1)
}

$ GOMAXPROCS=1 go run hacker-cliché.go
111111111111111111110000000000000000000011111...

$ GOMAXPROCS=2 go run hacker-cliché.go
101010101010101010110011001010110100101001101...

In the first run, at most one goroutine was executed at a time. Initially, it was the main goroutine, which prints ones. After a period of time, the Go scheduler put it to sleep and woke up the goroutine that prints zeros, giving it a turn to run on the OS thread. In the second run, there were two OS threads available, so both goroutines ran simultaneously, printing digits at about the same rate. We must stress that many factors are involved in goroutine scheduling, and the runtime is constantly evolving, so your results may differ from the ones above.
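
The same parameter can be inspected and changed from inside a program. The sketch below relies on two documented properties of runtime.GOMAXPROCS: it returns the previous setting, and an argument below 1 only queries the current value. The withProcs and currentProcs helpers are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the GOMAXPROCS value in effect without changing it.
func currentProcs() int { return runtime.GOMAXPROCS(0) }

// withProcs sets GOMAXPROCS to n, reports the value now in effect,
// and restores the previous setting before returning.
func withProcs(n int) int {
	prev := runtime.GOMAXPROCS(n) // returns the previous setting
	defer runtime.GOMAXPROCS(prev)
	return currentProcs()
}

func main() {
	fmt.Println("CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", currentProcs())
	fmt.Println("inside withProcs(2):", withProcs(2))
}
```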

Goroutines Have No Identity

In most operating systems and programming languages that support multithreading, the current thread has a distinct identity that can be easily obtained as an ordinary value, typically an integer or pointer. This makes it easy to build an abstraction called thread-local storage, which is essentially a global map keyed by thread identity, so that each thread can store and retrieve values independent of other threads.

Goroutines have no notion of identity that is accessible to the programmer. This is by design, since thread-local storage tends to be abused. For example, in a web server implemented in a language with thread-local storage, it's common for many functions to find information about the HTTP request on whose behalf they are currently working by looking in that storage. However, just as with programs that rely excessively on global variables, this can lead to an unhealthy "action at a distance" in which the behavior of a function is not determined by its arguments alone, but by the identity of the thread in which it runs. Consequently, if the identity of the thread should change (some worker threads are enlisted to help, say), the function misbehaves mysteriously.

Go encourages a simpler style of programming in which parameters that affect the behavior of a function are explicit. Not only does this make programs easier to read, but it lets us freely assign subtasks of a given function to many different goroutines without worrying about their identity.
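
A sketch of this explicit style: the request identifier travels as an ordinary argument, so a subtask behaves the same whichever goroutine runs it. The process and handle functions and the "req-7" identifier are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// process handles one subtask. Everything it depends on (the request ID
// and the part number) arrives as an argument, not from goroutine-local state.
func process(requestID string, part int) string {
	return fmt.Sprintf("%s/part-%d", requestID, part)
}

// handle splits a request into n subtasks, each on its own goroutine.
func handle(requestID string, n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = process(requestID, i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(handle("req-7", 3))
}
```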

You've now learned about all the language features you need for writing Go programs. In the next two chapters, we'll step back to look at some of the practices and tools that support programming in the large: how to structure a project as a set of packages, and how to obtain, build, test, benchmark, profile, document, and share those packages.

Packages and the Go Tool

Go comes with over 100 standard packages that provide the foundations for most applications. The Go community, a thriving ecosystem of package design, sharing, reuse, and improvement, has published many more, and you can find a searchable index of them at http://godoc.org. In this chapter, we'll show how to use existing packages and create new ones.

Go also comes with the go tool, a sophisticated but simple-to-use command for managing workspaces of Go packages. We can use the go tool to download, build, and run example programs. In addition, we'll look at the tool's underlying concepts and tour more of its capabilities, which include printing documentation and querying metadata about the packages in the workspace. In the next chapter we'll explore its testing features.

Introduction

The purpose of any package system is to make the design and maintenance of large programs practical by grouping related features together into units that can be easily understood and changed, independent of the other packages of the program. This modularity allows packages to be shared and reused by different projects, distributed within an organization, or made available to the wider world.

Each package defines a distinct name space that encloses its identifiers.

Packages also provide encapsulation by controlling which names are visible or exported outside the package.

Import Paths

Each package is identified by a unique string called its import path. Import paths are the strings that appear in import declarations.

The Package Declaration

A package declaration is required at the start of every Go source file. For example, every file of the math/rand package starts with package rand, so when you import this package, you can access its members as rand.Int, rand.Float64, and so on.

Import Declarations

A Go source file may contain zero or more import declarations immediately after the package declaration and before the first non-import declaration. Each import declaration may specify the import path of a single package, or multiple packages in a parenthesized list. The two forms below are equivalent but the second form is more common.

import "fmt"
import "os"

import (
    "fmt"
    "os"
)

If we need to import two packages whose names are the same, like math/rand and crypto/rand, into a third package, the import declaration must specify an alternative name for at least one of them to avoid a conflict. This is called a renaming import.

import (
    "crypto/rand"
    mrand "math/rand" // alternative name mrand avoids conflict
)

The alternative name affects only the importing file. Other files, even ones in the same package, may import the package using its default name, or a different name.
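
Here is a short sketch of a file that uses both packages side by side. The randomBytes and rollDie functions are illustrative helpers, not part of either package.

```go
package main

import (
	"crypto/rand"
	"fmt"
	mrand "math/rand" // alternative name mrand avoids conflict
)

// randomBytes uses crypto/rand under its default name, rand.
func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	_, err := rand.Read(b)
	return b, err
}

// rollDie uses math/rand under the alternative name mrand.
func rollDie() int { return mrand.Intn(6) + 1 }

func main() {
	b, err := randomBytes(8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%x %d\n", b, rollDie())
}
```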

A renaming import may be useful even when there is no conflict. If the name of the imported package is unwieldy, as is sometimes the case for automatically generated code, an abbreviated name may be more convenient. Choosing an alternative name can help avoid conflict with common local variable names. For example, in a file with many local variables named path, we might import the standard "path" package as pathpkg.

Blank Imports

It is an error to import a package into a file but not refer to the name it defines within that file. However, on occasion we must import a package merely for the side effects of doing so: evaluation of the initializer expressions of its package-level variables and execution of its init function. To suppress the "unused import" error we would otherwise encounter, we must use a renaming import in which the alternative name is _, the blank identifier. As usual, the blank identifier can never be referenced.

import _ "image/png" // register PNG decoder

This is known as a blank import. It is most often used to implement a compile-time mechanism whereby the main program can enable optional features by blank-importing additional packages.
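
The side effect of a blank import can be observed directly. The sketch below uses a different standard package from the PNG example, net/http/pprof, because its effect is easy to check: its init function registers handlers under /debug/pprof/ on http.DefaultServeMux. The registeredPattern helper is a hypothetical name, and the exact pattern string returned may vary between Go releases, so the program only prints it.

```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // blank import: init registers the /debug/pprof/ handlers
)

// registeredPattern reports which pattern of http.DefaultServeMux would
// serve the given path, or "" if no registered handler applies.
func registeredPattern(path string) string {
	req, err := http.NewRequest("GET", path, nil)
	if err != nil {
		return ""
	}
	_, pattern := http.DefaultServeMux.Handler(req)
	return pattern
}

func main() {
	fmt.Printf("%q\n", registeredPattern("/debug/pprof/"))
	fmt.Printf("%q\n", registeredPattern("/no/such/page"))
}
```

Removing the blank import line should leave the first lookup empty as well, since nothing else in the program registers a handler.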

Packages and Naming

In this section, we'll offer some advice on how to follow Go's distinctive conventions for naming packages and their members.

When creating a package, keep its name short, but not so short as to be cryptic.

Be descriptive and unambiguous where possible. For example, don't name a utility package util when a name such as imageutil or ioutil is specific yet still concise. Avoid choosing package names that are commonly used for related local variables, or you may compel the package's client to use renaming imports, as with the path package.

Package names usually take the singular form.

Avoid package names that already have other connotations.

Now let's turn to the naming of package members. Since each reference to a member of another package uses a qualified identifier such as fmt.Println, when designing a package, consider how the two parts of a qualified identifier work together, not the member name alone.

We can identify some common naming patterns. The strings package provides a number of independent functions for manipulating strings:

package strings

func Index(s, substr string) int
type Replacer struct { /* ... */ }
func NewReplacer(oldnew ...string) *Replacer
type Reader struct { /* ... */ }
func NewReader(s string) *Reader

The word string does not appear in any of their names. Clients refer to them as strings.Index, strings.Replacer, and so on.

Other packages that we might describe as single-type packages, such as html/template and math/rand, expose one principal data type plus its methods, and often a New function to create instances.

package rand // "math/rand"
type Rand struct{ /* ... */ }
func New(source Source) *Rand

This can lead to repetition, such as rand.Rand, which is why the names of these packages are often especially short.

At the other extreme, there are packages like net/http that have a lot of names without a lot of structure, because they perform a complicated task. Despite having over twenty types and many more functions, the package's most important members have the simplest names: Get, Post, Handle, Error, Client, Server.

The Go Tool

The rest of this chapter concerns the go tool, which is used for downloading, querying, formatting, building, testing, and installing packages of Go code.

Workspace Organization

The only configuration most users ever need is the GOPATH environment variable, which specifies the root of the workspace. When switching to a different workspace, users update the value of GOPATH.

GOPATH has three subdirectories. The src subdirectory holds source code. Each package resides in a directory whose name relative to $GOPATH/src is the package's import path. Observe that a single GOPATH workspace contains multiple version-control repositories beneath src, such as github.com or golang.org. The pkg subdirectory is where the build tools store compiled packages, and the bin subdirectory holds executable programs.

A second environment variable, GOROOT, specifies the root directory of the Go distribution, which provides all the packages of the standard library. The directory structure beneath GOROOT resembles that of GOPATH, so, for example, the source files of the fmt package reside in the $GOROOT/src/fmt directory. Users never need to set GOROOT since, by default, the go tool will use the location where it was installed.

The go env command prints the effective values of the environment variables relevant to the toolchain, including the default values for the missing ones. GOOS specifies the target operating system (for example, android, linux, darwin, or windows) and GOARCH specifies the target processor architecture, such as amd64, 386, or arm. Although GOPATH is the only variable you must set, the others occasionally appear in our explanations.

Downloading Packages

The go get command can download a single package or an entire subtree or repository using the ... notation. The tool also computes and downloads all the dependencies of the initial packages.

Once go get has downloaded the packages, it builds them and then installs the libraries and commands.

The go get command has support for popular code-hosting sites like GitHub, Bitbucket, and Launchpad and can make the appropriate requests to their version-control systems. For less well-known sites, you may have to indicate which version-control protocol to use in the import path, such as Git or Mercurial. Run go help importpath for the details.

If you specify the -u flag, go get will ensure that all packages it visits, including dependencies, are updated to their latest version before being built and installed. Without that flag, packages that already exist locally will not be updated.

The go get -u command generally retrieves the latest version of each package, which is convenient when you're getting started but may be inappropriate for deployed projects, where precise control of dependencies is critical for release hygiene. The usual solution to this problem is to vendor the code, that is, to make a persistent local copy of all the necessary dependencies, and to update this copy carefully and deliberately.

Documenting Packages

Go style strongly encourages good documentation of package APIs. Each declaration of an exported package member and the package declaration itself should be immediately preceded by a comment explaining its purpose and usage.

Go doc comments are always complete sentences, and the first sentence is usually a summary that starts with the name being declared. Function parameters and other identifiers are mentioned without quotation or markup. For example, here's the doc comment for fmt.Fprintf:

// Fprintf formats according to a format specifier and writes to w.
// It returns the number of bytes written and any write error encountered.
func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)

A comment immediately preceding a package declaration is considered the doc comment for the package as a whole. There must be only one, though it may appear in any file. Longer package comments may warrant a file of their own; fmt's is over 300 lines. This file is usually called doc.go.

The go doc tool prints the declaration and doc comment of the entity specified on the command line, which may be a package, such as go doc time, or a package member, such as go doc time.Since, or a method, such as go doc time.Duration.Seconds. The tool does not need complete import paths or correct identifier case.

The second tool, confusingly named godoc, serves cross-linked HTML pages that provide the same information as go doc and much more. The godoc server at https://golang.org/pkg covers the standard library.

You can also run an instance of godoc in your workspace if you want to browse your own packages. Visit http://localhost:8000/pkg in your browser while running this command: $ godoc -http :8000.

Internal Packages

The package is the most important mechanism for encapsulation in Go programs. Unexported identifiers are visible only within the same package, and exported identifiers are visible to the world.

Sometimes, though, a middle ground would be helpful, a way to define identifiers that are visible to a small set of trusted packages, but not to everyone.

To address these needs, the go build tool treats a package specially if its import path contains a path segment named internal. Such packages are called internal packages. An internal package may be imported only by another package that is inside the tree rooted at the parent of the internal directory. For example, given the packages below, net/http/internal/chunked can be imported from net/http/httputil or net/http, but not from net/url. However, net/url may import net/http/httputil.

net/http
net/http/internal/chunked
net/http/httputil
net/url

Querying Packages

The go list tool reports information about available packages. In its simplest form, go list tests whether a package is present in the workspace and prints its import path if so:

$ go list golang.org/x/tools/cover
golang.org/x/tools/cover

An argument to go list may contain the "..." wildcard, which matches any substring of a package's import path. We can use it to enumerate all the packages within a Go workspace: "go list ...", or within a specific subtree: "go list net/...", or related to a particular topic: "go list ...xml..."

The go list command obtains the complete metadata for each package, not just the import path, and makes this information available to users or other tools in a variety of formats. The -json flag causes go list to print the entire record of each package in JSON format, such as: $ go list -json fmt.

In this chapter, we've explained all the important subcommands of the go tool — except one. In the next chapter, we'll see how the go test command is used for testing Go programs.

Testing

Programs today are far larger and more complex than those of the past, of course, and a great deal of effort has been spent on techniques to make this complexity manageable. Two techniques in particular stand out for their effectiveness. The first is routine peer review of programs before they are deployed. The second, the subject of this chapter, is testing.

Go's approach to testing can seem rather low-tech in comparison. It relies on one command, go test, and a set of conventions for writing test functions that go test can run.

In practice, writing test code is not much different from writing the original program itself. We write short functions that focus on one part of the task. We have to be careful of boundary conditions, think about data structures, and reason about what result a computation should produce from suitable inputs. But this is the same process as writing ordinary Go code.

The go test Tool

The go test subcommand is a test driver for Go packages that are organized according to certain conventions. In a package directory, files whose names end with _test.go are not part of the package ordinarily built by go build but are a part of it when built by go test.

Within *_test.go files, three kinds of functions are treated specially: tests, benchmarks, and examples. A test function, which is a function whose name begins with Test, exercises some program logic for correct behavior; go test calls the test function and reports the result, which is either PASS or FAIL. A benchmark function has a name beginning with Benchmark and measures the performance of some operation; go test reports the mean execution time of the operation. And an example function, whose name starts with Example, provides machine-checked documentation.

The go test tool scans the *_test.go files for these special functions, generates a temporary main package that calls them all in the proper way, builds and runs it, reports the results, and then cleans up.

Test Functions

Each test file must import the testing package. Test functions have the following signature:

func TestName(t *testing.T) {
    // ...
}

Test function names must begin with Test; the optional suffix Name must begin with a capital letter. The t parameter provides methods for reporting test failures and logging additional information.

A go test (or go build) command with no package arguments operates on the package in the current directory.

The -v flag prints the name and execution time of each test in the package, and the -run flag, whose argument is a regular expression, causes go test to run only those tests whose function name matches the pattern.

The style of table-driven testing is very common in Go. It is straightforward to add new table entries as needed, and since the assertion logic is not duplicated, we can invest more effort in producing a good error message.

Test failure messages are usually of the form "f(x) = y, want z", where f(x) explains the attempted operation and its input, y is the actual result, and z the expected result.

Randomized Testing

Table-driven tests are convenient for checking that a function works on inputs carefully selected to exercise interesting cases in the logic. Another approach, randomized testing, explores a broader range of inputs by constructing inputs at random.

Since randomized tests are nondeterministic, it is critical that the log of the failing test record sufficient information to reproduce the failure. It may be simpler to log the seed of the pseudo-random number generator than to dump the entire input data structure. Armed with that seed value, we can easily modify the test to replay the failure deterministically.

By using the current time as a source of randomness, the test will explore novel inputs each time it is run, over the entire course of its lifetime. This is especially valuable if your project uses an automated system to run all its tests periodically.

White-Box Testing

One way of categorizing tests is by the level of knowledge they require of the internal workings of the package under test. A black-box test assumes nothing about the package other than what is exposed by its API and specified by its documentation; the package's internals are opaque. In contrast, a white-box test has privileged access to the internal functions and data structures of the package and can make observations and changes that an ordinary client cannot. For example, a white-box test can check that the invariants of the package's data types are maintained after every operation. (The name white box is traditional, but clear box would be more accurate.)

The two approaches are complementary. Black-box tests are usually more robust, needing fewer updates as the software evolves. They also help the test author empathize with the client of the package and can reveal flaws in the API design. In contrast, white-box tests can provide more detailed coverage of the trickier parts of the implementation.

Writing Effective Tests

Many newcomers to Go are surprised by the minimalism of Go's testing framework. Other languages' frameworks provide mechanisms for identifying test functions (often using reflection or metadata), hooks for performing "setup" and "teardown" operations before and after the tests run, and libraries of utility functions for asserting common predicates, comparing values, formatting error messages, and aborting a failed test (often using exceptions). Although these mechanisms can make tests very concise, the resulting tests often seem like they are written in a foreign language. Furthermore, although they may report PASS or FAIL correctly, their manner may be unfriendly to the unfortunate maintainer, with cryptic failure messages like "assert: 0 == 1" or page after page of stack traces.

Go's attitude to testing stands in stark contrast. It expects test authors to do most of this work themselves, defining functions to avoid repetition, just as they would for ordinary programs. The process of testing is not one of rote form filling; a test has a user interface too, albeit one whose only users are also its maintainers. A good test does not explode on failure but prints a clear and succinct description of the symptom of the problem, and perhaps other relevant facts about the context. Ideally, the maintainer should not need to read the source code to decipher a test failure. A good test should not give up after one failure but should try to report several errors in a single run, since the pattern of failures may itself be revealing.

The assertion function below compares two values, constructs a generic error message, and stops the program. It's easy to use and it's correct, but when it fails, the error message is almost useless. It does not solve the hard problem of providing a good user interface.

import (
    "fmt"
    "strings"
    "testing"
)

// A poor assertion function.
func assertEqual(x, y int) {
    if x != y {
        panic(fmt.Sprintf("%d != %d", x, y))
    }
}

func TestSplit(t *testing.T) {
    words := strings.Split("a:b:c", ":")
    assertEqual(len(words), 3)
    // ...
}

In this sense, assertion functions suffer from premature abstraction: by treating the failure of this particular test as a mere difference of two integers, we forfeit the opportunity to provide meaningful context. We can provide a better message by starting from the concrete details, as in the example below. Only once repetitive patterns emerge in a given test suite is it time to introduce abstractions.

func TestSplit(t *testing.T) {
    s, sep := "a:b:c", ":"
    words := strings.Split(s, sep)
    if got, want := len(words), 3; got != want {
        t.Errorf("Split(%q, %q) returned %d words, want %d", s, sep, got, want)
    }
}

Now the test reports the function that was called, its inputs, and the significance of the result; it explicitly identifies the actual value and the expectation; and it continues to execute even if this assertion should fail. Once we've written a test like this, the natural next step is often not to define a function to replace the entire if statement, but to execute the test in a loop in which s, sep, and want vary, like the table-driven test.

The previous example didn't need any utility functions, but of course that shouldn't stop us from introducing functions when they help make the code simpler. The key to a good test is to start by implementing the concrete behavior that you want and only then use functions to simplify the code and eliminate repetition. Best results are rarely obtained by starting with a library of abstract, generic testing functions.

Avoiding Brittle Tests

An application that often fails when it encounters new but valid inputs is called buggy; a test that spuriously fails when a sound change was made to the program is called brittle. Just as a buggy program frustrates its users, a brittle test exasperates its maintainers. The most brittle tests, which fail for almost any change to the production code, good or bad, are sometimes called change detector or status quo tests, and the time spent dealing with them can quickly deplete any benefit they once seemed to provide.

When a function under test produces a complex output such as a long string, an elaborate data structure, or a file, it's tempting to check that the output is exactly equal to some "golden" value that was expected when the test was written. But as the program evolves, parts of the output will likely change, probably in good ways, but change nonetheless. And it's not just the output; functions with complex inputs often break because the input used in a test is no longer valid.

The easiest way to avoid brittle tests is to check only the properties you care about. Test your program's simpler and more stable interfaces in preference to its internal functions. Be selective in your assertions. Don't check for exact string matches, for example, but look for relevant substrings that will remain unchanged as the program evolves. It's often worth writing a substantial function to distill a complex output down to its essence so that assertions will be reliable. Even though that may seem like a lot of up-front effort, it can pay for itself quickly in time that would otherwise be spent fixing spuriously failing tests.

Coverage

By its nature, testing is never complete. As the influential computer scientist Edsger Dijkstra put it, "Testing shows the presence, not the absence of bugs." No quantity of tests can ever prove a package free of bugs. At best, they increase our confidence that the package works well in a wide range of important scenarios.

The degree to which a test suite exercises the package under test is called the test's coverage. Coverage can't be quantified directly—the dynamics of all but the most trivial programs are beyond precise measurement—but there are heuristics that can help us direct our testing efforts to where they are more likely to be useful.

Statement coverage is the simplest and most widely used of these heuristics. The statement coverage of a test suite is the fraction of source statements that are executed at least once during the test. In this section, we'll use Go's cover tool, which is integrated into go test, to measure statement coverage and help identify obvious gaps in the tests.

First, we should use go test -v -run=TestCoverageFuncName to check that the test passes. Then, we run the test with the -coverprofile flag, such as go test -run=TestCoverageFuncName -coverprofile=c.out. This flag enables the collection of coverage data by instrumenting the production code. That is, it modifies a copy of the source code so that before each block of statements is executed, a boolean variable is set, with one variable per block. Just before the modified program exits, it writes the value of each variable to the specified log file c.out and prints a summary of the fraction of statements that were executed. (If all you need is the summary, use go test -cover).

If go test is run with the -covermode=count flag, the instrumentation for each block increments a counter instead of setting a boolean. The resulting log of execution counts of each block enables quantitative comparisons between "hotter" blocks, which are more frequently executed, and "colder" ones.

Having gathered the data, we run the cover tool, which processes the log, generates an HTML report, and opens it in a new browser window: $ go tool cover -html=c.out.

Each statement is colored green if it was covered or red if it was not covered.

Achieving 100% statement coverage sounds like a noble goal, but it is not usually feasible in practice, nor is it likely to be a good use of effort. Just because a statement is executed does not mean it is bug-free; statements containing complex expressions must be executed many times with different inputs to cover the interesting cases. Some statements, such as those that handle esoteric errors, are hard to exercise but rarely reached in practice. Testing is fundamentally a pragmatic endeavor, a trade-off between the cost of writing tests and the cost of failures that could have been prevented by tests. Coverage tools can help identify the weakest spots, but devising good test cases demands the same rigorous thinking as programming in general.

Benchmark Functions

Benchmarking is the practice of measuring the performance of a program on a fixed workload. In Go, a benchmark function looks like a test function, but with the Benchmark prefix and a *testing.B parameter that provides most of the same methods as a *testing.T, plus a few extra related to performance measurement. It also exposes an integer field N, which specifies the number of times to perform the operation being measured.

Here's a benchmark for IsPalindrome that calls it N times in a loop.

func BenchmarkIsPalindrome(b *testing.B) {
    for i := 0; i < b.N; i++ {
        IsPalindrome("A man, a plan, a canal: Panama")
    }
}

We run it with the command below. Unlike tests, by default no benchmarks are run. The argument to the -bench flag selects which benchmarks to run. It is a regular expression matching the names of Benchmark functions, with a default value that matches none of them. The "." pattern causes it to match all benchmarks in the current package. Here is the output of one run:

Star-Wars:word2 Jiang_Hu$ go test -bench=.
goos: darwin
goarch: amd64
BenchmarkIsPalindrome-4          5000000              380 ns/op
PASS
ok      _/Users/Jiang_Hu/Desktop/all/gopl/src/ch11/word2        2.314s

The benchmark name's numeric suffix, 4 here, indicates the value of GOMAXPROCS, which is important for concurrent benchmarks.

The report tells us that each call to IsPalindrome took about 0.38 microseconds, averaged over 5,000,000 runs. Since the benchmark runner initially has no idea how long the operation takes, it makes some initial measurements using small values of N and then extrapolates to a value large enough for a stable timing measurement to be made.

The reason the loop is implemented by the benchmark function, and not by the calling code in the test driver, is so that the benchmark function has the opportunity to execute any necessary one-time setup code outside the loop without this adding to the measured time of each iteration. If this setup code is still perturbing the results, the testing.B parameter provides methods to stop, resume, and reset the timer, but these are rarely needed.

In fact, the fastest program is often the one that makes the fewest memory allocations. The -benchmem command-line flag will include memory allocation statistics in its report.

Benchmarks like this tell us the absolute time required for a given operation, but in many settings the interesting performance questions are about the relative timings of two different operations. For example, if a function takes 1ms to process 1,000 elements, how long will it take to process 10,000 or a million? Such comparisons reveal the asymptotic growth of the running time of the function. Another example: what is the best size for an I/O buffer? Benchmarks of application throughput over a range of sizes can help us choose the smallest buffer that delivers satisfactory performance. A third example: which algorithm performs best for a given job? Benchmarks that evaluate two different algorithms on the same input data can often show the strengths and weaknesses of each one on important or representative workloads.

Comparative benchmarks are just regular code. They typically take the form of a single parameterized function, called from several Benchmark functions with different values, like this:

func benchmark(b *testing.B, size int) { /* ... */ }
func Benchmark10(b *testing.B) { benchmark(b, 10) }
func Benchmark100(b *testing.B) { benchmark(b, 100) }
func Benchmark1000(b *testing.B) { benchmark(b, 1000) }

The parameter size, which specifies the size of the input, varies across benchmarks but is constant within each benchmark. Resist the temptation to use the parameter b.N as the input size. Unless you interpret it as an iteration count for a fixed-size input, the results of your benchmark will be meaningless.

Patterns revealed by comparative benchmarks are particularly useful during program design, but we don't throw the benchmarks away when the program is working. As the program evolves, or its input grows, or it is deployed on new operating systems or processors with different characteristics, we can reuse those benchmarks to revisit design decisions.

Profiling

When we wish to look carefully at the speed of our programs, the best technique for identifying the critical code is profiling. Profiling is an automated approach to performance measurement based on sampling a number of profile events during execution, then extrapolating from them during a post-processing step; the resulting statistical summary is called a profile.

Go supports many kinds of profiling, each concerned with a different aspect of performance, but all of them involve recording a sequence of events of interest, each of which has an accompanying stack trace, that is, the stack of function calls active at the moment of the event. The go test tool has built-in support for several kinds of profiling.

A CPU profile identifies the functions whose execution requires the most CPU time. The currently running thread on each CPU is interrupted periodically by the operating system every few milliseconds, with each interruption recording one profile event before normal execution resumes.

A heap profile identifies the statements responsible for allocating the most memory. The profiling library samples calls to the internal memory allocation routines so that on average, one profile event is recorded per 512KB of allocated memory.

A blocking profile identifies the operations responsible for blocking goroutines the longest, such as system calls, channel sends and receives, and acquisitions of locks. The profiling library records an event every time a goroutine is blocked by one of these operations.

Gathering a profile for code under test is as easy as enabling one of the flags below. Be careful when using more than one flag at a time, however: the machinery for gathering one kind of profile may skew the results of others.

$ go test -cpuprofile=cpu.out
$ go test -blockprofile=block.out
$ go test -memprofile=mem.out

It's easy to add profiling support to non-test programs too, though the details of how we do that vary between short-lived command-line tools and long-running server applications. Profiling is especially useful in long-running applications, so the Go runtime's profiling features can be enabled under programmer control using the runtime API.

Once we've gathered a profile, we need to analyze it using the pprof tool. This is a standard part of the Go distribution, but since it's not an everyday tool, it's accessed indirectly using go tool pprof. It has dozens of features and options, but basic use requires only two arguments, the executable that produced the profile and the profile log.

To make profiling efficient and to save space, the log does not include function names; instead, functions are identified by their addresses. This means that pprof needs the executable in order to make sense of the log. Although go test usually discards the test executable once the test is complete, when profiling is enabled it saves the executable as foo.test, where foo is the name of the tested package.

The commands below show how to gather and display a simple CPU profile. We've selected one of the benchmarks from the net/http package. It is usually better to profile specific benchmarks that have been constructed to be representative of workloads one cares about. Benchmarking test cases is almost never representative, which is why we disabled them by using the filter -run=NONE.

$ go test -run=NONE -bench=ClientServerParallelTLS64 -cpuprofile=cpu.log net/http
$ go tool pprof -text -nodecount=10 ./http.test cpu.log

The -text flag specifies the output format, in this case, a textual table with one row per function, sorted so the "hottest" functions (those that consume the most CPU cycles) appear first. The -nodecount=10 flag limits the result to 10 rows. For gross performance problems, this textual format may be enough to pinpoint the cause.

This profile tells us which functions are important to the performance of this particular HTTPS benchmark. By contrast, if a profile is dominated by memory allocation functions from the runtime package, reducing memory consumption may be a worthwhile optimization.

For more subtle problems, you may be better off using one of pprof's graphical displays. These require GraphViz, which can be downloaded from www.graphviz.org. The -web flag then renders a directed graph of the functions of the program, annotated by their CPU profile numbers and colored to indicate the hottest functions.

We've only scratched the surface of Go's profiling tools here. To find out more, read the "Profiling Go Programs" article on the Go Blog.

Example Functions

The third kind of function treated specially by go test is an example function, one whose name starts with Example. It has neither parameters nor results. Here's an example function for IsPalindrome:

func ExampleIsPalindrome() {
    fmt.Println(IsPalindrome("A man, a plan, a canal: Panama"))
    fmt.Println(IsPalindrome("palindrome"))
    // Output:
    // true
    // false
}

Example functions serve three purposes. The primary one is documentation: a good example can be a more succinct or intuitive way to convey the behavior of a library function than its prose description, especially when used as a reminder or quick reference. An example can also demonstrate the interaction between several types and functions belonging to one API, whereas prose documentation must always be attached to one place, like a type or function declaration or the package as a whole. And unlike examples within comments, example functions are real Go code, subject to compile-time checking, so they don't become stale as the code evolves.

The second purpose is that examples are executable tests run by go test. If the example function contains a final // Output: comment like the one above, the test driver will execute the function and check that what it printed to its standard output matches the text within the comment.

The third purpose of an example is hands-on experimentation.

Reflection

Go provides a mechanism to update variables and inspect their values at run time, to call their methods, and to apply the operations intrinsic to their representation, all without knowing their types at compile time. This mechanism is called reflection.

Why Reflection?

Sometimes we need to write a function capable of dealing uniformly with values of types that don't satisfy a common interface, don't have a known representation, or don't exist at the time we design the function, or even all three.

reflect.Type and reflect.Value

Reflection is provided by the reflect package. It defines two important types, Type and Value. A Type represents a Go type. It is an interface with many methods for discriminating among types and inspecting their components, like the fields of a struct or the parameters of a function. The sole implementation of reflect.Type is the type descriptor, the same entity that identifies the dynamic type of an interface value.

The reflect.TypeOf function accepts any interface{} and returns its dynamic type as a reflect.Type.

Because reflect.TypeOf returns an interface value's dynamic type, it always returns a concrete type.

The other important type in the reflect package is Value. A reflect.Value can hold a value of any type. The reflect.ValueOf function accepts any interface{} and returns a reflect.Value containing the interface's dynamic value. As with reflect.TypeOf, the results of reflect.ValueOf are always concrete.

By the way, even unexported fields are visible to reflection.

Setting Variables with reflect.Value

A variable is an addressable storage location that contains a value, and its value may be updated through that address.

A similar distinction applies to reflect.Values. Some are addressable; others are not.

We can call reflect.ValueOf(&x).Elem() to obtain an addressable Value for any variable x.

We can ask a reflect.Value whether it is addressable through its CanAddr method.

We obtain an addressable reflect.Value whenever we indirect through a pointer, even if we started from a non-addressable Value. All the usual rules for addressability have analogs for reflection.

To recover the variable from an addressable reflect.Value requires three steps. First, we call Addr(), which returns a Value holding a pointer to the variable. Next, we call Interface() on this Value, which returns an interface{} value containing the pointer. Finally, if we know the type of the variable, we can use a type assertion to retrieve the contents of the interface as an ordinary pointer. We can then update the variable through the pointer:

x := 2
d := reflect.ValueOf(&x).Elem() // d refers to the variable x
px := d.Addr().Interface().(*int) // px := &x
*px = 3 // x = 3
fmt.Println(x) // "3"

Or, we can update the variable referred to by an addressable reflect.Value directly, without using a pointer, by calling the reflect.Value.Set method:

d.Set(reflect.ValueOf(4))
fmt.Println(x) // "4"

The same checks for assignability that are ordinarily performed by the compiler are done at run time by the Set methods. Above, the variable and the value both have type int, but if the variable had been an int64, the program would panic, so it's crucial to make sure the value is assignable to the type of the variable:

d.Set(reflect.ValueOf(int64(5))) // panic: int64 is not assignable to int

And of course calling Set on a non-addressable reflect.Value panics too:

x := 2
b := reflect.ValueOf(x)
b.Set(reflect.ValueOf(3)) // panic: Set using unaddressable value

There are variants of Set specialized for certain groups of basic types: SetInt, SetUint, SetString, SetFloat, and so on:

d := reflect.ValueOf(&x).Elem()
d.SetInt(3)
fmt.Println(x) // "3"

An addressable reflect.Value records whether it was obtained by traversing an unexported struct field and, if so, disallows modification. Consequently, CanAddr is not usually the right check to use before setting a variable. The related method CanSet reports whether a reflect.Value is addressable and settable.

A Word of Caution

Reflection is a powerful and expressive tool, but it should be used with care, for three reasons.

The first reason is that reflection-based code can be fragile. For every mistake that would cause a compiler to report a type error, a reflection error is reported during execution as a panic, possibly long after the program was written or even long after it has started running.

The second reason to avoid reflection is that since types serve as a form of documentation and the operations of reflection cannot be subject to static type checking, heavily reflective code is often hard to understand.

The third reason is that reflection-based functions may be one or two orders of magnitude slower than code specialized for a particular type. Testing is a particularly good fit for reflection since most tests use small data sets. But for functions on the critical path, reflection is best avoided.

Low-Level Programming

The design of Go guarantees a number of safety properties that limit the ways in which a Go program can "go wrong." During compilation, type checking detects most attempts to apply an operation to a value that is inappropriate for its type, for instance, subtracting one string from another. Strict rules for type conversions prevent direct access to the internals of built-in type like strings, maps, slices, and channels.

For errors that cannot be detected statically, such as out-of-bounds array accesses or nil pointer dereferences, dynamic checks ensure that the program immediately terminates with an informative error whenever a forbidden operation occurs. Automatic memory management (garbage collection) eliminates "use after free" bugs, as well as most memory leaks.

Many implementation details are inaccessible to Go programs. There is no way to discover the memory layout of an aggregate type like a struct, or the machine code for a function, or the identity of the operating system thread on which the current goroutine is running. Indeed, the Go scheduler freely moves goroutines from one thread to another. A pointer identifies a variable without revealing the variable's numeric address. Addresses may change as the garbage collector moves variables; pointers are transparently updated.

Together, these features make Go programs, especially failing ones, more predictable and less mysterious than programs in C, the quintessential low-level language. By hiding the underlying details, they also make Go programs highly portable, since the language semantics are largely independent of any particular compiler, operating system, or CPU architecture.

Occasionally, we may choose to forfeit some of these helpful guarantees to achieve the highest possible performance, to interoperate with libraries written in other languages, or to implement a function that cannot be expressed in pure Go.

In this chapter, we'll see how the unsafe package lets us step outside the usual rules, and how to use the cgo tool to create Go bindings for C libraries and operating system calls.

The approaches described in this chapter should not be used frivolously. Without careful attention to detail, they may cause the kinds of unpredictable, inscrutable, non-local failures with which C programmers are unhappily acquainted.

The unsafe package is rather magical. Although it appears to be a regular package and is imported in the usual way, it is actually implemented by the compiler. It provides access to a number of built-in language features that are not ordinarily available because they expose details of Go's memory layout. Presenting these features as a separate package makes the rare occasions on which they are needed more conspicuous. Also, some environments may restrict the use of the unsafe package for security reasons.

Package unsafe is used extensively within low-level packages like runtime, os, syscall and net that interact with the operating system, but is almost never needed by ordinary programs.

unsafe.Sizeof, Alignof, and Offsetof

The unsafe.Sizeof function reports the size in bytes of the representation of its operand, which may be an expression of any type; the expression is not evaluated.

Sizeof reports only the size of the fixed part of each data structure, like the pointer and length of a string, but not indirect parts like the contents of the string.
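
A minimal sketch of this behavior follows. The sizeofString helper is a hypothetical name. The sizes of int32 and float64 are fixed by the language, while the size of a string's fixed part depends on the platform, so the example only checks that it does not vary with the string's length.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizeofString reports the size of the fixed part of a string value.
func sizeofString(s string) uintptr {
	return unsafe.Sizeof(s)
}

func main() {
	fmt.Println(unsafe.Sizeof(int32(0)))   // 4
	fmt.Println(unsafe.Sizeof(float64(0))) // 8

	// The contents of the string are not counted, so an empty string
	// and a long one report the same size.
	short, long := "", "a considerably longer string value"
	fmt.Println(sizeofString(short) == sizeofString(long)) // true
}
```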

Computers load and store values from memory most efficiently when those values are properly aligned. For example, the address of a value of a two-byte type such as int16 should be an even number, the address of a four-byte value such as a rune should be a multiple of four, and the address of an eight-byte value such as a float64, uint64, or 64-bit pointer should be a multiple of eight. Alignment requirements of higher multiples are unusual, even for larger data types such as complex128.

For this reason, the size of a value of an aggregate type (a struct or array) is at least the sum of the sizes of its fields or elements but may be greater due to the presence of "holes". Holes are unused spaces added by the compiler to ensure that the following field or element is properly aligned relative to the start of the struct or array.

The language specification does not guarantee that the order in which fields are declared is the order in which they are laid out in memory, so in theory a compiler is free to rearrange them, although as we write this, none do. If the types of a struct's fields are of different sizes, it may be more space-efficient to declare the fields in an order that packs them as tightly as possible.

The unsafe.Alignof function reports the required alignment of its argument's type. Typically, boolean and numeric types are aligned to their size (up to a maximum of 8 bytes) and all other types are word-aligned.

The unsafe.Offsetof function, whose operand must be a field selector x.f, computes the offset of field f relative to the start of its enclosing struct x, accounting for holes, if any.
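
The three functions can be tried together on a small struct. This is a sketch only: the offsets helper is a hypothetical name, and the printed values assume a typical 64-bit platform, where a slice occupies three 8-byte words.

```go
package main

import (
	"fmt"
	"unsafe"
)

var x struct {
	a bool
	b int16
	c []int
}

// offsets returns the offset of each field of x from the start of the struct.
func offsets() (a, b, c uintptr) {
	return unsafe.Offsetof(x.a), unsafe.Offsetof(x.b), unsafe.Offsetof(x.c)
}

func main() {
	// Values in comments assume a typical 64-bit platform.
	fmt.Println(unsafe.Sizeof(x), unsafe.Alignof(x))     // 32 8
	fmt.Println(unsafe.Sizeof(x.a), unsafe.Alignof(x.a)) // 1 1
	fmt.Println(unsafe.Sizeof(x.b), unsafe.Alignof(x.b)) // 2 2
	fmt.Println(unsafe.Sizeof(x.c), unsafe.Alignof(x.c)) // 24 8
	fmt.Println(offsets())                               // 0 2 8
}
```

The one-byte hole after a and the larger hole after b account for the difference between the struct's size and the sum of its field sizes.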

Despite their names, these functions are not in fact unsafe, and they may be helpful for understanding the layout of raw memory in a program when optimizing for space.

unsafe.Pointer

Most pointer types are written *T, meaning "a pointer to a variable of type T". The unsafe.Pointer type is a special kind of pointer that can hold the address of any variable.

An ordinary *T pointer may be converted to an unsafe.Pointer, and an unsafe.Pointer may be converted back to an ordinary pointer, not necessarily of the same type *T. By converting a *float64 pointer to a *uint64, for instance, we can inspect the bit pattern of a floating-point variable:

func Float64bits(f float64) uint64 { return *(*uint64)(unsafe.Pointer(&f)) }
fmt.Printf("%#016x\n", Float64bits(1.0)) // 0x3ff0000000000000
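
As a sanity check, the result can be compared with math.Float64bits from the standard library, which returns the same bit pattern through a safe API. The following is a minimal runnable sketch; the sample values are arbitrary.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// Float64bits reinterprets the bits of f as a uint64 via unsafe.Pointer.
func Float64bits(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

func main() {
	for _, f := range []float64{1.0, -2.5, 0} {
		// Each comparison should print true.
		fmt.Println(Float64bits(f) == math.Float64bits(f))
	}
}
```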

Many unsafe.Pointer values are thus intermediaries for converting ordinary pointers to raw numeric addresses and back again. The example below takes the address of variable x, adds the offset of its b field, converts the resulting address to *int16, and through that pointer updates x.b:

var x struct {
    a bool
    b int16
    c []int
}

// equivalent to pb := &x.b
pb := (*int16)(unsafe.Pointer(uintptr(unsafe.Pointer(&x)) + unsafe.Offsetof(x.b)))
*pb = 42

fmt.Println(x.b) // 42
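
The snippet above can be made into a complete program as sketched below. The second helper uses unsafe.Add, which was added to the language in Go 1.17 and performs the same pointer arithmetic without an explicit round trip through uintptr. The names setB and setBAdd are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

var x struct {
	a bool
	b int16
	c []int
}

// setB stores v in x.b using uintptr arithmetic, done in a single
// expression, and returns the resulting value of x.b.
func setB(v int16) int16 {
	pb := (*int16)(unsafe.Pointer(uintptr(unsafe.Pointer(&x)) + unsafe.Offsetof(x.b)))
	*pb = v
	return x.b
}

// setBAdd does the same with unsafe.Add (requires Go 1.17 or later).
func setBAdd(v int16) int16 {
	pb := (*int16)(unsafe.Add(unsafe.Pointer(&x), unsafe.Offsetof(x.b)))
	*pb = v
	return x.b
}

func main() {
	fmt.Println(setB(42))   // 42
	fmt.Println(setBAdd(7)) // 7
}
```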

Deep Equivalence

The DeepEqual function from the reflect package reports whether two values are "deeply" equal. DeepEqual compares basic values as if by the built-in == operator; for composite values, it traverses them recursively, comparing corresponding elements. Because it works for any pair of values, even ones that are not comparable with ==, it finds widespread use in tests. The following test uses DeepEqual to compare two []string values:

func TestSplit(t *testing.T) {
    got := strings.Split("a:b:c", ":")
    want := []string{"a", "b", "c"}
    if !reflect.DeepEqual(got, want) {
        t.Errorf("got %q, want %q", got, want) // Errorf marks the test as failed; Logf would only log
    }
}

Although DeepEqual is convenient, its distinctions can seem arbitrary. For example, it doesn't consider a nil map equal to a non-nil empty map, nor a nil slice equal to a non-nil empty one:

var a, b []string = nil, []string{}
fmt.Println(reflect.DeepEqual(a, b)) // "false"

var c, d map[string]int = nil, make(map[string]int)
fmt.Println(reflect.DeepEqual(c, d)) // "false"
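
When this distinction is unwanted, one possible workaround is a small wrapper that treats all empty slices as equal before falling back to DeepEqual. The sameStrings helper below is a hypothetical sketch for []string only, not a general replacement for DeepEqual.

```go
package main

import (
	"fmt"
	"reflect"
)

// sameStrings reports whether a and b hold the same elements,
// treating a nil slice and a non-nil empty slice as equal.
func sameStrings(a, b []string) bool {
	if len(a) == 0 && len(b) == 0 {
		return true
	}
	return reflect.DeepEqual(a, b)
}

func main() {
	var a, b []string = nil, []string{}
	fmt.Println(reflect.DeepEqual(a, b)) // false
	fmt.Println(sameStrings(a, b))       // true

	fmt.Println(sameStrings([]string{"a"}, []string{"b"})) // false
}
```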

Calling C Code with cgo

A Go program might need to use a hardware driver implemented in C, query an embedded database implemented in C++, or use some linear algebra routines implemented in Fortran. C has long been the lingua franca of programming, so many packages intended for widespread use export a C-compatible API, regardless of the language of their implementation.

Another Word of Caution

We ended the previous chapter with a warning about the downsides of the reflection interface. That warning applies with even more force to the unsafe package described in this chapter.

High-level languages insulate programs and programmers not only from the arcane specifics of individual computer instruction sets, but from dependence on irrelevancies like where in memory a variable lives, how big a data type is, the details of structure layout, and a host of other implementation details. Because of that insulating layer, it's possible to write programs that are safe and robust and that will run on any operating system without change.

The unsafe package lets programmers reach through the insulation to use some crucial but otherwise inaccessible feature, or perhaps to achieve higher performance. The cost is usually to portability and safety, so one uses unsafe at one's peril. Our advice on how and when to use unsafe parallels Knuth's comments on premature optimization. Most programmers will never need to use unsafe at all.