Golang Internals, Part 1: Main Concepts and Project Structure
This series of blog posts is intended for those who are already familiar with the basics of Go and would like to get a deeper insight into its internals. Today’s post is dedicated to the structure of Go source code and some internal details of the Go compiler. After reading this, you should be able to answer the following questions:
1. What is the structure of Go source code?
2. How does the Go compiler work?
3. What is the basic structure of a node tree in Go?
Getting started
When you start learning a new programming language, you can usually find a lot of “hello-world” tutorials, beginner guides, and books that cover the main language concepts, the syntax, and even the standard library. However, it is much harder to find information on things like the layout of the major data structures that the language runtime allocates, or what assembly code is generated when you call a built-in function. Obviously, the answers lie inside the source code, but, from my own experience, you can spend hours wandering through it without making much progress.
I will not pretend to be an expert on the topic, nor will I attempt to describe every possible aspect. Instead, the goal is to demonstrate how you can decipher Go sources on your own.
Before we can begin, we certainly need our own copy of Go source files. There is nothing special in getting them. Just execute:
git clone https://github.com/golang/go
Please note that the code in the main branch is being constantly changed, so I use the release-branch.go1.4 branch in this blog post.
Understanding project structure
If you look at the /src folder of the Go repository, you can see a lot of folders. Most of them contain source files of the standard Go library. The standard naming conventions are always applied here, so each package is inside a folder with a name that directly corresponds to the package name. Apart from the standard library, there is a lot of other stuff. In my opinion, the most important and useful folders are:
/src/cmd/ | Contains different command line tools. |
/src/cmd/go/ | Contains source files of a Go tool that downloads and builds Go source files and installs packages. While doing this, it collects all source files and makes calls to the Go linker and Go compiler command line tools. |
/src/cmd/dist/ | Contains a tool responsible for building all other command line tools and all the packages from the standard library. You may want to analyze its source code to understand what libraries are used in every particular tool or package. |
/src/cmd/gc/ | This is the architecture-independent part of the Go compiler. |
/src/cmd/ld/ | The architecture-independent part of the Go linker. Architecture-dependent parts are located in the folders with the “l” suffix, which follow the same naming conventions as the compiler. |
/src/cmd/5a/, /src/cmd/6a/, /src/cmd/8a/, and /src/cmd/9a/ | Here you can find the Go assemblers for different architectures. Go assembly is a form of assembly language that does not map precisely to the assembly language of the underlying machine. Instead, there is a distinct assembler for each architecture that translates Go assembly into the machine’s assembly. More details are available in the official Go documentation on the assembler. |
/src/lib9/, /src/libbio, /src/liblink | Different libraries that are used inside the compiler, linker, and runtime package. |
/src/runtime/ | The most important Go package, which is indirectly included in all programs. It contains the entire runtime functionality, such as memory management, garbage collection, goroutine creation, etc. |
Inside the Go compiler
As I said above, the architecture-independent part of the Go compiler is located in the /src/cmd/gc/ folder. The entry point is located in the lex.c file. Apart from some common stuff, such as parsing command line arguments, the compiler does the following:
1. Initializes some common data structures.
2. Iterates through all of the provided Go files and calls the yyparse function for each file. This is where the actual parsing occurs. The Go compiler uses Bison as the parser generator. The grammar for the language is fully described in the go.y file (I will provide more details on it later). As a result, this step generates a complete parse tree where each node represents an element of the compiled program.
3. Recursively iterates through the generated tree several times and applies some modifications. For example, it defines type information for the nodes that should be implicitly typed, rewrites some language elements (such as type conversions) into calls to functions in the runtime package, and does some other work.
4. Performs the actual compilation after the parse tree is complete. Nodes are translated into assembler code.
5. Creates the object file, which contains the generated assembly code together with some additional data structures, such as the symbol table, and writes it to disk.
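The parse-then-walk flow of steps 2 and 3 can be tried out without touching the compiler sources. The sketch below is only an analogy: it uses the go/parser and go/ast packages from the standard library, which are a separate parser implementation and not the Bison-generated one in /src/cmd/gc/. The countNodes helper and the sample source are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a tiny program used as parser input (hypothetical example source).
const sample = `package demo

func add(x int, y int) int {
	return x + y
}
`

// countNodes parses src and walks the resulting tree once,
// returning how many nodes of each concrete type were visited.
func countNodes(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	ast.Inspect(file, func(n ast.Node) bool {
		// Inspect calls the function with nil after visiting a node's children.
		if n != nil {
			counts[fmt.Sprintf("%T", n)]++
		}
		return true
	})
	return counts, nil
}

func main() {
	counts, err := countNodes(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("function declarations:", counts["*ast.FuncDecl"])
	fmt.Println("return statements:", counts["*ast.ReturnStmt"])
	fmt.Println("binary expressions:", counts["*ast.BinaryExpr"])
}
```

Each of the three counters comes out as 1 for this input. The compiler makes several such passes over its own tree, modifying nodes as it goes, whereas this sketch only reads them.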
Diving into Go grammar
Now let's take a closer look at the second step. The go.y file that contains the language grammar is a good starting point for investigating the Go compiler and the key to understanding the language syntax. The main part of this file consists of declarations, similar to the following:
xfndcl:
    LFUNC fndcl fnbody

fndcl:
    sym '(' oarg_type_list_ocomma ')' fnres
|   '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
In this declaration, the xfndcl and fndcl nodes are defined. The fndcl node can take one of two forms. The first form corresponds to the following language construct:
somefunction(x int, y int) int
and the second one to this language construct:
(t *SomeType) somefunction(x int, y int) int
The xfndcl node consists of the keyword func, which is stored in the constant LFUNC, followed by the fndcl and fnbody nodes.
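To make the two forms of fndcl concrete, here is a small runnable program with one declaration of each kind. SomeType and somefunction mirror the names used above; the function bodies and the offset field are invented for the example.

```go
package main

import "fmt"

// SomeType is a placeholder receiver type for the method form.
type SomeType struct {
	offset int
}

// First form of fndcl: name, argument list, result.
func somefunction(x int, y int) int {
	return x + y
}

// Second form of fndcl: receiver in parentheses, then name, argument list, result.
func (t *SomeType) somefunction(x int, y int) int {
	return t.offset + x + y
}

func main() {
	t := &SomeType{offset: 10}
	fmt.Println(somefunction(1, 2))   // prints 3
	fmt.Println(t.somefunction(1, 2)) // prints 13
}
```

In both declarations, the leading func keyword and the body in braces correspond to the LFUNC and fnbody parts of xfndcl. Everything in between is matched by fndcl.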
An important feature of Bison (or Yacc) grammar is that it allows for placing arbitrary C code next to each node definition. The code is executed every time a match for this node definition is found in the source code. Here, you can refer to the result node as $$ and to the child nodes as $1, $2, …
It is easier to understand this through an example. Note that the following code is a shortened version of the actual code.
fndcl:
    sym '(' oarg_type_list_ocomma ')' fnres
    {
        t = nod(OTFUNC, N, N);
        t->list = $3;
        t->rlist = $5;

        $$ = nod(ODCLFUNC, N, N);
        $$->nname = newname($1);
        $$->nname->ntype = t;
        declare($$->nname, PFUNC);
    }
|   '(' oarg_type_list_ocomma ')' sym '(' oarg_type_list_ocomma ')' fnres
First, a new node is created, which contains type information for the function declaration. The $3 argument list and the $5 result list are referenced from this node. Then, the $$ result node is created. It stores the function name and the type node. As you can see, definitions in the go.y file do not necessarily correspond one-to-one to the resulting node structure.
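A similar split between a declaration node and a separate type node can be observed from ordinary Go code with the standard library's go/ast package. This is an analogy, not the compiler's own node structure: ast.FuncDecl and ast.FuncType belong to go/ast, and the describe helper and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src holds one declaration for each of the two fndcl forms.
const src = `package demo

type SomeType struct{}

func somefunction(x int, y int) int { return x + y }

func (t *SomeType) somefunction(x int, y int) int { return x + y }
`

// describe summarizes each function declaration in src: its name, whether it
// has a receiver, and the number of parameter and result fields in its type node.
func describe(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip the type declaration
		}
		results := 0
		if fn.Type.Results != nil {
			results = len(fn.Type.Results.List)
		}
		out = append(out, fmt.Sprintf("%s method=%t params=%d results=%d",
			fn.Name.Name, fn.Recv != nil, len(fn.Type.Params.List), results))
	}
	return out, nil
}

func main() {
	lines, err := describe(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

Here fn.Name plays a role comparable to nname, and fn.Type is comparable to the OTFUNC node with its argument and result lists.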
Understanding nodes
Now it is time to take a look at what a node actually is. First of all, a node is a struct (its definition is in the go.h file in /src/cmd/gc/). This struct contains a large number of fields, since it needs to support different kinds of nodes, and different kinds of nodes have different attributes. Below is a description of several fields that I think are important to understand.
op | Node operation. Each node has this field. It distinguishes different kinds of nodes from each other. In our previous example, those were OTFUNC (operation type function) and ODCLFUNC (operation declaration function). |
type | This is a reference to another struct with type information for nodes that have type information (there are no types for some nodes, e.g., control flow statements, such as if, switch, or for). |
val | This field contains the actual values for nodes that represent literals. |
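As a rough mental model of these three fields, the following Go sketch defines a drastically simplified node. The real struct is written in C and has many more fields. Everything here is an illustrative assumption: the Go types Op, Type, and Node, the eval helper, and the child pointers Left and Right. The constant names OLITERAL, OADD, and OIF are borrowed loosely from the compiler's naming style.

```go
package main

import "fmt"

// Op identifies the kind of a node, like the op field described above.
type Op int

const (
	OLITERAL Op = iota // a literal value
	OADD               // a binary addition
	OIF                // an if statement (carries no type)
)

// Type is a stand-in for the compiler's type information struct.
type Type struct {
	Name string
}

// Node is a simplified, hypothetical model of a tree node.
type Node struct {
	Op          Op
	Type        *Type       // nil for nodes without a type, such as statements
	Val         interface{} // set only for literal nodes
	Left, Right *Node       // child nodes (assumed here for the sketch)
}

// eval computes the value of a tree built from integer literals and additions.
func eval(n *Node) (int, error) {
	if n == nil {
		return 0, fmt.Errorf("missing node")
	}
	switch n.Op {
	case OLITERAL:
		v, ok := n.Val.(int)
		if !ok {
			return 0, fmt.Errorf("literal does not hold an int: %v", n.Val)
		}
		return v, nil
	case OADD:
		left, err := eval(n.Left)
		if err != nil {
			return 0, err
		}
		right, err := eval(n.Right)
		if err != nil {
			return 0, err
		}
		return left + right, nil
	default:
		return 0, fmt.Errorf("cannot evaluate op %d", n.Op)
	}
}

func main() {
	intType := &Type{Name: "int"}
	one := &Node{Op: OLITERAL, Type: intType, Val: 1}
	two := &Node{Op: OLITERAL, Type: intType, Val: 2}
	sum := &Node{Op: OADD, Type: intType, Left: one, Right: two}

	v, err := eval(sum)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // prints 3
}
```

The switch on Op shows why every node needs this field: code that walks the tree has to decide what to do with a node based on its kind before it can interpret any other field.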
Now that you understand the basic structure of the node tree, you can put your knowledge into practice. In the next post, we will investigate what exactly the Go compiler generates, using a simple Go application as an example.
About the author: Sergey Matyukevich is a Cloud Engineer and Go Developer at Altoros. With 6+ years in software engineering, he is an expert in cloud automation and designing architectures for complex cloud-based systems. An active member of the Go community, Sergey is a frequent contributor to open-source projects, such as Ubuntu and Juju Charms.
Source: ALTOROS
Author: Sergey Matyukevich