Go Runtime Scheduler: High Level Concurrency Overview

Shedding light on the core principles of concurrency within the Go Runtime Scheduler. Explore its adept management of threads and run queues, facilitating efficient concurrency handling at scale.

Friday, May 17, 2024


The concurrency model in Go/Golang can be a tricky concept to grasp since it diverges from other languages and runtimes. I'm going to do my best at explaining these concepts from a high level. However, there's plenty of material and documentation out in the wild that dives deeper into these concepts. I highly recommend referencing the material linked below.

Concurrency vs Parallelism

I think it's important to revisit the differences between concurrency and parallelism.

Concurrency and parallelism are often used interchangeably, especially when describing asynchronous systems at scale. However, within the context of writing and building software, they represent distinct concepts.

"Concurrency is about dealing with a lot of processing at a single point in time."

vs.

"Parallelsim is about executing a lot of processes at once."

Concurrency refers to the ability of a system to manage multiple tasks at the same time. These tasks are not executed simultaneously but rather give the appearance of simultaneous execution. Concurrency requires careful management and coordination of shared resources to ensure tasks do not interfere with one another, using mechanisms such as locks.

Parallelism, on the other hand, refers to performing multiple tasks at exactly the same time. It requires multiple cores in a CPU or multiple processors in a system to execute instruction sets simultaneously.

Goroutines & Concurrency

Goroutines serve as the fundamental unit of concurrency within Go applications. Goroutines are remarkably lightweight, starting with a stack of just 2KB upon instantiation. Depending on the underlying hardware and architecture, Go applications can efficiently manage hundreds of thousands or even millions of goroutines through scheduling.

To create a new Goroutine, prefix a function call with the reserved keyword go; the runtime schedules the call to be executed at a later point in time.

func process() {
	go func() { }() // anonymous function
	go someOtherProcess() // named function
}
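
To get a rough feel for that 2KB figure, here's a small measurement sketch. It is only illustrative: the exact numbers vary by Go version and platform, and runtime.MemStats.StackSys reports stack memory the runtime obtained from the OS, which includes overhead beyond the goroutines themselves.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Park 100,000 goroutines on a channel receive so their stacks
	// stay allocated while we measure. The count is arbitrary.
	block := make(chan struct{})
	for i := 0; i < 100_000; i++ {
		go func() { <-block }()
	}

	runtime.ReadMemStats(&after)
	fmt.Println("Goroutines:", runtime.NumGoroutine())
	// Expect very roughly 2KB per goroutine, plus runtime overhead.
	fmt.Printf("Stack memory growth: ~%d KB\n", (after.StackSys-before.StackSys)/1024)

	close(block) // release the parked goroutines
}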

Even without explicitly scheduling a goroutine, you're effectively utilizing one through the 'main' goroutine.

main.go
package main
 
func main() {}

A key takeaway is that a goroutine is distinctly different from a system thread or a process. Instead, it's a lightweight thread of execution managed by the Go runtime scheduler. Also known as a "user-space" thread, it operates within the Go runtime rather than being managed directly by the operating system kernel. (More on this later.)

Cooperative Concurrency

Go's concurrency model is referred to as Cooperative Concurrency, meaning that Goroutines "cooperate" by voluntarily yielding control to the runtime scheduler so that other Goroutines can be scheduled and executed. This model enables Goroutines to scale efficiently by minimizing context-switching overhead and resource utilization.

User-Space vs Kernel-Space Threads

To understand why Goroutines have such a small footprint and how they can scale to the extent they do, it's important to cover their usage of threads. There are two types of threads: kernel-space threads (OS system threads) and user-space threads.

Kernel-space threads are created using the OS kernel's APIs and system calls; pthreads on Unix-based systems are an example of kernel-space threads. Since the OS kernel manages kernel-space threads, they have the benefit of being isolated from one another, each with its own stack and execution context. However, kernel-space threads carry additional performance overhead and complexity, since the kernel handles the scheduling and management of both processes and threads. Creating new kernel-space threads and context switching between them is comparatively slow.

User-space threads, on the other hand, are what the Go scheduler uses to orchestrate goroutines in order to scale and efficiently handle an arbitrary number of independent goroutines concurrently. The Go runtime scheduler operates on kernel-space system threads and schedules goroutines onto those threads. Only when necessary will the scheduler create additional OS threads, so the mapping of goroutines to kernel-space threads is not one-to-one.

User-Space Threads Recap

  • Managed by the application: User-space threads are managed by the application or user library without involvement from the OS Kernel.
  • Lightweight: They don't require kernel resources, so they consume less CPU and memory. The application or library manages the creation, scheduling, and context switching between user-space threads, independently of the kernel.
  • Efficient: Well suited to applications requiring a high degree of concurrency. User-space threads are more flexible and carry less overhead than kernel-space threads.

Kernel-Space Threads Recap

  • Managed by the operating system: The OS kernel is responsible for scheduling, management, and execution of OS threads.
  • Heavier: Kernel threads consume more resources than user-space threads, as they require kernel resources like memory and CPU time for scheduling and management.
  • Overhead: They involve more system/kernel calls and context switching than user-space threads. Transferring control from one thread to another within the same process requires a mode switch into kernel mode.

Go Runtime Scheduler

The Go runtime includes an internal scheduler responsible for managing Goroutines across multiple OS threads ("kernel-space"). This Runtime Scheduler facilitates Go's Cooperative Concurrency model, optimizing the performance of Go applications by effectively managing and scheduling Goroutines.

The scheduler can multiplex a large number of Goroutines onto a limited number of OS threads, allowing many Goroutines to be scheduled and executed on a single CPU core.

When a Goroutine is blocked by operations such as network I/O, blocking system calls, channel operations, or mutexes, the scheduler handles task-switching to another Goroutine, thereby maximizing CPU utilization.
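
As a small, hypothetical illustration of this switching, the sketch below pins execution to a single P via GOMAXPROCS(1), so the interleaving is the scheduler parking a blocked Goroutine and running another, not parallelism:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1) // single P: interleaving is scheduling, not parallelism

	unblock := make(chan struct{})
	finished := make(chan struct{})

	go func() {
		// Blocks on a channel receive: the scheduler parks this
		// goroutine and switches to another one.
		<-unblock
		fmt.Println("first goroutine unblocked")
		close(finished)
	}()

	go func() {
		fmt.Println("second goroutine ran while the first was parked")
		close(unblock) // let the first goroutine continue
	}()

	<-finished
}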

The diagram demonstrates a significant disparity between the number of goroutines that can run and the limited number of system processing units available. This illustrates how the Go scheduler effectively manages a large number of goroutines with a finite amount of system resources.

M:N Scheduling

The M:N scheduler, also known as the Go Runtime Scheduler, explains how Go enables efficient and scalable concurrent systems. The M and N refer to how the Go runtime can handle many goroutines (M) on a limited number of OS system threads (N). By decoupling goroutines (M) from system threads (N), the runtime is able to maximize resource utilization and performance.

Key Concepts

  1. Goroutine (G): Stands for goroutine.
  • Includes the stack, instruction pointer, and other information needed for goroutine scheduling.
  2. Machine (M): Stands for machine.
  • Represents an OS kernel-space thread. Each M is capable of executing goroutines.
  3. Processor (P): Stands for processor.
  • Represents the context required for executing Go code, including the scheduling of a local run queue; this effectively turns an N:1 scheduler into an M:N scheduler. The number of Ps is usually determined by the available hardware CPUs or the GOMAXPROCS environment variable.

In the above illustration, we see two threads (M), each with their own context (P), executing a Goroutine (G). Each thread (M) requires a context (P) to execute a Goroutine (G). The number of contexts (P) is determined by the number of CPU cores the system has or the value of GOMAXPROCS; additional threads (M) are created by the runtime only as needed.

Both threads (M) are executing Goroutines (G) colored green, while the non-colored ones represent Goroutines stored in run queues, ready to be scheduled.
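
You can inspect some of these numbers from application code. A minimal sketch:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Number of logical CPUs visible to the process.
	fmt.Println("NumCPU:", runtime.NumCPU())

	// GOMAXPROCS(0) queries the current value without changing it;
	// this is the number of Ps available to run goroutines.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// Note: OS threads (M) are created on demand and can exceed
	// GOMAXPROCS when goroutines block in system calls; there is no
	// public API to count them directly.
}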

Goroutine Queue Management

The Go runtime scheduler employs a combination of global and local run queues to efficiently manage the scheduling of Goroutines across multiple CPUs and system threads.

This hybrid approach makes the most of the underlying system architecture by combining the benefits of both distributed and centralized queue models.

Global vs Local Run Queue

Global Run Queue

The global run queue is a centralized queue where Goroutines that are ready to be executed but have not yet been assigned to a specific processor (P) are stored. It is primarily used to help balance work across multiple processors.

Advantages
  • The global run queue improves hardware utilization by preventing queue starvation and processors (P) from becoming idle.
Disadvantages
  • Synchronization contention arises from multiple processors (P) accessing the same queue for work simultaneously. Lock contention can prevent the system from taking full advantage of the underlying hardware.

Local Run Queue(s)

Each processor (P) in Go's runtime is assigned its own local queue for scheduling Goroutines. When a Goroutine is scheduled or ready to be executed, it is typically pushed to the local run queue of the processor (P) that created or awakened it.

Advantages
  • Reduced lock contention as synchronization overhead is minimized, with fewer processors competing for work.
  • Greater scalability for systems with a high number of CPU cores.
  • Faster context switching since Goroutines are prioritized to be scheduled on the same processor that initially scheduled them.
Disadvantages
  • Load balancing becomes more complex; an individual local queue can be starved of work, leaving its processor (P) idle.

Scheduling Flow

When Goroutines are created or ready to be executed, they are typically added to the local run queue of the processor (P) that scheduled them.

If a processor becomes idle, it first checks the global run queue for Goroutines. If the global run queue is empty, it engages in "work stealing," where it takes Goroutines from the local run queues of other processors to maintain efficient CPU utilization.

Work Stealing

The mechanism that helps balance the load across multiple processors (P) in Go is "work stealing." This procedure allows an idle processor with an empty local run queue to steal Goroutines from other processors with larger local run queues.

Work stealing prevents the disadvantages of local run queues by ensuring maximum CPU utilization, thus improving overall performance and efficiency.
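
As a toy illustration of the idea only (this is not the runtime's actual implementation), the sketch below models Ps as plain structs and mimics the strategy of stealing half of a victim's local queue:

package main

import "fmt"

// p is a toy stand-in for a processor (P) with a local run queue of
// task IDs. The real runtime uses lock-free structures, not slices.
type p struct {
	id    int
	queue []int
}

// stealHalf moves half of the victim's local queue to the idle P,
// mirroring the "steal half the victim's work" strategy.
func stealHalf(idle, victim *p) {
	n := len(victim.queue) / 2
	if n == 0 {
		return // nothing worth stealing
	}
	idle.queue = append(idle.queue, victim.queue[:n]...)
	victim.queue = victim.queue[n:]
}

func main() {
	busy := &p{id: 1, queue: []int{1, 2, 3, 4, 5, 6}}
	idle := &p{id: 2}

	stealHalf(idle, busy)
	fmt.Println("P1 queue after steal:", busy.queue) // [4 5 6]
	fmt.Println("P2 queue after steal:", idle.queue) // [1 2 3]
}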

Application Layer

Here's a small sample of runtime package functions for interacting with the Go runtime and the Go runtime scheduler.

runtime.NumGoroutine()

A simple utility function provided by the standard library's runtime package that returns the number of currently running Goroutines in a Go application.

package main
 
import (
	"fmt"
	"runtime"
	"sync"
	"time"
)
 
func main() {
	wg := sync.WaitGroup{}
	jobs := 10
	wg.Add(jobs)
	for i := 1; i <= jobs; i++ {
		go func(jobId int) {
			time.Sleep(20 * time.Second)
			fmt.Printf("\nJob: %d", jobId)
			wg.Done()
		}(i)
	}
	// Get the number of running Goroutines
	numGoroutines := runtime.NumGoroutine()
	fmt.Printf("Number of Running Goroutines: %d\n", numGoroutines)
	wg.Wait()
}

runtime.GOMAXPROCS()

Go's Runtime Scheduler defaults to utilizing as many threads for running Go code as there are CPU cores. (You can determine how many CPU cores a given machine has by executing runtime.NumCPU().)

However, runtime.GOMAXPROCS(n int) int accepts an override that sets a new limit on the number of CPUs (system threads) the scheduler can utilize simultaneously. (It also returns the previous, or default, limit.)

This value can also be set via the GOMAXPROCS environment variable.
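
A minimal sketch of querying and overriding the value:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("CPU cores:", runtime.NumCPU())

	// Override the number of Ps to 2; the previous setting is returned.
	prev := runtime.GOMAXPROCS(2)
	fmt.Println("previous GOMAXPROCS:", prev)

	// Passing 0 (or any n < 1) leaves the setting unchanged and simply
	// reports the current value.
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))
}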

runtime.Gosched()

runtime.Gosched() is a function that explicitly yields control to the Go scheduler, allowing other Goroutines to be scheduled and executed.
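
A minimal sketch; GOMAXPROCS(1) is set only to make the yield observable, and the output order is not strictly guaranteed:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1) // single P, so the yield is observable

	done := make(chan struct{})
	go func() {
		fmt.Println("background goroutine ran")
		close(done)
	}()

	fmt.Println("before yield")
	// Hand control back to the scheduler so the goroutine above
	// gets a chance to run on the single available P.
	runtime.Gosched()
	fmt.Println("after yield")

	<-done // make sure the goroutine finished before exiting
}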

runtime.LockOSThread() and runtime.UnlockOSThread()

runtime.LockOSThread() is a function that forces a given Goroutine to always run on the same system thread (M), preventing other Goroutines from executing on that thread unless runtime.UnlockOSThread() is called.

This can be particularly useful when utilizing cgo (calling C code) that depends on thread-local storage.
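
A minimal sketch of pinning a goroutine to its OS thread (the cgo use case itself is omitted here):

package main

import (
	"fmt"
	"runtime"
)

// worker pins itself to its current OS thread for the duration of a
// task that needs thread affinity (e.g. thread-local state in C code).
func worker(done chan<- struct{}) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread() // release the thread when finished

	// While locked, no other goroutine will run on this M.
	fmt.Println("running with the OS thread pinned")
	done <- struct{}{}
}

func main() {
	done := make(chan struct{})
	go worker(done)
	<-done
}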

Additional Resources