[Repost] How Linux/Unix System Load Works [in English]
PDF download: http://vdisk.weibo.com/s/cULRe2mYCQsPz/1407491911
[Reposted from] http://yuxu9710108.blog.163.com/blog/static/23751534201022593028822/
How CALC_LOAD() and calc_load() Work
In this two-part series I want to explore the use of averages in performance analysis and capacity planning. There are many manifestations of averages, e.g., arithmetic average (the usual one), moving average (often used in financial planning), geometric average (used in the SPEC CPU benchmarks), harmonic average (not used enough), to name a few.
More importantly, we will be looking at averages over time or time-dependent averages. A particular example of such a time-dependent average is the load average metric that appears in certain UNIX commands. In Part 1 I shall look at what the load average is and how it gets calculated. In Part 2 I'll compare it with other averaging techniques as they apply in capacity planning and performance analysis. This article does not assume you are familiar with UNIX commands, so I will begin by reviewing those commands which display the load average metric. By Section 4, however, I'll be submerging into the UNIX kernel code that does all the work.
1 UNIX Commands
Actually, load average is not a UNIX command in the conventional sense. Rather, it's an embedded metric that appears in the output of other UNIX commands like uptime and procinfo. These commands are commonly used by UNIX sysadmins to observe system resource consumption. Let's look at some of them in more detail.
1.1 Classic Output
The generic ASCII textual format appears in a variety of UNIX shell commands. Here are some common examples.
uptime
The uptime shell command produces the following output:
[pax:~]% uptime
procinfo
On Linux systems, the procinfo command produces the following output:
[pax:~]% procinfo
w
The w(ho) command produces the following output:
[pax:~]% w
top
The top command is a more recent addition to the UNIX command set that ranks processes according to the amount of CPU time they consume. It produces the following output:
4:09am up 12:48, 1 user, load average: 0.02, 0.27, 0.17
In each of these commands, note that there are three numbers reported as part of the load average output. Quite commonly, these numbers show a descending order from left to right. Occasionally, however, an ascending order appears e.g., like that shown in the top output above.
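The same triplet can also be read directly by a program. The following is a minimal sketch, assuming a Linux system where the kernel exposes the three averages as the first three fields of /proc/loadavg. The function names read_loadavg and print_loadavg are hypothetical and not part of any of the commands above.

```c
#include <stdio.h>

/* Read the 1-, 5- and 15-minute load averages from /proc/loadavg (Linux only).
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_loadavg(double out[3])
{
    FILE *fp = fopen("/proc/loadavg", "r");
    if (fp == NULL)
        return -1;
    int fields = fscanf(fp, "%lf %lf %lf", &out[0], &out[1], &out[2]);
    fclose(fp);
    return fields == 3 ? 0 : -1;
}

/* Print the triplet in the same left-to-right order the shell commands use. */
void print_loadavg(void)
{
    double la[3];
    if (read_loadavg(la) == 0)
        printf("load average: %.2f, %.2f, %.2f\n", la[0], la[1], la[2]);
    else
        printf("load average: unavailable\n");
}
```

Calling print_loadavg() from a small main() should print one line in the same layout as the tail of the uptime output, provided /proc is mounted.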
1.2 GUI Output
The load average can also be displayed as a time series like that shown here in some output from a tool called ORCA.
Figure 1: ORCA plot of the 3 daily load averages.
Although such visual aids help us to see that the green curve is spikier and more variable than the red curve, and allow us to see a complete day's worth of data, it's not clear how useful this is for capacity planning or performance analysis. We need to understand more about how the load average metric is defined and calculated.
2 So What Is It?
So, exactly what is this thing called load average that is reported by all these various commands? Let's look at the official UNIX documentation.
2.1 The man Page
[pax:~]% man "load average"
...
The three numbers are the load averages for the past 1, 5, and 15 minutes, which are the GREEN, BLUE and RED curves, respectively, in Figure 1 above.
2.2 What the Gurus Have to Say
Let's turn to some UNIX hot-shots for more enlightenment.
Tim O'Reilly and Crew
The book UNIX Power Tools [POL97] tells us on p. 726, under The CPU:
...
Adrian Cockcroft on Solaris
In Sun Performance and Tuning [Coc95] in the section on p.97 entitled: Understanding and Using the Load Average, Adrian Cockcroft states:
...
O'Reilly et al. also note some potential gotchas with using load average ...
3 Performance Experiments
The experiments described in this section involved running some workloads in the background on a single-CPU Linux box. The test had a duration of 1 hour and consisted of two phases:
- The CPU was pegged for 2,100 seconds and then the processes were killed.
- The CPU was quiescent for the remaining 1,500 seconds.
A Perl script sampled the load average every 5 minutes using the uptime command. Here are the details.
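The original sampler was a Perl script that is not shown here. As a rough illustration of the parsing step only, here is a sketch in C that pulls the three values out of one line of uptime output. It assumes the Linux-style field label "load average:" followed by comma-separated values, and the function name parse_uptime_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Extract the three load averages from one line of uptime output.
 * Returns 0 on success, -1 if the "load average:" field is not found
 * or does not carry three comma-separated numbers. */
int parse_uptime_line(const char *line, double out[3])
{
    const char *key = "load average:";
    const char *p = strstr(line, key);
    if (p == NULL)
        return -1;
    p += strlen(key);   /* skip past the label to the numbers */
    if (sscanf(p, " %lf, %lf, %lf", &out[0], &out[1], &out[2]) != 3)
        return -1;
    return 0;
}
```

Fed the sample line shown in Section 1.1, the function fills out[] with 0.02, 0.27 and 0.17. Systems that print a different label (for example "load averages:") would need the key string adjusted.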
3.1 Test Load
Two hot-loops were fired up as background tasks on a single-CPU Linux box. There were two phases in the test:
- The CPU is pegged by these tasks for 2,100 seconds.
- The CPU is (relatively) quiescent for the remaining 1,500 seconds.
The 1-minute average reaches a value of 2 around 300 seconds into the test. The 5-minute average reaches 2 around 1,200 seconds into the test and the 15-minute average would reach 2 at around 3,600 seconds but the processes are killed after 35 minutes (i.e., 2,100 seconds).
3.2 Process Sampling
As the authors of [BC01] explain about the Linux kernel, because both of our test processes are CPU-bound they will be in a TASK_RUNNING state. This means they are either:
- running, i.e., currently executing on the CPU
- runnable, i.e., waiting in the run_queue for the CPU
The Linux kernel also checks to see if there are any tasks in a short-term sleep state called TASK_UNINTERRUPTIBLE. If there are, they are also included in the load average sample. There were none in our test load.
The following source fragment reveals more details about how this is done.
/* Nr of active tasks - counted in fixed-point numbers */
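The counting rule described above (running or runnable tasks, plus any in the uninterruptible sleep state) can be sketched in user space as follows. This is not the kernel source: the enum task_state values and the function count_active are hypothetical stand-ins, and FSHIFT = 11 is taken from the "11 bits of precision" mentioned later in the article.

```c
#define FSHIFT  11                 /* bits of fractional precision (assumed) */
#define FIXED_1 (1UL << FSHIFT)    /* 1.0 in fixed point, i.e. 2048 */

/* Hypothetical stand-ins for kernel task states (not the real values). */
enum task_state { RUNNING, UNINTERRUPTIBLE, INTERRUPTIBLE, STOPPED };

/* Count the tasks that contribute to one load sample, in fixed point. */
unsigned long count_active(const enum task_state *tasks, int ntasks)
{
    unsigned long nr = 0;
    for (int i = 0; i < ntasks; i++) {
        if (tasks[i] == RUNNING || tasks[i] == UNINTERRUPTIBLE)
            nr += FIXED_1;   /* each active task adds exactly 1.0 */
    }
    return nr;
}
```

With the two hot-loops of the test load and everything else asleep, the sample is 2 * FIXED_1 = 4096, which is 2.0 in fixed point.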
3.3 Test Results
The results of these experiments are plotted in Fig. 2. NOTE: These colors do not correspond to those used in the ORCA plots like Figure 1.
Although the workload starts up instantaneously and is abruptly stopped later at 2,100 seconds, the load average values have to catch up with the instantaneous state. The 1-minute samples track the most quickly while the 15-minute samples lag the furthest.
Figure 2: Linux load average test results.
For comparison, here's how it looks for a single hot-loop running on a single-CPU Solaris system.
Figure 3: Solaris load average test results.
You would be forgiven for jumping to the conclusion that the "load" is the same thing as the CPU utilization. As the Linux results show, when two hot processes are running, the maximum load is two (not one) on a single CPU. So, load is not equivalent to CPU utilization.
From another perspective, Fig. 2 resembles the charging and discharging of a capacitive RC circuit.
Figure 4: Charging and discharging of a capacitor.
4 Kernel Magic
Now let's go inside the Linux kernel and see what it is doing to generate these load average numbers.
unsigned long avenrun[3];
1 HZ = 100 ticks
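To make the timing concrete, here is a small user-space sketch of a tick countdown that decides when the load averages are due for an update. HZ = 100 comes from the line above. The sampling period LOAD_FREQ = 5 * HZ (5 seconds) is an assumption that matches the 5/60, 5/300 and 5/900 ratios used later in this article. The struct load_timer and its helper functions are hypothetical names, not kernel interfaces.

```c
#define HZ        100         /* timer ticks per second, as quoted in the text */
#define LOAD_FREQ (5 * HZ)    /* assumed sampling period: 500 ticks, i.e. 5 seconds */

/* Hypothetical timer state: ticks remaining until the next load sample. */
struct load_timer {
    long count;
};

void load_timer_init(struct load_timer *t)
{
    t->count = LOAD_FREQ;
}

/* Advance by `ticks`; return 1 when the load averages should be updated. */
int load_timer_tick(struct load_timer *t, unsigned long ticks)
{
    t->count -= (long)ticks;
    if (t->count < 0) {
        t->count += LOAD_FREQ;   /* re-arm for the next 5-second window */
        return 1;
    }
    return 0;
}
```

Under these assumptions one tick is 10 milliseconds, so an update falls due roughly every 500 ticks, or about twelve times per minute.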
4.1 Magic Numbers
The function CALC_LOAD is a macro defined in sched.h.
extern unsigned long avenrun[]; /* Load averages */
A notable curiosity is the appearance of those magic numbers: 1884, 2014, 2037. What do they mean? If we look at the preamble to the code we learn,
/* ... */
These magic numbers are a result of using a fixed-point (rather than a floating-point) representation.
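As an illustration of what fixed-point arithmetic means here, the sketch below performs one damping step entirely with integers. It follows the commonly cited shape of the CALC_LOAD macro (multiply, add, shift), but it is a user-space approximation written for this article, not the kernel source. The names calc_load_step, load_int and load_frac are hypothetical, and FSHIFT = 11 is taken from the 11 bits of precision discussed below.

```c
#define FSHIFT   11                /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point (2048) */
#define EXP_1    1884UL            /* magic number for the 1-minute average */

/* One damping step: load and n are fixed-point values, decay is a magic number. */
unsigned long calc_load_step(unsigned long load, unsigned long decay, unsigned long n)
{
    load *= decay;                   /* decay the previous average */
    load += n * (FIXED_1 - decay);   /* blend in the new sample */
    return load >> FSHIFT;           /* drop the extra 11 fractional bits */
}

/* Split a fixed-point load into its integer part and a two-digit fraction. */
unsigned long load_int(unsigned long x)
{
    return x >> FSHIFT;
}

unsigned long load_frac(unsigned long x)
{
    return ((x & (FIXED_1 - 1)) * 100) >> FSHIFT;
}
```

Starting from an idle system, calc_load_step(0, EXP_1, 2 * FIXED_1) returns 328, which load_int and load_frac render as 0.16. Only integer multiplies, adds and shifts are involved.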
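Using the 1-minute sampling as an example, the conversion of exp(5/60) into base-2 with 11 bits of precision occurs like this:
$$ e^{5/60} = 2^{(5/60)\log_2 e} \tag{1} $$
The magic number stores the reciprocal, exp(-5/60), scaled by 2^11:
$$ \mathrm{EXP\_1} = \frac{2^{11}}{2^{(5/60)\log_2 e}} = 2^{11}\, e^{-5/60} \approx 1884.25 \tag{2} $$
Repeating the calculation for each ratio T of sampling interval to averaging period gives: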
T     | EXP_T   | Rounded
5/60  | 1884.25 | 1884
5/300 | 2014.15 | 2014
5/900 | 2036.65 | 2037
2/60  | 1980.86 | 1981
2/300 | 2034.39 | 2034
2/900 | 2043.45 | 2043
These numbers are in complete agreement with those mentioned in the kernel comments above. The fixed-point representation is used presumably for efficiency reasons since these calculations are performed in kernel space rather than user space.
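The table can be reproduced with a few lines of code. The sketch below assumes each constant is 2^11 * exp(-T) rounded to the nearest integer, which is what the EXP_T column implies. The helpers exp_neg and exp_magic are hypothetical names. A truncated Taylor series stands in for exp() purely to keep the example free of the math library; it is adequate only because T is small.

```c
/* Approximate e^(-x) for small x >= 0 with a truncated Taylor series.
 * A series is used only to avoid linking libm; exp() from <math.h>
 * would normally be the natural choice. */
double exp_neg(double x)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k <= 20; k++) {
        term *= -x / k;   /* next term: (-x)^k / k! */
        sum += term;
    }
    return sum;
}

/* Fixed-point constant for a sampling interval and an averaging window,
 * both in seconds, using `fshift` fractional bits. */
unsigned long exp_magic(double interval, double window, int fshift)
{
    double scaled = (double)(1UL << fshift) * exp_neg(interval / window);
    return (unsigned long)(scaled + 0.5);   /* round to nearest */
}
```

For example, exp_magic(5.0, 60.0, 11) corresponds to the first row of the table and exp_magic(2.0, 900.0, 11) to the last.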
One question still remains, however. Where do the ratios like exp(5/60) come from?
4.2 Magic Revealed
Taking the 1-minute average as the example, CALC_LOAD is identical to the mathematical expression:
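$$ \mathrm{load}(t) = \mathrm{load}(t-1)\, e^{-5/60} + n \left(1 - e^{-5/60}\right) \tag{3} $$
where n is the number of active tasks in the current sample. When n = 0, eqn. (3) reduces to:
$$ \mathrm{load}(t) = \mathrm{load}(t-1)\, e^{-5/60} \tag{4} $$
Iterating eqn. (4) over t sampling intervals between t_0 and t_T gives:
$$ \mathrm{load}(t_T) = \mathrm{load}(t_0)\, e^{-5t/60} \tag{5} $$
which is pure exponential decay, like the curves in Fig. 2 after the processes are killed at 2,100 seconds.
Conversely, when n = 2 as it was in our experiments, the load average is dominated by the second term such that:
$$ \mathrm{load}(t_T) \approx 2 \left(1 - e^{-5t/60}\right) \tag{6} $$
which rises monotonically toward 2, like the curves in Fig. 2 during the first 2,100 seconds.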
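The charging and discharging behavior can be checked numerically. The following sketch replays the two phases of the test load (two runnable tasks for 2,100 seconds, then none for 1,500 seconds) through an integer damping step sampled every 5 seconds. It assumes the fixed-point update discussed in Section 4.1 with the magic numbers 1884 and 2037. The function names step and simulate are hypothetical.

```c
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)
#define EXP_1    1884UL     /* 1-minute magic number from the text */
#define EXP_15   2037UL     /* 15-minute magic number from the text */

/* One fixed-point damping step; n is the active-task count in fixed point. */
static unsigned long step(unsigned long load, unsigned long decay, unsigned long n)
{
    return (load * decay + n * (FIXED_1 - decay)) >> FSHIFT;
}

/* Run `seconds` of simulated time with `ntasks` runnable tasks,
 * taking one sample every 5 seconds. Returns the final fixed-point load. */
unsigned long simulate(unsigned long load, unsigned long decay,
                       unsigned long ntasks, unsigned long seconds)
{
    for (unsigned long t = 0; t < seconds; t += 5)
        load = step(load, decay, ntasks * FIXED_1);
    return load;
}
```

Calling simulate(0, EXP_1, 2, 2100) and simulate(0, EXP_15, 2, 2100) leaves the 1-minute value above the 15-minute value, the usual descending triplet. Feeding those results back in with ntasks = 0 for 1,500 seconds reverses the order, which is the ascending pattern seen in the top output of Section 1.1. Because each step truncates rather than rounds, the integer 1-minute value settles slightly below 2.0 instead of reaching it exactly.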
5 Summary
So, what have we learned? Those three innocuous looking numbers in the LA triplet have a surprising amount of depth behind them.
The triplet is intended to provide you with some kind of information about how much work has been done on the system in the recent past (1 minute), the past (5 minutes) and the distant past (15 minutes).
As you will have discovered if you tried the LA Triplets quiz, there are problems:
- The "load" is not the utilization but the total queue length.
- They are point samples of three different time series.
- They are exponentially-damped moving averages.
- They are in the wrong order to represent trend information.
These inherited limitations are significant if you try to use them for capacity planning purposes. I'll have more to say about all this in the next online column Load Average Part II: Not Your Average Average.
References
- [BC01] D. P. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly & Assoc. Inc., Sebastopol, California, 2001.
- [Coc95] A. Cockcroft. Sun Performance and Tuning. SunSoft Press, Mountain View, California, 1st edition, 1995.
- [Gun01] N. J. Gunther. Performance and scalability models for a hypergrowth e-Commerce Web site. In R. Dumke, C. Rautenstrauch, A. Schmietendorf, and A. Scholz, editors, Performance Engineering: State of the Art and Current Trends, volume 2047, pages 267-282. Springer-Verlag, Heidelberg, 2001.
- [POL97] J. Peek, T. O'Reilly, and M. Loukides. UNIX Power Tools. O'Reilly & Assoc. Inc., Sebastopol, California, 2nd edition, 1997.