Android Performance Tips
Original: from Android Developers
Performance Tips
This document primarily covers micro-optimizations that can improve overall app performance when combined, but it's unlikely that these changes will result in dramatic performance effects. Choosing the right algorithms and data structures should always be your priority, but is outside the scope of this document. You should use the tips in this document as general coding practices that you can incorporate into your habits for general code efficiency.
There are two basic rules for writing efficient code:
- Don't do work that you don't need to do.
- Don't allocate memory if you can avoid it.
One of the trickiest problems you'll face when micro-optimizing an Android app is that your app is certain to be running on multiple types of hardware, with different versions of the VM running on different processors at different speeds. It's not even generally the case that you can simply say "device X is a factor F faster/slower than device Y", and scale your results from one device to others. In particular, measurement on the emulator tells you very little about performance on any device. There are also huge differences between devices with and without a JIT: the best code for a device with a JIT is not always the best code for a device without.
To ensure your app performs well across a wide variety of devices, ensure your code is efficient at all levels and aggressively optimize your performance.
Avoid Creating Unnecessary Objects
Object creation is never free. A generational garbage collector with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory.
As you allocate more objects in your app, you will force a periodic garbage collection, creating little "hiccups" in the user experience. The concurrent garbage collector introduced in Android 2.3 helps, but unnecessary work should always be avoided.
Thus, you should avoid creating object instances you don't need to. Some examples of things that can help:
- If you have a method returning a string, and you know that its result will always be appended to a StringBuffer anyway, change your signature and implementation so that the function does the append directly, instead of creating a short-lived temporary object.
- When extracting strings from a set of input data, try to return a substring of the original data, instead of creating a copy. You will create a new String object, but it will share the char[] with the data. (The trade-off being that if you're only using a small part of the original input, you'll be keeping it all around in memory anyway if you go this route.)
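The first tip can be sketched roughly as follows. The class and method names (NameFormatter, formatName, appendName) are hypothetical and exist only for illustration; the point is that the second method writes into the caller's StringBuffer instead of returning a short-lived String.

```java
final class NameFormatter {
    // Returns a new String that the caller would immediately append and discard.
    static String formatName(String first, String last) {
        return last + ", " + first;
    }

    // Appends straight into the caller's buffer; no result String is returned.
    static void appendName(StringBuffer out, String first, String last) {
        out.append(last).append(", ").append(first);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("Author: ");
        appendName(sb, "Ada", "Lovelace");
        System.out.println(sb);
    }
}
```

Changing the signature this way only pays off when every caller really does append the result; if some callers need the String itself, keeping both methods may be the more practical choice.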
A somewhat more radical idea is to slice up multidimensional arrays into parallel one-dimensional arrays:
- An array of ints is much better than an array of Integer objects, but this also generalizes to the fact that two parallel arrays of ints are also a lot more efficient than an array of (int,int) objects. The same goes for any combination of primitive types.
- If you need to implement a container that stores tuples of (Foo,Bar) objects, try to remember that two parallel Foo[] and Bar[] arrays are generally much better than a single array of custom (Foo,Bar) objects. (The exception to this, of course, is when you're designing an API for other code to access. In those cases, it's usually better to make a small compromise to the speed in order to achieve a good API design. But in your own internal code, you should try and be as efficient as possible.)
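As a minimal sketch of the parallel-array idea, assuming a hypothetical fixed-capacity container named PointStore (not part of any Android API), each (int,int) pair lives at the same index in two primitive arrays, so adding a pair allocates nothing:

```java
final class PointStore {
    // Two parallel primitive arrays: index i holds the i-th (x, y) pair.
    private final int[] xs;
    private final int[] ys;
    private int size;

    PointStore(int capacity) {
        xs = new int[capacity];
        ys = new int[capacity];
    }

    // Stores a pair without allocating a wrapper object for it.
    // Throws ArrayIndexOutOfBoundsException once the fixed capacity is exceeded.
    void add(int x, int y) {
        xs[size] = x;
        ys[size] = y;
        size++;
    }

    int size() {
        return size;
    }

    // Walks both arrays in step; element i of xs pairs with element i of ys.
    int sumOfProducts() {
        int sum = 0;
        for (int i = 0; i < size; ++i) {
            sum += xs[i] * ys[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        PointStore points = new PointStore(4);
        points.add(1, 2);
        points.add(3, 4);
        System.out.println(points.sumOfProducts());
    }
}
```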
Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.
Prefer Static Over Virtual
If you don't need to access an object's fields, make your method static. Invocations will be about 15%-20% faster. It's also good practice, because you can tell from the method signature that calling the method can't alter the object's state.
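A small illustration of the distinction, using a hypothetical Geometry class: the first method reads a field and has to stay an instance method, while the second touches no instance state and can be declared static.

```java
final class Geometry {
    private int scale = 1;

    // Reads the scale field, so it must remain an instance method.
    int scaled(int value) {
        return value * scale;
    }

    // Uses only its parameters, so it can be static; the signature also
    // shows callers that no object state can change.
    static int squaredDistance(int x1, int y1, int x2, int y2) {
        int dx = x2 - x1;
        int dy = y2 - y1;
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        System.out.println(squaredDistance(0, 0, 3, 4));
    }
}
```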
Use Static Final For Constants
Consider the following declaration at the top of a class:
static int intVal = 42;
static String strVal = "Hello, world!";
The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal, and extracts a reference from the classfile string constant table for strVal. When these values are referenced later on, they are accessed with field lookups.
We can improve matters with the "final" keyword:
static final int intVal = 42;
static final String strVal = "Hello, world!";
The class no longer requires a <clinit> method, because the constants go into static field initializers in the dex file. Code that refers to intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup.
Note: This optimization applies only to primitive types and String constants, not arbitrary reference types. Still, it's good practice to declare constants static final whenever possible.
Avoid Internal Getters/Setters
In native languages like C++ it's common practice to use getters (i = getCount()) instead of accessing the field directly (i = mCount). This is an excellent habit for C++ and is often practiced in other object oriented languages like C# and Java, because the compiler can usually inline the access, and if you need to restrict or debug field access you can add the code at any time.
However, this is a bad idea on Android. Virtual method calls are expensive, much more so than instance field lookups. It's reasonable to follow common object-oriented programming practices and have getters and setters in the public interface, but within a class you should always access fields directly.
Without a JIT, direct field access is about 3x faster than invoking a trivial getter. With the JIT (where direct field access is as cheap as accessing a local), direct field access is about 7x faster than invoking a trivial getter.
Note that if you're using ProGuard, you can have the best of both worlds because ProGuard can inline accessors for you.
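The advice above might look like this in practice. Counter is a hypothetical class: it keeps a getter in its public interface but reads and writes mCount directly in its own methods.

```java
final class Counter {
    private int mCount;

    // Public accessor: part of the interface offered to other classes.
    public int getCount() {
        return mCount;
    }

    // Inside the class, the field is written directly.
    public void increment() {
        mCount++;
    }

    // Inside the class, the field is read directly rather than via getCount().
    public boolean isEven() {
        return (mCount & 1) == 0;
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        counter.increment();
        System.out.println(counter.getCount());
    }
}
```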
Use Enhanced For Loop Syntax
The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface and for arrays. With collections, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, a hand-written counted loop is about 3x faster (with or without JIT), but for other collections the enhanced for loop syntax will be exactly equivalent to explicit iterator usage.
There are several alternatives for iterating through an array:
static class Foo {
    int mSplat;
}

Foo[] mArray = ...

public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}

public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;
    for (int i = 0; i < len; ++i) {
        sum += localArray[i].mSplat;
    }
}

public void two() {
    int sum = 0;
    for (Foo a : mArray) {
        sum += a.mSplat;
    }
}
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.
one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.
two() is fastest for devices without a JIT, and indistinguishable from one() for devices with a JIT. It uses the enhanced for loop syntax introduced in version 1.5 of the Java programming language.
So, you should use the enhanced for loop by default, but consider a hand-written counted loop for performance-critical ArrayList iteration.
Tip: Also see Josh Bloch's Effective Java, item 46.
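The array examples above do not show the ArrayList case, so here is a rough sketch of the two loop styles over a list. ListSum and its method names are hypothetical; both methods compute the same sum and differ only in how they iterate.

```java
import java.util.ArrayList;
import java.util.List;

final class ListSum {
    // Enhanced for loop: works for any Iterable and allocates an iterator.
    static int sumForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Hand-written counted loop: indexes the ArrayList directly,
    // with the size read once before the loop.
    static int sumCounted(ArrayList<Integer> values) {
        int sum = 0;
        int size = values.size();
        for (int i = 0; i < size; ++i) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ArrayList<Integer> values = new ArrayList<>();
        values.add(1);
        values.add(2);
        values.add(3);
        System.out.println(sumForEach(values) + " " + sumCounted(values));
    }
}
```

The counted version takes an ArrayList parameter on purpose: get(i) is only cheap on a random-access list, so the signature keeps it from being used on, say, a linked list.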
Consider Package Instead of Private Access with Private Inner Classes
Consider the following class definition:
public class Foo {
    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}
What's important here is that we define a private inner class (Foo$Inner) that directly accesses a private method and a private instance field in the outer class. This is legal, and the code prints "Value is 27" as expected.
The problem is that the VM considers direct access to Foo's private members from Foo$Inner to be illegal because Foo and Foo$Inner are different classes, even though the Java language allows an inner class to access an outer class' private members. To bridge the gap, the compiler generates a couple of synthetic methods:
/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}
The inner class code calls these static methods whenever it needs to access the mValue field or invoke the doStuff() method in the outer class. What this means is that the code above really boils down to a case where you're accessing member fields through accessor methods. Earlier we talked about how accessors are slower than direct field accesses, so this is an example of a certain language idiom resulting in an "invisible" performance hit.
If you're using code like this in a performance hotspot, you can avoid the overhead by declaring fields and methods accessed by inner classes to have package access, rather than private access. Unfortunately this means the fields can be accessed directly by other classes in the same package, so you shouldn't use this in public API.
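Applied to the Foo class from this section, the change might look like the sketch below. The only difference from the original listing is that mValue and doStuff() drop the private modifier (and the class is shown without public so the snippet stands alone); everything else is unchanged.

```java
class Foo {
    private class Inner {
        void stuff() {
            // Both members have package access, so the inner class can
            // reach them directly.
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    // Package access (no modifier) instead of private.
    int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    // Package access (no modifier) instead of private.
    void doStuff(int value) {
        System.out.println("Value is " + value);
    }

    public static void main(String[] args) {
        new Foo().run();
    }
}
```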
Avoid Using Floating-Point
As a rule of thumb, floating-point is about 2x slower than integer on Android-powered devices.
In speed terms, there's no difference between float and double on the more modern hardware. Space-wise, double is 2x larger. As with desktop machines, assuming space isn't an issue, you should prefer double to float.
Also, even for integers, some processors have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software, which is something to think about if you're designing a hash table or doing lots of math.
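One common way to keep division out of a hash table's hot path, offered here as an illustration rather than something the original article prescribes, is to keep the table size a power of two and pick the bucket with a bit mask. BucketIndex and its methods are hypothetical names, and byMask is only valid under that power-of-two assumption.

```java
final class BucketIndex {
    // Uses the remainder operator, which involves an integer division.
    // The sign bit is cleared first so the result is never negative.
    static int byModulus(int hash, int tableSize) {
        return (hash & 0x7fffffff) % tableSize;
    }

    // Assumes tableSize is a power of two: subtracting one yields a mask
    // of the low bits, so no division is needed.
    static int byMask(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int hash = "example".hashCode();
        System.out.println(byModulus(hash, 16) == byMask(hash, 16));
    }
}
```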
Know and Use the Libraries
In addition to all the usual reasons to prefer library code over rolling your own, bear in mind that the system is at liberty to replace calls to library methods with hand-coded assembler, which may be better than the best code the JIT can produce for the equivalent Java. The typical example here is String.indexOf() and related APIs, which Dalvik replaces with an inlined intrinsic. Similarly, the System.arraycopy() method is about 9x faster than a hand-coded loop on a Nexus One with the JIT.
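For reference, the two ways of copying an array compared above look roughly like this. ArrayCopy is a hypothetical helper class; System.arraycopy(src, srcPos, dest, destPos, length) is the standard library call.

```java
final class ArrayCopy {
    // Hand-coded loop: copies one element per iteration.
    static int[] copyByLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; ++i) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Library call: lets the runtime use its own optimized copy routine.
    static int[] copyByLibrary(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(copyByLibrary(data).length);
    }
}
```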
Use Native Methods Carefully
Developing your app with native code using the Android NDK isn't necessarily more efficient than programming with the Java language. For one thing, there's a cost associated with the Java-native transition, and the JIT can't optimize across these boundaries. If you're allocating native resources (memory on the native heap, file descriptors, or whatever), it can be significantly more difficult to arrange timely collection of these resources. You also need to compile your code for each architecture you wish to run on (rather than rely on it having a JIT). You may even have to compile multiple versions for what you consider the same architecture: native code compiled for the ARM processor in the G1 can't take full advantage of the ARM in the Nexus One, and code compiled for the ARM in the Nexus One won't run on the ARM in the G1.
Native code is primarily useful when you have an existing native codebase that you want to port to Android, not for "speeding up" parts of your Android app written with the Java language.
If you do need to use native code, you should read our JNI Tips.
Performance Myths
On devices without a JIT, it is true that invoking methods via a variable with an exact type rather than an interface is slightly more efficient. (So, for example, it was cheaper to invoke methods on a HashMap map than a Map map, even though in both cases the map was a HashMap.) It was not the case that this was 2x slower; the actual difference was more like 6% slower. Furthermore, the JIT makes the two effectively indistinguishable.
On devices without a JIT, caching field accesses is about 20% faster than repeatedly accessing the field. With a JIT, field access costs about the same as local access, so this isn't a worthwhile optimization unless you feel it makes your code easier to read. (This is true of final, static, and static final fields too.)
Always Measure
Before you start optimizing, make sure you have a problem that you need to solve. Make sure you can accurately measure your existing performance, or you won't be able to measure the benefit of the alternatives you try.
Every claim made in this document is backed up by a benchmark. The source to these benchmarks can be found in the code.google.com "dalvik" project.
The benchmarks are built with the Caliper microbenchmarking framework for Java. Microbenchmarks are hard to get right, so Caliper goes out of its way to do the hard work for you, and even detect some cases where you're not measuring what you think you're measuring (because, say, the VM has managed to optimize all your code away). We highly recommend you use Caliper to run your own microbenchmarks.
You may also find Traceview useful for profiling, but it's important to realize that it currently disables the JIT, which may cause it to misattribute time to code that the JIT may be able to win back. It's especially important after making changes suggested by Traceview data to ensure that the resulting code actually runs faster when run without Traceview.
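Translation:
This article mainly covers small optimizations that, taken together, can improve an app's overall performance, but don't expect these changes to produce dramatic gains. You should put more effort into choosing the right algorithms and data structures, but that is outside the scope of this article. To write efficient code, fold these tips into your everyday coding habits.
There are two basic rules for writing efficient code:
- Don't do unnecessary work.
- Avoid allocating memory wherever you can.
Avoid Creating Unnecessary Objects
Object creation is never free. A garbage collector with per-thread allocation pools for temporary objects can lower the cost of allocation, but an operation that allocates memory is always more expensive than one that does not.
Once you create too many objects, you force periodic garbage collection, which the user experiences as slight stutter. A concurrent garbage collector was introduced in Android 2.3, but unnecessary work should still be avoided.
So you should avoid creating object instances you don't need. Some examples:
- If you have a method that returns a String, and you know the return value will always be appended to a StringBuffer, change the method's signature and implementation so that the method does the append itself instead of creating a temporary object.
- When extracting a string from a run of input data, try to return a substring of the source data rather than creating a copy. You will create a new String object, but it will share the underlying char[] with the source data. (The cost is that even if you use only part of the source data, all of it stays in memory.)
A more radical approach is to split a multidimensional array into several parallel one-dimensional arrays:
- An array of int is better than an array of Integer objects, and this generalizes: two parallel int arrays are much more efficient than one array of (int,int) objects. The same holds for every primitive type.
- If you want to implement a container that stores (Foo,Bar) tuples, remember that two parallel Foo[] and Bar[] arrays are generally much more efficient than a single array of custom (Foo,Bar) objects. (Of course, if you are designing an API for outside use, giving up a little performance for a good API design is usually worth it. In your own internal code, though, you should be as efficient as you can.)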
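Prefer Static
If your method does not use the object's fields (member variables), make it static. Invocations become about 15%-20% faster. This is also good practice, because the method signature tells you that calling the method will not change the object's state.
Use Static Final For Constants
Suppose a class declares these variables:
static int intVal = 42;
static String strVal = "Hello, world!";
The compiler generates a <clinit> method to initialize the class; it runs when the class is first used. The method stores 42 into intVal and fetches a reference for strVal from the classfile string constant table. Later uses of these values go through field lookups.
We can improve this with the final keyword:
static final int intVal = 42;
static final String strVal = "Hello, world!";
The class no longer needs a <clinit> method, because the constants go into static field initializers in the dex file. Code that uses intVal uses the integer value 42 directly, and uses of strVal go through a relatively cheap "string constant" instruction rather than a field lookup.
Note: This optimization applies only to primitive types and String constants, not to arbitrary types. Still, marking constants static final whenever possible is good practice.
Avoid Internal Getters/Setters
In native languages such as C++, it is common to use getters (i = getCount()) rather than accessing the field directly (i = mCount). This is a good habit for C++, and it is also used in other object-oriented languages such as C# and Java, because the compiler can inline the access, and you can add code later if you need to restrict or debug access to the field.
However, this is a bad idea on Android. Method calls cost far more than field lookups. Following object-oriented conventions, it is reasonable to use getters and setters in the public interface, but inside a class you should always access fields directly.
Without a JIT, direct field access is about 3x faster than calling a getter. With a JIT (where direct field access is as cheap as accessing a local), it is about 7x faster.
Note: If you use ProGuard, you can have it both ways, because ProGuard inlines accessors for you.
Use Enhanced For Loop Syntax
The enhanced for loop (the for-each loop) can be used on collections that implement the Iterable interface and on arrays. With collections, an iterator is used to call hasNext() and next(). With an ArrayList, a hand-written counted loop is about 3x faster than an iterator or for-each loop (with or without a JIT). For other collections, the for-each loop performs about the same as using an iterator.
In general, you should use the for-each loop, but consider a hand-written counted loop for ArrayList.
Use Package Access Instead of Private Access with Private Inner Classes
Consider the following class definition: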
public class Foo {
    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}
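The key point here is that we define a private inner class (Foo$Inner) whose method calls a private method and reads a private member field (member variable) of the outer class. This is legal, and the code prints "Value is 27" as expected.
The problem is that the VM treats direct access from Foo$Inner to Foo's private members as illegal, because Foo and Foo$Inner are different classes, even though Java allows an inner class to use the outer class's private members. To resolve this, the compiler generates a couple of synthetic methods:
/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}
The inner class calls these static methods whenever it needs to read the outer class's private mValue field or invoke the private doStuff() method. This means the code above is really using method calls to read a member variable. We noted earlier that method calls are slower than direct field access, so this is an example of a language idiom causing a "hidden" performance loss.
If you use this pattern in a performance-critical spot, you can avoid the loss by giving the fields and methods used by the inner class package access instead of private access. However, this lets other classes in the same package access those fields and methods, so you should not do this in a public API.
Avoid Using Floating-Point
As a rule of thumb, floating-point is about 2x slower than integer on Android devices.
In terms of speed, float and double are no different on modern hardware. In terms of space, double is twice as large. As on desktop machines, assuming space is not an issue, you should prefer double.
Likewise, even for integers, some processors support hardware multiply but not hardware divide. In that case, integer division and modulus are performed in software, which is worth considering when you design a hash table or do a lot of math.
Know and Use the Libraries
Beyond the usual reasons to prefer library code over writing your own, keep in mind that the system is free to replace calls to library methods with hand-coded assembler, which may perform better than the best code the JIT can generate for the equivalent Java. The typical example is String.indexOf() and related APIs, which Dalvik replaces with an inlined intrinsic. Similarly, the System.arraycopy() method is about 9x faster than a hand-written copy loop on a Nexus One with the JIT.
Use Native Methods Carefully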
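An Android app developed in native code with the Android NDK does not necessarily perform better than one developed in Java. For one thing, the Java-native transition has a cost, and the JIT cannot optimize across that boundary. If you allocate native resources (memory on the native heap, file descriptors, or anything else), arranging timely collection of those resources can be significantly harder. You also need to compile your code separately for each architecture it runs on, rather than relying on a JIT. You may even need to compile different versions for the same architecture: native code compiled for the ARM processor in the G1 cannot take full advantage of the ARM processor in the Nexus One, and native code compiled for the Nexus One will not run on the G1, even though both are ARM.
Performance Myths
On devices without a JIT, invoking a method through a variable of a concrete type is indeed slightly faster than invoking it through an interface. (For example, calling methods on a HashMap map cost less than calling them on a Map map, even though in both cases map referred to a HashMap instance.) But the difference was not 2x; it was only about 6%. Furthermore, the JIT makes the two effectively indistinguishable.
On devices without a JIT, caching a field in a local variable and reusing it is about 20% faster than repeatedly accessing the field (which requires a field lookup). With a JIT, the two perform about the same, so this optimization is not worthwhile unless you feel it makes your code easier to read. (This also applies to final, static, and static final fields.)
Always Measure
Before you start optimizing, make sure you actually have a problem that needs solving. Make sure you can accurately measure your current performance, or you will not be able to measure the benefit of the alternatives you try.
Every claim in this article is backed by a benchmark. The source for these benchmarks can be found in the code.google.com "dalvik" project.
The benchmarks are built with Caliper, a microbenchmarking framework for Java. Caliper does the hard work of microbenchmarking for you and can even detect some cases where you are not measuring what you think you are measuring (because, say, the VM has optimized your code away). We highly recommend using Caliper to run your own microbenchmarks.
You may also find Traceview useful for profiling, but it is important to realize that it currently disables the JIT, which may cause it to misattribute time to code that the JIT might win back. In particular, after making changes suggested by Traceview data, check that the resulting code actually runs faster when run without Traceview.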