For several weeks, members of my team and I have been trying to reconcile discrepancies between our application's startup performance as measured locally and as measured on our CI platform. We weren't really sure what was causing the discrepancies and resorted to [tooltips keyword="brainstorming" content="Guessing"] about the possible causes.

One of our most promising hypotheses was that there was more of a difference between the measurement techniques than we imagined. While we weren't sure how the application's startup time was being measured on the CI platform, we knew how it was being measured locally: with the Android Profiler.

We were testing the same build variant (a "release" build) locally and on the CI platform, which meant that we were executing the same bytecode on both. [tooltips keyword="What other variables could account for the difference in measurements?" content="Of course, we accounted for the possibility of testing on different hardware platforms."] It occurred to us that the CI platform might be taking its measurements with some tool other than the Android Profiler. One of the prerequisites of the Android Profiler is that the application being profiled must be built with the android:debuggable flag set to true.

Our team operated under the assumption that the value of this flag simply controlled an Android OS permission that dictated whether a debugger/profiler could attach to the process at runtime. In other words, we believed that the value of this flag should not change the way the Android OS's runtime treated the application. However, this was the only potential difference we could imagine between the local and CI test environments, so we decided to investigate.

Turns out this was a good idea.

To isolate whether the debuggable flag was the cause of the change in startup performance, we had to make two versions of the application with the same bytecode, one debuggable and the other not. That way, we could definitively attribute any difference in startup time to how the runtime treats the flag. Ideally, then, we would toggle the flag on two copies of the same compiled version of the application (an apk).

The android:debuggable flag lives in an apk's AndroidManifest.xml file. Despite its extension, the AndroidManifest.xml inside an apk is not stored as plain-text XML. The build tools compile it into a binary format before packaging.
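For the curious, one quick way to check this yourself is to look at the first bytes of the manifest inside an apk. The Python sketch below assumes two things from AOSP's ResourceTypes.h, which this post does not otherwise discuss: the 8-byte chunk-header layout (type, header size, total size) and the RES_XML_TYPE value 0x0003. The function names are hypothetical.

```python
import struct
import zipfile

# Assumed from AOSP's ResourceTypes.h: a compiled XML file starts with a chunk
# header (uint16 type, uint16 header size, uint32 total size), little-endian,
# whose type is RES_XML_TYPE and whose header size is 8.
RES_XML_TYPE = 0x0003


def is_binary_manifest(data: bytes) -> bool:
    """Return True if `data` looks like compiled (binary) XML, not plain text."""
    if len(data) < 8:
        return False
    chunk_type, header_size, total_size = struct.unpack_from("<HHI", data, 0)
    return chunk_type == RES_XML_TYPE and header_size == 8 and total_size <= len(data)


def manifest_is_binary(apk_path: str) -> bool:
    """An apk is a zip archive, so the manifest can be read without unzipping."""
    with zipfile.ZipFile(apk_path) as apk:
        return is_binary_manifest(apk.read("AndroidManifest.xml"))
```

A plain-text manifest starts with `<?xml` or `<manifest`, so it fails the header check immediately.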

There is not much publicly available documentation (that we could find quickly) about this file format. Several tools are available for inspecting its contents (e.g., aapt from the Android SDK), and we found some tools that promised to help modify the contents of an existing apk (e.g., apktool), but we couldn't get them to work properly.

That meant we were going to have to modify the binary-formatted xml file by hand! Fortunately, there is a good, open-source, standalone tool, axmldec, whose source code implicitly contains enough information about the file format to give us an idea of the binary format's structure.

The android:debuggable flag is an attribute of the application element. According to the axmldec source code, each element attribute is stored on disk in a 20-byte structure:


  1. ns (4 bytes)
  2. name (4 bytes)
  3. raw_value (4 bytes)
  4. value (8 bytes)

Taking a cue from Java class file encoding techniques, this binary format stores every string once in a table, and each use of a string is stored simply as an index into that table. Because the namespace (ns) and name (name) of the attribute are strings, their values in this structure are 4-byte indices into the string table. In this case, those strings are the Android namespace URI (http://schemas.android.com/apk/res/android, which the android: prefix stands for) and "debuggable". The raw_value and value fields are more interesting. If the raw_value field is not 0xffffffff, then it is the index of a string in the string table. In other words, the attribute's value is that string, and the value field can be ignored. On the other hand, if the raw_value field is 0xffffffff, then the attribute's type is some [tooltips keyword="primitive" content="int, boolean, float, etc"] whose type and contents are encoded in the 8-byte value field.
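To make the layout concrete, here is a minimal Python sketch that decodes one 20-byte attribute record. It assumes little-endian fields in the order ns, name, raw_value, value. The names (Attribute, parse_attribute, describe, NO_STRING) are ours, not axmldec's. The string table is passed in as an already-decoded list rather than parsed from the file.

```python
import struct
from typing import NamedTuple, Sequence

NO_STRING = 0xFFFFFFFF  # raw_value sentinel meaning "no raw string"

# Three 4-byte string-table indices followed by the 8-byte value. "<" means
# little-endian with no padding, so a record is exactly 20 bytes.
ATTRIBUTE = struct.Struct("<III8s")


class Attribute(NamedTuple):
    ns: int         # index of the namespace URI string
    name: int       # index of the attribute name string
    raw_value: int  # index of the raw string value, or NO_STRING
    value: bytes    # 8-byte typed value, decoded separately


def parse_attribute(buf: bytes, offset: int) -> Attribute:
    """Decode the 20-byte attribute record that starts at `offset`."""
    return Attribute(*ATTRIBUTE.unpack_from(buf, offset))


def describe(attr: Attribute, strings: Sequence[str]) -> str:
    """Render an attribute using an already-decoded string table."""
    name = strings[attr.name]
    if attr.raw_value != NO_STRING:
        # A real string index: the attribute's value is that string.
        return f"{name} = {strings[attr.raw_value]!r} (string)"
    # 0xffffffff: a primitive whose type and data live in the value field.
    return f"{name} = primitive, value bytes {attr.value.hex()}"
```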

A separate format defines the structure of the 8-byte value field:


  1. size (2 bytes)
  2. res0 (1 byte, reserved)
  3. data_type (1 byte)
  4. data (4 bytes)

The data_type and data fields are the most useful. Again according to the source code, a boolean (the type we suppose the android:debuggable attribute to be) has a data_type of 0x12. For a boolean, "false" is 0x00000000 in the 4-byte data field and "true" is anything else.
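Continuing in the same spirit, the 8-byte value field can be unpacked the same way. This sketch assumes little-endian encoding and the field order size, res0, data_type, data. The first two names come from Android's Res_value struct. decode_boolean is a hypothetical helper, not part of any tool.

```python
import struct

TYPE_INT_BOOLEAN = 0x12  # data_type value for booleans

# size (2 bytes), res0 (1 byte), data_type (1 byte), data (4 bytes), little-endian.
VALUE = struct.Struct("<HBBI")


def decode_boolean(value: bytes) -> bool:
    """Interpret an 8-byte typed value as a boolean."""
    _size, _res0, data_type, data = VALUE.unpack(value)
    if data_type != TYPE_INT_BOOLEAN:
        raise ValueError(f"not a boolean: data_type=0x{data_type:02x}")
    # 0x00000000 means false; any other bit pattern means true.
    return data != 0
```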

With that understanding, we compiled a version of the application with the debuggable flag set to true and unzipped the resulting apk to get at its AndroidManifest.xml. Then we used a modified version of axmldec to find the offset of the android:debuggable attribute in the application element. Using our newfound knowledge of the attribute format, we used vi and xxd to change the attribute's value by hand from "true" to "false". We then built a non-debuggable version of the apk by re-zipping the directory that held the unzipped contents of the original apk, now with the modified AndroidManifest.xml. Finally, we signed the apk with our key and verified that the change to the debuggable flag had taken effect ($ aapt dump badging <path-to-apk> | grep -i debug).
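If you would rather not hand-edit hex in vi, here is a rough Python equivalent of the xxd edit and the re-zipping step. It assumes you already know the byte offset of the attribute record (we got ours from the modified axmldec). The helper names are hypothetical. The sketch does not sign anything: the output still has to go through zipalign and apksigner from the SDK build-tools (or your usual signing step) before it will install.

```python
import struct
import zipfile

TYPE_INT_BOOLEAN = 0x12
# Byte offsets inside a 20-byte attribute record: ns (0), name (4),
# raw_value (8), value.size (12), value.res0 (14), value.data_type (15),
# value.data (16).
DATA_TYPE_OFFSET = 15
DATA_OFFSET = 16
SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")


def set_boolean_attribute(manifest: bytes, attr_offset: int, value: bool) -> bytes:
    """Return a copy of `manifest` with the boolean at `attr_offset` rewritten.

    `attr_offset` is where the attribute record starts; finding it is outside
    this sketch.
    """
    buf = bytearray(manifest)
    if buf[attr_offset + DATA_TYPE_OFFSET] != TYPE_INT_BOOLEAN:
        raise ValueError("attribute at this offset is not a boolean")
    # Any non-zero data means true; this sketch writes 0xffffffff for true.
    struct.pack_into("<I", buf, attr_offset + DATA_OFFSET, 0xFFFFFFFF if value else 0)
    return bytes(buf)


def repack_apk(src: str, dst: str, new_manifest: bytes) -> None:
    """Copy `src` to `dst`, swapping in a new AndroidManifest.xml.

    The old JAR signature files are dropped because the patched apk must be
    re-signed anyway.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            name = info.filename
            if name.startswith("META-INF/") and (
                name == "META-INF/MANIFEST.MF" or name.endswith(SIGNATURE_SUFFIXES)
            ):
                continue
            data = new_manifest if name == "AndroidManifest.xml" else zin.read(name)
            # Reusing the ZipInfo keeps each entry's original compression method.
            zout.writestr(info, data)
```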


After all this work we learned ...? Nothing. Yet.

We did, however, have a basis for A/B testing: two applications, identical except for their debuggable flags.

We had to answer one final question before we started testing: how could we fairly measure the application's startup time? Remember, because only one of the two variants is debuggable, we could not use the Android Profiler. We found our answer in the Activity Manager's am_* event log tags.

Android's Activity Manager writes am_* entries to the system event log when certain meaningful events occur. You can see a list of all the am_* tags here. We decided to rely on the am_proc_start event, which is logged when the Activity Manager starts a new process for an application, to time application startup.
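If you want to collect these events yourself, a minimal Python sketch might filter am_proc_start lines out of the output of `adb logcat -b events`. The line format in the regular expression is an assumption: a threadtime timestamp, then the fields user, pid, uid, process name, hosting type and component. It is based on the tag's definition in AOSP's EventLogTags.logtags and may differ between Android versions. The sketch only extracts each event's timestamp and fields. Turning those into a startup duration depends on the rest of your tooling. The names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed shape of an am_proc_start line from `adb logcat -b events` in the
# default threadtime format:
#   MM-DD HH:MM:SS.mmm  PID  TID I am_proc_start: [user,pid,uid,process,type,component]
AM_PROC_START = re.compile(
    r"^(?P<stamp>\d\d-\d\d \d\d:\d\d:\d\d\.\d{3})\s+\d+\s+\d+\s+[VDIWEF]\s+"
    r"am_proc_start:\s*\[(?P<fields>.*)\]\s*$"
)


class ProcStart(NamedTuple):
    stamp: str         # logcat timestamp of the event
    user: int
    pid: int
    uid: int
    process: str
    hosting_type: str  # hosting type string as logged
    component: str


def parse_proc_starts(lines: Iterable[str], process: str) -> List[ProcStart]:
    """Return the am_proc_start events logged for one process name."""
    events = []
    for line in lines:
        match = AM_PROC_START.match(line)
        if not match:
            continue
        # Split at most 5 times so the component keeps any commas it contains.
        parts = match["fields"].split(",", 5)
        if len(parts) != 6:
            continue
        user, pid, uid, name, hosting_type, component = parts
        if name == process:
            events.append(
                ProcStart(match["stamp"], int(user), int(pid), int(uid), name, hosting_type, component)
            )
    return events
```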

We didn't need to build a raw data set where n was, say, 1000. We just needed something "rough and ready". Our test tool launched each version of the application 5 times and recorded the am_proc_start times for each run, and then we analyzed those values.
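A rough-and-ready comparison like this needs little more than a mean and a spread per variant. The Python sketch below uses only the standard library's statistics module. The function names are hypothetical, and the inputs are whatever startup times (in milliseconds) you collected. With five runs per variant, this is a sanity check, not a significance test.

```python
from statistics import mean, stdev
from typing import Sequence


def summarize(label: str, times_ms: Sequence[float]) -> str:
    """One-line summary of a variant's startup times (needs at least 2 runs)."""
    return (
        f"{label}: mean {mean(times_ms):.1f} ms, "
        f"stdev {stdev(times_ms):.1f} ms (n={len(times_ms)})"
    )


def percent_slower(debuggable_ms: Sequence[float], non_debuggable_ms: Sequence[float]) -> float:
    """How much slower the debuggable variant is on average, in percent."""
    baseline = mean(non_debuggable_ms)
    return 100.0 * (mean(debuggable_ms) - baseline) / baseline
```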

You can see the raw data here.

Well, wow! The results indicate that our assumption was almost entirely wrong! Simply changing the value of the android:debuggable flag has a significant impact on the time it takes an application to start.

For us, the implications of this discovery are enormous.

  1. The measurements that we are taking of the time it takes to start our application on the CI platform and in the profiler cannot be compared directly with the reported startup times of comparable applications (unless, of course, they are also measuring versions of their application with the android:debuggable flag set to true).
  2. Even so, we can still use timing results from the Android Profiler and the CI platform to monitor performance changes over time. The caveat is that we have to guard against the possibility that our optimizations only improve startup work that the runtime performs when the debuggable flag is set. Such optimizations will not hurt startup performance under routine conditions, but spending time on them would be a waste of engineering resources.
  3. We must now consider whether the value of the android:debuggable attribute affects other aspects of runtime performance. We do not believe that it does, but these results show that we can no longer be certain.

As a result of this investigation, we are going to spend time researching where the runtime checks the debuggable flag and how it changes its behavior based on that value. We have several theories.

If you have any information leading to the arrest of the party responsible for the lost performance, please contact us in the comments!

Thank you to my fantastic teammate Anny who provided valuable feedback on a draft of this post. If you think that the post is well written and informative, it's because of her. If you think that the post is confusing and boring, I take full responsibility.