TL;DR: Strings obfuscated with dProtect in an Android app can be recovered by emulating the Dalvik bytecode with Katalina.

Time To Read: 5 min

Katalina

Katalina is a Dalvik bytecode emulator. It implements a sandbox in Python and emulates one Dalvik instruction at a time. As its author points out, this approach is useful for dealing with obfuscated code, and especially for recovering obfuscated strings.
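The following toy fetch-decode-execute loop illustrates the idea. It is a minimal sketch built on stated assumptions:

- The four-opcode instruction set (`Op`, `Insn`) is hypothetical and far simpler than real Dalvik bytecode.
- It does not reflect how Katalina is structured internally.

It shows the two properties that matter here. Instructions run one at a time, and they only touch a private register file rather than the host process.

```java
import java.util.List;

class ToyEmulator {
    // Hypothetical mini instruction set; real Dalvik bytecode has a binary encoding and far more opcodes.
    enum Op { CONST, XOR, ADD, RETURN }

    // One decoded instruction: destination register plus two operands (register index or literal).
    record Insn(Op op, int dst, int a, int b) {}

    // Runs exactly one instruction per loop iteration against a private register file,
    // so the emulated code never touches the host process.
    static int run(List<Insn> code, int registerCount, boolean trace) {
        int[] regs = new int[registerCount];
        int pc = 0;
        while (true) {
            Insn insn = code.get(pc);                       // fetch
            if (trace) {
                System.out.println(pc + ": " + insn);
            }
            switch (insn.op()) {                            // decode and execute
                case CONST -> regs[insn.dst()] = insn.a();  // load a literal
                case XOR -> regs[insn.dst()] = regs[insn.a()] ^ regs[insn.b()];
                case ADD -> regs[insn.dst()] = regs[insn.a()] + regs[insn.b()];
                case RETURN -> {
                    return regs[insn.a()];
                }
            }
            pc++;
        }
    }

    public static void main(String[] args) {
        // Unmasks a constant with XOR, the same pattern used in the dProtect output later in this post.
        List<Insn> program = List.of(
            new Insn(Op.CONST, 0, 496329810, 0),
            new Insn(Op.CONST, 1, 496366473, 0),
            new Insn(Op.XOR, 2, 0, 1),
            new Insn(Op.RETURN, 0, 2, 0));
        System.out.println(run(program, 3, true)); // 38875 (0x97DB)
    }
}
```

A real emulator also has to model method calls, objects and framework classes such as `String` and `StringBuilder`. That is where most of the engineering effort goes.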

String obfuscation is a highly effective technique, used by both malware and legitimate applications, to slow down reverse engineering. Without readable strings, many automated static analysis tools become far less useful and manual analysis slows down, pushing the attacker toward dynamic analysis. App authors can then add anti-dynamic-analysis techniques to prolong the reverse engineering effort even further.

To put Katalina to the test, I built an obfuscated application with dProtect, an open-source obfuscator for Android. dProtect extends Proguard: on top of Proguard's symbol obfuscation, it adds code obfuscation for Java and Kotlin.

dProtect String Obfuscation

The test subject is the DetectMagiskHide app. The following dProtect configuration enables string obfuscation, along with arithmetic, constant and control-flow obfuscation:

-keep,allowobfuscation class com.** { *; }

-obfuscations *
-obfuscate-strings      class com.darvin.security.** { *; }
-obfuscate-arithmetic   class com.darvin.security.** { *; }
-obfuscate-constants    class com.darvin.security.** { *; }
-obfuscate-control-flow class com.darvin.security.** { *; }

-repackageclasses
-allowaccessmodification
-flattenpackagehierarchy
-useuniqueclassmembernames

After obfuscation, the class decompiles to the following:

public class AppZygotePreload implements ZygotePreload {
    private static int a;
    private static long[] b;
 
    static {
        long[] jArr = {1959339100, 1532247175, 496329810, 84648423, 1290985939, 139648832, 1345075629};
        b = jArr; // the decompiler printed this assignment as "b = r0;" ahead of the array literal
        a = ((int) jArr[6]) ^ 618995181;
    }
 
    public static String a(String str) {
        long[] jArr;
        StringBuilder sb = new StringBuilder();
        int i = ((int) b[1]) ^ 1532247175;
        while (i < str.length()) {
            char charAt = str.charAt(i);
            long[] jArr2 = b;
            int i2 = (((int) jArr2[2]) ^ 496366473) + charAt + (((int) jArr2[3]) ^ 84648422);
            int i3 = ~charAt;
            sb.append((char) ((i % (((int) jArr2[4]) ^ 1290935852)) ^ ((i2 + ((~(((int) jArr2[2]) ^ 496366473)) | i3)) - (((((int) jArr2[2]) ^ 496366473) + charAt) - (((charAt + (((int) jArr2[2]) ^ 496366473)) + (((int) jArr2[3]) ^ 84648422)) + ((~(((int) jArr2[2]) ^ 496366473)) | i3))))));
            do {
                jArr = b;
                int i4 = (((int) jArr[3]) ^ 84648422) + i + (((int) jArr[3]) ^ 84648422);
                int i5 = ~i;
                i = i + (((int) jArr[3]) ^ 84648422) + (((int) jArr[3]) ^ 84648422) + ((~(((int) jArr[3]) ^ 84648422)) | i5) + (((((int) jArr[3]) ^ 84648422) + i) - (i4 + ((~(((int) jArr[3]) ^ 84648422)) | i5)));
            } while ((a + (((int) jArr[3]) ^ 84648422)) % (((int) jArr[5]) ^ 139648834) == 0);
        }
        return sb.toString();
    }
 
    @Override // android.app.ZygotePreload
    public void doPreload(ApplicationInfo applicationInfo) {
        if (applicationInfo == null || Build.VERSION.SDK_INT < (((int) b[0]) ^ 1959339073)) {
            return;
        }
        System.loadLibrary(a("鞵鞻鞭鞱鞩鞻韰鞰鞺鞰"));
    }
}

System.loadLibrary() takes the library name as a string, and dProtect has replaced the original literal with an obfuscated one. The original string is recovered at runtime by the call a("鞵鞻鞭鞱鞩鞻韰鞰鞺鞰").
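As a cross-check on what the emulator has to compute, the constants in a() can be folded by hand:

- b[1] ^ 1532247175 is 0 and b[3] ^ 84648422 is 1.
- b[2] ^ 496366473 is 0x97DB and b[4] ^ 1290935852 is 0xFFFF.

The long expressions are mixed boolean-arithmetic rewrites of simple operations:

- With K = 0x97DB, the character expression simplifies to c ^ K, because x + y - 2(x & y) = x ^ y.
- The loop update simplifies to i + 1.
- The do/while condition (a + 1) % 2 == 0 is always false because a is even, so its body runs exactly once.

The sketch below is our own simplification of the decompiled listing, not code produced by dProtect or Katalina. StringDecoder is a hypothetical name, and the Unicode escapes spell the literal passed to a().

```java
class StringDecoder {
    // Folded by hand from the decompiled a() (our reading, not dProtect output):
    //   b[2] ^ 496366473  == 0x97DB -> per-character XOR key
    //   b[4] ^ 1290935852 == 0xFFFF -> modulus applied to the index
    //   b[1] ^ 1532247175 == 0 and b[3] ^ 84648422 == 1 -> plain for (i = 0; ...; i++)
    private static final int KEY = 0x97DB;
    private static final int MOD = 0xFFFF;

    // Equivalent of AppZygotePreload.a(): out[i] = in[i] ^ KEY ^ (i % MOD).
    // XOR is self-inverse, so the same routine also encodes.
    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) ^ KEY ^ (i % MOD)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escaped form of the literal passed to a() in doPreload().
        String obfuscated = "\u97B5\u97BB\u97AD\u97B1\u97A9\u97BB\u97F0\u97B0\u97BA\u97B0";
        System.out.println(decode(obfuscated)); // native-lib
    }
}
```

Hand simplification like this works for one small routine. Emulation scales better because it needs no understanding of the arithmetic at all.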

A closer look at a() shows that it is self-contained: all the logic and data it needs, including the static fields a and b initialized in the class's static block, live in the same class. Apart from java.lang.String and StringBuilder, it depends on no other class. Such cases are ideal candidates for emulation.
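Whether a method is self-contained can be checked mechanically on its smali disassembly by collecting the owner class of every field and method reference. The sketch below is a hypothetical helper, not a Katalina feature, and it rests on these assumptions:

- The smali lines in main are illustrative, not the real disassembly of a().
- Classes under java/ and android/ are treated as safe, on the assumption that the emulator models those framework classes itself.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DependencyScan {
    // Owner class of a field or method reference in smali, e.g. "Ljava/lang/String;->charAt(I)C".
    private static final Pattern OWNER = Pattern.compile("(L[\\w/$]+;)->");

    // Returns referenced classes other than the method's own class and java/ or android/
    // framework classes (assumed to be modeled by the emulator).
    static Set<String> externalOwners(String selfClass, List<String> smaliLines) {
        Set<String> owners = new TreeSet<>();
        for (String line : smaliLines) {
            Matcher m = OWNER.matcher(line);
            while (m.find()) {
                String owner = m.group(1);
                boolean framework = owner.startsWith("Ljava/") || owner.startsWith("Landroid/");
                if (!owner.equals(selfClass) && !framework) {
                    owners.add(owner);
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Hypothetical smali lines in the style of a(); not the real disassembly.
        List<String> body = List.of(
            "sget-object v1, Lcom/darvin/security/AppZygotePreload;->b:[J",
            "aget-wide v2, v1, v3",
            "invoke-virtual {p0, v0}, Ljava/lang/String;->charAt(I)C",
            "invoke-virtual {v4, v5}, Ljava/lang/StringBuilder;->append(C)Ljava/lang/StringBuilder;");
        Set<String> deps = externalOwners("Lcom/darvin/security/AppZygotePreload;", body);
        System.out.println(deps.isEmpty() ? "self-contained" : "depends on " + deps);
    }
}
```

An empty result only means the method does not reach into other app classes. Reflection, native calls or unsupported framework APIs can still stop an emulator.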

Running Katalina on the method easily recovers the original string, native-lib:

python3 Katalina/main.py -v classes.dex -x 'Lcom/darvin/security/AppZygotePreload;->a(Ljava/lang/String;)Ljava/lang/String;' '鞵鞻鞭鞱鞩鞻韰鞰鞺鞰'     
 
// --- output ----
...
INFO     String created: Lcom/darvin/security/AppZygotePreload;.a(LL) -> native-lib
INFO     String created: native-lib
...
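The -x argument names the method to emulate using the smali method reference syntax, Lowner;->name(parameters)return. Type descriptors follow the dex format: V, Z, B, S, C, I, J, F and D for primitives, L...; for classes, and a [ prefix for arrays.

The helper below builds such a reference from Java type names. It is hypothetical and not part of Katalina. Nested classes must be passed with their binary name, for example Outer$Inner.

```java
import java.util.Map;

class DalvikRef {
    // Primitive type descriptors as defined by the dex format.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "void", "V", "boolean", "Z", "byte", "B", "short", "S",
        "char", "C", "int", "I", "long", "J", "float", "F", "double", "D");

    // Converts a Java type name ("int", "java.lang.String", "long[]") to a type descriptor.
    static String typeDescriptor(String javaType) {
        if (javaType.endsWith("[]")) {
            return "[" + typeDescriptor(javaType.substring(0, javaType.length() - 2));
        }
        String prim = PRIMITIVES.get(javaType);
        return prim != null ? prim : "L" + javaType.replace('.', '/') + ";";
    }

    // Builds "Lowner;->name(params)ret", the form passed to Katalina's -x option above.
    static String methodRef(String owner, String name, String returnType, String... paramTypes) {
        StringBuilder sb = new StringBuilder(typeDescriptor(owner)).append("->").append(name).append('(');
        for (String p : paramTypes) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        // Prints Lcom/darvin/security/AppZygotePreload;->a(Ljava/lang/String;)Ljava/lang/String;
        System.out.println(methodRef("com.darvin.security.AppZygotePreload", "a",
                "java.lang.String", "java.lang.String"));
    }
}
```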

Patching Katalina

The current Katalina implementation does not handle every array use case correctly, as its README acknowledges. dProtect's code uses the aget-wide and aput-wide instructions, which read and write 64-bit elements of long[] and double[] arrays, so Katalina had to be patched to support them. The patched code is available here.
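Dalvik registers are 32 bits wide, and a long or double occupies a register pair. So aget-wide vAA, vBB, vCC must write both vAA and vAA+1, and aput-wide must read both. The sketch below illustrates that bookkeeping in a toy register file, with these caveats:

- It is not the actual Katalina patch, which is in Python.
- WideRegisters is a hypothetical name.
- Placing the low word in vN and the high word in vN+1 is an assumption of this sketch.
- Only long[] is handled.

```java
class WideRegisters {
    // 32-bit register slots; a 64-bit value spans vN (low word, assumed) and vN+1 (high word).
    final int[] regs;
    // Toy simplification: object references live in a parallel array indexed by register.
    final Object[] refs;

    WideRegisters(int count) {
        regs = new int[count];
        refs = new Object[count];
    }

    long getWide(int v) {
        return ((long) regs[v + 1] << 32) | (regs[v] & 0xFFFFFFFFL);
    }

    void setWide(int v, long value) {
        regs[v] = (int) value;              // low 32 bits
        regs[v + 1] = (int) (value >>> 32); // high 32 bits
    }

    // aget-wide vAA, vBB, vCC: load array[index] into the pair vAA, vAA+1 (long[] only here).
    void agetWide(int vAA, int vBB, int vCC) {
        long[] array = (long[]) refs[vBB];
        setWide(vAA, array[regs[vCC]]);
    }

    // aput-wide vAA, vBB, vCC: store the pair vAA, vAA+1 into array[index].
    void aputWide(int vAA, int vBB, int vCC) {
        long[] array = (long[]) refs[vBB];
        array[regs[vCC]] = getWide(vAA);
    }

    public static void main(String[] args) {
        WideRegisters r = new WideRegisters(6);
        r.refs[1] = new long[] {1959339100L, 1532247175L}; // like the b field in the listing
        r.regs[2] = 0;                                     // index register
        r.agetWide(4, 1, 2);                               // v4:v5 = b[0]
        System.out.println(r.getWide(4));                  // 1959339100
    }
}
```

If an emulator only writes vAA and ignores vAA+1, every later read of the pair returns a mangled value. That is why reads like b[2] in the obfuscated a() would come out wrong.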