Publications

You can also find my articles on my Google Scholar profile.

Preprints

[WIP] Stage-Supervised Latent Reasoning for Single-Shot JavaScript Deobfuscation

2026

This work-in-progress paper proposes a stage-aware latent reasoning framework for JavaScript deobfuscation. Rather than treating deobfuscation as a one-step translation, we convert intermediate outputs from the Webcrack deterministic deobfuscation pipeline into structured supervision for a Coconut-based model. The model learns from multi-stage rewrites during training but generates the final cleaned program in a single shot at inference time. In a preliminary evaluation on JsDeObsBench, the Coconut-based model achieves 50% syntactic validity and 80% semantic correctness among valid outputs, outperforming direct fine-tuning (15% syntactic validity, 0% semantic correctness) and a zero-shot baseline (25% syntactic validity, 40% semantic correctness).

Can LLMs Recover Program Semantics? A Systematic Evaluation with Symbolic Execution

2025

Rong Feng, Suman Saha

This paper presents a systematic evaluation of whether large language models can recover program semantics from obfuscated code. We designed and implemented a benchmark suite that combines programs from the TUM Obfuscation Benchmarks, the LLVM test suite, and algorithmic repositories, and applies four obfuscation transformations to them: control-flow flattening, opaque predicates, arithmetic encoding, and branch encoding. We fine-tuned multiple LLMs (GPT-4.1-mini, Codestral, Ministral) under two training configurations: a baseline, and a KLEE-enhanced configuration that uses SMT constraints, path statistics, and test cases.

Overview

Paper

Can Large Language Models Simulate Symbolic Execution Output Like KLEE?

2025

Rong Feng, Vanisha Gupta, Vivek Patel, Viroopaksh Reddy Ernampati, Suman Saha

This paper investigates whether large language models can simulate the symbolic execution outputs produced by tools like KLEE. The work is part of our research on program semantics and code security, which focuses on deobfuscation, symbolic execution, and program equivalence. We designed preprocessing pipelines that automatically insert symbolic variables into C programs and compile them to LLVM bitcode for analysis. We then fine-tuned GPT-4o on a large dataset of KLEE artifacts to evaluate its ability to replicate path constraints and test case outputs.

Paper