Deserialization as a Persistent RCE Primitive: Why Every Language Keeps Reinventing the Same Mistake

Jason

In January 2015, Chris Frohoff and Gabriel Lawrence presented research at AppSec California that would reshape how the industry understood Java security. They had discovered that by carefully chaining together method calls across commonly used Java libraries (Apache Commons Collections, Spring Framework, Groovy), an attacker could construct a serialized Java object that, when deserialized by any application that included those libraries on its classpath, would execute arbitrary operating system commands. No special vulnerability in the application itself was needed. The application merely had to deserialize untrusted data, and the right libraries had to be present.

The tool they released, ysoserial, packaged these "gadget chains" into a simple command-line utility. Point it at a target, pick a chain that matches the target's dependency tree, and it generates a serialized payload that pops a shell. Within months, security researchers had found deserialization endpoints in nearly every major Java application server, middleware platform, and enterprise application. WebLogic, WebSphere, JBoss, Jenkins, Spring, and countless custom applications were all vulnerable, and many had been for years.

The reaction from the Java community was intense but instructive. The vulnerability was not in any specific library. Apache Commons Collections did not have a "bug" in the traditional sense. Each individual class behaved as documented. The gadget chain exploited the composition of behaviors across libraries to achieve an emergent capability (arbitrary code execution) that no individual library intended to provide. This made the problem fundamentally difficult to fix at the library level. You could not simply patch Commons Collections and move on, because the gadget chain mechanism worked with any sufficiently expressive set of classes, and the Java standard library itself contained enough building blocks for several chains.

The Pattern Is Universal

What makes deserialization particularly insidious as a vulnerability class is that the same fundamental mistake has been made independently by virtually every programming language that provides native object serialization.

Java has ObjectInputStream, the original and most extensively researched deserialization attack surface. Gadget chains leveraging Commons Collections, Spring, Groovy, ROME, and the JDK itself have been publicly documented for a decade. Oracle has gradually added deserialization filters (JEP 290, JEP 415), but the default behavior remains permissive, and the filter mechanism requires explicit configuration that most applications do not implement.

Python has pickle, whose documentation has contained a warning ("Never unpickle data received from an untrusted or unauthenticated source") since at least Python 2.4. The __reduce__ method, which controls how objects are reconstructed during unpickling, can return arbitrary callables, making pickle exploitation trivially easy: pickle.loads(malicious_data) can execute os.system("rm -rf /") with no gadget chain needed. Despite this, pickle remains widely used in machine learning pipelines (model serialization), scientific computing (result caching), and web applications (session storage, task queues). The convenience is too great, the warning too abstract, and the alternatives too inconvenient for most developers to abandon it voluntarily.
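The mechanics are worth seeing concretely. The sketch below substitutes a benign eval expression for a shell command, but the shape is identical to a real exploit: __reduce__ tells pickle to invoke an arbitrary callable during loading, and no instance of the original class is ever reconstructed.

```python
import pickle

class Innocuous:
    """Looks like a plain data class, but __reduce__ controls unpickling."""
    def __reduce__(self):
        # pickle serializes "call eval('7 * 6')" instead of this object's
        # state. Swap eval for os.system and the expression for a shell
        # command, and unpickling becomes remote code execution.
        return (eval, ("7 * 6",))

payload = pickle.dumps(Innocuous())
result = pickle.loads(payload)   # executes eval("7 * 6") during load
print(result)                    # 42 — no Innocuous instance was ever rebuilt
```

Note that the victim only calls pickle.loads on bytes it received; everything else is driven by the attacker-controlled byte stream.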

PHP has unserialize(), where exploitation relies on magic methods (__wakeup, __destruct, __toString) that are automatically called during deserialization. The PHP ecosystem's history with insecure deserialization is extensive: WordPress, Joomla, Drupal, Laravel, and virtually every major PHP framework have had deserialization vulnerabilities. PHP 7 added partial protections (the allowed_classes option for unserialize()), but the default behavior remains unrestricted, and legacy code rarely uses the safer option.

Ruby has Marshal.load, which is similarly dangerous. The 2013 Rails YAML deserialization vulnerability (CVE-2013-0156) was a watershed moment for the Ruby community. Rails accepted YAML-encoded parameters by default, and YAML deserialization in Ruby can instantiate arbitrary objects, leading to trivial RCE. The fix was to remove YAML parameter parsing from Rails, but the underlying Marshal.load and YAML.load remain available and dangerous.

.NET has BinaryFormatter, SoapFormatter, ObjectStateFormatter, and other serializers that reconstruct arbitrary type hierarchies. Microsoft has been gradually marking these as dangerous (BinaryFormatter was marked obsolete in .NET 5 and disabled by default in .NET 9), but the migration path for existing applications is complex, and many enterprise .NET applications still rely on binary serialization for session state, ViewState, and inter-service communication.

The universality of this pattern tells us something important: this is not a series of independent implementation mistakes. It is a design-level trade-off that every language community makes, and that is consistently resolved in favor of expressiveness and convenience until exploitation forces a reassessment.

What Makes Gadget Chains Work

The concept of a "gadget chain" deserves closer examination because it is the mechanism that makes deserialization exploitation both powerful and difficult to defend against.

A gadget, in this context, is a code fragment within an existing library that performs a useful (to the attacker) side effect when invoked. Examples include:

  • A class whose toString() method triggers a method lookup via reflection
  • A comparator whose compare() method invokes a Transformer chain
  • A class whose finalizer or destructor initiates a network connection
  • A proxy handler that dispatches arbitrary method calls based on deserialized state

Individually, none of these behaviors is a vulnerability. A toString() method that uses reflection is a legitimate implementation choice. A comparator that applies transformations is a reasonable design pattern. The vulnerability emerges when an attacker can compose these behaviors by constructing a serialized object graph that, when deserialized, triggers a sequence of method calls that chains from an automatic entry point (constructor, finalizer, readObject, __wakeup) through a series of intermediate gadgets to a terminal gadget that executes a system command, makes a network request, or writes to the filesystem.

```mermaid
stateDiagram-v2
    [*] --> Deserialize: Untrusted byte stream
    Deserialize --> EntryGadget: Runtime calls readObject/wakeup/reduce
    EntryGadget --> IntermediateA: Method dispatch via reflection/proxy
    IntermediateA --> IntermediateB: Transformer chain or callback
    IntermediateB --> TerminalGadget: Runtime.exec() / ProcessBuilder / os.system
    TerminalGadget --> RCE: Arbitrary command execution
    note right of EntryGadget: Automatic invocation,\nno application code involved
    note right of TerminalGadget: Uses only classes already\non the classpath/import path
```
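The composition can be sketched in miniature in Python. The classes below (LazyFormatter and Report are invented for illustration; real chains reuse classes from existing libraries) are each defensible in isolation, yet an attacker-built object graph chains the automatic entry point (__setstate__) through the intermediate gadget into code execution at load time:

```python
import pickle

class LazyFormatter:
    """Intermediate gadget: calls a stored function on a stored argument."""
    def __init__(self, func, arg):
        self.func, self.arg = func, arg
    def render(self):
        return self.func(self.arg)

class Report:
    """Entry gadget: 'helpfully' renders itself when unpickled."""
    def __init__(self, formatter):
        self.formatter = formatter
    def __setstate__(self, state):
        # __setstate__ runs automatically during pickle.loads.
        self.__dict__.update(state)
        self.rendered = self.formatter.render()

# Attacker constructs the graph: entry -> intermediate -> terminal (eval).
evil = Report(LazyFormatter(eval, "7 * 6"))
restored = pickle.loads(pickle.dumps(evil))
print(restored.rendered)  # 42 — executed during deserialization
```

No line of this code is malicious on its own; the "vulnerability" exists only in the attacker-chosen composition, which is exactly why patching any single class does not close the door.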

The critical insight is that the attacker does not need to inject new code. All the code that executes is already present in the application's runtime: it is part of the standard library or third-party dependencies. The attacker merely provides a data structure that causes existing code to execute in an order and with arguments that achieve the attacker's objective.

This is why deserialization vulnerabilities are so persistent. You cannot eliminate the risk by removing one library. You can only eliminate specific known chains. But the space of possible chains is combinatorially large, and new chains are regularly discovered in libraries that were previously considered safe.

The Real-World Attack Surface Is Larger Than You Think

When security teams think about deserialization risk, they typically picture public API endpoints that accept serialized data. This is the most obvious attack surface, but it is far from the only one. In modern architectures, deserialization occurs at many implicit boundaries:

Message queues. Systems that use RabbitMQ, Kafka, or SQS with messages in a native serialization format (Java serialization in Spring AMQP's default configuration, pickle in Celery's default serializer before 4.0) deserialize every message that arrives. If any producer in the message ecosystem is compromised, or if the queue itself is accessible, every consumer becomes an RCE target.
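In Celery, for example, the exposure can be closed by pinning both the outbound serializer and the accepted content types, so a worker rejects pickle-encoded messages outright instead of deserializing them. A configuration sketch (setting names follow the Celery 4+ lowercase settings; broker URL is illustrative):

```python
from celery import Celery

app = Celery("worker", broker="amqp://localhost")

# Emit only JSON and accept only JSON. A worker configured this way
# refuses a pickle-encoded message at the content-type check, before
# any deserialization of the payload happens.
app.conf.task_serializer = "json"
app.conf.result_serializer = "json"
app.conf.accept_content = ["json"]
```

The accept_content restriction is the important half: setting only the outbound serializer still leaves the worker willing to deserialize whatever a compromised producer sends.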

Cache systems. Redis, Memcached, and other caches often store serialized objects. If the cache is accessible from a compromised system (which it usually is, since caches typically have minimal authentication), an attacker can replace cached values with malicious serialized payloads that will be deserialized by every application that reads from the cache.

Session stores. Applications that store session data in serialized form (PHP's default session handler, Java's HttpSession serialization, Python's pickle-based session backends) create a deserialization point for every request that loads a session. If the session store is compromised or if the session token format allows content injection, deserialization-based RCE becomes possible through the session loading path.

Internal APIs. Microservice architectures where services communicate using native serialization (Java RMI, .NET Remoting, Python pickle-based RPC) are especially dangerous because the internal network location creates a false sense of trust. "This service only receives data from other internal services" is not a meaningful security property when any one of those services might be compromised.

Webhooks and callbacks. Systems that receive and process serialized payloads from external sources (webhook handlers, payment processor callbacks, partner API integrations) often treat that data more carefully than data crossing internal boundaries. But the deserialization step itself may not be apparent in code review, because it happens inside a framework's request parsing layer rather than in application code.

What Actually Changes the Risk

The mitigations for deserialization fall into several categories, and their real-world effectiveness varies significantly:

Format replacement is the most effective mitigation. Replacing native serialization with JSON, Protocol Buffers, MessagePack, or another format that does not support arbitrary type reconstruction eliminates the gadget chain mechanism entirely. JSON does not have a readObject() method. Protocol Buffers do not invoke constructors based on attacker-supplied type information. This is a categorical fix, not a whack-a-mole game against individual chains.
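A small illustration of why this fix is categorical: json.loads maps any input onto a fixed set of builtin types, so there is no hook analogous to __reduce__ or readObject for an attacker to reach, no matter what bytes arrive.

```python
import json

# Whatever the sender transmits, the parser can only ever produce
# dicts, lists, strings, numbers, booleans, and None — never an
# instance of an attacker-chosen class, and never a method call.
data = json.loads('{"user": "alice", "roles": ["admin"], "active": true}')

print(type(data))     # <class 'dict'>
print(data["roles"])  # ['admin']
```

The same property holds for Protocol Buffers and MessagePack: the set of reconstructable types is fixed by the schema or the format, not by the payload.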

The challenge is migration cost. Applications that have used native serialization for years (for session storage, cache entries, inter-service communication, message queue payloads) face a significant engineering effort to switch formats. The serialized data may include complex object graphs that are not trivially representable in JSON. Rolling the change out requires coordinating producers and consumers, handling backward compatibility during the transition, and validating that no edge cases are lost. This cost is real, and it is why many organizations accept the risk rather than invest in the migration.

Type allowlists are a weaker but more incrementally deployable mitigation. Java's deserialization filter mechanism (introduced in JDK 9 and backported to 8u121) allows applications to specify which classes are permitted during deserialization. If the filter is strict (allowing only the specific classes the application expects), it blocks gadget chains that rely on unexpected classes. But maintaining an accurate allowlist requires understanding every class that might legitimately appear in serialized data, which is difficult in complex applications with deep dependency trees.
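Python supports the same pattern through the restriction technique described in the pickle documentation: subclass Unpickler and override find_class so only expected globals can be resolved. A sketch (the AllowlistUnpickler class and its tiny allowlist are illustrative; a real allowlist must cover every type the application legitimately serializes):

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    # Only these (module, name) pairs may be resolved during unpickling.
    # Keeping this list accurate is the hard part in real applications.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Plain containers unpickle fine; a payload whose __reduce__ resolves an
# unexpected global (e.g. builtins.eval) raises before any code runs.
print(safe_loads(pickle.dumps({"a": 1})))  # {'a': 1}
```

This blocks known exploitation paths, but it inherits the allowlist weakness the paragraph above describes: any overly broad entry, or any dangerous gadget among the allowed classes themselves, reopens the door.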

Runtime sandboxing limits the damage a successful gadget chain can cause. If the process performing deserialization runs without shell execution permissions, without network egress, and with minimal filesystem access, then even a successful RCE gadget chain cannot achieve meaningful impact. This is defense in depth at the OS/container level, and it is valuable because it works regardless of which specific chain the attacker uses.

Dependency minimization reduces the number of available gadget sources. This is the software engineering equivalent of reducing your attack surface: if a library is not on the classpath, its classes cannot be used as gadgets. The ysoserial project lists gadget chains for specific libraries: if you do not use Commons Collections, the Commons Collections chains are irrelevant. But dependency trees in modern applications are deep, and transitive dependencies introduce gadget-capable classes that the application developer never directly chose.

The Structural Lesson

Deserialization vulnerabilities persist because they represent a fundamental tension in programming language design: the desire for seamless object persistence versus the requirement for safe data handling. Every language designer who has implemented native serialization has made the same choice (prioritize developer convenience by supporting rich, arbitrary type reconstruction), and every language community has eventually discovered that this choice creates an exploitable attack surface.

The history of deserialization security is not a history of novel attack techniques. It is a history of the same technique being rediscovered in each new context. The Java community learned it in 2015. The Ruby community learned it in 2013. The Python community has known since pickle's documentation first carried its warning, and has collectively decided to accept the risk anyway. The .NET community is learning it now, as Microsoft gradually deprecates BinaryFormatter.

The lesson is not "deserialize carefully." The lesson is that native object serialization is, by design, a code execution mechanism, and treating a code execution mechanism as a data interchange format will always produce vulnerabilities. The safe path is not better deserialization. It is less deserialization: replacing rich, type-aware serialization with constrained, schema-validated data formats wherever the engineering cost of migration is justified by the risk reduction.

For systems where native serialization cannot be eliminated, the honest assessment is that you are running a known-dangerous mechanism and your security depends on the completeness of your allowlists, the restrictiveness of your runtime permissions, and the absence of novel gadget chains in your dependency tree. This is a defensible position, but it requires explicit acknowledgment and ongoing investment, not the comfortable assumption that "we validated our inputs" is sufficient.
