Website Fingerprinting: How Tor and VPN Users Can Still Be Tracked
- Shilpi Mondal

- Jan 14
SHILPI MONDAL | DATE: JANUARY 13, 2025
If you think your organization is invisible because you force all remote traffic through an encrypted tunnel, you might want to reconsider that assumption.
We tend to visualize encrypted connections, whether via a corporate VPN or the Tor network, as opaque pipes that shield us from prying eyes. The payload is indeed scrambled; a math-based lock keeps the actual data unreadable. But there’s a catch. While the “what” is hidden, the “how” remains dangerously visible. Through a technique called Website Fingerprinting (WF), eavesdroppers can identify exactly which websites a user is visiting by analyzing the shape, timing, and volume of the traffic, often with terrifying accuracy. According to A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges, published on arXiv in 2025, even strong cryptographic protections such as end-to-end encryption do not conceal traffic metadata like timing, direction, and size patterns, which adversaries exploit to infer visited sites.
The "Envelope" Problem: How Metadata Betrays You
The fundamental mechanics of the web make true anonymity difficult. When a browser loads a page (say, a Salesforce dashboard or a competitor’s news site), it requests a specific cascade of resources: HTML, CSS, JavaScript, and images.

This request-response cycle creates a unique traffic signature. Even inside an encrypted tunnel, the sequence of packets behaves like a fingerprint. As noted in research from the NDSS Symposium, an adversary analyzing packet timing, size, and direction can map these patterns to specific websites without ever cracking the encryption keys. It’s effectively a classification game: the attacker captures a “trace” (a time-ordered sequence of packets) and compares it against a known library of website signatures. In the past, this required manual statistical analysis. According to Adaptive Context-Aware Multi-Tab Website Fingerprinting Using Hierarchical Deep Learning, a 2025 peer-reviewed study published in the Journal of Network and Computer Applications, the threat has evolved into a highly automated discipline in which deep learning models classify encrypted traffic even when multiple websites are loaded simultaneously across browser tabs.
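To make the “classification game” concrete, here is a minimal Python sketch, assuming a hypothetical trace format of (timestamp, direction, size) tuples. It reduces each trace to a handful of hand-picked features and labels an unknown trace by its nearest neighbour in a small library of known traces. The features, the toy library, and the site labels are illustrative assumptions; real attacks use far richer features, or learn them.

```python
from collections import Counter
from math import log1p, sqrt

# Hypothetical trace format: time-ordered (timestamp_s, direction, size_bytes)
# tuples, with direction +1 = client-to-server and -1 = server-to-client.
Packet = tuple[float, int, int]


def features(trace: list[Packet]) -> list[float]:
    """Reduce a trace to a few coarse metadata features (counts, volume, duration)."""
    outgoing = [p for p in trace if p[1] > 0]
    incoming = [p for p in trace if p[1] < 0]
    duration = trace[-1][0] - trace[0][0] if trace else 0.0
    return [
        float(len(outgoing)),
        float(len(incoming)),
        log1p(sum(p[2] for p in outgoing)),  # log scale so byte totals don't dominate
        log1p(sum(p[2] for p in incoming)),
        duration,
    ]


def distance(a: list[float], b: list[float]) -> float:
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(trace: list[Packet], library: list[tuple[str, list[Packet]]], k: int = 1) -> str:
    """Label a trace by majority vote among its k nearest library traces."""
    target = features(trace)
    ranked = sorted(library, key=lambda item: distance(target, features(item[1])))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up "known library" of labelled traces.
library = [
    ("dashboard", [(0.00, 1, 600), (0.05, -1, 1500), (0.06, -1, 1500), (0.20, 1, 600), (0.30, -1, 900)]),
    ("news", [(0.00, 1, 600)] + [(0.04 + i * 0.01, -1, 1500) for i in range(20)]),
]
observed = [(0.00, 1, 580), (0.06, -1, 1500), (0.07, -1, 1400), (0.22, 1, 620), (0.31, -1, 950)]
print(classify(observed, library))  # dashboard
```

Note that nothing here touches payload bytes: the decision rests entirely on counts, volumes, and timing, which is exactly the metadata encryption leaves exposed.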
The AI Escalation: From Statistics to Deep Learning
A decade ago, you might have been safe. Early attempts using statistical methods like Naive Bayes achieved a laughable 3% accuracy against Tor traffic. Security teams breathed a sigh of relief, assuming the noise of the internet was enough to hide the signal.
That complacency is now dangerous. The introduction of Convolutional Neural Networks (CNNs) has completely shifted the balance of power. A landmark study on Deep Fingerprinting (DF) demonstrated that CNNs could achieve over 98% accuracy on undefended Tor traffic. These models don't just look for obvious patterns; they extract latent features from raw traffic traces that human analysts would never spot.
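The shift from hand-picked features to learned ones is easier to see from the input a CNN consumes. As we understand the DF paper, the model is fed only packet directions (+1/-1), truncated or zero-padded to a fixed length of 5,000. Under that assumption, the sketch below builds that vector and applies one hand-set convolution filter, a ReLU, and max pooling: the building blocks a CNN stacks and tunes automatically. It illustrates a single layer, not the DF architecture, and the kernel and trace are invented.

```python
# Hypothetical trace format: (timestamp_s, direction, size_bytes), direction = +1 / -1.

def direction_vector(trace, length=5000):
    """Keep only packet directions, truncated or zero-padded to a fixed length."""
    dirs = [float(d) for _, d, _ in trace][:length]
    return dirs + [0.0] * (length - len(dirs))


def conv1d_relu(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as CNN layers compute it) plus ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hand-set kernel that fires on "one outgoing packet followed by three incoming".
# A trained CNN would learn many such filters from data instead.
kernel = [1.0, -1.0, -1.0, -1.0]
trace = [(0.00, 1, 512), (0.01, -1, 512), (0.02, -1, 512), (0.03, -1, 512),
         (0.10, 1, 512), (0.11, 1, 512), (0.20, -1, 512), (0.21, -1, 512)]
x = direction_vector(trace, length=8)
feature_map = conv1d_relu(x, kernel)
print(feature_map)               # [4.0, 0.0, 0.0, 0.0, 2.0]
print(max_pool(feature_map, 2))  # [4.0, 0.0]
```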
Even more concerning for enterprise defense is the "Tik-Tok" attack (no relation to the social platform). Research published in Proceedings on Privacy Enhancing Technologies showed that deep learning models could exploit the timing of packet bursts (the micro-delays between groups of packets) to bypass defenses that focused only on padding packet sizes.
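As we understand the Tik-Tok work, its key input is “directional time”: each packet’s timestamp multiplied by its direction, so no size information is needed at all. The sketch below computes that representation, plus bursts and the gaps between them, using the same hypothetical (timestamp, direction, size) trace format. The trace is made up; every packet is padded to the same size, yet the timing still separates the bursts.

```python
# Hypothetical trace format: (timestamp_s, direction, size_bytes), direction = +1 / -1.

def directional_time(trace):
    """Tik-Tok-style input: each packet's timestamp signed by its direction."""
    return [t * d for t, d, _ in trace]


def bursts(trace):
    """Group consecutive same-direction packets into (direction, start, end, count)."""
    out = []
    for t, d, _ in trace:
        if out and out[-1][0] == d:
            direction, start, _, count = out[-1]
            out[-1] = (direction, start, t, count + 1)
        else:
            out.append((d, t, t, 1))
    return out


def inter_burst_gaps(trace):
    """Delay between the last packet of one burst and the first packet of the next."""
    b = bursts(trace)
    return [nxt[1] - cur[2] for cur, nxt in zip(b, b[1:])]


# Every packet is padded to 512 bytes, so sizes carry no signal; timing still does.
trace = [(0.00, 1, 512), (0.02, -1, 512), (0.03, -1, 512), (0.10, 1, 512), (0.25, -1, 512)]
print(directional_time(trace))
print(bursts(trace))
print(inter_burst_gaps(trace))
```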
Why VPNs Are Often Less Secure Than Tor
Here is the uncomfortable truth for the corporate sector: Your expensive enterprise VPN might be leaking more metadata than the free, volunteer-run Tor network.
Tor splits traffic into fixed-size 512-byte cells and routes it through three hops, which unintentionally standardizes some traffic features. VPNs, by contrast, are built for speed. They typically use a single hop and lack native traffic-shaping mechanisms.
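The standardizing effect of fixed-size cells shows up with a little arithmetic. The sketch below assumes the 512-byte cell size cited above and 498 usable data bytes per relay cell (our reading of Tor's relay-cell layout; treat it as an assumption), and compares it with a hypothetical tunnel that simply adds a fixed 60-byte header and no padding. Two different responses collapse to the same size under Tor but stay distinguishable in the simple tunnel.

```python
import math

CELL_BYTES = 512        # fixed Tor cell size cited above
RELAY_DATA_BYTES = 498  # assumption: usable application bytes per relay data cell


def tor_wire_bytes(payload_bytes: int) -> int:
    """Bytes on the wire once a payload is chopped into fixed-size cells."""
    return math.ceil(payload_bytes / RELAY_DATA_BYTES) * CELL_BYTES


def simple_tunnel_wire_bytes(payload_bytes: int, header_bytes: int = 60) -> int:
    """Hypothetical tunnel: fixed header, no padding, MTU splitting ignored."""
    return payload_bytes + header_bytes


for size in (1000, 1200):
    print(size, tor_wire_bytes(size), simple_tunnel_wire_bytes(size))
# 1000 1536 1060
# 1200 1536 1260
```

Quantization only blurs sizes; it does nothing about direction or timing, which is why the attacks described above still work against Tor.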

The data supports this grim view. An evaluation of VPN fingerprinting by Rochester researchers found that the WireGuard protocol, widely praised for its modern cryptography, could be fingerprinted with 95% accuracy based on packet direction alone.
The vulnerability extends to video content as well. Because streaming services use Variable Bit Rate (VBR) encoding to save bandwidth (sending more data for action scenes, less for static shots), the traffic pattern mimics the video itself. The classic Slingbox studies showed, and modern traffic analysis research confirms, that an eavesdropper can identify the specific movie or genre being streamed, even when an employee is watching through the corporate tunnel.
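The VBR leak can be sketched as template matching: the eavesdropper records throughput per second from the tunnel, slides that window over known bitrate profiles, and keeps the best Pearson correlation. The titles and numbers below are made up, and real streaming traffic (buffering, adaptive bitrate switches) is much messier; this only shows the shape of the technique.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_match(observed, profiles):
    """Slide the observed window over each known profile; keep the best correlation."""
    best_title, best_score = None, -1.0
    m = len(observed)
    for title, profile in profiles.items():
        for offset in range(len(profile) - m + 1):
            score = pearson(observed, profile[offset:offset + m])
            if score > best_score:
                best_title, best_score = title, score
    return best_title, best_score


# Made-up per-second throughput (Mbit/s) for two titles.
profiles = {
    "action_movie": [5, 9, 8, 2, 2, 9, 9, 3, 2, 8],
    "documentary": [3, 4, 5, 6, 5, 4, 3, 4, 5, 6],
}
observed = [8.8, 2.1, 2.3, 9.2]  # noisy capture starting mid-stream
title, score = best_match(observed, profiles)
print(title, round(score, 3))
```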
Tor's Specific Headaches: Entry Guards and Onions

While Tor offers a higher baseline of anonymity, it isn't immune. The network relies on "entry guards", stable relays that a client uses for months. While this protects against some attacks, research on entry guard selection indicates that a persistent local adversary monitoring the connection to a guard can build a massive longitudinal profile of a user.
Furthermore, if your organization utilizes .onion sites (Hidden Services) for secure drops or internal communication, be aware that these are highly conspicuous. The complex handshake required to establish a rendezvous circuit is distinct from normal web traffic. USENIX Security research reveals that an adversary can identify hidden service activity with over 99% accuracy just by observing the first 20 cells of a connection.
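Why are only 20 cells enough? An unusual handshake shows up as an unusual direction pattern at the very start of a circuit. The toy matcher below compares the first cells of a circuit against a template. HYPOTHETICAL_TEMPLATE is invented for illustration and is not the real rendezvous pattern, and this simple agreement threshold is not the classifier used in the USENIX study.

```python
# Hypothetical input: directions of the cells on a circuit (+1 out, -1 in).
# HYPOTHETICAL_TEMPLATE is made up for illustration; it is NOT the real
# rendezvous handshake pattern.
HYPOTHETICAL_TEMPLATE = [1, 1, -1, 1, -1, -1, 1, -1, -1, -1,
                         1, -1, 1, 1, -1, -1, -1, 1, -1, -1]


def prefix_agreement(cell_directions, template):
    """Fraction of the first len(template) cells whose direction matches the template."""
    n = len(template)
    prefix = cell_directions[:n]
    if len(prefix) < n:
        return 0.0  # not enough cells observed yet
    return sum(a == b for a, b in zip(prefix, template)) / n


def looks_like_template(cell_directions, template=HYPOTHETICAL_TEMPLATE, threshold=0.9):
    return prefix_agreement(cell_directions, template) >= threshold


circuit = HYPOTHETICAL_TEMPLATE.copy()
circuit[7] = -circuit[7]                           # one cell differs
print(looks_like_template(circuit + [1, -1, -1]))  # True (19 of 20 agree)
print(looks_like_template([1, -1] * 10))           # False (10 of 20 agree)
```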
The Cost of Defense: Bandwidth vs. Privacy
What stops us from fixing a known weakness? It comes down to a three-way trade-off between privacy, latency (how long data takes to arrive), and bandwidth (how much extra data must flow).
Stronger safeguards tend to cost more than expected: the heavier the protection, the harder it weighs on speed or bandwidth.
Lightweight Defenses:
Methods like WTF-PAD inject dummy packets to fill gaps in traffic. They cause zero latency but increase bandwidth usage by roughly 60%. Unfortunately, modern deep learning models can often see right through this padding.
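The idea behind gap-filling padding can be sketched in a few lines. This is a simplified stand-in, not WTF-PAD itself (which, as we understand it, samples delays from histograms in a per-direction state machine): real packets keep their original timestamps, so no latency is added, and any silence longer than a threshold is filled with dummy packets. The threshold, the mean dummy spacing, and the example timestamps are arbitrary assumptions.

```python
import random


def pad_gaps(timestamps, threshold, rng, mean_gap=0.005):
    """Simplified adaptive padding for one direction.

    Fills any silence longer than `threshold` with dummy packets spaced by
    random exponential delays. Real packets keep their original times, so
    no latency is added. Expects sorted timestamps; returns a time-ordered
    list of (time_s, is_dummy)."""
    if not timestamps:
        return []
    padded = []
    for cur, nxt in zip(timestamps, timestamps[1:]):
        padded.append((cur, False))
        t = cur
        while nxt - t > threshold:
            t += rng.expovariate(1 / mean_gap)
            if t >= nxt:
                break
            padded.append((t, True))
    padded.append((timestamps[-1], False))
    return padded


def bandwidth_overhead(padded):
    """Dummy packets as a fraction of real packets."""
    dummies = sum(1 for _, is_dummy in padded if is_dummy)
    return dummies / (len(padded) - dummies)


real = [0.00, 0.01, 0.02, 0.50, 0.51]  # a burst, a long pause, another burst
padded = pad_gaps(real, threshold=0.05, rng=random.Random(1))
print(f"dummy overhead: {bandwidth_overhead(padded):.0%}")
```

The pause that would have stood out in the trace is now filled, but burst boundaries and overall timing remain, which hints at why deep learning models can still see through this kind of padding.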
Heavy Defenses:
Strategies like Tamaraw force traffic into a Constant Bit Rate (CBR). This kills the fingerprint but can increase page load times by 200%, a trade-off most users simply won't accept.
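A constant-rate defense makes the opposite trade: it adds latency as well as bandwidth. The simplified sketch below is loosely modeled on the Tamaraw idea rather than a faithful implementation: one fixed-size packet leaves every interval (a real one if it is ready, a dummy otherwise), and the total is padded up to a multiple of a fixed number. The interval, padding multiple, and burst size are hypothetical parameters chosen only to show the effect.

```python
def constant_rate(ready_times, interval, pad_multiple):
    """Simplified constant-rate sender for one direction (pad_multiple >= 1).

    Every `interval` seconds exactly one fixed-size packet leaves: the next
    real packet if it is ready, otherwise a dummy. Sending continues until
    all real packets are out and the total count is a multiple of
    `pad_multiple`. Returns a list of (send_time_s, is_dummy)."""
    pending = sorted(ready_times)
    sent = []
    i = slot = 0
    while i < len(pending) or len(sent) % pad_multiple:
        t = slot * interval
        if i < len(pending) and pending[i] <= t:
            sent.append((t, False))
            i += 1
        else:
            sent.append((t, True))
        slot += 1
    return sent


# 100 packets that an undefended connection could deliver in one quick burst.
sent = constant_rate([0.0] * 100, interval=0.005, pad_multiple=30)
print(len(sent), sum(is_dummy for _, is_dummy in sent))  # 120 20
print(f"last packet leaves at {sent[-1][0]:.3f} s")      # last packet leaves at 0.595 s
```

In this toy run a burst that could have gone out almost at once is stretched to about 0.6 seconds, and one packet in six is dummy traffic; every site now produces the same steady pattern, which is precisely what costs the page load time.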
The Real-World "Open World" Constraint
Before we declare the death of privacy, we must look at the "Open World" scenario. In a lab, identifying one site out of 100 is easy. In the real world, distinguishing one site out of billions is mathematically harder due to the "base rate fallacy."
As demonstrated in large-scale empirical research on website fingerprinting, accuracy metrics that appear strong in laboratory settings break down when applied to real-world Internet traffic. In Website Fingerprinting at Internet Scale, Panchenko et al. show that in an open-world environment, where users may access hundreds of thousands or millions of possible websites, even classifiers with very high nominal precision suffer from the base-rate fallacy, producing substantial numbers of false positives simply due to the overwhelming volume of non-monitored traffic (Panchenko et al., NDSS 2016). As a result, website fingerprinting does not scale effectively as a dragnet surveillance technique. Instead, the study concludes that its practical value lies in targeted use, where fingerprinting serves as a confirmation mechanism against individuals already under suspicion rather than as a broad population-level monitoring tool.
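The base-rate problem is plain Bayes' rule. The sketch below computes what fraction of a classifier's alarms are real, given its true-positive rate, its false-positive rate, and how rare visits to the monitored site are. The 99%/1% classifier and the 1-in-10,000 base rate are hypothetical numbers, not figures from Panchenko et al.

```python
def precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Share of alarms that are real hits, by Bayes' rule.

    tpr: true-positive rate, fpr: false-positive rate,
    base_rate: fraction of observed visits that go to the monitored site."""
    hits = tpr * base_rate
    false_alarms = fpr * (1 - base_rate)
    return hits / (hits + false_alarms)


# Hypothetical classifier: 99% true-positive rate, 1% false-positive rate.
print(f"{precision(0.99, 0.01, 0.5):.3f}")   # balanced lab test: 0.990
print(f"{precision(0.99, 0.01, 1e-4):.3f}")  # 1 in 10,000 visits monitored: 0.010
```

The same classifier that looks 99% reliable in a balanced lab test raises about 100 false alarms for every true hit once the monitored site is rare.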
Side Channels: The Hardware Threat
Finally, sophisticated attackers are moving beyond the network entirely. We are seeing the rise of Cache Occupancy attacks, where malicious JavaScript in one browser tab measures the CPU's cache usage to infer which site is loading in another tab, even one whose traffic is encrypted. Because this method observes the machine that processes the data rather than the packets on the wire, it sidesteps network padding completely.
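The core of a cache occupancy measurement is a timing loop. The sketch below, written in Python only to match the other examples, repeatedly sweeps a buffer roughly the size of the last-level cache and records how long each sweep takes. When another tab's work evicts the buffer's lines, sweeps slow down, and the sequence of timings becomes a trace that could be classified like a network trace. The 8 MiB buffer and 64-byte line size are assumptions, and the published attack runs as JavaScript inside the browser; in Python, interpreter overhead would likely drown out the signal, so treat this strictly as an outline of the technique.

```python
import time

LINE_BYTES = 64  # assumed cache-line size


def sweep_trace(buffer_bytes=8 * 1024 * 1024, samples=100):
    """Time repeated sweeps over a buffer roughly the size of the last-level cache.

    Each sweep reads one byte per cache line; if other activity has evicted
    those lines, the sweep takes longer. The list of sweep times is the trace."""
    buf = bytearray(buffer_bytes)
    trace = []
    for _ in range(samples):
        start = time.perf_counter()
        _touched = buf[::LINE_BYTES]  # strided copy reads one byte per line
        trace.append(time.perf_counter() - start)
    return trace
```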
Key Takeaways
Encryption isn't anonymity:
Even when tools such as WireGuard or OpenVPN shield what you send, bits of information slip out: how big the packets are, which way they travel, and exactly when they move. That trail can be enough to reveal which sites you are visiting.
AI is flipping the script:
Deep learning models, such as Deep Fingerprinting, now identify sites in undefended encrypted Tor traffic with over 98% accuracy, and they can see right through lightweight padding defenses.
VPNs have weak spots:
Most commercial VPNs skip traffic shaping, which makes them sitting ducks for fingerprinting: WireGuard traffic was identified with 95% accuracy from packet direction alone, arguably making VPNs easier targets than Tor.
Defenses come at a cost:
The best countermeasures, like Constant Bit Rate, can triple your page load times, which is why they're tough to roll out widely.
Hardware betrays you too:
Secure your network all you want, but side-channel attacks like Cache Occupancy can still spy on your browsing through CPU patterns.




