My Cursor bill last month: $340.
I dug into the API logs. Over 60% of the tokens being sent to the LLM were:
- Boilerplate I'd copied from Stack Overflow three years ago
- The same database helper function, 4 slightly different times
- An entire test file that has nothing to do with what I was asking
My AI tool was optimizing for similarity -- and similarity is not the same as information.
The problem with every AI coding tool
Cursor, Copilot, Claude Code, Cody -- they all select context the same way:
- Embed your query
- Find the top-K similar chunks
- Stuff them into the context window until full
- Cut everything else
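That four-step pipeline is easy to sketch. The snippet below is an illustrative reconstruction, not any vendor's actual code: `embed` output vectors are assumed, and each chunk is a hypothetical `(vector, token_cost, text)` tuple.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_context(query_vec, chunks, budget_tokens):
    """chunks: list of (vector, token_cost, text).
    Rank by similarity, stuff until the budget is full, cut the rest."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    picked, used = [], 0
    for vec, cost, text in ranked:
        if used + cost > budget_tokens:
            break  # everything below this line is silently dropped
        picked.append(text)
        used += cost
    return picked
```

Note there is no diversity term anywhere: five near-identical auth chunks outrank one dissimilar-but-essential payments chunk every time.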
The result?
Query: "How does payment processing work?"
What your AI actually sees:
auth.py (similarity: 0.94) <- useful
auth_test.py (similarity: 0.91) <- copies auth logic
auth_utils.py (similarity: 0.89) <- more auth copies
auth_v2.py (similarity: 0.87) <- even more auth
...
payments.py (similarity: 0.41) <- NEVER LOADED, cut by budget
Your AI is answering questions about payment processing without having read the payments file. It's hallucinating from auth code.
This is not a prompt engineering problem. It's a math problem.
What we built
Entroly is an open-source context optimizer that intercepts requests between your IDE and the LLM. It replaces Top-K with three algorithms running in a Rust engine:
Algorithm 1: KKT-optimal knapsack bisection
Context selection is a 0/1 knapsack problem. You have N code fragments with information scores and token costs. You want the maximum information within your budget.
We solve this with KKT dual bisection -- O(30N) -- combined with submodular diversity selection that gives a (1-1/e) ~ 63% optimality guarantee.
The diversity constraint is the key insight: instead of 4 versions of your auth module, you get auth + payments + DB schema + API layer -- one fragment from each area of your codebase.
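To make the dual bisection concrete, here is a minimal sketch of the idea (not Entroly's Rust implementation, and without the submodular diversity term): bisect on the budget constraint's dual variable `lam`, selecting item `i` iff `score_i - lam * cost_i > 0`. Thirty iterations of an O(N) scan gives the O(30N) bound.

```python
def knapsack_bisect(items, budget, iters=30):
    """items: list of (info_score, token_cost).
    Bisection on the dual variable lam of the budget constraint:
    item i is selected iff score_i - lam * cost_i > 0."""
    lo, hi = 0.0, max(s / c for s, c in items)
    for _ in range(iters):
        lam = (lo + hi) / 2
        spent = sum(c for s, c in items if s - lam * c > 0)
        if spent > budget:
            lo = lam  # over budget: raise the shadow price of a token
        else:
            hi = lam  # budget slack: lower it
    lam = hi  # hi always corresponds to a feasible (within-budget) selection
    return [i for i, (s, c) in enumerate(items) if s - lam * c > 0]
```

In the real engine the scores themselves would be diversity-discounted (a fragment's marginal value drops once its area of the codebase is already covered), which is where the (1-1/e) submodular guarantee comes from.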
Algorithm 2: O(1) SimHash deduplication
Every fragment gets a 64-bit SimHash fingerprint. Near-duplicate detection uses Hamming distance <= 3 via LSH buckets. Constant time, regardless of codebase size.
Copy-pasted code, auto-generated boilerplate, and lightly-edited duplicates are removed before they consume token budget.
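The fingerprint-plus-buckets scheme can be sketched in a few lines. This is a generic 64-bit SimHash (hash function choice is mine, not Entroly's): split the fingerprint into 4 bands of 16 bits, and by pigeonhole, two fingerprints within Hamming distance 3 must agree on at least one band, so a dict keyed by `(band, value)` finds near-duplicate candidates in constant time.

```python
import hashlib

def simhash64(tokens):
    """64-bit SimHash: each token votes on each bit; majority wins."""
    v = [0] * 64
    for tok in tokens:
        h = int.from_bytes(hashlib.blake2b(tok.encode(), digest_size=8).digest(), "big")
        for b in range(64):
            v[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(64) if v[b] > 0)

def near_duplicate(a, b, max_dist=3):
    # Hamming distance between fingerprints; <= 3 differing bits = near-dup.
    return bin(a ^ b).count("1") <= max_dist

def band_keys(fp, bands=4):
    """LSH bucketing: 4 bands of 16 bits. Fingerprints within Hamming
    distance 3 share at least one identical band (pigeonhole)."""
    return [(i, (fp >> (16 * i)) & 0xFFFF) for i in range(bands)]
```

Lookup is then: hash the new fragment, probe the 4 buckets, verify candidates with an exact Hamming check.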
Algorithm 3: PRISM -- online RL that learns what's actually useful
After each LLM response, we measure how much of the injected context the model actually referenced (trigram + identifier overlap scoring).
This feeds a REINFORCE loop. The dual variable from the forward knapsack constraint serves as a per-item baseline -- so the RL gradient is guaranteed consistent with the selection math. Weights update via a spectral natural gradient on the 4x4 gradient covariance.
In plain English: the more you use it, the better it gets at choosing what to include.
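A stripped-down sketch of the feedback loop, under my own simplifications: a trigram-overlap usage score as the reward signal, and plain REINFORCE with a scalar baseline (the real system uses the knapsack dual as a per-item baseline and a spectral natural gradient, neither of which is shown here).

```python
import re

def trigrams(text):
    # Identifier trigrams: sequences of 3 consecutive identifiers/keywords.
    toks = re.findall(r"[A-Za-z_]\w*", text)
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

def usage_score(fragment, llm_response):
    """Fraction of the fragment's identifier trigrams that reappear in the
    response -- a cheap proxy for 'did the model actually use this?'"""
    frag = trigrams(fragment)
    if not frag:
        return 0.0
    return len(frag & trigrams(llm_response)) / len(frag)

def reinforce_update(weights, features, reward, baseline, lr=0.1):
    # Vanilla REINFORCE with a baseline: move selection weights along the
    # feature direction, scaled by the advantage (reward - baseline).
    return [w + lr * (reward - baseline) * f for w, f in zip(weights, features)]
```

Fragments that keep scoring near zero get down-weighted in future selections; fragments the model visibly quotes get boosted.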
The numbers
On a 50K LOC Python/TypeScript monorepo, one month of usage:
| Metric | Naive stuffing | Top-K (Cody-style) | Entroly |
|---|---|---|---|
| Tokens per request | baseline | -18% | -78% |
| Codebase files represented | 8 | 11 | all (100%) |
| Latency overhead | 0ms | ~40ms | <10ms |
| Monthly API cost | $340 | $280 | $75 |
Your AI sees your entire codebase. You spend 78% less. The Rust engine adds under 10ms per request.
How to try it (60 seconds)
pip install entroly
# For Cursor / Claude Code (MCP server):
entroly init
# Generates .cursor/mcp.json automatically
# For anything else (transparent HTTP proxy):
pip install entroly[proxy]
entroly proxy --quality balanced
# Point your AI tool to http://localhost:9377/v1
# See what it's doing:
entroly demo # before/after comparison on your actual project
entroly dashboard # live metrics at localhost:9378
It auto-indexes your codebase via git ls-files, builds dependency graphs, and starts working immediately. No YAML, no config files, no embeddings database to set up.
The bonuses you get for free
Built-in security scanning. 55 SAST rules (SQL injection, hardcoded secrets, command injection, 8 CWE categories) run on selected context before your AI sees it. If you're about to ask your AI to modify sensitive code, it flags it.
Entropy anomaly detection. We run robust MAD-based Z-scores across directory groups to flag code that's statistically unusual compared to its neighbors -- copy-paste errors, dead stubs, and suspicious auth deviations surface without any LLM call.
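For reference, a minimal version of a robust MAD-based Z-score (the standard formula with the 1.4826 consistency constant; Entroly's per-directory grouping is not shown):

```python
import statistics

def mad_zscores(values):
    """Robust Z-scores: (x - median) / (1.4826 * MAD).
    Median and MAD resist outliers, so one anomalous file
    doesn't inflate the baseline and mask another."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        return [0.0] * len(values)
    return [(x - med) / (1.4826 * mad) for x in values]
```

Run per directory over a per-file entropy metric, anything with |Z| above a threshold (say 3) gets flagged.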
Codebase health grades. Clone detection, dead symbol finder, god file detection. Run entroly health to get an A-F grade for your project.
It's all open source (MIT)
The entire Rust core (19 modules, PyO3 bridge) is on GitHub:
-> https://github.com/juyterman1000/entroly
If you're building AI dev tools, the knapsack selection, SimHash dedup, and dependency graph modules are all designed to be composable -- fork it and strip it for parts.
PRs, issues, and savage code reviews all welcome.
What's your current strategy for context management with your AI coding tool? I'd love to hear what you're using in the comments.