From Feature Engineering to Context Engineering
Local models are now good enough to change system design. The shift from feature engineering to context engineering makes previously unusable data operational without sending customer data to the cloud.

#Local AI is moving faster than the conversation around it
That does not mean the cloud is about to disappear, and it does not mean the problem is solved. But the idea that local models are still mostly clever toys with obvious ceilings is outdated.
They still fall short on some agentic workflows and on complex tool use. But for extraction, classification, summarization, translation, enrichment, and a growing share of day-to-day data work, they are good enough to change how systems are designed. And that matters more than most people realize.
#A different starting point
For a CDP, a CRM, or any customer intelligence stack, local models create a different starting point. You can run useful inference close to the data, without sending everything upstream, without turning every request into an API call, and without making the cloud the default dependency for every AI feature from day one.
More than an optimization, it's also:
- Less data leaving the perimeter
- Less exposure
- Less privacy risk
- Fewer third-party dependencies baked into the architecture from the start
#From feature engineering to context engineering
A lot of customer intelligence is built one output at a time. One model for churn, another for time-to-buy, another for segmentation, another for propensity, and so on. Each score gets its own logic, its own pipeline, its own governance, its own maintenance burden. It works. But it fragments the system, and I do not think that pattern will hold forever.
The fragmentation comes from the constraint itself. Every use case needs its own clean feature set, so every use case gets its own pipeline.
As models improve, the more interesting direction will not be to build one narrow model for every attribute. It will be to recognize that a whole layer of data (CRM notes, support transcripts, product feedback, sales conversations, mixed signals that never fit neatly into a feature set) can now be enriched, structured, and made operational in ways that were not feasible or economical before.
Feature engineering starts from what you can cleanly extract.
Context engineering starts from all the data that is already there.
Feature engineering starts from what you can cleanly extract and compute. Context engineering starts from all the data that is already there, including the messy parts, and turns it into usable input at the moment it is needed.
Take churn as a concrete example. Both approaches start from the same problem, but not from the same place.
That does not mean the death of structured modeling. Explicit scores remain the right answer for some use cases. They are easier to audit, easier to govern, easier to benchmark, easier to industrialize. But they are no longer the only option. A second pattern is emerging, where intelligence is produced from context at runtime instead of being pre-baked into a catalog of fixed outputs.
In practice, that makes previously awkward data operational. This data already exists in almost every CRM and CDP, connected to nothing operational. It stops being dead weight and becomes usable input.
To me, that is much closer to the original promise of AI than the last few years of score factories. We will no longer be building automation around narrow tasks, but systems that can respond to new questions without needing a new model, a new feature set, and a new pipeline every time someone asks for something slightly different.
#Why this is becoming practical now
What makes this real, is that you can already run capable models inside your own infrastructure, on your own terms, with no customer data leaving the perimeter. The momentum is coming from open-weight releases, especially from Chinese labs.
Alibaba's Qwen3.5 family now spans smaller dense models and more efficient Mixture-of-Experts variants like Qwen3.5-35B-A3B, which activates only 3 billion parameters out of 35 billion per token. DeepSeek-V3 pushed even further: 671B total, 37B activated. That ratio between capability and inference cost is exactly where local usage starts to become practical on real hardware.
The tooling keeps up. llama.cpp supports low-bit quantization, multiple hardware backends, and CPU+GPU hybrid inference. People run useful models locally on imperfect machines by trading precision, memory, and latency in pragmatic ways. You do not need datacenter hardware anymore.
And then there is the operational layer. Tools like llama-swap sit in front of local inference servers, switch models on demand, expose OpenAI-compatible endpoints, and unload models automatically after a TTL. The ecosystem is growing past the "pick one model and live with it" phase. You can test, compare, and swap models without rebuilding anything.
What makes this interesting is not any single release. It is that several hard pieces now work at the same time:
- Open MoE architectures
- Quantized inference
- Hybrid CPU+GPU loading
- Hot-swappable model serving
- Specialized models for specific tasks
The stack is rough in places, but it is real, it is working, and it is improving fast. Compliance-friendly options are no longer the weak option. Local inference is becoming the practical one. Although I grant you that the described architecture remains simplistic.
#What changes for customer intelligence
Once the models are good enough, the bottleneck moves. Right now, the constraint is often memory, it's the gap between "it runs" and "it runs without becoming annoying." But that is an engineering problem with a known trajectory.
The more meaningful shift is in what the customer profile becomes now that cheap models can process, enrich, and reason over the raw data sitting inside it. It stops being just a place to store attributes. It becomes a context layer that can be queried dynamically, for many questions, across many workflows.
A concrete example: a CRM lead wants to understand why a premium segment is showing signs of erosion.
Explicit models do not go away. They become one layer of intelligence inside the system, not the entire system.
#Where this is going
The real question is no longer whether the models are good enough. They are.
The question is whether CDP and CRM teams are ready to stop treating every new business question as a reason to build another pipeline. Because the stack now allows a different approach: less centered on fixed outputs, more centered on usable context and runtime reasoning.
This is not about replacing CRMs or CDPs. It is about evolving them. The teams that figure this out first will not just move faster. They will be answering questions that their competitors are still scoping as projects.
If any of this resonates with what you are building, I am happy to discuss it. You can reach me at octave@olivetti.ai.