Blog · March 15, 2026 · 7 min read

From Feature Engineering to Context Engineering

Octave Olivetti
Tech stack: LLM · llama.cpp · llama-swap · Qwen3.5 · DeepSeek-V3 · MoE
That does not mean the cloud is about to disappear, and it does not mean the problem is solved. But the idea that local models are still mostly clever toys with obvious ceilings is outdated. They still fall short on some agentic workflows and on complex tool use. But for extraction, classification, summarization, translation, enrichment, and a growing share of day-to-day data work, they are good enough to change how systems are designed. And that matters more than most people realize.

For a CDP, a CRM, or any customer intelligence stack, local models create a different starting point. You can run useful inference close to the data, without sending everything upstream, without turning every request into an API call, and without making the cloud the default dependency for every AI feature from day one. More than an optimization, it is also:
  • Less data leaving the perimeter
  • Less exposure
  • Less privacy risk
  • Fewer third-party dependencies baked into the architecture from the start
A lot of customer intelligence is built one output at a time. One model for churn, another for time-to-buy, another for segmentation, another for propensity, and so on. Each score gets its own logic, its own pipeline, its own governance, its own maintenance burden. It works. But it fragments the system, and I do not think that pattern will hold forever.

The fragmentation comes from the constraint itself. Every use case needs its own clean feature set, so every use case gets its own pipeline. As models improve, the more interesting direction will not be to build one narrow model for every attribute. It will be to recognize that a whole layer of data (CRM notes, support transcripts, product feedback, sales conversations, mixed signals that never fit neatly into a feature set) can now be enriched, structured, and made operational in ways that were not feasible or economical before.
Feature engineering starts from what you can cleanly extract.
Context engineering starts from all the data that is already there.
Feature engineering starts from what you can cleanly extract and compute. Context engineering starts from all the data that is already there, including the messy parts, and turns it into usable input at the moment it is needed. Take churn as a concrete example. Both approaches start from the same problem, but not from the same place.
Feature Engineering
  • Explanatory variables: purchase frequency, RFM delta, return history
  • Model trained on those variables, score published
  • One question = one model = one dedicated pipeline
  • The next question requires a new project
Context Engineering
  • Existing corpus: CRM notes, support transcripts, open tickets
  • Local model queried on demand against that corpus
  • "Is this customer showing disengagement signals?"
  • The next question uses the same infrastructure
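The context-engineering side of that comparison can be sketched in a few lines against any OpenAI-compatible endpoint, such as the one exposed by llama.cpp's server or llama-swap. A minimal sketch, assuming a local endpoint URL, a model name, and a prompt layout that are mine, not anything prescribed by those tools:

```python
import json
import urllib.request

# Assumed local endpoint; llama-swap and llama.cpp's server both speak
# the OpenAI chat-completions protocol, but the port is an example.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def build_context_prompt(question: str, documents: list[dict]) -> str:
    """Concatenate raw customer records (CRM notes, tickets, transcripts)
    into one context block, then append the business question."""
    blocks = [f"[{d['source']}] {d['text']}" for d in documents]
    return "Customer context:\n" + "\n".join(blocks) + f"\n\nQuestion: {question}"

def ask_local_model(question: str, documents: list[dict]) -> str:
    """Send the assembled context to the local OpenAI-compatible endpoint."""
    payload = {
        "model": "local-model",  # resolved by the serving layer, not hard-wired
        "messages": [{"role": "user",
                      "content": build_context_prompt(question, documents)}],
        "temperature": 0.0,
    }
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

docs = [
    {"source": "crm_note", "text": "Asked about cancelling the premium plan."},
    {"source": "ticket", "text": "Two unresolved delivery complaints this month."},
]
prompt = build_context_prompt("Is this customer showing disengagement signals?", docs)
```

The point of the sketch is the shape, not the code: the corpus is assembled at query time, and the next question reuses the same two functions with a different string.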
That does not mean the death of structured modeling. Explicit scores remain the right answer for some use cases. They are easier to audit, easier to govern, easier to benchmark, easier to industrialize. But they are no longer the only option. A second pattern is emerging, where intelligence is produced from context at runtime instead of being pre-baked into a catalog of fixed outputs. In practice, that puts previously awkward data to work. This data already exists in almost every CRM and CDP, connected to nothing operational. It stops being dead weight and becomes usable input.
  • CRM notes: free-text comments from sales and pre-sales teams
  • Call transcripts: recorded customer service calls and exchanges
  • Meeting summaries: in-store or on-site visit notes and follow-ups
  • Support tickets: incident history, escalations, and resolutions
  • Product reviews: structured or free-text customer feedback
  • Inbound emails: requests, complaints, and unstructured exchanges
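To be queryable at runtime, these heterogeneous sources need only a light normalization into a common shape. A minimal sketch, where the field names and source labels are my own conventions rather than any standard:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ContextDocument:
    """One unit of customer context, whatever system it came from."""
    source: str          # e.g. "crm_note", "support_ticket", "inbound_email"
    timestamp: datetime  # when the signal was produced
    text: str            # the raw, messy content, kept as-is

def normalize_ticket(ticket: dict) -> ContextDocument:
    """Map a hypothetical support-ticket record into the common shape."""
    return ContextDocument(
        source="support_ticket",
        timestamp=datetime.fromisoformat(ticket["opened_at"]),
        text=f"{ticket['subject']}: {ticket['body']}",
    )

doc = normalize_ticket({
    "opened_at": "2026-02-10T09:30:00",
    "subject": "Late delivery",
    "body": "Second delayed order this month, customer asking for a refund.",
})
```

Each source gets one small adapter like `normalize_ticket`, and everything downstream only ever sees `ContextDocument`.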
To me, that is much closer to the original promise of AI than the last few years of score factories. We will no longer be building automation around narrow tasks, but building systems that can respond to new questions without needing a new model, a new feature set, and a new pipeline every time someone asks for something slightly different.
What makes this real is that you can already run capable models inside your own infrastructure, on your own terms, with no customer data leaving the perimeter. The momentum is coming from open-weight releases, especially from Chinese labs. Alibaba's Qwen3.5 family now spans smaller dense models and more efficient Mixture-of-Experts variants like Qwen3.5-35B-A3B, which activates only 3 billion parameters out of 35 billion per token. DeepSeek-V3 pushed even further: 671B total parameters, 37B activated. That ratio between capability and inference cost is exactly where local usage starts to become practical on real hardware.

The tooling keeps up. llama.cpp supports low-bit quantization, multiple hardware backends, and hybrid CPU+GPU inference. People run useful models locally on imperfect machines by trading precision, memory, and latency in pragmatic ways. You no longer need datacenter hardware.

And then there is the operational layer. Tools like llama-swap sit in front of local inference servers, switch models on demand, expose OpenAI-compatible endpoints, and unload models automatically after a TTL. The ecosystem is growing past the "pick one model and live with it" phase. You can test, compare, and swap models without rebuilding anything.

What makes this interesting is not any single release. It is that several hard pieces now work at the same time:
  • Open MoE architectures
  • Quantized inference
  • Hybrid CPU+GPU loading
  • Hot-swappable model serving
  • Specialized models for specific tasks
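To give a feel for the operational layer, a llama-swap setup is essentially one small config file sitting in front of llama-server. A sketch, assuming llama-swap's YAML schema (`models`, `cmd`, `ttl`, `${PORT}`); the model names and file paths are invented, so check the project's README for the current key names:

```yaml
# Hypothetical llama-swap config: two local models behind one
# OpenAI-compatible endpoint, each unloaded after an idle TTL.
models:
  "qwen-moe":
    cmd: llama-server --port ${PORT} -m /models/qwen3.5-35b-a3b-q4.gguf
    ttl: 300   # unload after 5 minutes idle
  "small-extractor":
    cmd: llama-server --port ${PORT} -m /models/qwen3.5-4b-q8.gguf
    ttl: 120
```

Clients then request a model by name through the usual OpenAI-style API, and llama-swap starts, routes to, and eventually unloads the right process.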
The stack is rough in places, but it is real, it is working, and it is improving fast. Local inference is no longer the weak, compliance-friendly fallback; it is becoming the practical option. I will grant that the architecture described here is still a simplification.
Once the models are good enough, the bottleneck moves. Right now, the constraint is often memory: the gap between "it runs" and "it runs without becoming annoying." But that is an engineering problem with a known trajectory. The more meaningful shift is in what the customer profile becomes once cheap models can process, enrich, and reason over the raw data sitting inside it. It stops being just a place to store attributes. It becomes a context layer that can be queried dynamically, for many questions, across many workflows.

A concrete example: a CRM lead wants to understand why a premium segment is showing signs of erosion.
Scoring approach
  1. Identify the relevant variables for the segment
  2. Build or adapt a dedicated pipeline
  3. Wait for execution, validate results
  4. Partial answer to a single question
Context layer
  1. Query against the segment's contact history
  2. Free-text notes and tickets included automatically
  3. Structured answer in a few minutes
  4. Same infrastructure for the next question
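To make "structured answer" concrete: the query can ask the model to reply in JSON, which the workflow then validates before anything downstream consumes it. A minimal sketch, where the field names and the example reply are hypothetical, not real model output:

```python
import json

# Hypothetical shape for a structured reply about segment erosion.
REQUIRED_FIELDS = {"summary", "signals", "confidence"}

def parse_structured_answer(raw: str) -> dict:
    """Parse a JSON reply from the local model and check the expected
    fields, so downstream workflows can rely on a stable shape."""
    answer = json.loads(raw)
    missing = REQUIRED_FIELDS - answer.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    return answer

# Illustrative reply, written by hand for the example.
reply = (
    '{"summary": "Repeat delivery issues in the premium segment", '
    '"signals": ["unresolved tickets", "fewer logins"], '
    '"confidence": 0.7}'
)
parsed = parse_structured_answer(reply)
```

The validation step is what turns a free-form answer into something a CRM workflow can route, store, or escalate on, and the same parser serves the next question unchanged.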
Explicit models do not go away. They become one layer of intelligence inside the system, not the entire system.
The real question is no longer whether the models are good enough. They are. The question is whether CDP and CRM teams are ready to stop treating every new business question as a reason to build another pipeline. Because the stack now allows a different approach: less centered on fixed outputs, more centered on usable context and runtime reasoning. This is not about replacing CRMs or CDPs. It is about evolving them. The teams that figure this out first will not just move faster. They will be answering questions that their competitors are still scoping as projects. If any of this resonates with what you are building, I am happy to discuss it. You can reach me at octave@olivetti.ai.