I was recently contemplating a subscription to an AI copilot, and a thought struck me: my usage would likely be anything but steady. Some weeks I might rely on it heavily, drafting code, analysing data, or generating content; other weeks, barely at all. Some AI platforms do offer usage-based plans, where you pay per token or per request, rather than a fixed monthly fee—but most of the copilots I’ve seen do not.
This discrepancy is significant. On high-use days, I might quickly exhaust my tokens. On low-use days, unused credits simply go to waste. While a handful of platforms allow the “rollover” of unused tokens, it’s far from standard practice. Most copilot subscriptions are engineered for predictable, steady usage—presumably because providers value predictable income, and, frankly, customers often crave predictable expenditure. Humans are creatures of habit. We love predictability.
The Hidden Inefficiency of Multiple Subscriptions
The problem deepens when you consider multiple subscriptions. If, hypothetically, I subscribed to several copilot-like services, the waste could multiply. Most of the cost of these services is for calls to cloud inference engines. Many AI products, while branded differently, sit atop the same backend models. You could, quite literally, run out of credits on one service while another subscription sits idly with surplus capacity. This duplication represents not just wasted money, but wasted computational resources—a kind of “hidden inefficiency” baked into the AI economy.
Aggregators: A Potential Solution
The solution, I believe, lies in abstraction—a classic principle of software engineering. Introduce an aggregator, a broker of AI services. Users could pool credits or tokens, spending them across multiple front-ends as needed. This layer could also orchestrate usage, provide analytics, and optimise resource allocation. Instead of subscribing to multiple redundant services, users would manage a single credit pool, flexibly directing resources wherever they are most effective.
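As a rough illustration of what such a broker might track, here is a minimal sketch in Python. The `CreditPool` class and the service names are hypothetical, not any existing product's API; a real aggregator would add authentication, persistence, and rate handling.

```python
from dataclasses import dataclass, field

@dataclass
class CreditPool:
    """A shared pool of credits spent across multiple AI front-ends."""
    balance: int
    usage: dict = field(default_factory=dict)  # per-service spend, for analytics

    def spend(self, service: str, tokens: int) -> bool:
        """Deduct tokens for a request routed to `service`; refuse if exhausted."""
        if tokens > self.balance:
            return False
        self.balance -= tokens
        self.usage[service] = self.usage.get(service, 0) + tokens
        return True

# One pool, many front-ends: no service sits idle while another runs dry.
pool = CreditPool(balance=10_000)
pool.spend("code-copilot", 1_200)
pool.spend("writing-copilot", 800)
print(pool.balance)  # 8000
```

The point of the sketch is the single balance: the per-service `usage` dictionary gives the analytics layer for free, while the routing logic stays trivial.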
Experience tells me that by the time I imagine something, someone else has likely already implemented it. Indeed, services exist that aggregate access to multiple large language models or AI inference engines. OpenRouter, Portkey, and LiteLLM are a few examples. They provide unified APIs and usage-based billing, allowing developers to route requests intelligently across multiple models. Yet, these solutions are largely aimed at developers or enterprises. They do not yet offer a complete solution for “copilots” in the sense of day-to-day productivity tools for knowledge workers.
Beyond Cost: Security, Availability, and Trust
- Security and Data Privacy: Every cloud request involves sending data outside your local environment. Sensitive code, intellectual property, or confidential documents may traverse third-party servers. Even with encryption and strict privacy policies, some risk of data exposure remains.
- Availability and Reliability: Dependence on a remote service means any downtime, planned or unplanned, can halt your workflow. Outages, rate limits, or degraded performance can make a copilot less reliable than an offline tool, especially in high-stakes or time-critical projects.
- Vendor Lock-in and Portability: Many copilots tie users to proprietary APIs or data formats. Switching platforms, or pooling resources across multiple providers, can be complex without an aggregator or abstraction layer.
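An abstraction layer addresses the reliability and lock-in concerns directly: if every provider sits behind the same interface, failing over is a loop. A minimal sketch in Python, with simulated providers standing in for real APIs (the function names and error are illustrative assumptions):

```python
def complete(prompt, providers):
    """Try each provider-agnostic callable in order; fall back on failure.

    `providers` is a list of (name, fn) pairs, where each fn takes a prompt
    and returns text, raising an exception on outage or rate limiting.
    """
    errors = {}
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except Exception as exc:  # outage, rate limit, auth failure, ...
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Simulated providers: the cloud one is "down", the local one answers.
def cloud(prompt):
    raise TimeoutError("503 Service Unavailable")

def local(prompt):
    return f"[local model] {prompt}"

used, text = complete("Summarise this diff", [("cloud", cloud), ("local", local)])
print(used, text)  # local [local model] Summarise this diff
```

Because the calling code never mentions a specific vendor, swapping or reordering providers is a one-line change rather than a migration.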
Building Your Own Copilot: Accessible and Flexible
Aggregators are only one approach. Another option is to build your own AI copilot—an in-house system tailored to your workflow, coding standards, and security requirements. Coupled with LoRAs and RAG techniques, this approach allows updates without full retraining and ensures alignment with evolving workflows, languages, and frameworks.
Importantly, the skills required to build and maintain such a system are likely already present in your team: software engineers familiar with APIs, deployment, and automation can manage the workflow. Many LLMs are freely available for download, ranging from general-purpose models to specialised coding assistants. Open-source hosting and serving platforms—such as Text Generation Web UI, vLLM, or Llama.cpp—make it possible to run these models locally or on private infrastructure, giving teams full control over security, data privacy, and operational reliability.
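As a small illustration of how little glue code self-hosting requires: both vLLM and llama.cpp's built-in server can expose an OpenAI-compatible HTTP endpoint, so a standard-library request is all a client needs. The port, path, and model name below are assumptions that depend on how the server was launched:

```python
import json
import urllib.request

def build_chat_request(prompt, model="local-model",
                       base_url="http://localhost:8000/v1"):
    """Build an HTTP request for an OpenAI-compatible local server
    (vLLM and llama.cpp's server both expose this style of endpoint)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Explain this function")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# urllib.request.urlopen(req)  # send it once a local server is running
```

Since no request ever leaves your infrastructure, the security and availability concerns above largely disappear, at the cost of operating the server yourself.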
How Often Do LLMs Need Updating?
- Most modern LLMs are pre-trained and general-purpose, so routine tasks like coding assistance rarely require full retraining.
- Updates become important when:
- Major paradigm shifts occur in your domain (new languages, frameworks, architectures).
- Security vulnerabilities or best practices evolve.
- Your workflow or project focus changes, necessitating specialised knowledge.
Techniques for Updating Without Full Retraining
- LoRA (Low-Rank Adaptation): Trains small low-rank adapter matrices alongside the frozen base weights, incorporating new knowledge efficiently without touching the full model.
- RAG (Retrieval-Augmented Generation): Integrates an external knowledge base with the LLM, allowing access to up-to-date information without retraining the model itself.
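To make the RAG idea concrete, here is a toy retrieval step in Python. Real systems use embedding similarity and a vector store; the word-overlap scoring below is only a stand-in for that, and the documents are invented:

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query
    (a stand-in for embedding similarity in a real RAG pipeline)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Prepend retrieved context so the model answers from current
    knowledge without any retraining."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The team migrated the billing service to framework v3 in March.",
    "Lunch menus are posted every Monday.",
    "Framework v3 deprecates the old billing hooks API.",
]
print(build_prompt("What changed in the billing framework?", docs))
```

Keeping knowledge in the document store rather than the weights is exactly what makes RAG attractive here: updating the copilot is a document edit, not a training run.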
When Radical Updates Are Needed
- Language shifts, framework overhauls, or domain-specific regulations that fundamentally change coding patterns.
The Fundamental Question: Is Software Development Predictable?
This brings us to a broader question: Is software development predictable? The consultant’s classic answer applies: “It depends.” Projects expand, priorities shift, and human creativity introduces variability that no algorithm can fully constrain. The same applies to AI usage patterns: highly variable, context-dependent, and ultimately unpredictable.
The Way Forward
Subscriptions built for predictability will never fully satisfy users with fluctuating needs. Aggregators, orchestration, and flexible usage models are not just convenient—they are necessary for an AI economy that seeks efficiency without waste. They can also provide a layer of abstraction that mitigates security, reliability, and portability concerns by intelligently routing requests, controlling data flows, and offering fallback options when services are unavailable.
For teams or individuals with specialised requirements, building your own copilot is a practical alternative. LoRA fine-tuning and RAG keep such a system current without full retraining, even as workflows, languages, and frameworks evolve. With freely available models, open-source hosting platforms, and internal team expertise, creating a customised in-house copilot is increasingly achievable.
In short, the AI copilot paradox is real: the very tools designed to make us more productive can also multiply inefficiencies if consumption and risk are unmanaged. Predictable subscriptions satisfy human desire for certainty, but they fail to reflect the unpredictable rhythm of creative and technical work. Aggregation, abstraction, orchestration, and thoughtful DIY strategies together may represent the next evolutionary step in AI productivity—a way to harmonise usage, costs, security, and outcomes across a fragmented ecosystem.