The AI Copilot Paradox: Subscription Waste, Cloud Risks, and the Case for Aggregation

I was recently contemplating a subscription to an AI copilot, and a thought struck me: my usage would likely be anything but steady. Some weeks I might rely on it heavily, drafting code, analysing data, or generating content; other weeks, barely at all. Some AI platforms do offer usage-based plans, where you pay per token or per request, rather than a fixed monthly fee—but most of the copilots I’ve seen do not.

This mismatch is significant. On high-use days, I might quickly exhaust my tokens. On low-use days, unused credits simply go to waste. While a handful of platforms allow the “rollover” of unused tokens, it’s far from standard practice. Most copilot subscriptions are engineered for predictable, steady usage—presumably because providers value predictable income, and, frankly, customers often crave predictable expenditure. Humans are creatures of habit. We love predictability.


The Hidden Inefficiency of Multiple Subscriptions

The problem deepens when you consider multiple subscriptions. If, hypothetically, I subscribed to several copilot-like services, the waste could multiply. Most of the cost of these services is for calls to cloud inference engines. Many AI products, while branded differently, sit atop the same backend models. You could, quite literally, run out of credits on one service while another subscription sits idly with surplus capacity. This duplication represents not just wasted money, but wasted computational resources—a kind of “hidden inefficiency” baked into the AI economy.

Aggregators: A Potential Solution

The solution, I believe, lies in abstraction—a classic principle of software engineering. Introduce an aggregator, a broker of AI services. Users could pool credits or tokens, spending them across multiple front-ends as needed. This layer could also orchestrate usage, provide analytics, and optimise resource allocation. Instead of subscribing to multiple redundant services, users would manage a single credit pool, flexibly directing resources wherever they are most effective.
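To make the idea concrete, here is a minimal sketch of such a broker: a single pooled credit balance spent across several back-end providers. The ProviderAdapter class, the per-token prices, and the stubbed providers are all hypothetical, standing in for real service integrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ProviderAdapter:
    name: str
    cost_per_1k_tokens: float     # illustrative pricing, not real rates
    send: Callable[[str], str]    # function that actually calls the provider

class CreditBroker:
    """One pooled credit balance shared across several AI back-ends."""

    def __init__(self, credits: float, providers: Dict[str, ProviderAdapter]):
        self.credits = credits
        self.providers = providers

    def complete(self, provider_name: str, prompt: str, est_tokens: int) -> str:
        provider = self.providers[provider_name]
        cost = provider.cost_per_1k_tokens * est_tokens / 1000
        if cost > self.credits:
            raise RuntimeError(f"Insufficient credits for {provider_name}")
        self.credits -= cost      # spend from the shared pool, not a per-service plan
        return provider.send(prompt)

# Usage with stubbed providers (no real API calls):
broker = CreditBroker(
    credits=10.0,
    providers={
        "coding": ProviderAdapter("coding", 0.5, lambda p: f"[code model] {p}"),
        "writing": ProviderAdapter("writing", 0.2, lambda p: f"[writing model] {p}"),
    },
)
print(broker.complete("coding", "Refactor this function", est_tokens=800))
print(f"Credits remaining: {broker.credits:.2f}")
```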

Experience tells me that by the time I imagine something, someone else has likely already implemented it. Indeed, services exist that aggregate access to multiple large language models or AI inference engines. OpenRouter, Portkey, and LiteLLM are a few examples. They provide unified APIs and usage-based billing, allowing developers to route requests intelligently across multiple models. Yet, these solutions are largely aimed at developers or enterprises. They do not yet offer a complete solution for “copilots” in the sense of day-to-day productivity tools for knowledge workers.
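For developers, these aggregators typically expose a single OpenAI-compatible endpoint, so switching between back-end models largely comes down to changing a model string. A minimal sketch, assuming an OpenRouter account; the model identifier and the environment variable name are placeholders to be checked against your own setup.

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter (like several aggregators) speaks the OpenAI wire protocol,
# so the standard client works once pointed at a different base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # set this to your own key
)

# The model string routes the request; swap it to target a different
# back-end model without changing any other code. The identifier below
# is illustrative; check the aggregator's catalogue for current names.
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarise what a LoRA adapter is."}],
)
print(response.choices[0].message.content)
```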


Beyond Cost: Security, Availability, and Trust

Cost efficiency is only part of the picture; relying on cloud-hosted copilots raises three further concerns.

  1. Security and Data Privacy
    Every cloud request sends data outside your local environment. Sensitive code, intellectual property, or confidential documents may traverse third-party servers, and even with encryption and strict privacy policies, some risk of exposure remains.
  2. Availability and Reliability
    Dependence on a remote service means any downtime—planned or unplanned—can halt your workflow. Outages, rate limits, or degraded performance can make a copilot less reliable than an offline tool, especially in high-stakes or time-critical projects. 
  3. Vendor Lock-in and Portability
    Many copilots tie users to proprietary APIs or data formats. Switching platforms—or pooling resources across multiple providers—can be complex without an aggregator or abstraction layer.

Building Your Own Copilot: Accessible and Flexible

Aggregators are only one approach. Another option is to build your own AI copilot—an in-house system tailored to your workflow, coding standards, and security requirements. Coupled with LoRAs and RAG techniques, this approach allows updates without full retraining and ensures alignment with evolving workflows, languages, and frameworks.

Importantly, the skills required to build and maintain such a system are likely already present in your team: software engineers familiar with APIs, deployment, and automation can manage the workflow. Many LLMs are freely available for download, ranging from general-purpose models to specialised coding assistants. Open-source hosting and serving platforms—such as Text Generation Web UI, vLLM, or Llama.cpp—make it possible to run these models locally or on private infrastructure, giving teams full control over security, data privacy, and operational reliability.
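Because these serving platforms generally speak the same OpenAI-style protocol, a locally hosted model can be called exactly like a cloud one. A minimal sketch, assuming a vLLM (or similar) server is already running on localhost; the port and model name are placeholders for whatever your deployment uses.

```python
from openai import OpenAI  # pip install openai

# Point the standard client at a local, OpenAI-compatible server
# (vLLM, Llama.cpp's server, and Text Generation Web UI can expose one).
# Nothing leaves the machine, which addresses the data-privacy concern above.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM port; adjust to your setup
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-coding-model",  # placeholder; use the model name your server reports
    messages=[{"role": "user", "content": "Write a unit test for a FizzBuzz function."}],
)
print(response.choices[0].message.content)
```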

How Often Do LLMs Need Updating?

  • Most modern LLMs are pre-trained and general-purpose, so routine tasks like coding assistance rarely require full retraining.
  • Updates become important when:
    • Major paradigm shifts occur in your domain (new languages, frameworks, architectures).
    • Security vulnerabilities or best practices evolve.
    • Your workflow or project focus changes, necessitating specialised knowledge.


Techniques for Updating Without Full Retraining

  • LoRA (Low-Rank Adaptation): Trains a small set of low-rank adapter weights alongside a frozen base model, incorporating new knowledge efficiently without retraining the full network.
  • RAG (Retrieval-Augmented Generation): Pairs the LLM with an external knowledge base, allowing access to up-to-date information without retraining the model itself (see the sketch after this list).
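To make the RAG idea concrete, the sketch below embeds a few internal notes, retrieves the ones most relevant to a query, and builds a grounded prompt. The documents and the choice of embedding model are illustrative, and a production system would typically use a dedicated vector store.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Placeholder "knowledge base" of internal notes; in practice these would be
# chunks of your documentation, codebase, or tickets stored in a vector database.
documents = [
    "Our services authenticate with short-lived JWTs issued by the gateway.",
    "All Python code must pass linting and type checks before merge.",
    "Deployments go through the staging cluster first, then production.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, commonly used embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query (cosine similarity)."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

query = "How should our services handle authentication?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this grounded prompt would then be sent to the (unmodified) LLM
```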

When Radical Updates Are Needed

  • Language shifts, framework overhauls, or domain-specific regulations that fundamentally change coding patterns.

The Fundamental Question: Is Software Development Predictable?

This brings us to a broader question: Is software development predictable? The consultant’s classic answer applies: “It depends.” Projects expand, priorities shift, and human creativity introduces variability that no algorithm can fully constrain. The same applies to AI usage patterns: highly variable, context-dependent, and ultimately unpredictable.

The Way Forward

Subscriptions built for predictability will never fully satisfy users with fluctuating needs. Aggregators, orchestration, and flexible usage models are not just convenient—they are necessary for an AI economy that seeks efficiency without waste. They can also provide a layer of abstraction that mitigates security, reliability, and portability concerns by intelligently routing requests, controlling data flows, and offering fallback options when services are unavailable.
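The fallback behaviour in particular is straightforward to picture: try the preferred provider, and if it is unavailable or rate-limited, fall back to an alternative. A minimal sketch, with hypothetical call functions standing in for real provider clients:

```python
# Hypothetical failover sketch: the call_* functions stand in for real
# provider clients; each either returns a completion or raises an exception.
def call_primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")  # simulate an outage

def call_fallback(prompt: str) -> str:
    return f"[fallback provider] response to: {prompt}"

def complete_with_failover(prompt: str) -> str:
    """Try providers in order of preference, returning the first success."""
    for provider in (call_primary, call_fallback):
        try:
            return provider(prompt)
        except Exception as error:
            print(f"{provider.__name__} failed ({error}); trying next provider")
    raise RuntimeError("all providers failed")

print(complete_with_failover("Explain retrieval-augmented generation in one line."))
```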

For teams or individuals with specialised requirements, building your own copilot is a practical alternative. LoRA fine-tuning and retrieval augmentation keep such a system current without full retraining, and with freely available models, open-source hosting platforms, and internal team expertise, a customised in-house copilot is increasingly achievable.

In short, the AI copilot paradox is real: the very tools designed to make us more productive can also multiply inefficiencies if consumption and risk are unmanaged. Predictable subscriptions satisfy human desire for certainty, but they fail to reflect the unpredictable rhythm of creative and technical work. Aggregation, abstraction, orchestration, and thoughtful DIY strategies together may represent the next evolutionary step in AI productivity—a way to harmonise usage, costs, security, and outcomes across a fragmented ecosystem.


