
Onward march

ChatGPT has continued to evolve. In fact, it has become highly disruptive, transforming how people work, learn and interact with technology. The most impacted areas are content creation and media, education and learning.

Subrata Das Published 17.02.25, 05:14 AM

When OpenAI released ChatGPT, a generative AI large language model (LLM), it instantly gained popularity, reaching over a million users within five days. It demonstrated remarkable capabilities in language processing tasks, including summarisation, blog and article writing, question answering, information retrieval, reasoning and code generation.

ChatGPT has continued to evolve. In fact, it has become highly disruptive, transforming how people work, learn and interact with technology. The most impacted areas are content creation and media, education and learning.


The massive transformer-based deep learning framework had hundreds of billions of parameters in individual models, with training data consisting of trillions of tokens. These tokens, based on byte pair encoding (BPE), average roughly 0.75 words per token and were extracted from a vast corpus that included hundreds of billions of pages from websites, Wikipedia, code repositories, academic publications and other sources.
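
As a back-of-the-envelope illustration of that words-to-tokens rule of thumb, the sketch below converts a word count into an approximate token count. The helper function and constant are purely illustrative and are not OpenAI's actual tokeniser.

```python
# Rough token estimate using the ~0.75 words-per-token rule of thumb for English text.
WORDS_PER_TOKEN = 0.75  # approximate average under byte pair encoding

def estimate_tokens(text: str) -> int:
    """Estimate the number of BPE tokens in a piece of English text."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)

sample = "ChatGPT demonstrated remarkable capabilities in language processing tasks."
print(estimate_tokens(sample))  # 8 words work out to roughly 11 tokens
```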

The continued dependence of such models on Nvidia chips, along with the AI demand expected in the foreseeable future, contributed to soaring stock prices for Nvidia.

Though a version of ChatGPT was available to the public free of charge, a pricing structure was in place for usage beyond a certain limit. The high pricing stems from the significant cost of developing the GPT, or generative pretrained transformer, model, which includes research and development as well as a model training environment consisting of high-speed interconnections among thousands of Nvidia GPU clusters and petabytes of storage. GPUs, or graphics processing units, handle the highly parallel mathematical operations that training and inference demand at high speed.

Despite an LLM’s ability to successfully and efficiently solve complex business problems, several major concerns within the business community have hindered full-scale adoption. The first is privacy, as data submitted as prompts is sent to OpenAI’s servers, raising concerns about data security and compliance with regulations. While OpenAI provides enterprise solutions with stricter data handling policies, some businesses remain cautious about using proprietary or sensitive data in AI interactions.

The second concern is pricing, which remains a barrier for some businesses. Certain applications require a high volume of calls to OpenAI via an application programming interface (API), leading to substantial costs that may be prohibitive for widespread enterprise adoption. For any technology to achieve long-term success, broad business adoption is essential.
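
To see how per-call pricing adds up at enterprise volume, here is a rough cost sketch. The per-token prices and call volumes below are hypothetical placeholders, not OpenAI's actual rates.

```python
# Back-of-the-envelope estimate of monthly API spend at a fixed daily call volume.
# Prices are hypothetical placeholders, not actual OpenAI rates.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # dollars, assumed
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # dollars, assumed

def monthly_cost(calls_per_day, input_tokens_per_call, output_tokens_per_call, days=30):
    """Estimate monthly spend for a given call volume and per-call token usage."""
    cost_per_call = (input_tokens_per_call / 1000) * PRICE_PER_1K_INPUT_TOKENS \
                  + (output_tokens_per_call / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return calls_per_day * cost_per_call * days

# 100,000 calls a day, about 1,000 tokens in and 500 tokens out per call
print(f"${monthly_cost(100_000, 1_000, 500):,.0f} per month")  # $75,000 per month
```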

Then came DeepSeek, an open-source LLM developed by a Chinese AI company founded in 2023 and backed by the CEO of a hedge fund. In the trading session after its latest models made headlines, Nvidia's stock sank 17 per cent, losing nearly $600 billion in market value. DeepSeek is disruptive primarily due to its pricing structure, which is less than one-tenth of ChatGPT's, mainly because of lower training expenses that leverage lower labour costs and more affordable hardware. DeepSeek's R1 and V3 models are on par with OpenAI's and Meta's most advanced models and can run standalone on local machines. Additionally, DeepSeek released an open-source version of its software, following an approach similar to Meta's release of LLaMA, a collection of LLMs.

Here are some features of DeepSeek that make it more efficient than ChatGPT.

Mixture of experts (MoE) architecture: DeepSeek utilises the mixture-of-experts paradigm, unlike ChatGPT's dense model. This approach optimises memory and computing resources. Instead of relying on a single massive feed-forward neural network, MoE distributes tasks across multiple specialised "expert" networks, each handling different aspects of the input; a toy routing sketch follows this list. However, since all experts must be loaded into memory, even if only some are used, this can pose challenges in resource-constrained environments.

Multi-head latent attention (MLA) mechanism: MLA compresses the attention keys and values into a smaller latent representation, significantly reducing DeepSeek's memory footprint. This allows the model to process information more efficiently while maintaining performance.

Efficient attention caching: By reusing the keys and values already computed for earlier tokens instead of recomputing them, this optimisation improves performance, particularly when processing long text sequences in prompts.

Parallel token prediction: Unlike traditional LLMs that predict one token at a time, DeepSeek improves throughput by predicting multiple tokens simultaneously, increasing processing speed and efficiency.
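
As flagged in the mixture-of-experts item above, the routing idea can be shown in a toy sketch. The expert count, dimensions and gating below are arbitrary assumptions chosen for illustration and do not reflect DeepSeek's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a router scores all experts for each token,
# but only the top-k experts actually process it.
NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 16

router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))                  # gating weights
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts."""
    logits = x @ router_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over the experts
    top = np.argsort(probs)[-TOP_K:]              # indices of the k highest-scoring experts
    weights = probs[top] / probs[top].sum()       # renormalise their scores
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,): only 2 of the 8 experts were evaluated
```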

DeepSeek has achieved state-of-the-art results on various benchmarks, including coding, problem-solving and language understanding, while requiring significantly less training time and resources compared to other large language models. Notably, it was trained using just 2,048 Nvidia GPUs over two months, at a remarkably low cost of $5.6 million, in contrast to the $78 million often cited for comparable models.

Its efficiency allows it to run on consumer-grade GPUs, such as a pair of Nvidia RTX 4090s, enabling cost-effective deployment. DeepSeek's open-source nature permits developers to download, modify and deploy its models locally, which is particularly beneficial for organisations prioritising data privacy or seeking to fine-tune AI models on proprietary datasets. Domain-specific fine-tuning can deliver superior accuracy and relevance in fields such as healthcare, law or finance.
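
One possible route to local deployment is a distilled R1 variant loaded through Hugging Face's transformers library. The sketch below assumes that checkpoint name is correct and that suitable GPU memory and the accelerate package are available; the model card should be checked for actual requirements and licence terms.

```python
# Minimal sketch of loading and querying a distilled DeepSeek R1 model locally.
# The model identifier is an assumption for illustration; verify it on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain byte pair encoding in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```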

DeepSeek V3 is trained using supervised fine-tuning, wherein the model learns from human-labelled datasets to enhance its language understanding and generation capabilities, optimising it for general language tasks. In contrast, DeepSeek R1 employs reinforcement learning (RL), specifically group relative policy optimisation (GRPO), to improve its reasoning, complex problem-solving, mathematics and coding abilities.
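
The "group relative" part of GRPO refers to how rewards are normalised: several candidate answers to the same prompt are scored, and each answer's reward is compared against the group's mean and spread. The sketch below illustrates only that normalisation step and omits the clipped policy-ratio objective and KL penalty of the full algorithm.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalise each sampled answer's reward against its group, GRPO-style."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # guard against division by zero
    return [(r - mean) / std for r in rewards]

# Rewards for four sampled answers to one maths problem (illustrative values):
# correct answers score 1.0, incorrect ones 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# [1.0, -1.0, -1.0, 1.0]: correct answers are reinforced, incorrect ones discouraged
```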

LLMs generate responses to prompts based on the data they were trained on. DeepSeek, developed in China, is influenced by the country’s censorship policies, leading to suppression of information on sensitive topics. For instance, when asked about Arunachal Pradesh, DeepSeek responds, “Sorry, that’s beyond my current scope. Let’s talk about something else.”

Despite these limitations, DeepSeek's emergence is both disruptive and noteworthy. Currently, several LLMs are available in the market besides ChatGPT and DeepSeek, including Google's Gemini, Anthropic's Claude, Meta's LLaMA, and Microsoft's Copilot. In the coming years, it is anticipated that more LLMs will emerge, inspired by models like DeepSeek, specialising in various application areas and advancing toward general intelligence.

The writer teaches Generative AI at Northeastern University in Boston, US. He is also principal data scientist at Humana, a healthcare insurance company
