
A Chinese artificial intelligence lab called DeepSeek has caused surprise in Silicon Valley after its eponymous large language model emerged as one of the biggest competitors to ChatGPT, developed by the US firm OpenAI.
New models that have been released this month are said to be both extremely fast and highly affordable.
DeepSeek-R1, the lab's latest model, which was built with far fewer chips than its rivals' systems, is gaining ground on industry leaders such as OpenAI, Google, and Meta, and triggered a sharp drop in chipmaker Nvidia's share price on Monday.
This is the latest update on a cutting-edge technology revealed by a Chinese innovator.
What is the origin of DeepSeek?
According to reports, the company, established in July 2023, was founded by Liang Wenfeng, an information and electronic engineering graduate of Zhejiang University in Hangzhou, China.
This project was part of the incubation program of High-Flyer, a fund that Liang had established in 2015. Liang, similar to other prominent figures in the field, hopes to attain the level of "artificial general intelligence," which would be able to match or surpass human capabilities in various tasks.
Because it operates independently, DeepSeek's funding structure lets it pursue ambitious AI projects without pressure from outside investors, and it prioritises long-term research and development.
DeepSeek's team comprises young graduates from top Chinese universities, and its hiring process prioritises technical expertise over professional experience.
In essence, it represents a fresh perspective on the development of artificial intelligence models.
DeepSeek's journey began in November 2023 with the launch of DeepSeek Coder, an open-source model geared towards coding activities.
This was succeeded by DeepSeek LLM, which aimed to compete with other leading language models. DeepSeek-V2, released in May 2024, experienced significant popularity due to its impressive performance and affordability.
It also led to a reduction of AI model prices by major Chinese tech firms, including ByteDance, Tencent, Baidu, and Alibaba.
The extended capabilities of these newer DeepSeek models are said to let them detect and analyse a wider range of data types with greater efficiency and precision.
DeepSeek-V2 was later replaced by DeepSeek-Coder-V2, a more advanced model with 236 billion parameters.
Designed for complex coding tasks, the model provides up to a 128,000 token input context window.
A token is a unit of text. It can be a word, part of a word (such as "artific" and "ial"), or even a single character. For example, "Artificial intelligence is great!" might be split into five tokens: "Artificial," "intelligence," "is," "great," and "!".
The context window is the maximum amount of text, measured in tokens, that the model can process at once: here, 128,000 tokens.
A wider scope of contextual understanding enables a model to grasp, condense, or dissect lengthy texts. This is particularly beneficial when processing lengthy papers, novels, or intricate conversations.
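The idea of tokens and a context window can be sketched with a toy example in Python. Note this is purely illustrative: real LLMs such as DeepSeek's use subword schemes like byte-pair encoding, so actual token counts differ, and the function names here are invented for the sketch.

```python
import re

def simple_tokenize(text):
    # Toy word-level tokenizer: splits on words and punctuation.
    # Real models use subword tokenization (e.g. byte-pair encoding).
    return re.findall(r"\w+|[^\w\s]", text)

CONTEXT_WINDOW = 128_000  # maximum tokens the model can process at once

def fits_in_context(tokens, limit=CONTEXT_WINDOW):
    # A prompt fits only if its token count is within the window.
    return len(tokens) <= limit

tokens = simple_tokenize("Artificial intelligence is great!")
# -> ["Artificial", "intelligence", "is", "great", "!"]  (five tokens)
```

Under this scheme, a document of 200,000 tokens would exceed the 128,000-token window and would need to be truncated or split before being sent to the model.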
Its latest models, DeepSeek-V3 and DeepSeek-R1, have further solidified the company's position in the market.
A 671-billion-parameter model, DeepSeek-V3 requires significantly fewer computing resources than its peers while excelling in a variety of benchmark tests against rival models.
DeepSeek-R1, unveiled this month, targets advanced tasks such as reasoning, programming, and problem-solving. In this domain it competes with o1, a recent model from OpenAI, the maker of ChatGPT.
Although DeepSeek has accomplished considerable success in a relatively brief time frame, the company is primarily focused on conducting research and has no plans for immediate commercialization, as reported by Forbes.
Is it free for end users?
One of the key factors behind DeepSeek's high profile is that it is free for end users.
It is among the first advanced AI systems available to users at no cost. Other powerful systems, such as OpenAI's o1 and Claude Sonnet, require a paid subscription, and even some paid tiers impose usage limits.
Google's Gemini is also free to use, but its free tier is restricted to older models. DeepSeek currently has no such limitations.
How to use it?
Users can access the DeepSeek chat interface at chat.deepseek.com. It is enough to type a prompt into the chat box; pressing the "search" button makes the chatbot search the internet.
There is also an advanced "deep think" feature that yields more detailed answers on any subject. When activated, this option not only generates more detailed responses to users' queries but also draws on a broader range of sources from the search engine. It is worth noting that, unlike ChatGPT, which relies on a curated set of sources, this feature may return information from less credible or smaller sites that can contain inaccuracies. Users are therefore advised to verify the information the chatbot provides.
Is it safe?
Another essential question is whether DeepSeek is safe to use. Like comparable services, it collects user data, which is likely stored on servers in China.
As with any Large Language Model, it is important that users do not give sensitive data to the chatbot.
As DeepSeek is also open-source, independent researchers can review the model's code to verify its security. Further detailed information on potential security issues is anticipated to be disclosed in the coming days.
What is meant by the term open source?
The models, including DeepSeek-R1, have been released largely as open source. This allows anyone to access the tool's code and adapt it to their needs. The training data, however, remains proprietary.
OpenAI, in contrast, sells access to its o1 model through paid subscriptions ranging from $20 (€19) to $200 (€192) per month.
How did DeepSeek develop its models despite US restrictions?
Additionally, the company has formed collaborative relationships to fortify its technological capacity and expand its global market presence.
One of the notable collaborations was with the US chip manufacturer AMD. According to Forbes, DeepSeek used AMD Instinct GPUs (graphics processing units) and ROCm software at key stages of model development, particularly for DeepSeek-V3.
MIT Technology Review reported that Liang had purchased substantial quantities of Nvidia A100 chips, which are now restricted from export to China, before the United States imposed those trade restrictions.
Chinese news outlet 36Kr reports that the company holds more than 10,000 units in inventory. Others claim that this number may be as high as 50,000.
Recognising the value of this stockpile for training AI, Liang founded DeepSeek and began deploying the chips in conjunction with energy-efficient alternatives to build his models.
It is crucial to note that Liang has successfully created proficient models with minimal resources available.
US chip export restrictions required DeepSeek developers to design more intelligent and resource-efficient algorithms to make up for their limited computing capabilities.
DeepSeek's developers estimate that training a model like ChatGPT would require some 10,000 Nvidia GPUs. According to DeepSeek's engineers, they achieved comparably impressive results with just 2,000 GPUs.
How have people reacted to DeepSeek?
Alexandr Wang, CEO of ScaleAI, which supplies training data to AI models of prominent organisations such as OpenAI and Google, characterised DeepSeek's product as "an earth-shattering model" in a speech at the World Economic Forum (WEF) in Davos last week.
While DeepSeek has stunned its American rivals, experts are already cautioning about what its release will mean for the Western world.
"We should be both worried and cautious. The infusion of Chinese AI technology into the UK and Western community is not just a bad situation, it's a highly unwise one," Ross Burley, co-founder of the Centre for Information Resilience, stated.
"We've witnessed Beijing repeatedly use its technological superiority for surveillance, control, and coercion, both within its own borders and internationally. Be it through surveillance devices embedded with spyware, state-orchestrated cyberattacks, or the misuse of AI to silence dissent, China's history reveals that its technology serves as an integral component of its strategic worldwide influence," he said.
DeepSeek appears to be a neutral-sounding large language model, but it has previously been revealed that the chatbot withholds information critical of the Chinese government.
Many others view the timing of the latest LLM's release as politically motivated, a move that may further strain the already tense Sino-American relationship.
"The actual cutting edge technology does exist, but the moment of its release is influenced by political considerations," Gregory Allen, director of the Wadhwani AI Center at the Center for Strategic and International Studies, informed the Associated Press.
Allen compared DeepSeek's announcement last week to Huawei's release of a new phone last year during diplomatic discussions over the Biden administration's export controls.
"Trying to prove that export restrictions are ineffective or even have a negative impact is a key objective of China's foreign policy at the moment," Allen said.