DeepSeek, Meta, and the rise of open-source LLMs
AI Series: Mid-Series Special 7
Hello,
I missed you guys too. I had an extensive life detour… I am back now.
Let's get back with a mid-series special.
Free and open-source software has its roots in academia in the 1950s. At the time, the people writing software for computers were mainly academics, and they kept to the academic principles of sharing and openness, publishing their research and knowledge for everyone to build on.
A computer hobbyist club called the "Homebrew Computer Club" existed between 1975 and 1986 in Menlo Park, California. Its main aim was for hobbyists to meet and share ideas and computer parts: circuits, software, and everything else relating to personal computing. The club also had a newsletter sharing the latest on how to set up a personal computer. The club is significant because it pushed for the advent of the personal computer, and its members included none other than Steve Jobs and Steve Wozniak, before they started their personal computing startup called Apple.
Personal computing was largely built on that open-source spirit of sharing and openness. If you are a programmer, it's almost inevitable that you have used free and open-source code or software in your development.
With the rise of AI, we have more recently seen the rise of open-source large language models (LLMs), led by the elephant in the room: DeepSeek. Meta has also been building free and open-source LLMs for a while now. So what is the deal with DeepSeek?
DeepSeek is a small AI startup from China that has set the tech scene agog with the release of its R1 model, which competes with, and on some benchmarks even surpasses, the latest models from big dogs like OpenAI (o1) and Anthropic (Claude 3.5 Sonnet).
How did it happen, and why is this impressive?
The USA has been at loggerheads with China for many years now, and the two are wary of each other. While the USA tries to protect its status as the world power, China threatens that status as it grows fast in technology, engineering, medicine, and more.
In light of this, the USA has tried to restrain China's growth by imposing sanctions and high tariffs. A good example is the US ban on Huawei products, on the claim that the Chinese government was using Huawei as a front to collect sensitive data on American citizens. More recently, TikTok, the popular short-video platform, went dark in the USA for about a day under a divest-or-ban law signed by the Biden administration, then came back as soon as President Trump was inaugurated, still with the condition that ByteDance sell TikTok's US operations to an American company.
In the AI wars, the USA believes the country that creates the most advanced AI will be the world power, and that the USA should be that country. To achieve that, the state imposed heavy restrictions on exporting to China the US-made semiconductors, GPUs, and chips used to train AI models. This gave AI companies in the USA an edge of a couple of years over the rest of the world: all the popular models, from ChatGPT and Claude to Llama and Gemini, are from the USA. The USA has the money, the hardware, and the talent to push on and dust everybody else.
Enter DeepSeek.
Cái khó ló cái khôn - In hardship, wisdom emerges.
Squeezed away from the best hardware and funding by US restrictions, a small startup called DeepSeek was founded in 2023 by Liang Wenfeng, an alumnus of Zhejiang University with a background in information and electronic engineering. In December 2024, the DeepSeek-V3 model was released. In January 2025 came the DeepSeek-R1 and DeepSeek-R1-Zero models, on par with the latest and best models available: OpenAI's o1 and Anthropic's Claude 3.5 Sonnet.
Now this is special because, lacking access to the best GPUs from the USA, DeepSeek had to think outside the box: with about $5.58 million and roughly 55 days, they were able to train their LLM. To put those figures into context, just about a year earlier Sam Altman, the CEO of OpenAI, was reportedly trying to raise $7 trillion to step up the growth of advanced AI; that is over a million times DeepSeek's reported training budget. This is special also because DeepSeek is an open-source project, which makes it readily available for free to everyone.
How it works
To go into the technical details of how DeepSeek achieved this, you can read here. They used reinforcement learning, low-level communication optimizations, compression, and selective training that updates only the specific parts that need to be trained. Where other LLMs train one model to know everything, DeepSeek's models use what is known as a Mixture of Experts: an "expert" for each domain (topic). For example, if I ask a question about health, DeepSeek routes the question to the health expert within its model for evaluation and response. That way, if I want to make my model better on a specific topic, I can just train that expert, as the sketch below shows. This is different from popular models like GPT-4o, where one model is an expert on everything, so minor adjustments on specific topics get expensive real quick because the whole model has to be retrained.
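To make the "experts" idea concrete, here is a minimal Mixture-of-Experts sketch in PyTorch. Everything in it (the TinyMoELayer name, the sizes, the simple top-k router) is my own toy assumption for illustration, not DeepSeek's actual code; DeepSeek's production architecture is far larger and adds refinements like shared experts and load balancing.

```python
# Minimal, illustrative Mixture-of-Experts (MoE) routing sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        # Each "expert" is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores how relevant each expert is to each token.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                         # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the best experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for a given token; the rest stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)    # 10 toy token embeddings
print(layer(tokens).shape)      # torch.Size([10, 64])

# To improve one domain only (say the "health" expert is expert 0),
# freeze every expert except that one before fine-tuning:
for i, expert in enumerate(layer.experts):
    for p in expert.parameters():
        p.requires_grad = i == 0
```

Because only the routed experts fire for a given token, the compute per token stays small even as the total parameter count grows, and, as the last few lines show, a single expert can be fine-tuned while the rest of the model stays frozen.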
Sam Altman of OpenAI called out DeepSeek for copying its models, but then, is that not what all great artists do? It was supposedly Picasso who said, "Good artists copy, great artists steal." When asked about DeepSeek copying AI model concepts and building on existing US companies' LLMs, in an interview with CNBC, the CEO of Perplexity AI, Aravind Srinivas, had this to say:
"… everybody copies everybody in this field, Google built the Transformer first and OpenAI copied it, Google built the first LLMs and OpenAI also copied it"
To make it even more ridiculous, OpenAI started as a non-profit whose early focus was open research and sharing knowledge within the AI community. It is ironic that a company with the name OpenAI would want to gatekeep its progress, and it has since been transitioning to a for-profit organization, leaving early backers like Elon Musk outraged. Well, to be fair, after a very solid head start and immense funding, I'd expect Sam Altman and his OpenAI team not to be happy. OpenAI is not the poster boy of AI anymore; it faces extreme competition from startups building AI models that compete with it on every level, and the playing field is almost level.
DeepSeek is not the first open-source LLM project; Meta has been building its open-source Llama models since 2023.
Meta
Mark Zuckerberg, the CEO of Meta, has always been a firm believer in open-source software and code; Meta has given us big projects like PyTorch and React. As of 2022, Meta had 1,032 actively maintained open-source projects; you can read more here. That said, and not to take anything away from Zuckerberg, DeepSeek was built on the efforts of US companies spending billions of dollars to train the initial LLMs.
Apple
Apple Inc. is arguably one of the biggest beneficiaries of the DeepSeek news. Apple slept on AI and had to use OpenAI for its Apple Intelligence; now it can more easily build custom LLMs for the product ecosystem it manages so well. It will also have saved billions on training the initial LLMs. As the saying goes: the early bird gets the worm, but the second mouse gets the cheese. TechRadar wrote a piece explaining this in more detail; you can check it out here.
In conclusion, I believe US AI companies working on LLMs will catch up quickly to what DeepSeek has done, since DeepSeek is open source and anyone can check its code and research papers. I hope this makes free and open-source AI software even better; I believe open source is the best way to keep AI from becoming a monopoly and to encourage growth and breakthroughs like DeepSeek's.
You can check our roadmap here to refresh your memory on our journey in this series: how far we've come, and where we are going.
It's been a while since I wrote; I hope I was not too rusty… see you in the next one!