The recent farce in which a Stanford University AI project team copied the open-source product of a Chinese large-model company has, in the new era of AI, pressed the refresh key on perceptions of the Sino-US technology catch-up race.
Llama3-V, an open-source model led by a Stanford University AI team, was quickly confirmed to be a repackaged ("shelled") copy of MiniCPM-Llama3-V 2.5, the domestic open-source model nicknamed "Little Steel Gun" developed by Tsinghua University and Wall-Facing Intelligence. At 1:27 a.m. Beijing time on June 4, two of the authors, Siddharth Sharma and Aksh Garg, formally apologized to the MiniCPM team on the social platform X and said the Llama3-V model would be taken down.
Liu Zhiyuan, chief scientist of Wall-Facing Intelligence and associate professor at Tsinghua University, recalled that back in 2006 the main goal of industry practitioners was simply to publish a paper at a top international conference. In his view, although this incident reveals the high level of China's AI research and development in a regrettable way, it also shows that the large-model products of Chinese startups have begun to receive widespread international attention and recognition.
The plagiarism was quickly proven
The timeline shows that the incident began as early as May 29, when an AI team from Stanford University began advertising online that a SOTA (state-of-the-art) multimodal model could be trained for $500.
The authors claimed that Llama3-V was more powerful than GPT-4V, Gemini Ultra and Claude Opus. The team members are undergraduates at Stanford University who have published several papers in machine learning and have internship experience at AWS, SpaceX and elsewhere. Thanks to these impressive backgrounds, the Llama3-V project quickly rose to the front page of HuggingFace (a developer community and platform) and drew the attention of the developer community.
Users on the social platform X and on HuggingFace then questioned whether Llama3-V was a repackaged version of MiniCPM-Llama3-V 2.5, an open-source edge-side multimodal model from Wall-Facing Intelligence released on May 21, 2024.
The Llama3-V team responded at the time that they had only used MiniCPM-Llama3-V 2.5's tokenizer (a text-segmentation component, an important part of natural language processing) and had started their work before MiniCPM-Llama3-V 2.5 was released. However, the team did not explain how, in detail, the tokenizer could have been obtained before MiniCPM-Llama3-V 2.5 was released.
Subsequently, more and more voices accused the aforementioned AI team of plagiarism. For example, the model structure and configuration files of Llama3-V are exactly the same as those of MiniCPM-Llama3-V 2.5, with only some reformatting and renaming of variables. Llama3-V also uses the same tokenizer as MiniCPM-Llama3-V 2.5, including the special tokens newly defined by MiniCPM-Llama3-V 2.5.
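For readers who want a sense of how such claims can be checked independently, below is a minimal sketch that compares the tokenizers and configuration files of two public HuggingFace checkpoints. The repository IDs are illustrative assumptions (the Llama3-V repository has since been taken down), and this is not the MiniCPM team's own verification procedure.

```python
# Sketch: compare two HuggingFace checkpoints' tokenizers and configs.
# Repo IDs below are assumptions for illustration; substitute real ones.
from transformers import AutoTokenizer, AutoConfig

REPO_A = "openbmb/MiniCPM-Llama3-V-2_5"   # assumed repo id
REPO_B = "some-org/Llama3-V"              # hypothetical repo id (now removed)

tok_a = AutoTokenizer.from_pretrained(REPO_A, trust_remote_code=True)
tok_b = AutoTokenizer.from_pretrained(REPO_B, trust_remote_code=True)

# Identical vocabularies and identical newly defined special tokens are
# strong evidence that one tokenizer was copied from the other.
print("vocab identical:", tok_a.get_vocab() == tok_b.get_vocab())
print("special tokens A:", tok_a.additional_special_tokens)
print("special tokens B:", tok_b.additional_special_tokens)

# Model configs (layer counts, hidden sizes, etc.) can be diffed the same way.
cfg_a = AutoConfig.from_pretrained(REPO_A, trust_remote_code=True).to_dict()
cfg_b = AutoConfig.from_pretrained(REPO_B, trust_remote_code=True).to_dict()
diff = {k: (cfg_a.get(k), cfg_b.get(k))
        for k in set(cfg_a) | set(cfg_b) if cfg_a.get(k) != cfg_b.get(k)}
print("config fields that differ:", diff or "none")
```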
The HuggingFace page shows that the original Llama3-V author directly imported the code of Wall-Facing Intelligence's MiniCPM-V when uploading the project, and then renamed it Llama3-V. But Mustafa Aljadery, one of the authors, did not consider this plagiarism. He posted that there was a bug in Llama3-V's inference code and that they had merely used MiniCPM-V's configuration to fix the bug, not plagiarized: "The architecture is based on comprehensive research; how can you say it's MiniCPM? The visual part of the MiniCPM code also looks like it was taken from Idefics."
In the view of Li Dahai, CEO of Wall-Facing Intelligence, another piece of evidence is that Llama3-V also exhibits the newly added ability to recognize the Tsinghua Bamboo Slips (a batch of Warring States period bamboo slips acquired by Tsinghua University in July 2008), and the cases it presents are exactly the same as MiniCPM's, even though this training data has not been fully disclosed. More tellingly, the two models are highly similar in both their correct and incorrect outputs under Gaussian perturbation verification, a method used to check the similarity of models.
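The idea behind such a check can be illustrated with a short sketch, assuming PyTorch models with comparable parameter layouts; the function, noise scale and comparison criterion below are illustrative, not the MiniCPM team's actual procedure.

```python
# Sketch: perturb a model's weights with Gaussian noise, run it, then restore.
# If two models track each other closely under the same perturbations,
# including making the same mistakes, their weights likely share an origin.
import torch

def perturbed_outputs(model, inputs, sigma=0.01, seed=0):
    """Add N(0, sigma^2) noise to every parameter, run the model, restore weights."""
    torch.manual_seed(seed)
    originals = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.add_(torch.randn_like(p) * sigma)
        out = model(**inputs)
        for p, orig in zip(model.parameters(), originals):
            p.copy_(orig)
    return out

# Usage idea (model_a, model_b and inputs are placeholders):
# out_a = perturbed_outputs(model_a, inputs, seed=s)
# out_b = perturbed_outputs(model_b, inputs, seed=s)
# Repeat over many seeds and compare the outputs; unrelated models should
# diverge, while copied weights should stay highly correlated.
```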
In the latest development, two of the authors from Stanford's Llama3-V team issued a formal apology to the Wall-Facing Intelligence MiniCPM team on the social platform. Aksh Garg wrote: "First of all, we would like to apologize to the original authors of MiniCPM. I, Siddharth Sharma, and Mustafa released Llama3-V together. Mustafa wrote the code for the project but has been unreachable since Wednesday. Siddharth Sharma and I were mainly responsible for helping Mustafa promote the model. The two of us looked at recent papers to verify the novelty of the work, but were not informed of, and were unaware of, any prior work from OpenBMB (a library of large-scale pre-trained language models and related tools supported by the Tsinghua team). We apologize to the authors and are disappointed that we did not make the effort to verify the originality of this work. We take full responsibility for what happened and have taken down Llama3-V. Again, we apologize."
China is catching up fast in the large-model era
In response to the plagiarism farce, Christopher David Manning, director of the Stanford Artificial Intelligence Laboratory, posted a condemnation of the behavior and praised the Chinese open-source model MiniCPM.
"We deeply regret this incident," Li Dahai said. "On the one hand, it is also a form of recognition from an international team; on the other hand, we call on everyone to jointly build an open, cooperative and trusting community environment."
At present, the global competitive landscape for large models is diversifying. The United States leads in the number of large models and in technical level, spanning natural language processing, computer vision and speech recognition, as well as AI chips and cloud computing infrastructure. China's large models, however, have advantages in application scenarios, algorithm optimization, data resources and other areas.
According to data from IT Juzi (IT Orange), there are currently 102 unicorn companies in China's artificial intelligence field, of which 10 became unicorns in 2023; 4 of those, nearly half, are related to AIGC and large models, including Zhipu AI, Baichuan Intelligence, 01.AI (Lingyi Wanwu) and MiniMax.
Speaking about the gap between China and the United States in large models, Kai-Fu Lee, chairman and CEO of 01.AI, said that a year ago China's large models lagged OpenAI and Google, which started large-model R&D earlier, by 7 to 10 years; today, the gap between China and the US is narrowing and is now about six months.
Having been plagiarized this time, Liu Zhiyuan looked back on how dramatically the research landscape has shifted over the past decade and more. In 2006, when he began his PhD, the main goal of practitioners in computer science and artificial intelligence was to publish papers at top international conferences. In 2014, when Liu Zhiyuan became a faculty member, only important results such as best-paper awards at internationally renowned conferences could make it onto the department's news homepage. In 2018, the language representation model BERT was published; his research team saw its revolutionary significance and built a knowledge-enhanced pre-training model, ERNIE, published at the ACL (Association for Computational Linguistics) 2019 annual conference, a result considered at the time to stand at the international frontier. In 2020, OpenAI released GPT-3 with more than 170 billion parameters, and practitioners clearly recognized the gap with top international results; spurred by that recognition, they began exploring "large models". At the end of 2022, OpenAI launched ChatGPT, which made the public truly feel the gap between China and abroad in AI, and especially after the release of international open-source models such as Llama in 2023, the saying "foreign open source, domestic self-research" began to circulate.
Now, in 2024, Liu Zhiyuan said, industry practitioners should also see that domestic large-model teams such as Zhipu AI–Tsinghua's GLM, Alibaba's Qwen, DeepSeek, and Wall-Facing Intelligence–Tsinghua's OpenBMB are receiving wide international attention and recognition through continuous open-source sharing. This incident, in its own way, also reflects the international attention paid to domestic innovation.
Beyond single-modality models, in April this year Professor Zhu Jun, vice dean of the Institute for Artificial Intelligence at Tsinghua University and co-founder and chief scientist of Shengshu Technology, released Vidu, China's first large video model, on behalf of Tsinghua University and Shengshu Technology; it is regarded as the Chinese counterpart of Sora (the video-generation large model released by OpenAI).
Zhou Zhifeng, a partner at Qiming Venture Partners, said that today's large models have gradually moved from the original pure-language modality to exploring multiple modalities, and that a great deal of this work has been cited by the OpenAI and Stable Diffusion teams. Tang Jiayu, CEO of Shengshu Technology, believes that research on multimodal large models is still in its infancy and the technology is not yet mature; this is different from the much-hyped language models, where foreign teams have been an era ahead. Therefore, rather than grinding away at the fierce competition over language models, Tang Jiayu believes multimodality is an important opportunity for domestic teams to seize in the large-model race.
Lin Yonghua, vice president and chief engineer of the Beijing Academy of Artificial Intelligence (BAAI), takes a more cautious view. She told a Yicai reporter that there is indeed a possibility of China overtaking on the curve in the multimodal field, but what matters more are the elements that determine a multimodal model's success, which are still computing power, algorithms and data. At the current algorithm level, the difference between Chinese and American teams is not that large, and computing power will not be the biggest problem, since the industry still has ways to address it. However, Lin Yonghua believes the data problem is currently the biggest obstacle: even though BAAI has been expanding its AI training data, obtaining massive amounts of high-quality data remains very difficult.