FSD planning and control algorithms may adopt large models

Published on 2 August 2023 at 11:55

 

Musk recently tweeted that vehicle control is the final piece of the FSD AI puzzle, and that solving it with AI could shrink more than 300,000 lines of C++ control code by about two orders of magnitude (to a few thousand lines). He said Tesla's engineers are making good progress so far, but that progress is limited by training compute.

 

This seems to signal that FSD's planning and control algorithms will adopt large models to speed up deployment.

 

FSD (Full Self-Driving) is Tesla's autonomous-driving software. In Musk's vision, FSD will let vehicles drive themselves entirely without human intervention, and do so more safely than human drivers.

 

To formally launch FSD, or to realize autonomous driving at all, three technical hurdles must be cleared: perception, planning, and control. Only once the first step, perception, is solved can work move on to planning and control.

 

Two years ago, Tesla rewrote the code of its perception module, brought large models into autonomous driving, and proposed the BEV+Transformer combination. Since then, automakers such as XPeng, Li Auto, and NIO, along with many intelligent-driving suppliers, have gradually adopted BEV+Transformer to solve perception-level problems. This "optimal solution" to the first step of autonomous driving has won broad acceptance.

 

Large models have clearly improved the perception module. The "perception-heavy, map-light" trend is becoming more pronounced, and closed-loop data systems are gradually taking shape.

 

Now Musk has revealed that Tesla is also using AI to streamline the code in its planning and control algorithms, which means that code is being rewritten as well. Once it is rewritten, training the system will require massive amounts of data and compute, which may be why Musk described the work as compute-constrained.

 

Traditional planning and control algorithms are rule-based, which makes the code heavy and the work complex. If Tesla brings a large model into planning and control, the code will be streamlined, training will speed up further, and the system's behavior will come closer to that of a human driver. However, the safety risk in planning and control is higher than in perception, so a traditional rule-based system is still needed as a backstop.
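The backstop idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tesla's implementation: the `Command` and `Limits` types, the numeric limits, and the `rule_based_backstop` function are all hypothetical names and values invented for this example. The learned model's output is treated as a proposal, and fixed rules clamp or override it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A control command, e.g. as proposed by a learned model (hypothetical type)."""
    accel_mps2: float  # longitudinal acceleration; negative means braking
    steer_rad: float   # steering angle


@dataclass(frozen=True)
class Limits:
    """Hard limits enforced by the rule layer (illustrative values, not real ones)."""
    max_accel_mps2: float = 2.0
    max_brake_mps2: float = 6.0
    max_steer_rad: float = 0.5
    min_gap_m: float = 10.0


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def rule_based_backstop(proposed: Command, gap_m: float,
                        limits: Limits = Limits()) -> Command:
    """Apply fixed rules on top of a model-proposed command."""
    # Rule 1: never exceed the physical comfort and safety envelope.
    accel = clamp(proposed.accel_mps2, -limits.max_brake_mps2, limits.max_accel_mps2)
    steer = clamp(proposed.steer_rad, -limits.max_steer_rad, limits.max_steer_rad)
    # Rule 2: if the gap to the lead vehicle is too small, brake regardless
    # of what the model proposed.
    if gap_m < limits.min_gap_m:
        accel = -limits.max_brake_mps2
    return Command(accel_mps2=accel, steer_rad=steer)
```

In this sketch, a proposal of `Command(5.0, 0.9)` with a 50 m gap would be clamped to the envelope, while any proposal with a 3 m gap would be overridden with full braking. The design choice is that the rule layer never generates driving behavior itself; it only bounds what the learned component may do.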

 

For now, though, Tesla's use of large models in the planning and control module is still at an early stage, and the problem to be solved remains compute. Tim Zaman, an engineering manager at Tesla, previously tweeted that the company's computing cluster has only 0.3% idle time, that 84% of its jobs are high priority, and that the team hopes to get more compute capacity.

 

This problem may be eased once Dojo comes into use.

 

Dojo, Tesla's in-house supercomputer, reportedly went into production in July. According to published figures, each Dojo system integrates 120 training tiles containing 3,000 D1 chips, has more than 1 million training nodes, and delivers 1.1 EFLOPS of compute (1.1 × 10^18 floating-point operations per second).
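As a quick sanity check, the per-tile and per-chip figures implied by these totals can be worked out directly. The sketch below uses only the numbers quoted above; the variable names are chosen for this example, and the node count is treated as a lower bound because the text says "more than 1 million".

```python
# Totals as quoted in the article for one Dojo system.
TRAINING_TILES = 120
D1_CHIPS = 3_000
TOTAL_FLOPS = 1.1e18          # 1.1 EFLOPS
MIN_TRAINING_NODES = 1_000_000  # "more than 1 million", so a lower bound

# Derived figures.
chips_per_tile = D1_CHIPS / TRAINING_TILES            # chips on each tile
tflops_per_chip = TOTAL_FLOPS / D1_CHIPS / 1e12       # TFLOPS per D1 chip
min_nodes_per_chip = MIN_TRAINING_NODES / D1_CHIPS    # lower bound per chip

print(f"chips per tile:      {chips_per_tile:.0f}")
print(f"TFLOPS per chip:     {tflops_per_chip:.1f}")
print(f"nodes per chip (>=): {min_nodes_per_chip:.0f}")
```

The totals work out to 25 chips per tile and roughly 367 TFLOPS per chip, so the quoted figures are at least internally consistent.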

 

Musk has been looking forward to this for a long time, and he believes Dojo will give Tesla an order-of-magnitude improvement in training cost. According to Tesla's plans, Dojo will usher in a period of rapid growth in the company's compute. By February 2024, Tesla's compute capacity is expected to rank among the world's top five, and by October 2024 its total compute is expected to reach 100 exaFLOPS.

 

With Dojo in use, Tesla will be able to rebuild its planning and control algorithms faster, and an FSD launch within the year may become a reality. As for how things progress after that, Musk will presumably offer another preview.

 

 


 

By Han Yongchang (韩永昌)

Editor: Li Qin (李勤)
