- AI scientists and researchers are developing hardware that can support the constant evolution of AI models.
- This includes chips and hardware configurations that can work in tandem with rapidly changing AI models and algorithms.
The software industry has been successful at deploying AI in production, but the hardware industry, which includes automotive, industrial, and smart retail, is still in its infancy as far as AI productization is concerned. Major gaps continue to hinder AI algorithm proofs-of-concept (PoCs) from turning into real hardware deployments: small data, "non-perfect" inputs, and constantly changing state-of-the-art models. Adaptive hardware appears to be a solution to these challenges.
Small Data:
Companies such as Google and Facebook collect and analyze massive amounts of data every day and use it to build AI models that deliver acceptable performance. In those deployments, the hardware that trains a model is different from the hardware that runs it. The hardware industry, by contrast, rarely deals with big data, which results in less mature AI models. It therefore needs to keep collecting data in the field and run "online models", where training and inference execute on the same hardware so that accuracy improves continuously.
Adaptive computing, which includes field-programmable gate arrays (FPGAs) and adaptable systems-on-chip (SoCs), can run both inference and training, constantly updating the model with newly captured data. Traditional AI training needs the cloud or large on-premise data centers and takes days or weeks, yet the real data is generated at the edge. Running inference and training on the same edge device reduces total cost of ownership (TCO), latency, and exposure to security breaches.
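As a minimal sketch of this "online model" idea, the snippet below alternates inference and incremental training on the same device. It assumes a plain PyTorch module as a stand-in for a network compiled onto an FPGA or adaptable SoC; the model shape and function names are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative edge model; in a real deployment this would be a network
# compiled for an FPGA or adaptable SoC rather than a plain PyTorch module.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def infer(sample: torch.Tensor) -> int:
    """Run inference on a single captured sample."""
    model.eval()
    with torch.no_grad():
        return int(model(sample.unsqueeze(0)).argmax(dim=1))

def online_update(sample: torch.Tensor, label: int) -> None:
    """Fine-tune on the same device once a ground-truth label is available."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(sample.unsqueeze(0)), torch.tensor([label]))
    loss.backward()
    optimizer.step()
```

Because both functions touch the same weights on the same device, every newly labeled sample immediately improves subsequent predictions without a round trip to a data center.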
Non-perfect input:
A PoC becomes a concern when the input data is not clean. PoCs work well when they are fed accurate, well-formed inputs, but in a real environment, camera and sensor inputs from medical devices, robots, and moving cars are distorted: dark pictures, objects seen from odd angles, and so on. These inputs need to be cleaned up before they are handed to AI models for processing and analysis. Accurate decision-making starts with feeding the models the right form of data.
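A minimal sketch of this kind of clean-up step is shown below, assuming OpenCV is available; the histogram equalization and the 224x224 target size are illustrative choices, not requirements of any particular model.

```python
import cv2
import numpy as np

def clean_frame(frame: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Brighten dark frames and normalize shape/format before inference."""
    # Equalize luminance so under-exposed camera frames remain usable.
    yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])
    frame = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)
    # Resize and rescale to the layout the model expects (illustrative).
    frame = cv2.resize(frame, size)
    return frame.astype(np.float32) / 255.0
```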
Some chips are good at accelerating AI inference, but they accelerate only a portion of the full application. In smart retail, for instance, pre-processing includes video decoding and classical computer vision algorithms, followed by reshaping, resizing, and format conversion of the frames.
Post-processing includes object tracking and database lookups. The end user cares only about the real-time response of the full application pipeline. FPGAs and adaptable SoCs can speed up these pre- and post-processing algorithms through domain-specific architectures (DSAs), and adding an AI inference DSA alongside them allows the whole system to be optimized to meet the product requirements.
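The sketch below illustrates why accelerating only the inference stage is not enough: end-to-end latency is the sum of every stage in the pipeline. The helper names and the stage list are hypothetical.

```python
import time

def timed(stage):
    """Wrap a pipeline stage and record its latency (illustrative helper)."""
    def run(data, timings):
        start = time.perf_counter()
        out = stage(data)
        timings[stage.__name__] = time.perf_counter() - start
        return out
    return run

def run_pipeline(frame, stages):
    """End-to-end response time is the sum of all stages, not just inference."""
    timings = {}
    data = frame
    for stage in stages:
        data = timed(stage)(data, timings)
    timings["total"] = sum(timings.values())
    return data, timings

# Hypothetical stages such as decode, clean_frame, infer, track, db_lookup
# would be plugged in here; accelerating only `infer` leaves the rest as a
# bottleneck unless the pre- and post-processing are also offloaded to DSAs.
```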
The changing “state of the art” models:
The AI research community is very active, with new AI models published daily by top researchers worldwide. These models enhance accuracy, decrease computational requirements, and address new types of AI applications. This puts a lot of pressure on semiconductor hardware to keep pace with newly developed AI models and modern algorithms.
Standard benchmarks such as MLPerf show that state-of-the-art CPUs, GPUs, and AI ASICs often deliver well below 30 percent of their peak performance when executing real-life AI workloads. This constantly pushes the need for new DSAs that can keep up with the innovation.
Several trends have been driving the requirement for new DSAs. Depthwise convolution is an emerging layer type that demands high memory bandwidth and specialized internal memory caching to run efficiently, as sketched below. Researchers also keep inventing custom layers that traditional chips do not support natively.
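The following PyTorch sketch shows what a depthwise separable convolution looks like; the channel counts and input size are arbitrary examples.

```python
import torch
import torch.nn as nn

# Depthwise separable convolution: each input channel is filtered
# independently (groups=in_channels), then a 1x1 convolution mixes channels.
in_channels, out_channels = 32, 64
depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                      padding=1, groups=in_channels)
pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

x = torch.randn(1, in_channels, 56, 56)
y = pointwise(depthwise(x))  # far fewer multiply-accumulates than a dense 3x3 conv
```

The arithmetic per output value drops sharply, so the layer becomes memory-bound, which is why it benefits from hardware with tailored on-chip caching rather than raw compute.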
Sparse neural networks are another optimization, in which networks are heavily pruned, sometimes by up to 99 percent, by trimming network edges and eliminating fine-grained matrix values in convolution and other layers. Executing this efficiently in hardware, however, requires a specialized sparse architecture plus an encoder and decoder for the sparse operations, which most chips do not provide.
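A minimal pruning sketch using PyTorch's built-in utilities is shown below; the 99 percent figure mirrors the extreme case mentioned above and is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Prune 99% of the smallest-magnitude weights in a convolution layer.
conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)
prune.l1_unstructured(conv, name="weight", amount=0.99)

sparsity = float((conv.weight == 0).float().mean())
print(f"weight sparsity: {sparsity:.2%}")
# Dense hardware still performs every multiply; exploiting the zeros
# requires sparse-aware compute units with encode/decode support.
```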
Binary and ternary networks are extreme optimizations that turn the math operations into bit-manipulation operations. Most AI chips and GPUs only provide 8-bit, 16-bit, or floating-point calculation units, so they cannot convert these extremely low precisions into performance or power-efficiency gains.
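The sketch below illustrates the binary case: once weights and activations are constrained to {-1, +1}, a dot product reduces to an XNOR-and-popcount operation, which custom logic can implement in a handful of gates. The vector size is arbitrary.

```python
import torch

def binarize(t: torch.Tensor) -> torch.Tensor:
    """Map values to {-1, +1}; storage and arithmetic collapse to single bits."""
    return torch.where(t >= 0, torch.ones_like(t), -torch.ones_like(t))

w = binarize(torch.randn(128))
x = binarize(torch.randn(128))

# A binary dot product is equivalent to XNOR followed by a popcount:
# matching signs contribute +1, mismatching signs contribute -1.
dot = int((w * x).sum())
matches = int((w == x).sum())
assert dot == 2 * matches - len(w)  # popcount identity
```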
FPGAs and adaptable SoCs are ideal here: a developer can build the right DSA for such operations and reprogram the existing device to match the product's workload.
No problem if there is no hardware expertise:
The big challenge with FPGAs and adaptable SoCs has been the hardware expertise required to implement and deploy DSAs. The good news is that tools such as the Vitis unified software platform support C++, Python, and widely used AI frameworks like TensorFlow and PyTorch, bridging the gap for software and AI developers.
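As a rough illustration of how a software developer hands a trained model to such a toolchain, the sketch below exports a PyTorch model to a portable format that an accelerator compiler could consume. This is a generic pattern under assumed defaults, not the specific Vitis flow; the tiny stand-in model and file name are hypothetical.

```python
import torch
import torch.nn as nn

# Train or fine-tune in a standard framework (a tiny stand-in model here)...
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model.eval()

# ...then export it in a portable format that an accelerator toolchain
# (for example, a quantizer/compiler targeting an FPGA DSA) can consume.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)
```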
Along with this, open-source libraries such as the Vitis hardware-accelerated libraries significantly boost adoption within the developer community. A recent Xilinx design contest attracted more than 1,000 developers and produced multiple innovative projects, from a hand-gesture-controlled drone to reinforcement learning to a binarized neural network. Most notably, these projects were submitted by software developers with no prior FPGA experience, which shows that the FPGA industry is taking the right steps to enable software developers to solve real-world AI productization challenges.
Until recently, unlocking the power of hardware adaptability was out of reach for the average software developer or AI scientist because it required specialized hardware expertise. New open-source tools have changed that, letting software developers work with adaptable hardware directly.
With this ease of programming, FPGAs and adaptable SoCs will keep becoming more accessible to hundreds of thousands of software developers and AI scientists, making them the hardware solution of choice for next-generation applications. DSAs represent the future of AI inference, and software developers and AI scientists can harness hardware adaptability to build those next-generation applications.