Use ONNX Runtime for Inference

Docker Images

API Documentation

| API | Supported Versions | Samples |
|---|---|---|
| Python | 3.5, 3.6, 3.7, 3.8 (3.8 excludes Win GPU and Linux ARM); see Python Dev Notes | Samples |
| C# | | Samples |
| C++ | | Samples |
| C | | Samples |
| WinRT | Windows.AI.MachineLearning | Samples |
| Java | 8+ | Samples |
| Ruby (external project) | 2.4-2.7 | Samples |
| JavaScript (Node.js) | 12.x | Samples |
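
For example, loading a model and running inference with the Python API follows the pattern below. This is a minimal sketch: the model file name (`model.onnx`), the dummy input shape, and the feed values are placeholders to adjust for your own model.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session from a serialized ONNX model file.
# "model.onnx" is a placeholder path; substitute your own model.
session = ort.InferenceSession("model.onnx")

# Inspect the model's expected input so the feed dictionary matches it.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Build a dummy input; a real application would pass preprocessed data.
# The shape (1, 3, 224, 224) is only an example for an image model.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference. Passing None for the output names returns all outputs.
outputs = session.run(None, {input_meta.name: dummy_input})
print(outputs[0].shape)
```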

Supported Accelerators

Execution Providers

| CPU | GPU | IoT/Edge/Mobile | Other |
|---|---|---|---|
| Default CPU - MLAS (Microsoft Linear Algebra Subprograms) + Eigen | NVIDIA CUDA | Intel OpenVINO | |
| Intel DNNL | NVIDIA TensorRT | ARM Compute Library (preview) | Rockchip NPU (preview) |
| Intel nGraph | DirectML | Android Neural Networks API (preview) | Xilinx Vitis-AI (preview) |
| Intel MKL-ML (build option) | AMD MIGraphX (preview) | ARM-NN (preview) | |
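
As an illustration of how an execution provider is chosen at runtime, the sketch below uses the Python API to list the providers available in the installed package and to create a session that prefers CUDA with a CPU fallback. The model path is a placeholder, and `CUDAExecutionProvider` is only present in GPU builds of the package.

```python
import onnxruntime as ort

# List the execution providers compiled into the installed package,
# e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] for a GPU build.
print(ort.get_available_providers())

# Create a session with an explicit provider priority list: ONNX Runtime
# tries CUDA first and falls back to the default CPU provider for anything
# the GPU provider cannot handle. "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Confirm which providers the session actually registered.
print(session.get_providers())
```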

Deploying ONNX Runtime

Cloud

IoT and edge devices

The growing selection of IoT devices with sensors and constant signal streams creates new opportunities to move AI workloads to the edge. This is particularly important when massive volumes of incoming data or signals are not efficient or useful to push to the cloud because of storage or latency constraints. Consider surveillance footage where 99% of the content is uneventful, or real-time person detection where immediate action is required. In these scenarios, executing model inference directly on the target device is crucial.

Client applications

Build from Source

For production scenarios, it’s strongly recommended to build only from an official release branch.