Use ONNX Runtime for Inference

Docker Images

API Documentation

| API | Supported Versions | Samples |
|---|---|---|
| Python | 3.5, 3.6, 3.7, 3.8 (3.8 excludes Win GPU and Linux ARM); see Python Dev Notes | Samples |
| C# | | Samples |
| C++ | | Samples |
| C | | Samples |
| WinRT | Windows.AI.MachineLearning | Samples |
| Java | 8+ | Samples |
| Ruby (external project) | 2.4-2.7 | Samples |
| JavaScript (Node.js) | 12.x | Samples |
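
For example, loading a model and running inference with the Python API follows the pattern below. This is a minimal sketch: the model file name (`model.onnx`), the dummy input shape, and the feed values are placeholders to adjust for your own model.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session from a serialized ONNX model file.
# "model.onnx" is a placeholder path; substitute your own model.
session = ort.InferenceSession("model.onnx")

# Inspect the model's expected input so the feed dictionary matches it.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Build a dummy input; a real application would pass preprocessed data.
# The shape (1, 3, 224, 224) is only an example for an image model.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference. Passing None for the output names returns all outputs.
outputs = session.run(None, {input_meta.name: dummy_input})
print(outputs[0].shape)
```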

Supported Accelerators

Execution Providers

| CPU | GPU | IoT/Edge/Mobile | Other |
|---|---|---|---|
| Default CPU - MLAS (Microsoft Linear Algebra Subprograms) + Eigen | NVIDIA CUDA | Intel OpenVINO | |
| Intel DNNL | NVIDIA TensorRT | ARM Compute Library (preview) | Rockchip NPU (preview) |
| Intel nGraph | DirectML | Android Neural Networks API (preview) | Xilinx Vitis-AI (preview) |
| Intel MKL-ML (build option) | AMD MIGraphX (preview) | ARM-NN (preview) | |
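
As an illustration of how an execution provider is chosen at runtime, the sketch below uses the Python API to list the providers available in the installed package and to create a session that prefers CUDA with a CPU fallback. The model path is a placeholder, and `CUDAExecutionProvider` is only present in GPU builds of the package.

```python
import onnxruntime as ort

# List the execution providers compiled into the installed package,
# e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] for a GPU build.
print(ort.get_available_providers())

# Create a session with an explicit provider priority list: ONNX Runtime
# tries CUDA first and falls back to the default CPU provider for anything
# the GPU provider cannot handle. "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Confirm which providers the session actually registered.
print(session.get_providers())
```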

Deploying ONNX Runtime

Cloud

IoT and edge devices

The growing selection of IoT devices with sensors and constant signal streams creates new opportunities to move AI workloads to the edge. This is particularly important when massive volumes of incoming data or signals are not efficient or useful to push to the cloud because of storage or latency constraints. Consider surveillance footage where 99% of the content is uneventful, or real-time person detection where immediate action is required. In these scenarios, executing model inference directly on the target device is crucial.

Client applications

Build from Source

For production scenarios, it’s strongly recommended to build only from an official release branch.