A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
The simplest way to serve AI/ML models in production
An open-source computer vision framework to build and deploy apps in minutes
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
A REST API for Caffe using Docker and Go
The goal of RamaLama is to make working with AI boring.
This is a repository for a no-code object detection inference API using YOLOv3 and YOLOv4 with the Darknet framework.
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
This is a repository for an object detection inference API using the TensorFlow framework.
Work with LLMs on a local environment using containers
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Orkhon: ML Inference Framework and Server Runtime
ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.
Deploy DL/ML inference pipelines with minimal extra code.
A standalone inference server for trained Rubix ML estimators.
Friendli: the fastest serving engine for generative AI
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
Fullstack machine learning inference template