What we offer

The Algo platform empowers developers to create generative AI systems with the best quality, cost, and speed. All publicly available services are pay-as-you-go with developer-friendly pricing. See the list below for offerings and docs links, and scroll further for more detailed descriptions and blog links.

  • Inference: Run generative AI models on Algo-hosted infrastructure with our optimized Algo inference engine. Multiple inference options ensure there’s always a fit for your use case.
  • Modalities and Models: Use hundreds of models (or bring your own) across multiple modalities.

Inference

Algo offers three options for running generative AI models, combining high speed with low cost.

  • Serverless: The easiest way to get started. Use the most popular models on pre-configured GPUs, pay per token, and avoid cold boots (see the first sketch after this list).
  • On-demand: The most flexible option for scaling. Use private GPUs to support your specific needs and pay only while they’re running (see the second sketch after this list). GPUs running Algo software deliver both ~250% higher throughput and 50% lower latency compared to vLLM. Excels for:
    • Production volume - Per-token costs decrease as your volume grows, and there are no set rate limits
    • Custom needs and reliability - On-demand GPUs are private to you, giving you complete control to tailor deployments for speed/throughput/reliability or to run more specialized models
  • Enterprise Reserved GPUs: Use private GPUs with hardware and software configured by the Algo team specifically for your use case. Enjoy SLAs, dedicated support, bring-your-own-cloud (BYOC) deployment options, and enterprise-only optimizations.
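
As a rough illustration of the serverless option, a pay-per-token call might look like the Python sketch below. The endpoint URL, model id, auth scheme, and response shape are assumptions modeled on a generic OpenAI-compatible chat API; this page does not specify Algo's actual API surface.

```python
import os
import requests

# Hypothetical serverless chat endpoint -- URL, model id, and auth header
# are illustrative assumptions, not documented Algo values.
API_URL = "https://api.algo.example/v1/chat/completions"

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ.get('ALGO_API_KEY', 'YOUR_API_KEY')}"},
    json={
        "model": "algo/llama-3-8b-instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,  # serverless billing is per token, so cap the output
    },
    timeout=30,
)
resp.raise_for_status()
# Assumes an OpenAI-style response body with a "choices" list.
print(resp.json()["choices"][0]["message"]["content"])
```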
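And a minimal sketch of creating an on-demand deployment, assuming a hypothetical REST deployments endpoint. Every field name, the GPU SKU, and the replica settings shown here are illustrative assumptions, not documented Algo behavior.

```python
import os
import requests

# Hypothetical deployments API -- endpoint path and spec fields are assumed
# for illustration only.
DEPLOY_URL = "https://api.algo.example/v1/deployments"

spec = {
    "model": "algo/llama-3-70b-instruct",  # placeholder model id
    "accelerator": "nvidia-h100-80gb",     # placeholder GPU SKU
    "min_replicas": 0,  # scale to zero so you pay only while serving traffic
    "max_replicas": 4,  # cap spend under bursty production load
}

resp = requests.post(
    DEPLOY_URL,
    headers={"Authorization": f"Bearer {os.environ.get('ALGO_API_KEY', 'YOUR_API_KEY')}"},
    json=spec,
    timeout=30,
)
resp.raise_for_status()
print("deployment id:", resp.json().get("id"))
```

The scale-to-zero setting in this sketch mirrors the "only pay while they're running" property described above; raising max_replicas is how a private deployment would absorb production volume without hitting fixed rate limits.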