Serverless GPU inference for ML models

Pay-per-millisecond API to run ML in production.

  • < 1% cold starts

    Proprietary caching system

  • 10x

    Cost savings

  • Minutes

    Deploy your models in minutes, not days

When should you use Serverless GPUs?

  • Fast and reliable

    Our API is robust, reliable and built for speed. Focus on optimising your ML models and trust the deployment to Pipeline AI.

  • Cost-saving compute

    Only pay for the compute you need for inference and deployment; no tie-ins and no fixed GPU fees.

  • Scalable by design

    Speed up time-to-market and respond to demand by super-scaling your deployment, not your team.

Why use Serverless GPUs for ML inference?

Fast and reliable

We know how important it is to be the best-in-class when it comes to powering machine learning in production.

With proprietary cache management at the heart of our infrastructure, Pipeline AI delivers low latency and high throughput.

Going serverless means optimised performance at minimal cost, as workloads are distributed across an array of GPUs.

And we don’t skimp on the hardware; our serverless cloud runs on the latest NVIDIA Ampere and Volta GPUs.

Cost-saving compute

With enterprise-level cloud providers you generally need to purchase a server, or a block of servers, that is constantly running and always available, even when unused. This is expensive and wasteful of resources.

With Pipeline AI you only pay for the compute time that you use. Our API will respond instantly when it is called and will not incur costs when at rest.

No more complex dashboards, hidden charges or tie-in periods.

Scalable by design

Scale your business, not your team! Why hire dedicated DevOps engineers when Pipeline AI can abstract away the complexities of deploying machine learning to production?

With our toolset your whole team can focus on deploying ML models and growing your customer base rather than building and maintaining costly infrastructure.

From a few requests to thousands, our platform can handle all the cloud infrastructure needed to run your ML models and is inherently scalable by design.

And it’s easy to get started! With Pipeline AI you can convert your model to a pipeline and get access to our API within minutes. You can create custom models or use pre-trained models from our open source library and be up and running in no time.
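As a sketch of what a first call to a serverless inference API might look like (the endpoint URL, header names, and payload fields below are illustrative assumptions, not Pipeline AI's actual API surface):

```python
import json

# Hypothetical example of assembling a call to a serverless inference
# endpoint. The URL, headers, and payload shape are placeholders, not
# Pipeline AI's real API.
API_URL = "https://api.example.com/v1/runs"  # placeholder endpoint

def build_inference_request(pipeline_id, inputs, api_key):
    """Assemble HTTP headers and a JSON body for one inference call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"pipeline": pipeline_id, "inputs": inputs})
    return headers, body

headers, body = build_inference_request(
    "my-model-v1", {"prompt": "a photo of a cat"}, "sk-test"
)
# `headers` and `body` would then be POSTed to API_URL; because the
# platform is serverless, you pay only for the compute time that call
# actually consumes.
```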

Frequently asked questions

What is serverless GPU?

Serverless GPU refers to the use of graphics processing units (GPUs) in a serverless computing environment. In a serverless environment, you can run your applications and services without having to worry about the underlying infrastructure. Instead of setting up and maintaining physical servers or virtual machines, you can simply deploy your model and let Pipeline AI handle the rest.

Is inference faster on GPU or CPU?

In general, a GPU (graphics processing unit) is faster than a CPU (central processing unit) for machine learning inference, because GPUs are specifically designed to perform many calculations in parallel. Machine learning models often involve a large number of mathematical operations, and a GPU can perform these operations much faster than a CPU. However, the actual speed difference will depend on the specific hardware and the type of model being used.

For example, some types of machine learning models may be better suited to CPU inference because they require a high level of sequential processing, which suits CPU architecture. In general, deep learning models, which involve a large number of matrix operations, are well suited to GPU inference, while models that rely more on sequential processing, such as decision trees, may be better suited to CPU inference.

Ultimately, the choice between GPU and CPU for machine learning inference will depend on the specific needs of your application and the hardware resources that are available to you.
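To make the "matrix operations" point concrete, here is a minimal sketch (shapes and values are illustrative only): the forward pass of a single dense neural-network layer is one matrix multiply, exactly the kind of massively parallel arithmetic a GPU accelerates.

```python
import numpy as np

# One dense layer's forward pass is a matrix multiply plus a bias:
# many independent multiply-adds that a GPU can run in parallel.
# The shapes below are illustrative only.
batch, in_dim, out_dim = 4, 8, 3
x = np.random.rand(batch, in_dim)    # a batch of input activations
W = np.random.rand(in_dim, out_dim)  # the layer's weight matrix
b = np.zeros(out_dim)                # the layer's bias vector

y = x @ W + b  # output for the whole batch in one parallel operation
```

A decision tree, by contrast, walks one branch at a time, which is why such models often gain little from a GPU.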

Can I use your serverless API for my own ML models?

Yes! You can deploy custom models, and we will create a private endpoint for you.

Is there a limit to how many API calls you can handle?

No. Pipeline AI is built to support anything from a few requests to hundreds of requests per minute. If your usage is higher than the average customer's, we offer an enterprise plan; feel free to contact us for more information.

How do you handle cold-start?

Until bandwidth across PCIe into the GPU is instantaneous, all ML compute providers will face the cold-start problem: models must first be loaded before inference can be served. With our custom distribution software we have minimised its impact.
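A toy sketch of the underlying idea (the `load_model` function and its half-second delay are stand-ins, not our actual distribution software): keeping a loaded model cached in memory means only the first request pays the transfer cost.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model(model_id):
    """Stand-in for pulling model weights over PCIe into GPU memory."""
    time.sleep(0.5)  # simulated one-off transfer cost (the cold start)
    return {"id": model_id, "weights": "..."}

def infer(model_id, inputs):
    model = load_model(model_id)  # warm after the first call per model
    return f"{model['id']} processed {inputs}"

infer("resnet50", "img1.png")  # cold: pays the simulated load cost
infer("resnet50", "img2.png")  # warm: model served from cache
```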

Start your AI journey today

Join the 2,500+ customers already using Pipeline.