In this blog post we look at why and how serverless is the right choice for powering AI apps and products, and what we are doing at Mystic to deliver a fully hosted solution for running your deep learning models in production.
We believe that serverless computing is at an inflection point, and we have made it our mission to bring a straightforward, affordable serverless setup to machine learning practitioners.
A serverless GPU cloud system works like any cloud server: a computing engine that automatically responds to requests from clients. Instead of managing and running the server software in-house, a cloud system deploys and runs it externally, on the cloud provider's own servers, and clients interact with it remotely over the network.
The serverless compute model provides an efficient alternative to server-based platforms for applications that require compute in the cloud. Our customers use Pipeline tools and cloud infrastructure to deploy open-source LLMs, and to host and serve their own deep learning models.
In serverless computing, infrastructure is abstracted from application code, which can then run on any infrastructure. By providing powerful tools that abstract away the hardware, we let developers concentrate on building and deploying data pipelines. Compared to server-based computing, a serverless solution carries far less operational overhead: a user only needs to supply the application code.
Our serverless infrastructure lets us offer low latency and high throughput. This is due to our proprietary task-distribution system, which minimises under-utilisation of hardware and uses advanced cache management to reduce the cold-start problem. Until transfers across PCIe into the GPU are instantaneous, every ML compute provider will face the cold-start issue: a model must first be loaded before inference can be served. Our custom distribution software minimises this impact, and you will never be charged for model loading time.
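The caching idea behind cold-start mitigation can be illustrated with a minimal sketch. Our actual distribution system is proprietary; the class and loader below are simplified stand-ins, not our implementation. The point is that keeping recently used models resident lets repeat requests skip the expensive load entirely:

```python
from collections import OrderedDict

class ModelCache:
    """Minimal LRU sketch: keep recently used models loaded so
    repeat requests avoid the cold-start load."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader        # function: model_id -> loaded model
        self.cache = OrderedDict()  # insertion order tracks recency

    def get(self, model_id):
        if model_id in self.cache:
            # Cache hit: mark as most recently used, no load needed.
            self.cache.move_to_end(model_id)
            return self.cache[model_id]
        # Cache miss: this is the cold start -- load the model.
        model = self.loader(model_id)
        self.cache[model_id] = model
        if len(self.cache) > self.capacity:
            # Evict the least recently used model to free hardware.
            self.cache.popitem(last=False)
        return model
```

A real system would also account for GPU memory size per model and pre-warm models it predicts will be requested, but the recency-based eviction above is the core of the trade-off.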
With enterprise-level cloud providers you generally need to purchase a server or a block of servers that are constantly running and always available, even when unused. This is expensive and wasteful in terms of resources. Getting started also tends to be complex: you need to be an expert to ensure your account is optimised and to avoid paying for services that you are not actually using.
As a company that is focused on doing a few very difficult things really well, our mission is to offer a really straightforward, affordable and simple serverless setup for machine learning practitioners. — Paul Hetherington, CEO Mystic
Our Quickstart setup process makes it super easy to create a Pipeline account and deploy a model. Our API will respond instantly when it is called and will not incur costs when at rest. We only charge for the time it takes for the GPU to process your compute (see our pricing page for details).
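To give a flavour of what calling a deployed model looks like, here is a hedged sketch of an HTTP run request. The endpoint URL, payload fields, and model identifier below are placeholders for illustration, not the actual Pipeline API; see the Quickstart docs for the real interface.

```python
import json
import urllib.request

# Placeholder endpoint -- illustrative only, not the real Pipeline API.
API_URL = "https://api.example.com/v1/runs"

def build_run_request(pipeline_id, inputs, api_key):
    """Assemble a run request (shown as a pure step for clarity)."""
    body = json.dumps({"pipeline_id": pipeline_id, "inputs": inputs}).encode()
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers,
                                  method="POST")

def run_pipeline(pipeline_id, inputs, api_key):
    """Send the request; you are billed only for GPU compute time."""
    req = build_run_request(pipeline_id, inputs, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

Because the model is served serverlessly, nothing runs (and nothing is billed) between calls like this; the request itself triggers the compute.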
DEVELOPER: From standard language tasks to image processing, our model hub hosts a range of production-ready ML models - or users can upload their own.
ENTERPRISE: End to end infrastructure on cloud GPUs for inference or training of open source or bespoke models. Smart scaling and pay-per-use supports usage as it grows; no quotas, no minimums, no hourly rental.
PIPELINE STACK: Our optimised ML task distribution software empowers organizations to drive workplace innovation, reduce on-prem hardware costs and achieve their ML goals.
See our pricing page for details of our pay-per-second rates for compute.