Vertex AI Model Registry is a central repository where you can manage the lifecycle of your ML models [1]. You can import models from various sources, such as BigQuery ML, AutoML, or custom training, and organize them into versions with aliases [1]. You can also deploy models to endpoints, which are resources that provide a service URL for online prediction [2].
By importing the new model into the same Vertex AI Model Registry as a new version of the existing model, you can track the model versions and compare their performance metrics [1]. You can also use aliases, such as default or staging, to label each version's readiness for production [1].
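To make the version-and-alias idea concrete, here is a minimal in-memory sketch (plain Python, not the real Vertex AI SDK, which requires a GCP project). The class name and alias strings mirror the concepts above but are illustrative assumptions, not actual API surface:

```python
class ToyModelRegistry:
    """Toy sketch of version + alias tracking.

    Illustrative only: the real Vertex AI Model Registry is a managed
    service; 'default' and 'staging' here mirror its alias concept.
    """

    def __init__(self):
        self.versions = {}  # version id -> model metadata
        self.aliases = {}   # alias -> version id

    def import_version(self, version_id, metadata):
        # Each import becomes another version of the same registered model.
        self.versions[version_id] = metadata

    def set_alias(self, alias, version_id):
        # Aliases label a version's readiness for production.
        self.aliases[alias] = version_id

    def resolve(self, alias):
        # Callers reference an alias and always get the current version.
        return self.versions[self.aliases[alias]]


registry = ToyModelRegistry()
registry.import_version("v1", {"source": "AutoML"})
registry.import_version("v2", {"source": "custom training"})
registry.set_alias("default", "v1")  # v1 still serves production
registry.set_alias("staging", "v2")  # v2 under evaluation
```

Repointing the default alias to v2 later is then a one-line change, which is exactly what makes alias-based promotion low-risk.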
By deploying the new model to the same Vertex AI endpoint as the existing model, you can use traffic splitting to gradually shift production traffic from the old model to the new one [2]. Traffic splitting lets you specify the percentage of prediction requests that each deployed model on an endpoint should handle [2]. This minimizes the impact on existing and future model users and lets you monitor the new model's performance over time [2].
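A gradual rollout can be expressed as a sequence of traffic-split maps, each assigning percentages that sum to 100. The sketch below is plain Python, not the SDK; the deployed-model IDs and the step size are hypothetical values you would replace with your own, and each generated map corresponds to one update of the endpoint's traffic split:

```python
def rollout_schedule(old_id, new_id, step=20):
    """Return traffic splits that shift traffic from old_id to new_id.

    Each split maps a deployed-model ID (hypothetical here) to a
    percentage; the percentages always sum to 100, as an endpoint's
    traffic split must.
    """
    schedule = []
    share = 0
    while share < 100:
        share = min(share + step, 100)  # clamp the final step at 100%
        schedule.append({old_id: 100 - share, new_id: share})
    return schedule


# Example: shift traffic in 25% increments while watching metrics.
for split in rollout_schedule("model-v1", "model-v2", step=25):
    print(split)
```

Applying one split at a time, and pausing between steps to check error rates and latency, gives you a cheap rollback path: if the new model misbehaves, you reapply a split that routes 100% back to the old model.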
The other options are not suitable for this scenario: they either require creating a separate endpoint or a Cloud Run service, which adds deployment complexity and maintenance overhead, or they do not support traffic splitting, which would cause an abrupt change in prediction results.

References:
[1] Introduction to Vertex AI Model Registry | Google Cloud
[2] Deploy a model to an endpoint | Vertex AI | Google Cloud