Using Backup with ScyllaDB
This article will show you how to deploy Vald with ScyllaDB as a backup database using Helm and run it on your Kubernetes cluster.
Overview
This tutorial walks you through deploying Vald together with an external database for backup.
As one of its features, Vald can automatically back up its index to MySQL + Redis or Cassandra to enable disaster recovery.
In this tutorial, you will use ScyllaDB, deployed on a Persistent Volume, for backup.
You will also deploy more microservices than in Get Started.
If you haven’t completed Get Started yet, we recommend trying it first.
The following image shows the architecture used in this tutorial.

The 5 steps to Using Backup with ScyllaDB:
- Check and Satisfy the Requirements
- Prepare Kubernetes Cluster
- Deploy Vald on Kubernetes Cluster
- Run Example Code
- Cleanup
Requirements
- Kubernetes: v1.19 ~
- Go: v1.15 ~
- Helm: v3 ~
- libhdf5 (only required for this tutorial)
Helm is used to deploy Vald on your Kubernetes cluster, and HDF5 is used to decode the sample data file when running the example.
If Helm or HDF5 is not installed, please install them with the commands below.
Installation command for Helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
Installation command for HDF5
# yum
yum install -y hdf5-devel
# apt
apt-get install libhdf5-serial-dev
# homebrew
brew install hdf5
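Once the tools are installed, it can help to confirm that they meet the minimum versions listed in the requirements. The following is a minimal sketch and not part of the Vald repository: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# version_ge HAVE WANT: succeeds when version HAVE is greater than or equal to WANT.
# A leading "v" (as in "v3.5.0") is ignored. Assumes `sort -V` is available.
version_ge() {
  local have="${1#v}" want="${2#v}"
  # After a version sort, the smaller version comes first.
  # HAVE >= WANT exactly when WANT is the first line.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example usage (assumes go and helm are on PATH):
#   version_ge "$(go version | awk '{print $3}' | sed 's/^go//')" 1.15 || echo "Go is too old"
#   version_ge "$(helm version --template '{{.Version}}')" 3 || echo "Helm is too old"
```

The helper only compares version strings; how you obtain the Kubernetes server version depends on your kubectl release, so that check is left out here.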
Prepare the Kubernetes Cluster
Prepare Kubernetes cluster
To complete this tutorial, a Kubernetes cluster is required.
Vald runs on cloud services such as GKE or AWS. If you just want to try it out, k3d or kind are easy-to-use tools for running Kubernetes locally.
Prepare ScyllaDB and Kubernetes metrics-server
Deploy ScyllaDB as a backup database.
make k8s/external/scylla/deploy
This make command deploys a lightweight, Cassandra-compatible ScyllaDB using the ScyllaDB Operator.
If you're interested in what the make command does, its individual steps are as follows:
- Deploy cert-manager for ScyllaDB
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
kubectl wait -n cert-manager --for=condition=ready pod -l app=cert-manager --timeout=60s
kubectl wait -n cert-manager --for=condition=ready pod -l app=cainjector --timeout=60s
kubectl wait -n cert-manager --for=condition=ready pod -l app=webhook --timeout=60s
- Deploy ScyllaDB Operator
kubectl apply -f https://raw.githubusercontent.com/scylladb/scylla-operator/master/examples/common/operator.yaml
kubectl wait -n scylla-operator-system --for=condition=ready pod -l statefulset.kubernetes.io/pod-name=scylla-operator-controller-manager-0 --timeout=600s
- Deploy ScyllaDB
kubectl apply -f k8s/external/scylla/scyllacluster.yaml
kubectl wait -n scylla --for=condition=ready pod -l statefulset.kubernetes.io/pod-name=vald-scylla-cluster-dc0-rack0-0 --timeout=600s
kubectl -n scylla get pods
- Configure ScyllaDB
kubectl apply -f example/manifest/scylla
kubectl wait --for=condition=complete job/scylla-init --timeout=60s
For details on the ScyllaDB Operator, please refer to its official documentation.
Apply Kubernetes metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl wait -n kube-system --for=condition=ready pod -l k8s-app=metrics-server --timeout=600s
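The `kubectl wait` commands above may fail right away if the target pods have not been created yet when the command runs, because there is nothing to wait on. One way to make these steps more tolerant is to retry them. The sketch below is an assumption-based helper and is not provided by Vald or kubectl; `retry` is a hypothetical name.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (requires access to a cluster):
#   retry 10 5 kubectl wait -n kube-system --for=condition=ready pod \
#     -l k8s-app=metrics-server --timeout=60s
```

The attempt count and delay shown in the usage comment are arbitrary; tune them to how long your cluster takes to schedule pods.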
Deploy Vald on Kubernetes Cluster
This chapter shows you how to deploy Vald using Helm and run it on your Kubernetes cluster.
This chapter uses ScyllaDB as a backend data store for indexing and data backup.
If you want to learn about ScyllaDB, please refer to the official website.
Clone the Repository
To use the deployment YAML for deployment, let’s clone the vdaas/vald repository.
git clone https://github.com/vdaas/vald.git
cd vald
Confirm which Cluster to Deploy
kubectl cluster-info
Deploy Vald Using Helm
# add vald repo into helm repo
helm repo add vald https://vald.vdaas.org/charts
# deploy vald on your kubernetes cluster
helm install vald vald/vald --values example/helm/values-scylla.yaml
Verify
After deploying Vald, you can check the status of the Vald pods with the following command.
kubectl get pods
Example output
If the deployment is successful, all Vald components should be running.
NAME                                       READY   STATUS      RESTARTS   AGE
scylla-init-vhdp5                          0/1     Completed   0          7m12s
vald-agent-ngt-0                           1/1     Running     0          7m12s
vald-agent-ngt-1                           1/1     Running     0          7m12s
vald-agent-ngt-2                           1/1     Running     0          7m12s
vald-agent-ngt-3                           1/1     Running     0          7m12s
vald-agent-ngt-4                           1/1     Running     0          7m12s
vald-agent-ngt-5                           1/1     Running     0          7m12s
vald-backup-gateway-68c8b4ffd4-df8zp       1/1     Running     0          6m56s
vald-backup-gateway-68c8b4ffd4-dmwrd       1/1     Running     0          6m56s
vald-backup-gateway-68c8b4ffd4-nm8f7       1/1     Running     0          7m12s
vald-discoverer-7f9f697dbb-q44qh           1/1     Running     0          7m11s
vald-lb-gateway-6b7b9f6948-4z5md           1/1     Running     0          7m12s
vald-lb-gateway-6b7b9f6948-68g94           1/1     Running     0          6m56s
vald-lb-gateway-6b7b9f6948-cvspq           1/1     Running     0          6m56s
vald-manager-backup-5fb5f8dc7-h22sv        1/1     Running     0          7m12s
vald-manager-backup-5fb5f8dc7-ncrw4        1/1     Running     0          6m56s
vald-manager-backup-5fb5f8dc7-nzbkh        1/1     Running     0          6m56s
vald-manager-compressor-78bf64459f-27ckg   1/1     Running     0          6m56s
vald-manager-compressor-78bf64459f-9kl9b   1/1     Running     0          7m12s
vald-manager-compressor-78bf64459f-dkx24   1/1     Running     0          6m56s
vald-manager-index-74c7b5ddd6-jrnlw        1/1     Running     0          7m12s
vald-meta-747f757bbb-9v5xz                 1/1     Running     0          7m12s
vald-meta-747f757bbb-mpwqp                 1/1     Running     0          6m56s
vald-meta-gateway-8c5f55dd-8fsch           1/1     Running     0          6m56s
vald-meta-gateway-8c5f55dd-sdd5q           1/1     Running     0          7m12s
vald-meta-gateway-8c5f55dd-vfkn6           1/1     Running     0          6m56s
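With this many pods, it can be easier to list only the ones that are not healthy yet. The following is a minimal sketch under the assumption that `kubectl get pods` prints its default columns (NAME, READY, STATUS, ...) as in the example output; `not_ready_pods` is a hypothetical helper name, not a kubectl feature.

```shell
# not_ready_pods: reads `kubectl get pods` output on stdin and prints the name
# of every pod that is neither Completed nor Running with all containers ready.
not_ready_pods() {
  awk '
    NR == 1 { next }               # skip the header row
    $3 == "Completed" { next }     # finished jobs such as scylla-init are fine
    {
      split($2, ready, "/")        # READY looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }
  '
}

# Example usage (requires access to a cluster):
#   kubectl get pods | not_ready_pods
```

An empty result suggests that every pod is either running with all containers ready or has completed.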
Run Example Code
This chapter shows how to perform a search action in Vald with the fashion-mnist dataset.
Port Forward
First, port forwarding is required so that you can send requests from your local environment.
kubectl port-forward deployment/vald-meta-gateway 8081:8081
Download Dataset
Download fashion-mnist, which is used as the dataset for indexing and search queries.
# move to the working directory
cd example/client
# download fashion-mnist testing dataset
wget http://ann-benchmarks.com/fashion-mnist-784-euclidean.hdf5
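If the download is interrupted or the server returns an error page, the saved file will not be valid HDF5 and the example will fail when it tries to decode it. As a quick sanity check, you can look for the 8-byte HDF5 signature at the start of the file. This is a minimal sketch: `is_hdf5` is a hypothetical helper name, and it only checks for the signature at offset 0, so a file with a user block in front of the signature would not be recognized.

```shell
# is_hdf5 FILE: succeeds when FILE starts with the HDF5 signature
# (bytes 0x89 'H' 'D' 'F' CR LF 0x1A LF).
is_hdf5() {
  local sig
  # Dump the first 8 bytes as hex and strip the whitespace od adds.
  sig="$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')"
  [ "$sig" = "894844460d0a1a0a" ]
}

# Example usage:
#   is_hdf5 fashion-mnist-784-euclidean.hdf5 || echo "download looks broken"
```

The check does not validate the rest of the file; it only catches the common case where something other than an HDF5 file was saved.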
Run Example
We use example/client/main.go to run the example.
This example inserts and indexes 400 vectors from the fashion-mnist dataset into Vald via gRPC. Then, after waiting for indexing to finish, it sends 10 search requests. You will get the 10 nearest neighbor vectors for each search query.
Run the example code by executing the command below.
# run example
go run main.go
A detailed explanation of the example code is given in Get Started.
Cleanup
Remove the Vald pods by executing:
helm uninstall vald
Recommended Documents
Congratulations! You have completed this tutorial.
For more information, we recommend checking: