Using Backup with ScyllaDB

This article will show you how to deploy Vald with ScyllaDB as a backup database using Helm and run it on your Kubernetes cluster.

Overview

This tutorial walks you through deploying Vald together with an external database for backup. As one of its features, Vald can automatically back up its index to MySQL + Redis or Cassandra to enable disaster recovery.
In this tutorial, you will use ScyllaDB, deployed on a Persistent Volume, as the backup database. You will also deploy more microservices than in Get Started. If you haven’t completed Get Started yet, we recommend trying it first.

The following image shows the architecture used in this tutorial.

The 5 steps to Using Backup with ScyllaDB:

  1. Check and Satisfy the Requirements
  2. Prepare Kubernetes Cluster
  3. Deploy Vald on Kubernetes Cluster
  4. Run Example Code
  5. Cleanup

Requirements

  • Kubernetes: v1.19 or later
  • Go: v1.15 or later
  • Helm: v3 or later
  • libhdf5 (only required for this tutorial)

Helm is used to deploy Vald on your Kubernetes cluster, and HDF5 is used to decode the sample data file when running the example.
If Helm or HDF5 is not installed, install it with the commands below.

Installation command for Helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
Installation command for HDF5
# yum
yum install -y hdf5-devel

# apt
apt-get install libhdf5-serial-dev

# homebrew
brew install hdf5
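
Before moving on, it can help to confirm that the command-line tools used in this tutorial are actually on your PATH. The following is a minimal sketch, not part of the Vald repository: the function name missingTools and the tool list (kubectl, helm, go) are assumptions chosen for illustration, and the check does not cover the HDF5 library itself.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns every name in tools for which onPath reports false.
// onPath is injected so the check can be exercised without touching PATH.
// (Hypothetical helper, not part of Vald.)
func missingTools(tools []string, onPath func(string) bool) []string {
	var missing []string
	for _, name := range tools {
		if !onPath(name) {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// The tool list is an assumption based on the commands used in this tutorial.
	required := []string{"kubectl", "helm", "go"}

	// exec.LookPath searches the directories named by PATH for an executable.
	onPath := func(name string) bool {
		_, err := exec.LookPath(name)
		return err == nil
	}

	if missing := missingTools(required, onPath); len(missing) > 0 {
		fmt.Println("not found on PATH:", missing)
		return
	}
	fmt.Println("all required command-line tools were found on PATH")
}
```

Save it under any file name you like and run it with go run. It only reports whether each executable can be found, not whether its version satisfies the requirements above.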

Prepare the Kubernetes Cluster

  1. Prepare Kubernetes cluster

    A Kubernetes cluster is required to complete this tutorial.
    Vald runs on cloud services such as GKE and AWS. If you just want to try the tutorial, k3d and kind are easy-to-use tools for creating a local Kubernetes cluster.

  2. Prepare ScyllaDB and Kubernetes metrics-server

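    Deploy ScyllaDB as a backup database. Run the command from the root of the cloned vdaas/vald repository (cloning is shown in "Clone the Repository" below), because the make target and the relative manifest paths used in this step refer to files in that repository.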

    make k8s/external/scylla/deploy
    

    This make command deploys a lightweight, Cassandra-compatible ScyllaDB cluster using the ScyllaDB Operator.

    For more detail on this make command, see the individual commands below.
    1. Deploy cert-manager for ScyllaDB
    kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
    kubectl wait -n cert-manager --for=condition=ready pod -l app=cert-manager --timeout=60s
    kubectl wait -n cert-manager --for=condition=ready pod -l app=cainjector --timeout=60s
    kubectl wait -n cert-manager --for=condition=ready pod -l app=webhook --timeout=60s

    2. Deploy ScyllaDB Operator
    kubectl apply -f https://raw.githubusercontent.com/scylladb/scylla-operator/master/examples/common/operator.yaml
    kubectl wait -n scylla-operator-system --for=condition=ready pod -l statefulset.kubernetes.io/pod-name=scylla-operator-controller-manager-0 --timeout=600s

    3. Deploy ScyllaDB
    kubectl apply -f k8s/external/scylla/scyllacluster.yaml
    kubectl wait -n scylla --for=condition=ready pod -l statefulset.kubernetes.io/pod-name=vald-scylla-cluster-dc0-rack0-0 --timeout=600s
    kubectl -n scylla get pods

    4. Configure ScyllaDB

    kubectl apply -f example/manifest/scylla
    kubectl wait --for=condition=complete job/scylla-init --timeout=60s
    

    For more information about the ScyllaDB Operator, refer to the ScyllaDB Operator documentation.

  3. Apply Kubernetes metrics server

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    kubectl wait -n kube-system --for=condition=ready pod -l k8s-app=metrics-server --timeout=600s
    

Deploy Vald on Kubernetes Cluster

This chapter shows you how to deploy Vald with Helm and run it on your Kubernetes cluster.
This chapter uses ScyllaDB as the backend data store for index and data backup.
To learn more about ScyllaDB, refer to the official website.

  1. Clone the Repository

    To use the deployment YAML files, clone the vdaas/vald repository.

    git clone https://github.com/vdaas/vald.git
    cd vald
    
  2. Confirm which Cluster to Deploy

    kubectl cluster-info
    
  3. Deploy Vald Using Helm

    # add vald repo into helm repo
    helm repo add vald https://vald.vdaas.org/charts
    # deploy vald on your kubernetes cluster
    helm install vald vald/vald --values example/helm/values-scylla.yaml
    
  4. Verify

    After deploying Vald, you can check the status of the Vald pods with the following command.

    kubectl get pods
    
    Example output
    If the deployment is successful, all Vald components should be running.
    NAME                                       READY   STATUS      RESTARTS   AGE
    scylla-init-vhdp5                          0/1     Completed   0          7m12s
    vald-agent-ngt-0                           1/1     Running     0          7m12s
    vald-agent-ngt-1                           1/1     Running     0          7m12s
    vald-agent-ngt-2                           1/1     Running     0          7m12s
    vald-agent-ngt-3                           1/1     Running     0          7m12s
    vald-agent-ngt-4                           1/1     Running     0          7m12s
    vald-agent-ngt-5                           1/1     Running     0          7m12s
    vald-backup-gateway-68c8b4ffd4-df8zp       1/1     Running     0          6m56s
    vald-backup-gateway-68c8b4ffd4-dmwrd       1/1     Running     0          6m56s
    vald-backup-gateway-68c8b4ffd4-nm8f7       1/1     Running     0          7m12s
    vald-discoverer-7f9f697dbb-q44qh           1/1     Running     0          7m11s
    vald-lb-gateway-6b7b9f6948-4z5md           1/1     Running     0          7m12s
    vald-lb-gateway-6b7b9f6948-68g94           1/1     Running     0          6m56s
    vald-lb-gateway-6b7b9f6948-cvspq           1/1     Running     0          6m56s
    vald-manager-backup-5fb5f8dc7-h22sv        1/1     Running     0          7m12s
    vald-manager-backup-5fb5f8dc7-ncrw4        1/1     Running     0          6m56s
    vald-manager-backup-5fb5f8dc7-nzbkh        1/1     Running     0          6m56s
    vald-manager-compressor-78bf64459f-27ckg   1/1     Running     0          6m56s
    vald-manager-compressor-78bf64459f-9kl9b   1/1     Running     0          7m12s
    vald-manager-compressor-78bf64459f-dkx24   1/1     Running     0          6m56s
    vald-manager-index-74c7b5ddd6-jrnlw        1/1     Running     0          7m12s
    vald-meta-747f757bbb-9v5xz                 1/1     Running     0          7m12s
    vald-meta-747f757bbb-mpwqp                 1/1     Running     0          6m56s
    vald-meta-gateway-8c5f55dd-8fsch           1/1     Running     0          6m56s
    vald-meta-gateway-8c5f55dd-sdd5q           1/1     Running     0          7m12s
    vald-meta-gateway-8c5f55dd-vfkn6           1/1     Running     0          6m56s
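
With this many pods, checking the table by eye is error-prone. The following is a minimal sketch, not part of Vald, that reads the text printed by kubectl get pods and lists every pod that is not ready. The function name unhealthyPods is hypothetical, and the sketch assumes the default column layout shown above (NAME, READY, STATUS as the first three columns).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// unhealthyPods inspects lines printed by `kubectl get pods` and returns the
// names of pods that are neither Completed nor Running with all containers
// ready. It assumes NAME, READY and STATUS are the first three columns.
// (Hypothetical helper, not part of Vald.)
func unhealthyPods(lines []string) []string {
	var bad []string
	for _, line := range lines {
		fields := strings.Fields(line)
		// Skip blank lines and the header row.
		if len(fields) < 3 || fields[0] == "NAME" {
			continue
		}
		name, ready, status := fields[0], fields[1], fields[2]
		switch status {
		case "Completed":
			// Finished jobs such as scylla-init report 0/1 and are fine.
		case "Running":
			// READY has the form "ready/total"; both halves must match.
			parts := strings.SplitN(ready, "/", 2)
			if len(parts) != 2 || parts[0] != parts[1] {
				bad = append(bad, name)
			}
		default:
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	// Read the pod table from standard input.
	var lines []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
		os.Exit(1)
	}
	if bad := unhealthyPods(lines); len(bad) > 0 {
		fmt.Println("pods not ready:", strings.Join(bad, ", "))
		os.Exit(1)
	}
	fmt.Println("all listed pods are Running or Completed")
}
```

Assuming the file is saved as checkpods.go (an arbitrary name), it can be used as kubectl get pods | go run checkpods.go. A non-zero exit status means at least one pod still needs attention.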
    

Run Example Code

This chapter shows how to perform a search in Vald using the Fashion-MNIST dataset.

  1. Port Forward

    First, set up port forwarding so that you can send requests from your local environment.

    kubectl port-forward deployment/vald-meta-gateway 8081:8081
    
  2. Download Dataset

    Download Fashion-MNIST, which is used as the dataset for indexing and search queries.

    # move to the working directory
    cd example/client
    
    # download fashion-mnist testing dataset
    wget http://ann-benchmarks.com/fashion-mnist-784-euclidean.hdf5
    
  3. Run Example

    We use example/client/main.go to run the example.
    This example inserts 400 vectors from the Fashion-MNIST dataset into Vald via gRPC and indexes them. After waiting for indexing to finish, it sends 10 search requests, and each search query returns its 10 nearest neighbor vectors.
    Run the example code by executing the command below.

    # run example
    go run main.go
    

    For a detailed explanation of the example code, see Get Started.
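
To make "the 10 nearest neighbor vectors" concrete, the following sketch shows what a nearest neighbor query computes, using exact brute-force search with Euclidean distance over a few toy 2-dimensional vectors. It is an illustration only: it does not use the Vald client or gRPC, Vald itself answers queries with an approximate index rather than a full scan, and the names neighbor, euclidean, and nearest are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// neighbor pairs a stored vector's ID with its distance to the query.
type neighbor struct {
	ID       string
	Distance float64
}

// euclidean returns the L2 distance between two vectors of equal length.
func euclidean(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns up to k stored vectors closest to query, nearest first.
// It scans every vector, so it is exact but slow for large datasets.
func nearest(ids []string, vectors [][]float32, query []float32, k int) []neighbor {
	result := make([]neighbor, 0, len(vectors))
	for i, v := range vectors {
		result = append(result, neighbor{ID: ids[i], Distance: euclidean(v, query)})
	}
	sort.Slice(result, func(i, j int) bool {
		return result[i].Distance < result[j].Distance
	})
	if k < 0 {
		k = 0
	}
	if k < len(result) {
		result = result[:k]
	}
	return result
}

func main() {
	// Toy data standing in for the 784-dimensional Fashion-MNIST vectors.
	ids := []string{"a", "b", "c", "d"}
	vectors := [][]float32{{0, 0}, {3, 4}, {1, 0}, {10, 10}}

	for _, n := range nearest(ids, vectors, []float32{0, 0}, 2) {
		fmt.Printf("%s %.3f\n", n.ID, n.Distance)
	}
}
```

In the tutorial's example, each of the 10 queries asks for the 10 closest vectors among the 400 that were inserted, which corresponds to k = 10 in this sketch.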

Cleanup

Remove the Vald pods by executing:

helm uninstall vald

Congratulations! You have completed this tutorial!

For more information, we recommend checking: