First steps
Once you have followed the steps in the Installation section to install the operator and its dependencies, deploy an HDFS cluster and the services it depends on. Afterward, verify that it works by creating, checking and deleting a test file in HDFS.
Setup
ZooKeeper
To deploy a ZooKeeper cluster, create a file called zk.yaml:
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.9.3
  servers:
    roleGroups:
      default:
        replicas: 1
Define a ZNode that is used by the HDFS cluster to reference ZooKeeper.
Create another file called znode.yaml:
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode
spec:
  clusterRef:
    name: simple-zk
Apply both of these files:
kubectl apply -f zk.yaml
kubectl apply -f znode.yaml
The state of the ZooKeeper cluster can be tracked with kubectl:
kubectl rollout status --watch --timeout=5m statefulset/simple-zk-server-default
HDFS
An HDFS cluster has three components: the namenode, the datanode and the journalnode.
Create a file named hdfs.yaml that defines two namenodes, one datanode and one journalnode:
---
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  image:
    productVersion: 3.4.1
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode
    dfsReplication: 1
  nameNodes:
    config:
      listenerClass: external-stable # This exposes your Stacklet outside of Kubernetes. Remove this configuration if this is not desired
    roleGroups:
      default:
        replicas: 2
  dataNodes:
    config:
      listenerClass: external-unstable # This exposes your Stacklet outside of Kubernetes. Remove this configuration if this is not desired
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1
Where:
- metadata.name contains the name of the HDFS cluster
- the HDFS version in the Docker image provided by Stackable must be set in spec.image.productVersion
Note: The version you need to specify for spec.image.productVersion is the desired version of Apache HDFS.
You can optionally set spec.image.stackableVersion to a specific release such as 24.7.0, but it is recommended to leave it out and use the default provided by the operator.
For a list of available versions, check the Stackable image registry. Information on how to browse the registry can be found here.
It should generally be safe to use the latest image version that is available.
Create the actual HDFS cluster by applying the file:
kubectl apply -f hdfs.yaml
Track the progress with kubectl as this step may take a few minutes:
kubectl rollout status --watch --timeout=10m statefulset/simple-hdfs-datanode-default
kubectl rollout status --watch --timeout=10m statefulset/simple-hdfs-namenode-default
kubectl rollout status --watch --timeout=10m statefulset/simple-hdfs-journalnode-default
Verify that it works
To test the cluster operation, create a new file, check its status and then delete it.
You can execute these actions from within a helper Pod.
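The WebHDFS requests used in the rest of this section all target the same namenode address, and the upload needs the Location value returned by the first PUT. As an optional convenience, the following sketch defines two small shell helpers for your local shell. The names webhdfs_url and extract_location are hypothetical and not part of any Stackable tooling. The sketch also assumes that the namenode answers a noredirect=true request with a compact JSON body of the form {"Location":"..."}.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for this guide; not part of the Stackable tooling.

# Namenode HTTP endpoint used in this guide (cluster simple-hdfs, namespace default).
NAMENODE="simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870"

# Print a WebHDFS URL for an HDFS path ($1) and a query string ($2).
webhdfs_url() {
  local path="$1" query="$2"
  printf 'http://%s/webhdfs/v1%s?%s\n' "$NAMENODE" "$path" "$query"
}

# Read JSON on stdin and print the value of its "Location" field.
# Assumption: the URL contains no escaped quotes and sits on one line.
extract_location() {
  sed -n 's/.*"Location"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example usage (requires the webhdfs-0 helper Pod created below):
#   location=$(kubectl exec -n default webhdfs-0 -- curl -s -XPUT -T /tmp/testdata.txt \
#     "$(webhdfs_url /testdata.txt 'user.name=stackable&op=CREATE&noredirect=true')" | extract_location)
#   kubectl exec -n default webhdfs-0 -- curl -s -XPUT -T /tmp/testdata.txt "$location"
```

The helpers only build and parse strings, so they can be tried out without a running cluster. If the response format differs in your HDFS version, copy the location by hand as described in the steps that follow.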
Create a file called webhdfs.yaml:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webhdfs
  labels:
    app: webhdfs
spec:
  replicas: 1
  serviceName: webhdfs-svc
  selector:
    matchLabels:
      app: webhdfs
  template:
    metadata:
      labels:
        app: webhdfs
    spec:
      containers:
        - name: webhdfs
          image: oci.stackable.tech/sdp/testing-tools:0.2.0-stackable0.0.0-dev
          stdin: true
          tty: true
Apply it and monitor its progress:
kubectl apply -f webhdfs.yaml
kubectl rollout status --watch --timeout=5m statefulset/webhdfs
To begin with, the cluster should be empty. Verify this by listing all resources at the root directory (which should return an empty array):
kubectl exec -n default webhdfs-0 -- curl -s -XGET "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/?op=LISTSTATUS"
Creating a file in HDFS using the WebHDFS API requires a two-step PUT (the reason for having a two-step create/append is to prevent clients from sending out data before the redirect).
First, create a file with some text in it called testdata.txt and copy it to the tmp directory on the helper pod:
kubectl cp -n default ./testdata.txt webhdfs-0:/tmp
Then use curl to issue a PUT command:
kubectl exec -n default webhdfs-0 -- \
  curl -s -XPUT -T /tmp/testdata.txt "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/testdata.txt?user.name=stackable&op=CREATE&noredirect=true"
This returns a location that looks similar to this:
http://simple-hdfs-datanode-default-0.simple-hdfs-datanode-default.default.svc.cluster.local:9864/webhdfs/v1/testdata.txt?op=CREATE&user.name=stackable&namenoderpcaddress=simple-hdfs&createflag=&createparent=true&overwrite=false
You can assign this to a shell variable, e.g. $location, or copy and paste it into the URL, and then issue a second PUT like this:
kubectl exec -n default webhdfs-0 -- curl -s -XPUT -T /tmp/testdata.txt "$location"
Checking the status again with:
kubectl exec -n default webhdfs-0 -- curl -s -XGET "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/?op=LISTSTATUS"
now displays some metadata about the file that was created in the HDFS cluster:
{
    "FileStatuses": {
        "FileStatus": [
            {
                "accessTime": 1660821734999,
                "blockSize": 134217728,
                "childrenNum": 0,
                "fileId": 16396,
                "group": "supergroup",
                "length": 597,
                "modificationTime": 1660821735602,
                "owner": "stackable",
                "pathSuffix": "testdata.txt",
                "permission": "644",
                "replication": 3,
                "storagePolicy": 0,
                "type": "FILE"
            }
        ]
    }
}
To clean up, the file can be deleted like this:
kubectl exec -n default webhdfs-0 -- curl -s -XDELETE "http://simple-hdfs-namenode-default-0.simple-hdfs-namenode-default.default.svc.cluster.local:9870/webhdfs/v1/testdata.txt?user.name=stackable&op=DELETE"
What's next
Look at the Usage guide to find out more about configuring your HDFS cluster.