This is the minimum list of hardening and other steps that must be performed to secure the Linux servers hosting the DIVOC platform. This guide assumes a bare metal installation; the concepts remain the same on commercial cloud setups, but the methodology may differ.
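Identifying ports that are open to the internet is a critical first step. The following command lists TCP connections and listening sockets together with the owning process (run it as root to see every process; netstat is part of the net-tools package, and ss -antp gives the same information on systems without it):
sudo netstat -antp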
Once you have identified the open ports, stop or purge the applications that keep unnecessary ports open.
The only acceptable open ports are 22 (SSH) and 443 (HTTPS). Access to any other port from outside the Kubernetes network should be blocked.
All ingress should be routed through port 443.
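To compare the netstat output against the 22/443 rule, a small filter helps. This is a minimal sketch, not part of the DIVOC installer; it assumes the Linux net-tools output format of netstat -antp, where column 4 is the local address and column 6 the socket state, and prints every listening port that is not in the allow-list.

```shell
#!/bin/sh
# Sketch: list listening TCP ports that are not in the allow-list.
# Assumes input in the format printed by `netstat -antp` (Linux net-tools).
ALLOWED_PORTS="22 443"

unexpected_ports() {
  # Keep only LISTEN sockets, take the port after the last ':' of the local
  # address (works for 0.0.0.0:22 and :::22 alike), then de-duplicate.
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" { n = split($4, a, ":"); print a[n] }' |
    sort -un |
    while read -r port; do
      case " $ALLOWED_PORTS " in
        *" $port "*) ;;        # allowed port, ignore
        *) echo "$port" ;;     # report anything else
      esac
    done
}

# Usage on a live host (root is needed to see every owning process):
#   sudo netstat -antp | unexpected_ports
```

Header lines and established connections are skipped by the awk filter, so only ports that still need to be closed are printed.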
SSH is secure by design, but the service still needs to be hardened. If SSH access is not required, disabling it removes the risk entirely. If it is required, change its default configuration: disable password-based authentication and allow only key-based authentication. The steps for creating a sudo user with a public/private key pair are as follows:
Create a non-root sudo user
adduser <user>
Add the user to the sudo group
usermod -aG sudo <user>
Log in to the machine as the user
su - <user>
Create the SSH directory with the appropriate permissions
mkdir -p $HOME/.ssh
chmod 0700 $HOME/.ssh
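Generate an Ed25519 key pair (preferably on the machine you will connect from, so that the private key never leaves it):
ssh-keygen -t ed25519 -C "My key for DIVOC server"
The public key is written to:
$HOME/.ssh/id_ed25519.pub
Only the public key is ever shared. For each user expected to connect, append their public key to $HOME/.ssh/authorized_keys on the server, and keep the private key (id_ed25519) secret.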
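For key-based login to work, the public key of each connecting user must end up in ~/.ssh/authorized_keys of the account on the server, and sshd (with its default StrictModes setting) refuses keys whose files are writable by others. A minimal sketch of that step follows; the function name add_authorized_key and its arguments are illustrative, not part of DIVOC.

```shell
#!/bin/sh
# Sketch: append a public key to a user's authorized_keys with the
# permissions sshd expects (700 on ~/.ssh, 600 on authorized_keys).
# add_authorized_key is a hypothetical helper, not part of the installer.
add_authorized_key() {
  home_dir=$1   # home directory of the account that will log in
  pubkey=$2     # one line from the connecting user's *.pub file
  mkdir -p "$home_dir/.ssh"
  chmod 0700 "$home_dir/.ssh"
  touch "$home_dir/.ssh/authorized_keys"
  # Skip the append if the exact same key line is already present.
  grep -qxF -- "$pubkey" "$home_dir/.ssh/authorized_keys" ||
    printf '%s\n' "$pubkey" >> "$home_dir/.ssh/authorized_keys"
  chmod 0600 "$home_dir/.ssh/authorized_keys"
}

# Usage, run as the target user on the server:
#   add_authorized_key "$HOME" "$(cat /path/to/colleague_id_ed25519.pub)"
```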
You can modify your SSH configuration to be more secure by performing the following changes to the configuration file:
sudo nano /etc/ssh/sshd_config
Make sure that root cannot log in remotely through SSH
PermitRootLogin no
Allow only specific users to log in (separate multiple users with spaces)
AllowUsers [username]
Enable public key-based authentication and disable password-based authentication
PubkeyAuthentication yes
PasswordAuthentication no
The following additional option must also be set in the sshd_config file, to limit authentication attempts per connection:
MaxAuthTries 5
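Finally, set the permissions on the sshd_config file so that only root can read or change its contents:
sudo chown root:root /etc/ssh/sshd_config
sudo chmod 600 /etc/ssh/sshd_config
Validate the configuration and restart SSH so the changes take effect (the service is called ssh on Ubuntu and sshd on some other distributions). Keep your current session open until a new key-based login succeeds:
sudo sshd -t
sudo systemctl restart ssh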
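After editing sshd_config, it is worth confirming what sshd will actually enforce. The sketch below is an illustrative helper, not part of DIVOC: it reads the output of sshd -T (which prints the effective configuration with lower-case keywords, including settings pulled in through Include files) and reports any of the directives above that are not in effect. AllowUsers is not checked because its value differs per server.

```shell
#!/bin/sh
# Sketch: compare the effective sshd settings (from `sshd -T`) with the
# hardening directives above. Prints each mismatch; returns 1 if any exist.
check_sshd_effective() {
  config=$(cat)   # `sshd -T` output read from stdin
  rc=0
  for expected in "permitrootlogin no" "pubkeyauthentication yes" \
                  "passwordauthentication no" "maxauthtries 5"; do
    # sshd -T prints one "keyword value" pair per line, so match whole lines.
    if ! printf '%s\n' "$config" | grep -qxF -- "$expected"; then
      echo "not set: $expected"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the server:
#   sudo sshd -T | check_sshd_effective
```

Checking the effective configuration rather than the file itself catches cases where a drop-in file under sshd_config.d sets a conflicting value.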
All content on this page by eGov Foundation is licensed under a Creative Commons Attribution 4.0 International License.
To recover from broken nodes in the control plane, use the "recover-control-plane.yml" playbook. Examples of what "broken" means in this context:
One or more bare metal node(s) suffer from unrecoverable hardware failure.
One or more node(s) fail during patching or upgrading.
Etcd database corruption.
Other node-related failures that leave your control plane degraded or nonfunctional.
Note: You need at least one functional control plane node to be able to recover using this method. If all control plane nodes go down, this method cannot recover the cluster, and you will have to reinstall Kubernetes. Even with the control plane down, the application typically keeps running as deployed; however, Kubernetes functions such as scaling, creating new pods and upgrading deployments will not work until the control plane is restored.
The recovery steps are as follows:
Back up what you can.
Provision new nodes to replace the broken ones.
Place the surviving nodes of the control plane first in the "etcd" and "kube_control_plane" groups.
Add the new nodes below the surviving control plane nodes in the "etcd" and "kube_control_plane" groups.
Move any broken etcd nodes into the "broken_etcd" group, and make sure the "etcd_member_name" variable is set.
Move any broken control plane nodes into the "broken_kube_control_plane" group.
Run the playbook with --limit etcd,kube_control_plane, and increase the number of etcd retries by setting -e etcd_retries=10 or something even larger. The number of retries required is difficult to predict.
Once you are done, you should have a fully working control plane again.
The playbook attempts to figure out if the etcd quorum is intact. If the quorum is lost, it will attempt to take a snapshot from the first node in the "etcd" group and restore from that.
To restore from an alternate snapshot, set the path to that snapshot in the "etcd_snapshot" variable: -e etcd_snapshot=/tmp/etcd_snapshot.
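The recovery run can be wrapped in a small helper so the flags are not mistyped under pressure. This is a minimal sketch, assuming Kubespray is checked out in the current directory and the inventory lives at inventory/divoc/hosts.yml (a hypothetical path); -b makes Ansible become root, as Kubespray playbooks require. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: run Kubespray's recover-control-plane.yml with the flags above.
# INVENTORY is a hypothetical path; ETCD_RETRIES defaults to 10 as suggested.
INVENTORY=${INVENTORY:-inventory/divoc/hosts.yml}
ETCD_RETRIES=${ETCD_RETRIES:-10}

recover_control_plane() {
  # Extra arguments (for example -e etcd_snapshot=...) are appended as-is.
  set -- ansible-playbook -i "$INVENTORY" -b recover-control-plane.yml \
    --limit etcd,kube_control_plane -e "etcd_retries=$ETCD_RETRIES" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"   # print the command only
  else
    "$@"
  fi
}

# Usage, restoring from an alternate etcd snapshot:
#   recover_control_plane -e etcd_snapshot=/tmp/etcd_snapshot
```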
Currently, you cannot remove the first node in your kube_control_plane and etcd-master list. If you still want to remove this node, you have to do the following:
1. Change the order of your control plane list by moving the first entry to any other position. For example, to remove node-1 from the following inventory, list it after node-2 and node-3 in the kube_control_plane and etcd groups:
children:
  kube_control_plane:
    hosts:
      node-1:
      node-2:
      node-3:
  kube_node:
    hosts:
      node-1:
      node-2:
      node-3:
  etcd:
    hosts:
      node-1:
      node-2:
      node-3:
2. Run upgrade-cluster.yml or cluster.yml. After this, you can proceed with the removal.
To add or replace a worker node:
Add the new node to the inventory.
Run scale.yml. You can use --limit=NODE_NAME to keep Kubespray from disturbing other nodes in the cluster. Before using --limit, run the facts.yml playbook without the limit to refresh the facts cache for all nodes.
Remove the old node with remove-node.yml. With the old node still in the inventory, run remove-node.yml, passing -e node=NODE_NAME to limit the execution to the node being removed. If the node you want to remove is not online, add reset_nodes=false and allow_ungraceful_removal=true to your extra-vars: -e node=NODE_NAME -e reset_nodes=false -e allow_ungraceful_removal=true. Use these flags also when removing other types of nodes, such as control plane or etcd nodes.
Remove the node from the inventory.
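The worker replacement steps can be scripted as one sequence. This is a minimal sketch under stated assumptions: INVENTORY and the node names in the usage line are hypothetical, the playbook names are the ones given above, and editing the inventory before and after remains a manual step. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/bin/sh
# Sketch of the worker replacement steps above. INVENTORY and the node names
# in the usage line are hypothetical. Set DRY_RUN=1 to print the commands.
INVENTORY=${INVENTORY:-inventory/divoc/hosts.yml}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

replace_worker() {
  new_node=$1
  old_node=$2
  # 1. Refresh the facts cache for all nodes (no --limit here).
  run ansible-playbook -i "$INVENTORY" -b facts.yml || return
  # 2. Add only the new worker so the rest of the cluster is not disturbed.
  run ansible-playbook -i "$INVENTORY" -b scale.yml --limit="$new_node" || return
  # 3. Remove the old worker while it is still listed in the inventory.
  run ansible-playbook -i "$INVENTORY" -b remove-node.yml -e node="$old_node"
}

# Usage (afterwards, delete the old node from the inventory by hand):
#   replace_worker node-4 node-2
```

Each step stops the sequence on failure, so the old worker is never removed if adding the new one did not succeed.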
To add or replace a control plane node:
1. Append the new host to the inventory and run cluster.yml. You cannot use scale.yml for this.
2. On all hosts, restart the nginx-proxy pod. This pod is a local proxy for the API server. Kubespray updates its static configuration, but the pod must be restarted to reload it:
docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print $1}' | xargs docker restart
3. With the old node still in the inventory, run remove-node.yml. You need to pass -e node=NODE_NAME to the playbook to limit the execution to the node being removed. If the node you want to remove is not online, you should add reset_nodes=false and allow_ungraceful_removal=true to your extra-vars.
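The control plane replacement can be sketched the same way, running the nginx-proxy restart on every host through an Ansible ad-hoc shell task instead of logging in to each machine. This is an illustrative sketch: INVENTORY and the node name are hypothetical, and it adds -r (a GNU xargs option) so docker restart is skipped on hosts where no nginx-proxy container runs. Setting DRY_RUN=1 prints the commands.

```shell
#!/bin/sh
# Sketch of the control plane replacement steps above. INVENTORY and the node
# name in the usage line are hypothetical. Set DRY_RUN=1 to print commands.
INVENTORY=${INVENTORY:-inventory/divoc/hosts.yml}
# The nginx-proxy restart command from step 2, executed on every host.
RESTART_NGINX_PROXY="docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print \$1}' | xargs -r docker restart"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

replace_control_plane() {
  old_node=$1
  offline=${2:-no}   # pass "offline" if the old node cannot be reached
  run ansible-playbook -i "$INVENTORY" -b cluster.yml || return
  # Ad-hoc shell task on all inventory hosts to reload the proxy config.
  run ansible all -i "$INVENTORY" -b -m shell -a "$RESTART_NGINX_PROXY" || return
  if [ "$offline" = offline ]; then
    run ansible-playbook -i "$INVENTORY" -b remove-node.yml -e node="$old_node" \
      -e reset_nodes=false -e allow_ungraceful_removal=true
  else
    run ansible-playbook -i "$INVENTORY" -b remove-node.yml -e node="$old_node"
  fi
}

# Usage, after appending the new host to the inventory:
#   replace_control_plane node-1 offline
```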
Use a Debian-based Linux distribution (preferably Ubuntu)
Experience in running simple shell and bash commands
Debian-based OS (Ubuntu)
sshpass
Ansible
Git
kubectl
List of servers and ability to access them using key-based authentication
A server map listing which software runs on which server
Access to the DIVOC installer repository
Access to the implementation-specific DIVOC code
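Before running the installer, a quick check of the control machine against the list above can save a failed run. This is a minimal sketch, assuming the binaries are named sshpass, ansible, git and kubectl and that the OS is identified through /etc/os-release (Ubuntu sets ID=ubuntu and ID_LIKE=debian there); both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the prerequisites listed above on the control machine.

# Print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed if the os-release file describes Debian or a Debian derivative.
is_debian_based() {
  os_release=${1:-/etc/os-release}
  # Source in a subshell so ID / ID_LIKE do not leak into the caller.
  ( . "$os_release" && case " $ID $ID_LIKE " in *" debian "*) exit 0 ;; *) exit 1 ;; esac )
}

# Usage:
#   is_debian_based || echo "not a Debian-based system"
#   missing_tools sshpass ansible git kubectl
```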
The sizing and count of the servers can change based on the load requirements. However, for a truly HA setup, the following are the minimum requirements:
3 servers for HA setup: one master and 2 replicas. The etcd cluster can also be set up on the same servers.
6 servers: 3 for the control plane (master nodes can be relatively small instances) and 3 worker nodes for deploying the application.
3 servers running both ZooKeeper and Kafka (ideally, ZooKeeper and Kafka should be installed on separate servers, but installing both on the same machine is acceptable).
3 servers
3 servers
1 server
There are three scripts that need to be run to complete the DIVOC installation process:
install.sh: installs the prerequisites and sets up the various server clusters detailed above.
build.sh: builds the Docker images and pushes them to the appropriate registry.
deploy.sh: deploys the images from the registry into the Kubernetes cluster.
Clone the repository available at https://github.com/egovernments/divoc-installer.
Create an inventory file from the sample inventory file located at https://github.com/egovernments/divoc-installer/blob/master/inventory.example.ini.
Add the inventory details as per the comments present in the file.
Run install.sh from the divoc-installer directory with elevated privileges (you can also use nohup to run it in the background):
sudo sh install.sh -i <path to inventory file>
It will install the dependencies like python3, ansible, etc.
It will install the applications and configure them on the servers mentioned in the inventory file.
Run the build.sh file with elevated privileges.
sudo sh build.sh -d <IP Address of Docker Registry> -r <GIT REPO URL>
The Docker registry defaults to Docker Hub.
The Git repository defaults to the master branch of https://github.com/egovernments/DIVOC.git.
The sample default Kubernetes deployment files are available at https://github.com/egovernments/divoc-installer/tree/master/kube-deployment-config-example.
Make a copy of the folder and update the configuration files inside it as described below. It is recommended that you maintain your own configuration in a separate GitHub repository for version control and backup (you need only the example folder, not the full repository).
a. Within the divoc-installer directory, open the divoc-config.yaml file in the deployment configuration directory and set the following values:
- DB_HOST
- DB_USER
- DB_PASS
- DB_PORT
- KAFKA_BOOTSTRAP_SERVERS
- REDIS_URL
- CLICKHOUSE_URL
- AUTH_PRIVATE_KEY
- AUTH_PUBLIC_KEY
- CERTIFICATE_NAMESPACE
- CERTIFICATE_NAMESPACE_V2
- CERTIFICATE_CONTROLLER_ID
- CERTIFICATE_PUBKEY_ID
- CERTIFICATE_DID
- CERTIFICATE_ISSUER
- CERTIFICATE_BASE_URL
- CERTIFICATE_FEEDBACK_BASE_URL
- CERTIFICATE_INFO_BASE_URL
- CERTIFICATE_PUBLIC_KEY
- CERTIFICATE_PRIVATE_KEY
- CITIZEN_PORTAL_URL
b. Modify registry-deployment.yaml to change the following:
- connectionInfo_password
- connectionInfo_uri
- connectionInfo_username
- elastic_search_enabled
- registry_base_apis_enable
- taskExecutor_index_queueCapacity
- auditTaskExecutor_queueCapacity
- Signature_enabled
c. Modify keycloak-deployment.yaml to add the following information:
- DB_ADDR
- DB_DATABASE
- DB_PASSWORD
- DB_PORT
- DB_USER
- DB_VENDOR
- KEYCLOAK_USER
- KEYCLOAK_PASSWORD
- ENABLE_OTP_MESSAGE
- KAFKA_BOOTSTRAP_SERVERS
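Before running the deploy script, a quick scan for settings left blank in divoc-config.yaml can catch an incomplete edit. This is a minimal sketch; it assumes each setting sits on its own line as KEY: value (for example in a ConfigMap data section), which you should verify against your copy of the file. The registry and Keycloak deployment files are not covered, and the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: report settings that are missing or empty in divoc-config.yaml.
# Assumes single-line "KEY: value" entries; a block value such as "KEY: |"
# only counts as present, its content is not validated.
empty_or_missing() {
  file=$1
  shift
  for key in "$@"; do
    # Text after the first "KEY:" line, with spaces and quotes removed.
    value=$(sed -n "s/^[[:space:]]*$key:[[:space:]]*//p" "$file" | head -n 1 | tr -d " \"'")
    [ -n "$value" ] || echo "$key"
  done
}

# Usage:
#   empty_or_missing divoc-config.yaml DB_HOST DB_USER DB_PASS DB_PORT \
#     KAFKA_BOOTSTRAP_SERVERS REDIS_URL CLICKHOUSE_URL
```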
3. Run the deploy script to deploy the application on Kubernetes.
sudo sh deploy.sh -i <path to inventory file> -p <Directory containing Kubernetes Config files> -d <Private Docker Registry IP> -k <Kube Master Node IP> -s <Key file to access Kube Master>
The indexes for efficient querying of the database tables are not created automatically and must be created manually: execute registry_index.sql, which is present in the DIVOC codebase, against the database. Restart the registry service afterwards for the change to take effect.
Note: Database tables are created only when the first API request is received, so run the index script after the registry has served its first request.
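The index creation and registry restart can be combined so the restart only happens when every statement succeeded. This is a sketch, not the DIVOC tooling: the database host, user and name, the Kubernetes namespace and the deployment name "registry" are all hypothetical placeholders to replace with your own values. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: apply registry_index.sql and restart the registry afterwards.
# All defaults below (database host/user/name, namespace, deployment name)
# are hypothetical placeholders; take the real values from your deployment.
DB_HOST=${DB_HOST:-192.168.0.100}
DB_USER=${DB_USER:-postgres}
DB_NAME=${DB_NAME:-registry}
NAMESPACE=${NAMESPACE:-default}
REGISTRY_DEPLOYMENT=${REGISTRY_DEPLOYMENT:-registry}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

create_registry_indexes() {
  sql_file=$1   # registry_index.sql from the DIVOC codebase
  # ON_ERROR_STOP=1 makes psql stop and exit non-zero on the first error,
  # so the restart only happens if every index was created.
  run psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 -f "$sql_file" &&
    run kubectl -n "$NAMESPACE" rollout restart "deployment/$REGISTRY_DEPLOYMENT"
}

# Usage:
#   create_registry_indexes /path/to/DIVOC/registry_index.sql
```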
This is the generic solution for backup and restore. Depending on the backup strategy used, the tools might change.
The Ansible script will automatically configure pg_basebackup, pgbackrest, wal-g and other recovery tools. For the sake of simplicity, we can use pg_dump and pg_restore.
The following command takes a backup in pg_dump's compressed custom format (-F c). Despite the .tar file name, the output is not a tar archive; pg_restore reads it directly:
pg_dump -h 192.168.0.100 -U postgres -F c remote_db1 > remote_db1.tar
This can be scheduled with cron, for example to run a backup script every day at midnight:
0 0 * * * <path to backup script>
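The cron entry calls a backup script whose contents are left open. A minimal sketch of one follows, assuming a local BACKUP_DIR, a retention count KEEP and the example host from the pg_dump command; all three are assumptions. In cron there is no terminal, so pg_dump must read the password from ~/.pgpass (or PGPASSWORD).

```shell
#!/bin/sh
# Sketch of a backup script for the cron entry above. BACKUP_DIR, KEEP and the
# host/user defaults are assumptions; the host is the example one used above.
BACKUP_DIR=${BACKUP_DIR:-/var/backups/postgres}
KEEP=${KEEP:-7}   # number of most recent dumps to retain

# Delete all but the newest $KEEP dumps in $BACKUP_DIR.
prune_old_backups() {
  ls -1t "$BACKUP_DIR"/*.dump 2>/dev/null | tail -n +"$((KEEP + 1))" |
    while read -r old; do
      rm -f -- "$old"
    done
}

# Dump one database in custom format (-F c) with a timestamped file name,
# and prune old dumps only after a successful run.
backup_db() {
  db=$1
  mkdir -p "$BACKUP_DIR" || return
  pg_dump -h "${PGHOST:-192.168.0.100}" -U "${PGUSER:-postgres}" -F c \
    -f "$BACKUP_DIR/$db-$(date +%Y%m%d-%H%M%S).dump" "$db" && prune_old_backups
}

# Called from cron as: <path to backup script> remote_db1
if [ $# -gt 0 ]; then
  backup_db "$1"
fi
```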
pg_dump, pg_restore, psql and pg_basebackup are all installed with the PostgreSQL client package:
sudo apt install postgresql-client
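You can restore the archive into an existing, empty database (db1 here); pg_restore detects the archive format automatically:
pg_restore -h 192.168.0.100 -U postgres -d db1 remote_db1.tar
To recreate the original database instead, add -C and connect to an existing database such as postgres; pg_restore then creates the database named in the archive and restores into it:
pg_restore -h 192.168.0.100 -U postgres -C -d postgres remote_db1.tar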
The plan is to use clickhouse-backup, which is open sourced under the liberal MIT license. This tool has the ability to create archived backups and upload them to NFS, S3, GCS, AZBlob, SFTP and other remote data repositories.
Download the latest release from https://github.com/AlexAkulov/clickhouse-backup/releases
Untar the archive
tar -zxvf clickhouse-backup.tar.gz
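Create a configuration file as follows and name it config.yml. The file uses YAML syntax; by default, clickhouse-backup reads /etc/clickhouse-backup/config.yml.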
general:
  remote_storage: none  # REMOTE_STORAGE, if `none` then `upload` and `download` command will fail
  max_file_size: 1073741824  # MAX_FILE_SIZE, 1G by default, useless when upload_by_part is true, use for split data parts files by archives
  disable_progress_bar: true  # DISABLE_PROGRESS_BAR, show progress bar during upload and download, have sense only when `upload_concurrency` and `download_concurrency` equal 1
  backups_to_keep_local: 0  # BACKUPS_TO_KEEP_LOCAL, how much newest local backup should keep, 0 mean all created backups will keep on local disk
  # you shall to run `clickhouse-backup delete local <backup_name>` command to avoid useless disk space allocations
  backups_to_keep_remote: 0  # BACKUPS_TO_KEEP_REMOTE, how much newest backup should keep on remote storage, 0 mean all uploaded backups will keep on remote storage.
  # if old backup is required for newer incremental backup, then it will don't delete. Be careful with long incremental backup sequences.
  log_level: info  # LOG_LEVEL
  allow_empty_backups: false  # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1  # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1  # UPLOAD_CONCURRENCY, max 255
  restore_schema_on_cluster: ""  # RESTORE_SCHEMA_ON_CLUSTER, execute all schema related SQL queryes with `ON CLUSTER` clause as Distributed DDL, look to `system.clusters` table for proper cluster name
  upload_by_part: true  # UPLOAD_BY_PART
  download_by_part: true  # DOWNLOAD_BY_PART
clickhouse:
  username: default  # CLICKHOUSE_USERNAME
  password: ""  # CLICKHOUSE_PASSWORD
  host: localhost  # CLICKHOUSE_HOST
  port: 9000  # CLICKHOUSE_PORT, don't use 8123, clickhouse-backup doesn't support HTTP protocol
  disk_mapping: {}  # CLICKHOUSE_DISK_MAPPING, use it if your system.disks on restored servers not the same with system.disks on server where backup was created
  skip_tables:  # CLICKHOUSE_SKIP_TABLES
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
  timeout: 5m  # CLICKHOUSE_TIMEOUT
  freeze_by_part: false  # CLICKHOUSE_FREEZE_BY_PART
  secure: false  # CLICKHOUSE_SECURE, use SSL encryption for connect
  skip_verify: false  # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true  # CLICKHOUSE_SYNC_REPLICATED_TABLES
  log_sql_queries: true  # CLICKHOUSE_LOG_SQL_QUERIES, enable log clickhouse-backup SQL queries on `system.query_log` table inside clickhouse-server
  debug: false  # CLICKHOUSE_DEBUG
  config_dir: "/etc/clickhouse-server"  # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server"  # CLICKHOUSE_RESTART_COMMAND, this command use when you try to restore with --rbac or --config options
  ignore_not_exists_error_during_freeze: true  # CLICKHOUSE_IGNORE_NOT_EXISTS_ERROR_DURING_FREEZE, allow avoiding backup failures when you often CREATE / DROP tables and databases during backup creation, clickhouse-backup will ignore `code: 60` and `code: 81` errors during execute `ALTER TABLE ... FREEZE`
azblob:
  endpoint_suffix: "core.windows.net"  # AZBLOB_ENDPOINT_SUFFIX
  account_name: ""  # AZBLOB_ACCOUNT_NAME
  account_key: ""  # AZBLOB_ACCOUNT_KEY
  sas: ""  # AZBLOB_SAS
  use_managed_identity: false  # AZBLOB_USE_MANAGED_IDENTITY
  container: ""  # AZBLOB_CONTAINER
  path: ""  # AZBLOB_PATH
  compression_level: 1  # AZBLOB_COMPRESSION_LEVEL
  compression_format: tar  # AZBLOB_COMPRESSION_FORMAT
  sse_key: ""  # AZBLOB_SSE_KEY
  buffer_size: 0  # AZBLOB_BUFFER_SIZE, if less or eq 0 then calculated as max_file_size / 10000, between 2Mb and 4Mb
  max_buffers: 3  # AZBLOB_MAX_BUFFERS
s3:
  access_key: ""  # S3_ACCESS_KEY
  secret_key: ""  # S3_SECRET_KEY
  bucket: ""  # S3_BUCKET
  endpoint: ""  # S3_ENDPOINT
  region: us-east-1  # S3_REGION
  acl: private  # S3_ACL
  assume_role_arn: ""  # S3_ASSUME_ROLE_ARN
  force_path_style: false  # S3_FORCE_PATH_STYLE
  path: ""  # S3_PATH
  disable_ssl: false  # S3_DISABLE_SSL
  compression_level: 1  # S3_COMPRESSION_LEVEL
  compression_format: tar  # S3_COMPRESSION_FORMAT
  sse: ""  # S3_SSE, empty (default), AES256, or aws:kms
  disable_cert_verification: false  # S3_DISABLE_CERT_VERIFICATION
  storage_class: STANDARD  # S3_STORAGE_CLASS
  concurrency: 1  # S3_CONCURRENCY
  part_size: 0  # S3_PART_SIZE, if less or eq 0 then calculated as max_file_size / 10000
  debug: false  # S3_DEBUG
gcs:
  credentials_file: ""  # GCS_CREDENTIALS_FILE
  credentials_json: ""  # GCS_CREDENTIALS_JSON
  bucket: ""  # GCS_BUCKET
  path: ""  # GCS_PATH
  compression_level: 1  # GCS_COMPRESSION_LEVEL
  compression_format: tar  # GCS_COMPRESSION_FORMAT
  debug: false  # GCS_DEBUG
cos:
  url: ""  # COS_URL
  timeout: 2m  # COS_TIMEOUT
  secret_id: ""  # COS_SECRET_ID
  secret_key: ""  # COS_SECRET_KEY
  path: ""  # COS_PATH
  compression_format: tar  # COS_COMPRESSION_FORMAT
  compression_level: 1  # COS_COMPRESSION_LEVEL
ftp:
  address: ""  # FTP_ADDRESS
  timeout: 2m  # FTP_TIMEOUT
  username: ""  # FTP_USERNAME
  password: ""  # FTP_PASSWORD
  tls: false  # FTP_TLS
  path: ""  # FTP_PATH
  compression_format: tar  # FTP_COMPRESSION_FORMAT
  compression_level: 1  # FTP_COMPRESSION_LEVEL
  debug: false  # FTP_DEBUG
sftp:
  address: ""  # SFTP_ADDRESS
  username: ""  # SFTP_USERNAME
  password: ""  # SFTP_PASSWORD
  key: ""  # SFTP_KEY
  path: ""  # SFTP_PATH
  concurrency: 1  # SFTP_CONCURRENCY
  compression_format: tar  # SFTP_COMPRESSION_FORMAT
  compression_level: 1  # SFTP_COMPRESSION_LEVEL
  debug: false  # SFTP_DEBUG
api:
  listen: "localhost:7171"  # API_LISTEN
  enable_metrics: true  # API_ENABLE_METRICS
  enable_pprof: false  # API_ENABLE_PPROF
  username: ""  # API_USERNAME, basic authorization for API endpoint
  password: ""  # API_PASSWORD
  secure: false  # API_SECURE, use TLS for listen API socket
  certificate_file: ""  # API_CERTIFICATE_FILE
  private_key_file: ""  # API_PRIVATE_KEY_FILE
  create_integration_tables: false  # API_CREATE_INTEGRATION_TABLES
  allow_parallel: false  # API_ALLOW_PARALLEL, could allocate much memory and spawn go-routines, don't enable it if you not sure
Make sure the general and clickhouse sections of the configuration file are filled in; the remaining sections are optional.
If automated upload to remote storage is needed, fill in the matching section (sftp, ftp, s3, gcs, azblob, cos, etc.) and set general.remote_storage to that storage type.
To create a backup, run the binary directly (it is not a shell script, so do not prefix it with sh):
<path-to-clickhouse-backup-dir>/bin/clickhouse-backup create -c <path to the configuration file>
The following is the list of possible commands which can be executed:
COMMANDS:
  tables          Print list of tables
  create          Create new backup
  create_remote   Create and upload
  upload          Upload backup to remote storage
  list            Print list of backups
  download        Download backup from remote storage
  restore         Create schema and restore data from backup
  restore_remote  Download and restore
  delete          Delete specific backup
  default-config  Print default config
  print-config    Print current config
  clean           Remove data in 'shadow' folder from all `path` folders available from `system.disks`
  server          Run API server
  help, h         Shows a list of commands or help for one command
Back up ZooKeeper state data
- Open the file kafka/config/zookeeper.properties
- Note the location in the dataDir property (typically /tmp/zookeeper)
- Run the following command:
tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/*
Back up Kafka topics and messages
- Open the file kafka/config/server.properties
- Note the location in log.dirs (typically /tmp/kafka-logs)
- Stop Kafka:
sudo systemctl stop kafka
- Log in as the kafka user:
sudo -iu kafka
- Run the following command:
tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/*
- Return to your sudo-enabled user and start Kafka again:
exit
sudo systemctl start kafka
Restore ZooKeeper
- sudo systemctl stop kafka
- sudo systemctl stop zookeeper
- sudo -iu kafka
- rm -r /tmp/zookeeper/*
- tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2
Restore Kafka
- rm -r /tmp/kafka-logs/*
- tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2
- exit (return to your sudo-enabled user)
- sudo systemctl start zookeeper
- sudo systemctl start kafka
ZooKeeper must be running before Kafka starts, so start the services in this order.
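Verification of restoration
- Read a topic that existed before the backup (BackupTopic in this example) from the beginning and confirm its messages are present:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning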
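The archive and restore steps can also be wrapped in two small functions. This is a sketch, not part of the DIVOC installer: it archives relative to the data directory with tar -C, so restoring needs no --strip-components (archives made this way must be restored with restore_data_dir, not with the --strip-components 2 commands above). The data directory paths in the usage lines are the defaults mentioned above and are assumptions for your servers.

```shell
#!/bin/sh
# Sketch: archive a ZooKeeper or Kafka data directory and restore it.
# `tar -C <dir> .` stores paths relative to the data directory, so the
# archive can be extracted straight back into it.

backup_data_dir() {
  data_dir=$1 archive=$2
  tar -czf "$archive" -C "$data_dir" .
}

restore_data_dir() {
  archive=$1 data_dir=$2
  mkdir -p "$data_dir"
  # Empty the directory (including hidden files) before extracting.
  find "$data_dir" -mindepth 1 -delete
  tar -xzf "$archive" -C "$data_dir"
}

# Restore order (services stopped first, ZooKeeper started before Kafka):
#   sudo systemctl stop kafka && sudo systemctl stop zookeeper
#   restore_data_dir /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper  # as kafka
#   restore_data_dir /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs     # as kafka
#   sudo systemctl start zookeeper && sudo systemctl start kafka
```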
Redis provides a built-in SAVE command that writes a snapshot of the dataset to disk.
Install redis-cli (provided by the redis-tools package on Ubuntu):
sudo apt install redis-tools
The following command takes a backup of the redis-server:
echo save | redis-cli -u redis://<user>:<pass>@<host>:<port> >> /tmp/redis-backup.log
This saves the snapshot as dump.rdb in the data directory of the Redis server (not on the machine where redis-cli runs), typically:
/var/lib/redis
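Restoration can be done in the following way:
Stop the Redis server (sudo systemctl stop redis-server); otherwise it may overwrite dump.rdb the next time it saves or shuts down.
Locate the Redis data directory, typically:
/var/lib/redis
Move the backup dump.rdb file into this directory and make sure it is owned by the redis user (sudo chown redis:redis /var/lib/redis/dump.rdb).
Start the Redis server (sudo systemctl start redis-server).
Redis loads dump.rdb on startup, so the data is restored automatically, provided append-only persistence is disabled; with appendonly yes, Redis loads the AOF file instead.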
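Before moving a dump.rdb into the data directory, it is worth checking that the file really is an RDB snapshot. An RDB file starts with the ASCII magic string REDIS followed by a four-digit format version (for example REDIS0009). The helper below is an illustrative sketch built on that fact; it does not validate the rest of the file.

```shell
#!/bin/sh
# Sketch: check that a file starts with the RDB header "REDIS" + 4 digits.
# is_rdb_file is a hypothetical helper; it only inspects the first 9 bytes.
is_rdb_file() {
  header=$(head -c 9 "$1" 2>/dev/null) || return 1
  case $header in
    REDIS[0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage before restoring:
#   is_rdb_file /backup/dump.rdb && sudo cp /backup/dump.rdb /var/lib/redis/dump.rdb
```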