Bryan Apperson
From the Cloud to The Ocean
A Ceph cluster on Raspberry Pi is an awesome way to create a RADOS home storage solution (NAS) that is highly redundant and draws little power. It's also a low-cost way to get into Ceph, which may or may not be the future of storage (software-defined storage definitely is as a whole). Ceph on ARM is an interesting idea in and of itself. I built one of these as a development environment (playground) for home, and it can be done on a relatively small budget. Since this was a spur-of-the-moment idea, I purchased everything locally. I opted for the Raspberry Pi 2 B (for the 4 cores and 1GB of RAM). I'd really recommend going with the Pi 2 B, so you have one core and 256MB of RAM for each USB port (potential OSD). In this guide I will outline the parts and software I used, along with some options you can use to achieve better performance. This guide assumes you have access to a Linux PC with an SD card reader. It also assumes you have a working knowledge of Linux in general and a passing familiarity with Ceph.
Parts
Although I will explain many options in this guide, the following is the minimum you will need to get a cluster up and running. This list assumes 3 Pi nodes.
3 x 3ft Cat6 cables
3 x Raspberry Pi 2 B
3 x Raspberry Pi 2 B case
3 x 2 Amp Micro USB power supply
3 empty ports on a gigabit router
3 x Class 10 MicroSD (16GB or more) for the OS drive
3–12 x USB 2.0 flash drives (at least 32GB, better drive for better performance)

I used 3 x 64GB flash drives, 3 x 32GB MicroSD cards and existing ports on my router. My cost came in at about $250. You can add to this list based on what you add to your setup throughout the guide, but this is pretty much the minimum for a fully functional Ceph cluster.
Operating System
Raspbian. The testing repository for Raspbian has Ceph 0.80.9 packages and their dependencies pre-compiled, which is everything you'll need for this tutorial, and Raspbian is the "de facto" OS of choice for flexibility on the Raspberry Pi. You can download the Raspbian image here: Raspbian Download. Once you have the image, you'll want to put it on an SD card. For this application I recommend using at least a 16GB MicroSD card (Class 10 preferably; OS drive speed matters for Ceph monitor processes). To transfer the image on Linux, you can use dd. Run the lsblk command to display your devices once you've inserted the card into your card reader, then use dd to transfer the image to your SD card. The command below assumes the image is named raspbian-wheezy.img and that it lives in your present working directory. It also assumes that your SD card is located at /dev/mmcblk0; adjust these accordingly and make sure that your SD card is empty and doesn't contain anything important.
sudo dd bs=4M if=raspbian-wheezy.img of=/dev/mmcblk0

This command will take a few minutes to complete. Once it does, run sync to flush all cache to disk and make sure it is safe to remove the device. You'll then boot into Raspbian, re-size the image to the full size of your MicroSD, set a memorable password and overclock if you want.
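If you want to sanity-check the target device and flush writes in one pass, here is a minimal sketch (the /dev/mmcblk0 device name is an assumption; substitute whatever lsblk reports for your card reader):

lsblk                                                 # confirm which device is the SD card
sudo dd bs=4M if=raspbian-wheezy.img of=/dev/mmcblk0  # write the image
sync                                                  # flush cached writes so the card is safe to remove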
Once this is done there are a few modifications to make; we'll get into those in the installation section below. I don't recommend using too large a MicroSD card, since later in this tutorial we will image the whole OS from our first MicroSD for deployment to the other Pi nodes.
Hardware Limitations
The first limitation to consider is overall storage space. Ceph OSD processes require roughly 1MB of RAM per GB of storage. Since we are co-locating monitor processes, the effective storage limit is 512GB per Pi 2 B (4 x 128GB sticks) raw, before Ceph replication or erasure coding overhead. Network speed is also a factor, as discussed later in this document; you will hit network speed limitations before you hit the speed limit of the Pi 2 B's single USB 2.0 bus (480Mbit).
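As a rough sanity check on that rule of thumb (the 1MB-per-GB figure is an estimate, not a hard limit), here is the arithmetic for a fully loaded node:

# RAM budget for one Pi 2 B (1024 MB total), assuming ~1 MB of RAM per GB of OSD storage
raw_gb=512                     # 4 x 128 GB USB sticks, raw capacity
osd_ram_mb=$raw_gb             # ~1 MB per GB => ~512 MB for the OSD daemons
echo "OSDs: ~${osd_ram_mb} MB; left for monitor, OS and page cache: ~$((1024 - osd_ram_mb)) MB"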
Network
In this setup I used empty ports on my router. I run a local DNS server on my home router and use static assignments for local DNS. You may want to consider using a flat 5- or 8-port gigabit switch (depending on the number of nodes you plan to have) for the cluster network and WiPi modules for the public network (connected to your router via WiFi). The nice thing about using a flat layer 2 switch is that if all the Pi nodes are in the same subnet, you don't have to worry about a gateway; it also keeps the cost down (compared to using router ports) while keeping the Ceph replication traffic off your home network. Using a dedicated switch for the cluster network will also increase your cluster performance, especially considering the 100Mbit limitation of the Pi 2 B's network port. By using a wireless BGN dongle for the public network and a dedicated switch for the cluster network, you will get a speedier cluster. The dongle will use one of your 4 USB ports, so you will get one less OSD per Pi. Keep in mind that, depending on whether you use replication or erasure coding, private traffic can be 1 to X times greater than client IO (X being 3 in a standard replication profile), if that matters for your application. Of course this is all optional and for additional "clustery goodness"; it really depends on budget, usage, etcetera.
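If you do split traffic this way, the split is declared in ceph.conf before the monitors are created. A minimal sketch, assuming 192.168.1.0/24 is your home (public) network and 10.0.0.0/24 is the subnet on the dedicated switch (both subnets are assumptions, not values from this build):

[global]
# clients and monitors use the public network
public network = 192.168.1.0/24
# OSD replication and heartbeat traffic stays on the dedicated switch
cluster network = 10.0.0.0/24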
Object Storage Daemons
In this guide, I co-located OSD journals on the OSD drives. For better performance, you can use a faster USB stick like the SanDisk Extreme 3.0 as a dedicated journal drive (keep in mind that you'll still be limited by the ~60MB/s speed of USB 2.0). Using a dedicated (faster) journal drive will yield much better performance, but you don't really need to worry about it unless you are using multiple networks as outlined above; if you are not, 4 decent USB sticks will saturate the 100Mbit NIC on each node. There is a lot more to learn about Ceph architecture than I cover in this article, and I highly recommend you do so here.
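If you do add a dedicated journal stick, ceph-deploy can take the journal device alongside the data device when the OSDs are created later in this guide. A minimal sketch, assuming /dev/sda is the data stick and /dev/sdb is the faster journal stick on node pi1 (both device names are assumptions):

ceph-deploy osd create --fs-type btrfs pi1:/dev/sda:/dev/sdb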
OSD Filesystem
XFS is the default in Ceph Firefly. I prefer BTRFS as an OSD filesystem for multi-fold reasons and I use it in this tutorial.
Installation
Assuming you have set up your network and operating system, and have 3 nodes and the hardware you want to use, we can begin. The first thing to do is wire up power and network as you see fit. After that, you'll want to run through the initial raspi-config on what will become your admin node. Then it's time to make some changes. Once your admin node is booted and configured, you have to edit /etc/apt/sources.list. Raspbian Wheezy has archaic versions of Ceph in the main repository, but the testing repository has the latest Firefly release. Before we delve into this, I find it useful to install some basic tools and requirements. Connect via SSH or directly to a terminal and issue this command from the Pi:
sudo apt-get install vim screen htop iotop btrfs-tools lsb-release gdisk

From this point forward we will assume you are connecting to your Pi nodes via SSH. You've just installed btrfs-tools, vim (better than vi) and some performance diagnostic tools I like. Now that we have vim, it's time to edit our sources:
vi /etc/apt/sources.list

You'll see the contents of your sources file, which will look like this:
deb http://mirrordirector.raspbian.org/raspbian/ wheezy main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ wheezy main contrib non-free rpi

Modify it to look like this:
deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ testing main contrib non-free rpi

We've replaced wheezy with testing. Once this is done, issue this command:
sudo apt-get update

Once this process has completed, it's time to start getting the OS ready for Ceph. Everything we do in this section, up to the point of imaging the OS, is needed on every node that will run Ceph.
First we will create a ceph user and give it password-less sudo access. To do so issue these commands:
ssh user@ceph-server
sudo useradd -d /home/ceph -m ceph
sudo passwd ceph

Set the password to a memorable one, as it will be used on all of your nodes in this guide. Now we need to give the ceph user sudo access:
echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
sudo chmod 0440 /etc/sudoers.d/ceph

We'll be using ceph-deploy later and it's best to have a default user to log in as all the time. Issue this command:
mkdir -p ~/.ssh/

Then create this file using vi:
vi ~/.ssh/config

I assume 3 nodes in this tutorial and a naming convention of piY, where Y is the node number starting from 1.
Host pi1
    Hostname pi1
    User ceph
Host pi2
    Hostname pi2
    User ceph
Host pi3
    Hostname pi3
    User ceph

Save the file and exit. As far as hostnames go, you can use whatever you want. As I mentioned, I run local DNS and DHCP with static assignments. If you do not, you'll need to edit /etc/hosts so that your nodes can resolve each other. You can do this after imaging the OS, as each node will have a different IP.
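If you are not running local DNS, a minimal /etc/hosts sketch for each node might look like the following (the 192.168.1.x addresses are assumptions; use whatever static IPs you assign):

192.168.1.11    pi1
192.168.1.12    pi2
192.168.1.13    pi3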
Now it's time to install the ceph-deploy tool. Raspbian's wget can be strange with HTTPS, so we will ignore the certificate (do so at your own peril):
wget --no-check-certificate -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
echo deb http://ceph.com/debian-firefly/ wheezy main | sudo tee /etc/apt/sources.list.d/ceph.list

Now that we've added the Ceph repository, we can install ceph-deploy:
sudo apt-get update && sudo apt-get install ceph-deploy ceph ceph-common

Since we are installing Ceph from the Raspbian repositories, we need to change the default behavior of ceph-deploy:
sudo vi /usr/share/pyshared/ceph_deploy/hosts/debian/install.py

Change:
def install(distro, version_kind, version, adjust_repos):
    codename = distro.codename
    machine = distro.machine_type

To:
def install(distro, version_kind, version, adjust_repos):
    adjust_repos = False
    codename = distro.codename
    machine = distro.machine_type

This will prevent ceph-deploy from altering repos, as the Ceph armhf (the Raspberry Pi's processor type) repos are mostly empty.
Finally, we should revert the contents of /etc/apt/sources.list:
sudo vi /etc/apt/sources.list

You'll see the contents of your sources file, which will now look like this:
deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ testing main contrib non-free rpi

Modify it to look like this:
deb http://mirrordirector.raspbian.org/raspbian/ wheezy main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ wheezy main contrib non-free rpi
We've replaced testing with wheezy. Once this is done, issue this command:
sudo apt-get update
Kernel Tweaks
We are also going to tweak some kernel parameters for better stability. To do so we will edit /etc/sysctl.conf.
vi /etc/sysctl.conf

At the bottom of the file, add the following lines:
vm.swappiness=1
vm.min_free_kbytes = 32768
kernel.pid_max = 32768

Imaging the OS
Now we have a good baseline for deploying Ceph to our other Pi nodes. It's time to stop our admin node and image its drive (MicroSD). Issue:
sudo halt

Then unplug power to your Pi node and remove the MicroSD. Insert the MicroSD into your SD adapter, then the SD adapter into your Linux PC. You'll need at least as much free drive space on your PC as the size of the MicroSD card. Where /dev/mmcblk0 is your SD card and ceph-pi.img is your image destination, run:
sudo dd if=/dev/mmcblk0 of=ceph-pi.img bs=4M

This can take a very long time depending on the size of your SD card, and you can compress the image with gzip or xz for long-term storage (empty space compresses really well, it turns out). Once the command returns, run sync to flush the cache to disk and make sure you can safely remove the MicroSD.
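For example, a quick sketch of compressing the baseline image (filenames follow the dd command above; use whichever tool you prefer):

gzip -9 ceph-pi.img        # produces ceph-pi.img.gz
# or, for a smaller archive:
xz -9 ceph-pi.img          # produces ceph-pi.img.xz
# decompress with gunzip or xz -d before imaging new cards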
Imaging Your Nodes' OS Drives
Now that you have a good baseline image on your PC, you are ready to crank out "Ceph-Pi" nodes without redoing all of the above. To do so, insert a fresh MicroSD into your adapter and then into your PC. Then, assuming ceph-pi.img is your OS image and /dev/mmcblk0 is your MicroSD card, run:
sudo dd if=ceph-pi.img of=/dev/mmcblk0 bs=4M

Repeat this for as many nodes as you intend to deploy.
Create a Ceph Cluster on Raspberry Pi
Insert your Ceph-Pi MicroSD cards into your Pi nodes and power them all on. You've made it this far; now it's time to get "cephy". Deploying with ceph-deploy is a breeze. First we need to SSH to our admin node. Make sure you have set up IPs, networking and /etc/hosts on all Pi nodes if you are not using local DNS and DHCP with static assignments.
We need to generate and distribute an SSH key for password-less authentication between nodes. To do so, run the following (leave the passphrase blank):
ssh-keygen

Generating public/private key pair.
Enter file in which to save the key (/ceph-client/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /ceph-client/.ssh/id_rsa.
Your public key has been saved in /ceph-client/.ssh/id_rsa.pub.

Now copy the key to all nodes (assuming 3 with the naming convention from above):
ssh-copy-id ceph@pi1
ssh-copy-id ceph@pi2
ssh-copy-id ceph@pi3

You will be prompted for the password you created for the ceph user each time to establish initial authentication.
Once that is done and you are connected to your admin node (the 1st node in the cluster) as the pi user, you'll want to create an admin node directory:
mkdir -p ~/ceph-pi-cluster
cd ~/ceph-pi-cluster

Creating an Initial Ceph Configuration
We are going to create an initial Ceph configuration, assuming all 3 Pi nodes as monitors. If you have more, keep in mind that you always want an odd number of monitors to avoid a split-brain scenario. To do this, run:
ceph-deploy new pi1 pi2 pi3

Now there are some special tweaks that should be made for the best stability and performance within the hardware limitations of the Raspberry Pi 2 B. To apply these changes we'll need to edit the ceph.conf here on the admin node before it is distributed. To do so:
vi ~/ceph-pi-cluster/ceph.conf

After the existing lines add:
# Disable in-memory logs
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd heartbeat grace = 8

[mon]
mon compact on start = true
mon osd down out subtree_limit = host

[osd]
# Filesystem Optimizations
osd mkfs type = btrfs
osd journal size = 1024

# Performance tuning
max open files = 327680
osd op threads = 2
filestore op threads = 2

# Capacity Tuning
osd backfill full ratio = 0.95
mon osd nearfull ratio = 0.90
mon osd full ratio = 0.95

# Recovery tuning
osd recovery max active = 1
osd recovery max single start = 1
osd max backfills = 1
osd recovery op priority = 1

# Optimize Filestore Merge and Split
filestore merge threshold = 40
filestore split multiple = 8
Creating Initial Monitors
Now we can deploy our spiffy ceph.conf, create our initial monitor daemons, deploy our authentication keyring and chmod it as needed. We will be deploying to all 3 nodes for the purposes of this guide:
ceph-deploy mon create-initial
ceph-deploy admin pi1 pi2 pi3
for i in pi1 pi2 pi3; do ssh $i sudo chmod 644 /etc/ceph/ceph.client.admin.keyring; done

Creating OSDs (Object Storage Daemons)
Ready to create some storage? I know I am. Insert your USB keys of choice into your Pi USB ports. For the purposes of this guide I will be deploying 1 OSD (USB key) per Pi node. I will also be using the BTRFS filesystem and co-locating the journals on the OSDs, with a default journal size of 1GB (assuming roughly 2 * 40MB/s of throughput and the default filestore max sync interval of 5 seconds). This value is hard-coded into our Ceph-Pi config above. The formula is:
osd journal size = {2 * (expected throughput * filestore max sync interval)}

Plugging in 40MB/s and the 5-second sync interval gives 2 * (40 * 5) = 400MB; the 1024MB value in the config above simply rounds this up with headroom to spare. So let's deploy our OSDs. Once our USB sticks are plugged in, use lsblk to display the device locations. To make sure our drives are clean and have a GPT partition table, use the gdisk command for each OSD on each node. Assuming /dev/sda as our OSD:
sudo gdisk /dev/sda
Create a new partition table, write it to disk and exit. Do this for each OSD on each node. You can craft a bash for loop if you are feeling “bashy” or programmatic.
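If you do go the scripted route, here is a minimal non-interactive sketch using sgdisk (installed earlier as part of the gdisk package); the pi1/pi2/pi3 names and the /dev/sda device path are assumptions:

for node in pi1 pi2 pi3; do
    # zap any old partition data, then write a fresh empty GPT
    ssh $node "sudo sgdisk --zap-all /dev/sda && sudo sgdisk --clear /dev/sda"
done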
Once all OSD drives have a fresh partition table you can use ceph-deploy to create your OSDs (using BTRFS for this guide) where pi1 is our present node and /dev/sda is the OSD we are creating:
ceph-deploy osd create --fs-type btrfs pi1:/dev/sda

Repeat this for all OSD drives on all nodes (or write a for loop, as sketched below). Once you've created at least 3 OSDs, you are ready to move on.
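A minimal loop version, assuming one stick at /dev/sda per node and the pi1/pi2/pi3 naming from above:

for node in pi1 pi2 pi3; do
    ceph-deploy osd create --fs-type btrfs $node:/dev/sda
done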
Checking Cluster Health
Congratulations! You should have a working Ceph-Pi cluster. Trust, but verify. Get the health status of your cluster using this command:
ceph -s

And for a less verbose output:
ceph health

What to do now?
Use your storage cluster! Create an RBD, mount it, or export it over NFS or CIFS. There is a lot of reading out there. Now you know how to deploy a Ceph cluster on Raspberry Pi.
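As a starting point, here is a minimal sketch of creating and mounting an RBD image (the image name, the 4GB size and the default rbd pool are assumptions, and the kernel RBD client must be available on whatever machine you map it from):

rbd create test-image --size 4096                  # 4GB image in the default 'rbd' pool
sudo rbd map test-image                            # exposes it as a block device, e.g. /dev/rbd0
sudo mkfs.ext4 /dev/rbd/rbd/test-image             # put a filesystem on it
sudo mkdir -p /mnt/ceph-rbd
sudo mount /dev/rbd/rbd/test-image /mnt/ceph-rbd   # mount it and start using your cluster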
References
http://millibit.blogspot.com/2014/12/ceph-pi-installing-ceph-on-raspberry-pi.html
http://ceph.com/docs/v0.80.5/start/
https://www.raspberrypi.org/

Comments
Hi,
This looks like an interesting use of the Raspberry Pi, but I wonder if this is really that cost-effective of a solution?
When I crunched the numbers, it came out to about $1 / GB of storage, if you maxed out your nodes with 4 128GB drives and had 3 replicas… but it seems like, once you need to scale above a TB or so of storage, it’s more cost effective to just build “real” servers using spinning drives at a much higher capacity per node?
Of course, this is more of a proof-of-concept for learning Ceph. It is not meant to be cheaper per GB, but cheaper in initial cost. An x86_64 Ceph cluster with 10Gbit networking costs 5 figures; this is a 3-figure cost of entry to begin learning Ceph.
Hey, it's working now on my 3 Raspberry Pi 2s too, with a SaltStack implementation and an automated installation script :)!
Thanks for this documentation!
On my first pass I overlooked that you change sources.list twice, and only switch it to testing for the Ceph installation.
Yeah, I automated the install as well. However, I am a fan of making people perform the commands so that they learn, rather than:
wget bash.sh
chmod 755 bash.sh
sudo ./bash.sh
Teaches bad form (and security)!
Thanks for going through the tutorial. Is there a link to your implementation for others to use?
I’ve done something similar with a 6+1 node Pi cluster running Ceph. I’m currently using 24x 8GB USB sticks as storage.
Hi Bryan,
I’m getting stuck at the apt-get install ceph-deploy with the following error:
Reading state information… Done
E: Unable to locate package ceph-deploy
Any thoughts on why this may be? I'm using Wheezy, and also tried Jessie with the same result.
ceph and ceph-common have been installed.
Thanks,
Niels
just a note: doesn’t work for Debian Jessie. I either have to backport to Wheezy (not optimal) or go through a ton of various hacking and such without using ceph-deploy.
just a heads up.
Hi Bryan,
Great article on Ceph installation. I have one problem at the step of installing ceph-deploy: the package is not found. I have tried different revisions of Ceph and the package is just not there. The ceph and ceph-common packages install fine. What might I be doing wrong?
Thanks for this intro to a cost effective ceph cluster
Cheers,
Niels
It may not be in the repository any longer. Have you looked in the ceph repos? You may be able to pull it down individually.
Hi Bryan,
Very useful article, thanks for posting. I want to implement a storage server where the client side is Windows; is it possible to implement this project for that?
Yes, you’ll probably want to use CIFS or NFS to export an RBD image.
Thanks a lot for this nice tutorial. Quick question: my deployment fails when I do:
ceph-deploy mon create-initial
It connects to the remote host, runs a bunch of stuff, then comes up with this error:
Failed to execute command: sudo systemctl enable ceph.target
I’m stuck; don’t know what to do next. If I run that command manually, I get the same message.
Failed to execute operation: No such file or directory
Hi Bryan,
Do you have any experience with Ubuntu MATE on an ARM processor? I have the new ODROID, which is much better (hardware-wise) than the RPi, and I have trouble getting stuff to work. It installs Ceph just fine from the repositories, but then... I'm stuck.
Any advice?