Building a Ceph Cluster on Raspberry Pi

Paolo Redaelli, 2017-05-09


The Definitive Guide: Ceph Cluster on Raspberry Pi

May 13, 2015 · 15 minute read · by Bryan Apperson
In Development, Linux Tutorials, Technology

3 Node Ceph Cluster on Raspberry Pi

 

A Ceph cluster on Raspberry Pi is an awesome way to create a RADOS home storage solution (NAS) that is highly redundant and has low power usage. It's also a low-cost way to get into Ceph, which may or may not be the future of storage (software defined storage as a whole definitely is). Ceph on ARM is an interesting idea in and of itself. I built one of these as a development environment (playground) for home, and it can be done on a relatively small budget. Since this was a spur of the moment idea, I purchased everything locally. I opted for the Raspberry Pi 2 B (for the 4 cores and 1GB of RAM); I'd really recommend going with the Pi 2 B so you have one core and 256MB of RAM for each USB port (potential OSD). In this guide I will outline the parts and software I used, and some options you can use to achieve better performance. This guide assumes you have access to a Linux PC with an SD card reader. It also assumes you have a working knowledge of Linux in general and a passing familiarity with Ceph.

Parts

Although I will explain many options in this guide, this is the minimum you will need to get a cluster up and running. This list assumes 3 Pi nodes:

3 x 3ft Cat6 Cables
3 x Raspberry Pi 2 B
3 x Raspberry Pi 2 B Case
3 x 2 Amp Micro USB Power Supply
3 empty ports on a gigabit router
3 x Class 10 MicroSD (16GB or more) for OS drive
3–12 x USB 2.0 Flash Drives (at least 32GB, better drive for better performance)

I used 3 x 64GB flash drives, 3 x 32GB MicroSD and existing ports on my router. My cost came in at about $250. You can add to this list based on what you add to your setup throughout the guide, but this is pretty much the minimum for a fully functional Ceph cluster.

Operating System

Raspbian. The Raspbian testing repository has Ceph 0.80.9 and its dependencies pre-compiled, which is everything you'll need for this tutorial, and Raspbian is the "de facto" OS of choice for flexibility on the Raspberry Pi. You can download the Raspbian image here: Raspbian Download. Once you have the image, you'll want to put it on an SD card. For this application I recommend using at least a 16GB MicroSD card (Class 10 preferably; OS drive speed matters for the Ceph monitor processes). To transfer the image on Linux, you can use dd. Run the lsblk command to display your devices once you've inserted the card into your card reader, then use dd to transfer the image to your SD card. The command below assumes the image is named raspbian-wheezy.img and lives in your present working directory, and that your SD card is located at /dev/mmcblk0; adjust these accordingly and make sure your SD card doesn't contain anything important.

sudo dd bs=4M if=raspbian-wheezy.img of=/dev/mmcblk0

This command will take a few minutes to complete. Once it does, run sync to flush all cached writes to disk and make sure it is safe to remove the device. You'll then boot up into Raspbian, resize the root filesystem to the full size of your MicroSD, set a memorable password, and overclock if you want.
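A minimal sketch of that first-boot housekeeping, assuming you do it from the Pi's console (the raspi-config menu labels vary slightly between Raspbian releases):

sync                # on your PC: flush writes before removing the card
# then, on the Pi after its first boot:
sudo raspi-config   # "Expand Filesystem", "Change User Password", optional overclock
sudo reboot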

Once this is done there are a few modifications to make. We’ll get into this in the installation section below. I don’t recommend using too large of a MicroSD as later in this tutorial we will image the whole OS from our first MicroSD for deployment to our other Pi nodes.

Hardware Limitations

The first limitation to consider is overall storage space. Ceph OSD processes require roughly 1MB of RAM per GB of storage. Since we are co-locating monitor processes, the effective storage limit is 512GB per Pi 2 B (4 x 128GB sticks) raw, before Ceph replication or erasure coding overhead; that much storage needs roughly 512MB of RAM for the OSDs, leaving the rest of the Pi's 1GB for the monitor process and the OS. Network speed is also a factor, as discussed later in this document. You will hit network speed limitations before you hit the speed limitations of the Pi 2 B's single USB 2.0 bus (480Mbit).

Network

In this setup I used empty ports on my router. I run a local DNS server on my home router and use static assignments for local DNS. You may want to consider using a flat 5 or 8 port gigabit switch (depending on the number of nodes you plan to have) for the cluster network and WiPi modules for the public network (connected to your router via WiFi). The nice thing about using a flat layer 2 switch is that if all the Pi nodes are in the same subnet, you don't have to worry about a gateway; it also keeps the cost down (compared to using router ports) while keeping Ceph replication traffic off your home network. Using a dedicated switch for the cluster network will also increase your cluster performance, especially considering the 100Mbit limitation of the Pi 2 B's network port.

By using a BGN dongle for the Pi and a dedicated switch for the cluster network, you will get a speedier cluster. This will use one of your 4 USB ports, so you will get one less OSD per Pi. Keep in mind that, depending on whether you use replication or erasure coding, private traffic can be 1 to X times greater than client IO (X being 3 in a standard replication profile), if that matters for your application. Of course this is all optional and for additional "clustery goodness"; it really depends on budget, usage, etcetera.
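If you do split public and cluster traffic, Ceph needs to be told about both networks in ceph.conf. A minimal sketch with placeholder subnets (replace them with your own) might look like this:

[global]
  public network = 192.168.1.0/24   # WiFi side, client-facing traffic
  cluster network = 10.0.0.0/24     # dedicated gigabit switch, replication traffic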

Object Storage Daemons

In this guide, I co-located the OSD journals on the OSD drives. For better performance, you can use a faster USB drive like the SanDisk Extreme USB 3.0 for a dedicated journal (keep in mind that you'll still be limited by the 60MB/s speed of USB 2.0). Using a dedicated (faster) journal drive will yield much better performance, but you don't really need to worry about it unless you are using multiple networks as outlined above. If you are not, 4 decent USB sticks will saturate the 100Mbit NIC on each node. There is a lot more to learn about Ceph architecture than I cover in this article, and I highly recommend you do so here.

OSD Filesystem

XFS is the default OSD filesystem in Ceph Firefly. I prefer BTRFS as an OSD filesystem for several reasons, and I use it in this tutorial.

Installation

Assuming you have set up your network and operating system, and have 3 nodes and the hardware you want to use, we can begin. The first thing to do is wire up power and network as you see fit. After that, you'll want to run through the initial raspi-config on what will become your admin node. Then it's time to make some changes. Once your admin node is booted and configured, you have to edit /etc/apt/sources.list. Raspbian Wheezy has archaic versions of Ceph in the main repository, but the latest Firefly version is in the testing repository. Before we delve into this, I find it useful to install some basic tools and requirements. Connect via SSH or directly to a terminal and issue this command on the Pi:

sudo apt-get install vim screen htop iotop btrfs-tools lsb-release gdisk

From this point forward we will assume you are connecting to your Pi nodes via SSH. You've just installed btrfs-tools, vim (better than vi) and some performance diagnostic tools I like. Now that we have vim, it's time to edit our sources:

sudo vi /etc/apt/sources.list

You'll see the contents of your sources file, which will look like this:

deb http://mirrordirector.raspbian.org/raspbian/ wheezy main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ wheezy main contrib non-free rpi

Modify it to look like this:

deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ testing main contrib non-free rpi

We've replaced wheezy with testing. Once this is done, issue this command:

sudo apt-get update

Once this process has completed, it is time to start getting the OS ready for Ceph. Everything we do in this section, up to the point of imaging the OS, is needed on every node that will run Ceph.

First we will create a ceph user and give it password-less sudo access. To do so, issue these commands:

ssh user@ceph-server
sudo useradd -d /home/ceph -m ceph
sudo passwd ceph

Set the password to a memorable one, as it will be used on all of your nodes in this guide. Now we need to give the ceph user sudo access:

echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
sudo chmod 0440 /etc/sudoers.d/ceph

We'll be using ceph-deploy later, and it's best to have a default user to log in as all the time. Issue this command:

mkdir -p ~/.ssh/

Then create this file using vi:

vi ~/.ssh/config

I assume 3 nodes in this tutorial and a naming convention of piY, where Y is the node number starting from 1.

Host pi1  
   Hostname pi1  
   User ceph  
Host pi2  
   Hostname pi2  
   User ceph  
Host pi3  
   Hostname pi3  
   User ceph

Save the file and exit. As far as hostnames go, you can use whatever you want, of course. As I mentioned, I run local DNS and DHCP with static assignments. If you do not, you'll need to edit /etc/hosts so that your nodes can resolve each other; a sample snippet is shown below. You can do this after imaging the OS, as each node will have a different IP.
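An /etc/hosts snippet for three nodes might look like the following (the addresses are placeholders; use whatever your network actually assigns):

192.168.1.101   pi1
192.168.1.102   pi2
192.168.1.103   pi3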

Now it's time to install the ceph-deploy tool. Raspbian's wget can be strange with HTTPS, so we will ignore the certificate (do so at your own peril):

wget --no-check-certificate -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
echo deb http://ceph.com/debian-firefly/ wheezy main | sudo tee /etc/apt/sources.list.d/ceph.list

Now that we’ve added the Ceph repository, we can install ceph-deploy:

sudo apt-get update && sudo apt-get install ceph-deploy ceph ceph-common

Since we are installing ceph from the Raspbian repositories, we need to change the default behavior of ceph-deploy:

sudo vi /usr/share/pyshared/ceph_deploy/hosts/debian/install.py

Change

def install(distro, version_kind, version, adjust_repos):  
   codename = distro.codename  
   machine = distro.machine_type

To

def install(distro, version_kind, version, adjust_repos):  
   adjust_repos = False
   codename = distro.codename  
   machine = distro.machine_type

This will prevent ceph-deploy from altering repos, as the Ceph armhf (the Raspberry Pi's architecture) repos are mostly empty.

Finally, we should revert the contents of /etc/apt/sources.list:

sudo vi /etc/apt/sources.list

You'll see the contents of your sources file, which will now look like this:

deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ testing main contrib non-free rpi

Modify it to look like this:

deb http://mirrordirector.raspbian.org/raspbian/ wheezy main contrib non-free rpi
# Uncomment line below then 'apt-get update' to enable 'apt-get source'
#deb-src http://archive.raspbian.org/raspbian/ wheezy main contrib non-free rpi

 

We've replaced testing with wheezy. Once this is done, issue this command:

sudo apt-get update

 

Kernel Tweaks

We are also going to tweak some kernel parameters for better stability. To do so we will edit /etc/sysctl.conf:

sudo vi /etc/sysctl.conf

At the bottom of the file, add the following lines:

vm.swappiness=1
vm.min_free_kbytes = 32768
kernel.pid_max = 32768
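These values are read at boot; to apply them immediately without rebooting, you can reload the file (a quick sketch):

sudo sysctl -p /etc/sysctl.conf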

Imaging the OS

Now we have a good baseline for deploying ceph to our other Pi nodes. It’s time to stop our admin node and image the drive (MicroSD). Issue:

sudo halt

Then unplug power to your Pi node and remove the MicroSD. Insert the MicroSD into your SD adapter, then the SD adapter into your Linux PC. You'll need at least as much free drive space on your PC as the size of the MicroSD card. Where /dev/mmcblk0 is your SD card and ceph-pi.img is your image destination, run:

sudo dd if=/dev/mmcblk0 of=ceph-pi.img bs=4M

This can take a very long time depending on the size of your SD card, and you can compress the image with gzip or xz for long-term storage (empty space compresses really well, it turns out). Once the command returns, run sync to flush the cache to disk and make sure you can safely remove the MicroSD.
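If you do compress the image, something along these lines works (xz is slower but smaller; the -k flag keeps the original file in both cases):

xz -k ceph-pi.img      # produces ceph-pi.img.xz
gzip -k ceph-pi.img    # produces ceph-pi.img.gz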

Imaging Your Nodes' OS Drives

Now that you have a good baseline image on your PC, you are ready to crank out "Ceph-Pi" nodes without redoing all of the above. To do so, insert a fresh MicroSD into your adapter and then into your PC. Then, assuming ceph-pi.img is your OS image and /dev/mmcblk0 is your MicroSD card, run:

sudo dd if=ceph-pi.img of=/dev/mmcblk0 bs=4M

Repeat this for as many nodes as you intend to deploy.

Create a Ceph Cluster on Raspberry Pi

Insert your ceph-pi MicroSD cards into your Pi nodes and power them all on. You've made it this far; now it's time to get "cephy". Deploying with ceph-deploy is a breeze. First we need to SSH to our admin node. Make sure you have set up IPs, hostnames, the network and /etc/hosts on all Pi nodes if you are not using local DNS and DHCP with static assignments (a per-node sketch follows).
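If you are naming the nodes by hand, a quick sketch for giving a freshly imaged node its own identity (here the second node; the old hostname baked into the image is an assumption, so adjust the sed pattern to whatever yours actually is):

echo pi2 | sudo tee /etc/hostname
sudo sed -i 's/raspberrypi/pi2/' /etc/hosts   # fix the 127.0.1.1 entry
sudo reboot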

We need to generate and distribute an SSH key for password-less authentication between nodes. To do so, run the following (leave the passphrase blank):

ssh-keygen
Generating public/private key pair.
Enter file in which to save the key (/ceph-client/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /ceph-client/.ssh/id_rsa.
Your public key has been saved in /ceph-client/.ssh/id_rsa.pub.

Now copy the key to all nodes (assuming 3 with the naming convention from above):

ssh-copy-id ceph@pi1
ssh-copy-id ceph@pi2
ssh-copy-id ceph@pi3

You will be prompted for the password you created for the ceph user each time to establish initial authentication.

Once that is done and you are connected to your admin node (the 1st node in the cluster) as the pi user, you'll want to create an admin node directory:

mkdir -p ~/ceph-pi-cluster
cd ~/ceph-pi-cluster

Creating an Initial Ceph Configuration

We are going to create an initial Ceph configuration, assuming all 3 Pi nodes as monitors. If you have more, keep in mind that you always want an odd number of monitors to avoid a split-brain scenario. To do this, run:

ceph-deploy new pi1 pi2 pi3

Now there are some special tweaks that should be made for best stability and performance within the hardware limitations of the Raspberry Pi 2 B. To apply these changes we’ll need to edit the ceph.conf here on the admin node before it is distributed. To do so:

vi ~/ceph-pi-cluster/ceph.conf

After the existing lines add:

  # Disable in-memory logs
  debug_lockdep = 0/0
  debug_context = 0/0
  debug_crush = 0/0
  debug_buffer = 0/0
  debug_timer = 0/0
  debug_filer = 0/0
  debug_objecter = 0/0
  debug_rados = 0/0
  debug_rbd = 0/0
  debug_journaler = 0/0
  debug_objectcacher = 0/0
  debug_client = 0/0
  debug_osd = 0/0
  debug_optracker = 0/0
  debug_objclass = 0/0
  debug_filestore = 0/0
  debug_journal = 0/0
  debug_ms = 0/0
  debug_monc = 0/0
  debug_tp = 0/0
  debug_auth = 0/0
  debug_finisher = 0/0
  debug_heartbeatmap = 0/0
  debug_perfcounter = 0/0
  debug_asok = 0/0
  debug_throttle = 0/0
  debug_mon = 0/0
  debug_paxos = 0/0
  debug_rgw = 0/0
  osd heartbeat grace = 8
[mon]
  mon compact on start = true
  mon osd down out subtree_limit = host
[osd]
  # Filesystem Optimizations
  osd mkfs type = btrfs
  osd journal size = 1024
  # Performance tuning
  max open files = 327680
  osd op threads = 2
  filestore op threads = 2
  
  #Capacity Tuning
  osd backfill full ratio = 0.95
  mon osd nearfull ratio = 0.90
  mon osd full ratio = 0.95
  # Recovery tuning
  osd recovery max active = 1
  osd recovery max single start = 1
  osd max backfills = 1
  osd recovery op priority = 1
  # Optimize Filestore Merge and Split
  filestore merge threshold = 40
  filestore split multiple = 8

 

Creating Initial Monitors

Now we can deploy our spiffy ceph.conf, create our initial monitor daemons, deploy our authentication keyring and chmod it as needed. We will be deploying to all 3 nodes for the purposes of this guide:

ceph-deploy mon create-initial
ceph-deploy admin pi1 pi2 pi3
for i in pi1 pi2 pi3; do ssh $i sudo chmod 644 /etc/ceph/ceph.client.admin.keyring; done

Creating OSDs (Object Storage Daemons)

Ready to create some storage? I know I am. Insert your USB keys of choice into your Pi USB ports. For the purposes of this guide I will be deploying 1 OSD (USB key) per Pi node. I will also be using the BTRFS filesystem and co-locating the journals on the OSDs with a default journal size of 1GB (assuming a maximum throughput of 2 x 40MB/s and the default filestore max sync interval of 5). This value is hard-coded into our ceph-pi config above. The formula is:

osd journal size = {2 * (expected throughput * filestore max sync interval)}
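As a rough sanity check on the numbers used here: reading "2 x 40MB/s" as an expected throughput of about 80MB/s, 2 * (80MB/s * 5s) = 800MB, so the 1024MB (1GB) default comfortably rounds that up.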

So let’s deploy our OSDs. Once our USBs are plugged in, use lsblk to display the device locations. To make sure our drives are clean and have a GPT partition table, use the gdisk  command for each OSD on each node. Assuming /dev/sda  as our OSD:

sudo gdisk /dev/sda

Create a new partition table, write it to disk and exit. Do this for each OSD on each node. You can craft a bash for loop if you are feeling "bashy" or programmatic; one possible loop is sketched below.
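One possible loop, run from the admin node, assuming every OSD drive shows up as /dev/sda on every node and using sgdisk (the non-interactive sibling of gdisk, installed with it earlier) to do the equivalent of gdisk's "new table, write, quit":

for node in pi1 pi2 pi3; do
  ssh $node "sudo sgdisk --zap-all /dev/sda && sudo sgdisk --clear /dev/sda"
done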

Once all OSD drives have a fresh partition table you can use ceph-deploy to create your OSDs (using BTRFS for this guide) where pi1 is our present node and /dev/sda is the OSD we are creating:

ceph-deploy osd create --fs-type btrfs pi1:/dev/sda

Repeat this for all OSD drives on all nodes, or write a for loop like the one sketched below. Once you've created at least 3, you are ready to move on.
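A loop version, again assuming a single OSD drive at /dev/sda on each node:

for node in pi1 pi2 pi3; do
  ceph-deploy osd create --fs-type btrfs $node:/dev/sda
done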

Checking Cluster Health

Congratulations! You should have a working Ceph-Pi cluster. Trust, but verify. Get the health status of your cluster using this command:

ceph -s

and for less verbose output:

ceph health

What to do now?

Use your storage cluster! Create an RBD image and mount it, or export it over NFS or CIFS; a quick RBD sketch follows. There is a lot of reading out there. Now you know how to deploy a Ceph cluster on Raspberry Pi.
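A hedged example of the RBD route, from a node holding the admin keyring (the default "rbd" pool, the image name pi-vol and the device node /dev/rbd0 are assumptions, and the map step needs the kernel rbd module):

rbd create pi-vol --size 10240     # 10GB image in the default rbd pool
sudo rbd map pi-vol                # exposes it as a block device, typically /dev/rbd0
sudo mkfs.ext4 /dev/rbd0
sudo mkdir -p /mnt/pi-vol
sudo mount /dev/rbd0 /mnt/pi-vol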

References

http://millibit.blogspot.com/2014/12/ceph-pi-installing-ceph-on-raspberry-pi.html

http://ceph.com/docs/v0.80.5/start/
https://www.raspberrypi.org/

14 Comments

  1. Pingback: Distributed file storage with a Ceph cluster on Raspberry Pi | Raspberry Pi Pod
  2. Mike Kelly, 2 years ago

    Hi,

    This looks like an interesting use of the Raspberry Pi, but I wonder if this is really that cost-effective of a solution?

    When I crunched the numbers, it came out to about $1 / GB of storage, if you maxed out your nodes with 4 128GB drives and had 3 replicas… but it seems like, once you need to scale above a TB or so of storage, it’s more cost effective to just build “real” servers using spinning drives at a much higher capacity per node?

    • Bryan Apperson, 2 years ago

      Of course, this is more of a proof-of-concept for learning ceph. Not meant to be cheaper per GB, but cheaper for initial cost. A x86_64 ceph cluster with 10Gbit networking costs 5 figures. This is a 3 figure cost of entry way to begin learning ceph.

  3. Thomas Bludau, 2 years ago

    Hey, it's working now on my 3 Raspberry Pi 2s too, with a SaltStack implementation and an automatic installation script :)!
    Thanks for this documentation!
    I had overlooked that you change the sources.list twice, and on my first try only changed it for the Ceph installation.

      • Bryan Apperson, 2 years ago

      Yeah, I automated the install as well. However, I am a fan of making people perform the commands so that they learn, rather than:

      wget bash.sh
      chmod 755 bash.sh
      sudo ./bash.sh

      Teaches bad form (and security)!

      Thanks for going through the tutorial. Is there a link to your implementation for others to use?

  4. Steven Pemberton, 2 years ago

    I’ve done something similar with a 6+1 node Pi cluster running Ceph. I’m currently using 24x 8GB USB sticks as storage.

  5. Niels, 1 year ago

    Hi Bryan,

    I’m getting stuck at the apt-get install ceph-deploy with the following error:
    Reading state information… Done
    E: Unable to locate package ceph-deploy

    Any thoughts on why this may be? I'm using Wheezy; I also tried Jessie with the same result.

    ceph and ceph-common have been installed.

    Thanks,
    Niels

  6. Dave Graham, 1 year ago

    just a note: doesn’t work for Debian Jessie. I either have to backport to Wheezy (not optimal) or go through a ton of various hacking and such without using ceph-deploy.

    just a heads up. 😉

  7. Niels Sommer, 1 year ago

    Hi Bryan,

    Great article on Ceph installation. I have one problem at the step of installing ceph-deploy: the package is not found. I have tried different revisions of Ceph and it is just not found. The ceph and ceph-common packages are installed fine. What might I be doing wrong?

    Thanks for this intro to a cost effective ceph cluster 🙂

    Cheers,

    Niels

      • Bryan Apperson, 1 year ago

      It may not be in the repository any longer. Have you looked in the ceph repos? You may be able to pull it down individually.

  8. Mayur, 12 months ago

    Hi Bryan,
    Very useful article, thanks for posting. I want to implement a storage server where the client side is Windows, so is it possible to implement this project?

      • Bryan Apperson, 10 months ago

      Yes, you’ll probably want to use CIFS or NFS to export an RBD image.

  9. Marian, 8 months ago

    Thanks a lot for this nice tutorial. Quick question: my deployment fails when I do:
    ceph-deploy mon create-initial

    It connects to the remote host, runs a bunch of stuff, then comes up with this error:
    Failed to execute command: sudo systemctl enable ceph.target

    I’m stuck; don’t know what to do next. If I run that command manually, I get the same message.
    Failed to execute operation: No such file or directory

  10. Marian, 8 months ago

    HI Bryan,

    Do you have any experience with Ubuntu MATE on an ARM processor? I have the new ODROID, which is much better (hardware-wise) than the RPi, and I have trouble getting stuff to work. It installs Ceph just fine from the repositories, but then... I'm stuck 🙂
    Any advice?
