Scality and Seagate Hackathon: Challenge Developers to Extend S3 Server Capabilities Using Kinetic Protocol

Giorgio Regni is CTO at Scality. Follow him on Twitter @GiorgioRegni

Object storage has become the fastest-growing model for storing rich content for cloud applications, for data such as images and video. The Amazon Web Services (AWS) S3 API has emerged as a standard and is used by thousands of applications in use in the cloud and the enterprise.

If you’re a developer, gaining expertise with the S3 API can provide a cornerstone to your career in bringing these new applications to life.

Scality, Seagate and Holberton School have announced a hackathon aimed at offering developers that opportunity — the S3 Server Hackathon, October 21, 22 and 23 in San Francisco at the Holberton School. Developers are invited — are you up to the challenge? Sign up to participate here. (Are there prizes? Of course! See the image at the bottom of this page!)

S3 Server Hackathon — the project at hand

Scality has recently announced their new open-source implementation of a full Amazon S3 Server under an Apache 2.0 license. This project took a full year with a team of 15 people; Scality recruited eight of them following a similar hackathon held in Paris in 2015.

This hackathon will give developers a chance to interact directly with the S3 Server dev team, develop their own application on top of it, or extend its storage capabilities by interfacing with Kinetic IP drives from Seagate.

How the Kinetic protocol will help achieve our goal

The open source version of S3 Server uses Docker volumes for storage. Typically, these volumes are mapped to physical drives on a server. This works well, but limits the amount of storage that can be served by S3 Server to a single machine.

A single machine can easily grow to 500TB usable today — but the purpose of this hackathon is to overcome this limit by interfacing with network-enabled “IP drives”, using the Kinetic protocol over standard ethernet.

This means that the S3 Server container can run anywhere, like on a Docker swarm or Kubernetes cluster, while addressing any number of Kinetic drives for an unlimited amount of capacity. You need more storage? Just rack new drives and scale-out capacity without bounds.

Seagate Kinetic drives implement the Kinetic protocol and are a great way easily get started with this decoupled architecture.

Kinetic protocol — the basics outlined

The Kinetic protocol is explored in depth at the OpenKinetic website — dig in there a bit to learn more. At a basic level, it’s an open source protocol that has many interesting properties:

It has a SSL/TCP/IP stack, leveraging the existing data center network fabric
It has a Key/Value store interface supporting PUT/GET/DELETE semantics
Data is not accessed by block and offset, but by a key pointing to objects of variable sizes

Theoretically, applications can directly store their data onto the drives, but in practice an intermediate layer such as the Scality S3 Server takes cares of:

Cutting the objects in smaller chunks — for load balancing, and because the drives are limited to 1MB per object
Keeping track of data location and managing failures
Applying data protection schemes, like parity protection between drives with erasure coding
Managing metadata required for a protocol like AWS S3 (e.g. buckets, ACLs, attributes, etc)

Some key problems Kinetic protocol is solving:

It leverages the Ethernet fabric.
Eliminates (most of) the traditional storage stack.
The internal details of the disks are hidden by the public API. This leads to better hardware abstraction.
Maximizes innovation: SW/HW features of the drive can immediately be exposed as an API, instead of waiting for kernels/vendors to expose the features through their driver.

Types of innovations enabled by the Kinetic protocol:

Partial failure management.
Encryption in transit.
Encryption at rest.
Authentication (currently HMAC).
Comprehensive end-to-end integrity, by having the drive checking client’s CRC.
Drive-to-drive data transit: It is possible to “program” the drives to do peer-to-peer operations (e.g. exchange data).

Who can join the S3 Server Hackathon, and what will they do?

The hackathon is open to any developer who wants to tinker with hardware resources and build Exabyte-scale systems. S3 Server is written in Node.js but javascript is not a requirement as the administrative parts can be written in python for example and the high-performance parts are typically written in C/C++.

Participation in the hackathon is open to hackers working in solo or in teams of up to three. Competitors will be judged at the end of the weekend based on quality of the design and creativity. Scality engineers experienced in Node.js, Python and C as well as Seagate engineers specialized in IP drive software and hardware will be present for support during the entire event.

This hackathon challenge will consist, starting from existing open source code, in finding interesting way for S3 Server to discover, manage, and store data on IP drives.

Some teams will focus particularly on administrative tools (discovery, UI), while others will focus on data placement and repair strategies, possibly using P2P strategies on the drives themselves.

Example hackathon projects

You could decide to extend S3 Server to support another API like AWS Glacier or OpenStack Swift,
Another idea could be to focus specifically on administrative tools for IP drives (for example discovery, or monitoring GUI),
Others may focus on smart data placement and repair strategies for placing data on IP drives.
But let your imagination and creativity run free – that’s the key to a successful hackathon.
The hackathon is open to any developer who wants to work on storage systems and their API, going from terabytes to exabytes of data.

Prizes? Of course!

Of course like any great hackathon, there will be prizes for the developers’ efforts — see the image at the bottom of this page, and visit the sign-up page for more details.

Are you up to the challenge???