Unverified Commit 2d99072b authored by Loïc Dachary

NGI Pointer

Online application form https://ngi-pointer-open-call-2.fundingbox.com/apply
----------------------------------------------------------------------
BICEPS
BICEPS is a scale-out, self-healing storage system based on protocols provided by Ceph and tailored to immutable, never-deleted archive content. The durability of objects is based on independent actors federated together via the W3C ActivityPub protocol. The integrity of the content relies on a cryptographic web of trust. GNU/Linux distributions can use BICEPS as a foundation for distributing and archiving packages, with mirroring provided natively. Content archives such as Archive.org or Software Heritage can scale beyond their individual storage capacity. Content consumers can conveniently detect malicious or corrupted content by verifying its cryptographic signature against a web of trust, similar to what the Debian operating system uses to authenticate software packages.
Its first use case will be the Software Heritage archive, which currently holds 750TB of raw data totaling 10 billion small immutable objects and growing by 50TB every month.
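As an illustration of the verification step, here is a minimal, hypothetical Python sketch (not BICEPS code): content is accepted only if its Ed25519 signature checks out under a key reachable from a trusted root in the trust graph. The `cryptography` package is assumed; key identities and the endorsement graph layout are invented for the example.

```python
# Hedged sketch: raw Ed25519 public key bytes stand in for key identity;
# `endorsements` maps a key to the keys it vouches for.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def reachable(signer: bytes, roots: set[bytes],
              endorsements: dict[bytes, set[bytes]]) -> bool:
    """Walk the trust graph from the trusted roots toward the signer."""
    seen: set[bytes] = set()
    frontier = set(roots)
    while frontier:
        key = frontier.pop()
        if key == signer:
            return True
        seen.add(key)
        frontier |= endorsements.get(key, set()) - seen
    return False


def verify(content: bytes, signature: bytes, signer: bytes,
           roots: set[bytes],
           endorsements: dict[bytes, set[bytes]]) -> bool:
    """Accept content only from a signer the web of trust vouches for."""
    if not reachable(signer, roots, endorsements):
        return False
    try:
        Ed25519PublicKey.from_public_bytes(signer).verify(signature, content)
        return True
    except InvalidSignature:
        return False
```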
Eligibility Minimum Quality Criteria
1) How does your project fit the scope of the renovation of the internet architecture?
New protocols specifically addressing storage, such as S3, keep emerging. Most of them are defined on top of HTTP and suffer from its limitations. A few others, such as Ceph, rely only on TCP or UDP and are implemented to allow distributed storage that is strictly consistent. Although most of them are designed for low latency, Ceph has implemented specific layers to handle high-latency replication and distribution when crossing datacenter boundaries. The next frontier is to mix these protocols with those dedicated to federation, such as ActivityPub. With a federation of actors producing immutable content at a global scale, archive systems such as archive.org or Software Heritage could become sustainable and an integral part of the internet fabric.
Storage suffers from a lack of standards (formal or de facto), which leads to data silos controlled by large corporations. With the emergence of Ceph and similar software and protocols dedicated to self-healing distributed storage, it becomes possible to empower individuals and organizations to store their data over the network securely and privately. Addressing large-scale immutable storage offers the possibility to combine this evolution with federation. Once these software and protocols are used by the general public, they will durably transform how the internet is used.
Ceph has introduced protocols (RBD and RADOS) and software stacks (RGW, CephFS, ...) dedicated to scale-out, self-healing distributed storage during the past decade. They are now available as an integral part of all major operating system distributions based on GNU/Linux. They are also part of the mainline Linux kernel and readily available as networked block devices or file systems. The current use cases are focused on LAN and low-latency WAN; BICEPS will expand these to loosely coupled actors cooperating over the Internet with high-latency connections. Targeting immutable content providers is a simple and useful first step towards the generalization of federated storage at a global scale.
100% of the implementation of BICEPS will be published under a Free Software license. BICEPS will be published on the https://git.easter-eggs.org/ forge and publicly available with open registration to enable contributions from third parties. The governance of the project will be horizontal. It will include extensive integration testing, which is a precondition for any storage system to prevent functional and performance regressions. Access to the Grid5000 clusters was provided for performance testing. The Ceph internals may be modified to allow for small object packing, a precondition to efficiently distribute them over the internet, where bandwidth is low and latency high compared to a LAN. These changes will become an integral part of the upcoming Ceph releases, published under the same license.
* Design of the immutable large scale object storage
* Benchmarks of the design to verify it delivers the expected performance
* User research to identify the emerging themes that are of importance to producers and users of large-scale immutable content
* Design of the small object packing in Ceph
* Implementation of the small object packing feature in Ceph
* Implementation of BICEPS
* At scale ingestion of the Software Heritage content
* Reach out to the Free Software community to advocate for the use of BICEPS, actively looking for feedback and more practical use cases, using Software Heritage as a showcase
* Publication of the first release of BICEPS
* Continuously mirror the content of Software Heritage over the internet to demonstrate how BICEPS federation works
* Get the small object packing feature accepted in the upcoming Ceph stable release
* First implementation (June 1st)
* Order hardware (July 1st)
* Prepare the hardware for production (September 1st)
* Complete the tests for production (October 1st)
* Complete the software and documentation to transition from the current Software Heritage setup to the new object storage, demonstrating it delivers the expected benefits
* Mirror the content from the production site to a third-party site (November 1st)
* Collect feedback and fix issues (December 31)
---
Facebook, LinkedIn, archive.org, and Software Heritage need to store an ever-growing number of small immutable objects: images, source code, web pages, etc. Using Software Heritage as a showcase: as of February 2021 it contains 10 billion unique source code files (or “objects” in the following), totaling ~750TB of (uncompressed) data, and it grows by 50TB every month. 75% of these objects are smaller than 16KB and 50% are smaller than 4KB. Using an off-the-shelf object storage is not a good fit because:
* There is a significant space amplification for small objects: at least 25%, depending on the object storage
* Mirroring the content of the archive can only be done one object at a time and not in bulk, which takes at least 10 times longer
These two problems have a significant impact on their workload. Enumerating billions of small objects is so impractical that mirroring requires low-latency, high-bandwidth connections; it cannot be federated over the internet.
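To make the space amplification concrete, here is a back-of-the-envelope Python sketch (illustrative, not measured data). It assumes a 4KB minimum allocation unit, in line with common object store defaults such as Ceph BlueStore's `min_alloc_size`; the sample object sizes are invented to loosely match the distribution quoted above.

```python
# Hypothetical estimate of space amplification when each object occupies
# whole allocation units of its own.
MIN_ALLOC = 4096  # bytes; assumed allocation unit


def allocated(size: int) -> int:
    """Round a logical object size up to whole allocation units."""
    units = (size + MIN_ALLOC - 1) // MIN_ALLOC
    return max(units, 1) * MIN_ALLOC


# Invented sample: half the objects under 4KB, three quarters under 16KB.
sample = [1_000, 1_000, 6_000, 20_000]

logical = sum(sample)
physical = sum(allocated(s) for s in sample)
print(f"overhead: {physical / logical - 1:.0%}")  # ~32% on this sample
```

With many objects far below the allocation unit, the rounding alone wastes a significant fraction of the raw capacity; packing millions of objects into one large container removes this per-object overhead.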
---
Objects are written to databases running on a number of machines (the Write Storage) that can vary to control the write throughput. When a threshold is reached (e.g. 100GB), all objects are packed together in a container (a Shard) and moved to a read-only storage that keeps expanding over time. After a successful write, a unique identifier (the Object ID) is returned to the client. It can be used to read the object back from the read-only storage. Reads scale out because the unique identifiers of the objects embed the name of the container (the Shard UUID). Writes also scale out because the database is chosen randomly. This is Layer 0.
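The following is a minimal, hypothetical Python sketch of this Layer 0 write and read path. All names (`Layer0`, `SHARD_THRESHOLD`, the in-memory dictionaries) are invented for illustration; the real system packs Shards into Ceph-backed storage, not Python objects.

```python
import hashlib
import random
import uuid

SHARD_THRESHOLD = 100 * 1024**3  # e.g. seal a Shard once it holds 100GB


class Layer0:
    def __init__(self, n_databases: int):
        # One in-progress Shard per Write Storage database.
        self.write_dbs = [
            {"uuid": uuid.uuid4(), "objects": {}, "bytes": 0}
            for _ in range(n_databases)
        ]
        self.read_storage = {}  # Shard UUID -> sealed, read-only Shard

    def write(self, data: bytes) -> tuple[uuid.UUID, str]:
        """Store an object; return its Object ID (Shard UUID, Object HASH)."""
        db = random.choice(self.write_dbs)  # writes scale out
        shard_uuid = db["uuid"]
        obj_hash = hashlib.sha256(data).hexdigest()
        db["objects"][obj_hash] = data
        db["bytes"] += len(data)
        if db["bytes"] >= SHARD_THRESHOLD:  # seal the Shard and move it
            self.read_storage[shard_uuid] = db["objects"]
            db.update(uuid=uuid.uuid4(), objects={}, bytes=0)
        return (shard_uuid, obj_hash)  # the Object ID

    def read(self, object_id: tuple[uuid.UUID, str]) -> bytes:
        """Reads scale out: the Object ID names the Shard directly."""
        shard_uuid, obj_hash = object_id
        shard = self.read_storage.get(shard_uuid)
        if shard is not None:
            return shard[obj_hash]
        for db in self.write_dbs:  # object not yet sealed into a Shard
            if db["uuid"] == shard_uuid:
                return db["objects"][obj_hash]
        raise KeyError(obj_hash)
```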
Clients that cannot keep track of the name of the container can rely on an API backed by an index mapping all known object signatures (the Object HASH below) to the name of the container where they can be found.
---
The key concepts are:
* Packing millions of Objects together in Shards to:
  * save space, and
  * efficiently perform bulk actions such as mirroring or enumeration.
* Two kinds of storage:
  * a Read Storage that takes advantage of the fact that Objects are immutable and never deleted, and
  * a Write Storage from which Shards are created and moved to the Read Storage.
* Identifying an object by its Object HASH and the Shard UUID that contains it, so that its location can be determined from the Object ID.
While the architecture based on these concepts scales out for writing and reading, it cannot be used to address Objects with their Object HASH alone, which is inconvenient for a number of use cases. An index mapping the Object HASH to the Shard UUID must be added to provide this feature, but it does not scale out writes.
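As a hypothetical illustration of why this index is the scaling bottleneck, here is a minimal Python sketch: every write must update one shared mapping, a serialization point the Shard-addressed path avoids. The class and method names are invented.

```python
import uuid


class HashIndex:
    """Global mapping from Object HASH to the Shard UUID holding it."""

    def __init__(self):
        self._shard_of: dict[str, uuid.UUID] = {}

    def record(self, obj_hash: str, shard_uuid: uuid.UUID) -> None:
        # Every writer funnels through this single structure, which is
        # why hash-only addressing does not scale out writes.
        self._shard_of[obj_hash] = shard_uuid

    def locate(self, obj_hash: str) -> uuid.UUID:
        # Recover the Shard UUID from a bare Object HASH, yielding the
        # full Object ID (Shard UUID, Object HASH).
        return self._shard_of[obj_hash]
```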
---
Impact 1: Demonstrable Internet Architecture Renovation (1 out of 3 expected impacts; at least 2 must be addressed). If not addressed, indicate "not applicable"
----
Team introduction
Loïc Dachary became a Ceph developer in 2014 and worked primarily on introducing Erasure Coding, as well as other essential parts of the core.
Team motivation (NGI Architects concept matching)
I would like to improve the durability of the data that Software Heritage already has. Ideally it would be spread all over the globe and distributed among so many machines, organizations, and sovereign states that losing one byte would be less likely than an extinction-level event.
Dear Applicant,
Thank you for participating in the 2nd Open Call of NGI Pointer.
We are sorry to inform you that, after the Evaluation process described in the Guide for Applicants (Section 4), your proposal has not been selected during the Consensus meeting to take part in the Support Programme of NGI Pointer.
Your proposal has been evaluated by 2 recognized experts, who assessed the potential of your project. Even though your proposal met the threshold of 10 points, it was not selected by the Selection Committee, due to the high number of quality applications for this Open Call.
Please find below the final score and comments provided by those evaluators as feedback.
Final Score of your proposal: 10.5 out of 15 points.
Criteria
Evaluators feedback
Excellence
Evaluator 1: Technically interesting and ambitious project. Project seems capable of delivering on its brief.
Evaluator 2: The goal of this project is to develop an object storage system on top of the Ceph RBD layer and protocol. The progress beyond the state of the art is not discussed in sufficient detail. As the proposal lacks technical details, it is hard to see novelty. Overall, the context, motivation, and ambition are described at a very high level. This hinders its differentiation from existing storage systems used by commercial cloud providers (e.g., Amazon S3).
Impact
Evaluator 1: Positive impact is to be expected. Project seems well aligned with NGI Pointer and the broader NGI mission and vision. Outcome of the project will be reusable and available as open source. Software Heritage use case adds to this.
Evaluator 2: The team describes the alignment of the project outcomes with 3 of the 3 expected impacts. The provided reasoning is logical and has the potential to create successful outcomes. Lack of a track record of delivering previous successful open source projects is a slight weakness.
Implementation
Evaluator 1: Applicants seem to have a proven track record in this specific field. The tasks are not very well defined, hence it is difficult to determine the relevance of the budget. Budget does not look overly generous, though.
Evaluator 2: The work plan is not described in sufficient detail. Discussion of major Risks and Mitigation strategies is missing, which is a shortcoming.
The work plan is also missing major deliverables and milestones, which is also a weakness.
If you consider that a mistake has been made and that your interests have been prejudiced as a result, please follow the appeal procedure described in the Guide for Applicants (Section 7.2). The external evaluation is run by experts in the Internet Architecture field, and we do not interfere with their assessment, therefore we will not evaluate complaints related to the results of the evaluation.
In any case, we want to thank you for your participation in the 2nd Open Call, we sincerely wish you every future success for your project and hope you will stay in touch with us via the NGI Community.
Best Regards,