Written by Francois Bodin and Laurent Morin from the University of Rennes
The RUDI project offers a collaborative platform for data sets of general interest to the Rennes metropolitan area (in Brittany, France). The data can be of different kinds: socio-economic, environmental, geographical, or mobility data. RUDI is a cornerstone of Rennes' smart-city strategy.
RUDI aims to give all metropolitan actors (local authorities, service operators, non-profit associations, as well as public and private bodies) the means to publish and share their data with the community. One key objective is to significantly increase the number of data producers through a rapid, easy-to-handle, inexpensive implementation based entirely on open source.
A central question in RUDI is therefore: “What does it mean to become a data producer, and what tools should be provided?” This question reflects the need for a solution that suits a population with very heterogeneous digital skills and that lets data providers keep full control of their data. In this context, Fenix contributes to the study and specification of the RUDI infrastructure solution for data publication.
From a technical point of view, RUDI is based on two main components: a portal that offers a global catalogue of metadata, and producer nodes that facilitate the publication of data. One of the keys to increasing the publication of new data lies in how efficiently and simply a new member of the RUDI federation can deploy new data nodes.
Data producers can be heterogeneous. Some may be occasional providers with little data (for example, an association); others may be major operators in mainland France (for example, a transport network manager) with large volumes and real-time constraints. In the first case, the technical means of the producer may be limited, while in the second case the support team might be able to install and deploy more sophisticated tools, such as virtual machines.
Figure: RUDI architecture overview
RUDI, an open, distributed architecture
The star-shaped organisation implies that producer nodes are deployed for different usages, with the portal at the center. The portal acts as a single point of reference for the Rennes metropolitan area. The producer nodes make it possible to register and record metadata, as well as to store the data itself. They also implement the metadata exchange protocol with the portal. The producer node is shown in Figure 2. The node integrates three main elements: a metadata catalogue, a node manager (console) and a storage platform. These elements can be distributed in order to limit cybersecurity risks. For example, the node manager may be hosted on an internal network, while the catalogue and storage space would typically reside on an open network. Furthermore, the node combines a firewall with a regulatory proxy (HAProxy) for access filtering. The producer node can be deployed from virtual machines or containers. A public API, described with OpenAPI, documents access to metadata and standardizes interoperability with the common portal.
Figure: Architecture of a producer node in RUDI
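The division of roles described above (data stored on the producer node, metadata shared with the portal) can be illustrated with a minimal Python sketch. It is not the RUDI implementation: the class names, fields, and the sample dataset are all hypothetical, and the real metadata exchange goes through the OpenAPI-described interface rather than a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata entry describing one dataset."""
    dataset_id: str
    title: str
    theme: str        # e.g. "mobility" or "environment"
    storage_url: str  # where the data itself lives on the producer node


@dataclass
class Portal:
    """Stand-in for the central portal: a global catalogue of metadata only."""
    catalogue: dict = field(default_factory=dict)

    def receive(self, node_name: str, record: MetadataRecord) -> None:
        # The portal indexes metadata; the data stays on the producer node.
        self.catalogue[(node_name, record.dataset_id)] = record


@dataclass
class ProducerNode:
    """Stand-in for a producer node: local catalogue plus local storage."""
    name: str
    catalogue: dict = field(default_factory=dict)
    storage: dict = field(default_factory=dict)

    def publish(self, record: MetadataRecord, payload: bytes, portal: Portal) -> None:
        self.storage[record.storage_url] = payload  # data is kept locally
        self.catalogue[record.dataset_id] = record  # local metadata catalogue
        portal.receive(self.name, record)           # metadata exchange with the portal


# Illustrative usage with made-up values.
portal = Portal()
node = ProducerNode(name="example-association")
record = MetadataRecord(
    dataset_id="bike-counts-2022",
    title="Bicycle counts",
    theme="mobility",
    storage_url="/data/bike-counts-2022.csv",
)
node.publish(record, b"date,count\n2022-01-01,42\n", portal)
print(sorted(portal.catalogue))
```

In this sketch the portal never holds the payload, which mirrors the goal of letting producers keep full control of their data.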
Deployment of RUDI using Fenix infrastructure
The Fenix infrastructure supports the RUDI project in two ways: first, by assisting R&D on the design of a deployment plan adapted to the metropolitan context in terms of services, security and management; second, by providing a platform for testing the deployed solution. The goal is to set up a long-term experiment with various data producers. The choices made will determine whether the solution also scales to several hundred data producers.
The suggested solution for the deployment of producer nodes should cover many different situations. In the first case, data producers have no solution of their own for deploying data nodes (and may also have little experience in publishing data). In the second case, they already have an in-house solution; here, integrating a RUDI producer node into the company’s network may be complex and may represent a cybersecurity risk. The Fenix infrastructure made it possible to test and validate the architecture of the producer nodes, and to confirm that setting up producer nodes is straightforward.
Covering a wide variety of deployment patterns
The deployment of RUDI on the OpenStack Fenix infrastructure makes it possible to evaluate the operation of producer nodes at different scales, from a single machine handling everything to a combination of servers implementing the different parts of the producer node with high availability. The platform has been designed to support such deployments, but at the cost of a more complex configuration and the need for dedicated tools.
In addition, to support different deployment techniques, several software packaging methods must be combined and tested: different container types (LXD/Docker), different image formats (ISO 9660, OVF), and different auto-configuration mechanisms, such as cloud-init. Other methods, based on Kubernetes in particular, are not yet in place but are under preparation. Finally, the platform takes cybersecurity into account by design, aiming to provide guarantees of system integrity, availability and confidentiality.
Wide dissemination involves real-life operational constraints
Beyond the development of the different deployment mechanisms, an increasing number of nodes have been made available to active or future partners of the project. Each partner has access to a producer node that is deployed and isolated using containers on Fenix nodes, themselves protected by a “bastion” acting as a single access point. End users reach all of this through a single domain name and dedicated subdomains. This setup makes it possible to deploy producer nodes quickly and in an almost fully automated way on the OpenStack Fenix platform, and makes the producer nodes easier to update. The deployment approach also makes it easy, if need be, to redeploy the nodes on a different infrastructure (which should of course be similar to the Fenix infrastructure). As of today, archiving and high-availability functions are not yet enabled for nodes on the platform, but such features can easily be added.
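The addressing scheme described above, one domain name with a dedicated subdomain per partner node, can be sketched as a small routing table. This is only an illustration under stated assumptions: the base domain `rudi.example.org`, the `node-<slug>` container naming, and the helper functions are hypothetical and do not reflect the actual RUDI or Fenix configuration.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional

BASE_DOMAIN = "rudi.example.org"  # hypothetical domain, for illustration only


@dataclass(frozen=True)
class NodeRoute:
    partner: str
    hostname: str   # dedicated subdomain exposed through the single access point
    container: str  # isolated container hosting the partner's producer node


def slugify(name: str) -> str:
    """Turn a partner name into a DNS-friendly label (ASCII, lowercase, hyphens)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def plan_routes(partners: list, base_domain: str = BASE_DOMAIN) -> dict:
    """Assign one subdomain and one container to each partner."""
    routes = {}
    for partner in partners:
        slug = slugify(partner)
        hostname = f"{slug}.{base_domain}"
        if hostname in routes:
            raise ValueError(f"duplicate subdomain for partner {partner!r}")
        routes[hostname] = NodeRoute(partner, hostname, f"node-{slug}")
    return routes


def resolve(routes: dict, host_header: str) -> Optional[NodeRoute]:
    """Return the route for a requested host name, or None if it is unknown."""
    return routes.get(host_header.lower())


routes = plan_routes(["Rennes Métropole", "Audiar"])
for route in routes.values():
    print(route.hostname, "->", route.container)
```

Because the table is generated from the partner list, adding a partner or redeploying on another infrastructure only requires regenerating it, which is one way to read the "almost fully automated" deployment described above.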
To date, thirteen producer nodes have been deployed for partners such as Rennes Métropole (Administration), Audiar (SME), Arkea (Bank), Les Champs Libres (non-profit association), Amplisim (SME), and OpenStreetMap (local independent organisation). More producer nodes are about to be launched, involving partners such as Enedis (large energy distribution network provider), Bruz (township), and Airbreizh (non-profit association).
These partners of the RUDI federation were chosen for the variety of their profiles, both from the point of view of data harvesting and for the constraints implied by the integration with their in-house information system. It is important to note that the heterogeneity of data producers is intrinsic to the development of such a metropolitan open data service. The success of this development depends in part on the ability to deploy producer nodes in an efficient, secure, and transparent manner: open-source software is a key element. This experimental phase is planned to end in May 2023. The nodes hosted on the Fenix infrastructure will then be redeployed on permanent resources.
The IRISA RUDI team would like to thank the TGCC Support team, who helped test and deploy the RUDI services on the Fenix infrastructure.