BEMOSS Data Management
BEMOSS™ uses two databases, Cassandra and PostgreSQL, to handle its data needs. These databases and their use in BEMOSS™ are discussed below.
Cassandra
Cassandra is an open-source, distributed NoSQL database system suited to high-frequency, high-volume data. It is used by large organizations such as Netflix, eBay, Apple and Comcast to manage large amounts of active data. It is popular because of its familiar SQL-like query language, the Cassandra Query Language (CQL), its linear scalability, its high performance (measured in operations per second per node) and its fault tolerance.
Data Storage and Distribution Architecture in Cassandra
Cassandra is a distributed database: multiple Cassandra instances (called Cassandra nodes) can run simultaneously on different machines and act together as a single large, distributed database. The data storage hierarchy in Cassandra is Database -> Keyspace -> Table -> Row -> Column. Each Cassandra system has a single database, but it can hold any number of keyspaces, tables and rows (limited only by system capacity) to organize data. Cassandra uses a partitioned row store architecture: the rows of a table are distributed across the nodes, i.e. distribution is handled at the row level. Although the rows of a table are stored on multiple physical machines, Cassandra handles this distribution transparently; to the end user it appears as a single large database, with all data accessible from any node. If a user queries a node for data that the node does not hold, that node contacts the other nodes, fetches the data and returns it to the user. All of this happens in the background, so the user does not need to manage it. Each row in a table must contain a partition key; the Murmur3 hash of this key (computed by Cassandra's default Murmur3Partitioner) determines which node in the cluster stores the row, which makes it very efficient to find where any given row is stored.
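To make the key-to-node mapping concrete, here is a minimal Python sketch of a token ring. It is an illustration under stated assumptions, not Cassandra's implementation: it hashes keys with MD5 from the standard library (the hash used by Cassandra's older RandomPartitioner) instead of Murmur3, gives each node a single token, and `TokenRing`, `token_for` and the node names are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a token on the ring.

    Stand-in hash: MD5 (as in Cassandra's older RandomPartitioner) rather than
    Murmur3, which Python's standard library does not provide.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # 128-bit non-negative integer


class TokenRing:
    """Toy ring in which each node owns the range (previous token, its token]."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so the ring can be binary-searched.
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def owner(self, partition_key: str) -> str:
        """Return the node whose token range contains the key's token."""
        tok = token_for(partition_key)
        # First node whose token is >= the key's token; past the end, wrap to index 0.
        i = bisect.bisect_left(self.tokens, tok) % len(self.tokens)
        return self.nodes[i]
```

For example, `TokenRing({"node-a": 2**126, "node-b": 2**127, "node-c": 2**128 - 1}).owner("device_42")` always returns the same node for the same key. Because every node only needs the sorted token list, any node can locate a row's owner locally with a binary search, which is why the lookup is cheap.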
Data Replication in Cassandra
Another fundamental feature of Cassandra is data replication. As described above, the hash of the partition key determines the node on which a given row is stored. Depending on the replication factor, the same row is also stored redundantly on other nodes. The replication factor is set at the keyspace level and specifies the total number of nodes that hold a copy of each row; for example, a replication factor of 2 means every row is stored on two different nodes. Replication can follow one of two strategies: SimpleStrategy or NetworkTopologyStrategy. SimpleStrategy treats all nodes equally and places replicas on successive nodes around the cluster ring. NetworkTopologyStrategy is intended for clusters that span multiple data centers, with multiple racks per data center, where special replication rules are needed to distribute data both within and across data centers. BEMOSS™ deployments use SimpleStrategy.
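The placement rule of SimpleStrategy can be sketched in a few lines of Python. This is a simplified model, assuming one token per node (real clusters usually assign many tokens, or vnodes, per node) and MD5 as a stand-in for Murmur3; `simple_strategy_replicas` is a hypothetical helper, not a Cassandra API.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Stand-in hash (MD5); Cassandra's default partitioner uses Murmur3.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def simple_strategy_replicas(node_tokens: dict[str, int], key: str, rf: int) -> list[str]:
    """Return the nodes holding copies of `key`, primary replica first.

    Models the idea behind SimpleStrategy: the primary replica is the node whose
    token range contains the key, and the remaining rf - 1 copies go to the next
    nodes clockwise around the ring.
    """
    ring = sorted((tok, node) for node, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(rf, len(ring))  # cannot place more copies than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With `rf=2` on a three-node cluster, every row lives on two distinct nodes, so any single node can fail without losing data.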
Use of Cassandra in BEMOSS™
BEMOSS™ uses Cassandra to store time-series data for various device parameters. Each device agent queries its device to check whether any parameters (e.g. temperature, heating setpoint, on/off status, power) have changed and, if so, stores the device parameters with a timestamp in the database. This lets the user query the historical usage patterns of devices or view historical trends in sensor readings. It also provides a platform for intelligent applications that process this historical data and automatically control devices to optimize user comfort and/or energy efficiency, or alert the user when suspicious trends indicating a device failure are detected.
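The change check described above might look like the following sketch. It assumes readings arrive as a dictionary mapping parameter names to values; the function names and parameter keys are hypothetical and are not taken from the BEMOSS™ agent code.

```python
from datetime import datetime, timezone
from typing import Any, Optional

_MISSING = object()  # sentinel so a newly appearing parameter always counts as changed


def changed_parameters(previous: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Return only the parameters whose values differ from the last stored reading."""
    return {
        name: value
        for name, value in current.items()
        if previous.get(name, _MISSING) != value
    }


def reading_to_store(
    previous: dict[str, Any],
    current: dict[str, Any],
    now: Optional[datetime] = None,
) -> Optional[dict[str, Any]]:
    """Return the full reading stamped with a UTC time if anything changed, else None."""
    if not changed_parameters(previous, current):
        return None  # nothing changed, so nothing is written
    stamp = now if now is not None else datetime.now(timezone.utc)
    return {"time": stamp, **current}
```

Writing only on change keeps the time-series tables compact while still recording every transition a user might later want to inspect.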
Cassandra nodes are set up to run alongside the BEMOSS™ multi-node system. The BEMOSS™ core and each BEMOSS™ node should run one Cassandra node, and all of these Cassandra nodes should belong to the same cluster. When BEMOSS™ is launched, the multi-node system simplifies the Cassandra cluster setup by automatically filling in the required values in the Cassandra configuration files and launching the Cassandra nodes if needed. During operation, each device agent connects to the Cassandra node on the machine where the agent runs, using the Cassandra Python driver, and writes time-series data to a table named after the device_id of the device it controls. All such tables are placed in a common BEMOSS™ keyspace named bemossspace, whose replication factor is selected by the user during setup. The bemoss_web_ui connects to the Cassandra node on the core to retrieve this time-series data when the user asks to view a device's historical data. Even though the device's agent may be running on a different node, the time-series data it saves is distributed across all nodes, and a query made at any node gathers the data and returns it to the user.
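The keyspace and per-device tables can be created with ordinary CQL. The sketch below only builds the statements as strings; the `bemossspace` keyspace name and SimpleStrategy come from the text above, while the table columns, the day-based partition key (`date_id`) and the identifier-sanitizing rule in `device_table_name` are assumptions for illustration.

```python
import re

KEYSPACE = "bemossspace"  # keyspace name given in the text


def create_keyspace_cql(replication_factor: int) -> str:
    """CQL that creates the shared keyspace with SimpleStrategy replication."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {KEYSPACE} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )


def device_table_name(device_id: str) -> str:
    """Hypothetical rule: turn a device_id into a valid unquoted CQL identifier.

    Unquoted CQL identifiers must start with a letter and contain only letters,
    digits and underscores; they are case-insensitive, so we lowercase them.
    """
    name = re.sub(r"[^0-9A-Za-z_]", "_", device_id).lower()
    return name if name[:1].isalpha() else "d_" + name


def create_device_table_cql(device_id: str) -> str:
    """CQL for a per-device time-series table (columns are illustrative only)."""
    table = device_table_name(device_id)
    return (
        f"CREATE TABLE IF NOT EXISTS {KEYSPACE}.{table} ("
        "date_id text, time timestamp, temperature double, status text, "
        "PRIMARY KEY ((date_id), time))"
    )
```

With the DataStax `cassandra-driver` package, an agent could execute these through a session connected to its local node, e.g. `session = Cluster(["127.0.0.1"]).connect()` (with `Cluster` imported from `cassandra.cluster`) followed by `session.execute(create_keyspace_cql(2))`. Partitioning each device table by day keeps individual partitions bounded as data accumulates, while `time` as the clustering column keeps rows within a day sorted.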
PostgreSQL
PostgreSQL is a fully open-source object-relational database management system. Both database choices reflect the project's preference for powerful open-source tools. PostgreSQL is used in a wide range of applications to store relational data, and is popular for its easy-to-use interface combined with powerful features.
Database Architecture
A single PostgreSQL server runs on the BEMOSS core, and all agents (through the psycopg2 library) and the Django framework behind the web_ui connect to it. Agents running on the other BEMOSS nodes connect directly to the PostgreSQL server on the core over TCP. To allow this, the PostgreSQL configuration must be updated to accept connections from clients on the same network.
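The configuration change and the agents' connection settings can be sketched as follows. The two settings produced (`listen_addresses` in postgresql.conf and a `host` rule in pg_hba.conf) are standard PostgreSQL options; the subnet, database name, user and password are placeholders, and `md5` is only one of the available authentication methods. The helper names are hypothetical.

```python
import ipaddress


def remote_access_config(core_interface: str, method: str = "md5") -> dict[str, str]:
    """Return sample settings that let agents on the core's subnet reach PostgreSQL.

    core_interface: the core's address with prefix length, e.g. "192.168.1.10/24".
    """
    network = ipaddress.ip_interface(core_interface).network
    return {
        # postgresql.conf: listen on all interfaces instead of only localhost
        "postgresql.conf": "listen_addresses = '*'",
        # pg_hba.conf fields: TYPE DATABASE USER ADDRESS METHOD
        "pg_hba.conf": f"host all all {network} {method}",
    }


def agent_dsn(core_ip: str, dbname: str, user: str, password: str, port: int = 5432) -> str:
    """libpq keyword/value connection string, as accepted by psycopg2.connect().

    Values containing spaces or quotes would need quoting; this sketch assumes they do not.
    """
    return f"host={core_ip} port={port} dbname={dbname} user={user} password={password}"
```

Restricting the pg_hba.conf rule to the core's subnet, rather than allowing every address, keeps the server reachable by BEMOSS nodes without exposing it more widely. A change to `listen_addresses` takes effect only after a server restart, whereas pg_hba.conf changes need only a reload.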
Use of PostgreSQL in BEMOSS
The BEMOSS web interface is built with the Django framework, and the PostgreSQL database is the backing store for its models. In addition, several tables store information about device discovery and control, access management and BEMOSS multi-node system management. These additional tables are used both by the agents on the OS side and by the web UI to store various metadata.
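The sketch below illustrates how one metadata table can be shared between agents and the web UI. To stay self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `device_info` table, its columns and the helper names are hypothetical. With psycopg2 the placeholders would be `%s` instead of `?`, and `INSERT ... ON CONFLICT DO NOTHING` would replace SQLite's `INSERT OR IGNORE`.

```python
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the shared PostgreSQL metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_info ("
        " device_id TEXT PRIMARY KEY,"
        " device_type TEXT NOT NULL,"
        " node_name TEXT NOT NULL)"
    )
    return conn


def register_discovered_device(conn: sqlite3.Connection, device_id: str,
                               device_type: str, node_name: str) -> None:
    """Agent side: record a newly discovered device; already-known devices are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO device_info (device_id, device_type, node_name) "
        "VALUES (?, ?, ?)",
        (device_id, device_type, node_name),
    )
    conn.commit()


def devices_on_node(conn: sqlite3.Connection, node_name: str) -> list[str]:
    """Web UI side: list the devices managed from a given BEMOSS node."""
    rows = conn.execute(
        "SELECT device_id FROM device_info WHERE node_name = ? ORDER BY device_id",
        (node_name,),
    )
    return [row[0] for row in rows]
```

Because both sides read and write the same table, an agent's discovery result becomes visible to the web UI as soon as the transaction commits.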