Migration to a Relational Database

db

#1

Hello,

I am currently taking a course on relational databases in which there is a project component of building our own relational database, I recall that there was some discussion about eventually the future of db.json in this thread. As such, I would like to implement a relational database for Janitor to succeed db.json as a part of my project for the course. I plan to use this thread to share my progress of this project.

My next step is to design the database and get it approved by the professor of the course, which includes:

  • Description
  • Entity-Relationship diagram

There should be little concern about which DBMS to use provided it is relational and mostly compatible/similar to MariaDB, so I was thinking PostgreSQL.


#2

Description

Janitor is a cloud development environment that is a collection of hosts that enable users to collaborate on projects served from a central web app. It also features integration with Microsoft Azure and the ability to run in a development mode for ease in the development of the project. Users are kept informed about the project with a newsletter style blog entries. New users are added to the wait list until there is sufficient resources to add them to the project.

Entities and attributes

  • WEBAPP

    • hostnames
    • ports-http
    • ports-https
    • azure-credentials
    • azure-clientID
    • azure-clientSecret
    • azure-tenantID
    • azure-subscriptionID
    • security-forceHTTP
    • security-forceInsecure
    • mailer-block
    • mailer-from
    • mailer-host
    • mailer-auth
    • mailer-user
    • mailer-pass
  • WAITLIST

    • email
    • time
  • HOST

    • name
    • letsencrypt-key
    • letsencrypt-email
    • properties-port
    • properties-ca
    • properties-crt
    • propertieskey
    • oauth2client-id
    • oauth2client-server
  • PROJECT

    • id
    • name
    • description
    • icon
  • DOCKER

    • host
    • image
    • path
    • ports
    • update
    • logs
    • updateTime
  • USER

    • name
    • emails
    • settings-notifications
    • settings-newsletter
    • admin
  • NOTIFICATION

    • date
    • message
    • link
  • CONTAINER

    • id
    • properties
    • name
    • status
    • docker-container
    • docker-next
  • CONFIGURATION

    • file
    • path
  • NEWS (see https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1t~1{id}.json%2Fget)

    • api-key
    • api-key-username
    • id
    • title
    • fancy_title
    • slug
    • posts_count
    • reply_count
    • highest_post_number
    • image_url
    • created_at
    • last_posted_at
    • bumped
    • bumped_at
    • unseen
    • pinned
    • unpinned
    • visible
    • closed
    • archived
    • bookmarked
    • liked
    • tags
    • views
    • like_count
    • has_summary
    • archetype
    • last_poster_username
    • category_id
    • pinned_globally
    • featured_link
    • has_accepted_answer
    • posters-extras
    • posters-description
    • posters-user_id
    • posters-primary_group_id
    • post_body_html
  • EVENT

    • type
    • payload
    • dueTime
    • scheduledTime
    • emittedTime
    • consumed

Entity-Relationship diagram

The lines represent a relationship described in the text beside, read like from source to destination (e.g. HOST is home to PROJECT). The connector with the three way split means you can have multiple relationships (e.g. HOST is (note the solid line) home to many PROJECT and PROJECT could (note the dashed line)). I am under the impression that what I placed at the bottom is part of the web app and thus does not have any relationship with the HOST, USER, PROJECT relationship.

JanitorDB-E-R%20Diagram

A few notes:

  • It seems to me that the updated field under data is the latest update time, so I removed it.
  • I added many entities in order to better group together related items
  • Do you expect the current GitHub OAuth2 provider implementation in db.json to change?
  • I want to reflect the relation of blog posts to the webapp, so I chose NEWS as an entity containing the blog post entries
  • I added a link attribute to NOTIFICATION as I think this could be useful for helping to bring users to the pertinent interface (i.e. Build Complete would link to the C9 container)
  • Please note, I am still looking over the code to ensure I have accounted for everything

I hope to complete the Functional Dependency Diagram tomorrow along with Normalization up to the 3NF level. I am currently investigating if going further would be beneficial for the project.

Thanks!


#3

I think we’ve switched to hostnames being a list (so that you can have multiple, e.g. ['janitor.technology', 'jntr.io']).

Actually just key, and also email.

I don’t think this attribute exists.

user > profile > settings: I think settings is directly under user.

user > machine > admin: What is this field? I think you wanted to place an admin boolean just under user.

user > machine > configurations: That’s also probably just under user.

In general, please have a look in /lib/ to find the code that creates datastructures for the various entities (that way you’ll know what attributes there are exactly today, and of what type).

Otherwise, looks good to me! :+1: Many thanks for working on this! :smile:


#4

Yes, that’s true, I envisioned that field to be for a host's name (e.g. pianosa). I think the best way to handle that is to add another entity at the web app level called HOSTNAMES.

Good catches, many thanks :slight_smile:

I plan to update previous to reflect the review.


#5

I edited the changes to reflect the recommendations from my professor. It has since been approved for me to work :smiley:


#6

I found this about JSON and PostgreSQL: https://severalnines.com/blog/overview-json-capabilities-within-postgresql


#7

For an introductory databases course, you should probably avoid using JSON blobs in your database. They’re basically the schema-integrity equivalent of dropping to cffi or unsafe. If you’re not sure that you need it, then you don’t.


#8

Next steps:

  1. Construct a Functional Dependency Diagram for the attributes
  2. Normalize the database (see https://en.wikipedia.org/wiki/Database_normalization#Normal_forms)
    a. UNF
    b. 1NF
    c. 2NF
    d. 3NF