Migration to a Relational Database

db
#1

Hello,

I am currently taking a course on relational databases in which there is a project component of building our own relational database, I recall that there was some discussion about eventually the future of db.json in this thread. As such, I would like to implement a relational database for Janitor to succeed db.json as a part of my project for the course. I plan to use this thread to share my progress of this project.

My next step is to design the database and get it approved by the professor of the course, which includes:

  • Description
  • Entity-Relationship diagram

There should be little concern about which DBMS to use provided it is relational and mostly compatible/similar to MariaDB, so I was thinking PostgreSQL.

2 Likes

#2

Description

Janitor is a cloud development environment that is a collection of hosts that enable users to collaborate on projects served from a central web app. It also features integration with Microsoft Azure and the ability to run in a development mode for ease in the development of the project. Users are kept informed about the project with a newsletter style blog entries. New users are added to the wait list until there is sufficient resources to add them to the project.

Entity-Relationship diagram

The lines represent a relationship described in the text beside, read like from source to destination (e.g. HOST is home to PROJECT). The connector with the three way split means you can have multiple relationships (e.g. HOST is (note the solid line) home to many PROJECT and PROJECT could (note the dashed line)). I am under the impression that what I placed at the bottom is part of the web app and thus does not have any relationship with the HOST, USER, PROJECT relationship.

Entities and attributes

  • WEBAPP

    • id
    • hostname
    • cookieName
    • ports-http
    • ports-https
    • ports-key
    • ports-crt
    • ports-ca
    • tls-ca-crt
    • tls-ca-key
    • tls-client-crt
    • tls-client-key
    • azure-credentials
    • azure-clientID
    • azure-clientSecret
    • azure-tenantID
    • azure-subscriptionID
    • security-forceHTTP
    • security-forceInsecure
    • discourse-api-key
    • discourse-api-key-username
    • mailer-block
    • mailer-from
    • mailer-host
    • mailer-auth
    • mailer-user
    • mailer-pass
  • AUTHORIZATION (For Janitor)

    • id
    • webappID
    • client
    • date
    • email
    • scope
  • OAUTH2PROVIDER (For third-party services)

    • id
    • webappID
    • secret
    • hostname
    • authorizePath
    • accessTokenPath
    • customHeaders
  • WAITLIST

    • email
    • webappID
    • time
  • HOST

    • id
    • webappID
    • name
    • docker-version
    • letsencrypt-key
    • letsencrypt-email
    • properties-port
    • properties-ca
    • properties-crt
    • properties-key
    • oauth2client-id
    • oauth2client-server
  • PROJECT

    • id
    • hostID
    • name
    • description
    • icon
    • docker-host
    • docker-image
    • docker-path
    • docker-ports
    • docker-logs
  • USER

    • id
    • webappID
    • name
    • settings-notifications
    • settings-newsletter
    • admin
  • EMAIL

    • userID
    • email
  • NOTIFICATION

    • userID
    • date
    • message
    • link
  • CONTAINER

    • id
    • projectID
    • name
    • creation
    • status
    • data
    • docker-container
    • docker-ports
    • docker-status
    • docker-proxy
  • PORT

    • id
    • projectID
    • containerID
    • port
    • proxy
  • CONFIGURATION

    • userID
    • file
    • path
  • NEWS (api call: https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1t~1{id}.json%2Fget)

    • id
    • webappID
    • title
    • slug
    • comments_count
    • comments_url
    • post_body_html
  • EVENT

    • id
    • webappID
    • type
    • payload
    • dueTime
    • scheduledTime
    • emittedTime
    • consumed

A few notes:

  • I added many entities in order to better group together related items
  • I want to reflect the relation of blog posts to the webapp, so I chose NEWS as an entity containing the blog post entries
  • I added a link attribute to NOTIFICATION as I think this could be useful for helping to bring users to the pertinent interface (i.e. Build Complete would link to the C9 container)
  • I was thinking of storing the user’s keys in files referenced by CONFIGURATION
  • Store authorization object from lib/hosts.js and tls from lib/docker.js? (yes, store them)
  • Could we move the log files of a Docker container to a file instead? (yes)
  • We could query the previous port instead of saving the next one, remove this attribute (yes)
  • I was thinking of getting events to replace the new containers-trash field in db.json (create a timestamp)
0 Likes

#3

I think we’ve switched to hostnames being a list (so that you can have multiple, e.g. ['janitor.technology', 'jntr.io']).

Actually just key, and also email.

I don’t think this attribute exists.

user > profile > settings: I think settings is directly under user.

user > machine > admin: What is this field? I think you wanted to place an admin boolean just under user.

user > machine > configurations: That’s also probably just under user.

In general, please have a look in /lib/ to find the code that creates datastructures for the various entities (that way you’ll know what attributes there are exactly today, and of what type).

Otherwise, looks good to me! :+1: Many thanks for working on this! :smile:

1 Like

#4

Yes, that’s true, I envisioned that field to be for a host's name (e.g. pianosa). I think the best way to handle that is to add another entity at the web app level called HOSTNAMES.

Good catches, many thanks :slight_smile:

I plan to update previous to reflect the review.

1 Like

#5

I edited the changes to reflect the recommendations from my professor. It has since been approved for me to work :smiley:

1 Like

#6

I found this about JSON and PostgreSQL: https://severalnines.com/blog/overview-json-capabilities-within-postgresql

0 Likes

#7

For an introductory databases course, you should probably avoid using JSON blobs in your database. They’re basically the schema-integrity equivalent of dropping to cffi or unsafe. If you’re not sure that you need it, then you don’t.

2 Likes

#8

Next steps:

  1. Construct a Functional Dependency Diagram for the attributes
  2. Normalize the database (see https://en.wikipedia.org/wiki/Database_normalization#Normal_forms)
    a. UNF
    b. 1NF
    c. 2NF
    d. 3NF
1 Like

#9

@notriddle What are your thoughts on simply storing the title, post_body_html, slug, comments_count, comments_url (as shown here) instead of the entire response from the Discourse API call?

0 Likes

#10

I don’t object. The only reason why the current version stores the entire response is because I didn’t do anything to filter it, not because I had a particular reason for storing it all.

1 Like