Explaining db.json


#1

Why a JSON file?

The Janitor’s servers currently store all their data in a local ./db.json file, using a tiny ad-hoc db.js module to read and write this file.

When a Janitor server starts running, the db.json datastore is loaded into RAM. It is then written back to disk only occasionally, to limit data loss in case of a server crash.

This is by no means an industry-standard database, and we should upgrade to something more robust and scalable in the future (if you know about these things, please post your suggestions here!). It was the simplest system we could implement to efficiently store all of our data with very little complexity.
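To make the load-into-RAM, save-occasionally pattern concrete, here is a minimal sketch. It is not Janitor's actual db.js: the `openStore` function, its `get`/`save` methods, and the write-to-a-temporary-file-then-rename step are all assumptions made for illustration.

```javascript
// Hypothetical sketch of a tiny JSON-file datastore (not the real db.js).
const fs = require('fs');
const os = require('os');
const path = require('path');

function openStore (file) {
  // Load the whole datastore into RAM once, at startup.
  let store = {};
  try {
    store = JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (error) {
    // A missing file just means an empty store; anything else is a real error.
    if (error.code !== 'ENOENT') throw error;
  }

  return {
    // Return an entry, creating it with a default value if it is missing.
    get (key, defaultValue) {
      if (!(key in store)) store[key] = defaultValue;
      return store[key];
    },
    // Dump the in-memory state to disk. Writing to a temporary file first and
    // then renaming it (an assumption of this sketch) avoids leaving a
    // half-written db.json behind if the process dies mid-write.
    save () {
      const temp = file + '.tmp';
      fs.writeFileSync(temp, JSON.stringify(store, null, 2), { mode: 0o600 });
      fs.renameSync(temp, file);
    }
  };
}

// Usage: a throwaway directory stands in for the server's working directory.
const directory = fs.mkdtempSync(path.join(os.tmpdir(), 'db-sketch-'));
const file = path.join(directory, 'db.json');

const db = openStore(file);
db.get('ports', { http: 1080, https: 1443 });
db.save();

// A restarted server would see the saved state again.
const reloaded = openStore(file);
console.log(reloaded.get('ports'));
```

Anything changed after the last `save()` call is lost if the server crashes, which is the trade-off described above.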

What’s in there?

A db.json file usually contains the following entries:

  • "hostname": "localhost" (see explanation)
  • "ports": (see explanations)
    • "http": 1080
    • "https": 1443
  • "security": (see pull request and explanations)
    • "forceHttp": false
    • "forceInsecure": false
  • "mailer": (see email-login’s “mailer” specification)
    • "block": false
    • "from": ""
    • "host": ""
    • "auth":
      • "user": ""
      • "pass": ""
  • "hosts":
    • "localhost": (see host creation code)
      • "properties":
        • "port": "2376"
        • "ca": ""
        • "crt": ""
        • "key": ""
      • "oauth2client": (automatically generated for all new hosts)
        • "id": ""
        • "secret": ""
  • "projects":
  • "users":
  • "waitlist": []
  • "admins":
    • "admin@example.com": true

Examples


The admin interface
#2

I use a dotenv file to do the things that you are trying to do with db.json.


#3

Discourse and bors-ng both use PostgreSQL and it works fine. Not sure why Janitor would want to use anything else.


#4

Thanks for your very helpful suggestions!

I think the most reasonable step forward would be to migrate our lib/db.js to PostgreSQL. Our database probably won’t outgrow a single server anytime soon, so moving to a distributed DB like Cassandra today would be incredibly overkill. This might change in the future if our cluster sees massive and sustained growth and all of its servers need to talk to the one DB, but that would be a good problem to have in a few years, and we’ll definitely see it coming.

However, since our current db.json already works reasonably well for our needs, I probably won’t spend a lot of time migrating it to something more robust just now (unless the circumstances change and we suddenly need to migrate fast). Thanks again!


#5

From an interesting IRC discussion in #janitor today:

16:47:18 espadrine> my favorite modern low-maintenance database is cockroachdb though
16:48:00 espadrine> starting it is a one-line bash command, and it can go from one server to tens of servers
16:48:43 @janx> espadrine: your favourite low-maintenance database is no longer the “single JS variable” ?
16:49:27 espadrine> janx: I should have added, “… that scales beyond a single thread”
16:49:32 @janx> espadrine++


#6

CockroachDB performs notably worse when you run it on a single server. Also, it’s mostly compatible with PostgreSQL, so it wouldn’t be much trouble if you decide to migrate later.


#7

Another interesting IRC discussion in #janitor today:

16:32:40 <philipcristiano> A replicated database would be better
16:33:09 <philipcristiano> Instead of writing to a file, writing to a versioned S3 bucket would be an interesting next step
16:33:31 <philipcristiano> Easier than moving to a relational db
16:34:14 <philipcristiano> Problem at that point is how to handle state across instances of the app
16:35:44 <philipcristiano> S3 versions are pretty transparent to the app. It’s still just a write/read operation. Versions can be used as part of the API, but gives a way to rollback manually and removes the truncation issue

16:34:48 <@ishitatsuyuki> Too much complexity, just use RDB :stuck_out_tongue:
16:36:07 <@janx> ishitatsuyuki: “just use RDB” implies “choose the right RDB technology, schema, and migrate to it”, and we’d still need to solve “how to handle state across instances of the same app” I think
16:36:40 <philipcristiano> A relational DB is a nicer solution though, operationally
16:37:00 <philipcristiano> Transactions would be the way to handle state across instances (for locking)

16:37:46 <@janx> this sounds pretty complex to implement (essentially refactor Janitor into micro-services)
16:39:09 <philipcristiano> I don’t think there is a need for microservices to move the storage layer
16:39:16 <philipcristiano> (for most apps)
16:39:34 <@janx> philipcristiano++
16:39:51 <philipcristiano> It would be replacing any calls to the app state with a call to a database
16:40:05 <philipcristiano> When I’ve done this kind of thing before I do it in a couple of steps
16:40:26 <philipcristiano> 1) Abstract the state store into an object with well defined functions
16:40:48 <philipcristiano> 2) Create a second object that has the same interface as the one I just made
16:41:03 <philipcristiano> 3) swap in the new object + run migrations as needed.
16:41:36 <philipcristiano> 3 might wind up being something like: write to both implementations while a background migration is running. Always read from the first implementation.
16:41:49 <philipcristiano> Once the background migration is done read from the second implementation
16:41:55 <philipcristiano> 4) eventually remove the first implementation
16:42:05 <@janx> philipcristiano: really cool suggestions! thanks
16:42:30 <philipcristiano> Depends on how much downtime you can get away with.
16:43:16 <philipcristiano> Operationally, a common database is nice as it follows an expected pattern for how to manage the app. Backups can follow the normal way postgres would be backed up, for example

16:42:55 <@janx> I guess that for us, 1) implies refactoring how db.js works, because right now it loads a JSON into RAM, and the in-memory object is our state (and db.save() are just file dumps)
16:43:28 <@janx> so we’d have to either replace any state-alterations by clearly defined functions, or have some sort of ORM do that for us
16:44:05 <@janx> philipcristiano: I really like the idea of having several Janitor front-ends talking to the same DB, thus having the same state
16:45:51 <@janx> makes app containers stateless and somewhat interchangeable
16:46:08 <philipcristiano> It might be a bunch of effort to actually make the changes, and can wind up with more errors as it becomes even more of a distributed system
16:47:26 <@janx> true, and we don’t have a 99.9999% SLA, so can stomach downtime and reap the complexity savings
16:48:37 <philipcristiano> We’ll see what happens for moz. I would wind up being paged if something breaks and it blocks devs
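The four migration steps philipcristiano outlines in the log above could be sketched roughly as follows. Every name here (`MemoryStore`, `MigratingStore`, `migrate`) is hypothetical, and a second in-memory store stands in for whatever real backend (for example PostgreSQL) would eventually be used.

```javascript
// Steps 1 and 2: two stores behind the same small interface (get, set, keys).
// MemoryStore stands in for both the current in-RAM JSON state and the future
// backend; a real second implementation would talk to a database instead.
class MemoryStore {
  constructor (data = {}) { this.data = data; }
  async get (key) { return this.data[key]; }
  async set (key, value) { this.data[key] = value; }
  async keys () { return Object.keys(this.data); }
}

// Step 3: write to both implementations while a background migration runs,
// and read from the old one until that migration has finished.
class MigratingStore {
  constructor (oldStore, newStore) {
    this.oldStore = oldStore;
    this.newStore = newStore;
    this.migrated = false;
  }

  async get (key) {
    return (this.migrated ? this.newStore : this.oldStore).get(key);
  }

  async set (key, value) {
    await this.oldStore.set(key, value);
    await this.newStore.set(key, value);
  }

  // Copy every existing entry across, then switch reads to the new store.
  // Caveat: a real backend would need locking or transactions so that a
  // concurrent set() cannot land between the read and the copy below.
  async migrate () {
    for (const key of await this.oldStore.keys()) {
      await this.newStore.set(key, await this.oldStore.get(key));
    }
    this.migrated = true;
  }
}

async function demo () {
  const oldStore = new MemoryStore({ hostname: 'localhost' });
  const newStore = new MemoryStore();
  const db = new MigratingStore(oldStore, newStore);

  await db.set('waitlist', []);            // goes to both stores
  const before = await db.get('hostname'); // still served by the old store
  await db.migrate();
  const after = await db.get('hostname');  // now served by the new store

  return { before, after, copied: await newStore.keys() };
}

demo().then((result) => console.log(result));
```

Step 4 then amounts to replacing `MigratingStore` with the new store directly and deleting the old implementation. The rest of the app only ever sees `get` and `set`, which is the "clearly defined functions" refactoring mentioned for db.js.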