Iterations on the Palavyr API - Part 2

~15 minute read

Introduction

The Palavyr chatbot configuration service has evolved a great deal since it was first written in 2020. The Palavyr API currently has nearly 100 endpoints that perform a wide variety of functions, including updating configuration, returning conversation nodes to the Palavyr Widget App, and much more. As it has grown from a single endpoint into the dozens that it is now, the API surface has undergone multiple transformations. In addition, much of the core has been transformed as well.

This is the second part of a 3 part series on how we took the Palavyr API from a prototype to a piece of software that is ready for production.

Part 2: Palavyr Persistence Abstractions

The Palavyr Database

Early in the development cycle of Palavyr, the premature decision was made to split the Palavyr data model into three separate database contexts. The intention was to pre-empt a solution for the traffic differences expected between tables relating to chatbot activity and tables relating to chatbot configuration. The underlying reasoning was that the chatbot activity tables would receive far more traffic than the configuration tables, and should therefore be managed separately from them.

This decision led to a significant challenge when dealing with database entities. Indeed at this point, these entities were not actually considered entities at all… conceptually. They were classes that were loosely exchanged between components and even with the client itself. More on that in a bit.

The Palavyr API currently has nearly 100 endpoints. Each endpoint does something more or less unique, and nearly every endpoint requires interaction with the database. This means that every controller or component (wherever the logic lives) needs to inject the entire database context - boilerplate that must be maintained and is quite often repeated. And since there are three unique contexts, sometimes all three must be injected. (Good gracious… What a lot of work!)

[ApiController]
public class ExampleController : ControllerBase
{
    public DBContextA DBContextA { get; set; }
    public DBContextB DBContextB { get; set; }
    public DBContextC DBContextC { get; set; }

    public ExampleController(
        DBContextA dbContextA,
        DBContextB dbContextB,
        DBContextC dbContextC)
    {
        this.DBContextA = dbContextA;
        this.DBContextB = dbContextB;
        this.DBContextC = dbContextC;
    }

    [HttpPut]
    public async Task<Resource> Put(Resource resource)
    {
        // do something with the resource using all three contexts...
        return resource;
    }
}

Moving to a repository pattern

To compensate for this decision, we introduced a repository pattern. A repository is an abstraction that encapsulates all of the logic required to access a data store. For example, contrast the controller above with this repository-based implementation:

[ApiController]
public class ExampleB : ControllerBase
{
    public Repository Repository { get; set; }

    public ExampleB(Repository repository)
    {
        this.Repository = repository;
    }

    [HttpPut]
    public async Task<Resource> Put(Resource resource)
    {
        // the repository hides the table and query details from the controller
        var entity = await Repository.GetThatRecordFromThatTable(resource.MyProperty);
        if (entity is null)
        {
            throw new DomainException("Resource not found", 404);
        }

        // do something with the entity...
        return resource;
    }
}

Pros: Encapsulating database logic in this way is a convenient way to reduce repetition in your code base. Imagine you need to retrieve a subset of records from one of your tables - and you need to do this in several places for different reasons. If you abstract this logic into a repository method, then you don’t need to reimplement the same database logic over and over. That’s a win.

Cons: On the flip side, a class that encapsulates special snowflake database calls will grow out of control. If a repository class has a method for each special snowflake database call, then after about 10 methods or so, three things become apparent.

First, and most importantly, the repository code is tightly coupled to the database tables they are used to access. Any attempt to access the database via the repository will require writing a method for a specific table and for a specific subset of columns and rows in that table.

Second, it's easy to simply lose track of which methods are even there. Imagine you’re 1.5 years into the project - every time you need to grab some data, you will likely need to spend (too much) time reviewing your repository to see if you ‘have a method for that’. Chances are, you might end up reimplementing that database call. Terrible.

Third, this class is going to become so bloated so quickly that you just might want to give up on the project. Especially if you’re providing an interface for the repository that you have to keep in sync with the implementation. Goodness…

Composition. Compose. Components.

I don’t see these concepts emphasized enough... and I see them emphasized A LOT. These are glorious concepts in software engineering. Composing high cohesion software components (as opposed to creating complex and generally incomprehensible inheritance trees) is an extremely scalable and manageable way to write and maintain software.

Take a moment to imagine yourself breezing through your implementations, plucking a component from here, and another component from there, and assembling them together like legos to form a structure that serves a purpose. And then taking some of those components and mixing them with yet other components. It’s whimsical, almost magical. You can apply this to your database abstractions to make them versatile things.

Taking this concept of composition to the Palavyr persistence strategy, we developed an approach that gave us such composability, with the additional benefit of fitting a unit of work pattern. A unit of work pattern is one where all modifications to the database within the lifetime scope of some action (whether it be a request to an API, or otherwise) are recorded in some sort of tracker, and when that action is complete, all changes to the store are played forward and committed. It turns out that implementing this is quite simple, so long as you know a few tricks to facilitate composability.

Palavyr uses EF Core as its ORM, which interfaces with a Postgres database. Our abstraction therefore sits over EF Core DbContexts. The interface to the abstraction looks like a simple CRUD interface, where you have your Create, Read, Update, and Delete operations. Each method takes an id by which to filter results, as well as a property expression, which effectively allows the method to point to a particular column (to which the id is applied).

To enhance this abstraction, a generic EntityStore class was introduced and registered as an open generic. When you resolve the EntityStore in a constructor (delivered by your container, in accordance with the D in SOLID - Dependency Inversion), you provide an entity type as the generic parameter, which is used to resolve the correct table from the EF Core DbContext. I promised I would talk more about database models - to satisfy the compiler when passing entity types to the EntityStore generic parameter, we specified a common IEntity interface for each entity. In use, it might look something like this:

public class ExampleC
{
    public IEntityStore<MyEntity> myEntityStore { get; set; }

    public ExampleC(IEntityStore<MyEntity> myEntityStore)
    {
        this.myEntityStore = myEntityStore;
    }

    public async Task Method(Resource resource)
    {
        var entity = await myEntityStore.GetOrNull(resource.Id, x => x.MyProperty);
        if (entity is null)
        {
            throw new DomainException("Resource not found", 404);
        }
    }
}

The magic of this approach is that not only does it abstract away all of the database concerns, it does so in a way that is completely composable. And if you find yourself composing the same patterns again and again, then you can lean on extension methods to encapsulate that particular logic, so you don’t have to keep repeating yourself. In the Palavyr API, we compose database calls 99% of the time with only a few extension methods to hold common yet bulky operations.

public static class MySpecificEntityExtensionMethods
{
    public static MyEntity GetThatSpecialThangOrNull(this IEntityStore<MyEntity> myEntityStore, string id, string someValue)
    {
        return myEntityStore.Query()
            .Where(x => x.MyProperty == id)
            .Where(x => x.MyOtherProperty == someValue)
            .FirstOrDefault();
    }
}
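In use, the extension method then reads like any other store method (the argument values here are placeholders):

// composition for the common cases, one extension method for the bulky one
var special = myEntityStore.GetThatSpecialThangOrNull(resource.Id, "someValue");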

One question that might come to mind, considering that we are trying our best to follow SOLID design principles when designing and building this out, is how would we go about registering this with a Dependency Injection container? Palavyr uses Autofac, so I'll show you how we go about using that.

using Autofac;

public class GeneralModule : Module
{
    protected override void Load(ContainerBuilder builder)
    {
        builder
            .RegisterGeneric(typeof(EntityStore<>))
            .As(typeof(IEntityStore<>))
            .InstancePerLifetimeScope();
    }
}
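One detail worth calling out: InstancePerLifetimeScope means that every consumer within a single request resolves the same store instance (and, assuming the underlying DbContext is also scoped per request, the same change tracker). That shared tracker is what makes the unit of work pattern described below possible.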

The store itself implements the IEntityStore interface, which is a generic interface that has some type constraints. In the case of Palavyr, each CRUD method on the store (which will be applied to whichever table is resolved from the store via the generic parameter) takes an id and a property expression. The id is used to filter results, and the property expression is used to select a particular column from the table.

public interface IEntityStore<TEntity> where TEntity : class, IEntity
{
    // expose the underlying IQueryable so extension methods can compose their own queries
    IQueryable<TEntity> Query();

    Task<TEntity> GetOrNull(string id, Expression<Func<TEntity, string>> propertySelectorExpression);
}
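To make that concrete, here is a minimal sketch of what such a store might look like over an EF Core DbContext. This is illustrative only - the single injected DbContext and the predicate construction are assumptions for brevity, not the exact Palavyr implementation:

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class EntityStore<TEntity> : IEntityStore<TEntity> where TEntity : class, IEntity
{
    private readonly DbContext context;

    public EntityStore(DbContext context)
    {
        this.context = context;
    }

    public IQueryable<TEntity> Query()
    {
        // Set<TEntity>() resolves the correct table from the generic parameter
        return context.Set<TEntity>();
    }

    public async Task<TEntity> GetOrNull(string id, Expression<Func<TEntity, string>> propertySelectorExpression)
    {
        // rewrite the property selector into the predicate 'selectedProperty == id'
        // so EF Core can translate the lookup into SQL
        var parameter = propertySelectorExpression.Parameters[0];
        var predicate = Expression.Lambda<Func<TEntity, bool>>(
            Expression.Equal(propertySelectorExpression.Body, Expression.Constant(id, typeof(string))),
            parameter);

        return await Query().FirstOrDefaultAsync(predicate);
    }
}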

This approach is effective because it decouples the database tables from the code used to access them. We have a single generic interface that allows us to use the same methods to access data from any table. That’s a win.
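To illustrate, the same interface serves any entity type. MyOtherEntity and its SomeOtherProperty are hypothetical here, standing in for a second table:

public class ExampleD
{
    private readonly IEntityStore<MyEntity> myEntities;
    private readonly IEntityStore<MyOtherEntity> otherEntities;

    public ExampleD(IEntityStore<MyEntity> myEntities, IEntityStore<MyOtherEntity> otherEntities)
    {
        this.myEntities = myEntities;
        this.otherEntities = otherEntities;
    }

    public async Task LoadBoth(string id)
    {
        // identical method shapes, regardless of which table sits behind the store
        var mine = await myEntities.GetOrNull(id, x => x.MyProperty);
        var other = await otherEntities.GetOrNull(id, x => x.SomeOtherProperty);
    }
}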

Finally, any modification to the database held in memory is tracked deep within the abstraction (actually in this case it is tracked directly by EF Core). Manually committing transactions (doing anything manually for that matter) is something that should be avoided. If you find yourself doing something over and over again, it's time to rethink your design. To manage this finalization action, we close out the Unit of Work that is opened at the beginning of the API action.

The overall flow goes something like this (a rough sketch of the middleware follows the list):

  1. Request is received and passes through middleware
  2. A unit of work middleware receives the data and begins transactions.
  3. Database operations are executed, but nothing is committed.
  4. The action completes and the controller returns the response back to the middleware pipeline.
  5. The unit of work middleware commits everything (for all three contexts) and then passes the response along as it exits the API.
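A minimal sketch of such a middleware, assuming ASP.NET Core conventional middleware and EF Core change tracking (the context names are the illustrative ones from earlier, not Palavyr's real contexts):

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class UnitOfWorkMiddleware
{
    private readonly RequestDelegate next;

    public UnitOfWorkMiddleware(RequestDelegate next)
    {
        this.next = next;
    }

    // scoped services (like EF Core contexts) can be injected into InvokeAsync
    public async Task InvokeAsync(HttpContext httpContext, DBContextA a, DBContextB b, DBContextC c)
    {
        // run the rest of the pipeline; EF Core tracks every modification in memory
        await next(httpContext);

        // the action has completed - play the tracked changes forward and commit
        await a.SaveChangesAsync();
        await b.SaveChangesAsync();
        await c.SaveChangesAsync();
    }
}

Wired up with app.UseMiddleware<UnitOfWorkMiddleware>(), no controller ever has to remember to commit again.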

I’d like to close with a final thought. A great deal of emphasis ought to be put into designing your software such that you get as much behavior for free as you can. Do you have a class that always has an associated action taken, but that action doesn’t really fit within the responsibility of that class? Consider registering a decorator. Do you have logic that needs to ‘listen’ for actions taken from essentially anywhere within your system and respond? Consider implementing a mediator. The one thing you do not want to do when building systems is spend time doing the same thing over and over. You also want to give your system components as few reasons to change as possible. It’s fun at first during the learning journey, but when it comes time to get down to business, it just takes too much time. This is particularly true when dealing with database abstractions, since these are used everywhere. And ain’t nobody got time for that.
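To make the decorator suggestion concrete, Autofac can wrap every resolved store with cross-cutting behavior in a single registration. LoggingEntityStoreDecorator here is a hypothetical class, not something that ships with Palavyr:

// hypothetical: every IEntityStore<T> resolution now gets, say, logging for free
builder.RegisterGenericDecorator(
    typeof(LoggingEntityStoreDecorator<>),
    typeof(IEntityStore<>));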

Thanks very much for reading!

More from me
Iterations on the Palavyr API - Part 1

2022-04-20

Iterations on the Palavyr API - Part 3

2022-05-16