At Witty Works we use, like pretty much the entire industry, quite a lot of open source software. This post is a bit of a follow-up to my previous blog post (sadly, we have so far not made any progress on lobbying for a browser API for custom “spell checkers”).
In this post I want to go over the key components we use, both to share what is working for us and as a small thank-you that gives visibility to these projects and their developers. While the target audience for this post is developers, I added a small glossary at the bottom of the article that explains some key terms non-developers may not be familiar with.
Two pieces are common across our different projects: we use PostHog to collect user analytics and Sentry for logging and performance monitoring. Currently we use the respective cloud versions of these projects, but we are considering eventually switching to the self-hosted options.
A special mention goes out to Jesús Mejuto, who inspired me to write this post. He was very responsive in adding a license to his work underlying German Gender Finder, which we at one time adopted to help detect the gender of words in German (yes, gendered languages exist).
As our core NLP library we use spaCy to understand the structure of text (you can read more details about this in a blog post by Elena). We do quite a bit of customization, so we appreciate that spaCy has extension points that make this possible. We use the pandas library to prepare our custom rules for integration into our API, and fastText for language detection. Most of the code in the API was prototyped in so-called Jupyter Notebooks. For our machine learning efforts we rely on the Hugging Face Transformers library.
Another very important component is LanguageTool, which allows us to offer spelling and grammar checks. We did make quite a few adaptations to its configuration to disable some of its rules that we felt run counter to our objective of inclusive language.
To turn this code into an API we use FastAPI. This was my first Python project, so I appreciate the high-quality documentation along with the concise nature of FastAPI, which is really optimized for the very specific use case of building an API. Redis is used as a local cache of the user configuration from the Dashboard.
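To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern. It is not our actual code: the `ConfigCache` class, the `loader` callback, and the expiry time are all hypothetical, and a plain in-memory dictionary stands in for Redis so that the example runs without any services.

```python
import time
from typing import Callable


class ConfigCache:
    """In-memory stand-in for a Redis cache with per-key expiry (illustrative only)."""

    def __init__(
        self,
        loader: Callable[[str], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader    # hypothetical: fetches the config from the Dashboard
        self._ttl = ttl_seconds  # assumed expiry; the real setup may differ
        self._clock = clock      # injectable so the expiry can be tested
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, user_id: str) -> dict:
        """Return the cached config, reloading it when missing or expired."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        config = self._loader(user_id)
        self._entries[user_id] = (now + self._ttl, config)
        return config
```

With a setup along these lines, repeated API requests for the same user only hit the slower source of truth once per expiry window.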
Our API is written in Python, a popular choice for data science and machine learning projects, which means there is an abundance of libraries and tools available. I want to highlight a few more such projects. Specifically, we use pytest for testing (with a special thank-you for pytest-snapshot).
To support both US and British spelling we use eng to convert rules automatically. To generate grammatically correct alternatives we use several libraries. We just recently added Inflex to help us in English. For German we use german-nouns to detect the word gender and generate nouns in the correct form. We are still looking for a solution for verbs and adjectives to replace our current simplistic algorithm (potentially DEMorphy).
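As an illustration of what converting rules between spelling variants means, here is a small sketch using only the Python standard library. It does not show the API of the library mentioned above: the `US_TO_UK` mapping is a tiny hand-written sample, and the rule format (a dictionary with a `term` key) is an assumption made for this example.

```python
import re

# Hypothetical sample mapping; a real word list would be far larger.
US_TO_UK = {
    "color": "colour",
    "organize": "organise",
    "center": "centre",
}


def to_british(phrase: str, mapping: dict[str, str] = US_TO_UK) -> str:
    """Replace every whole word that has a known British spelling."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = mapping.get(word.lower())
        if replacement is None:
            return word
        # Keep a leading capital letter if the original word had one.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, phrase)


def expand_rule(rule: dict) -> list[dict]:
    """Return the rule plus a British variant when the spelling differs."""
    british = to_british(rule["term"])
    if british == rule["term"]:
        return [rule]
    return [rule, {**rule, "term": british}]
```

The point of generating the variant automatically is that each rule only has to be written once, in one spelling.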
We use the web-extension-starter as the base template for our browser extension, which in turn pulls in quite a few dependencies. We write our components in React using TypeScript. We recently adopted Playwright for testing so that we can run automated tests against various websites.
The Dashboard is built with the popular PHP framework Laravel, more specifically the Jetstream/Socialstream application starter kit with the Livewire UI and the Tailwind CSS framework. This code itself depends on countless libraries from the PHP ecosystem, too numerous to name here. On top of this basis we have adopted several libraries from Spatie along with several other libraries from different authors.
- API: An application programming interface (API) is a way for two or more computer programs to communicate with each other.
- NLP: Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
- Open Source: “Open source is source code that is made freely available for possible modification and redistribution.”