I am starting a new series of articles in which I showcase my own internal development project. The purpose is to demonstrate how I work and what I am able to do, and also to present new kinds of ideas for developing digital systems.
The goal of the project is to construct a learning data analysis system that can be used on many different devices and also in web browsers. In the short term, it will serve as a small-scale reference project for presenting and trying out various ideas, but in the longer term I hope to develop a marketable customer product from it. This would of course require growing the company or finding suitable partner companies. However, technology has reached the point where I can create a prototype by myself.
When I’m starting a new project, I usually begin with an exploration phase, in which I evaluate various techniques and tools to find the ones best suited to producing the expected results. I also tend to run some tests and build a few quick prototypes to get an intuitive feeling for how the ideas work in practice. I try to avoid getting too attached to the tools and techniques I’m already familiar with, and instead try to discover the most suitable options, since I’m not afraid of having to learn new things. Some constraints have to be set, however, so that I can get started quickly. It is not a good idea to do everything with completely new tools.
In this project, the most important constraints arise from my wish to use functional programming and graph databases in the core modules that process data. I will write more about these things later, but the decision is based on my prior experience that functional programming makes the development of challenging algorithmic code faster and more manageable than more traditional programming languages do. The data processing and analysis methods that I plan to use are graph-based, and the data to be processed will be very heterogeneous, so I expect graph databases to make data processing easier and more intuitive.
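To make the point about heterogeneous data a little more concrete, here is a minimal sketch in Python. It is not code from the project, and it is not how a graph database works internally; it only assumes a simple in-memory property graph. The `Graph` class, its methods, and the sample nodes and relationship names are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory property graph, for illustration only."""
    # node id -> (label, free-form properties); nodes share no common schema
    nodes: dict = field(default_factory=dict)
    # list of (source id, relationship type, target id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel, target):
        self.edges.append((source, rel, target))

    def neighbours(self, node_id, rel=None):
        """Ids reachable over one outgoing edge, optionally filtered by type."""
        return [t for s, r, t in self.edges
                if s == node_id and (rel is None or r == rel)]


# Three nodes with entirely different properties live side by side.
g = Graph()
g.add_node("s1", "Sensor", unit="celsius", location="roof")
g.add_node("r1", "Reading", value=21.5)
g.add_node("d1", "Document", title="Maintenance log", pages=12)
g.add_edge("s1", "PRODUCED", "r1")
g.add_edge("s1", "DESCRIBED_IN", "d1")

print(g.neighbours("s1"))              # ['r1', 'd1']
print(g.neighbours("s1", "PRODUCED"))  # ['r1']
```

In this sketch the three kinds of nodes do not need a shared table layout, and following a relationship is the same operation regardless of what is at either end. That is the property I am hoping to benefit from.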
The project must obviously contain server components, since the same data must be accessible from many different devices and web browsers. I do not wish to waste time configuring hardware, so rented cloud servers are an easy solution. I want to use servers located in Finland, and the service must be scalable so that it will be possible to grow the business in the future. Because I work alone, I want to take advantage of readily available modules that have been tested and proven to work. For this reason, the cloud platform must support Docker, which makes it possible to use such packaged, modular software components.
Based on these constraints, the initial project environment looks like this: the cloud server provider will be Nebula, which is a Finnish company. They can provide me with a scalable service based on OpenStack servers with out-of-the-box Docker support. As the graph database engine I have chosen neo4j, because it can scale well for future applications, it has existing interfaces to many other platforms and tools, and it has a Docker container available. I was also convinced by the architecture of the database, but I will discuss that in more detail in a future article.
I will initially do the server-side programming with Haskell and Yesod, because I’m familiar with them and can get started quickly. For production-level applications I will have to look for more mature tools, though. I have been eyeing the Scala programming language and Apache Spark as the most promising options, but I will look into other tools as well. On the browser side I want to keep things simple and light, and among the tools I’m familiar with, Bootstrap and jQuery look like the best options at the moment. I don’t want to spend too much time on desktop and mobile applications, so Qt and Android are natural choices, since they are very common and I’m familiar with them.
The next step will be to set up the neo4j database on the server and to develop a lightweight prototype application for browsing the database and making small changes. I will report on my progress next week, and also discuss graph databases in general.