modular system development

Before delving into graph databases, I am going to write down some thoughts on modularity. When starting to develop a prototype system, especially when doing it alone, it is better to focus on the most important new ideas than to waste time on irrelevant things. Besides, systems are inherently wide in scope and include multiple separate parts, so constructing them is time-consuming.

Developing software systems becomes remarkably laborious if one wishes to do everything on their own. Mythical heroic epics of software development reach their climax when, after extensive trials and learning, all existing solutions are found to be lacking in some respect, and in the end everything is developed from scratch. These attitudes are slowly fading into obscurity. The current trend is to develop only the essential parts of the system in-house and to construct the rest from ready-made building blocks. The reason for this change is that the modularity of software has improved and libraries have become better, but also that hardware has become sufficiently capable, so it’s no longer mandatory to squeeze every last processing cycle out of the code. The mindset has also changed: developers are more tolerant of not every detail of the system being exactly just so.

My own web pages are a good example of modular development. I did not want to spend very much time building the platform itself, because that is not at the core of my business. The purpose of the web page is to serve as a marketing tool, a channel for distributing information and a blogging platform, so the relevant task is producing the content. I need servers for various purposes, so using just a web hosting service was not sufficient, but I absolutely didn’t want to waste too much time with hardware either.

As I described in the previous part of this series, I got cloud servers from Nebula for my company. They are virtual servers residing in a datacenter, and they can be brought online and shut down on demand. OpenStack provides an easy-to-use interface for managing the servers in a browser. SSH encryption keys can be used to make the maintenance more secure, and the virtual servers are pre-configured and isolated units that perform only the task appointed to them. The environment can be controlled very tightly, which makes it more manageable also from the security point of view, and resources are used more efficiently.

Large amounts of data and users cannot be handled with single machines. It is also easy to construct clusters of servers from virtual machines, and there are ready-made solutions for balancing the load between the servers in the cluster. For instance, Apache Spark can be used for querying a large mass of data in such a way that multiple machines process the search at the same time, and finally the results are combined. The Neo4j graph database also supports distributing the database among multiple servers. There are also solutions for combining Neo4j and Spark.
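
The split, search and combine pattern described above can be illustrated on a single machine with a minimal Python sketch. This is not Spark code: the thread pool merely stands in for the machines of a cluster, and the names `count_matches`, `split`, `parallel_count` and the sample records are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(partition, keyword):
    """Count the records in one partition that contain the keyword."""
    return sum(1 for record in partition if keyword in record)


def split(records, parts):
    """Split the records into `parts` roughly equal partitions."""
    return [records[i::parts] for i in range(parts)]


def parallel_count(records, keyword, workers=3):
    """Search every partition concurrently, then combine the partial counts."""
    partitions = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker plays the role of one machine in the cluster.
        partial = list(pool.map(lambda p: count_matches(p, keyword), partitions))
    # The final step combines the partial results into one answer.
    return sum(partial)


records = ["graph database", "web server", "graph query", "load balancer", "graph"]
total = parallel_count(records, "graph")
```

In a real cluster the partitions would live on different servers and the framework would take care of distributing the work, but the shape of the computation is the same.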

Cloud servers make hardware modular, expandable and manageable. Docker does the same for software. Docker containers, which can be seen as ‘cargo containers’ for software, work like very lightweight virtual machines intended for running one restricted service. One server, such as a virtual server inside a data center, can run multiple Docker containers at the same time. Each one can contain a part of the system, and these parts can communicate with each other. The tasks and interfaces of each container can be specified in detail.

As an example, on my own web server I have a MySQL database in one Docker container and the WordPress content management system in another. With the Docker Compose tool I can configure this system in one file and run the system with one command. The data in the database and the files used by the content management system can reside on a separate storage volume that can be detached from the cloud server and attached to another. In a few minutes I can set up an identical environment on another virtual server using OpenStack and Docker, attach the storage volume or a backup copy of it, and associate the IP address with the new server. Using the APIs I could in principle automate this procedure into a command script that I could run on my own laptop. Similarly, I could create scripts that increase the resources of the server as the load increases and decrease them when the demand decreases.
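
The decision logic of such a scaling script could look roughly like the following Python sketch. It only computes how many servers should be running. The actual calls to the cloud provider’s API are left out, and the function name `desired_servers`, the load thresholds and the server limits are assumptions made up for this illustration.

```python
def desired_servers(current, load, low=0.3, high=0.8, minimum=1, maximum=8):
    """Return how many servers to run, given the average load (0.0 to 1.0).

    The thresholds and limits are illustrative defaults, not recommendations.
    """
    if load > high:
        target = current + 1   # demand is high: add one server
    elif load < low:
        target = current - 1   # demand is low: remove one server
    else:
        target = current       # load is within the comfortable range
    # Never go below the minimum or above the maximum number of servers.
    return max(minimum, min(maximum, target))
```

A script run at regular intervals could compare the returned value with the number of running servers and then start or stop servers accordingly.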

The modular thinking extends even to individual pieces of software. WordPress was an easy choice for a content management system, because it is very modular and there are lots of plugins available for extending its functionality. For this reason, it is possible to keep the basic system simple and offer it for free. Plugins can be bought to enable advanced features. For a business user it was a very simple and fast solution to install the basic WordPress system in a Docker container running on a virtual server, purchase an easily customizable theme, and enable automatic spam protection, search engine optimization and backups with suitable plugins. Graphical elements and layout instructions I purchased from a professional graphic designer. It took only very little time to set up and customize the website, and I could focus on the actual content production, as I wished. When the site goes down, I get an automatic alert, and I can open an SSH connection to investigate and fix the situation immediately.

When developing my own system, I want to follow the kind of modular thinking described here. On one hand, I want to make my own system modular, so it can be easily adopted and integrated with other systems. On the other hand, I want to use ready-made building blocks in its development, so I can dedicate as much time as possible to the development of the actual core pieces of the system.

Next, I will finally get into graph databases and some of the core topics of my project.

 

project startup

I am now starting a new series of articles, where I showcase my own internal development project. The purpose is to demonstrate my way of working and things I am able to do, and also to present new kinds of ideas for developing digital systems.

The goal of the project is to construct a learning data analysis system that can be used on many different devices and also in web browsers. In the short term, it will be used as a small-scale reference project for presenting and trying out various ideas, but in the longer term I hope to develop a marketable customer product from it. This would of course require growing the company or finding suitable partner companies. However, technology has reached the point where it is possible to create a prototype by myself.

Usually, when I’m starting a new project, I begin with an exploration phase, in which I evaluate various techniques and tools to find the suitable ones for getting the expected results. Usually I also perform some tests and build some quick prototypes to get an intuitive feeling of how the ideas work in practice. I try to avoid getting too attached to tools and techniques that I’m familiar with, and instead try to discover the most suitable options, since I’m not afraid of having to learn new things. Some constraints have to be placed, however, so I can get started quickly. It is not a good idea to do everything with completely new tools.

In this project, the most important constraints arise from my wish to use functional programming and graph databases in the core modules that process data. I will write more about these things later, but this decision was based on my prior experience of functional programming making the development of challenging algorithmic code faster and more manageable as compared with using more traditional programming languages. The data processing and analysis methods that I plan to use are graph-based, and the data to be processed will be very heterogeneous, so graph databases will make data processing easier and more intuitive.

The project must obviously contain server components, since the same data must be accessible from many different devices and web browsers. I do not wish to waste time configuring any hardware, so rented cloud servers are an easy solution. I want to use servers located in Finland, and the service must be scalable so it will be possible to grow the business in the future. Because I work alone, I want to take advantage of readily available modules that have been tested and proven to work. For this reason, the cloud platform must support Docker, which enables using such packaged modular software components.

Based on these constraints, the initial project environment looks like this: the cloud server provider will be Nebula, which is a Finnish company. They can provide me with a scalable service based on OpenStack servers with out-of-the-box Docker support. As the graph database engine I have chosen Neo4j, because it can scale well for future applications, it has existing interfaces to many other platforms and tools, and it has a Docker container available. I was also convinced by the architecture of their graph database, but this I will discuss in more detail in a future article.

Server-side programming I will do initially with Haskell and Yesod, because I’m familiar with them and I can get started quickly using them. For production-level applications I have to look for more mature tools, though, and I have been eyeing the Scala programming language and Apache Spark as the most promising options, but I will look into other tools as well. On the browser side I want to keep things simple and light, and among the tools I’m familiar with, Bootstrap and jQuery look like the best options at the moment. On desktop and mobile applications I don’t want to spend too much time, so Qt and Android are a natural choice since they are very common and I’m familiar with them.

The next step will be setting up the Neo4j database on the server and developing a light prototype application for browsing the database and making small changes. I will report on my progress next week, and also discuss graph databases in general.

Review: The Master Algorithm (P. Domingos)

I recently finished reading The Master Algorithm by Professor Pedro Domingos, which is an overview of the current state and near future of machine learning research, targeted at a general audience. It can be warmly recommended to everyone interested in machine learning and artificial intelligence. It requires no special background knowledge, although a basic understanding of programming, optimization and principles of artificial intelligence is helpful.

The author Domingos works as a professor at the University of Washington in Seattle, and is a very respected authority in the field of machine learning. In my own upcoming doctoral thesis I cite many of his scientific articles. Compared to many other experts in this field, his writing is very clear and easy to understand. The Master Algorithm, published a bit over a year ago, is also very clearly written, and it is targeted at those not familiar with the field and its jargon. A lot of insight can be gained from the book, provided that the reader has the motivation for thinking through the ideas and thought experiments. The book has received some well-earned praise from such important experts as Judea Pearl and Sebastian Thrun. The focus on a general audience means that many complicated ideas in a difficult field have been simplified greatly, but there is an excellent collection of further reading for those who wish to study the details.

Domingos organizes the scholars of machine learning into five ‘tribes’ or schools of thought: researchers and developers relying on either logical deduction, neural networks, Bayesian statistics, genetic algorithms or analogies for creating learning systems. The advantages and disadvantages of the different tribes and their algorithms are described intuitively, and finally the idea of developing a ‘master algorithm’, which combines the best ideas of the five tribes, is proposed and developed further. This kind of ‘general learner’ could accept any kind of data, learn the learnable from it, and apply the learning to anticipate similar situations in the future. The proposed ideas and goals are somewhat realistic, although a bit simplified and overly optimistic.

Today, this topic is naturally very timely and important, and it’s difficult to avoid exposure to it, since the first truly intelligent systems are entering the public stage. Some examples include IBM’s Watson, Google’s AlphaGo, self-driving cars such as those sold by Tesla, and digital assistants such as Apple’s Siri, Google Now, Amazon’s Alexa and Microsoft’s Cortana. Some people probably wonder what kinds of benefits and business opportunities could be enabled by these technologies, while others observe the progress more cautiously and worry about the possible threats and harmful effects.

In the conclusion of his book, Domingos also discusses the social aspects of artificial intelligence, in my opinion very thoroughly. He arrives at very similar optimistic conclusions as I have: artificial intelligence will become an extension of us humans, just like other technologies such as smartphones and search engines have already become. Humans will work in cooperation with intelligent systems and devices. These will make our lives easier in many ways. Of course it is inevitable that there will be less and less economically productive work left for humans to do. This is why we as a society must move towards softer values and be prepared to share more and pay extra for services provided by humans. Both Domingos and I support a basic income that guarantees a certain level of sustenance for everyone.

In my personal opinion, technology should be seen as an achievement of all of humanity. Even though some people may become rich as a result of their own work and entrepreneurship, no one can claim anymore to have achieved everything alone. Behind all success there is the work of generations of humans and the fruits of the global economy. Hard work and risk taking should be rewarded, of course, but wealthy people must be prepared to share more with others as well. The value of a human being cannot be measured by economic measures alone.

With his students, Domingos has developed a relational learner called Alchemy. My own dissertation research is close to this, and I hope to introduce this method in this blog later along with some kind of practical demonstration. Naturally, I wish to do the same with my own methods as well.

This is the first review article I have written, and it starts a new series of blog articles. I hope to keep publishing it monthly.

(Updates 18 Oct 2016 6:24 pm: clarification to sentence structure, some minor corrections)

next steps

During the last week, I have mainly been taking care of various practical things necessary when starting a business, and sketching in my mind a picture of my business practices. Now I thought I would sit down for a while to describe what kinds of things I will discuss in this blog in the near future. My intention is, besides commenting on various current or otherwise important and interesting topics, also to showcase my own activities and skills in the form of various projects. In this article I will sketch out some kind of a map of the terrain I am planning to explore in the near future, and hopefully succeed in sparking the interest of my potential supporters and customers.

books and writings

Because the long term success of my business will be founded on special expertise, I have set aside some time for reading and thinking. The expertise must be maintained and further developed. Naturally also finishing my dissertation will require some time and energy, but this topic I will discuss later. During the following weeks, I am going to write some reviews and commentaries on some important books that I have read and that I can recommend to others as well. Some of these will be more suitable to the general audience, others more technical or mathematical.
To cite a few examples, I will present such authors as Pedro Domingos, David Vernon, David Mumford, Jeff Hawkins, Ray Kurzweil and Marcus Hutter. Central topics will be various ideas and techniques related to artificial intelligence and intelligent systems. I will also write about current topics, news and breakthroughs and comment on them.

projects and experiments

In my previous article I made a somewhat provocative and even boastful claim of being a person who would in principle be able to implement alone a complete system from servers to several client applications. This kind of a person is often called a full-stack developer, though the typical meaning of this term is a bit more narrow and refers to a person with much more experience in such tasks. I do not market myself as a full-stack developer, but I do not wish for this claim to remain as light-hearted bragging, so I will prove it. My plan is to implement a system like the one I described, phase by phase, and document the process in this blog.
The goal of this series of articles is, on the one hand, to make it clearer for someone who doesn’t know this field very well what kinds of things are required to implement modern information systems. On the other hand, I want to describe the rapid development of technology during the last 20 years; after all, I have witnessed first hand the times when modern tools and frameworks were not available, and large parts of the systems had to be coded and configured manually with great effort. Nowadays we are in a situation where it is actually possible for one person to carry out all steps on their own, in reasonable time no less.
In the articles I will discuss topics such as virtual machines, Docker, REST, JSON and Android. I will deviate from the mainstream in that I am going to use functional programming, so I will write about such topics as Haskell and Yesod. All this will, however, be just a framework for the main topic, which is of course learning and intelligent systems. This means that I will also write about relational learning, in addition to neural networks and Bayes nets. Using technologies such as these in business requires a foundation and supporting framework, and I wish to demonstrate that I have sufficient understanding of the relevant techniques. Of course, I will require for my own purposes, such as technical demonstrations, some kind of basic system that is under my own control. I will describe the process of developing this framework in this blog.

what is argomens?

Matti Eskelinen

My name is Matti Eskelinen, and I have just founded this company after working for several years at the university in research and teaching positions. The goal of this short article is to describe the background and purpose of my company, and to kick off this company blog.

Ever since I was a child, I have been enthusiastic about technology and the possibilities it enables, and I have been very eager to adopt all kinds of new things. I learned to program under the influence of Commodore 64 Basic, and to use the internet at the time when the connection was made using a modem and Winsock and the most popular browser was called Mosaic. I became proficient at discovering technical information when the best way of doing this was diligently digging through Usenet newsgroup archives and following links on pages that the primitive search engine Lycos managed to uncover. I have used mobile internet with a Psion 5 connected to a GSM phone with a modem cable, and in my first software development job, during the dotcom boom, I developed server software that transcoded web pages into WAP pages for viewing on, for instance, a Nokia 7110 phone.

I was developing my own blogging platform when the word ‘blog’ didn’t exist yet, and writing smartphone applications when Symbian and the Nokia 6600 were the top of the heap. I shopped online before the turn of the millennium. I have had my Pebble smartwatch in daily use ever since I got my hands on it after the Kickstarter, and I have supported Oculus Rift since day 1. I have been following the Internet of Things boom for some years now and developed some small Arduino-based embedded devices. Raspberry Pi I have used since the first A-version. I’m very interested in wearable technology, mixing the virtual and real worlds, augmented reality, artificial intelligence and learning software.

I am a person who can, in principle, develop all alone an integrated system that works on desktop computers, smartphones and web browsers, and reads information from server software running inside a Docker container on a virtual server, which in turn gets fed information from embedded sensors. I can write a program that reads raw data from files, cameras or embedded devices, processes it, analyzes it and recognizes things from it, and makes decisions based on this analysis. I know how to implement logical reasoners, neural networks, SVM classifiers, Bayes nets, genetic algorithms and probabilistic grammars. I have waded through tens of thousands of pages of scientific articles and books about artificial intelligence and machine learning methods and their underlying philosophy.

I am a dreamer, an idea-maker, an enthusiast and an implementer. I am able to work independently and diligently until I can solve the problem I am working on, or fix the bug that my code is riddled with. I have knowledge and experience from a wide variety of things, and due to my researcher training I am capable of systematic and rational work. I just have never been in a situation where I could concentrate fully on one challenging task at a time and get it finished. I also haven’t had a natural channel for expressing my ideas, and I am a reticent and reclusive person.

 

My company argomens was founded to provide me with the kind of job where I am at my best: as an idea-maker and a solver of challenging technical problems. Over time, I hope it will grow into a company that helps other companies in developing intelligent products that make people’s lives easier, and possibly also employs others. I will work towards this goal by creating networks of companies and people in the business.

I will help companies in … nd solving them in an agile fashion one piece at a time. I am willing to discuss openly and confidentially all kinds of technical topics and … ideas to make them viable to implement in practice. I will charge only a fixed price that we agree upon beforehand, based on the expected amount of time I will spend. I want to make a full-time, professional software development service available in the Jyväskylä region and in all of urban Finland.