Sending push notifications with Vertx and OneSignal

All good things come to an end: that’s what I thought when Parse announced its shutdown. Back then, Parse was the best “push notifications as a service” solution, offering a REST API to send and schedule cross-platform push notifications through a backend service. Because we used it in one of our products, we had to look for a replacement, and so I finally stumbled upon OneSignal.

Like Parse, it offers cross platform pushes, a ton of clientside SDKs to easily integrate it in various languages and a solid REST-API to trigger push notifications through a backend. And last but not least, it doesn’t cost any money*.

As you might have noticed in my recent posts I use Vertx a lot. So I thought it would be a good idea to write a library that allows you to send push notifications using OneSignal the Vertx-way. You can see this library as a wrapper around OneSignal’s REST-API that gives you compile-time validation instead of a trial and error approach.

Happy pushing!

* According to the documentation of OneSignal, OneSignal makes money from collecting data of your clients. If you’re concerned, they also offer paid service options.

Vertx loves jOOQ

I’ve recently published a library that connects two frameworks I have used a lot in the past: Vertx and jOOQ. Vertx is a “reactive tool-kit for the JVM” that enables you to write non-blocking code and has some nice features built in, like a webserver/socketserver implementation, a message bus that can run on one or more instances in one network, and many more goodies you should check out. On the other hand, if you are coding in Java and you like SQL, there is no way around jOOQ. In my opinion, jOOQ’s two killer features are the possibility to write typesafe SQL and the awesome code generator that generates POJOs, DAOs and Table implementations based on your database schema (schema first!). However, jOOQ uses JDBC under the hood, which blocks the calling thread until the database operation completes.

That is where my library hooks in: it provides a code-generator that adds non-blocking CRUD-methods to all generated DAOs and converter methods that allow you to convert from Vertx’ JsonObject into jOOQ’s POJOs and vice versa.
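To illustrate the general idea of hiding a blocking call behind a non-blocking method, here is a minimal sketch using only the JDK. It is not how the library is implemented: the class BlockingToAsync and its method executeAsync are made up for this example, and the lambda merely stands in for a blocking DAO call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper; not part of jOOQ, Vertx or the library described above.
final class BlockingToAsync {

    // Dedicated pool for blocking work, so the calling thread is never blocked.
    private final ExecutorService workerPool;

    BlockingToAsync(ExecutorService workerPool) {
        this.workerPool = workerPool;
    }

    // Runs the blocking call on the worker pool and returns a future immediately.
    <T> CompletableFuture<T> executeAsync(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, workerPool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingToAsync async = new BlockingToAsync(pool);
        // Stand-in for a blocking, JDBC-backed DAO call.
        CompletableFuture<String> result = async.executeAsync(() -> "row-42");
        result.thenAccept(row -> System.out.println("Loaded " + row)).join();
        pool.shutdown();
    }
}
```

The caller gets a future back right away and registers a callback, while the blocking work happens on a separate thread.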

For code examples, please refer to the github-page.

Vertx + Pebble + Bootstrap = <3

Update


In the meantime, an official implementation of Pebble for Vertx has been released. You should use that dependency instead of my implementation.


It’s been a long time since I wrote something in here. That is mainly because I moved to a project where no Java was involved and I had to code PHP for quite some time. While I did this, I came closer to web development and the Bootstrap library from Twitter. For a new project I decided to switch to Java again, because we had realtime requirements and – to be honest – because I wanted to code in Java again. A lot has happened in between though: Java 8 came out and the whole microservices train moved from theoretical discussions to rock-solid technologies that are used in production. One of them is Vertx, which has been released in its third version.

The Goal

In this blogpost I want to show you how you can write a simple chat application using the technologies mentioned above:

  • Vertx: „A polyglot tool-kit to build reactive applications on the JVM”
  • Bootstrap: „The most popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web“
  • Pebble: „A lightweight but rock solid Java templating engine”

In chat applications there is the requirement that messages have to be pushed from the server to the client. This makes a classic web server stack obsolete since the request-response pattern does not apply. I decided to use WebSockets which are built on top of the well-known HTTP protocol and can be easily implemented on client-side using Javascript.

The WebSocket Server

While I was working on the PHP project, I realized that I didn’t have to care about writing web servers. You just write your code and let servers like Apache or nginx do the rest. But we are back in the Java world, and in the Java world you have to take care of that yourself. This post is not a deep dive into every aspect of vertx, and I will only explain the parts that are necessary to understand it. If you’re hooked, I recommend reading their docs. Please note that you don’t have to copy the code snippets one by one, because I’ve set up a GitHub repo for this purpose.

Line 4: First of all, we need a Router. It is used to route the requests coming from the vertx HttpServer. We define a route and a handler that is responsible for handling that route, so whenever someone calls „/ws/websocket/*“ the handler will be called. The asterisk in this case means that the request could also be „/ws/websocket/asdf“ and would still be handled by that handler.

Line 5: This is how to upgrade a regular request to a WebSocket in vertx (for the sake of completeness: there are also other ways of handling WebSockets in vertx, but I liked this one the most).

Line 7: Here we add a handler that is called whenever a message is received on the WebSocket. We expect messages to be binary encoded JSON. To every message we add a senderId field with a unique identifier for each client and then send it to the event bus. The event bus is used to send messages between several vertx instances – either locally or even clusterwide. But it also works with a single vertx instance, like in our case. Every message needs to have an address, which in our case is „chat.broadcast“. One thing worth mentioning: there is also a send method on the event bus which implies the call of ONE handler, whereas publish will signal all registered handlers.

Line 13: This is how the client will receive messages. We register a consumer on the event bus that is notified whenever a message is published. We evaluate the notification and check the senderId field. If we are not the sender, the message is sent to our client.

Line 21: If something unexpected happens, we want to log it.

Line 22: When the client is closed, we have to clean up everything.

Line 27: Ramp up the HTTP server with the defined route.

That is all the logic needed for handling the client-server communication.
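The difference between publish and send mentioned above can be modelled with a few lines of plain Java. This is a toy model for illustration only, not the Vertx EventBus API: the class ToyEventBus is made up, and the round-robin choice in send is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model for illustration only; this is not the Vertx EventBus API.
final class ToyEventBus {

    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    private final Map<String, Integer> sendCount = new HashMap<>();

    // Registers a handler on an address.
    void consumer(String address, Consumer<String> handler) {
        handlers.computeIfAbsent(address, a -> new ArrayList<>()).add(handler);
    }

    // publish: every handler registered on the address receives the message.
    void publish(String address, String message) {
        for (Consumer<String> handler : registered(address)) {
            handler.accept(message);
        }
    }

    // send: exactly one handler receives the message (chosen round-robin here).
    void send(String address, String message) {
        List<Consumer<String>> candidates = registered(address);
        if (candidates.isEmpty()) {
            return;
        }
        int index = sendCount.merge(address, 1, Integer::sum) - 1;
        candidates.get(index % candidates.size()).accept(message);
    }

    private List<Consumer<String>> registered(String address) {
        return handlers.getOrDefault(address, Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyEventBus bus = new ToyEventBus();
        bus.consumer("chat.broadcast", m -> System.out.println("client A got " + m));
        bus.consumer("chat.broadcast", m -> System.out.println("client B got " + m));
        bus.publish("chat.broadcast", "hello"); // both clients print
        bus.send("chat.broadcast", "hello");    // only one client prints
    }
}
```

For a chat broadcast, publish is the natural fit, because every connected client has its own consumer on the same address.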

The web server

For convenience, I decided to add the client to the same application. This basically means that we need a web server that delivers the HTML document, which has a JavaScript client embedded.

Line 4: We’re working with a router again here. To deliver static content (JavaScript libraries, CSS and anything else ’static’) we need a StaticHandler, which is part of the Vertx library. We need to define a base path in which our files are stored. In our case it is „webroot/resources“, which I’ve created under „src/main/resources/“. (The Router is the same one that we used for the WebSocket server. This is important, because if we created a Router in each Verticle, some requests wouldn’t be served. This is because the HttpServer would call the Routers in a round-robin fashion and therefore not all routes we defined would be known.)

Line 5: Our app will have a fancy favicon, this is how you deliver it.

Line 6: In this example we’re using a templating engine to render our HTML pages. Vertx comes with an out-of-the-box-support of four template engines but I wanted to give pebble a try. The main reason for using Pebble is that I like the templating syntax and it gives a nice performance (https://github.com/mbosecke/template-benchmark) too. Therefore I created my own PebbleTemplate-Engine and added it to the TemplateHandler. The templates are located under „webroot/templates” in this example. 

Line 9: Route every request that has not been matched by any other route to our chat_client.html.

Line 10: Ramping up the HTTP server. But wait! Didn’t we use the same port for the WebSocket-server? Yes we did, but vertx is smart enough to combine the routes and handle everything accordingly.

Bring it

We create the server using the good old main method and instantiate a Vertx instance as shown. Vertx has the concept of Verticles that provide „a simple, scalable, actor-like deployment and concurrency model“. One benefit is that you can write code as if it was single threaded, because code is always called from the same event loop. The whole example would also have worked without Verticles, by creating the Router and everything else directly in the main method. Using Verticles, however, encourages you to separate your concerns and can be used as an entry point for developing microservices. Another thing worth mentioning is that Verticles are deployed asynchronously, so the main thread does not block until they are deployed.

The Template

The client is written in JavaScript and embedded into an HTML-document so let’s first look at the document-layout. 

This is the skeleton from which every page will inherit and is basically the start-template provided by bootstrap, however divided into several parts. The include-keyword includes the content of the given source into this document like you would expect (one has to prepend a dot followed by a slash to the source-name in order to make the loading work (at least on my mac)). The HTML itself is pretty basic but I’ll explain the used element classes that come with bootstrap:

  • container-fluid describes a container spanning over the whole document
  • rows are used to define a horizontal group of columns
  • col-md-offset-1 describes that we’ll have 1 column padding on each side without content. The col-md-10 describes a column that has a size of 10/12 of the width of the enclosing row-div, since rows are divided into 12 md-columns. If we added two divs, each having the class col-md-5, we would end up with two columns in one row (plus the two offset columns). Because bootstrap is mobile first, this is responsive and automatically scales well on all kinds of devices. We could also define the classes using col-xs-x or col-sm-x instead of col-md-x to have different layouts for extra-small (phones) or small (tablets) devices. For further reading I recommend the bootstrap-docs (http://getbootstrap.com/css/#grid).
  • the last div has the starter-template class which basically moves the content below the navigation header.

Now we are coming to another interesting part of the template engine. We define a block called ‚content‘ which can be overridden by any page that extends this document. (In our example we only have one page – the chat application – which makes the usage of templates a bit obsolete, but that is another story ;)). We could also add some default values, but in this case it is not necessary. The key takeaway is that you can write all the boilerplate code in one file, and when writing a new page you just have to add the parts that are new.

The Client

By using the extends keyword we tell Pebble to inherit from that document. We’re overriding the content block with straightforward HTML. At the bottom of the page, the WebSocket client logic is defined in JavaScript. On Line 45 we’re opening a connection to localhost. Because the server expects and sends binary frames, we set the binaryType accordingly. The rest of the code is not very hard to understand and should be self-explanatory.

The Test

I hosted the complete project on GitHub. The project requires maven and Java 8 to be built properly. After you cloned the project, run Main.java in the IDE of your choice, open two tabs in your browser, point to localhost:8080, select a username and start chatting! As you can see, messages are instantly pushed to the other clients – the benefit of WebSockets. I left some space for modifications though: right now you can only broadcast messages. But what if you want to send a private message to just one recipient? Maybe you want to personalize your account by uploading avatars or securing them with a password on a separate page/template? If I find the time I will cover this in one of my next blog-posts, but until then: happy coding!

Fault tolerant systems

Recently I listened to a talk about Fault Tolerant Systems given by Uwe Friedrichsen at the Berlin Expert Days. I took some notes, and that is what today’s post is about.

First of all, Friedrichsen distinguished different types of failure:

Crash Failure: Pretty obviously, this means a failure that causes a system or a part of it to crash.

Omission Failure: This kind of failure leads to a system that is unreachable. Although the effect on the user may be the same as with a Crash Failure, the system is still running.

Timing Failure: A good example of a timing failure is a timeout when some resources are requested by a client, e.g. the webserver does not respond within a given timeframe. A reason may be that the system is under load and cannot handle it. Clients reload the page, increase the load of your system and multiple Timing Failures may lead to a system crash.

Response Failure: A failure which returns an unexpected and wrong response to a given request.

Byzantine Failure: A failure which happens arbitrarily and is hard to reproduce or happens only under specific constellations. A race condition caused by concurrent access might be an example here.

Measure, don’t guess!

After this categorization he proceeded to two important metrics when it comes to failures: Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR). MTTF is the time that elapses until a clean and normally behaving system starts to produce failures. If this number is very high, your system can be classified as stable and you are doing a good job. On the other hand, if you know your MTTF, it means that failures are common in your system and it therefore cannot be stable. More important is the MTTR: how long does it take you to repair a broken system and bring it back to a stable state? Of course this depends on multiple factors like overall code quality, etc.
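Both metrics are plain averages, so they are easy to compute once incidents are recorded. The following is a minimal sketch under my own assumptions: the class FailureMetrics and its Incident type are hypothetical, and the incident numbers are made up for the example.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; the incident numbers below are made up.
final class FailureMetrics {

    // One incident: how long the system ran cleanly before it failed,
    // and how long it took to bring it back to a stable state.
    static final class Incident {
        final Duration uptimeBeforeFailure;
        final Duration timeToRecover;

        Incident(Duration uptimeBeforeFailure, Duration timeToRecover) {
            this.uptimeBeforeFailure = uptimeBeforeFailure;
            this.timeToRecover = timeToRecover;
        }
    }

    // MTTF: average clean running time before a failure.
    static Duration meanTimeToFailure(List<Incident> incidents) {
        requireNonEmpty(incidents);
        Duration total = Duration.ZERO;
        for (Incident incident : incidents) {
            total = total.plus(incident.uptimeBeforeFailure);
        }
        return total.dividedBy(incidents.size());
    }

    // MTTR: average time needed to recover from a failure.
    static Duration meanTimeToRecovery(List<Incident> incidents) {
        requireNonEmpty(incidents);
        Duration total = Duration.ZERO;
        for (Incident incident : incidents) {
            total = total.plus(incident.timeToRecover);
        }
        return total.dividedBy(incidents.size());
    }

    private static void requireNonEmpty(List<Incident> incidents) {
        if (incidents.isEmpty()) {
            throw new IllegalArgumentException("at least one incident is required");
        }
    }

    public static void main(String[] args) {
        List<Incident> incidents = Arrays.asList(
                new Incident(Duration.ofHours(100), Duration.ofHours(2)),
                new Incident(Duration.ofHours(200), Duration.ofHours(4)));
        System.out.println("MTTF in hours: " + meanTimeToFailure(incidents).toHours());  // 150
        System.out.println("MTTR in hours: " + meanTimeToRecovery(incidents).toHours()); // 3
    }
}
```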

Keep it simple – it will fail

Friedrichsen then advised us to follow three principles when creating a system that should be tolerant of failures:

  • KISS
  • Design for failure
  • Design incrementally

KISS is an abbreviation for Keep it simple, Stupid! It basically means that you should design every part of your system as simply as possible. Yes, you are a skilled coder who can combine three different design patterns to solve the problem and keep it extensible. However, this has (at least) two disadvantages: mostly you ain’t gonna need the extensibility (YAGNI – google it), and when it comes to failures, whoever fixes them has to know why you implemented it the way you did (you documented it – right?). The MTTR will be shorter when the person who has to solve the problem only has to look into one class and not into different packages, modules or even frameworks (small sidenote: I don’t advise you to write a system as one class with 10K lines of code)!

The Design For Failure-principle goes hand in hand with KISS: you can be certain that your system will fail. I’ll guarantee it will. When you respect that and design your system that way it will be easier to fix problems when they arise. It also means that you think about failures (in Java: Exceptions) and how to handle them, when they are thrown.

I think the last point in the list – Design Incrementally – means that you have to break the task into smaller parts, implement and test those. When you do it that way you can test subsystems early instead of a Pandora’s Box after several months of coding the whole application.

Detecting errors is also a critical part. The larger your application and the more sensitive your data, the more important it becomes to detect errors early. It starts with throwing and logging Exceptions in your code (and not silently ignoring them), and continues with automatic (email) notifications in case of failures and live monitoring of the system by humans. The more time and work you spend here, the less time it will take to recognize and eliminate a failure.

Handle it

Finally Friedrichsen showed some patterns concerning failures:

Redundancy: This is mostly on a hardware level. Having your database replicated to another server in realtime makes it possible to recover very fast in case of a defect. A rollback of a huge MySQL database, on the other hand, can take several hours during which your system will be unavailable. In times of always-on and availability this is mostly unacceptable. Another example comes from aviation: some critical parts in an aircraft are designed redundantly. For example, a calculation is done simultaneously by different systems (different software) to ensure correctness and availability.

Escalation: Back to the code level: when a subsystem (e.g. the database layer) detects an error, like a connection loss to the database, it should escalate that to the business layer as well, so it can react to the new situation, e.g. stop accepting requests and raise monitoring events.

Error Handler: You can forward exceptions to an Error Handler that reacts to the error, for example by just logging it, retrying the operation, performing a rollback, rolling forward or just resetting data.

Shed load: The main idea behind this is that a long response time is worse than rejecting a request. Rejecting is fast and directly visible, whereas a timeout can take a while and feels unresponsive. One can implement this via a gatekeeper that has to be passed by every request. The keeper decides whether to forward the request or reject it, depending on plausible metrics.
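A minimal sketch of such a gatekeeper, assuming the metric is simply the number of requests currently in flight. The class LoadSheddingGate and the response strings are made up for this example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical gatekeeper; the only metric used is "requests currently in flight".
final class LoadSheddingGate {

    private final Semaphore permits;

    LoadSheddingGate(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    // Returns the handler's result, or the rejection response immediately if the gate is full.
    <T> T handle(Supplier<T> handler, T rejectedResponse) {
        if (!permits.tryAcquire()) {
            return rejectedResponse; // reject fast instead of queueing the request
        }
        try {
            return handler.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        LoadSheddingGate gate = new LoadSheddingGate(1);
        // The outer request holds the only permit, so the nested request is rejected.
        String nested = gate.handle(() -> gate.handle(() -> "200 OK", "503 Busy"), "503 Busy");
        System.out.println(nested);                                   // 503 Busy
        System.out.println(gate.handle(() -> "200 OK", "503 Busy")); // 200 OK
    }
}
```

tryAcquire() never blocks, which is the whole point: a rejected client gets its answer right away.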

Marked data: When you know that there was a failure and the data you’re currently working with is corrupt you can mark it with a dirty flag. Go ahead and handle it via routine maintenance (e.g. a script that fixes it or a cronjob).

Small patches: When you’ve fixed a bug, make atomic patches. In that way you minimize the risk of adding new bugs to the system.

Cost/Benefit: Sometimes it is very expensive to fix a bug that does not harm the system. When you weigh the cost of fixing the failure against the damage caused by it, it sometimes makes sense to live with the failure instead of removing it. In my opinion this is a very dangerous approach because of the Broken Windows Theory: once your system is broken, you tend to work less carefully because – hey, the system is already corrupt – why should I give my best?

Timeouts: Instead of waiting infinitely for a resource, you should always consider using a timeout where possible. In the Java world, when working with locks, you can use tryLock(timeout, unit) instead of just lock(), which blocks for an indefinite time if the lock is held by another Thread.
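A minimal sketch of the lock case, using ReentrantLock.tryLock(long, TimeUnit) from the JDK. The class TimedLocking and its method runWithLock are made up for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper around a lock that gives up after a timeout.
final class TimedLocking {

    private final ReentrantLock lock = new ReentrantLock();

    // Runs the action if the lock can be acquired in time; returns false otherwise.
    boolean runWithLock(Runnable action, long timeoutMs) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TimedLocking locking = new TimedLocking();
        boolean owner = locking.runWithLock(() -> {
            // While this thread holds the lock, a second thread gives up after 50 ms.
            Thread contender = new Thread(() -> System.out.println(
                    "contender acquired lock: " + locking.runWithLock(() -> { }, 50)));
            contender.start();
            try {
                contender.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 50);
        System.out.println("owner acquired lock: " + owner);
    }
}
```

The contender prints false and the owner prints true: instead of hanging, the second thread learns after 50 ms that the resource is busy and can react.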

Fail Fast: Some operations are expensive. In a setup where a client requests an expensive operation that afterwards requires a resource which may be unavailable, it is a good idea to check the availability before computing the expensive action. This can be implemented by a guard that checks the availability and reports it to the system that processes the client requests. In case of unavailability, reject the request or handle it in another way.
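A minimal sketch of such a guard. The class FailFastGuard is made up for this example, and the availability check is passed in as a plain BooleanSupplier, which stands in for whatever health check your resource offers.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical guard that rejects work before any expensive computation starts.
final class FailFastGuard {

    private final BooleanSupplier resourceAvailable;

    FailFastGuard(BooleanSupplier resourceAvailable) {
        this.resourceAvailable = resourceAvailable;
    }

    // Checks availability first and only then pays for the expensive operation.
    <T> T compute(Supplier<T> expensiveOperation) {
        if (!resourceAvailable.getAsBoolean()) {
            throw new IllegalStateException("Required resource unavailable, rejecting request");
        }
        return expensiveOperation.get();
    }

    public static void main(String[] args) {
        FailFastGuard guard = new FailFastGuard(() -> false); // resource is down
        try {
            guard.compute(() -> {
                System.out.println("expensive work"); // never reached
                return 42;
            });
        } catch (IllegalStateException e) {
            System.out.println("rejected before doing any work");
        }
    }
}
```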

Fixing bugs

Depending on the quality of your code you’ll find yourself sitting in front of your computer more or less often fixing bugs. It doesn’t matter what kind of genius you are, sooner or later you will recognize that a program you wrote does not behave the way you expected. You can minimize the frequency with unit-tested code, code-reviews, experience and tools (like Findbugs). That will minimize the probability of producing bugs dramatically but like the 1/x-function which never touches the x-axis, the probability will never fall to zero.

Accept the “impossible” scenario

When you recognize a bug by finding Exception stacktraces in your logfiles, or when users report strange, unexpected behavior, two possible scenarios may come to mind:
1.    You can identify the problem instantly because the Exception thrown tells you what happened or because of issues you had related to that bug. You know where and why the bug occurs and you know how to fix it.
2.    The other category of bug is the one that makes you think “It is impossible – that cannot be!”. A NullPointerException is thrown where you are almost certain that Object XY can NEVER be null at this point. Impossible!!!
Resist that thought, because it is wrong and it won’t help you fix the problem. The program is telling you right now that it failed. The impossible happened right before your eyes!

Information is King

When facing a bug from category two and you don’t know directly where the root cause of the bug lies, you often need to gather information. Information about the state of the objects involved, for example. Don’t hesitate to add that information to your log or extend the Exception message. If necessary, make your code ugly to gain that information, deploy and execute it. It doesn’t matter – you’ll wash that code smell away after you have found the bug. Context-based logging might help here: many developers print almost anything on DEBUG level, however when you lower the log level in your live environment, the system will dramatically slow down under load because of IO. So you might think about context-based logging to set a log level on a per-user basis (e.g. when one specific user faces problem XY).
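The per-user idea can be sketched independently of any logging framework. Everything in this example is made up for illustration (the class ContextLogger, its Level enum and the user IDs); a real setup would hook the same decision into the logging library you already use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, independent of any logging framework.
final class ContextLogger {

    enum Level { DEBUG, INFO, WARN, ERROR }

    private final Level globalLevel;
    // Users for whom everything is logged, regardless of the global level.
    private final Set<String> debugUsers = ConcurrentHashMap.newKeySet();

    ContextLogger(Level globalLevel) {
        this.globalLevel = globalLevel;
    }

    void enableDebugFor(String userId) {
        debugUsers.add(userId);
    }

    void disableDebugFor(String userId) {
        debugUsers.remove(userId);
    }

    // A message is logged if it passes the global level or the user is being debugged.
    boolean isLoggable(String userId, Level level) {
        return level.compareTo(globalLevel) >= 0 || debugUsers.contains(userId);
    }

    void log(String userId, Level level, String message) {
        if (isLoggable(userId, level)) {
            System.out.println(level + " [" + userId + "] " + message);
        }
    }

    public static void main(String[] args) {
        ContextLogger logger = new ContextLogger(Level.INFO);
        logger.log("alice", Level.DEBUG, "cache miss"); // suppressed: global level is INFO
        logger.enableDebugFor("alice");
        logger.log("alice", Level.DEBUG, "cache miss"); // printed: alice is being debugged
        logger.log("bob", Level.DEBUG, "cache miss");   // still suppressed
    }
}
```

This keeps the IO cost of DEBUG output limited to the one user whose problem you are hunting.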

Win back control

Either way, you have to communicate with your system at runtime, and you should do this using the JMX API. With JMX you can relatively easily add an MBean holding the parameters you collected to hunt down the bug. Many third-party libs also provide MBeans to see internal parameters or even change them (Hibernate Statistics for instance). When you are using a Staged Event-Driven Architecture with some ExecutorServices, you might be interested in the size of the task queue or the number of running Threads: add an MBean to see their current values (or change them).
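A minimal sketch of such an MBean for a ThreadPoolExecutor, using only the JDK’s javax.management API. The class names, the object name com.example.app:type=ExecutorStats and the stage name are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class ExecutorMonitoring {

    // Standard MBean contract: a public interface named like the implementation plus "MBean".
    public interface ExecutorStatsMBean {
        int getQueueSize();
        int getActiveThreads();
        int getMaxThreads();
        void setMaxThreads(int maxThreads);
    }

    public static final class ExecutorStats implements ExecutorStatsMBean {
        private final ThreadPoolExecutor executor;

        public ExecutorStats(ThreadPoolExecutor executor) {
            this.executor = executor;
        }

        @Override public int getQueueSize() { return executor.getQueue().size(); }
        @Override public int getActiveThreads() { return executor.getActiveCount(); }
        @Override public int getMaxThreads() { return executor.getMaximumPoolSize(); }
        @Override public void setMaxThreads(int maxThreads) { executor.setMaximumPoolSize(maxThreads); }
    }

    // Registers the stats bean under a made-up object name and returns that name.
    static ObjectName register(ThreadPoolExecutor executor, String stageName) {
        try {
            ObjectName name = new ObjectName("com.example.app:type=ExecutorStats,name=" + stageName);
            ManagementFactory.getPlatformMBeanServer().registerMBean(new ExecutorStats(executor), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register MBean for " + stageName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executor =
                new ThreadPoolExecutor(2, 4, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        ObjectName name = register(executor, "database-stage");
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The same attribute is visible in tools like jconsole.
        System.out.println("MaxThreads via JMX: " + server.getAttribute(name, "MaxThreads"));
        executor.shutdown();
    }
}
```

Because MaxThreads has both a getter and a setter, it shows up as a writable attribute, so the pool size can be changed while the system is running.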

Know your enemy

All the information you gathered is useless when you can’t put it into the right context. So you have to know your system. Sounds easy. But is it? Many projects are the sum of the work of different programmers. Code from programmers that are currently in your team, code from inhouse libraries, code from programmers that left the building before it collapsed. When joining an existing team with several thousand lines of code, this can really be an issue. In that case you have to communicate: ask for information, ask for help. Sadly this isn’t the end of the story. Almost every project relies on third-party libraries: IO, Database and Caching for instance. And if worst comes to worst, it is good to know the internals of those libs as well.

I can see dead people

Information can only be collected when the program is running. However, when the program stops running, you have a problem. Perhaps there is a deadlock that prevents the execution of your program. Most deadlocks can be found via jstack. In short, jstack prints the stacktraces of all live threads, including held monitors and detected deadlocks. It helps you understand which monitor is held by which thread. jstack requires the process ID of your (running) Java application. You can retrieve the PID by typing jps -m. Then you can type jstack -l PID > PATH_TO_STORE_YOUR_LOGFILE_INCLUDING_FILENAME to store the thread dump in a file.
When a deadlock occurs, the system often hangs and cannot be shut down properly. In that case you must terminate the program regardless of the consequences. This can be done via the task manager under Windows or with the kill command (don’t forget the -9 option) under Linux.
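Besides the external jstack tool, the JVM exposes deadlock detection programmatically through ThreadMXBean. The following is a minimal sketch of a reporter built on it; the class DeadlockReporter is made up for this example, and how you trigger it (a timer, a JMX operation) is up to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that reports deadlocks from inside the running JVM.
final class DeadlockReporter {

    // Returns a description of all deadlocked threads, or an empty string if there are none.
    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlockedIds = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        if (deadlockedIds == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        // Include held monitors and synchronizers, similar to what a thread dump shows.
        for (ThreadInfo info : threads.getThreadInfo(deadlockedIds, true, true)) {
            if (info != null) {
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = report();
        System.out.println(report.isEmpty() ? "No deadlock detected" : report);
    }
}
```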

Last words

I’m pretty sure I didn’t cover all aspects of the topic. And, to be honest, it is impossible to do this. Because there are so many libraries and so many different ways of solving a problem, there is no magic trick that helps you fix or avoid bugs. However, I recommend that you keep your code simple. The more complex and the more sophisticated your code gets, the more difficult it will be to fix bugs (KISS principle). Always remember this when you’d like to reinvent the wheel again.

One does not simply fix a bug

Useful eclipse templates

When working as a coder, you have to know your tools. And the tool you’re using most is the IDE of your choice. Most Java programmers choose Eclipse as their development platform, because it is free and powerful (I remember the days when JBuilder 2.0 was state-of-the-art – I don’t miss them). In your daily business, you’ll often find yourself writing the same (boilerplate) code snippets:

  • getters and setters
  • equals and hashcode
  • make a class a Singleton
  • create a local logger-instance
  • etc.

As a coder you should hesitate when writing them repetitively and ask yourself: is there a way to automate it? After all, automation is one of the main reasons why computers were invented, and the programs you’re writing do the same things repetitively (and fast). Some of those tasks can be automated with built-in Eclipse features (Source menu), but others cannot. At least not by default.

Eclipse has a cool feature called “Code Templates”, which is accessible via Window -> Preferences (under Java -> Editor -> Templates). “Code Templates” allow you to store code snippets and substitute the code with a meaningful name. Let’s start with a simple example, e.g. having a shortcut to Java’s current time:

  1. Press the New… – Button.
  2. Enter a meaningful name like _currenttime (I prefer adding underscore to the name, to group all my templates).
  3. Add a description if you like.
  4. Write System.currentTimeMillis(); into Pattern-Textbox.
  5. Press OK.

When you’re writing code the templates are part of the auto-completion-feature which is accessible pressing CTRL+SPACE. In future, when you’d like to know the system’s current time, just write _cur and press CTRL + SPACE and then ENTER. Eclipse will replace that with the pattern you’ve written into the template. Here are some useful Code Templates:

Description: Creates the singleton boilerplate-code.

Pattern:


private static final ${enclosing_type} instance = new ${enclosing_type}();

private ${enclosing_type}(){}

public static ${enclosing_type} getInstance(){
    return instance;
}

Description: Creates a static SLF4J-Logger-instance in current class.

Pattern:

${:import(org.slf4j.Logger,org.slf4j.LoggerFactory)}
private static final Logger logger = LoggerFactory.getLogger(${enclosing_type}.class);

Description: Creates a try-finally block for locking-purposes.

Pattern:

lock.lock();
try{
    ${cursor}
}finally{
    lock.unlock();
}

Stackoverflow has many suggestions on this topic. What are your favorite templates?

Dealing with InterruptedException

In my time as a Java programmer I never thought (thoroughly) about the meaning of InterruptedException, although one gets in touch with it very early. To be honest, last week I had to deal with that exception because it kept happening at work: whenever I restarted one of our servers, that exception popped up once in a while (this is one of the errors that never happens on your local (or remote) test environment but on live systems under load, but that’s another story ;-)). The exception was thrown by a low-level database connection pooling library called c3p0, which meant we lost some data. Luckily this isn’t a matter of life and death because I’m working in the gaming industry, but the results aren’t that good either: data loss, inconsistency and perhaps broken user accounts.

To understand why that exception has been thrown, I have to give you a small glimpse into our system architecture. We have two layers, each represented by a ThreadPool. The network and gamelogic layer (layer 1) decodes messages and executes them, and the database layer (layer 2) creates, updates and deletes entities as requested by layer 1. Additionally, for each player representation there exists a database-manager class which collects all entity changes requested by layer 1 and sends them to the database every X seconds. However, when a player logs out or there is a connection loss on the client side, we force this database-manager to commit the changes immediately (because the player may log in again instantly and might get stale data otherwise). Look at the following code snippet taken from that database-manager:

@Override
public boolean await(boolean instant, long maxTimeoutMs)
        throws RamaDatabaseAccessException {
    if (!isScheduled()) {
        return true;
    }
    if (instant) {
        cancelAndExecute();
        return false;
    } else {
        /*
         * Depending on the timeout wait for result or call cancelAndExecute().
         */
        ...
    }
}

@Override
public Object call(){
    /*
     * Invoke some CRUD-operations depending on the scheduled transactions.
     */
    ...
}

When the player logs out, the await() method is called with the instant flag set, which means the scheduled execution of the database-manager should be canceled and executed instantly (invocation of cancelAndExecute()). I will show you two possible ways to implement this method and then discuss both of them.

Solution A:

private void cancelAndExecute() throws Exception{
    if (this.future.cancel(false)) {
        this.call();
    }
}

Solution B:

private void cancelAndExecute() throws Exception{
    if (this.future.cancel(false)) {
        this.getExecutor().submit(this).get();
    }
}

Both solutions have in common that they cancel the scheduled operation and then execute it immediately. Solution A executes the operation in the current Thread and the second solution delegates the operation to an ExecutorService. Obviously A breaks the architecture because the database-operation is executed by the gamelogic-layer. However this was the solution I preferred because only one Thread is involved in that case, the caller Thread. Solution B on the other hand blocks a Thread from the ExecutorService and the caller Thread (because of calling get()).

As I mentioned in the beginning, the InterruptedException was thrown while the caller Thread was waiting for a database connection and the server was shutting down. While the shutdown process is running, our program starts shutting down the network layer to stop communication with the game clients. In that process the releaseExternalResources() method of the external network framework netty is called, which itself invokes shutdownNow() on its associated ExecutorService. shutdownNow() then invokes Thread.interrupt() on all its worker Threads.

So how did the two solutions behave in that scenario? Solution A will throw an InterruptedException when it cannot acquire a database-connection instantly (which happens under load). The database-operation will be interrupted and not be performed. Solution B however will also throw an InterruptedException but with minor impact: B is waiting for another Thread that does the job whereas A does the job itself. The critical task itself will still be executed (because it’s executed by another Thread) and only the process of waiting for the completion of that task (calling get() on the Future in this case) will be interrupted.
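The behavior of Solution B can be reproduced in isolation with plain JDK classes. This is a simplified sketch, not our production code: the class InterruptedWaiterDemo and all names in it are made up, the latches only exist to make the timing deterministic, and the direct call to interrupt() stands in for what shutdownNow() does to its worker Threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo: the waiting thread is interrupted, the submitted task still completes.
final class InterruptedWaiterDemo {

    // Returns true if the task completed although the waiting thread was interrupted.
    static boolean taskSurvivesInterruptedWaiter() {
        ExecutorService databaseLayer = Executors.newSingleThreadExecutor();
        AtomicBoolean committed = new AtomicBoolean(false);
        CountDownLatch taskStarted = new CountDownLatch(1);
        CountDownLatch waiterInterrupted = new CountDownLatch(1);
        try {
            Future<?> task = databaseLayer.submit(() -> {
                taskStarted.countDown();
                waiterInterrupted.await(); // stands in for a slow database operation
                committed.set(true);
                return null;
            });
            Thread caller = new Thread(() -> {
                try {
                    task.get(); // like Solution B: the caller only waits
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the interrupted state
                    waiterInterrupted.countDown();
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            });
            caller.start();
            taskStarted.await();
            caller.interrupt(); // what shutdownNow() does to its worker Threads
            caller.join();
            databaseLayer.shutdown();
            databaseLayer.awaitTermination(5, TimeUnit.SECONDS);
            return committed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("task committed: " + taskSurvivesInterruptedWaiter());
    }
}
```

Only the wait is aborted; the work itself runs on a Thread that nobody interrupted, so it still gets done.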

When catching an InterruptedException you should restore the interrupted-state of the Thread as suggested by this site, so I finally came to the following solution:

Solution C:


private void cancelAndExecute() throws Exception{
    if (this.future.cancel(false)) {
        try {
            this.getExecutor().submit(this).get();
        } catch(InterruptedException i){
            /*
             * May be thrown when netty shuts down. In this case just return and
             * don't wait for the result. (Pending tasks were finished later
             * in the shutdown-hook).
             */
            Thread.currentThread().interrupt();
            logger.info("TransactionManager interrupted while waiting for result. Return without waiting.");
        }
    }
}

Thanks for reading!