Configuring A Staged Java Application On AWS ElasticBeanstalk


Update

I’ve just blogged about a maven only solution.


In this blog post I’ll describe how to deploy and set up a maven-driven Java application on AWS ElasticBeanstalk with different staging profiles. To understand what’s going on, you should know AWS and how to set up an ElasticBeanstalk environment.

I have set up this github-repository which contains a very basic maven project. Let’s have a look at its pom:

No big surprises here: we set the compiler level to Java 8 and tell maven to build an executable fat-jar called app.jar during package-phase. The main itself just prints the passed arguments in the console as shown below:
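A minimal sketch of such a main class, assuming it is simply called App (the class name and the describe helper are hypothetical, the repository may use different names):

```java
import java.util.Arrays;

// Hypothetical entry point of the fat-jar; "App" is an assumed name.
class App {

    // Formats the passed arguments so that the output is easy to check.
    static String describe(String[] args) {
        return "Arguments: " + Arrays.toString(args);
    }

    public static void main(String[] args) {
        // Prints whatever was passed on the command line, e.g. via the Procfile.
        System.out.println(describe(args));
    }
}
```

Started with java -jar app.jar my_staging_property, this would print the single argument; started without arguments, it prints an empty list.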

Executing mvn clean install should generate the class files and a runnable jar file in the target-folder. One could take this file and upload it to ElasticBeanstalk. We’re done.

Just kidding.

Two problems may arise:

  1. ElasticBeanstalk will reject the file the next time you upload it, because it already has a file called app.jar.
  2. The application wouldn’t print any arguments, because you haven’t passed any to it.

Solving the first problem is quite easy: you can just append the current timestamp (or project version) to the jar. Either manually, or by changing the finalName-tag to something like

app-${project.version}

or

app-${maven.build.timestamp}

However this doesn’t solve the second problem. To tackle this, one must understand how the application is started after it was deployed. On the EC2-instance, next to the uploaded jar-file there is a file called Procfile, which is used to start the jar. By default, it consists of a one-liner:

web: java -jar app.jar

The only thing you need to know regarding the first word web is:

The command that runs the main JAR in your application must be called web, and it must be the first command listed in your Procfile.

The good news is that a user can upload their own Procfile, which leads us to the question: where to put the arguments, especially if you have different arguments for each staging profile? At first sight, maven-profiles look like a good fit, but that solution has two major drawbacks. First, secret information like passwords should not be stored in your pom. Second, each time you change the arguments, you have to rebuild and redeploy the whole application.

A much better approach is to store that kind of data on the instance itself using ElasticBeanstalk’s configuration. To do so, go to your environment, select the application you are using (or create a new one based on Java SE-Platform), click on Configuration, Software Configuration and create two new environment variables: JAVA_OPTS and JAVA_ARGS. Set the value of JAVA_OPTS to something like -Dfile.encoding=utf8 and JAVA_ARGS to my_staging_property. The values are now available on the instance – time to create our own Procfile. To do so, I’ve created a bash-script (Windows users look here) like this:

  • Lines 3 and 4: I decided to append the timestamp to the build artifact to fix the redeploy issue.
  • Line 6, 7 and 8: Our Procfile just executes another shell script, called run.sh. That script uses the environment variables defined above to start the application.
  • Line 9: Zip everything into one file.

So instead of executing maven directly, run ./build.sh from the terminal. This will create a file with a name like app-2017-05-03_14-50-07.zip in the target/ folder of your maven project, which can then be deployed to any staging environment. Configuration is done solely through ElasticBeanstalk, e.g. set JAVA_OPTS to -Xmx256M on your dev-environment and -Xmx16G on production. Changing some parameters (e.g. tuning some GC-settings during a load test) only requires updating your EB-configuration instead of redeploying everything.


The Future Is Here

Getting asynchronous operations right is hard. And when you have to pass the result of one function to the next, it only gets worse. Until today, vertx-jooq’s API, for instance, only allowed asynchronous operations with a callback handler, which leads to code fragments like this:

What annoys me about this is that I have to check in each handler whether the operation succeeded and decide what to do if an exception occurred on the database layer. This looks especially ugly when you have to nest three or more operations. Of course this problem is not new, and there is an alternative approach: using and composing Java 8’s java.util.concurrent.CompletableFuture. By doing so, the same code becomes easier and more readable:
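To illustrate the composing style with plain JDK classes, here is a small sketch. The two lookup methods are hypothetical stand-ins for DAO calls and are not vertx-jooq’s API; they only show how results are chained and how a failure anywhere in the chain ends up in a single error handler.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class FutureComposition {

    // Hypothetical DAO call: resolves a user id from a name.
    static CompletableFuture<Integer> findUserId(String name) {
        return CompletableFuture.completedFuture(name.length());
    }

    // Hypothetical DAO call: fails for the unknown id 0.
    static CompletableFuture<String> findEmail(int userId) {
        if (userId == 0) {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("no such user"));
            return failed;
        }
        return CompletableFuture.completedFuture("user" + userId + "@example.org");
    }

    // Chains both calls; errors from either step are handled in one place.
    static CompletableFuture<String> lookup(String name) {
        return findUserId(name)
                .thenCompose(FutureComposition::findEmail)
                .exceptionally(error -> "lookup failed: " + unwrap(error).getMessage());
    }

    // Dependent stages wrap the original exception in a CompletionException.
    private static Throwable unwrap(Throwable error) {
        return error instanceof CompletionException && error.getCause() != null
                ? error.getCause()
                : error;
    }
}
```

Each additional operation is one more thenCompose call instead of one more level of nesting.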

But using CompletableFuture within the Vertx world leads to a problem: Vertx has its own threading model to achieve the performance it actually has. On the other hand, some methods of CompletableFuture, e.g. CompletableFuture.supplyAsync(Supplier), run tasks on the common ForkJoinPool, which would break the Vertx contract. Open-source software to the rescue: there is a solution to this problem, VertxCompletableFuture. This special implementation guarantees that async operations run on a Vertx context unless you explicitly specify an Executor in one of the overloaded xyzAsync-methods*.

And here comes even better news: starting from version 2, vertx-jooq also supports this way of dealing with asynchronous database operations by utilizing VertxCompletableFuture. Check out the new vertx-jooq-future module and the corresponding code generator to create your CompletableFuture-based DAOs.

* There have been discussions in the Vertx developer group about a CompletableFuture-based API, e.g. here and especially here. The current status is that they do not provide such an API officially, mostly because VertxCompletableFuture breaks the contract of the supplyAsync-methods, since it runs within the Vertx context and not the ForkJoinPool. Also, when you pass this CompletableFuture subclass to code that expects a regular CompletableFuture, it breaks the Liskov substitution principle and OOP (thanks for pointing that out in the comments, Julien). My opinion is that if you are using Vertx, you are aware of the special threading model and can tolerate that behavior. But, of course, it’s up to you.

Sending push notifications with Vertx and OneSignal

All good things come to an end: that’s what I thought when Parse announced its shutdown. Back then, Parse was the best “push notifications as a service”-solution that offered a REST-API to send and schedule cross-platform push notifications through a backend service. Because we used it in one of our products, we had to look for a replacement, and so I finally stumbled upon OneSignal.

Like Parse, it offers cross platform pushes, a ton of clientside SDKs to easily integrate it in various languages and a solid REST-API to trigger push notifications through a backend. And last but not least, it doesn’t cost any money*.

As you might have noticed in my recent posts I use Vertx a lot. So I thought it would be a good idea to write a library that allows you to send push notifications using OneSignal the Vertx-way. You can see this library as a wrapper around OneSignal’s REST-API that gives you compile-time validation instead of a trial and error approach.

Happy pushing!

* According to the documentation of OneSignal, OneSignal makes money from collecting data of your clients. If you’re concerned, they also offer paid service options.

Vertx loves jOOQ

I’ve recently published a library that connects two frameworks I used a lot in the past: Vertx and jOOQ. Vertx is a “reactive tool-kit for the JVM” that enables you to write non-blocking code and has some nice features built in, like a webserver/socketserver-implementation, a message-bus that can run on one or more instances in one network, and many more goodies you should check out. On the other hand, if you are coding in Java and you like SQL, there is no way around jOOQ. In my opinion, jOOQ’s two killer features are the possibility to write typesafe SQL and the awesome code-generator that generates POJOs, DAOs and Table-implementations based on your database schema (schema first!). However, jOOQ uses JDBC under the hood, which blocks the calling thread until the database operation completes.

That is where my library hooks in: it provides a code-generator that adds non-blocking CRUD-methods to all generated DAOs and converter methods that allow you to convert from Vertx’ JsonObject into jOOQ’s POJOs and vice versa.

For code examples, please refer to the github-page.

Vertx + Pebble + Bootstrap = <3

Update


In the meantime, an official implementation of Pebble for Vertx has been released. You should use that dependency instead of my implementation.


It’s been a long time since I wrote something here. That is mainly because I moved to a project where no Java was involved and I had to code PHP for quite some time. While doing so, I got closer to web development and Twitter’s bootstrap-library. For a new project I decided to switch back to Java, because we had realtime requirements and – to be honest – because I wanted to code in Java again. A lot has happened in the meantime though: Java 8 came out and the whole micro-services train moved from theoretical discussions to rock-solid technologies that are used in production. One of them is vertx, whose third version has been released.

The Goal

In this blogpost I want to show you how you can write a simple chat application using the technologies mentioned above:

  • Vertx: “A polyglot tool-kit to build reactive applications on the JVM”
  • Bootstrap: “The most popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web”
  • Pebble: “A lightweight but rock solid Java templating engine”

In chat applications there is the requirement that messages have to be pushed from the server to the client. This makes a classic web server stack obsolete since the request-response pattern does not apply. I decided to use WebSockets which are built on top of the well-known HTTP protocol and can be easily implemented on client-side using Javascript.

The WebSocket Server

While working on the PHP project, I realized that I didn’t have to care about writing web servers. You just write your code and let web servers like apache or nginx do the rest. But we are back in the Java-World. And in the Java-World you have to take care of that. This post is not a deep-dive into every aspect of vertx and I will just explain the parts that are necessary for this post to be understood. If you’re hooked, I recommend reading their docs. Please note that you don’t have to copy the code snippets one by one, because I’ve set up a github repo for this purpose.

Line 4: First of all, we need a Router. It is used to route the requests coming from the vertx-HttpServer. We define a route and a handler which is responsible for handling that route, so whenever someone is calling “/ws/websocket/” the handler will be called. The asterisk in this case means that the request could also be “/ws/websocket/asdf” and would still be handled by that handler.

Line 5: This is how to upgrade a regular request to a WebSocket in vertx (for the sake of completeness: there are also other ways of handling WebSockets in vertx, but I liked this one the most).

Line 7: Here we add a handler that is called whenever a message is received on the WebSocket. We expect messages to be binary encoded JSON. To every message we add a senderId-field with a unique identifier for each client and send it to the event bus. The event bus is used to send messages between several vertx instances, either locally or even clusterwide. But it also works with a single vertx instance like in our case. Every message needs to have an address, which in our case is “chat.broadcast”. One thing worth mentioning: there is also a send-method on the eventbus which results in the call of ONE handler, whereas publish will signal all registered handlers.

Line 13: This is how the client will receive messages. We register a consumer on the eventbus that is notified whenever a message is published. We evaluate the notification and check the senderId-field. If we are not the sender, the message is sent to our client.

Line 21: If something unexpected happens, we want to log it.

Line 22: When the client is closed, we have to clean up everything.

Line 27: Ramp up the HTTP server with the defined route.

That is all logic needed for handling the client-server communication. 

The web server

For convenience reasons, I decided to add the client into the same application. This basically means that we need to have a web server that delivers the HTML-document which has a java-script-client embedded. 

Line 4: We’re working with a router again here. To deliver static content (JavaScript-Libraries, CSS-Content and anything else “static”) we need a StaticHandler which is part of the Vertx library. We need to define a base-path in which our files are stored. In our case it is “webroot/resources” which I’ve created under “src/main/resources/”. (The Router is the same one that we used for the WebSocket-server. This is important, because if we created a Router in each Verticle, some requests wouldn’t be served. This is because the HttpServer would call the Routers in a round-robin-fashion and therefore not all routes we defined would be known.)

Line 5: Our app will have a fancy favicon, this is how you deliver it.

Line 6: In this example we’re using a templating engine to render our HTML pages. Vertx comes with out-of-the-box support for four template engines, but I wanted to give Pebble a try. The main reason for using Pebble is that I like the templating syntax and it performs nicely (https://github.com/mbosecke/template-benchmark) too. Therefore I created my own PebbleTemplate-Engine and added it to the TemplateHandler. The templates are located under “webroot/templates” in this example.

Line 9: Route every request that has not been matched by any other route to our chat_client.html.

Line 10: Ramping up the HTTP server. But wait! Didn’t we use the same port for the WebSocket-server? Yes we did, but vertx is smart enough to combine the routes and handle everything accordingly.

Bring it

We create the server using the good old main-method and instantiate a Vertx-instance as shown. Vertx has the concept of Verticles that provide “a simple, scalable, actor-like deployment and concurrency model”. One benefit is that you can write code as if it were single-threaded, because a Verticle’s code is always called from the same eventloop. The whole example would have worked without Verticles too, by creating a Router and everything else directly in the main-method. Using Verticles, however, encourages you to separate your concerns and can serve as an entry-point for developing microservices. Another thing worth mentioning is that Verticles are deployed asynchronously, so the main thread does not block until they are deployed.

The Template

The client is written in JavaScript and embedded into an HTML-document so let’s first look at the document-layout. 

This is the skeleton from which every page will inherit and is basically the start-template provided by bootstrap, however divided into several parts. The include-keyword includes the content of the given source into this document like you would expect (one has to prepend a dot followed by a slash to the source-name in order to make the loading work (at least on my mac)). The HTML itself is pretty basic but I’ll explain the used element classes that come with bootstrap:

  • container-fluid describes a container spanning over the whole document
  • rows are used to define a horizontal group of columns
  • col-md-offset-1 describes that we’ll have 1 column padding on each side without content. The col-md-10 describes a column that has a size of 10/12 of the width of the enclosing row-div, since rows are divided into 12 md-columns. If we added two divs, each having the class col-md-5, we would end up having two columns in one row (plus the two offset-columns). Because bootstrap is mobile first, this is responsive and automatically scales well on all kinds of devices. Instead of col-md-x we could also use the classes col-xs-x or col-sm-x to have different layouts for extra-small (phones) or small (tablets) devices. For further reading I recommend the bootstrap-docs (http://getbootstrap.com/css/#grid).
  • the last div has the starter-template class which basically moves the content below the navigation header.

Now we are coming to another interesting part of the template engine. We define a block called “content” which can be overridden by any page that is extending this document. (In our example we only have one page – the chat application – which makes the usage of templates a bit pointless, but that is another story ;)). We could also add some default values, but in this case it is not necessary. The key takeaway from this is that you can write all the boilerplate code in one file, and when writing a new page you just have to add the parts that are new.

The Client

By using the extends keyword we tell Pebble to inherit from that document. We’re overriding the content-block with straightforward HTML. At the bottom of the page, the WebSocket-client logic is defined in Javascript. On Line 45 we’re opening a connection to localhost. Because the server expects and sends binary frames, we set the binaryType accordingly. The rest of the code is not very hard to understand and should be self-explanatory.

The Test

I hosted the complete project on GitHub. The project requires maven and Java 8 to be built properly. After you have cloned the project, run Main.java in the IDE of your choice, open two tabs in your browser, point them to localhost:8080, select a username and start chatting! As you can see, messages are instantly pushed to the other clients – the benefit of WebSockets. I left some space for modifications though: right now you can only broadcast messages. But what if you want to send a private message to just one recipient? Maybe you want to personalize accounts by uploading avatars, or secure them with a password on a separate page/template? If I find the time I will cover this in one of my next blog-posts, but until then: happy coding!

Fault tolerant systems

Recently I listened to a talk about Fault Tolerant Systems given by Uwe Friedrichsen at the Berlin Expert Days. I took some notes, and this is what today’s post is about.

First of all, Friedrichsen distinguished between different types of failure:

Crash Failure: Pretty obvious: a failure that causes a system or a part of it to crash.

Omission Failure: This kind of failure leads to a system that is unreachable. Although the effect for the user may be the same as with a Crash Failure, the system is still alive.

Timing Failure: A good example of a timing failure is a timeout when some resources are requested by a client, e.g. the webserver does not respond within a given timeframe. A reason may be that the system is under load and cannot handle it. Clients reload the page, increase the load of your system and multiple Timing Failures may lead to a system crash.

Response Failure: A failure which returns an unexpected and wrong response to a given request.

Byzantine Failure: A failure which happens arbitrarily and is hard to reproduce, or happens only under specific constellations. A race-condition caused by concurrent access might be an example here.

Measure, don’t guess!

After this categorization he proceeded to two important metrics when it comes to failures: Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR). MTTF is the time that elapses until a clean and normally behaving system starts to produce failures. If this number is very high, your system can be classified as stable and you are doing a good job. On the other hand, if you know your MTTF, it means that failures are common in your system and it therefore cannot be stable. More important is the MTTR: how long does it take you to repair a broken system and bring it back to a stable state? Of course this depends on multiple factors like overall code quality, etc.

Keep it simple – it will fail

Friedrichsen then advises us to follow three principles when creating a system that should be tolerant of failures:

  • KISS
  • Design for failure
  • Design incrementally

KISS is an abbreviation for Keep it simple, Stupid! It basically means that you should design every part of your system as simply as possible. Yes, you are a skilled coder who can combine three different design patterns to solve the problem and keep it extensible. However, this has (at least) two disadvantages: mostly you ain’t gonna need the extensibility (YAGNI – google it), and when it comes to failures, one has to know why you implemented it the way you did (you documented it – right?). The MTTR will be shorter when the person who has to solve the problem only has to look into one class and not into different packages, modules or even frameworks (small sidenote: I don’t advise you to write a system as one class with 10K lines of code)!

The Design For Failure-principle goes hand in hand with KISS: you can be certain that your system will fail. I guarantee it will. When you respect that and design your system accordingly, it will be easier to fix problems when they arise. It also means that you think about failures (in Java: Exceptions) and how to handle them when they are thrown.

I think the last point in the list – Design Incrementally – means that you have to break the task into smaller parts, then implement and test those. When you do it that way, you can test subsystems early instead of opening a Pandora’s Box after several months of coding the whole application.

Detecting errors is also a critical part. The larger your application and the more sensitive your data, the more important it becomes to detect errors early. It starts with throwing and logging Exceptions in your code (and not silently ignoring them), and continues with automatic (email) notifications in case of failures and live monitoring of the system by humans. The more time and work you spend here, the less time it will take to recognize and eliminate the failure.

Handle it

Finally Friedrichsen showed some patterns concerning failures:

Redundancy: This is mostly on a hardware-level. Having your database replicated to another server in realtime makes it possible to recover very fast in case of a defect. Restoring a huge MySQL database, on the other hand, can take several hours during which your system will be unavailable. In times of always-on and availability this is mostly unacceptable. Another example comes from aviation: some critical parts in an aircraft are designed redundantly. For example, a calculation is done simultaneously by different systems (different software) to ensure correctness and availability.

Escalation: Back to the code level: when a subsystem (e.g. the database-layer) detects an error, like connection-loss to the database, it should also escalate that to the business-layer so it can react to the new situation, e.g. stop accepting requests and raise monitoring-events.

Error Handler: You can forward exceptions to an Error Handler that reacts to the error, e.g. by just logging it, retrying the operation, performing a rollback, rolling forward or just resetting data.

Shed load: The main idea behind this is that a long response time is worse than rejecting a request. Rejecting is fast and directly visible, whereas a timeout can take a while and feels unresponsive. One can implement this via a gatekeeper that has to be passed by every request. The keeper decides whether to forward the request or reject it, depending on plausible metrics.
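A gatekeeper of this kind can be sketched in a few lines of Java. This is only one possible reading of the pattern, assuming that “number of requests currently in flight” is the metric; the class name Gatekeeper and its methods are made up for illustration.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical gatekeeper: admits at most a fixed number of concurrent requests.
class Gatekeeper {

    private final Semaphore permits;

    Gatekeeper(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    // Runs the request if capacity is left, otherwise rejects it immediately.
    // The request is expected to return a non-null result.
    <T> Optional<T> handle(Supplier<T> request) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // shed load: fast, visible rejection
        }
        try {
            return Optional.of(request.get());
        } finally {
            permits.release();
        }
    }
}
```

An empty Optional tells the caller right away that the request was rejected, which it can translate into an error response instead of letting the client run into a timeout.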

Marked data: When you know that there was a failure and the data you’re currently working with is corrupt you can mark it with a dirty flag. Go ahead and handle it via routine maintenance (e.g. a script that fixes it or a cronjob).

Small patches: When you’ve fixed a bug, make atomic patches. In that way you minimize the risk of adding new bugs to the system.

Cost/Benefit: Sometimes it is very expensive to fix a bug that does not harm the system. When the costs of fixing the failure are weighed against the damage caused by it, it sometimes makes sense to live with the failure without removing it. In my opinion this is a very dangerous approach because of the Broken Window Theory: once your system is broken, you tend to work less carefully because – hey, the system is already corrupt – why should I give my best?

Timeouts: Instead of waiting infinitely for a resource, you should always consider using a timeout where possible. In the Java-World, when working with locks, you can use tryLock(timeout, unit) instead of just lock(), which blocks for an unbounded time if the lock is held by another Thread.
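As a small sketch of the lock example, using java.util.concurrent.locks.ReentrantLock (the helper class TimedLocking and its method are hypothetical names):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimedLocking {

    // Runs the action while holding the lock.
    // Returns false if the lock could not be acquired within the timeout
    // or if the waiting thread was interrupted.
    static boolean runWithLock(ReentrantLock lock, long timeoutMillis, Runnable action) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // give up instead of blocking forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The caller decides what a false result means: retry later, report an error, or escalate.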

Fail Fast: Some operations are expensive. In a setup where a client requests an expensive operation which afterwards requires a resource that may be unavailable, it is a good idea to check the availability before computing the expensive action. This can be implemented by a guard that checks the availability and reports it to the system that processes the client-requests. In case of unavailability, reject the request or handle it in another way.
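A guard like this could look as follows. It is a sketch under the assumption that availability can be checked cheaply through a boolean probe; FailFastGuard and its parameters are illustrative names only.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

class FailFastGuard {

    // Checks the cheap availability probe first and only then runs the expensive work.
    static <T> T compute(BooleanSupplier resourceAvailable, Supplier<T> expensiveOperation) {
        if (!resourceAvailable.getAsBoolean()) {
            // Reject before any expensive computation has started.
            throw new IllegalStateException("required resource unavailable, request rejected");
        }
        return expensiveOperation.get();
    }
}
```

The resource can of course still disappear between the check and its use, so the guard saves wasted work but does not replace error handling further down.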

Fixing bugs

Depending on the quality of your code you’ll find yourself sitting in front of your computer more or less often fixing bugs. It doesn’t matter what kind of genius you are, sooner or later you will recognize that a program you wrote does not behave the way you expected. You can minimize the frequency with unit-tested code, code-reviews, experience and tools (like Findbugs). That will minimize the probability of producing bugs dramatically but like the 1/x-function which never touches the x-axis, the probability will never fall to zero.

Accept the “impossible” scenario

When you notice a bug, either by finding Exception-stacktraces in your logfiles or because users report strange, unexpected behavior, one of two scenarios may come to mind:
1.    You can identify the problem instantly because the Exception thrown tells you what happened or because of issues you had before that are related to that bug. You know where and why the bug occurs and you know how to fix it.
2.    The other category of bug is the one that makes you think “It is impossible – that cannot be!”. A NullPointerException is thrown where you are almost certain that Object XY can NEVER be null at this point. Impossible!!!
Resist that thought: it is wrong, and it won’t help you fix the problem. The program is telling you right now that it failed. The impossible happened right in front of you!

Information is King

When facing a bug from category two and you don’t know directly where the root-cause of the bug lies, you often need to gather information. Information about the state of the objects involved, for example. Don’t hesitate to add that information to your log or to extend the Exception-message. If necessary, make your code ugly to gain that information, deploy and execute it. It doesn’t matter – you’ll wash that code-smell away after you have found the bug. Context-based logging might help here: many developers print almost anything on DEBUG-level, however when you lower the log-level to DEBUG in your live-environment, the system will dramatically slow down under load because of IO. So you might think about context-based logging to set a log-level on a per-user basis (e.g. when one specific user faces problem XY).
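The idea of per-user log-levels can be sketched without any logging framework. The class below is a hypothetical illustration (logging frameworks offer their own mechanisms for this); it writes debug output only for users that were explicitly switched on.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical context-based logger: DEBUG output is enabled per user, not globally.
class ContextLogger {

    private final Set<String> debugUsers = ConcurrentHashMap.newKeySet();

    void enableDebugFor(String userId) {
        debugUsers.add(userId);
    }

    void disableDebugFor(String userId) {
        debugUsers.remove(userId);
    }

    // Writes the message only for enabled users and returns whether it was written.
    // The supplier keeps message construction free of cost for everybody else.
    boolean debug(String userId, Supplier<String> message) {
        if (!debugUsers.contains(userId)) {
            return false;
        }
        System.out.println("[DEBUG][" + userId + "] " + message.get());
        return true;
    }
}
```

Because the set can be changed while the application is running, debugging for the affected user can be switched on and off without touching the global log-level.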

Win back control

In any case, you have to communicate with your system at runtime, and you should do this using the JMX-API. With JMX it is relatively easy to add an MBean holding the parameters you collected to hunt down the bug. Many third-party-libs also provide MBeans to see internal parameters or even change them (Hibernate-Statistics for instance). When you are using a Staged-Event-Driven-Architecture with some ExecutorServices, you might be interested in the size of the task-queue or the number of running Threads: add an MBean to see their current values (or change them).
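For the ExecutorService case, a sketch using only JDK classes might look like this. The names ExecutorMonitoring and PoolStats as well as the com.example.app domain are assumptions for illustration; StandardMBean is used so that the management interface does not have to follow the usual ...MBean naming convention.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ThreadPoolExecutor;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class ExecutorMonitoring {

    // Management interface: getters become readable attributes, setters writable ones.
    // JMX requires this interface to be public.
    public interface PoolStats {
        int getQueueSize();
        int getActiveThreads();
        int getMaxThreads();
        void setMaxThreads(int maxThreads);
    }

    // Registers an MBean for the pool on the platform MBean server and returns its name.
    static ObjectName register(ThreadPoolExecutor pool, String name) throws Exception {
        PoolStats stats = new PoolStats() {
            @Override public int getQueueSize() { return pool.getQueue().size(); }
            @Override public int getActiveThreads() { return pool.getActiveCount(); }
            @Override public int getMaxThreads() { return pool.getMaximumPoolSize(); }
            @Override public void setMaxThreads(int maxThreads) { pool.setMaximumPoolSize(maxThreads); }
        };
        ObjectName objectName = new ObjectName("com.example.app:type=PoolStats,name=" + name);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new StandardMBean(stats, PoolStats.class), objectName);
        return objectName;
    }
}
```

Once registered, the attributes QueueSize, ActiveThreads and MaxThreads show up in JMX clients such as jconsole, and MaxThreads can be changed from there.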

Know your enemy

All the information you gathered is useless when you can’t put it into the right context. So you have to know your system. Sounds easy. But is it? Many projects are the sum of the work of different programmers. Code from programmers that are currently in your team, code from inhouse-libraries, code from programmers that left the building before it collapsed. When joining an existing team with several thousand lines of code, this can really be an issue. In that case you have to communicate: ask for information, ask for help. Sadly this isn’t the end of the story. Almost every project relies on third-party libraries: IO, Database and Caching for instance. And if worst comes to worst, it is good to know the internals of those libs too.

I can see dead people

Information can only be collected when the program is running. However, when the program stops running, you have a problem. Perhaps there is a deadlock that prevents the execution of your program. Most deadlocks can be found via jstack. In short, jstack prints the stacktraces of all live threads, including held monitors and detected deadlocks. It helps you understand which monitor is held by which thread. jstack requires the process-id of your (running) Java-application. You can retrieve the PID by typing jps -m. Then you can type jstack -l PID > PATH_TO_STORE_YOUR_LOGFILE_INCLUDING_FILENAME to store the thread-dump into a file.
When a deadlock occurs, the system often hangs and cannot be shut down properly. In that case you must terminate the program regardless of the consequences. This can be done via the task-manager under Windows or with the kill-command (don’t forget the -9 option) under Linux.
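jstack looks at the JVM from the outside. As a complement, the JDK’s ThreadMXBean exposes a deadlock check that can be called from inside the application, e.g. from a monitoring thread. The helper class below is a hypothetical sketch of that idea, not a replacement for a full thread-dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockCheck {

    // Returns the number of threads that are currently deadlocked.
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    // Builds a short report with the names of the deadlocked threads and the locks they wait for.
    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have terminated in the meantime
                report.append(info.getThreadName())
                      .append(" waits for ")
                      .append(info.getLockName())
                      .append('\n');
            }
        }
        return report.toString();
    }
}
```

Called periodically, such a check could raise a monitoring event as soon as a deadlock shows up, before users notice a hanging system.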

Last words

I’m pretty sure I didn’t cover all aspects of the topic. And, to be honest, it is impossible to do so. Because there are so many libraries and so many different ways of solving a problem, there is no magic trick that helps you fix or avoid bugs. However, I recommend keeping your code simple. The more complex and the more sophisticated your code gets, the more difficult it will be to fix bugs (KISS-principle). Always remember this when you’d like to reinvent the wheel again.

One does not simply fix a bug