Saturday, January 4, 2014

Deploying Node Apps on AWS EC2 with Nginx Reverse Proxy

I’m going to share my experience deploying Node-based applications to production environments.
Many of these are general concepts, but different environments will require slightly different configurations. For brevity, I’m going to show you some configuration examples for:
The ultimate goal of this post is to show you how to deploy, configure, and update your Node-based applications, and eventually how to automate those repetitive processes.

Why Reverse Proxy?

The event-driven, non-blocking I/O model of Node.js makes its applications very efficient. The problem is that a Node process runs your JavaScript on a single thread, so a single Node-based application does not scale well on multi-core systems by itself. To address this limitation, recent versions of Node provide a clustering feature, where a master process can fork multiple worker processes. (See this discussion.)
Another available solution is the reverse proxy technique. I think it is a better solution than the Node.js Cluster module, for the reasons listed below. Reverse proxying is more than just running multiple server applications on a single machine: you can also configure it to distribute the load across multiple boxes in your internal network, and the core concept and its benefits are the same in both configurations. Here are some of the reasons why I think a reverse proxy is better:
  • No code changes required in your application.
  • A lot simpler to configure and manage from an operational standpoint.
  • You get SSL (HTTPS) encryption and compression (gzip) essentially for free.
    • You don’t need to implement any of these in your application.
    • In the case of Nginx, it’s simply a matter of adding a couple of lines to the configuration.
  • More efficient at serving static resources.
  • You can hide all the details of your internal configuration.
Throughout this post, I’m going to use Nginx as the reverse proxy in the configuration examples. Nginx is widely regarded as performing well in this role.

Deployment and Configurations

Step 1. Install Required Software

The first step is to install Nginx and the Git command line tools. You need Git only if your code comes from a Git repository such as GitHub. You can install them whichever way is easiest for you, but most Linux distributions come with a package manager such as yum or apt-get. On Amazon Linux (or CentOS, Fedora, and other RHEL-based distributions), you use yum to install them.
sudo yum install -y nginx git
The next thing you need to install is, of course, Node and NPM (the Node package manager). This article explains how to install Node using package managers. On Amazon Linux (or other Linux distributions where EPEL is available), you can just run yum install again.
sudo yum install -y --enablerepo=epel npm 
This is optional, but you might also want to make sure that your application keeps running even if it is terminated or crashes for some reason. There are many utilities available for this in the Node ecosystem; I’m going to use forever in this post.
sudo npm install -g forever
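Before moving on, it can help to confirm that everything actually landed on the PATH. The following is only a minimal sketch: missing_tools is a helper name I made up for illustration, and the list of commands you pass to it depends on what you installed.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is NOT on PATH.
# (missing_tools is a hypothetical helper name used only for this sketch.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
}
```

For example, running missing_tools nginx git node npm forever should print nothing if all of the steps above succeeded; any name it prints still needs to be installed.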

Step 2. Deploy Your Application

One of the cleanest and easiest ways of deploying your applications is to install them through NPM. You write the code and publish it to the NPM repository. If your application is not open source, you can host your own private NPM repository. When NPM-based deployment is possible, all you need to do is:
[sudo] npm install -g your_app
Also note that NPM packages can declare Git URL based dependencies, which is useful when your code is in a Git repository but not published on NPM.
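As a rough illustration of a Git URL dependency, the sketch below writes a small package.json whose only dependency is pulled from a Git repository. The package name myapp-deploy, the version, the repository URL, and the function name write_sample_package_json are all placeholders I chose for the example; the part after # selects a branch, tag, or commit.

```shell
#!/bin/sh
# Write a sample package.json whose dependency is fetched from a Git URL
# rather than from the NPM registry. All names and URLs are placeholders.
write_sample_package_json() {
  dir=$1
  mkdir -p "$dir" || return 1
  cat > "$dir/package.json" <<'EOF'
{
  "name": "myapp-deploy",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "myapp": "git+https://github.com/yourname/myapp.git#master"
  }
}
EOF
}
```

After calling write_sample_package_json /tmp/myapp-deploy, running npm install inside that directory would fetch myapp directly from the Git repository.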
If you don’t want to use NPM for application deployment for some reason (e.g. the code is not public and you don’t have a private NPM repository), you can still deploy your code manually. Here’s an example Bash script that installs or updates your code from a GitHub repository.
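SRC_ROOT=/usr/local/src
SRC_DIR=$SRC_ROOT/myapp
if [ -d "$SRC_DIR" ]; then
  # already deployed: pull the latest code and update dependencies
  cd "$SRC_DIR" || exit 1
  git pull origin master
  npm update
else
  # first deployment: clone the repo and install dependencies
  mkdir -p "$SRC_ROOT"
  git clone https://github.com/yourname/myapp.git "$SRC_DIR"
  cd "$SRC_DIR" || exit 1
  npm install
fi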
A tip: if your GitHub repo is private and you have an application key, you can clone it without being prompted for your credentials by running:
git clone https://$GITHUB_KEY:@github.com/yourname/myapp.git $SRC_DIR 

Step 3. Configure Nginx

Now you need to configure Nginx to make it a good reverse proxy for your Node applications. You do this by modifying its configuration files. On Amazon Linux, for example, the main Nginx configuration file is /etc/nginx/nginx.conf. Note that configuration changes take effect only after Nginx is restarted or reloaded.
Here’s a minimal sample configuration file:
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # max_clients = worker_processes * worker_connections / 4
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;
  sendfile on;

  # backend applications
  upstream nodes {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    keepalive 64;
  }

  server {
    listen 80;
    server_name example.com;

    # static resources
    location ~ ^/(robots.txt|humans.txt|favicon.ico) {
      root /usr/share/nginx/html;
      access_log off;
      expires max;
    }

    # everything else goes to backend node apps
    location / {
      proxy_pass http://nodes;

      proxy_redirect off;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $host;
      proxy_set_header X-NginX-Proxy true;
      proxy_set_header Connection "";
      proxy_http_version 1.1;
    }
  }
}
What this sample does is:
  • Nginx listens on port 80.
  • Four instances of your application, listening on ports 8000 to 8003, run behind Nginx.
  • All HTTP requests to Nginx will be forwarded to one of the running apps, and all HTTP responses from the apps will be passed back through Nginx. (User Req -> Nginx -> Your App; Your App Res -> Nginx -> User)
I’m not going to explain all the details here, but the key settings in this sample are:
  • http.upstream
    • You are going to run 4 instances of your application.
    • They will listen on ports 8000 through 8003.
    • keepalive: the maximum number of idle connections between Nginx and your apps. See here.
  • http.server
    • listen: Nginx will listen on port 80 (HTTP default).
    • location
      • The first entry is for static files. You can modify the regex to suit your needs.
      • The second entry configures all the reverse proxy settings.
        • proxy_pass specifies the upstream.
        • proxy_set_header entries help the backend servers (your application) understand the original requests.
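If the number of instances changes often, typing the upstream section by hand gets tedious. Here is a minimal sketch of generating it from a comma-separated port list; upstream_block is a name I made up, and the keepalive value simply mirrors the sample above.

```shell
#!/bin/sh
# Print an Nginx "upstream nodes { ... }" block for a comma-separated
# list of local ports, e.g. "8000,8001".
# (upstream_block is a hypothetical helper name used only for this sketch.)
upstream_block() {
  echo "upstream nodes {"
  # Turn the commas into spaces so the shell can iterate over the ports.
  for port in $(echo "$1" | tr ',' ' '); do
    echo "  server 127.0.0.1:$port;"
  done
  echo "  keepalive 64;"
  echo "}"
}
```

Calling upstream_block "8000,8001,8002,8003" prints a block equivalent to the one in the sample. After pasting the result into the http section, running sudo nginx -t checks the configuration syntax before you restart Nginx.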

Tip 1. HTTP Compression

HTTP compression is how you reduce bandwidth (and therefore operational costs). You can enable it simply by adding this to the Nginx configuration file.
  gzip on;
  gzip_comp_level 6;
  gzip_vary on;
  gzip_min_length 1000;
  gzip_proxied any;
  gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
  gzip_buffers 16 8k;
You need to add this under the http section. See here for more Nginx compression configurations.
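To confirm that compression is really being applied, you can inspect the response headers. The sketch below is one way to do that, assuming you feed it the headers of a response; is_gzipped is a helper name I made up.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and succeed (exit status 0) only if
# the response declares "Content-Encoding: gzip".
# (is_gzipped is a hypothetical helper name used only for this sketch.)
is_gzipped() {
  # Strip carriage returns, then look for the header case-insensitively.
  tr -d '\r' | grep -i -q '^content-encoding:[[:space:]]*gzip'
}
```

A typical use would be curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://example.com/ | is_gzipped && echo compressed. Keep in mind that with gzip_min_length 1000, responses shorter than 1000 bytes are not compressed, so test against a reasonably large resource.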

Tip 2. HTTPS SSL Encryption

You don’t need to implement anything for SSL encryption in your application. You just need to have Nginx handle everything for you. Traffic between Nginx and your applications does not need to be encrypted as long as it stays on the same machine or inside a trusted internal network.
Here’s a sample configuration, which redirects all HTTP requests to HTTPS.
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # max_clients = worker_processes * worker_connections / 4
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;
  sendfile on;

  # backend applications
  upstream nodes {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    keepalive 64;
  }

  # redirect HTTP to HTTPS
  server {
    listen 80;
    server_name example.com;
    rewrite ^ https://$server_name$request_uri? permanent;
  }

  server {
    listen 443 ssl;
    ssl_certificate /path/to/your/cert.file;
    ssl_certificate_key /path/to/your/cert.key;
    # SSLv3 and TLSv1.0 are considered insecure; TLSv1.3 needs a recent Nginx/OpenSSL
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    server_name example.com;

    # static resources
    location ~ ^/(robots.txt|humans.txt|favicon.ico) {
      root /usr/share/nginx/html;
      access_log off;
      expires max;
    }

    # everything else goes to backend node apps
    location / {
      proxy_pass http://nodes;

      proxy_redirect off;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $host;
      proxy_set_header X-NginX-Proxy true;
      proxy_set_header Connection "";
      proxy_http_version 1.1;
    }
  }
}
The only difference from the first sample is that you now have two server sections:
  • The first one configures Nginx to redirect all HTTP requests to HTTPS.
  • The second one is the HTTPS server setup.
    • You need to specify the locations of the SSL certificate and key files in ssl_certificate and ssl_certificate_key respectively.
See here for more Nginx SSL configurations.
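Once this configuration is live, it is worth checking that plain HTTP requests really are redirected. The following sketch parses response headers and reports the redirect target; https_redirect_target is a helper name I made up, and it only treats a 301 status as success because that is what the permanent rewrite above produces.

```shell
#!/bin/sh
# Read HTTP response headers on stdin. If the response is a 301 redirect
# to an https:// URL, print that URL and succeed; otherwise exit non-zero.
# (https_redirect_target is a hypothetical helper name for this sketch.)
https_redirect_target() {
  tr -d '\r' | awk '
    NR == 1 { if ($2 != "301") exit 1 }
    tolower($1) == "location:" && $2 ~ /^https:\/\// { print $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, curl -s -o /dev/null -D - http://example.com/some/path | https_redirect_target should print the corresponding https:// URL if the first server section is working.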

Step 4. Start Applications

Assuming that your application takes its listening port number as a command line argument, you start the instances by running the node command (or your own command, if your application was deployed using npm install -g).
node yourapp.js 8000
node yourapp.js 8001
node yourapp.js 8002
node yourapp.js 8003
If you want to make sure your application instances restart automatically when they crash or are terminated for some reason, you can do so with the forever utility.
LOG_DIR=/var/log/nodes
MAX_RETRY=5
# the per-port log directories must exist before forever writes to them
mkdir -p $LOG_DIR/8000 $LOG_DIR/8001 $LOG_DIR/8002 $LOG_DIR/8003
forever start -a -l $LOG_DIR/8000/fv.log -o $LOG_DIR/8000/out.log -e $LOG_DIR/8000/err.log -m $MAX_RETRY app.js 8000
forever start -a -l $LOG_DIR/8001/fv.log -o $LOG_DIR/8001/out.log -e $LOG_DIR/8001/err.log -m $MAX_RETRY app.js 8001
forever start -a -l $LOG_DIR/8002/fv.log -o $LOG_DIR/8002/out.log -e $LOG_DIR/8002/err.log -m $MAX_RETRY app.js 8002
forever start -a -l $LOG_DIR/8003/fv.log -o $LOG_DIR/8003/out.log -e $LOG_DIR/8003/err.log -m $MAX_RETRY app.js 8003
Here LOG_DIR is the directory where all log files are written, and MAX_RETRY is the maximum number of restarts.
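Since the four commands differ only in the port number, they can be folded into a loop. This is a minimal sketch, not a drop-in replacement: start_instances and the RUN variable are names I made up, and RUN exists only so that you can preview the commands (RUN=echo) before running them for real.

```shell
#!/bin/sh
LOG_DIR=${LOG_DIR:-/var/log/nodes}
MAX_RETRY=${MAX_RETRY:-5}
RUN=${RUN:-}   # hypothetical dry-run hook: set RUN=echo to print commands only

# Start one forever-managed instance per port given as an argument.
# (start_instances is a hypothetical helper name used only for this sketch.)
start_instances() {
  for port in "$@"; do
    dir="$LOG_DIR/$port"
    # Each port gets its own log directory.
    $RUN mkdir -p "$dir"
    $RUN forever start -a \
      -l "$dir/fv.log" -o "$dir/out.log" -e "$dir/err.log" \
      -m "$MAX_RETRY" app.js "$port"
  done
}
```

Running RUN=echo in the shell and then start_instances 8000 8001 8002 8003 prints the mkdir and forever commands without executing them; with RUN left empty the same call starts the four instances.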

Step 5. Start Nginx

On Amazon Linux (and some other Linux distributions), you can start, stop, or restart Nginx using the service command.
sudo service nginx start
sudo service nginx stop
sudo service nginx restart
If you want Nginx to start up every time the machine restarts, you can set that up with the chkconfig command.
sudo chkconfig nginx on
Deployment and configuration are now complete, and both Nginx and your apps are up and running.
To avoid repeating all these manual steps every time you bootstrap a new box or update the code, we need to automate as much as possible.

Automation

This is a sample Bash script to bootstrap and update your applications. It assumes that your application code is available on GitHub.
#!/bin/sh

info() { echo "INFO: $1"; }
die() { echo "ERROR: $1. Aborting!"; exit 1; }

SRC_ROOT=/usr/local/src
SRC_DIR=$SRC_ROOT/{{ proj_name }}
STATIC_DIR=/usr/share/nginx/html

# stop node apps if they're running
info 'Stopping running applications and Nginx'
forever stopall 2> /dev/null
service nginx stop 2> /dev/null

# install dependencies
info 'Updating yum repos and installing dependencies.'
yum update -y
yum install -y --enablerepo=epel git nginx npm || die "failed to install dependencies"

# install forever if not installed
if ! type forever > /dev/null 2>&1; then
    info 'Installing "forever" node module.'
    npm install forever -g
fi

# deploy app code
if [ -d "$SRC_DIR" ]; then
    info "Updating source code."
    cd $SRC_DIR
    git pull origin master
    npm update
else
    info "Installing source code."
    mkdir -p $SRC_ROOT
    {% if app_token %}
    git clone https://{{ app_token }}:@github.com/{{ user_name }}/{{ proj_name }}.git $SRC_DIR
    {% else %}
    git clone https://github.com/{{ user_name }}/{{ proj_name }}.git $SRC_DIR
    {% endif %}
    cd $SRC_DIR
    npm install
fi

# configure and start nodes
{% for p in split(ports, ",") %}
mkdir -p /var/log/nodes/{{ p.value }}
forever start -a \
    -l /var/log/nodes/{{ p.value }}/fv.log \
    -o /var/log/nodes/{{ p.value }}/out.log \
    -e /var/log/nodes/{{ p.value }}/err.log \
    --minUptime=10000 --spinSleepTime=1000 \
    -m {{ max_retry }} \
    app.js {{ p.value }}
{% endfor %}

# setup and start reverse proxy
info "Starting up Nginx."
curl -L "https://gist.sh/8252838?ports={{ ports }}" > /etc/nginx/nginx.conf
chkconfig nginx on
service nginx start
The source is available here.
Also note that this is not just a Bash script file. It contains some Gist Script tags ({{ ... }} and {% for ... %}, for example), which allow us to reuse the same script with different parameters depending on the requirements.
So, whenever you have to bootstrap a new box or update existing boxes, all you need to do is run this:
curl -L "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5" | sudo sh
Then, the script will:
  • Try to stop your apps and Nginx if they’re running.
  • Install all required software: Git, Nginx, Node and NPM.
  • Install the forever utility if needed.
  • Deploy your code from GitHub.
  • Configure and start up your apps.
  • Start up Nginx.
Basically, this script follows almost the same steps as above.

Tip 1. Automation Parameters

Using Gist Script, you’re parameterizing the script. The sample script uses the following variables.
  • ports: the port numbers that you want to use for your app instances. "8000,8001,8002,8003" means we want to run 4 instances on ports 8000 to 8003.
  • user_name and proj_name: your GitHub user name and project name.
  • max_retry: the maximum number of restarts for your apps.
  • app_token: not used in the example above, but you can use this parameter to pass your GitHub application token. You will need it if your repo is private.
The sample script shown here is just a demo. You can make it a lot more flexible if you want to. Check out Gist Script for more details.

Tip 2. Using Private GitHub Repo

If you try to deploy your private repo without the app_token parameter, you will be prompted to enter your GitHub credentials. You can then manually enter your user name and password or, preferably, use a GitHub application key for deployment. If you don’t have one, you can easily create it in GitHub Settings. Just create a new ‘Personal Access Token’.
Once you have your app token, you can use the app_token parameter to deploy private repos. One remaining security concern is that the URL will expose your app token like this:
curl -L "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5&app_token=09b7808837164e673701d471957c5195e501bb96" | sudo sh
which is never secure.
You can address this concern by using the permalink feature of Gist Script. Whenever you make a call to Gist Script, you get a unique permalink in the HTTP response (header name: GS-Permalink), which you can use to get the same output in the future.
$ curl -i "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5&app_token=09b7808837164e673701d471957c5195e501bb96"
HTTP/1.1 200 OK
Server: nginx/1.4.3
Date: Sat, 04 Jan 2014 10:09:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
GS-Source-Location: https://gist.github.com/d5/8252799/raw
GS-Source-ETag: "c0f44b9c4c7c32cd75fd90aad34cd1f4"
GS-Source-Status-Code: 304
GS-Source-Content-Type: text/plain; charset=utf-8
GS-Permalink: /p/9490109c4082fcdee7b42ec1a277a9eabd1634ad
The actual permalink in your run may be different.
In this example, we got /p/9490109c4082fcdee7b42ec1a277a9eabd1634ad. So if you open https://gist.sh/p/9490109c4082fcdee7b42ec1a277a9eabd1634ad without any input parameters, you will get the same output script. The permalink helps you hide your source and input parameters.
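Copying the permalink out of the headers by hand is error-prone, so here is a minimal sketch that extracts it. It relies only on the GS-Permalink header and the https://gist.sh base shown above; permalink_url is a helper name I made up.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the full permalink URL
# built from the GS-Permalink header. Exits non-zero if the header is absent.
# (permalink_url is a hypothetical helper name used only for this sketch.)
permalink_url() {
  tr -d '\r' | awk '
    tolower($1) == "gs-permalink:" { print "https://gist.sh" $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

You could pipe the headers of the parameterized request into it, for example with curl -s -o /dev/null -D - followed by the full URL and then | permalink_url, and afterwards use only the printed URL in your deployment commands.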