Saturday, January 4, 2014

Deploying Node Apps on AWS EC2 with Nginx Reverse Proxy

I’m going to share my experience deploying Node-based applications to production environments.
Many of these are general concepts, but different environments require slightly different configurations. For brevity, the configuration examples in this post target Amazon Linux on EC2 with Nginx as the reverse proxy.
The ultimate goal of this post is to show you how to deploy, configure, and update your Node-based applications, and eventually how to automate those repetitive processes.

Why Reverse Proxy?

The event-driven, non-blocking I/O model of Node.js makes its applications very efficient. The problem is that a Node process runs JavaScript on a single thread, so one instance cannot take advantage of multiple cores. To address this limitation, recent versions of Node provide a clustering feature, where a master process forks multiple worker processes. (See this discussion.)
Another solution is a reverse proxy. In my view it is a better solution than the Node.js Cluster module, for the reasons listed below. Reverse proxying is more than just running multiple server applications on a single machine: you can also configure it to distribute the load across multiple boxes in your internal network, and the core concept and benefits are the same in both configurations. Here are some of the reasons I think a reverse proxy is better:
  • No code changes required in your application.
  • A lot simpler to configure and manage from an operational standpoint.
  • You get SSL (HTTPS) encryption and compression (gzip) practically for free.
    • You don’t need to implement either of these in your application.
    • In the case of Nginx, it’s simply a matter of adding a couple of lines to the configuration.
  • More efficient at serving static resources.
  • You can hide all the details about your internal configurations.
Throughout this post, I’m going to use Nginx as the reverse proxy in the configuration examples. Nginx is widely used in this role and is known for handling many concurrent connections efficiently.

Deployment and Configurations

Step 1. Install Required Software

The first step is to install Nginx and the Git command line tools. You only need Git if your code comes from a Git repository such as GitHub. You can install them whichever way is easiest for you, but most Linux distributions come with a package manager such as yum or apt-get. On Amazon Linux (or CentOS, Fedora, and other RHEL-based distributions), you use yum to install them.
sudo yum install -y nginx git
The next thing you need to install is, of course, Node and NPM (the Node package manager). This article explains how to install Node using package managers. On Amazon Linux (or other Linux distributions where EPEL is available), you can just run yum install again.
sudo yum install -y --enablerepo=epel npm 
This is optional, but you might also want to make sure that your application keeps running even if it is terminated or crashes for some reason. There are many utilities available for Node-based applications; I’m going to use forever in this post.
sudo npm install -g forever
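Before moving on, it can help to confirm that everything from this step is actually on the PATH. The following is a minimal sketch, not part of the original setup: require_cmds is a hypothetical helper name, and it assumes the EPEL package exposes the Node binary as node.

```shell
# Report every command from the argument list that is not on the PATH.
# Returns 0 when all are present, 1 when at least one is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call for the tools installed in this step:
# require_cmds nginx git node npm forever
```

If any tool is missing, the function prints its name to stderr and returns a non-zero status, so it can guard the later steps in a script.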

Step 2. Deploy Your Application

One of the cleanest and easiest ways of deploying your applications is to install them through NPM. You write code and publish it to the NPM repository. If your application is not open source, you can run your own private NPM repository. When NPM-based deployment is possible, all you need to do is:
[sudo] npm install -g your_app
Also note that NPM packages can have Git URL based dependencies, which is useful when your code is in a Git repository but not published on NPM.
If you don’t want to use NPM for application deployment for some reason (e.g. the code is not public and you don’t have a private NPM repository), you can still deploy your code manually. Here’s an example Bash script that installs or updates your code from a GitHub repository.
SRC_ROOT=/usr/local/src
SRC_DIR=$SRC_ROOT/myapp
if [ -d "$SRC_DIR" ]; then 
  cd $SRC_DIR 
  git pull origin master 
  npm update
else 
  mkdir -p $SRC_ROOT 
  git clone https://github.com/yourname/myapp.git $SRC_DIR 
  cd $SRC_DIR 
  npm install
fi
A tip: if your GitHub repo is private and you have an application key, you can clone it without manually entering your credentials or the application key, like this:
git clone https://$GITHUB_KEY:@github.com/yourname/myapp.git $SRC_DIR 
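The two clone variants can be folded into one small helper so the rest of a deploy script does not care whether a token is present. This is only a sketch: repo_url is a hypothetical name, and yourname/myapp are the same placeholders used above. Keep in mind that a token embedded in the clone URL is stored in the repository's .git/config on the box.

```shell
# Build the HTTPS clone URL for a GitHub repo.
# Usage: repo_url USER REPO [TOKEN]   (the token is optional)
repo_url() {
  user=$1
  repo=$2
  token=${3:-}
  if [ -n "$token" ]; then
    # Same "token as user name, empty password" form shown above.
    echo "https://${token}:@github.com/${user}/${repo}.git"
  else
    echo "https://github.com/${user}/${repo}.git"
  fi
}

# Example:
# git clone "$(repo_url yourname myapp "$GITHUB_KEY")" "$SRC_DIR"
```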

Step 3. Configure Nginx

Now you need to configure Nginx to make it a good reverse proxy for your Node applications. You configure Nginx by modifying its configuration files. On Amazon Linux, for example, the main Nginx configuration file is /etc/nginx/nginx.conf. Note that configuration changes only take effect after Nginx is reloaded or restarted.
Here’s a minimal sample configuration file:
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # max_clients = worker_processes * worker_connections / 4
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;
  sendfile on;

  # backend applications
  upstream nodes {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    keepalive 64;
  }

  server {
    listen 80;
    server_name example.com;

    # static resources
    location ~ ^/(robots\.txt|humans\.txt|favicon\.ico) {
      root /usr/share/nginx/html;
      access_log off;
      expires max;
    }

    # everything else goes to backend node apps
    location / {
      proxy_pass http://nodes;

      proxy_redirect off;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $host;
      proxy_set_header X-NginX-Proxy true;
      proxy_set_header Connection "";
      proxy_http_version 1.1;
    }
  }
}
What this sample does is:
  • Nginx listens on port 80.
  • Four of your application instances, listening on ports 8000 to 8003, are running behind Nginx.
  • All HTTP requests to Nginx are forwarded to one of the running apps, and all HTTP responses from the apps go back through Nginx. (User Req -> Nginx -> Your App; Your App Res -> Nginx -> User)
I’m not going to explain all the details here, but the key settings in this sample are:
  • http.upstream
    • You are going to run 4 instances of your application.
    • They will listen on ports 8000 through 8003.
    • keepalive: the maximum number of idle connections between Nginx and your apps. See here.
  • http.server
    • listen: Nginx will listen on port 80 (HTTP default).
    • location
      • The first entry is for static files. You can modify the regex to suit your needs.
      • The second entry configures all the reverse proxy settings.
        • proxy_pass specifies the upstream.
        • proxy_set_header entries help the backend servers (your application) understand the original requests.
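If the number of instances changes from box to box, the upstream block can be generated instead of edited by hand. Here is a minimal sketch in plain shell, assuming the same upstream name (nodes), loopback address, and keepalive value as the sample above. make_upstream is a hypothetical helper, not something Nginx provides.

```shell
# Print an Nginx upstream block for a comma-separated list of local ports.
make_upstream() {
  ports=$1
  echo "upstream nodes {"
  # Split the list on commas, then restore the previous IFS.
  old_ifs=$IFS
  IFS=,
  for port in $ports; do
    echo "  server 127.0.0.1:$port;"
  done
  IFS=$old_ifs
  echo "  keepalive 64;"
  echo "}"
}

# Example: print the block for the four instances used in this post.
# make_upstream "8000,8001,8002,8003"
```

The output still has to end up inside the http section, for example by pasting it in or by including a generated file from nginx.conf.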

Tip 1. HTTP Compression

HTTP compression reduces bandwidth (and with it, operational costs). You can enable it simply by adding this to the Nginx configuration file.
  gzip on;
  gzip_comp_level 6;
  gzip_vary on;
  gzip_min_length 1000;
  gzip_proxied any;
  gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
  gzip_buffers 16 8k;
You need to add this under http section. See here for more Nginx compression configurations.
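To check that compression is really switched on, you can look at the response headers. The helper below is a sketch, not an official tool: has_gzip is a hypothetical name, and example.com stands in for your own host. Remember that with gzip_min_length 1000, very small responses are not compressed, so test against a reasonably large resource.

```shell
# Succeeds when the HTTP response headers on stdin declare gzip encoding.
has_gzip() {
  # Strip carriage returns first, since HTTP header lines end in CRLF.
  tr -d '\r' | grep -qi '^content-encoding:[[:space:]]*gzip'
}

# Example (needs network access; the URL is a placeholder):
# curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://example.com/ \
#   | has_gzip && echo "gzip is enabled"
```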

Tip 2. HTTPS SSL Encryption

You don’t need to implement anything for SSL encryption in your application. You can let Nginx handle it all for you, and traffic between Nginx and your applications on the same machine does not need to be encrypted.
Here’s a sample configuration, which redirects all HTTP requests to HTTPS.
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
  # max_clients = worker_processes * worker_connections / 4
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;
  sendfile on;

  # backend applications
  upstream nodes {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    keepalive 64;
  }

  # redirect HTTP to HTTPS
  server {
    listen 80;
    server_name example.com;
    rewrite ^ https://$server_name$request_uri? permanent;
  }

  server {
    listen 443 ssl;
    ssl_certificate /path/to/your/cert.file;
    ssl_certificate_key /path/to/your/cert.key;
    ssl_protocols TLSv1.2;  # SSLv3 is vulnerable to POODLE; add TLSv1.3 if your Nginx/OpenSSL build supports it
    ssl_ciphers HIGH:!aNULL:!MD5;
    server_name example.com;

    # static resources
    location ~ ^/(robots\.txt|humans\.txt|favicon\.ico) {
      root /usr/share/nginx/html;
      access_log off;
      expires max;
    }

    # everything else goes to backend node apps
    location / {
      proxy_pass http://nodes;

      proxy_redirect off;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $host;
      proxy_set_header X-NginX-Proxy true;
      proxy_set_header Connection "";
      proxy_http_version 1.1;
    }
  }
}
The only difference from the first sample is that there are now two server sections:
  • The first one configures Nginx to redirect all HTTP requests to HTTPS.
  • The second one is HTTPS server setup.
    • You need to specify the location of SSL certificate and key files in ssl_certificate and ssl_certificate_key respectively.
See here for more Nginx SSL configurations.
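Once the HTTPS setup is live, it is worth confirming that plain HTTP requests really get a permanent redirect. The following is a minimal sketch that inspects response headers. redirects_to_https is a hypothetical helper name and example.com is the placeholder host from the configuration.

```shell
# Succeeds when the headers on stdin are a 301 redirect to an https:// URL.
redirects_to_https() {
  headers=$(tr -d '\r')
  # The status line must carry code 301 ...
  printf '%s\n' "$headers" | head -n 1 | grep -q ' 301 ' &&
    # ... and the Location header must point at an HTTPS URL.
    printf '%s\n' "$headers" | grep -qi '^location:[[:space:]]*https://'
}

# Example (needs network access; the URL is a placeholder):
# curl -s -o /dev/null -D - http://example.com/some/path \
#   | redirects_to_https && echo "HTTP is redirected to HTTPS"
```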

Step 4. Start Applications

Assuming your application takes its listening port number as a command line argument, you start the instances with the node command (or with your own command if the application was installed using npm install -g).
node yourapp.js 8000
node yourapp.js 8001
node yourapp.js 8002
node yourapp.js 8003
If you want to make sure an application instance restarts automatically when it crashes or is terminated for some reason, you can use the forever utility.
LOG_DIR=/var/log/nodes
MAX_RETRY=5
forever start -a -l $LOG_DIR/8000/fv.log -o $LOG_DIR/8000/out.log -e $LOG_DIR/8000/err.log -m $MAX_RETRY app.js 8000
forever start -a -l $LOG_DIR/8001/fv.log -o $LOG_DIR/8001/out.log -e $LOG_DIR/8001/err.log -m $MAX_RETRY app.js 8001
forever start -a -l $LOG_DIR/8002/fv.log -o $LOG_DIR/8002/out.log -e $LOG_DIR/8002/err.log -m $MAX_RETRY app.js 8002
forever start -a -l $LOG_DIR/8003/fv.log -o $LOG_DIR/8003/out.log -e $LOG_DIR/8003/err.log -m $MAX_RETRY app.js 8003
Here LOG_DIR is the directory where all log files are written, and MAX_RETRY is the maximum number of restarts. The per-port log directories must exist before you run these commands.
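The four nearly identical commands above can be written as a loop that also creates the log directories first. This is a sketch under a few assumptions: start_instances is a hypothetical helper, app.js is the entry point as in the commands above, and the FOREVER variable exists only so the loop can be dry-run with echo.

```shell
LOG_DIR=${LOG_DIR:-/var/log/nodes}
MAX_RETRY=${MAX_RETRY:-5}
FOREVER=${FOREVER:-forever}   # set FOREVER=echo to print the commands instead

# Start one supervised instance per port given on the command line.
start_instances() {
  for port in "$@"; do
    # forever does not create the log directories, so make them first.
    mkdir -p "$LOG_DIR/$port" || return 1
    "$FOREVER" start -a \
      -l "$LOG_DIR/$port/fv.log" \
      -o "$LOG_DIR/$port/out.log" \
      -e "$LOG_DIR/$port/err.log" \
      -m "$MAX_RETRY" app.js "$port" || return 1
  done
}

# Example:
# start_instances 8000 8001 8002 8003
```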

Step 5. Start Nginx

On Amazon Linux (and some other Linux distributions), you can start, stop, or restart Nginx using the service command.
sudo service nginx start
sudo service nginx stop
sudo service nginx restart
And if you want Nginx to start up every time the machine restarts, you can do so using the chkconfig command.
sudo chkconfig nginx on
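When you change the configuration on a running box, it is safer to test it before applying it, because a broken nginx.conf can otherwise take the proxy down. Here is a minimal sketch using nginx -t, which checks the configuration without applying it. safe_reload is a hypothetical helper, and the NGINX and RELOAD variables are only there so the commands can be swapped out; run it as root or via sudo.

```shell
NGINX=${NGINX:-nginx}
RELOAD=${RELOAD:-service nginx reload}

# Reload Nginx only if the configuration passes its own syntax test.
safe_reload() {
  "$NGINX" -t || { echo "config test failed; not reloading" >&2; return 1; }
  # RELOAD is intentionally unquoted so it splits into command and arguments.
  $RELOAD
}

# Example:
# safe_reload
```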
Deployment and configuration are now complete, and at this point both Nginx and your apps are up and running.
To avoid repeating all these manual steps every time you bootstrap a new box or update the code, you should automate as much as possible.

Automation

This is a sample Bash script to bootstrap and update your applications. This assumes that your application code is available at GitHub.
#!/bin/sh

info() { echo "INFO: $1"; }
die() { echo "ERROR: $1. Aborting!"; exit 1; }

SRC_ROOT=/usr/local/src
SRC_DIR=$SRC_ROOT/{{ proj_name }}
STATIC_DIR=/usr/share/nginx/html

# stop node apps if they're running
info 'Stopping running applications and Nginx'
forever stopall 2> /dev/null
service nginx stop 2> /dev/null

# install dependencies
info 'Updating yum repos and installing dependencies.'
yum update -y
yum install -y --enablerepo=epel git nginx npm || die "failed to install dependencies"

# install forever if not installed
if ! type forever > /dev/null 2>&1; then
    info 'Installing "forever" node module.'
    npm install forever -g
fi

# deploy app code
if [ -d "$SRC_DIR" ]; then
    info "Updating source code."
    cd $SRC_DIR
    git pull origin master
    npm update
else
    info "Installing source code."
    mkdir -p $SRC_ROOT
    {% if app_token %}
    git clone https://{{ app_token }}:@github.com/{{ user_name }}/{{ proj_name }}.git $SRC_DIR
    {% else %}
    git clone https://github.com/{{ user_name }}/{{ proj_name }}.git $SRC_DIR
    {% endif %}
    cd $SRC_DIR
    npm install
fi

# configure and start nodes
{% for p in split(ports, ",") %}
mkdir -p /var/log/nodes/{{ p.value }}
forever start -a \
    -l /var/log/nodes/{{ p.value }}/fv.log \
    -o /var/log/nodes/{{ p.value }}/out.log \
    -e /var/log/nodes/{{ p.value }}/err.log \
    --minUptime=10000 --spinSleepTime=1000 \
    -m {{ max_retry }} \
    app.js {{ p.value }}
{% endfor %}

# setup and start reverse proxy
info "Starting up Nginx."
curl -L "https://gist.sh/8252838?ports={{ ports }}" > /etc/nginx/nginx.conf
chkconfig nginx on
service nginx start
The source is available here.
Also note that this is not just a Bash script file. It contains some Gist Script tags ({{ ... }} and {% for ... %}, for example), which allow us to reuse the same script with different parameters depending on the requirements.
So, whenever you have to bootstrap a new box or update existing boxes, all you need to do is run this:
curl -L "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5" | sudo sh
Then, the script will:
  • Try to stop your apps and Nginx if they’re running.
  • Install all required software: Git, Nginx, Node and NPM.
  • Install forever utility if needed.
  • Deploy your code from GitHub.
  • Configure and start up your apps.
  • Start up Nginx.
Basically, this script follows almost the same steps as above.

Tip 1. Automation Parameters

Using Gist Script, you’re parameterizing the script. The sample script uses the following variables.
  • ports: port numbers that you want to use for your app instances. "8000,8001,8002,8003" means we want to run 4 instances on port number 8000 to 8003.
  • user_name and proj_name: your GitHub user name and project name.
  • max_retry: the maximum number of restarts for your apps.
  • app_token: not used in the example above, but you can use this parameter to pass your GitHub application token. You will need it if your repo is private.
The sample script shown here is just a demo. You can have a lot more flexibility if you want to. Check out Gist Script for more details.
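Because the ports value ends up in both the forever commands and the Nginx configuration, a typo in it is worth catching before anything is started. The check below is a sketch in plain shell and is not part of Gist Script. valid_ports is a hypothetical helper that accepts the same comma-separated format as the ports parameter.

```shell
# Succeeds when the argument is a non-empty, comma-separated list of
# TCP port numbers in the range 1-65535.
valid_ports() {
  [ -n "$1" ] || return 1
  # Split the list on commas into the positional parameters.
  old_ifs=$IFS
  IFS=,
  set -- $1
  IFS=$old_ifs
  for port in "$@"; do
    case $port in
      ''|*[!0-9]*) return 1 ;;   # empty or non-numeric entry
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ] || return 1
  done
}

# Example:
# valid_ports "8000,8001,8002,8003" || echo "bad ports value" >&2
```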

Tip 2. Using Private GitHub Repo

If you try to deploy your private repo without the app_token parameter, you will be prompted to enter your GitHub credentials. You can then enter your user name and password manually or, preferably, use a GitHub application key for deployment. If you don’t have one, you can easily create it in GitHub Settings: just create a new ‘Personal Access Token’.
Once you have your app token, you can use the app_token parameter to deploy private repos. One remaining security concern is that the URL will expose your app token like this:
curl -L "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5&app_token=09b7808837164e673701d471957c5195e501bb96" | sudo sh
which is never secure.
You can address this concern by using the permalink feature of Gist Script. Whenever you make a call to Gist Script, you get a unique permalink in the HTTP response (header name: GS-Permalink), which you can use to get the same output in the future.
$ curl -i "https://gist.sh/8252799?ports=8000,8001,8002,8003&user_name=d5&proj_name=helloworld.js&max_retry=5&app_token=09b7808837164e673701d471957c5195e501bb96"
HTTP/1.1 200 OK
Server: nginx/1.4.3
Date: Sat, 04 Jan 2014 10:09:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
GS-Source-Location: https://gist.github.com/d5/8252799/raw
GS-Source-ETag: "c0f44b9c4c7c32cd75fd90aad34cd1f4"
GS-Source-Status-Code: 304
GS-Source-Content-Type: text/plain; charset=utf-8
GS-Permalink: /p/9490109c4082fcdee7b42ec1a277a9eabd1634ad
The actual permalink in your run may be different.
In this example, we got /p/9490109c4082fcdee7b42ec1a277a9eabd1634ad. So if you open https://gist.sh/p/9490109c4082fcdee7b42ec1a277a9eabd1634ad without any input parameters, you will get the same output script. The permalink helps you hide your source and input parameters.
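Pulling the permalink out of the response by hand gets tedious, so here is a small sketch that extracts it from the headers. permalink_from_headers is a hypothetical helper, and it assumes only the GS-Permalink header shown in the response above.

```shell
# Print the value of the GS-Permalink header found in the HTTP response
# headers on stdin (first match only).
permalink_from_headers() {
  tr -d '\r' |
    sed -n 's/^[Gg][Ss]-[Pp]ermalink:[[:space:]]*//p' |
    head -n 1
}

# Example (needs network access; keep the real token out of shell history):
# curl -s -o /dev/null -D - "https://gist.sh/8252799?ports=8000,8001&user_name=d5&proj_name=helloworld.js&max_retry=5" \
#   | permalink_from_headers
```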

41 comments:

  1. HI Daniel,
    This is a terrific post ! Have you tried deploying node.js apps for public use via REST interfaces on EC2 instances ? Do you have any pointers on how that works?


    Thanks! Mark

  2. JUST WOW, everything I need is here in this post, really amazing job. Thanks buddy..

  3. Following Google's disclosure of the POODLE vulnerability it is highly recommended that you not use SSLv3. Please update your article so those who don't know better aren't implementing an out-of-date and insecure protocol.

    "ssl_protocols SSLv3 TLSv1;"
    should be
    "ssl_protocols TLSv1 TLSv1.1 TLSv1.2;"

    This only affects support for IE6, an essentially ancient and unused browser in 2015.

    Thanks. :)

  4. Hello Daniel,

    Have you configured nginx-passenger module in amazon linux ami and ubuntu 14.04 for node.js ?

    Actually using your blog I configured nginx and node.js but I want to use passenger also.

    Please can you help on this.

    Thanks!

  5. Hey Daniel,
    I have a couple of questions.

    1. Where would an express application sit within this setup? I'm assuming an express application could be serving multiple pages that would contain static URI's.

    2. I've been thinking about the deploy step. Pulling source from GitHub is cool but there's normally another step such as a grunt build or compile. I'm thinking a release branch might be a good idea where only the minified code is pushed with the added benefit the source code never living on the production server but there's also an argument to just build on the production server. I would expect for a large project a build server would take care of these steps but interested in how others handle the final build steps.

    Thanks

    David.

    Replies
    1. Build it somewhere else, zip the build and SFTP the zip to the EC2 instance. Unzip in your target directory, npm install and execute using Forever. You could zip up the node_modules, but I would only recommend this if you are building in a like environment (i.e. build on Linux and run on Linux).

      Same deal with Beanstalk.
