Think about it

... Or not


statsd and Ubuntu Server 12.10

Following my previous post on setting up Graphite on Ubuntu, here is another tutorial, this time on how to install statsd, a Node.js application that listens for metrics over UDP and forwards them to Graphite.

Installing statsd

apt-get install nodejs npm git
cd /opt/
git clone git://github.com/etsy/statsd.git
cd statsd/
cp exampleConfig.js localConfig.js

Edit “/opt/statsd/localConfig.js” and change graphiteHost to “127.0.0.1”.
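For reference, localConfig.js should end up looking roughly like this (2003 is Carbon's default line-receiver port and 8125 is statsd's default UDP port):

{
  graphitePort: 2003,
  graphiteHost: "127.0.0.1",
  port: 8125
}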

Edit “/opt/graphite/conf/storage-schemas.conf” and add this at the beginning (it keeps 10-second datapoints for 6 hours, 1-minute aggregates for 7 days and 10-minute aggregates for 1 year):

[stats]
priority = 110
pattern = ^stats.*
retentions = 10s:6h,1m:7d,10m:1y

Edit “/opt/graphite/conf/storage-aggregation.conf” and replace its contents with the following (aggregationMethod tells Whisper how to roll datapoints up; xFilesFactor is the fraction of datapoints that must be non-null for an aggregated point to be kept):

[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

PS: You will need to restart Carbon (/etc/init.d/carbon restart) for these changes to take effect.

Running statsd as a service

We’re going to use upstart and monit to run statsd as a service and to restart it automatically if it crashes for any reason.

First we need to make sure that upstart and monit are installed:

apt-get install upstart monit

Create the file “/etc/init/statsd.conf” with the following content:

#!upstart
description "Statsd node.js server"
author "Nicolas"

start on startup
stop on shutdown

script
export HOME="/root"

# Record this script's PID; the exec below keeps the same PID, so monit can track the process
echo $$ > /var/run/statsd.pid
exec sudo -u www-data /usr/bin/nodejs /opt/statsd/stats.js /opt/statsd/localConfig.js >> /var/log/statsd.log 2> /var/log/statsd.error.log
end script

pre-start script
# Date format same as (new Date()).toISOString() for consistency
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Starting" >> /var/log/statsd.log
end script

pre-stop script
rm /var/run/statsd.pid
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Stopping" >> /var/log/statsd.log
end script

You can now start/stop your statsd server with these commands:

start statsd
stop statsd

We now need to set up monit. Create the file “/etc/monit/conf.d/statsd” and add:

#!monit
set logfile /var/log/monit.log

check process nodejs with pidfile "/var/run/statsd.pid"
    start program = "/sbin/start statsd"
    stop program  = "/sbin/stop statsd"

Then restart monit (/etc/init.d/monit restart).

You can now send metrics to statsd over UDP on port 8125, and statsd will aggregate them and forward them to Graphite.
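As a quick sanity check, you can push a test counter by hand with netcat (assuming the netcat-openbsd package is installed; the metric name here is arbitrary):

echo "test.hello:1|c" | nc -u -w1 127.0.0.1 8125

After the next flush interval (10 seconds by default) the value should show up under stats.test.hello in Graphite.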

In the next post I will show you how to use statsd and Graphite.

Source

Tags: nodejs graphite ubuntu statsd graph monit


Graphite and Ubuntu Server 12.10

Graphite is an amazing tool for visualizing all kinds of data. I would advise you to read this post from Etsy (codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything) if you want to know what you can do with Graphite.

This tutorial shows you how I installed Graphite on an Ubuntu Server 12.10.

First of all, Graphite is built in Python on top of the Django framework, and the application itself has 3 main parts:

  • Graphite-web, which generates the graphs, handles the dashboards, etc.
  • Carbon, a daemon that receives and aggregates data and stores it in a specialized database
  • Whisper, the specialized database format itself

Moreover, you will need a web server like Apache to run the web app (any server you want will do). Then, if your application is not made in Python, you will need a bridge application that relays data from your application into Carbon; for this I used statsd (a simple Node.js application).

Pre-Requirements

I would switch to the root user (sudo su) from now on, or you can put sudo in front of each command.

apt-get update
apt-get install --assume-yes apache2 apache2-mpm-worker libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libapache2-mod-wsgi libaprutil1-ldap memcached python-cairo python-cairo-dev python-django python-ldap python-memcache python-pysqlite2 sqlite3 ssh libapache2-mod-python python-setuptools build-essential python-dev

easy_install zope.interface
easy_install twisted
easy_install txamqp
easy_install django-tagging

cd /root/
wget http://launchpad.net/graphite/0.9/0.9.10/+download/graphite-web-0.9.10.tar.gz
wget http://launchpad.net/graphite/0.9/0.9.10/+download/carbon-0.9.10.tar.gz
wget http://launchpad.net/graphite/0.9/0.9.10/+download/whisper-0.9.10.tar.gz
tar -zxvf graphite-web-0.9.10.tar.gz
tar -zxvf carbon-0.9.10.tar.gz
tar -zxvf whisper-0.9.10.tar.gz
rm carbon-0.9.10.tar.gz
rm graphite-web-0.9.10.tar.gz
rm whisper-0.9.10.tar.gz

Setting-up Carbon, Whisper and Graphite-web

cd /root/whisper-0.9.10/
python setup.py install

cd /root/carbon-0.9.10/
python setup.py install

cd /opt/graphite/conf
cp carbon.conf.example carbon.conf
cp storage-schemas.conf.example storage-schemas.conf
cp storage-aggregation.conf.example storage-aggregation.conf

This part will install graphite-web

cd /root/graphite-web-0.9.10/
# make sure dependencies are met
python check-dependencies.py
python setup.py install

# Copy the wsgi script for apache
cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi
cd /opt/graphite/webapp/graphite/
cp local_settings.py.example local_settings.py

Then edit “/opt/graphite/webapp/graphite/local_settings.py”. I did the following (a sketch of the result follows the list):

  • Set timezone
  • Set memcache servers (to 127.0.0.1 in my case)
  • Set database, in my case just uncomment the one for Django 1.2
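For reference, after my edits the relevant lines of local_settings.py looked roughly like this (the timezone and memcached address are examples for my setup; the DATABASES block is the Django 1.2 one from the example file, uncommented):

TIME_ZONE = 'America/Los_Angeles'
MEMCACHE_HOSTS = ['127.0.0.1:11211']
DATABASES = {
    'default': {
        'NAME': '/opt/graphite/storage/graphite.db',
        'ENGINE': 'django.db.backends.sqlite3',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': ''
    }
}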

Now we can initialize the DB:

cd /opt/graphite/webapp/graphite/
python manage.py syncdb
# it will prompt you to create an admin user and set a password

If this last step went through without any error, that means you deserve a snack.

Setting-up Apache 2

cd /opt/graphite/examples/
cp example-graphite-vhost.conf /etc/apache2/sites-available/graphite
cd /etc/apache2/sites-enabled/
ln -s ../sites-available/graphite graphite

I would edit “/etc/apache2/sites-available/graphite” and change the following (the changed lines are sketched after this list):

  • ServerName to something you want
  • Logs to:
ErrorLog /var/log/apache2/graphite.error.log
CustomLog /var/log/apache2/graphite.access.log common
  • "WSGISocketPrefix run/wsgi" into "WSGISocketPrefix /var/run/apache2/wsgi"; this one is very important, or you will get the error "Unable to connect to WSGI daemon process ‘graphite’"

You now need to change the permissions on the storage directory or your web server won’t work:

chown -R www-data:www-data /opt/graphite/storage/
# Restart apache
service apache2 restart

Starting Carbon as a service

Carbon needs to be started in order to save any data; you can start Carbon from the command line like this:

cd /opt/graphite
sudo ./bin/carbon-cache.py start

However this is not practical, since if you reboot your server you will have to start Carbon manually. To fix this we can create a service script.
Create a new file “/etc/init.d/carbon” and add the following to it:

#! /bin/sh
# /etc/init.d/carbon

# Some things that run always
touch /var/lock/carbon

GRAPHITE_HOME=/opt/graphite
CARBON_USER=www-data

# Carry out specific functions when asked to by the system
case "$1" in
start)
echo "Starting script carbon "
su $CARBON_USER -c "cd $GRAPHITE_HOME"; su $CARBON_USR -c "$GRAPHITE_HOME/bin/carbon-cache.py start"
;;
stop)
echo "Stopping script carbon"
su $CARBON_USER -c "cd $GRAPHITE_HOME && $GRAPHITE_HOME/bin/carbon-cache.py stop"
;;
*)
echo "Usage: /etc/init.d/carbon {start|stop}"
exit 1
;;
esac

exit 0

Then make it executable (“chmod 755 /etc/init.d/carbon”) and register the script with Ubuntu: “update-rc.d carbon defaults”. Now you can start Carbon with “/etc/init.d/carbon start”, and if your server reboots, Carbon will be started automatically.

Testing

Finally, you should be able to access the Graphite dashboard at “http://yourServerName/”, where you should see the basic graphs that Carbon generates about itself.

You can also run “python /opt/graphite/examples/example-client.py”, which will send some data about your system to Graphite.
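You can also push a single datapoint by hand over Carbon’s plain-text protocol, which listens on port 2003 by default (the metric name here is arbitrary):

echo "test.manual 42 `date +%s`" | nc -w1 127.0.0.1 2003

The value should appear under test.manual in the dashboard tree after a refresh.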

In my next post I will show you how to install statsd and how to use it with your web application.

Source

Tags: graphite nodejs ubuntu server python


Synology and PHP

I was bored one weekend and decided to consume Synology’s API in PHP.

Synology is a company that offers NAS solutions for personal use as well as large businesses. I love their products; I have a two-bay NAS where I store most of my important files. It’s also perfect media storage. Their software (DiskStation) is the best I’ve used so far: you can control anything on your NAS, download other apps and even get direct shell access via SSH.

The latest version of DiskStation offers a web REST API that allows you to connect to the NAS and do all sorts of things, including managing other “apps”. I was really interested in managing Download Station, so I created a simple library that does just that. You can find this library on GitHub here: https://github.com/zzarbi/synology

Tags: synology php library


Let’s Hadoop now

I’ve been playing with Hadoop for a few weeks now. Hadoop is an open source Apache product inspired by Google MapReduce and the Google File System. It allows your application to work with thousands of nodes and petabytes of data. It’s written in Java and runs on commodity hardware. I’m not going to write a tutorial, but I will tell you how I started.

Here is a list of the tutorials I used:

Basically I followed the tutorial by Michael G. Noll, but I used the Cloudera packages for Debian/Ubuntu. A few things you need to know:

  • When you use the Cloudera packages they will automatically create the user hdfs and the group hadoop.
  • Hadoop instantiates a lot of servers, such as the namenode, and those servers are bound to the IPs their hostnames resolve to. This means that if your namenode is called “namenode01” and that name resolves to 127.0.0.1, when Java spawns the server it will listen exclusively on that IP. It’s imperative to point this hostname at the external IP.
  • The namenode is the single point of failure of HDFS, but one thing I didn’t get until reading the full Hadoop documentation is that the namenode holds the one file that maps every file block on the cluster. If you lose this file, it doesn’t matter how many nodes you have or how redundant they are: you will lose everything.
  • The config file /etc/hadoop/conf/masters doesn’t designate the master. This file actually designates the secondary namenode, which is neither a slave nor a backup of the namenode. The master is the local machine you use to start Hadoop.
  • HDFS will use every byte available on the system. In the config file hdfs-site.xml you need to define “dfs.datanode.du.reserved” to reserve some space for the system (see the snippet after this list).
  • Certain MapReduce jobs may die because they run out of memory; you can and should set an appropriate heap size by defining Java options with “mapred.child.java.opts” (also shown below).
  • You can use bin/start-dfs.sh to start HDFS and bin/start-mapred.sh to start the MapReduce service (HDFS should start first, then MapReduce; when stopping, MapReduce should stop first, then HDFS). You can also use bin/start-all.sh and bin/stop-all.sh.
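For those last two settings, the XML looks roughly like this (both values are examples that you should adjust to your hardware):

	<!-- hdfs-site.xml: reserve 10GB per volume for non-HDFS use (value in bytes) -->
	<property>
		<name>dfs.datanode.du.reserved</name>
		<value>10737418240</value>
	</property>

	<!-- mapred-site.xml: give each map/reduce child JVM a 512MB heap -->
	<property>
		<name>mapred.child.java.opts</name>
		<value>-Xmx512m</value>
	</property>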

The main problem I ran into was adding some redundancy to the namenode. On a small cluster, the namenode runs on the same server as the secondary namenode and the first datanode. Also, the namenode and secondary namenode consume a lot of resources, especially memory.

The namenode distributes data across your cluster and also takes care of the redundancy of the data. If one datanode goes down and you have a replication factor of 2, the namenode will re-replicate the lost data from the datanodes that still hold it to an available datanode, to maintain that redundancy. The namenode does that by keeping a record of every action in two files, called fsimage and edits. fsimage is the current file namespace and edits contains every modification to it. As a performance optimization, the namenode merges these two files only when it starts. This means that if you never restart your namenode, the edits file will grow large and the next restart will take some time to merge the two files. That’s why the secondary namenode exists: it contacts the namenode every hour (this is configurable), retrieves those two files, merges them and sends the result back to the namenode. Merging those files is resource intensive on a large cluster, which is why you should run this service on a different server than the namenode. The secondary namenode doesn’t back anything up, and it will not replace the namenode if the namenode fails.

You can tell the namenode to save those two files to different locations (another disk or a remote disk) by defining “dfs.name.dir” in hdfs-site.xml like so:

	<property>
		<name>dfs.name.dir</name>
		<value>/data/dfs/name/,/disk2/backup/name,/mnt/nfs/backup/name</value>
	</property>

In my case I configured my secondary namenode on a server with the same specifications as the namenode; in its hdfs-site.xml I defined “dfs.name.dir” to use the remote backup, and I configured the masters/slaves files. If the namenode goes down, I still have to manually turn the secondary namenode into a namenode, but in this case everything is ready (caution: you will need to transfer the IP of the namenode to the secondary namenode).

Finally, as a quick band-aid, by reading the documentation I found out that you can download those two files from the namenode via its web interface: you can get fsimage with “http://NAMENODE:50070/getimage?getimage=1” and edits with “http://NAMENODE:50070/getimage?getedit=1”. I wrote a bash script to retrieve those files and back them up on each datanode at different times of the day; at least this way you won’t lose everything.
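Here is a minimal sketch of that kind of script (the namenode hostname and the backup directory are examples, and there is no error handling):

#!/bin/sh
# Fetch the namenode metadata over HTTP and keep timestamped copies
NAMENODE=namenode01
BACKUP_DIR=/data/backup/namenode
STAMP=`date +%Y%m%d%H%M`
wget -q -O $BACKUP_DIR/fsimage.$STAMP "http://$NAMENODE:50070/getimage?getimage=1"
wget -q -O $BACKUP_DIR/edits.$STAMP "http://$NAMENODE:50070/getimage?getedit=1"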

I hope this will help you start setting up your own Hadoop cluster. One last thing you should do is read the Hadoop documentation.


4 Basic features Netflix should add asap

Of all the online streaming services, Netflix is the one I use the most; at the same time, it’s the one that frustrates me the most. Here is my list of features they should add ASAP:

A real search engine:

Indeed, right now you can do a basic search that returns movies, but it doesn’t return real results. What do I mean? Well, if you search for “robots” it will return a list of movies with “robots” in the title, and for each of those movies that is not available it will suggest two similar titles that have little to nothing to do with what you searched for. There is no way to search by tags, categories or date, and no way to order by rating. Searching by date is really important when I want to look for new content.

Watch preview/trailer:

Actually, I had to browse around until I found one title that actually had a trailer. Since they store every movie, why don’t they store a tiny piece of each? Or just a YouTube link to the actual movie trailer. Or why not partner with IMDb? I mean, right now that’s what I do manually anyway.

Order comments:

Let’s say I found a movie or a documentary. Netflix has lots of comments; you can rate the movie and rate the comments, but there is no way to order them by rating or helpfulness. Amazon is doing a great job with that!

New arrivals?:

On my home page there are two new lists, “Recently Added” and “New Releases”. As a developer I can see that both point to the same script, called “newReleases”, but with different parameters. One page gets a list ordered by date added (I guess) and the second one is ordered by release date (I guess, because it doesn’t seem to be the case). Neither of those lists has much to do with what normal people consider new content; at least if they are supposed to be new content, those lists are buggy. In “Recently Added” I have a movie from 1993 and in “New Releases” I have a movie from 2009.

None of these are big features from a UI, product or development standpoint. The search could use a Solr index with a cache layer. The trailer part could easily be plugged into IMDb. Ordering comments can, however, be quite complicated depending on how they are implemented. The last feature is the easiest to implement, since I think they are just sorting on the wrong field.


PHP, Node.JS, Mysql and Mongo

I’ve spent a few weeks looking at performance between Apache/PHP vs Node.js and MySQL vs MongoDB.

All tests were run on my local computer with ApacheBench:

- Intel Core 2 Duo E8500 @ 3.16GHz
- 8GB of RAM
- Ubuntu 11.10 Desktop 64-bit
- Apache 2.2.20
- PHP 5.3.6
- Node.js 0.6.6
- MongoDB 1.8.2
- MySQL 5.1.58

The first graph, comparing PHP and Node.js, is based on a simple application that returns “Hello world”. You can see that Node.js performs a bit better than Apache/PHP, which even fails to give any result in the high-concurrency test (PS: there are actually a lot of good requests, but too many of them failed to get an accurate reading). This test is just a brute-force representation of the capability of Node.js.
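For reference, a run of that benchmark with ApacheBench looks like this (10,000 requests, 100 concurrent; the URL points at whatever port your test server listens on):

ab -n 10000 -c 100 http://127.0.0.1:8080/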

The second graph represents the performance of PHP/MySQL vs PHP/MongoDB vs Node.js/MySQL vs Node.js/MongoDB. This is a basic application that just retrieves 1 random row out of a database of 100,000 rows and returns “Good” on success. You can quickly notice the degradation of Node.js’s performance when it has to connect to the database on each request, but if we keep the connection open, Node.js performs better. I will have to do the same test with PHP and a persistent connection.

Obviously those MongoDB vs MySQL tests were made to test a specific use case. I didn’t test performance when retrieving more than one row, ordering a set of rows, querying a field without an index or looking through 100 million rows.

I did test writing to MongoDB compared to writing to MySQL. For example, writing 10,000 rows to MongoDB took 0.6 seconds where the same code with MySQL took 7 minutes (I was inserting one row at a time). MongoDB ends up being faster here because it doesn’t actually write to disk right away.

These technologies should be used accordingly: MongoDB is really fast at writing, so you could use it for storing user activity on a social network or logging page views. Node.js is really fast at handling requests; some people even use Node.js as a load balancer.

Source code:


Hollywood needs to grow up

First of all I’m going to go quickly through all the current solutions:

Cable:

Well, everybody knows that you pay $100 for maybe 5 channels that you would actually like to watch, and they package them with lots of garbage that you would never watch.

Digital TV:

Yes, you can still plug an antenna into a recent TV and get a digital signal. I live in Hollywood and I get only 2 working channels. I don’t know how to explain that, but it’s pretty bad.

Netflix:

I use it when I don’t know what to watch but I’m in the mood to watch something. I just turn it on and browse until I find something. Sometimes, if I know what I want, I’ll check to see if it’s available, but most of the time it’s not, at least not in streaming.

Hulu:

This one is a pretty good joke. You can either watch TV shows for free with lots of advertising right after they are broadcast on TV, or you can pay a monthly fee and watch the same TV shows with the same amount of advertising; the only difference is that this time you can use your TV instead of your computer screen.

DVDs:

If you don’t care about quality, this is still a pretty good way to watch a movie. You need a DVD player, which you can get really cheap nowadays.

Bluray:

You need an up-to-date Blu-ray player, which is pretty hard to keep if you didn’t buy a PS3 (PS: just buy a PS3). I had a player for 3 days, until I couldn’t update it to watch a new movie, so I returned it. Also, I could use my computer, which has an HDMI output, with a Full HD cable (whatever that means…) and my Full HD TV. In theory it should work; in practice it’s a nightmare. I ended up buying software that removes the protection on the Blu-ray so I can actually watch it on my TV.

Theaters:

The best choice in my opinion, but I won’t pay $12 for just any movie, and you always end up spending over $25.

iTunes:

Fairly cheap TV shows and movies, but stuck with DRM and Apple’s codec, which means that if I buy media on iTunes, I can’t watch it on my TV unless I have an Apple TV or I’m willing to plug my computer into my TV.

Amazon:

Same thing as iTunes: you need a device that can connect to it.

Ultraviolet:

UV is supposed to be the answer to the crisis, and I think the idea is pretty good; it’s really close to what I have in mind. But it’s limited to newly acquired media and there aren’t a lot of devices that work with it (actually just the Flixster app).

Illegal downloads/Streamings:

The Antichrist of Hollywood. There was a survey not so long ago saying that everybody under 25 has most likely downloaded at least one illegal movie, MP3 or TV show.

But let’s be honest, this is the most convenient way to consume media: it’s on demand, fairly cheap (you still have to pay for Internet) and has a large catalog. I’m not saying you should download; I’m just saying it exists and it’s pretty hard to ignore it as a “solution”. Personally I hate streaming, I don’t even know how people consider it a viable solution.

The problem is that all the legal solutions are really inconvenient. We are supposed to be in the digital age, everything is supposed to be connected, but for some reason Hollywood doesn’t want to grow up. Today under my TV there is only one box, a Western Digital player which has access to Hulu, Netflix, my local network and even YouTube.

This box, however, doesn’t play DVDs or Blu-rays, so my whole collection of DVDs/Blu-rays is pretty useless, and right now I have to rip/encode those DVDs one by one so I can watch them on my TV, which is still illegal. Indeed, in the US and most developed countries it is illegal to bypass a security system such as the one on DVDs and Blu-rays. The thing I don’t understand is that at the beginning of the age of digital music it wasn’t illegal to rip a CD; even software like iTunes did it. Why does it need to be different with videos? You can even rip your old records, but it’s illegal to do so with a DVD.

I dream of a day where I can just type/scan the barcodes of my DVD collection and make the movies available for download or streaming to any device I have. I literally want iCloud for videos! iCloud is a new service from Apple that allows you to sync your local music library to Apple’s cloud storage. You can then access all your music from any Apple device (the only inconvenience). Amazon and Google have similar products.

At the same time, Hollywood needs a new way to survey its viewers. There should be a website where I can go and officially vote for a TV show, or why not use a donation system like flattr.com, or use the good old text messaging system to vote. Anything would be better than just surveying 300 people per city.


The Facebook effect

I’m sure that you all know how Facebook improves its product:

  • "Find" Idea
  • Implement
  • Deploy
  • "Check user reaction"
  • Possible rollback (but most likely not)

Facebook has been using this “process” since the beginning; we all know about it because they either change their design, or some privacy control, or totally overhaul one part of the product (Messages), or, not so long ago, change the chat, and you hated it. This process allows quick implementation of potentially great ideas, and Facebook is so big and still growing that they can afford to lose a few people, who will probably come back anyway since Facebook is/was a unique product.

So in some sense you can say that they care more about the product than about their users. Only in some sense, because the original point of this shortcut to production is to add value to the product.

The Facebook effect is when you have a user say something like “Wow, this new feature/design is really bad, please bring back the old one”, and an hour, a few days or a week later the same user has totally forgotten why he wasn’t happy. I’m not quite sure if this is because the change was actually a genius idea, luck, or just the user getting tired of complaining… but that’s not the point. The point is that Facebook did it for a long time and got away with it.

The problem is that now other companies are envious. They think they can do the same thing without doing any testing, or checking whether it’s a useful feature, or checking statistics, or even listening to their users after deployment. They think that they can copy Facebook’s process and that users will react but get over it afterwards (this is actually an argument that I heard). There is a saying along the lines of “You don’t have to create what the user wants, you have to create something the user doesn’t know he wants”. This is quite dangerous when you already have a business; just look at Digg (Reddit on Digg) or the new design of Gizmodo (lost 15%).

Yes, there is a possibility that this could work, but there are more examples of it failing. Facebook is an exception; it’s what helped them grow bigger than MySpace, but now that there are new competitors like Google+, they cannot afford it except for tiny tweaks. Moreover, I’m not sure Facebook decides on those features without at least some numbers backing them up. Their latest feature, called Timeline, was in beta test for months, and now that it’s out you actually have to opt in.

Anyway, there are ways of improving your products without alienating users, such as A/B testing and statistics; you don’t have to bet your company each time you want to improve something.


A new Mobile age in France

A few months ago, after a trip to France, I wrote “US lags in broadband services”. I explained how the French government stimulated the market by poking competition with a stick or a carrot (whichever picture you prefer).

I talked about free.fr, one of the top 3 French ISPs, how they remodeled the industry in 2002, and the fact that they were working on their own phone network, which was supposed to launch at the beginning of 2012.

Well, that day was today, Tuesday the 10th: Free unveiled their new offer to the rest of the world. Actually just to France, since I didn’t see this news on any other tech blogs, which are probably busy with CES 2012.

I watched the press conference (in French) where founder Xavier Niel explained why they had to create a new phone operator and how its competitors were abusing their position. He listed a few points against them, such as:

  • Contracts of at least one year, but in general two years (even when not buying a new phone)
  • Long and really complex contracts (plenty of hidden costs)
  • Too many offers, and too complex
  • Fees to call European countries
  • Expensive text/media messages
  • Limited access to the internet (just mail and web) and limited in quantity (1GB)

In France today, a plan with unlimited nationwide calls, unlimited text messages (extra cost for media messages) and 1GB of capped internet costs between 49.99 and 85 euros.

Free’s solutions to these problems are:

  • Month-to-month contract
  • A single one-page contract
  • One unique plan
  • Unlimited calls to 40 destinations (including USA/Canada)
  • Unlimited text/media messages
  • Unlimited access to the internet (VoIP, P2P, newsgroups, …) up to 3GB

And this for only 19.99 euros ($28), half the price of the cheapest competitor and a quarter of the price of the two major competitors. If you already have Free as an internet provider, the price of the plan drops to 15.99 euros ($22).

A few months ago the French government decided that there was a need for a cheap plan for the lower class, because they saw the phone as a necessity. After talking to the 3 operators, they came out with 50 minutes and 50 text messages for 10 euros ($14) per month. Well, Free decided that this was “bullshit” (literally) and now offers 1 hour and 60 text messages for 2 euros ($2.80) per month.

The only inconvenience with the Free offer, according to the first critics, is that they do not subsidize phones. You will have to buy the phone you want at full price, which for an iPhone 4S is around 629 euros. However, you have the option to pay over 12, 24 or 36 months. Even with this, Free is still way cheaper than its competitors.

Right after this event, competitors rushed to Twitter and other social networks to announce that they will have new plans really soon, but this time the customer will be the one to decide what the market price should really be.

I really wish the US would wake up and shake its own market.