Finding Technologies

From Metalcon development

Web server

I would love to see some results of benchmarking.

node.js

I am currently looking at the advantages of node.js: http://nodejs.org/about/. They refer to an advanced-level talk by fefe explaining their concept of non-blocking I/O. There is a nice beginner-level article explaining the concept of event-driven and non-blocking I/O, but I really wonder how expensive it is to set up a callback and wait for it. Also, this article focuses very much on Comet, which is nice to have but not the reason to choose a particular technology.
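
To get a feeling for the callback question, here is a minimal sketch using Python's asyncio as a stand-in for the node.js event loop. The names `handle_request`, `serve` and `run_demo`, the 10 ms delay and the request count are assumptions made up for illustration. A single thread serves many simulated requests concurrently, because each one hands control back to the event loop while it waits for I/O.

```python
import asyncio
import time


async def handle_request(request_id: int, results: list) -> None:
    # Stand-in for a non-blocking I/O call (e.g. a database lookup).
    # While this coroutine waits, the event loop runs other requests.
    await asyncio.sleep(0.01)
    results.append(request_id)


async def serve(n_requests: int) -> list:
    # Schedule all requests at once on a single thread and wait for them.
    results: list = []
    await asyncio.gather(*(handle_request(i, results) for i in range(n_requests)))
    return results


def run_demo(n_requests: int = 1000):
    # Returns (number of handled requests, wall-clock seconds).
    start = time.perf_counter()
    results = asyncio.run(serve(n_requests))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


if __name__ == "__main__":
    handled, elapsed = run_demo()
    print(f"handled {handled} requests in {elapsed:.3f}s")
```

Handling the same requests one after another with blocking calls would take roughly the number of requests times 10 ms. Whatever the measured time exceeds a single 10 ms wait gives a rough upper bound on the overhead of scheduling the callbacks. It is only an indication and does not replace a real benchmark of node.js.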

There is another very nice discussion on Stack Overflow, "when to use node.js", from which I derived this excellent article and a link to an alternative, vert.x, which follows the same principles as node.js but is not only JavaScript-based (which might be nice for us).

Upshot: Right now my feeling is that the metalcon like button, which creates a ton of HTTP requests per second, could probably be built on top of node.js. I was also thinking at first (but am not sure anymore) that node.js could be a really nice entry point for our application architecture, since it can quickly distribute our requests and pass them forward to various services. There is also a list of node modules which helps to see what node can do.

nginx

apache2

tomcat

fefe's server

ruby eventmachine

Wiki/Documentation

EventMachine: an event-driven I/O framework for Ruby (and JRuby).

It provides clients for

  • HTTP
  • Memcached
  • SMTP
  • Socks and numerous other protocols.

It supports epoll (Linux), kqueue (BSD/OS X) and /dev/poll (Solaris)

database

General

MySQL

neo4j

  • does not really scale beyond one machine, so if the graph grows beyond a certain size it is game over for us.
  • good use case for recommender systems especially real time recommender
  • pay attention to super nodes (which a social network certainly has)
  • basically have to develop with the embedded Java graph database since other drivers are too slow
  • really perfect if one wants to keep track of the social network of a user.

MongoDB

Documentation: http://docs.mongodb.org/manual/

  • Highly distributed database built upon JSON-style documents
  • Auto-sharding for seamless horizontal scalability
    • Sharding may be useful for write-intensive I/O on the database infrastructure. Otherwise it is not strictly necessary
  • Features GridFS, which can be used via a module for nginx
    • Splits files into chunks of 256 KB. Without GridFS, the size of a BSON document is limited to 16 MB.
  • Replication (Master/Slave)
    • Replica sets for redundancy
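
The chunking idea behind GridFS can be sketched in a few lines of plain Python. This is only an illustration of splitting a payload into fixed-size pieces and putting them back together. It is not the MongoDB driver API, and `split_into_chunks` and `reassemble` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted in the list above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, chunk) pairs covering `data` in order."""
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        yield n, data[offset:offset + chunk_size]


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload; chunks may arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(chunks))
```

Storing a sequence number with every chunk is what allows the pieces to be saved as separate small documents and fetched independently, for example to serve only part of a large file.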

cassandra

Riak

Documentation: http://docs.basho.com/riak/latest/

  • Highly distributed database with internal Key-Value store
  • Auto-sharding
  • built-in MapReduce
  • support for PHP, Ruby, Python, Node.js, Java ...
  • consistent hashing
  • What is Riak?
  • Slides for brief introduction to Riak
  • Riak in a productive environment
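
To make the consistent hashing point more concrete, here is a toy hash ring in Python. It is a sketch of the general technique, not Riak's actual implementation. The class name `HashRing`, the node names, the number of virtual nodes and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node owns several points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The useful property is that adding a machine moves only the keys that now belong to the new machine. All other keys stay where they were, which is what makes growing the cluster cheap.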

caching / indices

memcached

  • easy to scale horizontally: just add more machines.
  • hard to think about persistence together with the database layer.
  • needs a certain caching layer (could be integrated on various layers)
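
One possible shape for such a caching layer is the cache-aside pattern, sketched below in Python. Plain dictionaries stand in for both memcached and the database, and `CacheAside` is a hypothetical name. A real setup would talk to both over the network and would need expiry and error handling.

```python
class CacheAside:
    """Minimal cache-aside layer in front of a slower backing store."""

    def __init__(self, backing_store: dict):
        self.store = backing_store  # stand-in for the database
        self.cache = {}             # stand-in for memcached
        self.store_reads = 0        # counts how often the store was hit

    def get(self, key):
        # Serve from the cache if possible, otherwise load and remember.
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        value = self.store[key]
        self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the store, then invalidate the cached copy so the
        # next read fetches the fresh value.
        self.store[key] = value
        self.cache.pop(key, None)
```

Invalidating on write keeps the persistent data in the database only, so the cache can be lost or restarted at any time without losing information.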

Redis

Documentation: http://redis.io/documentation

  • Key/Value Store
  • performs well with a massive number of requests (including small ones) that rely on volatile data
  • Small proof of concept of building an autocompletion "service" with redis
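
What the linked proof of concept does in detail is not reproduced here. As an assumption, one common way to build autocompletion is to keep all terms in sorted order and read a short range starting at the prefix. The Python sketch below shows that idea with an in-memory sorted list. In a Redis setup, a sorted set queried lexicographically could play the role of the list. `PrefixIndex` and the sample band names are made up for illustration.

```python
import bisect


class PrefixIndex:
    """In-memory stand-in for a lexicographically sorted term index."""

    def __init__(self, terms=()):
        self._terms = sorted({t.lower() for t in terms})

    def add(self, term: str) -> None:
        term = term.lower()
        idx = bisect.bisect_left(self._terms, term)
        # Insert only if the term is not already present.
        if idx == len(self._terms) or self._terms[idx] != term:
            self._terms.insert(idx, term)

    def complete(self, prefix: str, limit: int = 10) -> list:
        prefix = prefix.lower()
        # All terms sharing the prefix are contiguous, starting here.
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches
```

A lookup costs one binary search plus at most `limit` comparisons, which fits the pattern of many small requests.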

Benchmark: For the nerds http://redis.io/topics/benchmarks

The main question will be: do we have technologies (like cassandra or our own services) where we don't need caching or where we can cache without network requests?

write queue

solr / Lucene

I am pretty sure that Apache Solr will be used to power our search infrastructure. Reading this Quora discussion, one can see that there seems to be only one reasonable alternative, which is Elasticsearch. From my understanding Solr is more mature, but it would be nice to find some solid benchmarks (which also relate to our use case).

  • How many documents will we index?
  • How many concepts / queries for auto suggest?
  • How many GB of data?
  • How many search queries do we expect per second?
  • How will we integrate personalized rankings?

Since we also want to index pages from external sites, we should be able to parse HTML. There is Apache Nutch, which builds on Solr and provides crawling, a link database and HTML parsing. There exists an introduction to Nutch.

An example use of solr can be found here:

web frameworks / programming languages

ruby on rails

Checking out the Ruby on Rails site, you quickly come to the screencast section. I directly found a pretty good series on scaling Rails. Most of the implemented techniques will work with other frameworks too, but it was really nice to see how easily they could be integrated into a Rails application. It was also good to see some talks about tools one can use for load testing server applications.

You really want to get started here

Recent updates:

  • improved Turbolinks (responsible for partially updating web pages)

grails

gwtp on gwt

  • good since all is java
  • bad since result is an entire application
  • bad since rather big technology stack
  • not clear how long google will maintain the project

PHP / hip hop

Symfony

Documentation: http://symfony.com/doc/current/index.html

python / django / giotto

HTML5

Websockets

Documentation: http://www.w3.org/TR/websockets/

  • May be useful to establish low-latency connections and real-time capabilities with remote resources.
  • Avoids HTTP overhead and long polling
  • Uses a single TCP connection
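
A WebSocket connection starts as one ordinary HTTP request asking for an upgrade. After the server accepts, the same TCP connection carries the messages without further HTTP headers. The handshake itself is defined in RFC 6455 rather than in the W3C API document linked above. The Python sketch below computes the `Sec-WebSocket-Accept` value a server sends back. The function name `accept_key` is made up here, while the GUID and the sample key come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's key."""
    # SHA-1 over the client's key concatenated with the GUID, base64-encoded.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

This is the only HTTP round trip for the lifetime of the connection, which is where the saving compared to long polling comes from.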

Web Worker

Documentation: http://www.w3schools.com/html/html5_webworkers.asp

  • Website stays responsive while a script is running

misc

thrift

Thrift seems like a very useful tool to create cross-language RPC modules. Need to dig deeper into this.

tornado

Asynchronous network I/O library for Python: http://www.tornadoweb.org/en/stable/. Maybe something like that exists for other languages too.

Twitter Bootstrap

Documentation: http://twitter.github.io/bootstrap/

Based on this HTML5/CSS framework, we will be able to do something awesome. Good starting point!!

  • easy to extend
  • various forks (see github)
  • very good semantic structure

Visual tools and snippets:

lesefoo

Scalability

Master's thesis on how to build a scalable web crawler. IMHO a good introduction to this domain and recommended reading. Tons of external references mentioned.

Webservices

Quick introduction to web services and their designated protocols and specifications: http://www.w3schools.com/webservices/default.asp

Interface design and user interaction

Design Patterns is a good knowledge base on how to design user interfaces. For sure, all concepts have to be evaluated!

Security

Short introduction to some security topics in the context of distributed systems: http://www.w3.org/standards/xml/security
