Tuesday, February 14, 2012

Datameer Analytics on Big Data

Datameer allows visualization of data stored in Hadoop and seems to be the best tool out there for this.

Saturday, February 11, 2012

Pentaho ETL software for Hadoop

The Pentaho ETL tool now makes it really easy to build MapReduce jobs for Hadoop:

http://www.youtube.com/watch?v=KZe1UugxXcs&feature=colike

Tuesday, February 7, 2012

Open Source ETL comparison

I agree with some of the analysis here. The future of open source integration looks bright, and the competition is good. Pentaho and Talend are the front-runners: Talend has the advantage of being deployable onto any Java server, but Pentaho has more users and is slightly easier to use.

Saturday, January 28, 2012

Thursday, January 26, 2012

Hadoop Links


Thanks to Sameer Farooqui from LinkedIn.

++ MapReduce Framework ++
Great 1-hour video introduction: http://nosqltapes.com/video/understanding-mapreduce-with-mike-miller

Read the famous 2004 paper from Google that kicked off the MapReduce revolution. This is a very readable paper that can be digested in about 2-3 hours: http://research.google.com/archive/mapreduce.html

Here's a 33-minute video on what kinds of simple things you can do with MapReduce:
http://www.cloudera.com/videos/mapreduce_algorithms

Google's MapReduce course:
http://code.google.com/edu/parallel/mapreduce-tutorial.html


++ Beginner Hadoop ++
Excellent beginner's video on understanding Hadoop, MapReduce and HDFS:
http://www.cloudera.com/protected/?resource=introduction-to-apache-mapreduce-and-hdfs

Understanding the Hadoop ecosystem:
http://www.cloudera.com/protected/?resource=apache-hadoop-ecosystem


++ HDFS ++
An easy 2-3 hour read about the Hadoop Distributed File System (HDFS):
http://www.aosabook.org/en/hdfs.html


++ Labs ++
Install VirtualBox on your laptop, get an Ubuntu Virtual Machine going and follow this excellent tutorial to install your first Hadoop node:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Then use this to scale your cluster to multiple nodes:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Run a MapReduce job in Python on your cluster:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
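
To get a feel for what such a job looks like before opening the tutorial, here is a minimal word-count sketch. It is not taken from the tutorial above; the function names (map_words, reduce_counts, word_count) are made up for illustration, and the sort step only imitates the shuffle locally. On a real cluster, Hadoop Streaming would run the map and reduce steps as separate scripts that read stdin and write tab-separated lines to stdout.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts per word. Pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Simulate map -> sort (stand-in for the shuffle) -> reduce on one machine."""
    shuffled = sorted(map_words(lines), key=itemgetter(0))
    return dict(reduce_counts(shuffled))


if __name__ == "__main__":
    # Hypothetical sample input, only for a quick local run.
    sample = ["hadoop runs mapreduce", "mapreduce runs on hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

Running it locally on a few lines is a cheap way to check the map and reduce logic before submitting anything to the cluster.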


++ Bonus ++
Dynamo/Cassandra is a good alternative to Hadoop/HBase and is worth at least being familiar with: http://nosqltapes.com/video/understanding-dynamo-with-andy-gross

Friday, January 13, 2012

Hadoop real time

http://www.slideshare.net/parallellabs/sigmod-realtime-hadooppresentation

http://www.rhipe.org/home


Oracle Strategy on Endeca


This is a good presentation on strategy...


  • Endeca allows business analysts to explore data
  • Endeca has hooks to Hadoop and has a smart cube builder
  • Oracle bought this last year and has plans to integrate these tools...


http://www.oracle.com/us/corporate/acquisitions/endeca/general-presentation-517133.pdf

Thursday, January 12, 2012