Wednesday, February 29, 2012
Tuesday, February 14, 2012
Datameer Analytics on Big Data
Datameer provides visualization on top of Hadoop and seems to be the best tool out there for this.
Datameer Analytics Solution
Saturday, February 11, 2012
Pentaho ETL software for Hadoop
The Pentaho ETL tool now makes it really easy to build MapReduce jobs for Hadoop:
http://www.youtube.com/watch?v=KZe1UugxXcs&feature=colike
Tuesday, February 7, 2012
Open Source ETL comparison
I agree with some of the analysis here. The future for open source integration looks bright, and the competition is good. Pentaho and Talend are the front runners. Talend has the advantage that it can be deployed onto any Java server, but Pentaho has more users and is slightly easier to use.
Open Source ETL vs Commercial ETL
Sunday, January 29, 2012
Saturday, January 28, 2012
Thursday, January 26, 2012
Hadoop Links
Thanks to Sameer Farooqui from LinkedIn.
++ MapReduce Framework ++
Great 1 hour video introduction: http://nosqltapes.com/video/understanding-mapreduce-with-mike-miller
Read the famous 2004 paper from Google that kicked off the MapReduce revolution. This is a very readable paper that can be digested in about 2-3 hours: http://research.google.com/archive/mapreduce.html
Here's a 33 minute video on what kinds of simple things you can do with MapReduce:
http://www.cloudera.com/videos/mapreduce_algorithms
Google's MapReduce course:
http://code.google.com/edu/parallel/mapreduce-tutorial.html
++ Beginner Hadoop ++
Excellent beginner's video on understanding Hadoop, MapReduce and HDFS:
http://www.cloudera.com/protected/?resource=introduction-to-apache-mapreduce-and-hdfs
Understanding the Hadoop ecosystem:
http://www.cloudera.com/protected/?resource=apache-hadoop-ecosystem
++ HDFS ++
An easy 2-3 hour read about Hadoop's distributed file system:
http://www.aosabook.org/en/hdfs.html
++ Labs ++
Install VirtualBox on your laptop, get an Ubuntu Virtual Machine going and follow this excellent tutorial to install your first Hadoop node:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Then use this to scale your cluster to multiple nodes:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Run a MapReduce job in Python on your cluster:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
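Before trying this on a real cluster, it can help to see the shape of a word-count job on your laptop. The following is only a minimal local sketch of the map, sort and reduce steps in plain Python; it is not taken from the tutorial above. The function names (`mapper`, `reducer`, `run_local`) and the sample sentences are made up for illustration, and nothing here talks to Hadoop.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Reduce step: sum the counts for each word.

    The input must already be sorted by word, which is what the
    shuffle/sort phase provides between map and reduce on a cluster.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Hypothetical helper: run map, sort and reduce in one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # Sample input invented for this sketch.
    sample = ["Hadoop runs MapReduce jobs", "MapReduce jobs run on Hadoop"]
    for word, count in sorted(run_local(sample).items()):
        print(f"{word}\t{count}")
```

On a cluster the mapper and reducer would normally be separate scripts reading from standard input, with the framework doing the sorting in between; here `sorted` stands in for that step.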
++ Bonus ++
Dynamo/Cassandra is a good alternative to Hadoop/HBase and is worth at least being familiar with: http://nosqltapes.com/video/understanding-dynamo-with-andy-gross
Friday, January 13, 2012
Hadoop real time
http://www.slideshare.net/parallellabs/sigmod-realtime-hadooppresentation
http://www.rhipe.org/home
Oracle Strategy on Endeca
This is a good presentation on strategy...
- Endeca allows business analysts to explore data
- Endeca has hooks to Hadoop and has a smart cube builder
- Oracle bought this last year and has plans to integrate these tools...
http://www.oracle.com/us/corporate/acquisitions/endeca/general-presentation-517133.pdf
Thursday, January 12, 2012