How to manage tweets saved in #Hadoop using #Apache #Spark SQL

2015-01-15 | #Me

Instead of using the old Hadoop way (map/reduce), I suggest using the newer and faster way (Apache Spark on top of Hadoop YARN): in a few lines you can open all tweets (gzipped JSON files saved in several subdirectories, hdfs://path/to/YEAR/MONTH/DAY/*gz) and query them in a SQL-like language: `sc = SparkContext(appName="extraxtStatsFromTweets.`
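As a rough illustration of the idea in this excerpt, here is a minimal sketch that loads the compressed JSON files and runs a SQL query over them. It is not the code from the full post: it assumes a recent PySpark with the `SparkSession` API rather than the bare `SparkContext` shown above, the helper names `tweets_glob` and `language_stats` are hypothetical, and it assumes each tweet carries a top-level `lang` field.

```python
def tweets_glob(base="hdfs://path/to", year="*", month="*", day="*"):
    """Build the glob for the YEAR/MONTH/DAY layout; '*' matches every value."""
    return f"{base}/{year}/{month}/{day}/*gz"


def language_stats(path_glob, app_name="extraxtStatsFromTweets"):
    """Load gzipped JSON tweets and count them per language with Spark SQL."""
    # Imported here so the path helper stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(app_name).getOrCreate()
    # Spark picks the gzip codec from the .gz extension and infers the JSON schema.
    tweets = spark.read.json(path_glob)
    # Expose the DataFrame under a table name so it can be queried with SQL.
    tweets.createOrReplaceTempView("tweets")
    # Assumption: `lang` is a top-level field of each tweet.
    return spark.sql(
        "SELECT lang, COUNT(*) AS n FROM tweets GROUP BY lang ORDER BY n DESC"
    )
```

On a cluster you could then call something like `language_stats(tweets_glob(year="2015", month="01")).show()` to restrict the scan to one month; leaving the defaults reads every subdirectory.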

Continue reading 


#Hadoop Search with Apache #Solr

2014-10-03 | #Me

The two top Hadoop distributions, Cloudera and Hortonworks, include Apache Solr as their Hadoop search tool (but remember that Hadoop is free software, and many companies do not pay anything for using it!). See the apache-solr-hadoop-search article and the following two presentations from the two vendors: [slideshare id=35810888&doc=solr-2-140612162029-phpapp02] [slideshare id=24255985&doc=hadoopplussolrbigdatasearch-130715115557-phpapp02]. See also the article "Natural Language Processing and Sentiment Analysis for Retailers using HDP and ITC Infotech Radar".

Continue reading 