Thursday, 16 July 2015

Hadoop with R

R and Hadoop

Introduction

Apache Hadoop provides a robust and economical platform for storing and processing big data. The R programming language is used by many data analysts for statistical analysis. In this article, I talk about putting the two together to form a powerful platform for big data analysis.

Apache Hadoop


Apache Hadoop has become synonymous with Big Data; it is hard to find a Big Data discussion that does not involve Hadoop in some way. Hadoop helps complete jobs faster by distributing the computation across a cluster of commodity machines. This makes it possible for organizations to cut their data management costs by as much as 90% and still build a fault-tolerant data processing system.
Hadoop has two core components: HDFS for distributed storage and MapReduce for distributed processing. The Hadoop architecture is shown in the diagram below.




Hadoop Architecture

A Hadoop cluster consists of two types of nodes (or machines): one master node and multiple worker nodes. The NameNode and DataNode are processes that belong to HDFS, the Hadoop Distributed File System. The JobTracker and TaskTracker belong to MapReduce, the distributed processing system of Hadoop. User jobs are divided into two types of tasks, mappers and reducers. Mappers filter the data and convert it into key-value pairs. Reducers process each key and produce an aggregated output. Mappers take their input from HDFS and store their output in the local file system. Reducers fetch the output of the mappers and store the final output in HDFS. Since mappers and reducers follow a shared-nothing architecture, Hadoop scales very well as a parallel processing platform.
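
The map, shuffle and reduce flow described above can be imitated on a single machine in plain R. This is only a toy sketch to show the data flow, not how Hadoop itself is implemented; the store id BOS2 and all the numbers are made up for illustration.

```r
# Toy input: one record per line, as it might sit in HDFS.
records <- c("NYT1,2010-01-01,1221",
             "NYT1,2010-01-02,1206",
             "BOS2,2010-01-01,800",
             "BOS2,2010-01-02,950")

# Map phase: parse each record and emit a (key, value) pair.
fields <- strsplit(records, ",", fixed = TRUE)
keys   <- vapply(fields, function(f) f[1], character(1))
values <- vapply(fields, function(f) as.numeric(f[3]), numeric(1))

# Shuffle phase: group all values that share a key.
grouped <- split(values, keys)

# Reduce phase: aggregate the values of each key independently.
totals <- vapply(grouped, sum, numeric(1))
print(totals)
```

Because each key is reduced without looking at any other key, the reduce calls could run on different machines, which is what Hadoop exploits.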

R Programming language

The R programming language is widely used for statistical computing. With the increased interest in data analytics, usage of R has grown significantly. It is estimated that more than 70% of data scientists use R for statistical analysis. R is open source and free; it is distributed under the GNU General Public License. R has outperformed many expensive commercial products for statistical processing. The language itself is easy to learn and offers many libraries with functions to model and analyze data, including extensive support for prediction and machine learning.
One of my favorites is the Holt-Winters model, which models time series data that has some randomness, a trend, and seasonality. It is also called triple exponential smoothing. For example, if you have a file with a single column holding the sales per day of a particular store for the last five years, and you have loaded that column into a numeric vector sales, then you can build a Holt-Winters model like below. The seasonal period comes from the frequency of the time series (7 for a weekly cycle in daily data); alpha, beta and gamma are smoothing weights that must lie between 0 and 1, so gamma is left unset here and HoltWinters estimates it from the data:
> salesTS <- ts(sales, frequency=7, start=c(2010,1))
> hw <- HoltWinters(salesTS, seasonal="additive", alpha=0.3, beta=0.2)
> p <- predict(hw, 8, prediction.interval=TRUE)
> plot(p)
You will get a graph like the one below, which shows the 8 forecast points along with their upper and lower bounds:



R Prediction plot
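
For readers without a sales file at hand, here is a self-contained sketch of the same Holt-Winters workflow. The data is synthetic (a made-up trend, weekly pattern and noise), and the smoothing weights, including gamma = 0.1, are illustrative assumptions rather than tuned values.

```r
set.seed(42)                                # make the synthetic data reproducible
n     <- 7 * 30                             # 30 weeks of daily observations
day   <- seq_len(n)
sales <- 1000 + 0.5 * day +                 # slow upward trend
         100 * sin(2 * pi * day / 7) +      # weekly pattern
         rnorm(n, sd = 20)                  # random noise

# The seasonal period is taken from the frequency of the series.
salesTS <- ts(sales, frequency = 7)

# All three smoothing weights lie between 0 and 1.
hw <- HoltWinters(salesTS, seasonal = "additive",
                  alpha = 0.3, beta = 0.2, gamma = 0.1)

# Forecast 8 points ahead with prediction intervals.
p <- predict(hw, n.ahead = 8, prediction.interval = TRUE)
print(colnames(p))   # point forecast, upper bound, lower bound
```

The result is a matrix with one row per forecast point, so it can be passed to plot() or written out as rows of text.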

R-Hadoop

Now, can we put the power of Hadoop and the convenience of R together? R-Hadoop is one such attempt. You write your mapper and reducer functions in R, and the jobs are submitted to Hadoop, which in turn distributes the work to R running on each machine in the cluster. The architecture is shown in the diagram below:



R-Hadoop Architecture

You initiate your MapReduce job through the R-Hadoop server, which submits the job to the JobTracker. The JobTracker schedules the map and reduce tasks on the TaskTrackers running on each worker node. Each map task runs the mapper code in the R-Hadoop instance on its worker node. The R-Hadoop mapper receives its input as keys and values, processes the data, and emits keys and values again for the reducer. Each reduce task collects the keys and values and calls the R-Hadoop reduce function once per key with the list of values for that key. R-Hadoop does not parallelize the algorithm itself; it distributes the work by spreading the keys and values across the cluster. Suppose you have to run the above prediction for 200 stores and each store takes 10 minutes. Run one after another, that is 2,000 minutes, or more than 33 hours. Distributed over a Hadoop cluster that can run at least 34 reduce tasks in parallel, all 200 stores can be processed within an hour.
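
The arithmetic behind the 200-store example can be checked in a few lines of R. The variable names are hypothetical, and the sketch assumes every store takes exactly 10 minutes and ignores scheduling overhead.

```r
stores            <- 200   # stores to forecast
minutes_per_store <- 10    # assumed run time of one forecast
deadline_minutes  <- 60    # target: finish within an hour

# Total work if the stores are processed one after another.
total_minutes <- stores * minutes_per_store

# Parallel tasks needed to fit that work into the deadline.
slots_needed <- ceiling(total_minutes / deadline_minutes)

# Elapsed time when the stores are spread evenly over those slots.
waves   <- ceiling(stores / slots_needed)
elapsed <- waves * minutes_per_store

cat("total:", total_minutes, "min; slots:", slots_needed,
    "; elapsed:", elapsed, "min\n")
```

With fewer parallel slots the job still completes, it just takes proportionally more waves of 10 minutes each.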

Installation and setup

Though R-Hadoop is not difficult to set up, it takes a lot of trial and error to make it work properly. Follow these steps to set it up correctly.
On each machine in the cluster:
  1. Do a package installation of R (on Ubuntu, you can add the line
deb http://ftp.osuosl.org/pub/cran/bin/linux/ubuntu precise/
to /etc/apt/sources.list and then use apt-get to install r-base and r-base-dev).
  2. Start R with sudo R and install the following packages:
install.packages(c("codetools", "Rcpp", "plyr", "stringi", "magrittr", "stringr", "reshape2", "caTools", "functional", "digest", "RJSONIO"))
  3. Quit to the Linux command line and download the rmr2 package from any of the mirrors. The following is one of them:
wget http://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz
  4. Install the package using the command below:
sudo R CMD INSTALL rmr2_3.3.1.tar.gz

If it reports that a package is missing or outdated, reinstall that package and try again.
The following steps need to be executed on the master node only:
  5. Install RStudio Server on the master node. With the RStudio Server .deb package already downloaded, I found the following commands very useful:
sudo apt-get install gdebi-core
sudo gdebi rstudio-server-0.99.464-i386.deb

This automatically starts RStudio Server, so you will have to restart it after making the configuration changes below.
  6. This is an important step for connecting the server to Hadoop. Find your Hadoop installation path and the Hadoop streaming jar file, and set two environment variables. Edit the /etc/R/Renviron.site file and add the lines below at the end:
#following required for R-Hadoop
HADOOP_CMD=/usr/local/hadoop/bin/hadoop
HADOOP_STREAMING=/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar
Your Hadoop path and jar file may differ depending on your Hadoop version.
  7. Restart RStudio Server for the above changes to take effect:
sudo rstudio-server restart
  8. You can connect to RStudio Server from a browser on any machine using the IP address of the server machine and port 8787. I use Firefox and it comes up fine. It asks for an ID and password, which are a user ID and password on the Linux system.
  9. When you log in you will get a screen like the one below, and you are all set to use R-Hadoop:



Rstudio initial screen

I have used the Linux user id spider that was created using sudo adduser spider.
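
Before submitting a job, it can help to confirm from the R console that the two environment variables from the setup were picked up. The following is a small sketch using only base R; it makes no assumption about where Hadoop is installed and simply reports what it finds.

```r
# The two variables set in Renviron.site; an unset variable comes back as "".
vars   <- c("HADOOP_CMD", "HADOOP_STREAMING")
values <- unname(Sys.getenv(vars))

# For each variable, report whether it is set and whether the file exists.
status <- data.frame(
  variable = vars,
  set      = nzchar(values),
  exists   = file.exists(values),
  stringsAsFactors = FALSE
)
print(status)
```

If either row shows FALSE, recheck the paths in Renviron.site and restart the server again.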

Running a sample program from RStudio

We can submit Hadoop jobs from RStudio. We need to write a mapper function and a reducer function, and then call the mapreduce function in the rmr2 package to submit the job to Hadoop.
Step 1.
Create the input files in HDFS. I will use a file with retail sales data for multiple stores as the input file, in the format below:
StoreId,date of sale,total daily sales
NYT1,2010-01-01,1221
NYT1,2010-01-02,1206
NYT1,2010-01-03,1001
NYT1,2010-01-04,1193
NYT1,2010-01-05,1067
NYT1,2010-01-06,1077
NYT1,2010-01-07,1131
NYT1,2010-01-08,1250
NYT1,2010-01-09,1261
NYT1,2010-01-10,1009
Copy the file into HDFS:
hadoop fs -mkdir data/in
hadoop fs -put sales.csv data/in/sales.csv
Step 2.
Write the mapper in RStudio:
library(rmr2)
mapper = function(k, line) {
  line[[1]] <- lapply(line[[1]], as.character)   # convert the store ids to character to remove any factors
  keyval(line[[1]], line[[3]])                   # emit the store id as key and the daily sales as value
}
Note that the mapper gets one input split of data as a list. So line above is not a single line but a list of lines. Since R is good at vector processing, it makes sense not to call the mapper once for each line of input.
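
To see the once-per-chunk behaviour without a cluster, the mapper logic can be tried on a small data frame. In this sketch keyval_local is a hypothetical stand-in for rmr2's keyval, the chunk and the store id BOS2 are made up, and as.character is applied to the whole column as a variant of the conversion used above.

```r
# Hypothetical stand-in for rmr2::keyval so the sketch runs without Hadoop.
keyval_local <- function(key, val) list(key = key, val = val)

# Same shape as the mapper above: called once per chunk, not once per line.
mapper_local <- function(k, line) {
  line[[1]] <- as.character(line[[1]])   # drop the factor encoding of the store id
  keyval_local(line[[1]], line[[3]])     # whole columns go out in a single call
}

# A chunk as the csv input format might deliver it.
chunk <- data.frame(
  V1 = factor(c("NYT1", "NYT1", "BOS2")),
  V2 = c("2010-01-01", "2010-01-02", "2010-01-01"),
  V3 = c(1221, 1206, 800)
)

out <- mapper_local(NULL, chunk)
str(out)
```

rmr2 itself also provides a local backend, selected with rmr.options(backend = "local"), which is another way to try out mapper and reducer functions before going to the cluster.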
Step 3.
Write the reducer in RStudio. We will use my favorite, HoltWinters, for triple exponential smoothing and prediction.
reducer = function(key, sales.list) {
  # Reject lists that are too small for the algorithm
  if (length(sales.list) < 100) return(NULL)
  # Convert to time series data; frequency=7 sets a weekly seasonal period
  valTS <- ts(as.numeric(sales.list), frequency=7, start=c(2010,1))
  # Model using HoltWinters; gamma (the seasonal smoothing weight, between 0 and 1) is estimated from the data
  myModel <- HoltWinters(valTS, seasonal="additive", alpha=0.3, beta=0.2)
  # Predict the next 7 days of sales with upper and lower bounds
  predictSales <- predict(myModel, 7, prediction.interval=TRUE)
  # Output the predicted values along with the key
  keyval(key, predictSales)
}
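
The per-key behaviour of the reducer can be rehearsed locally with split and lapply. This is a sketch under stated assumptions: the sales figures are synthetic, the stores BOS2 and TINY are made up, forecast_store is a hypothetical helper, and gamma = 0.1 is an illustrative smoothing weight.

```r
set.seed(1)
days <- 140   # 20 weeks of daily data per store

# Build synthetic daily sales with a weekly pattern for one store.
make_store <- function(id, base) {
  data.frame(store = id,
             sales = base + 50 * sin(2 * pi * seq_len(days) / 7) +
                     rnorm(days, sd = 10),
             stringsAsFactors = FALSE)
}
sales <- rbind(make_store("NYT1", 1200),
               make_store("BOS2", 900),
               data.frame(store = "TINY", sales = c(1, 2, 3),
                          stringsAsFactors = FALSE))

# Reduce logic for one key: skip short series, otherwise forecast 7 days.
forecast_store <- function(sales.list) {
  if (length(sales.list) < 100) return(NULL)
  valTS <- ts(as.numeric(sales.list), frequency = 7)
  model <- HoltWinters(valTS, seasonal = "additive",
                       alpha = 0.3, beta = 0.2, gamma = 0.1)
  predict(model, 7, prediction.interval = TRUE)
}

# One call per store, each independent of the others.
results <- lapply(split(sales$sales, sales$store), forecast_store)
results <- Filter(Negate(is.null), results)
print(names(results))
```

The store with only three observations is dropped, while every other store yields a matrix of 7 rows with the forecast and its bounds.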
Step 4.
Finally, submit the mapreduce job:
mapreduce(input="/user/spider/data/in",
          input.format=make.input.format("csv", sep=",", mode="text"),
          output="/user/spider/data/out",
          output.format=make.output.format("csv", sep=",", mode="text"),
          map=mapper,
          reduce=reducer)
Note that absolute paths are specified for the input and output.
You will see the mapreduce job executing like below:



Hadoop job in R

You can check the output using hadoop. For just one store, it will look like below (the exact values depend on the input data and on the smoothing parameters of the model):
hadoop fs -cat data/out/part*
NYT1,2338.51949369753,2432.89119831428,2244.14778908078
NYT1,2116.48417153055,2216.78491253985,2016.18343052126
NYT1,2251.52104468871,2359.36936564803,2143.67272372938
NYT1,2183.65383703299,2300.62907811683,2066.67859594915
NYT1,2193.71659308069,2321.31048815327,2066.12269800812
NYT1,2228.88574017237,2368.47932443543,2089.2921559093
NYT1,2330.63829440411,2483.49715245174,2177.77943635647
Each line contains the store id, the predicted value, the upper bound and the lower bound, with one line for each of the next 7 days. If there are multiple stores in the input, each store will have 7 lines.
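
Once the output is copied out of HDFS, it can be read back into R for further analysis. The sketch below parses two of the lines shown above; the column names store, fit, upper and lower are labels chosen here for illustration, and a real run would read the part files instead of an inline vector.

```r
# Two lines copied from the job output above.
out_lines <- c(
  "NYT1,2338.51949369753,2432.89119831428,2244.14778908078",
  "NYT1,2116.48417153055,2216.78491253985,2016.18343052126"
)

# Parse the comma-separated lines into a data frame with named columns.
forecast <- read.csv(text = out_lines, header = FALSE,
                     col.names = c("store", "fit", "upper", "lower"),
                     stringsAsFactors = FALSE)

# Number the forecast days within each store and compute the interval width.
forecast$day   <- ave(seq_along(forecast$store), forecast$store, FUN = seq_along)
forecast$width <- forecast$upper - forecast$lower
print(forecast)
```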

Behind the scenes

The MapReduce job is submitted to Hadoop by the RStudio server. Hadoop in turn runs the streaming jar with the mapper and reducer functions. The mapper function runs in an R instance separate from the RStudio server, and its output key-value pairs are sent to the reducer. The reducer function runs in yet another R instance. The reducer input is consolidated from all the mappers so that the values for each key are grouped together. Finally, the reducer output is stored back into HDFS.

Advantages of R-hadoop

R-Hadoop distributes your R jobs across multiple machines in the cluster. This enables parallel processing when the same R function has to be run on many keys, for example when the same analysis has to be done for 10,000 customers of a bank, 5,000 stores of a retail chain, thousands of credit card customers, or millions of customer transactions. Though the individual algorithm is not distributed, each key can be processed in parallel, leading to significant time savings.

Disadvantages of R-hadoop

Since each map or reduce task runs in a separate R instance, the overhead per task is higher. Also, if you have an algorithm that runs for hours on a single large data set, R-Hadoop does not help in parallelizing the algorithm itself.

R-hadoop and EMR

EMR is the Elastic MapReduce service provided by Amazon Web Services. EMR lets you provision a Hadoop cluster on demand and release the resources once the job is done. EMR supports bootstrap scripts that let you install any required software before the MapReduce job starts. Using bootstrap scripts, one can set up R-Hadoop on the cluster, including the R server, and submit jobs automatically or through the browser. We did this for an enterprise so that they could also install graphical analysis libraries along with R and run R jobs on Hadoop to get the analysis results through EMR.

Conclusion

R-Hadoop is very convenient for distributing your R analysis so that the processing for multiple keys is spread across the cluster. Data scientists who are well versed in R will find R-Hadoop very easy to use. For cases where the algorithm itself has to be parallelized, R-Hadoop may not be useful, and alternatives such as the Spark machine learning library may be a better fit.

Saturday, 11 July 2015

BigData Visualization Tools

Big Data is more valuable when visualized and analyzed

Data visualizations are everywhere today. From creating a visual representation of data points to impress potential investors, report on progress, or even visualize concepts for customer segments, data visualizations are a valuable tool in a variety of settings. When it comes to big data, weak tools with basic features don’t cut it. The following 39 tools (listed in no particular order) are some of the best, most comprehensive, sophisticated-yet-flexible visualization tools available — and all are capable of handling big data.
Many of these tools are Open-Source, free applications that can be used in conjunction with one another or with your existing design applications, using JavaScript, JSON, SVG, Python, HTML5, or drag-and-drop functionality with no programming required at all. Others are comprehensive business intelligence platforms capable of sophisticated data analysis and reporting, complete with a multitude of ways to visualize your data. Whether you need to analyze data and determine the best ways to present it to clients or partners, or you have a visual layout in mind and need a tool to bring your concept to life — there’s a tool on this list to serve your needs.

1. Polymaps
Need to display complex data sets over maps? Polymaps is a free JavaScript library and a joint project from SimpleGeo and Stamen. This complex map overlay tool can load data at a range of scales, offering multi-zoom functionality at levels ranging from country all the way down to street view.
Key Features: 
  • Uses Scalable Vector Graphics (SVG)
  • Show data at country, state, city, neighborhood, and street views
  • Basic CSS rules control design
  • Imagery in spherical Mercator tile format
Cost: FREE
2. NodeBox
Key Features: 
  • Integrates with standard design applications
  • Cross-platform, node-based GUI
  • NodeBox1 – Mac app for Python-coded, 2D visuals
  • Import data in a variety of formats, including Excel
  • Animation-capable
  • Build generative designs with minimal programming skills
Cost: FREE
3. Flot

A JavaScript plotting library for jQuery, Flot is a browser-based application compatible with most common browsers — including Internet Explorer, Chrome, Firefox, Safari and Opera. Flot supports a variety of visualization options for data points, interactive charts, stacked charts, panning and zooming, and other capabilities through a variety of plugins for specific functionality.
Key Features: 
  • Supports lines, plots, and filled areas in any combination
  • Use combinations of display elements in the same data series
  • Plot categories and textual data
  • Add HTML with standard DOM manipulation
  • Produce interactive visualizations with a toggling series
  • Direct canvas access for drawing custom shapes
Cost: FREE

4. Processing

Processing was originally created as a means to teach computer fundamentals in a visual context, but is now used by students, designers, researchers, artists and hobbyists to create learning modules, prototypes and for actual production. Users can create simple or complex images, animations, and interactions.
Key Features: 
  • 2D, 3D and PDF output
  • Interactive programs
  • Open GL integration
  • More than 100 libraries for add-on functionality
  • Create interactions, textures, motion and animation
Cost: FREE
5. Processing.js
The sister site of Processing, Processing.js is the tool you need to transition your complex data visualizations, graphics, charts and other visuals to a viable web format without any extensions or plugins. That means you can write code using the standard Processing language and insert it into your website, while Processing.js makes it functional without additional coding requirements.
Key Features: 
  • Allows Processing code to be run by any HTML5 browser
  • Integrate animated and interactive visualizations into any web page
  • No major additional coding necessary
Cost: FREE
6. Tangle

Tangle is a JavaScript library and tool that takes visualizations beyond the visual, allowing designers and developers to create reactive programs that provide a deeper understanding of data relationships. One example is a web-based conversion calculator that converts currency or measurements.
Key Features: 
  • Allow readers to change parameters
  • Based on defining variables, formats and classes
  • Create charts, graphs and other data visualizations using Tangle classes
  • Capable of creating dynamic displays
  • Create controls and views using multiple variables simultaneously
Cost: FREE
7. D3.js
A JavaScript library for creating data visualizations with an emphasis on web standards. Using HTML, SVG and CSS, bring documents to life with a data-driven approach to DOM manipulation — all with the full capabilities of modern browsers and no constraints of proprietary frameworks.
Key Features: 
  • Bind arbitrary data to DOM
  • Create interactive SVG bar charts
  • Generate HTML tables from data sets
  • Variety of components and plugins to enhance capabilities
  • Built-in reusable components for ease of coding
Cost: FREE

8. FF Chartwell
FF Chartwell transitions simple strings of numbers into editable data visualizations for further customization using OpenType features. It’s an extension that can be used with a standard design suite, such as Adobe Creative Suite, to simplify the process of designing charts and graphs. 
Key Features: 
  • Use simple data strings to generate charts and graphs
  • Useful for creating components of a larger infographic
  • No-code functionality saves time
  • Integrates with design applications
  • Multiple types of visualizations
Cost: 
  • All 7 weights – $129
  • Individual weights – $25 each (bars, vertical, lines, pies, radar, rings, rose)

9. Google Maps

Google Maps offers several APIs for developers, such as Google Earth, Google Maps Images, and Google Places. These tools enable developers to build interactive visual mapping programs for any application or website.
Key Features: 
  • Embed maps into web pages
  • Pull data about establishments, places of interest and other locations
  • Enable web visitors to utilize Google Earth within the constraints of your site
Cost: Contact for a quote

10. SAS Visual Analytics
SAS Visual Analytics is a tool for exploring data sets of all sizes visually for more comprehensive analytics. With an intuitive platform and automatic forecasting tools, SAS Visual Analytics allows even non-technical users to explore the deeper relationships behind data and uncover hidden opportunities.
Key Features: 
  • Deploy on-premise or in a public or private cloud
  • Drag-and-drop autocharting chooses the best layout for data
  • Pop-up boxes identify potentially important correlations
  • Scenario analysis enables predictions based on variable changes
  • Save views as reports, images or SAS mobile apps
  • Create web-based, interactive reports
  • Easy integration of action elements for users to manipulate data
Cost: 
  • Free demo with full features (no ability to save reports between sessions)
  • Call for a quote
11. Raphael

A JavaScript library for creating vector graphics on the web, Raphael uses SVG and VML so that every graphic created is also a DOM object. Raphael’s goal is to enable vector graphics creation with cross-browser compatibility.
Key Features: 
  • Include Raphael.js in a web page for functionality
  • Create a variety of charts, graphs and other data visualizations
  • Multi-chart capabilities
Cost: FREE
12. Inkscape

Inkscape offers functionality similar to that of more expensive applications, such as Corel Draw and Illustrator, yet it’s an Open Source editor for vector graphics. Inkscape supports many advanced SVG features for ease of use and encourages developer collaboration in a community environment.
Key Features: 
  • Handles complex graphic tasks similar to standard software
  • Native SVG format
  • Create website mockups
  • Bitmap import and display capabilities
  • Files stored as vector graphics
Cost: FREE
13. Leaflet

An Open-Source JavaScript library, Leaflet is a tool for creating mobile-friendly, interactive maps. Developed by Vladimir Agafonkin and a team of contributors, Leaflet was designed with the goals of simplicity, performance and usability.
Key Features: 
  • Works on all major desktop and mobile browsers
  • Various plugins for extended capabilities
  • Incorporate interactive features
  • Multiple available map layers
  • CSS3 features for streamlined user interaction
  • Eliminates tap delay on mobile devices
Cost: FREE

14. Crossfilter

Exploring large multivariate data sets in a browser is made possible by Crossfilter, a JavaScript library that’s capable of handling data sets with more than a million records. Crossfilter uses semantic versioning and creates data visualizations easily using values, objects and other components and commands for customization. It was actually built to power analytics for Square Register to enable merchants to manipulate sales and purchase data.
Key Features: 
  • Uses semantic versioning
  • Explore large multivariate datasets
  • Fast incremental filtering and reducing
  • Improves performance of live histograms
Cost: FREE
15. OpenLayers
Insert a dynamic map on any web page with OpenLayers. It implements a JavaScript API for building web-based geographic applications and works in most modern web browsers with no server-side dependencies. It’s open-source software with a new edition in the works, OpenLayers 3, which incorporates the most recent HTML5 and CSS features and enhanced 3D capabilities.
Key Features: 
  • Works in most modern web browsers
  • No server-side dependencies
  • Creates embeddable, dynamic maps
  • Functional zoom, geo-location and dozens of other functions
Cost: FREE
16. Kartograph
A Python library and JavaScript library in one, Kartograph caters to developers who want to create Illustrator-friendly SVG maps and interactive maps that will work across all major browsers.
Key Features:
  • Two libraries: Python and JavaScript
  • Kartograph.js creates interactive maps in minutes
  • Stand-alone; no server required
  • Kartograph.py creates compact SVGs using Visvalingam simplification
  • Layer data sets on maps for multi-layer visualization
Cost: FREE
17. Microsoft Excel
Microsoft Excel is widely noted for its data manipulation and analysis capabilities, but it’s often used to create powerful data visualizations. The latest edition of Excel is packed with visualization tools, including recommended charts, quick analysis of the different ways to display your data, and a multitude of control options to change the look and layout of your visualizations.
Key Features: 
  • Perform data analysis and create visualizations in the same program
  • Compare various ways to represent your data
  • Change title, layout and other format options
  • Excel recommends the best visualization for your data
  • Compatible with Microsoft Office products
Cost: 
  • Stand-alone – $109.99
  • Complete Office Home & Professional Suite – $219.99
  • Complete Office Professional 2013 Suite – $399.99
18. Modest Maps
A free, extensible library for developers who want to incorporate interactive maps into their applications, Modest Maps is a collaborative project by Stamen, Bloom and MapBox.
Key Features: 
  • Used as the foundation for building mapping tools
  • Used with several extensions, such as MapBox.js, HTMAPL, and Easey
  • Designed to provide basic controls
Cost: FREE
19. CartoDB
Visualize hundreds to millions of data points with CartoDB, which allows you to upload data and visualize it within minutes. It also enables geospatial analysis to explore, refine and obtain insights from your data.
Key Features: 
  • Explore data and get insights
  • Edit data directly on maps
  • Compatible with PostGIS for more powerful analysis
  • CartoCSS for advanced styling
  • Supports raster and vector data
Cost: 
  • Newbie Server – Free (up to 5 tables)
  • Magellan Plan – $29 per month (up to 10 tables)
  • John Snow Plan – $49 per month (up to 20 tables)
  • Coronelli Plan – $149 per month (unlimited tables)
20. Google Charts
Google Charts offers a variety of data visualization formats, ranging from simple scatter plots to hierarchical treemaps. Visualizations are fully customizable, and you can connect to your data in real time through dynamic data.
Key Features: 
  • Take advantage of the same charts Google uses
  • Assemble multiple charts into intuitive dashboards
  • Cross-browser compatibility
  • Cross-platform portability (iOS and Android devices)
  • Choose from a variety of charts
Cost: FREE
21. Gephi

Gephi is an Open-Source application that runs on Windows, Linux and Mac OS. The platform allows users to both visualize and explore data, including complex analysis of links, social networks, and more for a greater understanding of data relationships.
Key Features: 
  • Plugins for greater customization
  • Deep data analysis to examine relationships
  • Built-in 3D rendering engine
  • Real-time visualization
  • Dynamic filtering
  • Intuitive interface with built-in workflow organization
Cost: FREE
22. Flare

An ActionScript library for creating data visualizations that run in Adobe Flash Player, Flare is an Open-Source application that’s been used by multiple well-known organizations and publishers to create powerful visualizations, including Slate, the IBM Visual Communication Lab, and ABC News.
Key Features: 
  • Capable of complex, interactive graphics
  • Supports data management, visual encoding, animation and interaction
  • Variety of visualization formats from timelines to multi-layer graphs illustrating relationships
Cost: FREE
23. Envision.js
Create fast and interactive HTML5 visualizations with Envision.js, a library capable of displaying real-time data, time series, finance visualizations, AJAX-driven financial charts and custom visualizations, including fractals.
Key Features: 
  • Built-in templates for various charts and graphs
  • Incorporate Visualizations, Interactions and Components for customization
  • Custom flotr chart types
Cost: FREE
24. Miso
An Open-Source tool in development, Miso incorporates Datasets, Storyboards and d3.charts for interactive storytelling and data visualization. Miso is a joint project between The Guardian and Bocoup, with support from GlobalDevelopment and The Bill and Melinda Gates Foundation.
Key Features: 
  • High-quality interactive storytelling
  • Data visualization content
  • JavaScript client-side data management and transformation library
  • Create reusable charts with D3.js
Cost: FREE
25. The R Project
The R Project for Statistical Computing runs on UNIX, Windows and Mac OS. Designed for statistical computing and graphics, it’s considered a different implementation of S and contains some native S code that remains unaltered within R, although there are some significant differences.
Key Features: 
  • Data manipulation, calculation and graphical display
  • Integrated tools for instant analysis
  • Conditions, loops, user-defined recursive functions, and input and output facilities
  • Define new functions for increased capabilities
Cost: FREE

26. Tableau
Tableau is an easy-to-use tool for quickly creating interactive data visualizations and embedding them on your website. Designed to be used by developers and non-developers alike, Tableau is used by bloggers, journalists, researchers, advocates, professors and students.
Key Features: 
  • Once online, others can download and manipulate visualizations
  • Desktop application but completed graphics are stored on a public server
  • Store up to 50MB of data (with free plan)
  • Drag-and-drop interface; no programming skills required
Cost: 
  • Public Edition – Free
  • Personal Edition – $999
  • Professional Edition – $1,999
27. Timeline JS

Build interactive timelines in 40 different languages with Timeline JS, an Open-Source tool capable of pulling in media from multiple sources. With built-in support for Twitter, Flickr, Google Maps, YouTube, Vine and other applications, Timeline JS has a lot of functionality — which can be extended further by those with JSON capabilities for custom installations.
Key Features: 
  • Build timelines using Google Spreadsheet data
  • Simply upload a spreadsheet and generate embed code
  • Embed audio and video in timelines from 3rd-party apps
  • WordPress plugin
  • Feed data from a database with JSON
Cost: FREE

28. Quadrigram

Quadrigram allows users to create completely customized visualizations using their own data and various components from a built-in library of everything from charts and graphs to quadrification and stacked flow. Based on a Visual Programming Language (VPL), Quadrigram can pull multiple data sources to create endless variations of prototypes and data visualizations.
Key Features: 
  • Complete library of interactive visualizations
  • Build animations, dashboards and more
  • Sketch ideas and create rapid prototypes
  • Cloud-based computing for quick data processing
  • Server-side integration of R and Gephi
  • Leverage multiple publicly-available datasets
Cost (prices converted from Euros): 
  • Academic – $8.09 per month (1 user, 10MB storage)
  • Personal – $25.63 per month (1 user, 1GB storage)
  • Professional – $79.60 per month (1-2 users, 5GB storage)
  • Workgroup – $335.93 per month (1-10 users, 50GB)
  • Enterprise – Contact for a quote
29. Prefuse
Prefuse is a data visualization tool that has been used by the IBM Visual Communication Lab to create visualizations for its Many Eyes tool. The Prefuse toolkit provides a visualization framework for Java, while the Prefuse Flare toolkit offers visualization and animation tools for ActionScript and Adobe Flash Player.
Key Features: 
  • Data modeling, interaction and visualization
  • Optimized data structures for a variety of visual layouts
  • Supports animation, dynamic search and database connectivity
  • Uses Java 2D graphics library
Cost: FREE
30. Many Eyes
Many Eyes is an experiment created by IBM Research and the IBM Cognos Software Group. This tool provides a platform for creating a variety of visualizations to illustrate data point relationships, compare sets of values, create line and stack graphs, analyze text or view the various parts of a whole in a pie chart or treemap.
Key Features: 
  • Choose from a multitude of ways to display data
  • Upload data sets for public use
  • Displays data using Java and Flash
  • Get feedback through user ratings
  • Full control to delete your data sets and visualizations
  • Use existing data sets from other users or use your own
Cost: FREE
31. Cytoscape

Visualize complex networks and integrate with any type of attribute data with Cytoscape. With special features for specific areas of analysis, such as bioinformatics, semantic web, social network analysis, Cytoscape is packed with features to create fascinating graphic representations of data relationships.
Key Features: 
  • Apps for problem domains
  • Advanced analysis and modeling using apps
  • Visualize human-curated pathway datasets
  • Visualize social networks for interpersonal relationships
  • Use in combination with other tools (e.g. R, NetworkX)
Cost: FREE
32. NetworkX
NetworkX is a Python package capable of creating graphs, digraphs and multigraphs from data sets composed of multiple media formats. Because Python is a multi-platform language, the resulting analyses and visualizations are portable across systems.
Key Features: 
  • Study the structure, dynamics and functions of complex networks
  • Nodes can contain any media type, such as images and XML
  • Edges capable of holding arbitrary data, such as weights or a time-series
  • Generators for various graph types – classic graphs, random graphs, synthetic networks
Cost: FREE
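The features listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended workflow; it assumes the networkx package is installed, and the node names, attribute names (weight, samples, markup, kind) and values are made up for the example.

```python
import networkx as nx

# Undirected graph; edges can hold arbitrary data such as weights or a series.
G = nx.Graph()
G.add_edge("a", "b", weight=0.5)
G.add_edge("b", "c", weight=2.0, samples=[1, 2, 3])  # hypothetical time-series

# Nodes can carry attributes too, for example a fragment of XML as a string.
G.add_node("c", markup="<node id='c'/>")

# Directed graph: edge direction matters.
D = nx.DiGraph([(1, 2), (2, 3)])

# Multigraph: several parallel edges between the same pair of nodes.
M = nx.MultiGraph()
M.add_edge("x", "y", kind="email")
M.add_edge("x", "y", kind="phone")

# Generators for classic and random graphs.
K = nx.complete_graph(4)                  # every pair of 4 nodes connected
P = nx.path_graph(5)                      # chain 0-1-2-3-4
R = nx.gnp_random_graph(10, 0.3, seed=1)  # random graph, seeded for repeatability

# Study structure: weighted distance from "a" to "c" via "b".
dist = nx.shortest_path_length(G, "a", "c", weight="weight")
print(dist)
print(nx.shortest_path(P, 0, 4))
```

NetworkX itself focuses on building and analyzing graphs; drawing is typically delegated to another library or tool.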
33. Arbor.js
Arbor is built with web workers and jQuery, creating a data visualization tool for use with canvas, SVG, or positioned HTML elements. Arbor is designed to enable developers to create code that emphasizes the uniqueness of their data sets rather than the physics required for various layouts.
Key Features: 
  • Capable of handling real-time color and value tweens
  • Force-directed layout algorithm plus abstractions
  • Actual screen-drawing is up to the user
Cost: FREE
34. iCharts

iCharts is a web-based application capable of producing compelling data visualizations for the web. Incorporate charts and graphs into a website or application or distribute completed visualizations through social media or the iCharts ChartChannel.
Key Features: 
  • Brand visualizations with your company logo
  • Add tags and descriptions for better discovery
  • Enable 3rd-party sites to re-embed visualizations to expand reach
  • Enable social sharing
  • Create interactive, explorable charts
  • Activate custom forms for lead generation
  • Analytics reports on chart views, shares and embeds
Cost: 
  • Basic – Free (public charts only)
  • Gold – $25 per month (private charts)
  • Platinum – $75 per month (branded charts)
  • Enterprise – Contact for a quote (full features)
35. Databoard
One of the latest tools from Google, Databoard is a part of Google’s Think platform, geared to business owners. Explore insights directly from Google research studies to find data quickly, and create custom infographics to embed in your website or share on your social networks.
Key Features: 
  • Explore Google research studies for data
  • Instantly generate graphic components
  • Build custom graphics by incorporating multiple components
  • Focused primarily on mobile data
Cost: FREE

36. Q Research Software
A powerful database for both research and data visualizations, Q Research Software is a valuable tool for preparing market research reports complete with targeted accompanying visualizations. Export to Word, Excel and PowerPoint in graphic format, or to CSV files and PDFs, and choose from dozens of tools and components for completely customized visualizations.
Key Features: 
  • Editable Office graphics
  • Multiple chart types (line, bubble, pie, column, etc.)
  • Histograms and scatter plots
  • Update tables with real-time data
  • Create variables, apply filters and perform statistical testing
Cost: 
  • Standard License – $1,499 per year (all features)
  • Transferable License – $4,497 per year (install on multiple computers)
37. Dapresy

Designed for research analysts, Dapresy allows users to build infographics for slides and dashboards with a drag-and-drop interface for ease of use. It’s a comprehensive platform that handles the entire reporting process from data analysis to visually appealing presentation tools and dashboards.
Key Features: 
  • Simply import fieldwork; Dapresy handles data processing
  • Charts, tables, cross-tabulations and comprehensive statistical analysis
  • Build dynamic elements for the marketing dashboard
  • Pack data from a 200-slide presentation into a few dynamic Dapresy slides
  • Idea Box for inspiration
Cost: Contact for a quote
38. Visualize Free
Based on the commercial visualization tool InetSoft, Visualize Free is a free alternative for sifting through multiple data sets and variants to identify trends and manipulate data with a few simple clicks.
Key Features: 
  • Upload your data in Excel or CSV format
  • Drag-and-drop components to build visualizations
  • Sandboxes for analysis and sales data
  • Share publicly or privately
Cost: FREE

39. Jolicharts
Embed charts and graphs into your applications with Jolicharts, which is compatible with multiple data sources and can handle the complexity of connecting multiple sources. With integrated elastic calculation capabilities, Jolicharts can handle big data with ease.
Key Features: 
  • Drag-and-drop interface to create stunning dashboards
  • Export dashboards to XLS, PDF or JPG formats
  • Filter to securely separate user data
  • REST-based API for compatibility with any application
  • Cloud-based application keeps your data and visualizations accessible
  • HTML5 dashboards for accessing data from any device
Cost (prices converted from Euros): 
  • Forever Free Plan (5 databases per dashboard, up to 50MB calculation power)
  • 2GB – $39.12 per month
  • 10GB – $86.34 per month
  • 50GB – $295.45 per month
  • 250GB – $565.27 per month