daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
daru makes it easy and intuitive to process data, predominantly through two data structures: Daru::DataFrame and Daru::Vector. It is written in pure Ruby and works with all Ruby implementations.
Tested with MRI 2.5.1 and 2.7.1.
daru-view provides easy, interactive plotting in web applications and IRuby notebooks. It can work in any Ruby web application framework, such as Rails, Sinatra, or Nanoc, and hopefully in others too.
Articles and blog posts that summarize the powerful features of daru-view:
The daru-io gem extends support for many import and export methods of Daru::DataFrame. It is intended to help Rubyists who are into data analysis or web development by serving as a general-purpose conversion library that takes input in one format (say, JSON) and converts it to another format (say, Avro), while also making it easy to get started on analyzing data with daru. One can read more in SciRuby/blog/daru-io.
$ gem install daru
daru exposes two major data structures: DataFrame and Vector. The Vector is a basic 1-D structure corresponding to a labelled Array, while the DataFrame, daru's primary data structure, is a 2-D spreadsheet-like structure for manipulating and storing data sets.
Basic DataFrame initialization.
data_frame = Daru::DataFrame.new(
  {
    'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
    'Gallons sold' => [500, 400, 450, 200, 250]
  },
  index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
)
data_frame
Load data from CSV files.
df = Daru::DataFrame.from_csv('TradeoffData.csv')
Basic Data Manipulation
Selecting rows.
data_frame.row['USA']
Selecting columns.
data_frame['Beer']
A range of rows.
data_frame.row['India'..'USA']
The first 2 rows.
data_frame.first(2)
The last 2 rows.
data_frame.last(2)
Adding a new column.
data_frame['Gallons produced'] = [550, 500, 600, 210, 240]
Creating a new column based on data in other columns.
data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']
Condition based selection
Selecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by Arel, i.e. by using the where clause.
data_frame.where(data_frame['Gallons sold'].lt(300))
You can also pass a combination of boolean operations into the #where method:
data_frame.where(
  data_frame['Beer']
    .in(['Snow', 'Kingfisher', 'Tiger Beer'])
    .and(
      data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
    )
)
Plotting
Daru supports plotting of interactive graphs with nyaplot. You can easily create a plot with the #plot method. Here we plot the gallons sold on the Y axis and the name of the brand on the X axis in a bar graph.
data_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|
  plot.x_label "Beer"
  plot.y_label "Gallons Sold"
  plot.yrange [0,600]
  plot.width 500
  plot.height 400
end
In addition to nyaplot, daru also supports plotting out of the box with gnuplotrb.
Docs can be found here.
Pick a feature from the Roadmap or the issue tracker or think of your own and send me a Pull Request!
For details see CONTRIBUTING.
Copyright (c) 2015, Sameer Deshmukh. All rights reserved.