Monday, June 20, 2016

Complete guide to create a Time Series Forecast (with Codes in Python)

http://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/?utm_content=buffer28897&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer

How to Rank 10% in Your First Kaggle Competition

https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/

Thursday, June 9, 2016

Tuesday, May 31, 2016

Decision tree example

http://www.patricklamle.com/Tutorials/Decision%20tree%20python/tuto_decision%20tree.html

http://www.patricklamle.com/Tutorials/Decision%20tree%20R/Decision%20trees%20in%20R%20using%20C50.html

http://gormanalysis.com/decision-trees-in-r-using-rpart/

https://triangleinequality.wordpress.com/2013/09/05/a-complete-guide-to-getting-0-79903-in-kaggles-titanic-competition-with-python/

Friday, May 20, 2016

BERKELEY UNDERGRAD DATA SCIENCE COURSE AND TEXTBOOK

https://data-8.appspot.com/sp16/course

http://www.inferentialthinking.com/chapter1/what-is-data-science.html

Thursday, May 19, 2016

Machine learning

http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml

http://www.cs.ubc.ca/~nando/540-2013/lectures.html




Suess
http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/sta6620.htm

Thursday, March 24, 2016

20 short tutorials all data scientists should read (and practice)

http://www.datasciencecentral.com/profiles/blogs/17-short-tutorials-all-data-scientists-should-read-and-practice

Thursday, February 18, 2016

Google Spreadsheet Add Ons for Data Analysis

Google Sheets is rapidly replacing Excel for some types of data analysis. Here are some useful Google Sheets add-ons for data analysis.
Blockspring helps people scrape websites, get product prices from Amazon, search Bing, save files to Dropbox, automate Twitter and outbound emails, find sales leads, run advanced text analysis, and much more. It turns Google Sheets into a playground for APIs. Regardless of technical background, the add-on lets you automate work and build advanced tools.
The Text Analysis add-on provides an easy way to analyze any text, such as links, tweets, or documents, in Google Sheets. It provides various natural language processing and machine learning tools.
The add-on can be used for tasks such as:
- Sentiment analysis on social media streams.
- Extracting mentions of entities and concepts.
- Summarizing long chunks of text and articles.
- Detecting the language of a document or tweet.
- Classifying your documents or links into more than 500 categories.
- Extracting the full text of an article, as well as its author name, embedded media, etc.
With this add-on, one can analyze any kind of text, run different types of text analysis, and perform sentiment analysis on content in different languages.
It supports several languages, including Italian, English, French, German, and Portuguese.
The add-on helps enhance columns of text by automatically extracting keywords and named entities and linking them to Wikipedia.
The Translate My Sheet add-on translates your spreadsheet, cell by cell or fully in one click, into more than 100 languages.
This add-on helps find duplicate or unique values between two tables or within one sheet in five simple steps.
This add-on provides one-click solutions for daily tasks like splitting cells, removing duplicates, changing case, finding and cleaning up data, working with formulas, and more. The Power Tools add-on cuts the clicks on repeated tasks and brings features for organizing and unifying data in Google Sheets.
Smart Autofill add-on uses Machine Learning via the Google Prediction API to fill missing values in a column of data, based on other data in the column and the data in adjacent columns.
It will look for patterns in the rows of data where there are values present in the column, and apply them to fill in the missing values of the column.
This add-on plots your own data onto a Google Map directly from Google Sheets.
The Google Analytics spreadsheet add-on brings you the power of the Google Analytics API to Google Spreadsheets. With this tool, you can:
- Query data from multiple views (profiles).
- Create custom calculations from your report data.
- Create dashboards with embedded data visualizations.
- Schedule reports to run automatically so your data is always current.
- Easily control who can see these data and visualizations by leveraging Google Spreadsheet’s existing sharing and privacy features.
This add-on is a great assistant for correcting all fuzzy matches and removing partial duplicates from your sheet.

Thursday, January 21, 2016

5 major differences between Naïve Bayes and Logistic Regression.



1. Purpose: what class of machine learning problem do they solve?
Both algorithms are used for classification. For example, either could predict whether a bank should offer a customer a loan, or identify whether a given email is spam or ham.

2. The algorithm's learning mechanism
Naïve Bayes: Given the features x and the label y, it estimates the joint probability P(x, y) from the training data. Hence it is a generative model.
Logistic regression: It estimates the conditional probability P(y | x) directly from the training data by minimizing the prediction error (log loss). Hence it is a discriminative model.
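
To make the generative versus discriminative distinction concrete, here is a minimal pure-Python sketch. The toy spam data, the function names, and the hyperparameters (`alpha`, `lr`, `epochs`) are all made up for illustration and are not from any library. Naïve Bayes is fitted by counting and then inverted with Bayes' rule, while logistic regression fits P(y | x) directly by gradient descent on the log loss.

```python
import math

# Toy data (invented for illustration): ((has_link, has_offer), label),
# where label 1 = spam and 0 = ham.
DATA = [
    ((1, 1), 1), ((1, 1), 1), ((1, 1), 1), ((1, 0), 1), ((0, 1), 1),
    ((0, 0), 0), ((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 0),
]

def fit_naive_bayes(data, alpha=1.0):
    """Generative: estimate P(y) and P(x_i = 1 | y) by Laplace-smoothed counting."""
    n_features = len(data[0][0])
    model = {}
    for y in (0, 1):
        rows = [x for x, label in data if label == y]
        prior = len(rows) / len(data)
        likelihood = [
            (sum(x[i] for x in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
        model[y] = (prior, likelihood)
    return model

def nb_predict_proba(model, x):
    """Bayes' rule: P(y=1 | x) = P(x, y=1) / (P(x, y=0) + P(x, y=1))."""
    joint = {}
    for y, (prior, likelihood) in model.items():
        p = prior
        for xi, theta in zip(x, likelihood):
            p *= theta if xi else (1.0 - theta)
        joint[y] = p
    return joint[1] / (joint[0] + joint[1])

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(data, lr=0.5, epochs=2000):
    """Discriminative: fit P(y=1 | x) directly by gradient descent on log loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def lr_predict_proba(params, x):
    w, b = params
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

nb_model = fit_naive_bayes(DATA)
lr_model = fit_logistic_regression(DATA)
for x in [(1, 1), (0, 0)]:
    print(x, round(nb_predict_proba(nb_model, x), 3),
          round(lr_predict_proba(lr_model, x), 3))
```

Both models end up producing a probability of spam, but only the Naïve Bayes model stores class priors and per-class feature likelihoods, from which the joint probability can be rebuilt.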

3. Model assumptions
Naïve Bayes: The model assumes all features are conditionally independent given the class. So if some of the features depend on each other (which is likely in a large feature space), the predictions might be poor.
Logistic regression: It splits the feature space linearly, and it works reasonably well even if some of the variables are correlated.
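
A small worked example of why dependent features hurt Naïve Bayes: if the same evidence is fed in twice (a perfectly correlated duplicate feature), the independence assumption counts it twice and the posterior becomes overconfident. The probabilities and the helper name `nb_posterior` below are assumptions chosen only for illustration.

```python
def nb_posterior(prior_spam, p_given_spam, p_given_ham, n_copies):
    """Naive Bayes P(spam | x) when one observed feature is supplied n_copies times.

    Each copy is treated as independent evidence, so its likelihood is
    multiplied in once per copy.
    """
    spam = prior_spam * p_given_spam ** n_copies
    ham = (1.0 - prior_spam) * p_given_ham ** n_copies
    return spam / (spam + ham)

# Hypothetical numbers: P(word | spam) = 0.8, P(word | ham) = 0.4, P(spam) = 0.5.
once = nb_posterior(0.5, 0.8, 0.4, n_copies=1)   # the correct posterior
twice = nb_posterior(0.5, 0.8, 0.4, n_copies=2)  # same evidence counted twice
print(round(once, 4), round(twice, 4))
```

The duplicated feature carries no new information, so the true posterior is unchanged, yet the Naïve Bayes estimate moves further from 0.5.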

4. Model limitations
Naïve Bayes: Works well even with little training data, as the estimates are based on the joint density function.
Logistic regression: With a small training set, the model estimates may overfit the data.

5. Approach to follow to improve the results
Naïve Bayes: When the training set is small relative to the number of features, information on the prior probabilities helps improve the results.
Logistic regression: When the training set is small relative to the number of features, Lasso (L1) and Ridge (L2) regularization help improve the results.
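
As a rough illustration of the logistic regression side of this point, the sketch below fits a model on four samples with three features, with and without a Ridge (L2) penalty. It shows Ridge only, not Lasso. The data, the function `fit_logreg`, and the values of `l2`, `lr`, and `epochs` are assumptions for illustration, and the intercept is left out to keep the code short.

```python
import math

# Toy data (invented): 4 samples, 3 features, linearly separable,
# so unpenalized weights keep growing as training continues.
DATA = [
    ((1, 0, 1), 1), ((1, 1, 0), 1),
    ((0, 1, 1), 0), ((0, 0, 1), 0),
]

def fit_logreg(data, l2=0.0, lr=0.5, epochs=3000):
    """Gradient descent on mean log loss plus (l2 / 2) * ||w||^2. No intercept."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]  # gradient of the Ridge penalty
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))

w_plain = fit_logreg(DATA, l2=0.0)
w_ridge = fit_logreg(DATA, l2=0.1)
print("no penalty:", round(norm(w_plain), 2), " ridge:", round(norm(w_ridge), 2))
```

The penalized weights stay smaller, which is the mechanism by which regularization limits overfitting when data is scarce.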