Monday, June 20, 2016
Complete guide to create a Time Series Forecast (with Codes in Python)
http://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/?utm_content=buffer28897&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
How to Rank 10% in Your First Kaggle Competition
https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/
Monday, June 13, 2016
Thursday, June 9, 2016
Wednesday, June 1, 2016
Tuesday, May 31, 2016
Decision tree example
http://www.patricklamle.com/Tutorials/Decision%20tree%20python/tuto_decision%20tree.html
http://www.patricklamle.com/Tutorials/Decision%20tree%20R/Decision%20trees%20in%20R%20using%20C50.html
http://gormanalysis.com/decision-trees-in-r-using-rpart/
https://triangleinequality.wordpress.com/2013/09/05/a-complete-guide-to-getting-0-79903-in-kaggles-titanic-competition-with-python/
Friday, May 20, 2016
BERKELEY UNDERGRAD DATA SCIENCE COURSE AND TEXTBOOK
https://data-8.appspot.com/sp16/course
http://www.inferentialthinking.com/chapter1/what-is-data-science.html
Thursday, May 19, 2016
Machine learning
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
http://www.cs.ubc.ca/~nando/540-2013/lectures.html
Suess
http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/sta6620.htm
Friday, May 13, 2016
Doing Data Analysis and Data Science in Python with pandas
https://www.linkedin.com/pulse/doing-data-analysis-science-python-pandas-ali-syed
Thursday, March 24, 2016
20 short tutorials all data scientists should read (and practice)
http://www.datasciencecentral.com/profiles/blogs/17-short-tutorials-all-data-scientists-should-read-and-practice
Thursday, February 18, 2016
Google Spreadsheet Add Ons for Data Analysis
Google spreadsheets are rapidly replacing Excel for some types of data analysis. Here are some useful Google spreadsheet add-ons for data analysis.
Blockspring helps people scrape websites, get product prices from Amazon, search Bing, save files to Dropbox, automate Twitter and outbound emails, find sales leads, run advanced text analysis, and much more. It turns Google Sheets into a playground for APIs. Regardless of technical background, the add-on lets users automate work and build advanced tools.
The Text Analysis add-on provides an easy way to analyze any text, such as links, tweets, or documents, in Google Spreadsheets. It provides various Natural Language Processing and Machine Learning tools.
The add-on can be used to perform tasks such as:
- Sentiment analysis on social media streams
- Extracting mentions of entities and concepts
- Summarizing long chunks of text and articles
- Detecting the language of a document or tweet
- Classifying your documents or links into more than 500 categories
- Extracting the full text of an article, as well as its author name, embedded media, etc.
With this add-on one can analyze any kind of text and perform different types of text analysis, including sentiment analysis on content in different languages.
It supports several languages like Italian, English, French, German and Portuguese.
The add-on helps enhance columns of text by automatically extracting keywords and named entities and linking them to Wikipedia.
The Translate My Sheet add-on translates your spreadsheet, cell by cell or in full with one click, into more than 100 languages.
This add-on helps find duplicate or unique values between two tables or within one sheet in 5 simple steps.
This add-on provides one-click solutions for daily tasks like splitting cells, removing duplicates, changing case, finding and cleaning up data, working with formulas, and more. The Power Tools add-on cuts the clicks on repeated tasks and brings features for organizing and unifying data in Google Sheets.
The Smart Autofill add-on uses Machine Learning via the Google Prediction API to fill missing values in a column of data, based on other data in the column and the data in adjacent columns.
It will look for patterns in the rows of data where there are values present in the column, and apply them to fill in the missing values of the column.
This add-on plots your own data on a Google Map directly from Google Sheets.
The Google Analytics spreadsheet add-on brings the power of the Google Analytics API to Google Spreadsheets. With this tool, you can:
- Query data from multiple views (profiles).
- Create custom calculations from your report data.
- Create dashboards with embedded data visualizations.
- Schedule reports to run automatically so your data is always current.
- Easily control who can see these data and visualizations by leveraging Google Spreadsheet’s existing sharing and privacy features.
This add-on is a great assistant for correcting all fuzzy matches and removing partial duplicates from your sheet.
Thursday, January 21, 2016
5 major differences between Naïve Bayes and Logistic Regression.
1. Purpose: what class of machine learning problem does it solve?
Both algorithms can be used for classification. For example, you could predict whether a bank should offer a loan to a customer, or identify whether a given email is spam or ham.
2. Algorithm’s learning mechanism
Naïve Bayes: For the given features (x) and label (y), it estimates the joint probability P(x, y) from the training data. Hence it is a generative model.
Logistic regression: Estimates the conditional probability P(y | x) directly from the training data by minimizing error. Hence it is a discriminative model.
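To make the generative versus discriminative distinction concrete, here is a minimal sketch in plain Python. The toy data, the function names, and the hyperparameters (`alpha`, `lr`, `epochs`) are all assumptions made for illustration and do not come from any particular library. Naïve Bayes counts its way to P(y) and P(x_j | y), multiplies them into a joint probability, and only then normalizes; logistic regression fits P(y | x) directly.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the post): binary features, two labels.
DATA = [
    ((1, 1), "spam"), ((1, 0), "spam"), ((1, 1), "spam"),
    ((0, 0), "ham"), ((0, 1), "ham"), ((0, 0), "ham"),
]


def fit_naive_bayes(data, alpha=1.0):
    """Generative: estimate P(y) and P(x_j = 1 | y) by counting (Laplace-smoothed)."""
    label_counts = Counter(label for _, label in data)
    n_features = len(data[0][0])
    prior = {y: c / len(data) for y, c in label_counts.items()}
    likelihood = {}
    for y, count in label_counts.items():
        rows = [x for x, label in data if label == y]
        likelihood[y] = [
            (sum(x[j] for x in rows) + alpha) / (count + 2 * alpha)
            for j in range(n_features)
        ]
    return prior, likelihood


def naive_bayes_posterior(model, x):
    """Build the joint P(x, y) = P(y) * prod_j P(x_j | y), then normalize to P(y | x)."""
    prior, likelihood = model
    joint = {}
    for y, p in prior.items():
        for j, value in enumerate(x):
            p_one = likelihood[y][j]
            p *= p_one if value == 1 else 1.0 - p_one
        joint[y] = p
    total = sum(joint.values())
    return {y: p / total for y, p in joint.items()}


def logistic_probability(model, x):
    """P(y = positive | x) as a sigmoid of a linear score."""
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(data, positive="spam", lr=0.5, epochs=500):
    """Discriminative: fit P(y | x) directly by gradient descent on the log loss."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, label in data:
            target = 1.0 if label == positive else 0.0
            error = logistic_probability((w, b), x) - target
            w = [wj - lr * error * xj for wj, xj in zip(w, x)]
            b -= lr * error
    return w, b


nb = fit_naive_bayes(DATA)
lr_model = fit_logistic_regression(DATA)
print(naive_bayes_posterior(nb, (1, 1)))
print(logistic_probability(lr_model, (1, 1)))
```

Note that the Naïve Bayes model never optimizes anything: its parameters are just smoothed frequencies. The logistic model has no notion of P(x) at all; it only learns where to place the decision boundary.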
3. Model assumptions
Naïve Bayes: The model assumes all features are conditionally independent given the class. So if some of the features depend on each other (which is likely in a large feature space), the predictions might be poor.
Logistic regression: It splits the feature space linearly, and it works reasonably well even if some of the variables are correlated.
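A small worked example can show why dependent features hurt Naïve Bayes. The sketch below uses made-up likelihood values (an assumption for illustration) and compares the posterior from one piece of evidence with the posterior when that same evidence is entered three times as if it were three independent features. The independence assumption makes the model count the evidence three times and become overconfident.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """P(y = 1 | x) for a two-class problem under the conditional-independence assumption."""
    joint_pos = prior_pos
    joint_neg = 1.0 - prior_pos
    for lp, ln in zip(likelihoods_pos, likelihoods_neg):
        joint_pos *= lp  # multiply in P(x_j | y = 1)
        joint_neg *= ln  # multiply in P(x_j | y = 0)
    return joint_pos / (joint_pos + joint_neg)


# Hypothetical numbers: the observed feature value is three times more
# likely under the positive class than under the negative class.
single = naive_bayes_posterior(0.5, [0.6], [0.2])

# The same feature duplicated twice (perfectly correlated copies):
# no new information, but the model treats each copy as fresh evidence.
tripled = naive_bayes_posterior(0.5, [0.6] * 3, [0.2] * 3)

print("one feature:       ", round(single, 3))   # about 0.75
print("three copies of it:", round(tripled, 3))  # about 0.964
```

A logistic regression trained on the same duplicated columns can split the weight across the copies, which is one way to read the claim that it tolerates correlated variables better.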
4. Model limitations
Naïve Bayes: Works well even with little training data, as the estimates are based on the joint density function.
Logistic regression: With little training data, the model estimates may overfit the data.
5. Approach to be followed to improve the results
Naïve Bayes: When the training set is small relative to the number of features, information on prior probabilities helps improve the results.
Logistic regression: When the training set is small relative to the number of features, Lasso and Ridge regularization help improve the results.
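As a rough illustration of the regularization point, the sketch below fits a logistic regression on a tiny, made-up data set with and without a ridge-style (L2) penalty. The data, the function names, and the values of `l2`, `lr`, and `epochs` are assumptions chosen for the example; a Lasso (L1) penalty is not shown. On data this small and cleanly separable, the unpenalized weights keep growing, while the penalty keeps them small.

```python
import math

# Hypothetical toy data (not from the post): four samples, two features.
X = [(1.0, 0.2), (0.9, 0.8), (0.1, 0.3), (0.2, 0.9)]
Y = [1, 1, 0, 0]


def fit_logistic(X, Y, l2=0.0, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log loss + (l2 / 2) * ||w||^2 (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, Y):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + error * xj / n for g, xj in zip(grad_w, x)]
            grad_b += error / n
        # The L2 term adds l2 * w to the gradient, pulling weights toward zero.
        w = [wj - lr * (g + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def norm(w):
    """Euclidean length of the weight vector."""
    return math.sqrt(sum(wj * wj for wj in w))


w_plain, _ = fit_logistic(X, Y, l2=0.0)
w_ridge, _ = fit_logistic(X, Y, l2=0.1)
print("unregularized weight norm: ", norm(w_plain))
print("L2-regularized weight norm:", norm(w_ridge))
```

Large weights on a handful of samples are a typical symptom of overfitting, so the smaller norm under the penalty is the behavior the point above is describing.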