Publications by Matteo Pancaldi, Riccardo Ventura
Text Mining: Tripadvisor Hotel Reviews
The dataset used in this project consists of Tripadvisor hotel reviews. It contains 20491 reviews described by 2 variables: Review = full text of the review; Rating = rating from 1 to 5 for each review. To make the dataset easier to work with, we extract a random sample of 1000 observat...
6272 sym R (7320 sym/29 pcs) 9 img 2 tbl
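The sampling step mentioned in the Tripadvisor entry above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the `reviews` data frame is a hypothetical stand-in built from placeholder text (the real file is not loaded here), and the seed value is arbitrary; the original project may have sampled differently.

```r
# Hypothetical stand-in for the Tripadvisor data: 20491 rows, 2 columns.
# The text and ratings are placeholders, not the real reviews.
set.seed(123)  # assumed seed, used only to make the sample reproducible
n_reviews <- 20491
reviews <- data.frame(
  Review = paste("placeholder review text", seq_len(n_reviews)),
  Rating = sample(1:5, n_reviews, replace = TRUE),
  stringsAsFactors = FALSE
)

# Draw 1000 distinct row indices (sampling without replacement) and subset.
sample_idx <- sample(nrow(reviews), size = 1000)
reviews_sample <- reviews[sample_idx, ]

# Inspect the size of the reduced dataset.
dim(reviews_sample)
```

Fixing the seed before sampling keeps the 1000-row subset identical across runs, which matters when later text-mining results are compared.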
PCA on Music and Movie preferences
Introduction: The aim of this project is to reduce the dimensionality of the music and movie preferences of the people surveyed, using Principal Component Analysis. Dataset and first view: The dataset used in this project is made up of 1010 observations, where each row represents a person to whom the survey was administered, and 150 different columns. The v...
6129 sym R (20724 sym/39 pcs) 20 img
Campus Recruitment: Advanced Visualisation in R
Campus Recruitment: Factors influencing Employability. The dataset used in this project consists of placement data for students on a campus. It includes secondary and higher secondary school percentages and specialisation, as well as degree specialisation and type, work experience, and salary offers made to the employed students....
13239 sym R (1966 sym/6 pcs) 44 img 2 tbl
Customer Clustering: K-means/PCA
Introduction: The aim of this project is to segment customers into groups based on common characteristics. The method of dividing clients into classes based on similar features is called customer segmentation. This process is very useful for gaining a better understanding of the tastes and desires of each group and adapt the ...
5343 sym R (5399 sym/51 pcs) 16 img 1 tbl
Churn Prediction
Description of dataset: The dataset for our classification machine learning project consists of Telco customer data. It is taken from Kaggle and stems from the IBM sample data set collection (https://www.kaggle.com/blastchar/telco-customer-churn). Each row of this data represents a customer, and each column contains customer’s attrib...
10773 sym R (29787 sym/50 pcs) 21 img 2 tbl
Text Mining: Amazon Plus vs Amazon Show
Amazon Products Review: The dataset used in this project consists of 5000 consumer reviews of Amazon products. The variables that we want to include in our task are: name = name of the Amazon product; review.rating = rating from 1 to 5 for each review; review.text = full text of each review. Amazon_P <- read.csv('/Users/panca97/Des...
7755 sym R (11615 sym/37 pcs) 14 img
Market Basket Analysis
Association rule mining is an unsupervised learning technique that aims to discover and describe regularities between items in transaction data. It is often used in market basket analysis to check whether there are general patterns in customer behaviour. "If a customer buys X, they also tend to buy Y": this is the kind of statement that advises the sales departm...
4110 sym R (7714 sym/19 pcs) 7 img
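The "if a customer buys X, they also tend to buy Y" idea from the Market Basket Analysis entry above is usually quantified with support, confidence and lift. The base R sketch below computes the three measures by hand for a single rule. The transactions, the item names and the `support` helper are hypothetical and chosen only for illustration; the original project may well rely on a dedicated package instead.

```r
# Hypothetical toy transactions (not taken from the original project).
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "milk", "eggs")
)

# Support: share of transactions that contain every item in `items`.
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Rule under inspection: {bread} => {milk}
x <- "bread"
y <- "milk"

supp_xy <- support(c(x, y), transactions)      # how often X and Y occur together
conf_xy <- supp_xy / support(x, transactions)  # how often Y appears given X
lift_xy <- conf_xy / support(y, transactions)  # confidence relative to Y's baseline

c(support = supp_xy, confidence = conf_xy, lift = lift_xy)
```

A lift below 1, as in this toy data, suggests that buying X does not make Y more likely than its baseline frequency, so a rule can have high confidence and still be uninformative.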