Publications by Matteo Pancaldi, Riccardo Ventura

Text Mining: Tripadvisor Hotel Reviews

08.02.2021

The dataset that we are going to use in this project consists of Tripadvisor hotel reviews. The data comprise 20491 reviews with 2 variables: Review = full text of the review; Rating = rating from 1 to 5 for each review. To make the dataset easier to work with, we decide to extract a random sample of 1000 observat...

6272 sym R (7320 sym/29 pcs) 9 img 2 tbl

PCA on Music and Movie preferences

08.02.2021

Introduction: The aim of this project is to reduce the dimensionality of the music and movie preferences of the people surveyed, using Principal Component Analysis. Dataset and first view: The dataset used in this project is made up of 1010 observations, where each row represents a person to whom the survey was administered, and 150 different columns. The v...

6129 sym R (20724 sym/39 pcs) 20 img

Campus Recruitment: Advanced Visualisation in R

08.02.2021

Campus Recruitment: Factors influencing Employability. The dataset that we are going to use in this project consists of placement data of students on a campus. It includes secondary and higher secondary school percentage and specialisation. It also includes degree specialisation and type, work experience, and salary offers to the employed students....

13239 sym R (1966 sym/6 pcs) 44 img 2 tbl

Customer Clustering: K-means/PCA

08.02.2021

Introduction: The aim of this project is to segregate customers into groups based on common characteristics. The method of segmenting or segregating clients into classes based on similar features is called consumer segmentation. This process is very useful in order to get a better knowledge of the tastes and desires of each group and adapt the ...

5343 sym R (5399 sym/51 pcs) 16 img 1 tbl

Churn Prediction

08.02.2021

Description of dataset: The dataset for our classification machine learning project consists of all relevant Telco customer data; it is taken from Kaggle and stems from the IBM sample data set collection (https://www.kaggle.com/blastchar/telco-customer-churn). Each row of this data represents a customer; each column contains customer’s attrib...

10773 sym R (29787 sym/50 pcs) 21 img 2 tbl

Text Mining: Amazon Plus vs Amazon Show

08.02.2021

Amazon Products Review: The dataset that we are going to use in this project consists of 5000 consumer reviews for Amazon products. The variables that we want to include in our task are: name = name of Amazon product; review.rating = rating from 1 to 5 for each review; review.text = full text of each review. Amazon_P <- read.csv('/Users/panca97/Des...

7755 sym R (11615 sym/37 pcs) 14 img

Market Basket Analysis

08.02.2021

Association rule mining is an unsupervised learning technique that aims to discover and describe regularities between items in transaction data. It is often used in market basket analysis to check whether there are general patterns in customers' behaviour: "If a customer buys X, they also tend to buy Y." This is the statement that advises the sales departm...

4110 sym R (7714 sym/19 pcs) 7 img