Please use this as a guideline instead of following it literally. With changing datasets, the numbers may be off. Using your judgement instead of blindly following instructions is an important trait. Creativity with justification is encouraged.

User Based Association
SQL commands:

To get frequently used milestones:
    SELECT `milestone_name`, count(*) as frequency FROM `rawdataDec15` group by milestone_name order by frequency desc;

Getting list of features for each user:
    create table userMilestoneDec15 as SELECT user_id, milestone_name FROM `rawdataDec15` order by user_id, milestone_name;
    create table userDistinctMilestoneDec15 as select distinct * from userMilestoneDec15;

Exporting from phpMyAdmin was difficult, so the table was downloaded from the command line:
    mysql -u cs4477200 -p < temp.sql > userDistinctMilestoneDec15.txt
temp.sql:
    select * from cs4477203.userDistinctMilestoneDec15;

vi the file: convert tabs to spaces and delete the column names.

Create file for association:
    java flattenForRassociation userDistinctMilestoneDec15.txt | \
    grep ',' > userFeatureAssociationRDec15.csv

R commands:
    install.packages("arules")
    library("arules")
    tr=read.transactions("userFeatureAssociationRDec15.csv",format="basket",sep=",")
    rules<- apriori(tr, parameter= list(supp=0.4, conf=0.5))
    itemsets=unique(generatingItemsets(rules))
    write(rules,file="userRules.txt")
    write(itemsets)
    #To get maximal sets
    maximal.sets<- apriori(tr, parameter= list(supp=0.4, conf=0.5, target="maximally frequent itemsets"))
    write(maximal.sets)

Miscellaneous:
vi command to truncate values to a precision of 2 decimal places:
    %s/\([01]\)\.\([0-9][0-9]\)[0-9][0-9]*/\1\.\2/g

Session based Association
SQL commands:

Getting list of features for each session:
    create table sessionMilestoneDec15 as SELECT user_id, date, milestone_name FROM `rawdataDec15` order by user_id, date, milestone_name;
    create table sessionDistinctMilestoneDec15 as select distinct * from sessionMilestoneDec15;

Exporting from phpMyAdmin was difficult, so the table was downloaded from the command line:
    mysql -u cs4477200 -p < temp.sql > sessionDistinctMilestoneDec15.txt
temp.sql:
    select * from cs4477203.sessionDistinctMilestoneDec15;

Tricky processing: vi the file, then
    delete the column names
    convert the first tab to -, so that user_id and date are collapsed into a single key
    change the second tab to a space

Create file for association:
    java flattenForRassociation sessionDistinctMilestoneDec15.txt | \
    grep ',' > sessionFeatureAssociationRDec15.csv

R commands:
    install.packages("arules")
    library("arules")
    tr=read.transactions("sessionFeatureAssociationRDec15.csv",format="basket",sep=",")
    #Note: had to use a lower support level
    rules<- apriori(tr, parameter= list(supp=0.2, conf=0.5))
    itemsets=unique(generatingItemsets(rules))
    write(rules,file="sessionRules.txt")
    write(itemsets)
    #To get maximal sets
    maximal.sets<- apriori(tr, parameter= list(supp=0.2, conf=0.5, target="maximally frequent itemsets"))
    write(maximal.sets)

Miscellaneous:
vi command to truncate values to a precision of 2 decimal places:
    %s/\([01]\)\.\([0-9][0-9]\)[0-9][0-9]*/\1\.\2/g
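
Sketch of the flattening step:
Both sections pipe the exported file through flattenForRassociation, but the notes do not show that program. The Python sketch below is only a guess at what such a step does. It is not the original Java code. It assumes each input line has the form "key item", with a single space after the key (a user_id, or the collapsed user_id-date key for sessions), and that the output should be one comma-separated basket per key, which is the shape read.transactions expects with format="basket". The function names flatten_baskets and multi_item_only are hypothetical.

```python
from collections import OrderedDict


def flatten_baskets(lines):
    """Group 'key item' lines into one comma-separated basket per key.

    Assumption: the key contains no spaces and item names contain no commas.
    """
    baskets = OrderedDict()  # keeps keys in first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only, so item names may contain spaces.
        key, _, item = line.partition(" ")
        item = item.strip()
        if not item:
            continue  # line had a key but no item
        items = baskets.setdefault(key, [])
        if item not in items:  # drop repeated items within one basket
            items.append(item)
    return [",".join(items) for items in baskets.values()]


def multi_item_only(baskets):
    """Keep baskets with at least two items, mirroring the grep ',' filter."""
    return [basket for basket in baskets if "," in basket]
```

For a user-based run, the lines of userDistinctMilestoneDec15.txt would be passed to flatten_baskets and the filtered result written out as userFeatureAssociationRDec15.csv. The key itself is left out of the output here, on the assumption that a basket file should list only items. If the real program behaves differently, adjust accordingly.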