<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title>Data Mining 290</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="slides/production/common.css" type="text/css" />
</head>
<body>
<h1 id="data-mining-290">Data Mining 290</h1>
<h3 id="description">Description</h3>
<p>Learn how to obtain, clean, visualize, understand, model, and predict the world around you using data. Grading will consist of homework (30%), a midterm (30%), and a project (40%).</p>
<h3 id="instructor">Instructor</h3>
<p>Jimmy Retzlaff &lt;jretz@ischool&gt;</p>
<h3 id="gsi">GSI</h3>
<p>Shreyas &lt;shreyas@ischool&gt;</p>
<h3 id="textbook">Textbook</h3>
<p>Han, J., Kamber, M., &amp; Pei, J. (2011). <em>Data Mining: Concepts and Techniques</em> (3rd ed.). Morgan Kaufmann.</p>
<h3 id="course-discussion">Course Discussion</h3>
<p><a href="https://piazza.com/berkeley/spring2014/info290t03">Info 290T: Data Mining on Piazza</a></p>
<hr />
<h1 id="syllabus">Syllabus</h1>
<p>DM[0-9]+ indicates chapters from the text, <em>Data Mining</em>; a decimal suffix (e.g. DM11.1) indicates a section within that chapter.</p>
<table>
<thead>
<tr class="header">
<th align="left">Date</th>
<th align="left">Readings</th>
<th align="left">Slides</th>
<th align="left">Homework / Project</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Jan 23</td>
<td align="left"><a href="http://try.github.com">Try Github</a> ; <a href="http://www.dataists.com/2010/09/a-taxonomy-of-data-science/">A Taxonomy of Data Science</a></td>
<td align="left"><a href="slides/2014-01-23-Intro.html">Class Intro</a> ; <a href="https://speakerdeck.com/seekshreyas/introduction-to-git-and-github">Tools Intro</a> by GUEST: Shreyas</td>
<td align="left"><a href="slides/2014-01-23-Lab.html">Git Intro</a></td>
</tr>
<tr class="even">
<td align="left">Jan 30</td>
<td align="left">DM1 ; <a href="http://hbswk.hbs.edu/item/6836.html">The Yelp Factor: Are Consumer Reviews Good for Business?</a></td>
<td align="left"><a href="slides/2014-01-30-CaseStudies.html">Case Studies</a> ; <a href="slides/2014-01-30-Obtaining-Data.html">Obtaining Data</a></td>
<td align="left"><a href="slides/2014-01-30-Lab.html">Obtain &amp; Explore Data</a></td>
</tr>
<tr class="odd">
<td align="left">Feb 6</td>
<td align="left">DM2, DM3</td>
<td align="left"><a href="slides/2014-02-06-Probability.html">Probability</a> ; <a href="slides/2014-02-06-Preprocessing.html">Preprocessing</a></td>
<td align="left"><a href="slides/2014-02-06-Lab.html">Data Stats</a></td>
</tr>
<tr class="even">
<td align="left">Feb 13</td>
<td align="left">DM4, <a href="http://www.youtube.com/watch?v=SS27F-hYWfU">Apache Hadoop: Petabytes and Terawatts</a> (<a href="http://prezi.com/u0ukvqzpyh5p/apache-hadoop-petabytes-and-terawatts/">slides</a>); <a href="http://packages.python.org/mrjob/">mrjob docs</a> (for homework)</td>
<td align="left"><a href="slides/2014-02-13-Data-Warehouse.html">Data Warehouse</a> ; <a href="slides/2014-02-13-MapReduce.html">MapReduce</a></td>
<td align="left"><a href="slides/2014-02-13-Project.html">Project Details</a> ; <a href="slides/2014-02-13-mrjob.html">mrjob</a></td>
</tr>
<tr class="odd">
<td align="left">Feb 20</td>
<td align="left">DM8</td>
<td align="left"><a href="slides/2014-02-20-Decision-Trees.html">Decision Trees</a>; <a href="slides/2014-02-20-Bayes.html">Naive Bayes</a></td>
<td align="left"><a href="slides/2014-02-20-Gini.html">Gini Index</a></td>
</tr>
<tr class="even">
<td align="left">Feb 27</td>
<td align="left">DM9.1-9.3, DM9.5 ; <a href="http://scott.fortmann-roe.com/docs/BiasVariance.html">Understanding the Bias-Variance Tradeoff</a></td>
<td align="left"><a href="slides/2014-02-27-SVM.html">SVM</a> ; <a href="slides/2014-02-27-Neural-Network.html">Neural Networks</a></td>
<td align="left"><a href="slides/2014-02-27-Lab-NN.html">Neural Network Back Propagation</a></td>
</tr>
<tr class="odd">
<td align="left">Mar 6</td>
<td align="left">DM10</td>
<td align="left"><a href="slides/2014-03-06-Clustering.html">Clustering - Partitioning</a> ; <a href="slides/2014-03-06-Hierarchical.html">Clustering - Hierarchical &amp; Density</a></td>
<td align="left"><a href="slides/2014-03-06-k-means.html">K-Means</a></td>
</tr>
<tr class="even">
<td align="left">Mar 13</td>
<td align="left">DM11.1</td>
<td align="left"><a href="slides/2014-03-13-Review.html">Review</a></td>
<td align="left">Prepare 1 cheat sheet</td>
</tr>
<tr class="odd">
<td align="left">Mar 20</td>
<td align="left">1 cheat sheet</td>
<td align="left"><em>Midterm</em></td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Mar 27</td>
<td align="left">HOLIDAY</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Apr 3</td>
<td align="left">DM6</td>
<td align="left"><a href="slides/2014-03-13-Advanced-Cluster.html">Advanced Clustering</a> ; <a href="slides/2014-04-03-Frequent-Pattern.html">Frequent Patterns</a></td>
<td align="left"><a href="slides/2014-04-03-AWS.html">AWS</a> ; Project Proposal due April 9</td>
</tr>
<tr class="even">
<td align="left">Apr 10</td>
<td align="left">DM11.3; <a href="http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf">PageRank</a>; <a href="http://arxiv.org/pdf/1106.5321">Uncovering Social Network Sybils in the Wild</a></td>
<td align="left"><a href="slides/2014-04-10-Graphs.html">Graphs</a>; <a href="slides/2014-04-10-PageRank.html">PageRank</a></td>
<td align="left"><a href="slides/2014-04-10-AdjacencyRepresentations.html">Adjacency Representations</a></td>
</tr>
<tr class="odd">
<td align="left">Apr 17</td>
<td align="left">DM12; <a href="http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf">Shazam Audio Search</a></td>
<td align="left"><a href="slides/2014-04-17-Outliers.html">Outliers</a>; <a href="slides/2014-04-17-Multimedia.html">Images &amp; Audio</a></td>
<td align="left"><a href="slides/2014-04-17-Midterm-HW.html">Midterm Review</a></td>
</tr>
<tr class="even">
<td align="left">Apr 24</td>
<td align="left"><a href="https://groups.google.com/group/gsofgs/attach/2f1cdd7a999c3ad8/embedded-plots.pdf?part=2&amp;authuser=0">Embedded Plots</a> ; <a href="http://vis.stanford.edu/files/2011-D3-InfoVis.pdf">Data-Driven Documents</a></td>
<td align="left"><a href="slides/2014-04-24-Visualization.html">Visualization</a> ; <a href="slides/2014-04-24-Yelp-Visualization.html">Yelp's Visualizations</a></td>
<td align="left"><a href="http://vogievetsky.github.io/IntroD3/">D3 Intro</a>; <a href="slides/2014-04-24-D3.html">D3 Lab</a></td>
</tr>
<tr class="odd">
<td align="left">May 1</td>
<td align="left"><a href="http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf">A Few Useful Things to Know about Machine Learning</a> ; <a href="http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf">Top 10 Algorithms in Data Mining</a></td>
<td align="left"><a href="slides/2014-05-01-Real-World.html">In Real Life</a></td>
<td align="left">Project Data and Presentation due May 8th</td>
</tr>
<tr class="even">
<td align="left">May 8</td>
<td align="left"></td>
<td align="left">Final Presentation</td>
<td align="left">Project Code &amp; Papers due May 14th</td>
</tr>
<tr class="odd">
<td align="left">May 15</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Bye!</td>
</tr>
</tbody>
</table>
</body>
</html>