21 hours (usually 3 days including breaks)
Understanding of traditional data management and analysis methods such as SQL, data warehouses, business intelligence and OLAP. Understanding of basic statistics and probability (mean, variance, probability, conditional probability, etc.).
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the web (such as Twitter or LinkedIn), this course is for you.
It is aimed mainly at decision makers and people who need to choose which data is worth collecting and which is worth analysing.
It is not aimed at people configuring the solution, though they will also benefit from the big picture.
During the course, delegates will be presented with working examples of mostly open-source technologies.
Short lectures will be followed by a demonstration and simple exercises carried out by the participants.
Content and Software used
All software used is updated each time the course is run, so we use the newest versions available.
It covers the whole process, from obtaining, formatting, processing and analysing the data to automating decision making with machine learning.
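The process described above can be sketched end to end in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's own material: the inline CSV, its column names and every function name are hypothetical, "cleansing" simply drops rows that fail to parse, and the automated decision is a 1-nearest-neighbour rule on unscaled features.

```python
import csv
import io
import math

# Hypothetical raw data standing in for an obtained source (e.g. a CSV export).
RAW = """age,monthly_spend,bought
34,120.5,yes
29,,no
45,310.0,yes
22,40.0,no
51,280.0,yes
19,abc,no
38,95.0,no
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def format_and_clean(rows):
    """Format and cleanse: convert to typed tuples, dropping rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["age"]), float(row["monthly_spend"]), row["bought"] == "yes"))
        except ValueError:
            continue  # missing or malformed value: skip the whole row
    return clean


def analyse(records):
    """Analyse: share of customers who bought, as a simple summary statistic."""
    return sum(r[2] for r in records) / len(records)


def nearest_neighbour(train, query):
    """Decide: predict the label of the closest training record (1-NN, Euclidean)."""
    best = min(train, key=lambda r: math.dist(r[:2], query))
    return best[2]


if __name__ == "__main__":
    records = format_and_clean(obtain(RAW))
    print(len(records), analyse(records))
    print(nearest_neighbour(records, (40, 300.0)))
```

In practice the two features would normally be scaled (for example to z-scores) before the distance calculation, since otherwise monthly spend dominates it.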
- Data Sources
- Mining Data
- Recommender systems
- Target Marketing
- Structured vs unstructured
- Static vs streamed
- Attitudinal, behavioural and demographic data
- Data-driven vs user-driven analytics
- Data validity
- Volume, velocity and variety of data
- Building models
- Statistical Models
- Machine learning
- kGroups, k-means, nearest neighbours
- Ant colonies, birds flocking
- Decision trees
- Support vector machine
- Naive Bayes classification
- Neural networks
- Markov models
- Ensemble methods
- Benefit/Cost ratio
- Cost of software
- Cost of development
- Potential benefits
- Data Preparation (MapReduce)
- Data cleansing
- Choosing methods
- Developing the model
- Testing the model
- Model evaluation
- Model deployment and integration
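The clustering methods listed under machine learning can be illustrated with k-means. The sketch below is plain Python implementing Lloyd's algorithm, assuming Euclidean distance, initial centroids drawn at random from the data points, and a fixed iteration cap; `kmeans` and its parameters are hypothetical names, not a library API.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k distinct data points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centroids, clusters = kmeans(pts, 2)
    print(sorted(centroids))
```

The result depends on the initial centroids, which is why library implementations typically run several initialisations and keep the one with the lowest within-cluster error.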
Overview of open-source and commercial software
- Selected R project packages
- Python libraries
- Hadoop and Mahout
- Selected Apache projects related to Big Data and Analytics
- Selected commercial solutions
- Integration with existing software and data sources
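The outline mentions MapReduce for data preparation and Hadoop as a platform. The three phases of the model (map, shuffle, reduce) can be mimicked in a single Python process, as in this word-count sketch. It does not use Hadoop or any of its APIs; the function names are hypothetical and the input records stand in for, say, collected tweets.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map over every record, shuffle the pairs, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    tweets = ["Big data is big", "data beats opinions"]
    print(map_reduce(tweets))  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinions': 1}
```

On a real cluster the framework distributes the map and reduce calls across machines and performs the shuffle itself; the user supplies only the map and reduce logic.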