Software Development – Bluechip AI Asia, AI Development Company

Bazel Training Course

Posted on December 14, 2023 by admin

Introduction

Overview of Bazel
Understanding the Bazel architecture

Getting Started

Installing the Bazel runtime and launcher
Understanding the Bazel UI

Understanding the Project Structure and Basic Building Blocks

Project building block
Project structure
Concepts of a build logic

Invoking a Target From the Command Line

Executing a target
Commonly-used commands
Output and cache directories

Understanding the Bazel Build Lifecycle

Phases of the Bazel lifecycle
Configuration file
Programming language rules

Using Bazel Basic Automation For Java

Setting up a Java project
Building a Java project
Running the build from the command line
Inspecting the generated artifact
Deploying the Java project
Driving Bazel from the IDE
Using Bazel in IntelliJ

Bazel Dependency Management

Modeling fine-grained package granularity and dependencies
Declaring external dependencies
Declaring an external library and using it in a code
Declaring the JUnit dependency
Publishing a JAR to a Maven repository
Publishing a Java library to local Maven

Testing Automation

Performing automated tests
Executing JUnit tests

Advanced Bazel

Extension concepts
Writing and executing a genrule
Remote caching and execution
Build stamping
Bazel query
Java toolchains

Troubleshooting

Summary and Next Steps

Contemporary Development Principles and Practices Training Course

Posted on December 8, 2023 by admin

Module 1: Traditional Development Approaches

1.1 Overview of Sequential, Predictive Development Approaches
- Description of sequential, predictive ‘Waterfall’ approaches
- Timeline of evolution of Waterfall approaches
1.2 Strawman Waterfall
- Dr Winston Royce’s Waterfall model
- Benefits of Waterfall for controlling projects
- Royce’s “Inherent risks”
1.3 V-Model
- Early verification and validation
- Benefits of V-model
1.4 Incremental Models
- Example of Rational Unified Process
- Incremental delivery
- Breaking down scope and managing risk
1.5 When to Use Waterfall
- Defined process control

Module 2: Prince2 Overview

2.1 What is Prince2?
- Definition and origins
- Prince2 Certifications: Foundation, Practitioner, Agile
- Benefits of Prince2
2.2 Prince2 Methodology
- Roles – Project manager, customer, user, supplier, project board
- Management Techniques – Project assurance, project support
- Scope – Interaction with contracts and contractual management
- Controlling Change – Risk, quality, and change management
2.3 Prince2 Process Model
- Directing a project
- Starting up a project
- Initiating a project
- Managing stage boundaries
- Controlling a stage
- Managing product delivery
- Closing a project
- Planning

Module 3: Agile Overview

3.1 Historical Overview
- Timeline of evolution of ‘Agile’ ideas 90s to present
- Early Agile approaches – Scrum, XP, DSDM
- Agile Developments – Kanban, BDD, DevOps, Scaling
3.2 The Agile Manifesto
- Background to creating the Manifesto
- Agile Manifesto overview
  - Individuals and interactions over processes and tools
  - Working software over comprehensive documentation
  - Customer collaboration over contract negotiation
  - Responding to change over following a plan

Module 4: Agile Principles

4.1 The 12 Agile Principles
- Group discussion on each principle
4.2 Summary of Agile concepts
- Iterative planning and development
- Continuous improvement
- Continuous learning
- Collaboration and face-to-face communication
- Collective accountability
- Cross-functional teams

Module 5: Agile Project Management with Scrum

5.1 The Scrum Framework
- Overview – Scrum Guide 2016
- Scrum roles and responsibilities – Scrum Master, Product Owner, Team
- Scrum events – Sprint, Sprint Planning, Review, Retrospective, Daily Scrum
- Scrum artefacts – Product Backlog, Sprint Backlog, Product Increment
5.2 Agile Project Management Principles
- Empirical Process Control
- Iterative planning and reporting
- Continuous Improvement and retrospection
- Resource management and teams

Module 6: Software Testing

6.1 Testing Fundamentals
- The Fundamental Test Process
  - Planning, Analysis & Design, Execution, Evaluation, Closure
- Test levels – unit, integration, system, user acceptance
- Test approaches – requirements-based, risk-based, experience-based
- Test design techniques – white-box, black-box techniques
6.2 Agile Testing
- Agile Testing Quadrants overview – test strategy, planning
- Test-driven development
- Test automation principles – test automation pyramid
6.3 Test Types
- Technology-facing tests that guide development
  - Unit testing, TDD, smoke tests
- Business-facing tests that guide development
  - Story tests, examples, acceptance testing
- Business-facing tests that critique the product
  - Exploratory testing, Alpha/Beta testing, UAT
- Technology-facing tests that critique the product
  - Performance testing, usability, quality attributes

Module 7: Traditional Business Analysis

7.1 What is Business Analysis?
- Business analysis and the business analyst
- Levels of business analysis – enterprise, project, operational
- Business Analysis principles
7.2 IIBA BA Book of Knowledge – Knowledge Areas
- Business Analysis Planning and Monitoring
- Elicitation and Collaboration
- Requirements Life Cycle Management
- Strategy Analysis
- Requirements Analysis and Design Definition
- Solution Evaluation

Module 8: Agile Business Analysis

8.1 Agile Business Analysis Considerations
- Iterative development
- Cross-functional teams
- Collaboration between business and technology areas
8.2 Behaviour-Driven Development Overview
- Origins in TDD and recent developments
- Definitions – BDD, ATDD, Specification by Example
8.3 BDD Activities
- Focus on features that deliver business value
- Derive features collaboratively
- Keep options open
- Use concrete examples to illustrate features
- Write executable specifications
- Create living documentation
8.4 Agile BA Techniques & Tools
- Business value definition
- Personas
- Impact Mapping
- Real options
- User Stories and acceptance criteria
- Relative estimation
- Given-When-Then template
- Tool support for BDD

Harmony OS for Developers Training Course

Posted on November 28, 2023 by admin

Introduction

Overview of Harmony OS Features and Architecture

Setting up the Development Environment

Downloading the IDE
Setting up the compilation toolchain
Setting up the device development tool (HUAWEI DevEco Device Tool)
Setting up the application development tool (HUAWEI DevEco Studio)
Obtaining the source code

Developing Connection Software

Setting up the Hi3861 environment
Developing an application that connects via WLAN
Implementing LED blinking
Integrating vendor SDKs

Developing Device Software

Developing an Hi3516 driver
Controlling the screen on a device
Controlling a camera

Developing Application Software

Navigating HUAWEI DevEco Studio
Using JavaScript framework, components, and interfaces
Developing a vision application

Developing an IoT application

Using the camera module without a screen

Developing the Kernel

Understanding the HarmonyOS kernel
Working with functions, file system, libraries, and commissioning functions
Using the HDF driver framework, driver platform, and peripheral functions

Developing components

Understanding components
Defining a component based on specifications
Developing a HarmonyOS component and distribution

Exploring the Security Mechanisms

Understanding hardware, system, data, device interconnection, and application security.
Recommended practices

API Testing with JavaScript and Cypress 10

Posted on June 23, 2023 by admin

Common HTTP methods

How to execute web requests

Basic and advanced API automation

Setup, Run Test and API Implementation

Requirements

Basic understanding of JavaScript

Description

API testing with JavaScript is an important part of the software development process. It ensures that the application is working as expected and is able to facilitate communication between different components.

By running tests on the API, developers can ensure that the functionality of the application is intact and that any changes made to the system don’t affect the overall performance of the application. JavaScript makes it easy to write and execute tests, enabling developers to quickly and easily verify that their software is working correctly.

With the right testing tools, developers can ensure that the APIs they are working with are reliable and secure. This is essential for ensuring that the application is able to provide the desired results.

In this new tutorial, you will learn how to implement automated API testing with Cypress 10.

Ready to start? Check out the full curriculum and jump into the tutorial.

What our students are saying

“I appreciate the time the instructor put on this course as well as the opportunity to get familiar with TestProject free of charge. Well explained, however, if you are using Windows and you are new in Automation Testing, you might find it a bit challenging with adding the SDK Token in your system environment since the instruction used MAC which is completely a different way with windows. Other than that… I appreciate a lot this free course …. thank you so much”

“I like the fundamental approach used by the author. Will see:) To prepare for such a course – it’s a really hard and big job. Respect and thank you.” – Serhii Kovalenko

“Wonderful content and things explained in a nutshell. Overwhelmed by Author’s dedication to putting things in such a way that any novice or manual tester can follow and understand and definitely be on-boarded as a Selenium Automation Engineer the next day at work. Thanks a million times for creating these courses! One Stop for Automation.” – Rupashree Geethaaviji Ananthakrishna

“I am familiar with Nikolay from a course I saw on TestAutomation and have the highest regard for him. Glad to see him on Udemy.” – Annamalai Viswanathan

Who this course is for:

Intermediate developers looking to learn how to implement API testing with JavaScript.

Course content

6 sections • 28 lectures • 1h 10m total length

Introduction (1 lecture • 2min)

Introduction – 01:30

Run Test and Setup (2 lectures • 6min)

Run Test – 01:19
Setup – 04:40

Cypress Test Exercises and Solution (2 lectures • 3min)

Cypress Test Exercise – 01:32
Cypress Test Solution – 01:32

API Testing, Exercises and Solutions (16 lectures • 44min)

API Testing Overview – 03:52
What Is API Testing and Its Advantages – 04:50
Automating a GET Exercise – 01:40
Automating a GET Solution – 00:51
API GET Exercises – 01:02
API GET Request Solution – 02:56
Implementing GET Method (String) Options – 04:24
API POST Exercise – 01:03
API POST Solution – 07:38
501 Json Placeholder Exercise – 01:09
501 Json Placeholder Solution – 03:23
Automating A POST Exercise – 00:55
Automating A POST Solution – 04:08
POST Exercise – 02:09
POST Solution – 02:37
Summary – 01:06

Authentication with APIs, Exercises and Solutions (5 lectures • 15min)

Authentication with APIs – 00:55
Authentication with APIs Exercise – 01:02
Authentication with APIs Solution – 08:29
Cypress REPO Exercise – 00:46
Cypress REPO Solution – 04:06

Closing Remarks (2 lectures • 1min)

Conclusions – 00:16
Bonus Lecture – 00:16

Software Development – Level 1 to 3 available – FREE course for Londoners

Posted on June 23, 2023 by admin

Overview

Become a software developer with a FREE course at ELATT

Our courses are currently taught live online using Zoom. We have daytime and evening courses available.

We are a charity supporting Londoners to better themselves through education for over 35 years. Please only apply if you live in a London Borough.

Resources

Software Development Level 1 Course outline

Description

About Software Development

Our software development course will teach you to design and develop a range of programs. You will learn Java as your core coding language and key soft skills such as project management, databases, case diagrams and essential office skills. Your tutor will help you develop proven professional coding skills through object oriented coding, design patterns and frameworks.

Software engineering combines problem solving, creativity and analytical skills. It suits those who are detail oriented with an eye for solutions. There is an ever-increasing demand for new software and a wide variety of clients across a range of sectors are looking to employ skilled software developers.

You will be taught by experienced teachers and we also support you with employability, including opportunities to speak to professionals and access mentoring, volunteering and work experience.

Level 2 Award

IT security for users: use basic techniques in the operation of an IT system to create, edit and view Python programs online
Specialist software: learn Online Python IDLE to create basic programs
Using collaborative technologies: use IT tools and devices for collaborative working and communications in and out of the classroom such as online lessons tools, instant messaging/chat, online forums and more
IT software fundamentals: learn the basics of software fundamentals, including traditional software development cycles

Level 2 Certificate

Improving productivity: plan, produce and evaluate your Python programme
IT security for users: learn about methods to minimise security risks to IT systems and data
Specialist software: use editors to create Graphical User Interfaces (GUI) and Python modules
Drawing and planning software: learn how to use free online software to create algorithmic flowcharts for Python programming
Presentation software: prepare a pitch for your app

Level 2 Diploma

Customer support provision: learn professional customer support behaviours and practices
Software testing/software testing fundamentals: test the functionality of a software application to find out whether the software meets the specified requirements
Creating an event driven computer programme: create a Graphical User Interface (GUI) app
Creating a procedural computer programme: create a Procedures and Classes Library for an app

Level 3 Diploma

Customer support provision: including software documentation and remote support
Software design fundamentals: the Software Development Life Cycle (SDLC)
Principles of ICT system and data security: understanding threats to ICT systems and site data encryption and cryptography
Software testing: test the functionality of your software, including PHP unit testing
Presentation software: use Microsoft PowerPoint to make a business pitch for your software
Develop software: using SQL (Structured Query Language)
The technologies of the internet: learn the principles, technologies, security and support systems that allow the internet to work, such as DevOps, website domain hosting and FTP
Creating an object-oriented computer programme: learn Object-Orientated Programming (OOP) with PHP

Understanding Coding Level 2

Posted on June 23, 2023 by admin

Overview

Are you looking to gain a greater understanding of the different stages in the software development cycle? Do you want to gain knowledge of coding terminology and the key principles of writing code? Do you want to have a certificate to add to your CV?

This qualification is designed for learners who want to gain knowledge of coding. The qualification may also support progression into further study in coding or other topics within the digital sector.

This qualification aims to:

focus on an introduction to coding within the digital sector.

The objectives of this qualification are to enable the learner to gain an understanding of:

principles of coding
the stages of the software development cycle
coding terminology and the key principles of writing code
different coding types
best practices in coding
methods of testing and the DevOps process
effective communication and project management in coding.

Successful completion of the course leads to a Level 2 Certificate in Understanding Coding. This is a government funded nationally-accredited qualification – which means that if you are eligible you can study for free!

Benefits of studying with vision2learn:

It’s FREE for eligible learners.
Study 100% online at your own pace, whenever and wherever you like.
Gain a vocational qualification valued by UK employers.
One-to-one support from a dedicated tutor throughout your course.
Free additional online and telephone customer support.

Description

Testimonial from our learners:

“My entire experience with vision2learn has been nothing but outstanding from start to finish. The signing up process was not at all difficult and I felt supported throughout.
I could not recommend this company highly enough and would definitely use them again”

Unit 1: Understand the principles of coding

Learning topics:

Know about coding languages

Understand job functions in coding

Understand the software development lifecycle

Unit 2: Understand terminology used in coding

Learning topics:

Understand basic computer terminology

Understand coding acronyms and terminology

Understand key principles when writing code

Unit 3: Understand coding design principles

Learning topics:

Know about different coding types

Understand what is meant by compiled and interpreted code

Understand what is meant by a pure function

Unit 4: Understand the processes and practice of coding

Learning topics:

Understand the principles of best practice in coding

Understand the methods of testing code

Understand DevOps processes

Understand the importance of robust coding

Unit 5: Understand the importance of communication and project management in coding

Learning topics:

Understand the importance of communication

Understand the purpose of feedback in developing communication skills

Understand the principles of project management

Understand Agile developments

Who is this course for?

This qualification is designed for learners who are looking to gain an introduction to coding. It will support the learner to progress into further study and support those interested in progressing to employment in a coding or IT related role.

Requirements

To be eligible for a free place you must be:

19 or over at the start of the current academic year (31st August)
A resident of England (funding does not cover Northern Ireland or Wales)
A UK or EU resident for three years or longer or, if from outside the EU, able to provide visa evidence
Your employment status or prior level of qualification may affect your eligibility to study with some colleges.
You can complete the course regardless of whether you are working or receiving benefits. If you receive benefits, you will be asked to provide evidence in the form of a current award letter.

QuantConnect Boot Camp in Python

Posted on June 3, 2023 by admin

Working with financial and alternative data

Implementing trading strategies

QuantConnect’s API

Robust algorithm design

Requirements

A good working knowledge of Python is required.

Description

In QuantConnect’s Boot Camp tutorial series you’ll learn the tools for quantitative trading. You’ll build skills in finance, statistics, and software development while learning about QuantConnect’s API with code-along tasks. After this course, you’ll be able to implement your own trading strategies in Python and have a foundation in robust algorithm design.

We’ll start out with the fundamentals for individual algorithm creation and move on to building an institutional-grade system using the Algorithm Framework. You’ll be able to use its architecture to deploy your own flexible investment strategies.

In each lesson, we’ll code together on QuantConnect’s integrated development environment to create algorithms that you can backtest and use. You’ll manage your portfolio, use indicators in technical trading strategies, trade on universes of assets, automate trades based on market behavior, and understand how data moves in and out of your algorithm.

QuantConnect is one of the largest quantitative trading communities in the world. Part of what makes it so special is the diverse backgrounds in the community. We’re so excited to make these skills accessible to you so you can implement your own unique ideas. Hope to see you in the first lesson!

Who this course is for:

Python developers who want to build trading algorithms.

What is machine learning? Intelligence derived from data

Posted on May 16, 2023 by admin

Machine learning algorithms learn from data to solve problems that are too complex to solve with conventional programming

Machine learning defined

Machine learning is a branch of artificial intelligence that includes methods, or algorithms, for automatically creating models from data. Unlike a system that performs a task by following explicit rules, a machine learning system learns from experience. Whereas a rule-based system will perform a task the same way every time (for better or worse), the performance of a machine learning system can be improved through training, by exposing the algorithm to more data.

Machine learning algorithms are often divided into supervised (the training data are tagged with the answers) and unsupervised (any labels that may exist are not shown to the training algorithm). Supervised machine learning problems are further divided into classification (predicting non-numeric answers, such as whether a mortgage payment will be missed) and regression (predicting numeric answers, such as the number of widgets that will sell next month in your Manhattan store).

Unsupervised learning is further divided into clustering (finding groups of similar objects, such as running shoes, walking shoes, and dress shoes), association (finding common sequences of objects, such as coffee and cream), and dimensionality reduction (projection, feature selection, and feature extraction).

Applications of machine learning

We hear about applications of machine learning on a daily basis, although not all of them are unalloyed successes. Self-driving cars are a good example, where tasks range from simple and successful (parking assist and highway lane following) to complex and iffy (full vehicle control in urban settings, which has led to several deaths).

Game-playing machine learning is strongly successful for checkers, chess, shogi, and Go, having beaten human world champions. Automatic language translation has been largely successful, although some language pairs work better than others, and many automatic translations can still be improved by human translators.

Automatic speech to text works fairly well for people with mainstream accents, but not so well for people with some strong regional or national accents; performance depends on the training sets used by the vendors. Automatic sentiment analysis of social media has a reasonably good success rate, probably because the training sets (e.g. Amazon product ratings, which couple a comment with a numerical score) are large and easy to access.

Automatic screening of résumés is a controversial area. Amazon had to withdraw its internal system because of training sample biases that caused it to downgrade all job applications from women.

Other résumé screening systems currently in use may have training biases that cause them to upgrade candidates who are “like” current employees in ways that legally aren’t supposed to matter (e.g. young, white, male candidates from upscale English-speaking neighborhoods who played team sports are more likely to pass the screening). Research efforts by Microsoft and others focus on eliminating implicit biases in machine learning.

Automatic classification of pathology and radiology images has advanced to the point where it can assist (but not replace) pathologists and radiologists for the detection of certain kinds of abnormalities. Meanwhile, facial identification systems are both controversial when they work well (because of privacy considerations) and tend not to be as accurate for women and people of color as they are for white males (because of biases in the training population).

Machine learning algorithms

Machine learning depends on a number of algorithms for turning a data set into a model. Which algorithm works best depends on the kind of problem you’re solving, the computing resources available, and the nature of the data. No matter what algorithm or algorithms you use, you’ll first need to clean and condition the data.

Let’s discuss the most common algorithms for each kind of problem.

Classification algorithms

A classification problem is a supervised learning problem that asks for a choice between two or more classes, usually providing probabilities for each class. Leaving out neural networks and deep learning, which require a much higher level of computing resources, the most common algorithms are Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbors, and Support Vector Machine (SVM). You can also use ensemble methods (combinations of models), such as Random Forest, other Bagging methods, and boosting methods such as AdaBoost and XGBoost.
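As an illustration of one of the algorithms named above, here is a minimal K-Nearest Neighbors sketch in plain Python. The toy data, the function name `knn_predict`, and the choice of Euclidean distance with k = 3 are assumptions made for the example, not something the article prescribes.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of 2-D points.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.1, 5.0)))  # b
```

Dividing each vote count by k would turn the votes into rough per-class probabilities.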

Regression algorithms

A regression problem is a supervised learning problem that asks the model to predict a number. The simplest and fastest algorithm is linear (least squares) regression, but you shouldn’t stop there, because it often gives you a mediocre result. Other common machine learning regression algorithms (short of neural networks) include Naive Bayes, Decision Tree, K-Nearest Neighbors, LVQ (Learning Vector Quantization), LARS Lasso, Elastic Net, Random Forest, AdaBoost, and XGBoost. You’ll notice that there is some overlap between machine learning algorithms for regression and classification.
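For a single input variable, linear least squares regression has a closed-form solution. The sketch below shows it in plain Python; the function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```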

Clustering algorithms

A clustering problem is an unsupervised learning problem that asks the model to find groups of similar data points. The most popular algorithm is K-Means Clustering; others include Mean-Shift Clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), GMM (Gaussian Mixture Models), and HAC (Hierarchical Agglomerative Clustering).
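A bare-bones K-Means loop can be sketched as follows. It assumes the caller supplies the initial centroids (real libraries usually pick them with a randomized scheme) and runs a fixed number of iterations; the names and data are illustrative only.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]      # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 9)])
print(centroids)
```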

Dimensionality reduction algorithms

Dimensionality reduction is an unsupervised learning problem that asks the model to drop or combine variables that have little or no effect on the result. This is often used in combination with classification or regression. Dimensionality reduction algorithms include removing variables with many missing values, removing variables with low variance, Decision Tree, Random Forest, removing or combining variables with high correlation, Backward Feature Elimination, Forward Feature Selection, Factor Analysis, and PCA (Principal Component Analysis).
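One of the simplest techniques in that list, removing variables with low variance, might look like the sketch below. The threshold value and the helper name `drop_low_variance` are assumptions for the example.

```python
from statistics import pvariance

def drop_low_variance(rows, names, threshold=0.01):
    """Keep only the columns whose population variance exceeds `threshold`."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows

# Hypothetical data: the middle column never changes, so it carries no signal.
names = ["a", "constant", "c"]
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 1.0],
]
kept_names, kept_rows = drop_low_variance(rows, names)
print(kept_names)  # ['a', 'c']
```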

Optimization methods

Training and evaluation turn supervised learning algorithms into models by optimizing their parameter weights to find the set of values that best matches the ground truth of your data. The algorithms often rely on variants of steepest descent for their optimizers, for example stochastic gradient descent (SGD), which is essentially steepest descent in which each step uses a gradient estimated from a randomly chosen sample or small batch of the training data rather than from the full data set.

Common refinements on SGD add factors that correct the direction of the gradient based on momentum, or adjust the learning rate based on progress from one pass through the data (called an epoch) to the next.
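To make the momentum refinement concrete, here is a small sketch of mini-batch SGD with momentum, fitting a one-parameter model y = w·x under a squared-error loss. The hyperparameter values, the function name, and the data are assumptions chosen so the toy problem converges; they are not recommendations.

```python
import random

def sgd_momentum(data, lr=0.01, momentum=0.5, epochs=200, batch_size=2, seed=0):
    """Fit y = w * x by mini-batch SGD with momentum on squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, velocity = 0.0, 0.0
    for _ in range(epochs):                 # one epoch = one full pass over the data
        rng.shuffle(data)                   # the "stochastic" part: random batches
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of 0.5 * (w*x - y)^2 with respect to w, averaged over the batch.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            # Momentum keeps a decaying memory of earlier update directions.
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w

# Hypothetical noise-free data generated from y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w = sgd_momentum(data)
print(round(w, 3))
```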

Neural networks and deep learning

Neural networks were inspired by the architecture of the biological visual cortex. Deep learning is a set of techniques for learning in neural networks that involves a large number of “hidden” layers to identify features. Hidden layers come between the input and output layers. Each layer is made up of artificial neurons, often with sigmoid or ReLU (Rectified Linear Unit) activation functions.

In a feed-forward network, the neurons are organized into distinct layers: one input layer, any number of hidden processing layers, and one output layer, and the outputs from each layer go only to the next layer.
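The forward pass of such a feed-forward network can be sketched in a few lines. The weights below are arbitrary illustrative numbers, not trained values, and the helper names `dense` and `forward` are assumptions.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron applies the activation
    to a weighted sum of all inputs plus its bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Feed the outputs of each layer only to the next layer."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs

# Hypothetical network: 2 inputs -> 2 hidden ReLU neurons -> 1 sigmoid output.
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
output = ([[1.0, -1.0]], [0.0], sigmoid)

hidden_out = dense([2.0, 1.0], *hidden)
prediction = forward([2.0, 1.0], [hidden, output])
print(hidden_out, prediction)
```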

In a feed-forward network with shortcut connections, some connections can jump over one or more intermediate layers. In recurrent neural networks, neurons can influence themselves, either directly, or indirectly through the next layer.

Supervised learning of a neural network is done just like any other machine learning: You present the network with groups of training data, compare the network output with the desired output, generate an error vector, and apply corrections to the network based on the error vector, usually using a backpropagation algorithm. Groups of training data that are run together before applying corrections are called batches (or mini-batches); one complete pass through the training data is called an epoch.

As with all machine learning, you need to check the predictions of the neural network against a separate test data set. Without doing that you risk creating neural networks that only memorize their inputs instead of learning to be generalized predictors.
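Holding out a test set is straightforward to do by hand. The sketch below shuffles the rows with a fixed seed and sets aside a fraction for testing; the 25% split and the function name are illustrative assumptions.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(8))            # stand-in for 8 labeled examples
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 6 2
```

The model is fitted only on `train_rows`; its accuracy on `test_rows`, which it has never seen, indicates how well it generalizes.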

The breakthrough in the neural network field for vision was Yann LeCun’s 1998 LeNet-5, a seven-level convolutional neural network (CNN) for recognition of handwritten digits digitized in 32×32 pixel images. To analyze higher-resolution images, the network would need more neurons and more layers.

Convolutional neural networks typically use convolutional, pooling, ReLU, fully connected, and loss layers to simulate a visual cortex. The convolutional layer basically takes the integrals of many small overlapping regions. The pooling layer performs a form of non-linear down-sampling. ReLU layers, which I mentioned earlier, apply the non-saturating activation function f(x) = max(0,x).
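The following sketch chains a convolutional layer, a ReLU layer, and a max-pooling layer on a tiny image using plain lists. The 5×5 image and the 2×2 vertical-edge kernel are made up for illustration; as in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products
    for each overlapping region (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu_map(feature_map):
    """Apply f(x) = max(0, x) to every value."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]               # responds to dark-to-bright vertical edges

features = relu_map(conv2d(image, kernel))
pooled = max_pool(features)
print(pooled)  # [[2, 0], [2, 0]]
```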

In a fully connected layer, the neurons have full connections to all activations in the previous layer. A loss layer computes how the network training penalizes the deviation between the predicted and true labels, using a Softmax or cross-entropy loss for classification or a Euclidean loss for regression.
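As a rough sketch of what a loss layer computes, the functions below implement softmax followed by cross-entropy for classification, and a squared Euclidean loss for regression. The function names and sample scores are illustrative; some libraries scale the Euclidean loss by 1/2.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Penalty is low when the true class gets high probability."""
    return -math.log(probs[true_index])

def euclidean_loss(predicted, target):
    """Squared Euclidean distance between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

probs = softmax([2.0, 1.0, 0.1])          # hypothetical scores for 3 classes
print(cross_entropy(probs, 0))            # small: class 0 has the top score
print(cross_entropy(probs, 2))            # larger: class 2 has the lowest score
print(euclidean_loss([1.0, 2.0], [1.0, 4.0]))  # 4.0
```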

Natural language processing (NLP) is another major application area for deep learning. In addition to the machine translation problem addressed by Google Translate, major NLP tasks include automatic summarization, co-reference resolution, discourse analysis, morphological segmentation, named entity recognition, natural language generation, natural language understanding, part-of-speech tagging, sentiment analysis, and speech recognition.

In addition to CNNs, NLP tasks are often addressed with recurrent neural networks (RNNs), which include the Long Short-Term Memory (LSTM) model.

The more layers there are in a deep neural network, the more computation it takes to train the model on a CPU. Hardware accelerators for neural networks include GPUs, TPUs, and FPGAs.

Reinforcement learning

Reinforcement learning trains an actor or agent to respond to an environment in a way that maximizes some value, usually by trial and error. That’s different from supervised and unsupervised learning, but is often combined with them.

For example, DeepMind’s AlphaGo, in order to learn to play (the action) the game of Go (the environment), first learned to mimic human Go players from a large data set of historical games (apprentice learning). It then improved its play by trial and error (reinforcement learning), by playing large numbers of Go games against independent instances of itself.

Robotic control is another problem that has been attacked with deep reinforcement learning methods, meaning reinforcement learning plus deep neural networks, the deep neural networks often being CNNs trained to extract features from video frames.
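The trial-and-error idea can be shown on a problem far smaller than Go or robotics. The sketch below uses tabular Q-learning, one common reinforcement learning algorithm, on a made-up five-cell corridor where the agent is rewarded for reaching the right-hand end. The environment, the hyperparameters, and the purely random exploration policy are all assumptions for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4; reaching cell 4 ends the episode
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # index 0 = step left, index 1 = step right

def step(state, action):
    """Environment: move within the corridor, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(2)                # explore by acting at random
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
# Read off the greedy policy: the action with the higher learned value.
policy = ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
print(policy)  # ['right', 'right', 'right', 'right']
```

Deep reinforcement learning replaces the table `q` with a neural network that estimates the same values from raw observations.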

How to use machine learning

How does one go about creating a machine learning model? You start by cleaning and conditioning the data, continue with feature engineering, and then try every machine-learning algorithm that makes sense. For certain classes of problem, such as vision and natural language processing, the algorithms that are likely to work involve deep learning.

Data cleaning for machine learning

There is no such thing as clean data in the wild. To be useful for machine learning, data must be aggressively filtered. For example, you’ll want to:

Look at the data and exclude any columns that have a lot of missing data.
Look at the data again and pick the columns you want to use (feature selection) for your prediction. This is something you may want to vary when you iterate.
Exclude any rows that still have missing data in the remaining columns.
Correct obvious typos and merge equivalent answers. For example, U.S., US, USA, and America should be merged into a single category.
Exclude rows that have data that is out of range. For example, if you’re analyzing taxi trips within New York City, you’ll want to filter out rows with pickup or drop-off latitudes and longitudes that are outside the bounding box of the metropolitan area.

There is a lot more you can do, but it will depend on the data collected. This can be tedious, but if you set up a data-cleaning step in your machine learning pipeline you can modify and repeat it at will.
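The filtering steps listed above can be sketched as one reusable function. Everything specific here is an assumption for the example: the column names, the 50% missing-data cutoff, the alias table, and the rough bounding box used for the New York area.

```python
COUNTRY_ALIASES = {"U.S.": "USA", "US": "USA", "America": "USA"}
LAT_RANGE = (40.4, 41.0)      # rough, assumed bounding box for the NYC area
LON_RANGE = (-74.3, -73.6)

def clean(rows, columns, max_missing_fraction=0.5):
    """Apply the cleaning steps to a list of dict rows."""
    # Drop columns that have a lot of missing data.
    kept = [c for c in columns
            if sum(r.get(c) is None for r in rows) / len(rows) <= max_missing_fraction]
    cleaned = []
    for r in rows:
        row = {c: r.get(c) for c in kept}
        # Drop rows that still have missing values in the remaining columns.
        if any(v is None for v in row.values()):
            continue
        # Merge equivalent answers into a single category.
        if "country" in row:
            row["country"] = COUNTRY_ALIASES.get(row["country"], row["country"])
        # Drop rows whose coordinates fall outside the bounding box.
        if "lat" in row and not LAT_RANGE[0] <= row["lat"] <= LAT_RANGE[1]:
            continue
        if "lon" in row and not LON_RANGE[0] <= row["lon"] <= LON_RANGE[1]:
            continue
        cleaned.append(row)
    return kept, cleaned

# Hypothetical raw records.
rows = [
    {"lat": 40.7, "lon": -74.0, "country": "US", "note": None},
    {"lat": 40.8, "lon": -73.9, "country": "America", "note": None},
    {"lat": 51.5, "lon": -0.1, "country": "USA", "note": "x"},      # out of range
    {"lat": None, "lon": -73.9, "country": "U.S.", "note": None},   # missing value
]
kept, cleaned = clean(rows, ["lat", "lon", "country", "note"])
print(kept)
print(cleaned)
```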

Data encoding and normalization for machine learning

To use categorical data for machine classification, you need to encode the text labels into another form. There are two common encodings.

One is label encoding, which means that each text label value is replaced with a number. The other is one-hot encoding, which means that each text label value is turned into a column with a binary value (1 or 0). Most machine learning frameworks have functions that do the conversion for you. In general, one-hot encoding is preferred, as label encoding can sometimes confuse the machine learning algorithm into thinking that the encoded column is ordered.
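Both encodings are simple enough to write by hand, which makes the difference easy to see. In this sketch the categories are sorted alphabetically to assign positions, which is an arbitrary choice, and the function names are illustrative.

```python
def label_encode(values):
    """Replace each text label with an integer."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    """Turn each text label into a row with one binary column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

colors = ["red", "green", "blue", "green"]

labels, categories = label_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(labels)      # [2, 1, 0, 1]

one_hot, _ = one_hot_encode(colors)
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

The label-encoded output suggests blue < green < red, an ordering that means nothing for colors; the one-hot rows carry no such implied order.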