Machine Learning – Definition

What Is Machine Learning?

Machine learning is more than just a buzzword. It is a technology built on the idea that a computer can learn from data without step-by-step human direction. It uses algorithms to examine large volumes of information, or training data, to discover patterns. The system analyzes these patterns, groups them accordingly, and makes predictions. With traditional machine learning, the computer learns how to interpret information as it has been labeled by humans; in that sense, a machine learning program learns from human-labeled datasets.

It is unique in how it becomes, in a way, intuitive. Through repetition, it learns by inference without needing to be deliberately programmed each time. A caveat, however: machine learning can make mistakes, and appropriate caution should be used.

Machine learning proves to be useful especially in today’s big data world. We come into contact with machine learning on a daily basis. It supports technologies such as identifying voice commands on our phones, recommending which songs to listen to on Spotify or which items to purchase next on Amazon, and even determining the fastest way to reach your destination on Waze, to name a few.

How Machine Learning Can Help Businesses

Machine learning helps protect businesses from cyberthreats. However, it works best as part of a multilayered security solution.

Machine learning is also used in healthcare, helping doctors make better and faster diagnoses of diseases, and in financial institutions, detecting fraudulent activity that doesn’t fall within the usual spending patterns of consumers.

Machine Learning Algorithm Types

Supervised Machine Learning

The traditional machine learning type is called supervised machine learning, which necessitates guidance or supervision on the known results that should be produced. In supervised machine learning, the machine is taught how to process the input data. It is provided with the right training input, which also contains a corresponding correct label or result. From the input data, the machine is able to learn patterns and, thus, generate predictions for future events. A model that uses supervised machine learning is continuously taught with properly labeled training data until it reaches appropriate levels of accuracy.
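The training loop described above can be sketched in a few lines. This is a minimal illustration only, not any vendor's model: it uses a perceptron (a simple linear classifier chosen here for brevity) and a tiny made-up labeled dataset, and it keeps training until a target accuracy is reached. The data, the function names, and the 100 percent target are all assumptions for the example.

```python
# Toy labeled dataset (hypothetical): (feature vector, label), label is 1 or 0.
TRAINING_DATA = [
    ((2.0, 1.0), 1),
    ((3.0, 2.5), 1),
    ((2.5, 3.0), 1),
    ((-1.0, -0.5), 0),
    ((-2.0, -1.5), 0),
    ((-0.5, -2.0), 0),
]


def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(weights, bias, x) == y for x, y in data)
    return correct / len(data)


def train(data, target_accuracy=1.0, learning_rate=0.1, max_epochs=100):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        # Stop as soon as the model is accurate enough on the labeled data.
        if accuracy(weights, bias, data) >= target_accuracy:
            break
        for features, label in data:
            error = label - predict(weights, bias, features)  # -1, 0, or 1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train(TRAINING_DATA)
print(accuracy(weights, bias, TRAINING_DATA))
print(predict(weights, bias, (4.0, 3.0)))  # a new, unlabeled input
```

The labels play the role of the "supervision": every weight update is driven by the difference between the known correct label and the model's current prediction.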

Unsupervised Machine Learning

Unsupervised machine learning, through mathematical computations or similarity analyses, draws previously unknown conclusions from unlabeled datasets. An unsupervised machine learning model learns to find unseen patterns or peculiar structures in datasets. In unsupervised machine learning, the machine is able to understand and deduce patterns from data without human intervention. It is especially useful for applications where unseen data patterns or groupings need to be found, or where the pattern or structure being searched for is not defined. Clustering is a common example of this approach.
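As a rough illustration of clustering, the sketch below groups unlabeled numbers with a one-dimensional k-means loop. K-means is only one of many clustering methods and is used here as an assumption for the example; the data values and the choice of starting centroids (the minimum and maximum) are also hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group unlabeled numbers around the nearest of the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its closest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled measurements; no labels are supplied anywhere.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centroids, clusters = k_means_1d(data, centroids=[min(data), max(data)])
print(clusters)
```

No example is ever marked as belonging to a group; the two groups emerge purely from how close the values are to one another.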

Instance-Based Machine Learning

Another type is instance-based machine learning, which correlates newly encountered data with training data and creates hypotheses based on the correlation. To do this, instance-based machine learning uses quick and effective matching methods to refer to stored training data and compare it with new, never-before-seen data. It uses specific instances and computes distance scores or similarities between specific instances and training instances to come up with a prediction. An instance-based machine learning model is ideal for its ability to adapt to and learn from previously unseen data.
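One common instance-based method is k-nearest neighbors, sketched below as an illustration. The source does not name a specific algorithm, so this choice is an assumption, as are the two-dimensional features, the labels, and the value of k. The model stores the training instances as they are and compares each new instance against them by distance.

```python
import math
from collections import Counter

# Stored training instances (hypothetical): (features, label).
TRAINING_INSTANCES = [
    ((1.0, 1.0), "benign"),
    ((1.5, 2.0), "benign"),
    ((2.0, 1.5), "benign"),
    ((8.0, 8.0), "malicious"),
    ((8.5, 9.0), "malicious"),
    ((9.0, 8.5), "malicious"),
]


def classify(new_instance, training_instances, k=3):
    """Label a new instance by majority vote of its k nearest stored instances."""
    by_distance = sorted(
        training_instances,
        key=lambda item: math.dist(new_instance, item[0]),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((7.5, 8.0), TRAINING_INSTANCES))
print(classify((1.2, 1.4), TRAINING_INSTANCES))
```

There is no separate training step: adapting to new data only requires adding more stored instances.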

Machine Learning and Cybersecurity

The emergence of ransomware has brought machine learning into the spotlight, given its capability to detect ransomware attacks at time zero.

Evolution is malware’s game. A few years ago, attackers used the same malware with the same hash value (a malware’s fingerprint) multiple times before parking it permanently. Today, attackers use malware types that generate unique hash values frequently. For example, the Cerber ransomware can generate a new malware variant, with a new hash value, every 15 seconds. This means that each variant is used just once, making it extremely hard to detect using old techniques. Enter machine learning. With its ability to catch such malware forms based on family type, machine learning is a logical and strategic cybersecurity tool.
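A short sketch shows why a changing hash value defeats matching against a list of known hashes. SHA-256 is used here as an assumption (the text does not say which hash function is meant), and the byte strings are harmless placeholders rather than real samples.

```python
import hashlib

original = b"example payload bytes"
variant = b"example payload bytez"  # a single byte differs

digest_original = hashlib.sha256(original).hexdigest()
digest_variant = hashlib.sha256(variant).hexdigest()

# A blocklist that only knows the fingerprint of the original file.
known_bad_hashes = {digest_original}

print(digest_original in known_bad_hashes)  # True
print(digest_variant in known_bad_hashes)   # False
```

Because even a one-byte change yields an unrelated digest, exact-match lookups miss every new variant, which is the gap that family-level detection aims to close.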

Machine learning algorithms are able to make accurate predictions based on previous experience with malicious programs and file-based threats. By analyzing millions of different types of known cyber risks, machine learning is able to identify brand-new or unclassified attacks that share similarities with known ones.

From predicting new malware based on historical data to effectively tracking down threats to block them, machine learning showcases its efficacy in helping cybersecurity solutions bolster overall cybersecurity posture.

And though machine learning has become a major talking point in cybersecurity fairly recently, it has already been an integrated tool in Trend Micro’s security solutions since 2005 — way before the buzz ever started.

Machine Learning-powered Threats

Advanced technologies such as machine learning and AI are not just being utilized for good — malicious actors are also abusing these for nefarious purposes. In fact, in recent years, IBM developed a proof of concept (PoC) of an ML-powered malware called DeepLocker, which uses a form of ML called deep neural networks (DNN) for stealth.

There are other ways in which cybercriminals exploit these technologies. A popular example is deepfakes, which are fake hyperrealistic audio and video materials that can be abused for digital, physical, and political threats. Deepfakes are crafted to be believable, which makes them usable in massive disinformation campaigns that can spread easily through the internet and social media. Deepfake technology can also be used in business email compromise (BEC), similar to how it was used against a UK-based energy firm. Cybercriminals sent a deepfake audio of the firm’s CEO to authorize fake payments, causing the firm to transfer 200,000 British pounds (approximately US$274,000 as of writing) to a Hungarian bank account.

We discuss the current and possible future ML- and AI-powered threats here:

Foreseeing a New Era: Cybercriminals Using Machine Learning to Create Highly Advanced Threats

We listed a rundown of PoCs and real-life attacks where machine learning was weaponized to get a clearer picture of what is possible and what is already a reality with regard to machine learning-powered cyberthreats.

Exploiting AI: How Cybercriminals Misuse and Abuse AI and ML

We discuss the present state of the malicious uses and abuses of AI and ML and the plausible future scenarios in which cybercriminals might abuse these technologies for ill gain.

How Does Trend Micro Use Machine Learning?

Machine learning is a key technology in Trend Micro™ XGen™ security, a multi-layered approach to protecting endpoints and systems against different threats, blending traditional security technologies with newer ones and using the right technique at the right time.

For over a decade, Trend Micro has been harnessing the power of machine learning to eliminate spam emails, calculate web reputation, and chase down malicious social media activity. Trend Micro continuously develops the latest machine learning algorithms to analyze large volumes of data and predict the maliciousness of previously unknown file types.

Connected Threat Defense for Tighter Security

Learn how Trend Micro’s Connected Threat Defense can improve an organization’s security against new, zero-day threats by connecting defense, protection, response, and visibility across our solutions. Automate the detection of a new threat and the propagation of protections across multiple layers, including endpoint, network, server, and gateway solutions.

Trend Micro’s Machine Learning Milestones

Year: Machine Learning Milestone
2005: As early as 2005, Trend Micro has utilized machine learning to combat spam emails via the Trend Micro Anti Spam Engine (TMASE) and Hosted Email Security (HES) solutions.
2009: To accurately assign reputation ratings to websites (from pornography to shopping and gambling, among others), Trend Micro has been using machine learning technology in its Web Reputation Services since 2009.
2010: Trend Micro’s Script Analyzer, part of the Deep Discovery™ solution, uses a combination of machine learning and sandbox technologies to identify webpages that use exploits in drive-by downloads.
2012: With the goal of helping law enforcement with cybercriminal investigations dealing specifically with targeted attacks, Trend Micro has developed SPuNge, a system that uses a combination of clustering and correlation techniques to “identify groups of machines that share a similar behavior with respect to the malicious resources they access and the industry in which they operate.”
2013: Trend Micro developed Trend Micro Locality Sensitive Hashing (TLSH), an approach to Locality Sensitive Hashing (LSH) that can be used in machine learning extensions of whitelisting. It generates hash values that can be analyzed for whitelisting purposes. In 2013, Trend Micro open sourced TLSH via GitHub to encourage proactive collaboration.
2015: In 2015, Trend Micro successfully employed machine learning in its Mobile App Reputation Service (MARS) for both iOS and Android, as well as in its mobile security products (Trend Micro™ Mobile Security for Android™ for end users and Trend Micro™ Mobile Security for Enterprise for organizations). Machine learning algorithms enable real-time detection of malware and even unknown threats using static app information and dynamic app behaviors. These algorithms used in Trend Micro’s multi-layered mobile security solutions are also able to detect repacked apps and help capacitate accurate mobile threat coverage in the TrendLabs Security Intelligence Blog. Since 2015, Trend Micro has topped the AV Comparatives’ Mobile Security Reviews. The machine learning initiatives in MARS are also behind Trend Micro’s mobile public benchmarking continuously being at a 100 percent detection rate, with zero false warnings, in AV-TEST’s product review and certification reports in 2017.
2017: Predictive Machine Learning Engine was developed in 2016 and is a key part of the Trend Micro XGen solution. It uses two types of machine learning: pre-execution machine learning that identifies malicious files based on the file structure, and run-time machine learning for files that execute malicious behavior.
    AV-TEST featured Trend Micro Antivirus Plus solution on their MacOS Sierra test, which aims to see how security products will distinguish and protect the Mac system against malware threats. Trend Micro’s product has a detection rate of 99.5 percent for 184 Mac-exclusive threats, and more than 99 percent for 5,300 Windows test malware threats. It also has an additional system load time of just 5 seconds more than the reference time of 239 seconds. Overall, at 99.5 percent, AV-TEST reported that Trend Micro’s Mac solution “provides excellent detection of malware threats and is also well recommended” with its minimal impact on system load (something more than 2 percent).
    On February 7, 2017, Trend Micro further solidified its position at the forefront of machine learning technology by being the first standalone next-generation intrusion prevention system (NGIPS) vendor to use machine learning in detecting and blocking attacks in-line in real time. The patent-pending machine learning capabilities are incorporated in the Trend Micro™ TippingPoint® NGIPS solution, which is a part of the Network Defense solutions powered by XGen security. Through advanced machine learning algorithms, unknown threats are properly classified to be either benign or malicious in nature for real-time blocking, with minimal impact on network performance.
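The 2013 milestone mentions locality-sensitive hashing, where similar inputs produce similar hash values. The sketch below illustrates that general idea with SimHash, a different and much simpler scheme than TLSH; it is not the TLSH algorithm and does not use the TLSH library. The function names, the 64-bit digest size, the 4-byte shingles, and the sample texts are all assumptions for the example.

```python
import hashlib


def simhash(data: bytes, shingle_size: int = 4) -> int:
    """Compute a 64-bit SimHash over overlapping byte shingles."""
    counts = [0] * 64
    for i in range(len(data) - shingle_size + 1):
        shingle = data[i:i + shingle_size]
        # Hash each shingle to 64 bits, then let every bit cast a +1/-1 vote.
        h = int.from_bytes(hashlib.sha256(shingle).digest()[:8], "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A digest bit is set when its votes are net positive.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")


base = (b"Invoice 2041: please review the attached quarterly report and "
        b"confirm the totals for the northern region before Friday so that "
        b"the finance team can close the books on schedule.")
near = base[:50] + b"X" + base[51:]  # same content with one byte changed
other = (b"Rainfall totals along the coast were lower than forecast, yet "
         b"river levels stayed high after weeks of snowmelt in the mountains, "
         b"and several crossings remain closed to traffic.")

print(hamming_distance(simhash(base), simhash(near)))
print(hamming_distance(simhash(base), simhash(other)))
```

Unlike a cryptographic hash, where one changed byte gives an unrelated digest, the near-duplicate here should land much closer to the original than the unrelated text does, which is what makes such digests usable as features for comparing files.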

Read: Machine Learning Masters

Trend Micro’s Dual Approach to Machine Learning

Machine learning at the endpoint, though relatively new, is very important, as evidenced by fast-evolving ransomware’s prevalence. This is why Trend Micro applies a unique approach to machine learning at the endpoint — where it’s needed most.

Pre-execution machine learning, with its predictive ability, analyzes static file features and makes a determination of each one, blocks off malicious files, and reduces the risk of such files executing and damaging the endpoint or the network. Run-time machine learning, meanwhile, catches files that render malicious behavior during the execution stage and kills such processes immediately.

Both machine learning techniques are geared towards noise cancellation, which reduces false positives at different layers.

A high-quality and high-volume database is integral in making sure that machine learning algorithms remain exceptionally accurate. Trend Micro™ Smart Protection Network™ provides this via its hundreds of millions of sensors around the world. On a daily basis, 100 TB of data are analyzed, with 500,000 new threats identified every day. This global threat intelligence is critical to machine learning in cybersecurity solutions.

The Trend Micro™ XGen page provides a complete list of security solutions that use an effective blend of threat defense techniques — including machine learning.

Trend Micro’s Predictive Machine Learning Technology

Data is vital to machine learning. Traditional machine learning models get inferences from historical knowledge, or previously labeled datasets, to determine whether a file is benign, malicious, or unknown.

We developed a patent-pending innovation, the TrendX Hybrid Model, to spot malicious threats from previously unknown files faster and more accurately. This machine learning model has two training phases — pre-training and training — that help improve detection rates and reduce false positives that result in alert fatigue.

Learn more about how we utilize both static and dynamic features to accurately and efficiently analyze unknown files here:

Faster and More Accurate Malware Detection Through Predictive Machine Learning

We have developed a machine learning model called TrendX Hybrid Model that uses two training phases — pre-training and training — and allows us to correlate static and behavior features to improve detection rates and reduce false positives.

Machine Learning vs. the Hype

How Is Big Data Relevant to Machine Learning?

The prevalence of the internet and the Internet of Things (IoT) — from devices, smart homes, and connected cars to smart cities — has made available large amounts of digital data that are generated on a daily basis, all available for collecting, analyzing, and utilizing.

These large amounts of data are aptly called big data. It is a combination of structured data (searchable by algorithms and databases) and unstructured data (hard or impossible to search via machine algorithm, such as macro files, emails, web searches, and images) that continues to grow at a highly accelerated pace. In fact, it is predicted that by 2025, 180 zettabytes (180 trillion gigabytes) of data will be generated.
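The zettabyte-to-gigabyte conversion quoted above can be checked with a quick calculation. This assumes decimal (SI) units, where one gigabyte is 10^9 bytes and one zettabyte is 10^21 bytes.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
GIGABYTE = 10**9    # bytes

gigabytes = 180 * ZETTABYTE // GIGABYTE
print(gigabytes == 180 * 10**12)  # True: 180 trillion gigabytes
```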

Big data is being harnessed by enterprises big and small to better understand operational and marketing intelligence, for example, which supports better-informed business decisions. However, because the data is gargantuan, it is impossible to process and analyze it using traditional methods.

Machine learning plays a pivotal role in addressing this predicament. Machine learning algorithms enable organizations to cluster and analyze vast amounts of data with minimal effort. But it is not a one-way street: machine learning needs big data to make more definitive predictions. Essentially, big data is necessary for machine learning to exist.

An understanding of how data works is imperative in today’s economic and political landscapes. And big data has become a goldmine for consumers, businesses, and even nation-states who want to monetize it, use it for power, or pursue other gains.

Read: Knowledge is Power: The societal and business impact of big data
Read: Big data analytics in the real world: Unique big data use cases

The world of cybersecurity benefits from the marriage of machine learning and big data. As the current cyberthreat environment continues to expand exponentially, organizations can utilize big data and machine learning to gain a better understanding of threats, determine fraud and attack trends and patterns, as well as recognize security incidents almost immediately — without human intervention.

Read: Big data and machine learning: A perfect pair for cyber security?
Read: Machine learning and the fight against ransomware
Read: Artificial intelligence could remake cyber security – and malware

Cognizant of these benefits, Trend Micro has partnered up with Hadoop developers to help improve its security model. Hadoop is a popular big data framework used by giant tech companies such as Amazon Web Services, IBM, and Microsoft.

Read: Securing Big Data and Hadoop

Are Data Mining and Machine Learning the Same?

Despite their similarities, data mining and machine learning are two different things. Both fall under the realm of data science and are often used interchangeably, but the difference lies in the details — and each one’s use of data.

Data mining is defined as the process of acquiring and extracting information from vast databases by identifying unique patterns and relationships in data for the purpose of making judicious business decisions. It is used for many different purposes. A clothing company, for example, can use data mining to learn which items its customers are buying the most, or to sort through thousands upon thousands of pieces of customer feedback, so that it can adjust its marketing and production strategies.

Machine learning, on the other hand, uses data mining to make sense of the relationships between different datasets to determine how they are connected. Machine learning uses the patterns that arise from data mining to learn from it and make predictions.

To simplify, data mining is a means to find relationships and patterns among huge amounts of data while machine learning uses data mining to make predictions automatically and without needing to be programmed.

Can End-to-End Deep Learning Solutions Replace Expert-Supported AI Solutions?

ML- and AI-powered solutions make use of expert-labeled data to accurately detect threats. However, some believe that end-to-end deep learning solutions will render expert handcrafted input moot. There has already been prior research into the practical application of end-to-end deep learning to avoid the process of manual feature engineering. However, deeper insight into these end-to-end deep learning models, including the percentage of easily detected unknown malware samples, is difficult to obtain due to confidentiality reasons.

To find out whether end-to-end deep learning can sufficiently and proactively detect sophisticated and unknown threats, we conducted an experiment using one of the early end-to-end models from 2017. We found that although end-to-end deep learning is an impressive technological advancement, it detects unknown threats less accurately than expert-supported AI solutions.

Learn more about our experiment that measured the detection rates of end-to-end deep learning technology here:

Diving Into End-to-End Deep Learning for Cybersecurity

We look into developments in end-to-end deep learning for cybersecurity and provide insights into its current and future effectiveness.

Is Machine Learning a Security Silver Bullet?

Machine learning is a useful cybersecurity tool — but it is not a silver bullet. While others paint machine learning as a magical black box or a complicated mathematical system that can teach itself to generate accurate predictions from data with possible false positives, we at Trend Micro view it as one valuable addition to other techniques that make up our multi-layer approach to security.

Machine learning has its strengths. It is effective in catching ransomware as it happens and in detecting unique and new malware files. It is not the sole cybersecurity solution, however. Trend Micro recognizes that machine learning works best as an integral part of security products alongside other technologies.

Trend Micro takes steps to ensure that false positive rates are kept at a minimum. Employing different traditional security techniques at the right time provides a check-and-balance to machine learning, while allowing it to process the most suspicious files efficiently.

A multi-layered, holistic approach to keeping systems safe is still what’s recommended. And that’s what Trend Micro does best.

What Is Machine Learning? A Definition

The robot-depicted world of our not-so-distant future relies heavily on our ability to deploy artificial intelligence (AI) successfully. However, transforming machines into thinking devices is not as easy as it may seem. Strong AI can only be achieved with machine learning (ML) to help machines understand as humans do.

Machine learning can be confusing, so it is important that we begin by clearly defining the term:

Machine learning is an application of AI that enables systems to learn and improve from experience without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.

How Does Machine Learning Work?
Similar to how the human brain gains knowledge and understanding, machine learning relies on input, such as training data or knowledge graphs, to understand entities, domains and the connections between them. With entities defined, deep learning can begin.

The machine learning process begins with observations or data, such as examples, direct experience or instruction. It looks for patterns in data so it can later make inferences based on the examples provided. The primary aim of ML is to allow computers to learn autonomously without human intervention or assistance and adjust actions accordingly.

Why Is Machine Learning Important?
Machine learning as a concept has been around for quite some time. The term “machine learning” was coined by Arthur Samuel, a computer scientist at IBM and a pioneer in AI and computer gaming. Samuel designed a computer program for playing checkers. The more the program played, the more it learned from experience, using algorithms to make predictions.

As a discipline, machine learning explores the analysis and construction of algorithms that can learn from and make predictions on data.

ML has proven valuable because it can solve problems at a speed and scale that cannot be duplicated by the human mind alone. With massive amounts of computational ability behind a single task or multiple specific tasks, machines can be trained to identify patterns in and relationships between input data and automate routine processes.

Data Is Key: The algorithms that drive machine learning are critical to success. ML algorithms build a mathematical model based on sample data, known as “training data,” to make predictions or decisions without being explicitly programmed to do so. This can reveal trends within data, information that businesses can use to improve decision making, optimize efficiency and capture actionable data at scale.
AI Is the Goal: ML provides the foundation for AI systems that automate processes and solve data-based business problems autonomously. It enables companies to replace or augment certain human capabilities. Common machine learning applications you may find in the real world include chatbots, self-driving cars and speech recognition.

Machine Learning Is Widely Adopted
Machine learning is not science fiction. It is already widely used by businesses across all sectors to advance innovation and increase process efficiency. In 2021, 41% of companies accelerated their rollout of AI as a result of the pandemic. These newcomers are joining the 31% of companies that already have AI in production or are actively piloting AI technologies.

Data security: Machine learning models can identify data security vulnerabilities before they can turn into breaches. By looking at past experiences, machine learning models can predict future high-risk activities so risk can be proactively mitigated.
Finance: Banks, trading brokerages and fintech firms use machine learning algorithms to automate trading and to provide financial advisory services to investors. Bank of America is using a chatbot, Erica, to automate customer support.
Healthcare: ML is used to analyze massive healthcare data sets to accelerate discovery of treatments and cures, improve patient outcomes, and automate routine processes to prevent human error. For example, IBM’s Watson uses data mining to provide physicians data they can use to personalize patient treatment.
Fraud detection: AI is being used in the financial and banking sector to autonomously analyze large numbers of transactions to uncover fraudulent activity in real time. Technology services firm Capgemini claims that fraud detection systems using machine learning and analytics minimize fraud investigation time by 70% and improve detection accuracy by 90%.
Retail: AI researchers and developers are using ML algorithms to develop AI recommendation engines that offer relevant product suggestions based on buyers’ past choices, as well as historical, geographic and demographic data.

Training Methods for Machine Learning Differ
Machine learning offers clear benefits for AI technologies. But which machine learning approach is right for your organization? There are many ML training methods to choose from, including:

  • supervised learning
  • unsupervised learning
  • semi-supervised learning
  • reinforcement learning

Let’s see what each has to offer.

Supervised Learning: More Control, Less Bias
Supervised machine learning algorithms apply what has been learned in the past to new data using labeled examples to predict future events. By analyzing a known training dataset, the learning algorithm produces an inferred function to predict output values. The system can provide targets for any new input after sufficient training. It can also compare its output with the correct, intended output to find errors and modify the model accordingly.

Unsupervised Learning: Speed and Scale
Unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. At no point does the system know the correct output with certainty. Instead, it draws inferences from datasets as to what the output should be.

Reinforcement Learning: Rewards Outcomes
Reinforcement learning is a method in which an agent interacts with its environment by producing actions and discovering errors or rewards. Its most relevant characteristics are trial-and-error search and delayed reward. This method allows machines and software agents to automatically determine the ideal behavior within a specific context to maximize their performance. Simple reward feedback, known as the reinforcement signal, is required for the agent to learn which action is best.
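Trial-and-error search and delayed reward can be illustrated with tabular Q-learning, one well-known reinforcement learning algorithm chosen here as an assumption. In this sketch, a hypothetical agent walks a five-cell corridor and is rewarded only when it reaches the last cell. The environment, constants, and function names are all made up for the example.

```python
import random

N_STATES = 5                 # cells 0..4 in a corridor; cell 4 is the goal
ACTIONS = ("left", "right")
GAMMA = 0.9                  # discount factor: how much delayed reward is worth
ALPHA = 0.5                  # learning rate


def step(state, action):
    """Environment: move one cell; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # trial and error: act at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Reinforcement signal: move the estimate toward the reward
            # plus the discounted value of the best follow-up action.
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
            state = next_state
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)])
          for s in range(N_STATES - 1)]
print(policy)
```

Only the final move is rewarded directly, yet the learned values propagate backward so that the preferred action in every earlier cell also points toward the goal.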

Machine Learning Is Not Perfect
It is important to understand what machine learning can and cannot do. As useful as it is in automating the transfer of human intelligence to machines, it is far from a perfect solution to your data-related issues. Consider the following shortcomings before you dive too deep into the ML pool:

Machine learning is not based in knowledge. Contrary to popular belief, machine learning cannot attain human-level intelligence. Machines are driven by data, not human knowledge. As a result, “intelligence” is dictated by the volume of data you have to train it with.
Machine learning models are difficult to train. Eighty-one percent of data scientists admit that training AI with data is more difficult than expected. It takes time and resources to train machines. Massive data sets are needed to create data models, and the process involves manually pre-tagging and categorizing data sets. This resource drain can create latency and bottlenecks in advancing ML initiatives.
Machine learning is prone to data issues. Ninety-six percent of companies have experienced training-related problems with data quality, data labeling and building model confidence. Those training-related problems are a key reason why seventy-eight percent of ML projects stall prior to deployment. This has created an extraordinarily high threshold for ML success.
Machine learning is often biased. Machine learning systems are known for operating in a black box, meaning you have no visibility into how the machine learns and makes decisions. Thus, if you identify an instance of bias, there is no way to identify what caused it. Your only recourse is to retrain the algorithm with additional data, but that is not guaranteed to resolve the issue.

The Future of Machine Learning: Hybrid AI
For all of its shortcomings, machine learning is still critical to the success of AI. This success, however, will be contingent upon another approach to AI that counters its weaknesses, like the “black box” issue that occurs when machines learn unsupervised. That approach is symbolic AI, or a rule-based methodology toward processing data. A symbolic approach uses a knowledge graph, which is an open box, to define concepts and semantic relationships.
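To make the "open box" idea concrete, the sketch below stores a tiny knowledge graph as subject-relation-object triples and applies one rule: follow "is_a" links transitively. The triples, the relation name, and the function name are hypothetical and chosen only for illustration; real knowledge graphs and rule engines are far richer.

```python
# Hypothetical knowledge graph as (subject, relation, object) triples.
TRIPLES = {
    ("Cerber", "is_a", "ransomware"),
    ("ransomware", "is_a", "malware"),
    ("malware", "is_a", "software"),
}


def infer_is_a(entity, triples):
    """Follow is_a edges transitively and return the chain of reasoning."""
    chain = [entity]
    current = entity
    while True:
        parents = [o for s, r, o in triples if s == current and r == "is_a"]
        # Stop when there is no broader concept, or when a cycle is detected.
        if not parents or parents[0] in chain:
            return chain
        current = parents[0]
        chain.append(current)


print(" -> ".join(infer_is_a("Cerber", TRIPLES)))
# Cerber -> ransomware -> malware -> software
```

Every conclusion comes with the exact chain of facts that produced it, which is the kind of visibility a learned statistical model does not offer by default.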

Together, ML and symbolic AI form hybrid AI, an approach that helps AI understand language, not just data. With more insight into what was learned and why, this powerful approach is transforming how data is used across the enterprise.