Machine Learning in Cybersecurity


Steps Required to Create an ML Tool

1. Data collection - While it is possible to run and even create ML algorithms on streaming, real-time data, most techniques involve collecting data ahead of time and building a model from stored data.
2. Data cleaning - Raw data is often unusable as-is: values may be missing, the same field may be used inconsistently, and numeric fields may contain non-numeric characters. This step also involves combining multiple data sources into a single usable source. Cleaning is often time-consuming and iterative, as fixing one issue uncovers another.
3. Feature engineering - Once the data is ready for use, the next task is to ensure that the maximum information is extracted from it. This usually takes place before the ML algorithm is created.
4. Model building/Model validation - This involves building the model and testing that it works properly on data it was not trained on. With supervised ML, a chief concern is whether the model is overfit to the training data, i.e., whether it has picked up properties that are unique to the training data and do not generalize. Many statistical techniques exist to minimize this risk.
5. Deployment/Monitoring - Deploying an ML model is rarely a "once-and-done" event. With network traffic, for example, historical observations do not necessarily match future activity. Hence, even after deployment, models are monitored and periodically rerun through the build/validate step to keep performance high.
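As a rough illustration of the data cleaning step, the sketch below strips non-numeric characters from a numeric field and drops rows whose value is missing. The field layout, the sample records, and the helper name `clean_numeric` are assumptions made for this example, not part of any particular tool.

```python
import re

def clean_numeric(raw):
    """Return a float parsed from a messy field, or None if nothing usable remains."""
    if raw is None:
        return None
    # Keep digits, decimal points, and minus signs; drop units, commas, spaces.
    stripped = re.sub(r"[^0-9.\-]", "", str(raw))
    try:
        return float(stripped)
    except ValueError:
        # Empty or malformed after stripping: treat as missing.
        return None

# Hypothetical raw records: (source_ip, bytes_sent) with typical defects.
raw_rows = [
    ("10.0.0.1", "1,024 bytes"),
    ("10.0.0.2", ""),
    ("10.0.0.3", "2048"),
    ("10.0.0.4", None),
    ("10.0.0.5", "n/a"),
]

# Keep only rows whose numeric field could be recovered.
cleaned = []
for ip, raw_bytes in raw_rows:
    value = clean_numeric(raw_bytes)
    if value is not None:
        cleaned.append((ip, value))

print(cleaned)
```

Dropping unusable rows is only one option; depending on the data, filling in a default or an average may be more appropriate.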

ML Techniques

~ Classification is the process of automatically inferring a label (e.g., "spam" vs. "legitimate").
~ Forecasting is the use of historical data to predict future behavior.
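A very simple instance of forecasting is a moving average, which predicts the next value as the mean of the most recent observations. The sketch below assumes hypothetical hourly login counts; the function name and window size are illustrative choices, and real forecasting methods are usually more elaborate.

```python
def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical historical data: logins observed per hour.
hourly_logins = [120, 130, 125, 140, 150, 145]

next_hour = moving_average_forecast(hourly_logins, window=3)
print(next_hour)
```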

Features

~ If our data is stored in a spreadsheet where each row represents one data point, then the features are the columns.
~ Having useful features is a critical prerequisite, while having too many non-informative features may degrade algorithm performance.
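The row/column picture can be sketched in a few lines. In the example below, each dictionary is one data point and each key is a feature; a feature whose value never changes carries no information, so it is flagged. The feature names and values are made up for illustration, and zero variance is only one simple way to spot a non-informative feature.

```python
from statistics import pvariance

# Each row is one data point; each key is a feature (a spreadsheet column).
rows = [
    {"duration_s": 1.2, "bytes": 500, "protocol_version": 4},
    {"duration_s": 30.5, "bytes": 12000, "protocol_version": 4},
    {"duration_s": 0.4, "bytes": 80, "protocol_version": 4},
]

# Turn the rows into columns so each feature can be inspected on its own.
feature_names = list(rows[0].keys())
columns = {name: [row[name] for row in rows] for name in feature_names}

# A feature with zero variance is identical for every data point.
constant_features = [
    name for name, values in columns.items() if pvariance(values) == 0
]

print(constant_features)
```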

Machine Learning

~ It refers to systems that are able to automatically improve with experience.
~ With ML, software can learn from previous observations both to make inferences about future behavior and to guess what you want to do in new scenarios.

ABI Research

~ ABI Research forecast that machine learning in cybersecurity would boost big data, intelligence, and analytics spending to $96 billion by 2021.

Big Data

~ More data is almost always a good thing; it allows algorithms to be aware of many more varieties of categories.
~ Such enormous data sets are technically hard to work with, and an entire field of research and tooling called Big Data has developed with the specific intent of simplifying the process of working with data of this size.

ML Classification Techniques

~ Supervised Learning - refers to algorithms that are provided with a set of labeled training data, with the task of learning what differentiates the labels. An example is Google Image Search.
~ Unsupervised Learning - refers to algorithms that are provided with unlabeled training data, with the task of inferring the categories by themselves. This is useful when labeled data is very rare, when labeling is itself very hard, or when we do not even know whether labels exist. An example is network flow data.
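The contrast between the two can be sketched on a toy data set. Below, the supervised half uses a nearest-centroid classifier that learns from labeled samples, and the unsupervised half uses a two-cluster k-means that sees the same samples without labels. Both algorithms are stand-ins chosen for brevity, and the features (link count, exclamation-mark count) and all names are assumptions for this example.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Supervised: labels are given, and the model learns one centroid per label.
def fit_nearest_centroid(samples, labels):
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, sample):
    # Pick the label whose centroid is closest to the sample.
    return min(model, key=lambda label: sq_dist(model[label], sample))

# Unsupervised: no labels, so two groups are inferred from the data alone.
def two_means(samples, iterations=10):
    centers = [samples[0], samples[-1]]  # naive initialization (assumption)
    for _ in range(iterations):
        groups = [[], []]
        for s in samples:
            idx = 0 if sq_dist(s, centers[0]) <= sq_dist(s, centers[1]) else 1
            groups[idx].append(s)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [
        0 if sq_dist(s, centers[0]) <= sq_dist(s, centers[1]) else 1
        for s in samples
    ]

# Hypothetical emails described by (number of links, number of "!").
samples = [(0, 1), (1, 0), (8, 9), (9, 7)]
labels = ["legitimate", "legitimate", "spam", "spam"]

model = fit_nearest_centroid(samples, labels)
print(predict(model, (7, 8)))   # classify a new, unseen email
print(two_means(samples))       # cluster ids found without any labels
```

Note that the cluster ids returned by `two_means` are arbitrary numbers; unlike the supervised model, nothing tells the algorithm which group is "spam".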

Feature Engineering

~ The goal of this practice is to extract the maximum information from the available features so as to maximize our ability to predict or categorize unknown data.
~ It takes multiple features and combines or transforms them, sometimes in complex ways, to obtain new, more informative features.
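A minimal sketch of combining features, assuming hypothetical network flow records with `bytes`, `duration_s`, and `packets` fields: two raw features are divided to produce rates that may be more informative than either raw value alone. The derived feature names are illustrative.

```python
# Hypothetical network flow records with three raw features each.
flows = [
    {"bytes": 5000, "duration_s": 10.0, "packets": 50},
    {"bytes": 900000, "duration_s": 3.0, "packets": 600},
]

def engineer(flow):
    """Return a copy of the flow with two derived features added."""
    # Guard against division by zero for empty or instantaneous flows.
    duration = max(flow["duration_s"], 1e-9)
    packets = max(flow["packets"], 1)
    return {
        **flow,
        "bytes_per_second": flow["bytes"] / duration,
        "bytes_per_packet": flow["bytes"] / packets,
    }

engineered = [engineer(f) for f in flows]
for row in engineered:
    print(row)
```

Here the second flow stands out far more clearly by its transfer rate than by its byte count or duration taken separately.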

