IS 425 Exam 2
Text Mining
- The semiautomated process of extracting patterns (useful information and knowledge) from large amounts of unstructured data sources. "Unstructured" can include documents as well as semi-structured formats like XML or JSON. Text mining is popular in fields like law (scanning legal documents) and medicine (scanning medical histories). For many years, data was stored in documents rather than in structured databases, so we need a way to extract knowledge from that information.
Deep Neural Networks
A deep neural network is a NN that uses tensors (multi-dimensional arrays) as inputs and weighting functions. For example, images are matrices (2-dimensional tensors) and those serve as inputs to image-recognition neural networks. A tensor is a generalized version of a scalar, vector, or matrix, so deep neural network algorithms are generalized versions of what we've discussed before. A deep neural network also typically involves multiple intermediate (hidden) layers. You can also use variations of these algorithms to solve specific problems.
ANN Structures
A neural network is ultimately made up of these processing elements (neurons) organized in a specified way. These neurons are grouped into layers. There is an input layer, where each input represents a single attribute. There is an output layer, which represents the output (usually a classification output, e.g., whether it's a picture of a dog or a cat). Hidden layers take inputs from the previous layer, process them, and then send the output to the next layer. Usually there is only one hidden layer; that's the layer that actually does the processing.
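To make the layer idea concrete, here is a minimal sketch of a forward pass through a tiny network (3 inputs, one hidden layer of 4 neurons, 2 outputs) in plain NumPy; the weights and inputs are made up for illustration, not from the course materials:

```python
import numpy as np

rng = np.random.default_rng(0)

inputs = np.array([0.2, 0.7, -1.0])              # input layer: one value per attribute

W_hidden = rng.normal(size=(4, 3))               # weights connecting 3 inputs to 4 hidden neurons
b_hidden = rng.normal(size=4)
hidden = np.tanh(W_hidden @ inputs + b_hidden)   # hidden layer does the processing

W_out = rng.normal(size=(2, 4))                  # weights connecting hidden layer to 2 outputs
b_out = rng.normal(size=2)
output = W_out @ hidden + b_out                  # output layer: e.g. scores for "dog" vs. "cat"

print(output)
```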
Ensembles - Pros and Cons
Advantages: Usually, the right combination of models and techniques can produce a more accurate and precise model. These models tend to handle outliers better (boosting in particular). Ensembles can also reduce overfitting, which is when your model is specific to the data you provided and doesn't reflect the overall problem. Disadvantages: Creating more models is more complicated. It becomes harder to explain the results. The more complicated your setup, the harder it is to deploy in production.
Artificial Neural Networks
Artificial neurons receive input from other neurons or from an external source, perform some kind of transformation on the data, and then pass it on to the next neuron. Different types of ANNs (and the different architectures that create them) vary by the configuration of the neurons, like how the neurons are connected to each other and what kind of transformation each individual neuron does.
Association
Association Rule algorithms take records and attempt to find associations between different variables in those records. They include: Market-Basket Analysis - Looking at what people purchase in a store (the "basket") to identify items commonly purchased together. Link Analysis - Identifies links between different types of objects. The best example would be analyzing web traffic to see commonly viewed pages: "People who looked at Page A also tended to look at Page B." Sequence Mining - Identifies associations over a period of time.
Backpropagation
Backpropagation - An algorithm for supervised learning that uses stochastic gradient descent to adjust the weights so that the error in the outputs is minimized. Note that the choice of evaluation/optimization technique is separate from the choice of neural network architecture. Stochastic Gradient Descent (SGD) - An iterative gradient-based optimizer used for finding the minimum (error) in performance functions. Stochastic = random; Gradient = looking at the slope; Descent = we want to find a minimum.
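A small sketch of the gradient descent idea, fitting a single weight to a few made-up points by repeatedly stepping down the slope of the squared error (plain Python, not the textbook's example):

```python
# Fit y = w*x to hypothetical points that roughly follow y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

w = 0.0                 # start with an arbitrary weight
learning_rate = 0.05

for epoch in range(100):
    for x, target in data:              # "stochastic": update after each individual example
        prediction = w * x
        error = prediction - target
        gradient = 2 * error * x        # slope of the squared error with respect to w
        w -= learning_rate * gradient   # "descent": step opposite the slope

print(w)   # converges to roughly 2.0
```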
Bagging and Boosting
Bagging - You take multiple samples from the same (larger) dataset, create your models, and then use some kind of formula to "average" the results. The same records can show up in multiple bags. This is also known as bootstrap aggregating. Boosting - You create a model using your dataset. You then take the poorly fitted data and re-run the model-building algorithm on the errors. Then repeat multiple times. Another way to look at it is bagging runs in parallel whereas boosting runs sequentially.
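A minimal sketch comparing the two, assuming scikit-learn is available (the course lists Python generically as a tool; scikit-learn and the breast-cancer sample dataset are assumptions here):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(n_estimators=25, random_state=0)    # parallel: models built on bootstrap samples
boosting = AdaBoostClassifier(n_estimators=25, random_state=0)  # sequential: each model focuses on prior errors

print("bagging :", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting:", cross_val_score(boosting, X, y, cv=5).mean())
```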
Bayes Theorem
Bayes Theorem - A mathematical formula for the calculation of conditional probabilities. (The theorem itself doesn't require independence; it's the Naïve Bayes classifier built on it that assumes the variables are independent.) My paper copy of the textbook on Page 279 has a typo, so please be careful with the formula here.
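For reference, the usual statement of the theorem (double-check it against the corrected formula in the book, given the typo noted above):

```latex
% Bayes Theorem:
\[
  P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}
\]
% For the naive Bayes classifier, with variables X_1, ..., X_n assumed independent given Y:
\[
  P(Y \mid X_1, \dots, X_n) \propto P(Y) \prod_{i=1}^{n} P(X_i \mid Y)
\]
```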
Bayesian Networks
Bayesian Belief Network - A graph (network) that represents the dependencies between different variables. Remember that when we say that P(Y|X) is the probability of Y given X, we're saying that Y "depends" on X. Constructing a BBN generally requires the following: A domain expert to tell you which variables should be dependent on each other. Prior statistics that you can use to assign probabilities to these dependencies. While there are many variables, with a BBN we only need to calculate the conditional probabilities that the domain experts say are important. In this overly simplistic example: We assign probabilities to Lamar Jackson running more than 100 yards and the Ravens defense holding their opponent to less than 100 yards. Lamar Jackson running can affect the probability that he gets injured and the probability that the Ravens win. The Ravens D's performance affects the Ravens winning but does not affect Lamar Jackson's injury status. So, we don't model the conditional probability of the Ravens D on Lamar Jackson's injury status.
Computers as Thinkers
Before the development of modern machine learning and deep learning techniques, programmed computers were already good at tasks that humans did. In this sense, a series of if/then/else statements and other common programming motifs already constitute an artificial intelligence.
Biological Neural Networks
Biological neural networks (like our brains) are composed of neurons (cells in the nervous system) that communicate with each other by transmitting and receiving electrical signals. Nucleus - The central processing portion of a neuron. Axon - Sends a signal. Dendrites - Receive the signal. Synapses - Have the ability to increase or decrease the strength of the connection between neurons.
Classification
Classification: Analyzes historical behavior to create a model that predicts future behavior. - We use the historical behavior as input to predict a class. - Outputs of classification algorithms include decision trees and neural networks. Decision Trees: Classify data into a finite number of classes based on the values of the input variables.
Clustering
Clustering - Partitions a collection of things into segments (or groupings) whose members share similar characteristics. On a map, we could cluster objects that are "near" each other. Other algorithms cluster records based on how "similar" they are. This involves some kind of distance measure. Clustering can be used as a precursor to classification to identify groups of records that are similar to each other.
Cognitive Computing
Cognitive Computing - Computing systems that use mathematical models to emulate (or simulate) the human cognition process to find solutions to complex problems and situations where the potential answers can be imprecise. For example, a self-driving computer attempts to mimic a human driver. Attributes of Cognitive Computing Systems Adaptive - Must be flexible enough to learn as information changes and goals evolve. Interactive - Must be able to interact with humans and other processes. Iterative and Stateful - Can identify problems by asking questions or pulling in additional data if a stated problem is vague. Contextual - Must understand, identify, and mine contextual data.
Classification Evaluation
Confusion Matrix - A matrix comparing what the classification algorithm predicted with the actual values (the "ground truth"). "Positive" and "Negative" refer to classes and can be arbitrarily defined. Accuracy - the number of correctly classified records divided by the total number of records. Precision - the number of correctly classified positives divided by the total number of records predicted as positive. Recall - the number of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives (i.e., all records that are actually positive).
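A small sketch computing the three measures from a hypothetical confusion matrix (the counts are made up for illustration):

```python
tp, fp = 40, 10    # predicted positive: 40 correctly, 10 incorrectly
fn, tn = 5, 45     # predicted negative: 5 incorrectly, 45 correctly

accuracy = (tp + tn) / (tp + fp + fn + tn)   # all correct predictions over all records
precision = tp / (tp + fp)                   # of everything predicted positive, how much really was
recall = tp / (tp + fn)                      # of everything actually positive, how much we caught

print(accuracy, precision, recall)           # 0.85, 0.8, ~0.89
```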
Convolutional Neural Network
Convolutional Neural Network (CNN) - A deep neural network architecture specifically designed for computer vision applications. Convolution - A linear operation that aims to extract simple features like edges and shapes from more sophisticated data patterns. An image input has so many pixels...it simplifies our work if we can reduce an image from a series of pixels to a series of patterns. These convolution layers are also known as filters. For example, a layer that detects denim or a layer that detects eyes. Pooling - Consolidates the input into smaller tensors. This is mostly done to reduce the input size. It can also be employed in situations where most of the image isn't relevant to the analysis (like when you're tracking movement) and you want to discard that information.
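A minimal sketch of convolution and pooling in plain NumPy, using a made-up 4x4 "image" and a single edge-detecting filter (not the textbook's example):

```python
import numpy as np

image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A 2x2 filter that responds to vertical edges (dark on the left, bright on the right).
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

def convolve2d(img, k):
    """Slide the filter over the image and take the dot product at each position."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    h, w = feature_map.shape
    return np.array([
        [feature_map[i:i+size, j:j+size].max() for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ])

features = convolve2d(image, kernel)   # strong responses where the vertical edge is
pooled = max_pool(features)            # smaller tensor, edge information kept
print(features)
print(pooled)
```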
Data mining
Data Mining: Discovering or "mining" knowledge from large amounts of data. - A process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge (or patterns) from large sets of data.
Data Mining Results
Data mining results should be: Nontrivial - Some experimental type search or inference is involved. Valid - The discovered patterns should hold true on new data with a sufficient degree of certainty. Novel - The patterns are not previously known to the user within the context of the system being analyzed. Potentially Useful - The discovered patterns should lead to some benefit to the user or task. Ultimately Understandable - The patterns should make business sense.
Deductive vs. Inductive
Deductive Reasoning - Requirements: a set of established facts and a set of logical reasoning rules. Process: apply the logical reasoning rules to those facts, both to test the existing facts and to derive new facts. Conclusions: new facts that are provably true. Inductive Reasoning - Requirements: a set of examples/observations. Process: attempt to create generalized conclusions or hypotheses about the examples provided. Conclusions: fuzzier results that aren't 100% provable but that can be non-obvious.
Examples of Deductive and Inductive Reasoning
Deductive Reasoning ▪ Knowledge Engineering ▪ Expert Systems ▪ Traditional Computer Logic ▪ Programming Languages Inductive Reasoning ▪ Clustering ▪ Association ▪ Classification ▪ Deep Learning Algorithms ▪ Neural Networks
Deductive Reasoning
Deductive Reasoning - Facts and logical rules are used to make conclusions. One tries to reason using logic (or other systematic methods like search) with a well-established base of hypotheses in order to make provably correct conclusions.
More Examples
Deductive Reasoning Examples • Tax-filing systems like TurboTax. TurboTax is sophisticated but, in the end, it is a series of logical statements and calculations based on the current tax codes. • WebMD Symptom Checker. A knowledge base of symptoms written into a database and a series of rules designed to take the user's input and search that database. • Learning a language with a grammar book. Inductive Learning Examples • Tesla Full Self-Driving. Tesla's neural networks are developed by processing millions of sets of telemetry from their beta testers. • Learning a language by observing speakers of that language. • A phone's ability to identify the faces of your friends in your photos based on previously tagged sample photos. • Perusing a worked example to learn algebraic manipulation.
Some Other Observations
Deductive reasoning problems tend to be more narrowly focused, whereas inductive learning problems produce more generalized results. • Explainability: The results of deductive reasoning tend to be more interpretable and more explainable. The "black box" nature of many deep learning algorithms makes inductive learning techniques less explainable. • While developments in deductive reasoning continue to be refined, research in inductive learning has the potential to produce giant leaps in computational ability. Inductive reasoning techniques (like deep learning techniques) are harder to explain and typically require larger data sets and more computing power.
Ensemble Modeling
Ensembles - Combinations of the outcomes produced by two or more analytics models into a compound output. This is common in research. Sometimes a combination of models produces the best results.
Entropy
Entropy - The amount of uncertainty or randomness in a data set. We want to minimize this. In the Ravens example there is no split of # of wins that causes entropy to be zero.
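A small sketch of the entropy calculation, using hypothetical class counts rather than the actual Ravens seasons:

```python
import math

def entropy(counts):
    """Shannon entropy: -sum(p * log2(p)) over the class proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([5, 5]))   # 1.0   -> maximum uncertainty (50/50 split)
print(entropy([9, 1]))   # ~0.47 -> mostly one class, less uncertainty
print(entropy([10, 0]))  # 0.0   -> a "pure" group, no uncertainty
```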
The Black Box Nature of ANNs
Explainable AI - One of the concerns about ANNs is how difficult it is to explain why the neural network model created its output. This is an issue for all machine learning techniques, but since neural networks (and other deep learning techniques) require less human input, it is even more difficult to explain the results. Sensitivity Analysis - When you make slight adjustments to the input (perturbation) to see how it affects the output.
Developing Neural Networks
If you want to use neural networks in your organization, the process isn't that much different than it would be for any other machine learning process: You still must clean the data and split it into training and validation (test) sets. You must decide on a neural network architecture. You must choose a learning algorithm. You must set the parameters of the network. You then run the network on the input and test the output. When satisfied, you deploy the output of the network.
Classification Testing
In order to compute the previous measures, we need to split the dataset into a training set and a test set. The training set is the set of records you use to build the model. The test set is the set of records you use to test the model. Simple Split - Partitions the data into a training set and a test set. It is the simplest way of doing this kind of split.
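A minimal sketch of a simple split, assuming scikit-learn (an assumption; the course only lists Python generically as a tool):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 25% of the records as the test set; build the model on the remaining 75%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("accuracy on the held-out test set:", model.score(X_test, y_test))
```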
Association Strength
In order to evaluate whether or not the association rules that were created are actually useful, we use the following measures. Support - How often the two items appear together in the same transaction. Confidence - How often the two items appear together in the same basket compared to all transactions involving the first item. Lift - The ratio of the confidence of a rule to the expected confidence of the rule. It is a measure of the strength of the association.
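A small sketch computing the three measures for a hypothetical rule "milk -> bread" over made-up baskets:

```python
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
n = len(baskets)

both = sum(1 for b in baskets if {"milk", "bread"} <= b)   # transactions with both items
milk = sum(1 for b in baskets if "milk" in b)              # transactions with the first item
bread = sum(1 for b in baskets if "bread" in b)            # transactions with the second item

support = both / n                   # how often the two items appear together
confidence = both / milk             # of the milk baskets, how many also contain bread
lift = confidence / (bread / n)      # confidence relative to the expected confidence

print(support, confidence, lift)     # 0.6, 0.75, 0.9375
```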
Nearest Neighbor Performance
In order to perform this algorithm, you must compare the distance of your new point with all the other points to identify which ones are the nearest. This can be problematic if you have millions of points in your dataset. If your data is stored in a database with decent indexes, then the database will figure out faster ways to process this. Researchers have figured out all kinds of workarounds and "cheats" to get around this problem. There are people who focus on this full-time.
Clustering Steps
K-Means is a clustering algorithm. K-Means algorithm: We specify the number of clusters (k) we want. We randomly place those k points as initial cluster centers. We assign each point to its nearest cluster center. Each grouping is a new cluster. We recompute new cluster centers. We repeat until the clusters don't change.
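A minimal sketch of that loop in plain NumPy, on a handful of made-up 2-D points (real k-means places the initial centers randomly; here we just take the first k points so the example is reproducible):

```python
import numpy as np

points = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
k = 2
centers = points[:k].copy()                      # initial cluster centers

for _ in range(10):                              # repeat until the clusters stop changing
    # Assign each point to its nearest cluster center.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Recompute each cluster center as the mean of the points assigned to it.
    new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print(labels)    # [0 0 0 1 1 1] -- the two natural groupings
print(centers)   # roughly [1.33, 1.33] and [8.33, 8.33]
```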
Nearest Neighbors
K-Nearest Neighbors (k-NN) - A prediction method for classification or regression where the prediction is made based on similarity to k neighbors. How it works (for classification): Your existing data (the training data) is plotted on a graph. This existing data has properties (determining where it is plotted on the graph) and a class. When a new object is brought in, that object is plotted on the same graph. Depending on what k is, we identify the k nearest neighbors to our new point and count the number of each class. So, if k = 3, then we look at the three closest objects to our new object. Whichever class is the most represented is the class assigned to our new object.
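A small sketch of the voting idea in plain Python, with made-up 2-D points and two classes (the "cat"/"dog" labels are just placeholders):

```python
from collections import Counter
import math

training_data = [
    ((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.5), "cat"),
    ((6.0, 6.0), "dog"), ((6.5, 7.0), "dog"), ((7.0, 6.5), "dog"),
]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(new_point, data, k=3):
    # Sort the existing points by distance to the new point and keep the k nearest.
    neighbors = sorted(data, key=lambda item: euclidean(item[0], new_point))[:k]
    # Count the classes of those neighbors; the most represented class wins the vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((2.0, 2.0), training_data))  # "cat" -- its 3 nearest neighbors are cats
```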
Social Network Structures
Multiplexity - Two people can be connected in multiple ways (like if they're friends and work together). In graph networks, nodes that can connect to each other with multiple links form multiplex networks. Network Closure - A measure of the completeness of relational triads. Is your set of friends self-contained? Bridge - An individual whose weak ties fill a structural hole. For example, if two groups of friends are only linked by one person, that person is the bridge. Centrality - A way of measuring how important (or well-connected) an individual node is in the network. Distance - In a graph, the number of connections needed to connect two individuals. So, if someone has "six degrees of separation" to Kevin Bacon, that distance is six. There are many ways to calculate distance on a graph. The textbook just uses the "number of links between nodes" as the measure.
Natural Language Processing
Natural Language Processing (NLP) - The process of converting depictions of human language into more formal representations that are easier for computer programs to manipulate. The goal is to go beyond matching words...instead we want to analyze text to understand its meaning.
Naive Bayes Classifier
Naïve Bayes Classifier - A simple classification algorithm based on Bayes Theorem. In general, it works like this: We start with a new data point. This data point has some variables assigned to it. We calculate the following with the dataset used to build the model: The probability of each class. In the textbook example, "Golf Yes" and "Golf No." The probability of each variable value given the class. For example: Probability of Sunny Given Yes, Probability of Windy Given No. There are many of these to calculate. We then use Bayes Theorem to calculate the probability of each class given the variables associated with the input. Whichever class has the highest probability is the class we assign.
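A minimal sketch of the calculation in plain Python; the records below are made up in the spirit of the golf example, not the textbook's actual table:

```python
records = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "rainy", "windy": "no"}, "yes"),
]

def naive_bayes(new_point, data):
    classes = {label for _, label in data}
    scores = {}
    for c in classes:
        class_rows = [features for features, label in data if label == c]
        # Prior: the probability of the class itself.
        score = len(class_rows) / len(data)
        # Multiply by P(each observed variable value | class), assuming independence.
        # (Real implementations smooth zero counts, e.g. Laplace smoothing.)
        for variable, value in new_point.items():
            matches = sum(1 for row in class_rows if row.get(variable) == value)
            score *= matches / len(class_rows)
        scores[c] = score
    # Whichever class has the highest score is the class we assign.
    return max(scores, key=scores.get), scores

print(naive_bayes({"outlook": "sunny", "windy": "no"}, records))   # ("yes", ...)
```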
Neural Networks
Neural Computing - A pattern-recognition methodology for machine learning. Artificial Neural Network (ANN) - The model generated by this methodology. Perceptron - An early neural network model.
Inductive Learning
Not all knowledge can be easily derived using deductive reasoning. Sometimes humans make leaps in judgment (breakthroughs) that do not follow from logical rules. Much of human intelligence is gleaned from evidentiary experience in day-to-day life that supports intuitive choices (but may not result in provably correct conclusions). Inductive Learning - Data instances (examples) provide the evidence needed to build hypotheses and make predictions from them. Making non-obvious inferences often requires the sacrifice of provable correctness.
Challenges with NLP
Part-Of-Speech Tagging - The process of marking the words in a text as corresponding to a particular part of speech (nouns, verbs, etc.) based on a word's definition and the context in which it is used. Text Segmentation - Splitting text (or speech) into words. Some written languages don't separate words with spaces, and verbal speech sometimes blends words together. Word Sense Disambiguation - Selecting the meaning of a word based on its context in the document. Syntactic Ambiguity - Identifying the sentence structure of text. Imperfect/Irregular Input - Speech impediments or accents can make it harder for algorithms to process speech if the algorithm hasn't been trained to interpret them.
Neurons
Processing Element - An artificial neuron that makes up a neural network. ▪ Step 1 (X): The input p is multiplied by a weight w. ▪ Step 2 (Σ): A bias term b is added to the weighted input wp, producing the net input n. ▪ Step 3 (f): A transfer/activation function is applied to the net input n. This produces the final output a. ▪ a = f(wp + b). Notice that Step 1 is a multiplication and Step 2 is an addition. Combined they are a simple linear transformation. Step 3 (the activation function) can be non-linear.
Recurrent Neural Networks
Recurrent Neural Network (RNN) - A type of neural network that is specifically designed to process sequential inputs. RNNs can be combined with CNNs. For example, if we want to track a person walking down a street on a video camera, we have to identify the person in the video (the CNN) and then track their movements (the RNN). Long Short-Term Memory (LSTM) Network - The most common RNN architecture.
Classification vs. Regression
Remember that the output of classification is a model that can identify the class of an object. - Regression is similar but the output is numeric. - Logistic regression produces a probability, but classification uses a threshold to classify objects.
Association Outputs
Remember, the output of association rule mining is a set of association rules.
Search Engine
Search Engine - A software program that searches for documents (Internet sites or files) based on keywords that the user provides. The documents are indexed using the same techniques described earlier.
Sentiment Analysis
Sentiment Analysis - A technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources. For example, analyzing social media to see what people think of something. This can include positive/negative opinions, how strong those opinions are, or how popular something is.
Data Mining Privacy
Since a lot of data contains private information, a common tactic to deal with this is to anonymize the data. There are all kinds of legal reasons why this must be done. There are situations where, even if the data is anonymized, people can still isolate records and identify individuals. In many cases this is due to the power of data mining algorithms. This has led to more sophisticated anonymization techniques, such as differential privacy and k-anonymity.
Basic Neural Network Math
So, remember that an artificial neuron (or processing element) takes an input p, performs a linear transformation on it (with the weight w and bias b), and then applies an activation function f to produce an output. In real life, with multiple inputs, the same linear transformation is done but with vectors: a = f(Wp + b), where p is an input vector and W is a vector representing the weights of those inputs. Wp is the dot product of W and p, which results in a number (scalar). This means b and ultimately n are scalars, and the function f operates on those scalars.
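A concrete illustration in plain NumPy, with made-up numbers (the sigmoid activation is just one possible choice of f):

```python
import numpy as np

p = np.array([0.5, -1.0, 2.0])      # input vector (one value per attribute)
W = np.array([0.2, 0.4, -0.1])      # one weight per input
b = 0.3                             # bias (a scalar)

n = W.dot(p) + b                    # dot product gives a scalar net input
a = 1.0 / (1.0 + np.exp(-n))        # activation function f applied to that scalar

print(n, a)                         # -0.2, ~0.45
```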
Social Analytics
Social Network - A social structure composed of individuals/people linked to one another with some type of connections/relationships. Social Network Analysis - The systematic analysis of these networks. This usually involves modeling these networks as mathematical graphs with people as nodes and relationships as edges connecting the nodes.
Stacking
Stacking - When you use different algorithms on the same dataset to create different models and then use ANOTHER machine learning algorithm to combine the models. This is not bagging. With bagging you split the data and run the same algorithm. Here you run different algorithms on the whole dataset. Information Fusion - Same as stacking but instead of running another machine learning algorithm you just do simple weighting or averaging on the results of the different algorithms.
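A minimal sketch of stacking, assuming scikit-learn's StackingClassifier (an assumption; the dataset and base models are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Two different algorithms are run on the whole dataset...
base_models = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]
# ...and ANOTHER model (here a logistic regression) learns how to combine their outputs.
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

print(cross_val_score(stack, X, y, cv=5).mean())
```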
Text Mining Process
Step 1: Establish the Corpus Remember that the corpus is the set of documents we want to analyze. Step 2: Create the Term-Document Matrix Term-By-Document Matrix - A matrix showing how many times a term shows up in each document in your corpus. The documents are the rows and the terms are the columns in the matrix. Step 3: Extract The Knowledge Once the term-by-document matrix is created, different data mining techniques can be applied to extract knowledge. Classification can be used to categorize documents based on their contents. Clustering can be used to group similar documents together. For example, when searching for documents, other documents in the cluster can show up in the results. Association rule mining can be used to identify concepts that commonly appear together in documents. Trend Analysis - The collecting of information and attempting to spot a pattern, or trend, in the information.
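A minimal sketch of Step 2, assuming scikit-learn's CountVectorizer (an assumption); the three "documents" are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the patient reported chest pain and shortness of breath",
    "the contract was reviewed by the legal team",
    "chest pain was noted in the patient history",
]

vectorizer = CountVectorizer(stop_words="english")   # drop stop words like "the" and "was"
tdm = vectorizer.fit_transform(corpus)               # documents are rows, terms are columns

print(vectorizer.get_feature_names_out())
print(tdm.toarray())   # each cell: how many times that term appears in that document
```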
Sentiment Analysis Process
Step 1: Sentiment Detection - Determine whether the text is fact or opinion (objective vs. subjective). This places the statement on the S-O polarity scale. Step 2: Polarity Classification - If the text is an opinion, determine whether it is a positive or negative opinion and how strongly positive or negative it is. This places the statement on the P-N polarity scale. Two methods of polarity identification: Using a lexicon (catalog of words) with their meanings; this lexicon can be different for different domains. Using a series of training documents, where the statements are tagged with their polarity. Step 3: Target Identification - Identify the subject of the opinion. If we have a positive or negative opinion, WHAT do we have that opinion of? Step 4: Collection and Aggregation - The result of the first three steps is a series of tagged statements from the document (or corpus) with the sentiment, polarity, and target for each statement. We can then aggregate these results into a score for the entire document.
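A tiny sketch of the lexicon approach to Step 2, with a made-up lexicon and made-up statements:

```python
lexicon = {"great": 1, "love": 1, "excellent": 1, "terrible": -1, "slow": -1, "broken": -1}

def polarity_score(text):
    words = text.lower().split()
    # Sum the polarity of every lexicon word that appears in the statement.
    return sum(lexicon.get(word, 0) for word in words)

reviews = [
    "I love this phone and the camera is excellent",
    "The battery is terrible and the screen is broken",
]
for review in reviews:
    score = polarity_score(review)
    print(score, "positive" if score > 0 else "negative" if score < 0 else "neutral")
```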
ANN Learning Process
Supervised Learning - Sample cases are shown to the network as input and weights are adjusted to minimize the error in the outputs. First, run the inputs through the neural network. Second, compare the outputs with the desired targets. Did it correctly label a dog as a dog? Performance Function - The function used to compare the computed outputs with reality. Third, adjust the weights and repeat the process.
Support Vector Machines
Support Vector Machines - Algorithms that attempt to perform classification or regression by using a series of linear operations on vectors. For classification, picture a data set with two classes; we want to use a straight line to separate them. Hyperplane - The higher-dimensional analogue of the straight line used in the 2D example. The resulting model generated by a support vector machine is a series of linear equations. These are generally very fast to generate and very fast to use to predict the class of new data. Non-Linear Classification: While linear SVMs are the most common, you don't have to split your data set with a straight line. Kernel Trick - Sometimes data sets that don't look like they can be split linearly can be transformed into a higher dimension so that they can be split.
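A minimal sketch, assuming scikit-learn: a linear SVM and an RBF-kernel SVM (the kernel trick) on a toy data set that can't be split by a straight line:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes arranged in concentric circles -- not separable by a straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
kernel_svm = SVC(kernel="rbf").fit(X_train, y_train)    # maps the data to a higher dimension

print("linear SVM accuracy:", linear_svm.score(X_test, y_test))   # poor on this data
print("kernel SVM accuracy:", kernel_svm.score(X_test, y_test))   # close to 1.0
```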
Classification Inputs and Outputs
The input of classification is a set of objects and some characteristics of those objects. The output of classification is an algorithm (or model) of some kind that identifies the class of an object. Is this a dog or a cat? - Input: A picture of something. - Output: Dog or Cat Should the bank give you a loan? - Inputs: Your credit score, your current income, etc. - Output: Yes or No
Data Mining Realities
The textbook refers to them as myths, but let's focus on the realities. Data mining is a multistep process that requires deliberate, proactive design and use. These algorithms are tools. You must know how to use them appropriately. The current state of the art in data mining is ready for use now. Data mining technology is becoming increasingly democratized. The hard work was done by Ph.D.s; the tools are available for everyone to use. Data mining has such a low barrier to entry that you don't need to work for a large company to implement it.
Data Mining Tools
The three most common tools for data mining are: - R - Python - SQL
Nearest Neighbor Considerations
There are a few ways experimenters can vary the parameters when doing k-NN: We must choose a value of k that works. If the value is too big, then we risk blurring the distinctions between classes. If the value is too small, then we risk distorting our predictions. For example, if k=1 then we assign the class of that one nearest neighbor. That's a lot of sway from that one neighbor. We must choose a distance measure. There are some intuitive reasons why you'd want to choose one over the other. If we were legitimately talking about city transportation, then we probably want the Manhattan distance. If we're talking about GIS data, then we probably want Euclidean distance. We mentioned assigning the class based on counting the number of neighbors and counting the classes of those neighbors. Picking the class with the most votes is the voting scheme used. This could be a problem if one class is over-represented in the original data set. Usually, we account for this by having the data set used to create the model include all classes equally. Ultimately, like all modeling problems, we create the model and test it (using our test data set) and see how accurately it labeled the classes. We vary these parameters to get the best Accuracy/Precision/Recall scores.
Term-By-Document Matrix
This matrix tends to be very large, because you can have hundreds of documents and thousands of different terms in them. One way to deal with this is to either include only the terms that you care about in your domain of interest (like if you're only interested in medical terms) or to use mathematical techniques to shrink the matrix. Term Dictionary - The collection of terms specific to a narrow field (like medicine or law). Singular Value Decomposition - A method used to transform the term-by-document matrix to a manageable size using mathematical techniques.
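A minimal sketch of shrinking a term-by-document matrix with SVD, assuming scikit-learn's TruncatedSVD (an assumption); the corpus is made up:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "chest pain and shortness of breath",
    "patient history notes chest pain",
    "contract reviewed by the legal team",
    "legal counsel drafted the contract",
]

tdm = CountVectorizer(stop_words="english").fit_transform(corpus)
print("original shape:", tdm.shape)            # 4 documents x many terms

svd = TruncatedSVD(n_components=2)             # keep only 2 "concept" dimensions
reduced = svd.fit_transform(tdm)
print("reduced shape:", reduced.shape)         # 4 documents x 2 concepts
```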
Data Mining vs. Statistics
Traditionally, statistics involves a specific hypothesis and the testing of that hypothesis. With data mining, we are looking for insights (knowledge) that we may only have a vague idea about.
Text Mining Terminology
Unstructured Data - Does not have a predetermined data structure. Examples include Word and PDF documents as well as semi-structured formats like XML and JSON. Corpus - A large and structured set of texts prepared for the purpose of conducting knowledge discovery. Term - A single word or phrase extracted directly from the corpus of a specific domain by means of NLP methods. Concepts - Features generated from a collection of documents by some kind of statistical methodology. A concept is a high-level abstraction. Stemming - The process of reducing inflected words to their stem. For example: viewed, viewing, viewer are all based on the root view. Stop Words - Words that we don't include in an analysis. For example, common words like "the," "a," and "of." Synonym - Different words with identical or similar meanings. Polyseme - Identical words with different meanings. Token - A categorized block of text in a sentence. We extract all the useful terms from a text. A "unit of meaning." Tokenizing - The process of assigning tokens to blocks of text. A lot of text mining involves converting documents into something a machine learning algorithm can process, like a vector or matrix. These structured data sources contain the terms we're looking at. Word Frequency - The number of times a word is found in a specific document. Morphology - Branch of linguistics that studies the internal structure of words.
Training Concerns
Usually, the ability of an organization to apply neural networks to a problem (like a classification problem) depends on the computing power available to it. You can use the cloud, or you can purchase powerful workstations with either graphics cards or dedicated cards optimized for these algorithms. Overfitting - When the output matches the input very well but doesn't work well when applied to new data. Same as any other machine learning technique. The best way to prevent this is to make sure you have a diverse input data set. You can also split the data into training and validation (test) sets.
Similarity Measures
We say we're looking for the "nearest" points, but in general we're talking about "similarity" that's plotted on a graph. Minkowski Distance - A generalized measure of distance between two objects. Examples include: Euclidean Distance - The linear distance between two points (as taught in high school). Manhattan Distance - As if we were measuring distance by city blocks. Some additional notes: In these simple examples, there are only two dimensions of interest. See page 275 in the text for the generalized formula. When calculating distance, d(i,j) = 0 means that the two objects have identical properties. This does NOT mean that they are the same object.
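A small sketch of the Minkowski family in plain Python (r = 1 gives Manhattan, r = 2 gives Euclidean); the points are made up:

```python
def minkowski(p, q, r):
    """Generalized distance: r = 1 gives Manhattan, r = 2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

i = (1, 2)
j = (4, 6)

print(minkowski(i, j, 1))   # Manhattan: 3 + 4 = 7 (city-block distance)
print(minkowski(i, j, 2))   # Euclidean: sqrt(9 + 16) = 5 (straight-line distance)
print(minkowski(i, i, 2))   # 0 -> identical properties (not necessarily the same object)
```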
Decision Trees
We'll use the Ravens example from earlier. We want to predict if the Ravens will make the playoffs. We take the dataset of every Ravens season with the number of wins as the input and whether the Ravens made the playoffs (yes or no) as the class. We choose a characteristic (or variable) like the number of wins and split the data at the point that does the best job of predicting making the playoffs. If we split at 10 wins, then every season with 10+ wins was a playoff season, but we leave out one playoff season where the Ravens went 9-7 (2009). If we split at 9 wins, then we add that one playoff season but also add two 9-7 seasons that weren't playoff seasons (2004, 2017).
Web Mining
Web Mining - The process of discovering intrinsic relationships from Web data. The goal is to take data from websites and make it useful for analysis. Web Content Mining - The extraction of useful information from Web pages. Web Crawlers (Spiders) - Can scrape information from websites. This is how search engines scan for and index websites. Authoritative Pages - Pages that are considered a primary source of information. For example, government websites. Hub - One or more web pages that provide a collection of links to authoritative pages. Hyperlink-Induced Topic Search (HITS) - An algorithm designed to scan links to pages to identify hubs and authoritative pages. It is possible to "game" the system by creating dummy pages whose sole purpose is to link to pages you want to direct traffic to. Techniques like this are known as Search Engine Optimization (SEO). Web Structure Mining - The process of extracting useful information from the links embedded in Web documents.
Web Usage Mining
Web Usage Mining - The extraction of useful information from data generated through Web page visits and transactions. Also known as Web analytics. Off-Site Analytics - Analyzing interest in your website from external sources (like social media). On-Site Analytics - Analyzing traffic on your website. For example, which pages on your site are viewed the most, and how users navigate through pages on your site. Google Analytics is the most common tool for this.
Deep Learning
What makes Deep Learning "deep" is the ability to identify patterns in data without a human providing structure and meaning to the source data set. With deep learning, humans do less work, but you usually require a larger data set which means that you need more powerful computers to process it.