MIS 409 Final
Data Mining Consists of Five Major Elements
1. Extract, transform, and load transaction data onto the data warehouse system. 2. Store and manage the data in a multidimensional database system. 3. Provide data access to business analysts and information technology professionals. 4. Analyze the data by application software. 5. Present the data in a useful format, such as a graph or table.
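The five elements can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions: the CSV rows are made-up sample data, an in-memory SQLite table stands in for the data warehouse, and grouping by two columns (region and product) stands in for a true multidimensional (OLAP) database. The function names `extract`, `transform`, `load`, `analyze`, and `present` are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export (made-up sample data for illustration).
RAW_CSV = """date,region,product,amount
2023-01-05, east ,Coffee,4.50
2023-01-05,West,Tea,3.00
2023-01-06,EAST,Coffee,4.50
2023-01-07,west,Coffee,5.00
2023-01-07,East,Tea,3.25
"""

def extract(text):
    """Element 1 (extract): read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Element 1 (transform): clean inconsistent values and convert types."""
    return [
        (r["date"], r["region"].strip().title(), r["product"].strip(), float(r["amount"]))
        for r in rows
    ]

def load(conn, records):
    """Elements 1 (load) and 2: store the cleaned rows in a warehouse-style table."""
    conn.execute("CREATE TABLE sales (date TEXT, region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)

def analyze(conn):
    """Elements 3 and 4: query the data along two dimensions (region x product)."""
    return conn.execute(
        "SELECT region, product, ROUND(SUM(amount), 2) FROM sales "
        "GROUP BY region, product ORDER BY region, product"
    ).fetchall()

def present(summary):
    """Element 5: present the results as a simple text table."""
    lines = [f"{'Region':<8}{'Product':<10}{'Total':>8}"]
    lines += [f"{region:<8}{product:<10}{total:>8.2f}" for region, product, total in summary]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
summary = analyze(conn)
print(present(summary))
```

Note how the transform step matters: without normalizing " east ", "EAST", and "East" to one value, the analysis would report three separate regions.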
Data Mining Tools
Free Data Mining Software: ●Weka: open-source software for data mining ●RapidMiner: an open-source system for data and text mining ●KNIME: an open-source data integration, processing, analysis, and exploration platform Proprietary Data Mining Software: ●IBM SPSS Modeler ●Microsoft Analysis Services ●Oracle Data Mining ●SAS Enterprise Miner
Knowledge Discovery in Databases (KDD)
●Refers to the overall process of discovering useful knowledge from data. ●It involves evaluating, and possibly interpreting, the discovered patterns to decide what qualifies as knowledge. ●It also includes the choice of encoding schemes, preprocessing, sampling, and projections of the data prior to the data mining step.
Data Mining Techniques
Techniques ● Association rules ●Classification algorithms ●Decision trees ●Regression algorithms ●Neural networks
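To make the first technique concrete, here is a minimal sketch of association rules over made-up shopping baskets. It computes two standard measures: support (the fraction of baskets containing an itemset) and confidence (support of A and B together divided by support of A). The thresholds and data are hypothetical, and this brute-forces item pairs rather than implementing a full algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical shopping baskets (made-up data for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def rules(baskets, min_support=0.5, min_confidence=0.6):
    """Yield single-item rules A -> B that pass both thresholds."""
    items = sorted(set().union(*baskets))
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: how often rhs appears in baskets that contain lhs.
            confidence = pair_support / support({lhs}, baskets)
            if confidence >= min_confidence:
                yield lhs, rhs, pair_support, confidence

for lhs, rhs, s, c in rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With this data, "butter -> bread" has confidence 1.0 because every basket with butter also has bread, while "bread -> butter" is weaker since bread often appears without butter.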
Machine learning main industries
There's a wide range of applications for data science and machine learning. These include: Technology: Technology companies are among the biggest employers of data scientists and machine learning experts. These companies use data science and machine learning techniques to develop products and services, improve user experiences, and optimize business processes. Finance: Financial institutions such as banks, insurance companies, and investment firms use data science and machine learning to analyze large volumes of data and make informed business decisions. They also use these techniques for fraud detection and risk management. Healthcare: Data science and machine learning are increasingly being used in healthcare to improve patient outcomes, develop new treatments and drugs, and streamline healthcare operations.
Potential applications of ChatGPT
Uses in Entertainment ●Able to generate new storylines and concepts. ●Able to summarize characters, plotlines, etc. to bring new writers up to speed. ●Potential for endless scripts based on pre-existing shows like Seinfeld. Uses in Healthcare ●Able to summarize a patient's medical history and highlight major issues for doctors ●Able to chat with patients to help identify a possible issue before they pay to see a doctor Uses in Construction ●Able to create building ideas and internal layouts ●Able to generate specifications for measurements, materials needed, etc. for designs
Neural network
Neural networks are fundamental to data science and machine learning. They are computational models inspired by the structure and function of the human brain. In data science, neural networks are used for tasks such as classification, regression, clustering, and more. They are particularly adept at handling complex data patterns and making predictions based on them.
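The basic unit of a neural network is a single artificial neuron: it multiplies inputs by weights, adds a bias, and applies an activation function. A minimal sketch, assuming the classic perceptron learning rule and the logical AND function as toy training data (modern networks are instead trained with gradient-based backpropagation):

```python
def step(z):
    """Threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=10, lr=1.0):
    """Train one neuron with two inputs using the perceptron learning rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            # Nudge weights and bias toward the correct answer when the prediction is wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# Toy labeled data: the logical AND function.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
predictions = [step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in AND]
print(w, b, predictions)
```

A single neuron can only separate classes with a straight line; stacking many neurons into layers is what lets networks capture the complex patterns described above.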
Deep learning
Deep learning is a subset of machine learning that uses neural networks with many layers (hence the term "deep"). These deep neural networks are capable of learning hierarchical representations of data. Deep learning has gained significant attention and popularity due to its remarkable ability to automatically learn features from raw data. Traditionally, feature engineering was a crucial step in machine learning, where experts manually selected or crafted features that they believed would be useful for the model. However, deep learning eliminates much of this manual feature engineering by learning hierarchical representations of features directly from the data.
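The "many layers" idea can be sketched as a forward pass in which each layer re-represents the previous layer's output. This is a minimal illustration with hypothetical layer sizes and random, untrained weights; learning the weights (backpropagation) is omitted, so the outputs are not meaningful predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Rectified linear unit: keep positive values, zero out negatives."""
    return np.maximum(0.0, x)

# Hypothetical layer sizes: 8 raw inputs -> three hidden layers -> 3 outputs.
layer_sizes = [8, 16, 16, 8, 3]
weights = [rng.normal(0.0, 0.5, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Pass a batch through every layer and keep each layer's representation."""
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        # ReLU on hidden layers, raw scores (logits) on the final layer.
        activations.append(relu(z) if i < len(weights) - 1 else z)
    return activations

batch = rng.normal(size=(4, 8))  # 4 made-up examples with 8 raw features each
acts = forward(batch)
print([a.shape for a in acts])
```

In a trained network, the early hidden layers would tend to capture simple features and the later ones combinations of those features, which is the hierarchical representation the text describes.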
How does ChatGPT work?
ChatGPT works by leveraging a deep learning model called the Generative Pre-trained Transformer (GPT). 1. Generative: ChatGPT is capable of generating human-like text. It is trained on a vast corpus of text data to learn the structure and patterns of human language. 2. Pre-trained: Before it is fine-tuned for a specific task like conversation, it undergoes extensive training on diverse datasets. This pre-training gives it a broad understanding of language. 3. Transformer Architecture: This is the backbone of ChatGPT. The Transformer architecture processes all the words of an input in parallel rather than one at a time, which makes training highly efficient. It consists of layers of self-attention mechanisms, which let it weigh the significance of different words in a sentence relative to each other. 4. Fine-tuning: After pre-training, the model is fine-tuned for specific tasks, such as conversation, by exposing it to examples of human dialogue and to human feedback on its answers (reinforcement learning from human feedback, RLHF). During this phase, the model adjusts its parameters to better suit the nuances of the task. 5. Response Generation: When you interact with ChatGPT, it takes your input, processes it, and generates a response one word piece (token) at a time based on the input and the context provided by the conversation history. 6. Continual Learning: ChatGPT does not update its parameters while you chat with it; within a conversation it only uses the conversation history as context. Conversations and user feedback may, however, be used to train later versions of the model, which is how it improves over time.
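The self-attention step in point 3 can be sketched as scaled dot-product attention. This is a simplified illustration, not ChatGPT's actual implementation: it uses a single attention head, random weight matrices instead of learned ones, hypothetical sizes, and random vectors standing in for word embeddings. The causal mask reflects that GPT-style models let each word attend only to itself and earlier words.

```python
import numpy as np

def softmax(x, axis=-1):
    """Turn scores into weights that are positive and sum to 1 along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head scaled dot-product self-attention over X (tokens x features)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # How strongly each token "looks at" every other token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if causal:
        # Block attention to future tokens by setting their scores to -inf.
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(42)
seq_len, d_model, d_head = 5, 8, 4          # hypothetical: 5 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))     # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

output, attn = self_attention(X, Wq, Wk, Wv)
print(output.shape)
print(np.round(attn, 2))
```

Each row of `attn` shows how one token's new representation is blended from the tokens before it; real models run many such heads in parallel in every layer.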
What is AI?
●AI stands for "artificial intelligence." ●AI refers to the development of computer systems that can perform tasks that normally require human intelligence. ●AI uses algorithms, statistical models, and other techniques to enable machines to learn from data and improve their performance over time. ●AI includes subfields such as machine learning, natural language processing, robotics, and computer vision. ●AI is being used in a wide range of applications, including autonomous vehicles, virtual assistants, medical diagnosis, and financial analysis.
Amazon
●Automate prediction of customer demand ●Customer support chatbots ●Product recommendation ●Alexa ●Warehouse and delivery optimization ○Smart robots that scan packages ○Optimizing delivery routes
BI / BA Tools
●BI / BA tools help to: ○Discover patterns and outliers in data. ○Collect, organize, and analyze data all in one place. ○Get insights for growth and resolving issues. ○Forecast future outcomes for the business. ●BI tools help to provide descriptive analysis of data while BA tools help to provide predictive analysis of data. ●BI uses historical data to determine what happened within an organization, while BA uses this data to determine why those things happened in an attempt to make predictions. ●Microsoft Power Platform ○A platform that brings together tools for data visualization, developing low-code applications, automating workflows, and building intelligent chatbots. ○Can be used as standalone applications or connected to existing systems within a business.
What is ChatGPT?
●ChatGPT is a language model developed by OpenAI that can be used for various natural language processing tasks, including text generation, summarization, and question answering.
Supervised learning purpose
●Classification: Where the algorithm predicts a categorical output variable based on input features. Examples include identifying spam emails or classifying images. ●Regression: Where the algorithm predicts a continuous output variable based on input features. Examples include predicting stock prices or estimating housing prices. ●Object detection: Where the algorithm identifies objects within an image or video and classifies them into different categories. ●Natural language processing: Where the algorithm processes and analyzes text data, such as sentiment analysis or language translation.
Unsupervised learning purpose
●Clustering: Where the algorithm groups similar data points together based on their characteristics. Examples include identifying customer segments based on their purchasing behavior or grouping news articles by topic. ●Anomaly detection: Where the algorithm identifies data points that are significantly different from the rest of the dataset. Examples include detecting fraudulent transactions or identifying faulty equipment in a manufacturing process. ●Dimensionality reduction: Where the algorithm reduces the number of variables in a dataset while retaining as much information as possible. This can help simplify data analysis and visualization.
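Anomaly detection can be illustrated with one of the simplest approaches, a z-score rule: flag values that lie far from the mean, measured in standard deviations. A minimal sketch, assuming made-up transaction amounts and a hypothetical threshold of 2 standard deviations:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Made-up daily transaction amounts with one unusually large value.
amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.6, 310.0, 49.1, 50.4]
print(zscore_anomalies(amounts))
```

No labels are needed: the rule only compares each point to the rest of the data. One design caveat is that a large outlier also inflates the mean and standard deviation, so more robust methods (for example, ones based on the median) are often preferred on real data.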
What is data science?
●Data science combines math and statistics, specialized programming, advanced analytics, AI, and machine learning ●It is often combined with specific subject-matter expertise to uncover insights hidden in an organization's data ●These insights often guide decision making and strategic planning ●An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data ●Data science (DS) is a multidisciplinary field of study with the goal of addressing the challenges in big data ●Data science principles apply to all data, big and small
Advantages of BI
●Data visibility ●Identify trends and patterns ●Improved customer/employee experiences ●Revenue growth ●Quick decision process ●Performance measurement
Why is Data Science Important?
●Enables organizations to make data-driven decisions. ●Helps to identify patterns and trends in data. ●Improves the accuracy and reliability of decision-making. ●Helps to identify opportunities and potential problems. ●Enables organizations to stay competitive and gain a strategic advantage.
How to Use AI
●Enhancing decision making ○Using AI to analyze large amounts of data and provide insights that can help businesses make better strategic decisions ●Improving Efficiency and Productivity ○Implementing AI-powered tools and systems to optimize supply chain management, resource allocation, and production processes. ●Enhancing cybersecurity ○Using AI to detect and respond to potential cyber threats and improve the overall security posture of the organization.
Reinforcement learning purpose
●Game playing: Where the algorithm learns to play games such as chess. ●Robotics: Where the algorithm learns to control a robot to perform a task, such as navigating a maze or grasping objects. ●Autonomous vehicles: Where the algorithm learns to control a vehicle to drive safely and efficiently.
IBM
●Helps integrate and scale AI and machine learning ●Infuses AI-powered intelligent workflows into business processes on Google Cloud ●Partners with clients to transform customer service with Google's Contact Center AI platform ●Project Debater
BI in Business
●Important to businesses of all sizes to make better data-driven decisions. ●Helps to create a better future for the business and to adjust their strategy if necessary. ●Business data is transformed into actionable information with user-friendly visualizations such as graphs, reports, and dashboards. ●Gain a better understanding of: ○Marketing trends. ○Customers and their buying behavior. ○Factors impacting profit and loss. ○Areas where performance is low. ●Businesses can utilize this to further improve and gain a competitive advantage over other businesses. ●Better visibility within the business helps to improve productivity, efficiency, customer knowledge, financial performance, and control over business processes. ●Efficient way of combining several metrics of data from different departments in one convenient place for businesses to easily identify patterns within data.
Google
●Improves search features with Google Lens and the new multisearch feature ●Analyzes data to provide info on traffic conditions and delays in Google Maps ●YouTube uses AI to generate captions on videos ●Spam filtering in Gmail ●AI formats Google Ads to fit the viewport
Importance of BA
●Informed Decision-Making ○Uses data to make informed decisions based on fact, not guesswork. ●Improved Operational Efficiency ○Helps businesses identify inefficiencies and optimize operations for better productivity. ●Better Customer Insight ○Helps businesses understand customer behavior and preferences. ○Helps tailor marketing and product development to fit customer desires. ●Competitive Advantage ○More likely to give businesses a competitive advantage when in use. ●Predictive Modeling ○Helps predict future trends and outcomes ○Aids in proactive decision making and risk management ●Leads Management ○Enables businesses to focus on the best leads available. ○Uses information gathered from raw data.
How Data Mining is Used in Business
●It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. ●Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. ●Data mining is primarily used today by companies with a strong consumer focus, to "drill down" into their transactional data. ●Data mining helps determine relationships among factors such as pricing, customer preferences, and product positioning, and their impact on sales, customer satisfaction, and corporate profits. ●Businesses can use this information to influence their business decisions, follow and predict trends, and improve efficiency.
AI Subfields
●Machine Learning ●Deep Learning ●Neural Networks ●Cognitive Computing ●Natural Language Processing ●Computer Vision
What is Machine Learning
●Machine learning is a field of computer science and artificial intelligence. ●It focuses on building algorithms and models that enable computer systems to learn and improve from experience. ●The goal of machine learning is to develop programs that can automatically learn and improve from data without being explicitly programmed. ●Machine learning algorithms use statistical techniques to analyze and identify patterns in data. ●The identified patterns are then used to make predictions or decisions about new data. ●The ultimate goal of machine learning is to develop programs that can learn and improve over time, becoming more accurate and effective as they are exposed to more data. This has the potential to revolutionize many fields, from healthcare and finance to transportation and entertainment.
Microsoft
●Microsoft AI is powered by Azure. ●Azure is built for running large AI models. ●Azure OpenAI Service ○GPT-3.5 ○Codex ○DALL-E 2 ●AI-powered search with Bing ●Cortana ●Copilot
AI modeling
●Refers to the process of creating mathematical representations of real-world phenomena using AI techniques. ●Data is collected, analyzed, and used to develop algorithms and models that can make predictions or decisions based on new input data.
Reinforcement Learning
●Reinforcement learning is a type of machine learning where the algorithm learns by trial and error. ●It is not given any labeled data or specific examples of what to do, but instead learns by receiving feedback in the form of rewards or penalties. ●The algorithm interacts with the environment by taking actions and observing the resulting state and reward. ●The goal of the algorithm is to learn a policy that maximizes the cumulative reward over time. ●The policy is a function that maps the current state of the environment to an action.
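The pieces above (state, action, reward, policy) can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. Everything here is a hypothetical toy: a corridor of 5 states where the agent starts at state 0, can move left or right, and receives a reward of 1 only when it reaches the goal at state 4. The learning rate, discount factor, and exploration rate are illustrative choices.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
GOAL = N_STATES - 1
ACTIONS = (0, 1)        # 0 = move left, 1 = move right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q[state][action], the expected future reward, by trial and error."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(Q[nxt]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
            if done:
                break
    return Q

Q = q_learning()
# The learned policy maps each non-goal state to its best action.
policy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]
print(policy)
```

The agent is never told that "right" is correct; it discovers this from rewards alone, and the discount factor `gamma` makes states closer to the goal more valuable, which is how the reward propagates back to the start.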
Basket Analysis
●Sometimes called "affinity analysis," this looks at the items that a customer bought, which could help brick-and-mortar stores improve their layouts or help online companies like Amazon recommend related products. The "basket" refers to the shopping basket or cart that holds the items a customer buys together. ●It's based on the assumption that you can predict future customer behavior from past behavior, including purchases and preferences. And it's not just grocery stores that can use this data. Here are a few ways it can be applied in various industries:
Supervised machine learning
●Supervised machine learning uses a labeled dataset with input data and corresponding output data. ●The algorithm learns to map the features to the labels by identifying patterns in the data. ●The algorithm adjusts its internal parameters to minimize the difference between predicted output and actual output during training. ●Once trained, the algorithm can be used to make predictions or decisions about new, unseen data.
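The training loop described above (adjust internal parameters to shrink the gap between predicted and actual output) can be sketched with a one-feature linear model trained by gradient descent on mean squared error. The data are made up (the labels follow y = 2x + 1), and the learning rate and number of epochs are hypothetical choices; real supervised models have many more parameters but follow the same idea.

```python
def train_linear_model(xs, ys, lr=0.05, epochs=2000):
    """Fit y approximately equal to w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of MSE = mean((pred - y)^2) with respect to w and b.
        grad_w = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        # Step each parameter in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Labeled training data (made up): inputs with their known correct outputs.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
print(round(w * 10 + b, 2))  # prediction for an unseen input x = 10
```

The last line is the "once trained" step: the learned parameters are applied to an input that never appeared in the training data.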
What is Business Intelligence?
●The process of collecting, storing, and analyzing the data produced by a company ●Supports strategic decision making ●Collects information to analyze: ○Patterns ○Trends ○Relationships ●Refers to capabilities that enable organizations to make better decisions, take informed actions, and implement more efficient business processes ●Uses data to support day-to-day operational management within a business ●Compiling and accessing big data
What is Business Analytics?
●The process of transforming data into insight to improve business decisions ●Using skills, technologies, and practices to explore and investigate past business performance ●Analyzing data to identify: ○Trends ○Patterns ○Root causes ●Forecasting future business needs, performance, and industry trends based on past and present data ●Identifying new patterns and relationships with data mining ●Using quantitative and statistical analysis to design business models
Sales Forecasting
●This looks at when customers bought and tries to predict when they will buy again. You could use this type of analysis to determine a strategy of planned obsolescence or to figure out complementary products to sell. ●This also looks at the number of customers in your market and predicts how many will actually buy. For example, imagine you have a coffee shop in Seattle. Here are questions you might ask: ●How many people/households/businesses within a mile of your store will buy your coffee? ●How many competitors are in that mile? ●How many people/households/businesses are within 5 miles? ●How many competitors are in those 5 miles?
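The first idea above (look at when a customer bought and predict when they will buy again) can be sketched with a very simple rule: average the gaps between past purchases and add that average to the most recent purchase date. This is a deliberately naive illustration with a made-up purchase history; real forecasting would account for seasonality, trends, and many customers at once.

```python
from datetime import date, timedelta

def predict_next_purchase(purchase_dates):
    """Predict the next purchase date from the average gap between past purchases."""
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        raise ValueError("need at least two purchases to estimate a buying interval")
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    average_gap = sum(gaps) / len(gaps)
    return dates[-1] + timedelta(days=round(average_gap))

# Made-up purchase history for one customer.
history = [date(2023, 1, 3), date(2023, 1, 31), date(2023, 3, 1), date(2023, 3, 30)]
print(predict_next_purchase(history))
```

A store could use such a predicted date to time a reminder or an offer for a complementary product shortly before the customer is expected to return.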
Unsupervised machine learning
●Unsupervised machine learning uses an unlabeled dataset with no output variables or labels. ●The algorithm tries to identify patterns and structures in the data on its own. ●The algorithm looks for similarities and differences between data points and groups them into clusters based on these similarities and differences. ●The goal of unsupervised learning is to find meaningful patterns and relationships in the data without any prior knowledge of what the output should look like.
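Grouping unlabeled points by similarity can be sketched with k-means, one common clustering algorithm (named here as an illustrative choice, not the only option). It alternates two steps: assign each point to its nearest cluster center, then move each center to the mean of its assigned points. The customer data and the choice of k = 2 are hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Cluster points (n x d array) into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (no labels are used).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centers stopped moving: the clustering has converged
        centroids = new_centroids
    return labels, centroids

# Made-up customer features (e.g. visit frequency, spend level) with two obvious groups.
customers = np.array([
    [1.0, 2.0], [1.5, 1.8], [1.2, 2.2],   # occasional, low-spend shoppers
    [8.0, 8.0], [8.5, 7.5], [7.8, 8.3],   # frequent, high-spend shoppers
])
labels, centroids = kmeans(customers, k=2)
print(labels)
```

The algorithm never sees which customer belongs to which group; it recovers the two segments purely from the distances between points. Note that k must be chosen in advance, which is a common limitation of this method.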
ChatGPT's relationship to AI, NLP, deep learning, and machine learning
●Uses AI to generate human-like text in response to natural language prompts. ●Uses machine learning to learn from vast amounts of text data and generate high-quality responses. ●Its architecture is based on deep learning techniques, allowing it to learn complex patterns and generate highly realistic and convincing text. ●Uses NLP to understand and respond to human language naturally and appropriately.
How Does Data Mining Work
●While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. ●Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.
Data mining
●Data mining is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. ●Its overall goal is to extract information from a data set and transform it into knowledge. ●It refers to the application of algorithms for extracting patterns from data without the additional steps of the KDD process. ●Data mining is considered a misnomer, because it is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. ●It is the analysis step of the KDD process.