CIS 352 Exam 2


5.9.4 What are commonly used Web analytics metrics? What is the importance of metrics?

• Website usability: How were they using my website?
• Traffic sources: Where did they come from?
• Visitor profiles: What do my visitors look like?
• Conversion statistics: What does all this mean for the business?
Metrics are important because they can lead to a highly quantified marketing program.

5.8.4 What things can help Web pages rank higher in the search engine results?

• AdWords: paying for position
• Content and keywords that are relevant to many searches

4.2.2 What recent factors have increased the popularity of data mining?

• More intense competition on a global scale
• Recognition of the value in data sources
• Availability of quality data
• Consolidation and integration of data repositories into data warehouses
• The exponential increase in data processing and storage capabilities, and the decrease in their cost
• Movement toward conversion of information resources into nonphysical form

5.2.3 Why is the popularity of text mining as an analytics tool increasing?

• Roughly 85-90% of all data is textual in nature
• The volume is doubling roughly every 18 months
• Unstructured data provides more value

5.7.4 What is Web content mining? How can it be used for comparative advantage?

• Web content mining examines what the Web page actually says
• It can be used to collect intelligence about competitors' products, services, and customers

5.8.1 What is a search engine? Why are they important for today's businesses?

- A software program that searches for documents, sites, or files based on keywords
- Important because they are used in e-commerce and people use them to learn about products and services, so companies want prominent visibility on the Web

5.9.1 What are the three types of data generated through Web page visits?

- Data stored in server access logs, referrer logs, agent logs, and client-side cookies
- User characteristics and usage profiles
- Metadata, such as page attributes, content attributes, and usage data

5.8.2 What is a Web crawler? What is it used for? How does it work?

- A program used to read through the content of a Web site automatically
- Used to collect competitive intelligence or, more generally, to gather information and data
- It works by following the hyperlinks on the pages it visits
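The "follow the hyperlinks" idea can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `PAGES` is a made-up in-memory site standing in for real HTTP fetches, and `LinkExtractor` and `crawl` are hypothetical names chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch pages over HTTP.
PAGES = {
    "/home": '<a href="/products">Products</a> <a href="/about">About</a>',
    "/products": '<a href="/home">Home</a> <a href="/pricing">Pricing</a>',
    "/about": '<a href="/home">Home</a>',
    "/pricing": "<p>No outgoing links</p>",
}


def crawl(start, pages):
    """Breadth-first crawl: read a page, then queue every link not seen before."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/home", PAGES)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.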

4.4.3 List and briefly define the phases in the CRISP-DM process.

1. Business Understanding: know what the study is for, and establish a rough budget and team
2. Data Understanding: identify and categorize relevant data
3. Data Preparation: prepare data for analysis by using preprocessing steps
4. Model Building: build, assess, and compare models
5. Testing and Evaluation: assess and evaluate models for accuracy and generality
6. Deployment: organize and present the knowledge in a way the end user can understand and benefit from

5.6.4 What are the main steps in carrying out sentiment analysis projects?

1. Sentiment Detection
2. N-P Polarity Classification
3. Target Identification
4. Collection and Aggregation
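The four steps can be illustrated with a toy lexicon-based pipeline. Everything here is an assumption made for illustration: the tiny word lists, the `TARGETS` set, and the function names `analyze` and `aggregate`. Real projects rely on much larger lexicons or trained models.

```python
# Hypothetical mini-lexicon and product features, for illustration only.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"poor", "hate", "terrible"}
TARGETS = {"battery", "screen"}


def analyze(sentence):
    """Return (target, polarity) for an opinionated sentence, else None."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # Step 1: sentiment detection (does the text express an opinion at all?)
    if pos + neg == 0:
        return None
    # Step 2: N-P polarity classification (+1 positive, -1 negative)
    polarity = 1 if pos >= neg else -1
    # Step 3: target identification (what the opinion is about)
    target = next((w for w in words if w in TARGETS), "unknown")
    return target, polarity


def aggregate(sentences):
    """Step 4: collect per-sentence results and sum polarity per target."""
    totals = {}
    for s in sentences:
        result = analyze(s)
        if result is not None:
            target, polarity = result
            totals[target] = totals.get(target, 0) + polarity
    return totals


reviews = [
    "The battery is great.",
    "I hate the screen!",
    "The battery life is excellent.",
    "It arrived on Tuesday.",
]
summary = aggregate(reviews)
```

The last review contains no opinion words, so step 1 filters it out before any polarity is assigned.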

4.7.5 What are the most common data mining mistakes/ blunders? How can they be alleviated or completely eliminated?

1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can and cannot do
3. Beginning without the end in mind
4. Defining the project around a foundation that your data can't support
5. Leaving insufficient time for data preparation
6. Looking only at aggregated results and not at individual records
7. Being sloppy about keeping track of the mining procedure and results
8. Using data from the future to predict the future
9. Ignoring suspicious findings and quickly moving on
10. Starting with a high-profile, complex project
11. Running mining algorithms repeatedly and blindly
12. Ignoring subject matter experts
13. Believing everything you are told about data
14. Assuming data keepers will cooperate
15. Measuring your results differently from the way your sponsor measures them
16. Ignoring the deployment phase
Ways to minimize these risks are basically the reverse of these items.

4.5.5 Briefly describe the general algorithm used in decision trees.

A general algorithm for building a decision tree is as follows:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive (non-overlapping) subsets along the lines of the specific split and move them to the branches.
4. Repeat steps 2 and 3 for each and every leaf node until the stopping criterion is reached (e.g., the node is dominated by a single class label).
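A minimal Python sketch of these steps follows. The answer above does not say how the "best" attribute is chosen, so this sketch assumes Gini impurity as the splitting criterion. The training data and the names `gini`, `best_attribute`, and `build_tree` are made up for illustration.

```python
from collections import Counter


def gini(rows):
    """Gini impurity of the class labels (last column) in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_attribute(rows, attributes):
    """Step 2: pick the attribute whose split gives the lowest weighted Gini."""
    def weighted_gini(attr):
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)


def build_tree(rows, attributes):
    """Recursively build a tree of nested dicts; leaves are class labels."""
    labels = [row[-1] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: the node is pure, or no attributes are left to split on
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(rows, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    # Step 3: one branch per attribute value, each with its own exclusive subset
    for value in sorted({row[attr] for row in rows}):
        subset = [row for row in rows if row[attr] == value]
        branches[value] = build_tree(subset, remaining)  # Step 4: repeat on the new node
    return {"attribute": attr, "branches": branches}


# Made-up training data: (outlook, windy, play)
DATA = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "no", "yes"),
    ("rainy", "yes", "no"),
]
tree = build_tree(DATA, attributes=[0, 1])
```

In this data the class label depends only on the "windy" column (index 1), so that attribute is chosen at the root and both branches end in pure leaves.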

5.9.2 What is clickstream analysis? What is it used for?

Analysis of the data and information collected by Web servers. It is used to discern interesting patterns from the clickstreams.

4.4.5 How does CRISP-DM differ from SEMMA?

CRISP-DM includes business and data understanding while SEMMA assumes these steps have already been identified and understood.

4.4.1 What are the major data mining processes?

CRISP-DM, SEMMA, KDD

5.3.3 What are some of the benefits and challenges of NLP?

Challenges:
• Part-of-speech tagging
• Text segmentation
• Word sense disambiguation
Benefits:
• Computer does the work of understanding the data for you

4.4.4 What are the main data preprocessing steps? Briefly describe each step, and provide relevant examples.

• Data Consolidation: collect and combine data
• Data Cleaning: remove unnecessary data
• Data Transformation: normalize and discretize data, create new attributes
• Data Reduction: reduce dimension and volume, balance data
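As a small worked example, the four steps can be applied to a handful of made-up records. The two sources, the field names, and the specific choices (dropping duplicates and incomplete records, min-max normalization) are assumptions for illustration; each step has many other valid forms.

```python
# Made-up records from two hypothetical sources
source_a = [
    {"id": 1, "age": 20, "income": 30000},
    {"id": 2, "age": None, "income": 50000},
]
source_b = [
    {"id": 3, "age": 40, "income": 70000},
    {"id": 1, "age": 20, "income": 30000},
]

# Data consolidation: combine the sources into one collection
records = source_a + source_b

# Data cleaning: drop duplicate ids and records with missing values
seen_ids = set()
clean = []
for r in records:
    if r["id"] in seen_ids or any(v is None for v in r.values()):
        continue
    seen_ids.add(r["id"])
    clean.append(r)

# Data transformation: min-max normalize income to the range [0, 1]
lo = min(r["income"] for r in clean)
hi = max(r["income"] for r in clean)
for r in clean:
    r["income_norm"] = (r["income"] - lo) / (hi - lo)

# Data reduction: keep only the attributes needed for modeling
reduced = [{"age": r["age"], "income_norm": r["income_norm"]} for r in clean]
```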

4.7.1 What are the privacy issues in data mining?

Data that is collected, stored, and analyzed in data mining often contains information about real people. This includes identification, demographic, financial, personal, and behavioral information. Most of these data can be accessed through some third-party data providers. In order to maintain the privacy and protection of individuals' rights, data mining professionals have ethical (and often legal) obligations.

5.4.1 List and briefly discuss some of the text mining applications in marketing.

Enables better CRM and allows companies to understand customers' perceptions by analyzing unstructured data collected from call centers or reviews.

5.3.1 What is NLP?

Natural language processing (NLP) studies the problem of "understanding" natural human language. In text mining, it is used to structure a collection of text.

4.2.1 Define data mining. Why are there many different names and definitions for data mining?

Non-trivial process of identifying valid, novel, potentially useful patterns in data. Data mining is a blend of many different disciplines, so the names and definitions vary between disciplines.

4.2.4 What are some major data mining methods and algorithms?

Prediction
• Classification (e.g., decision trees)
• Regression
• Time series
Association
• Market basket
• Link analysis
• Sequence analysis
Segmentation
• Clustering
• Outlier analysis
The learning algorithms are either supervised (require a target variable) or unsupervised (no target variable).

5.6.1 What is sentiment analysis? How does it relate to text mining?

Sentiment analysis tries to answer the question, "What do people feel about a certain topic?" by analyzing data related to the opinions of many people using a variety of automated tools. It uses the same techniques as text mining, but it classifies opinions as positive or negative instead of classifying text into concepts.

4.2.5 What are key differences between the major data mining tasks?

Some are explanatory and some are predictive

5.2.1 What is text analytics? How does it differ from text mining?

Text analytics includes information retrieval and text mining. Text mining is focused on discovering new and useful knowledge from textual data sources. Text analytics is the broader term.

5.2.2 What is text mining? How does it differ from data mining?

Text mining is a semi-automated process of extracting knowledge from unstructured data. In text mining the data is unstructured, while in data mining the data is structured.

5.3.2 How does NLP relate to text mining?

Text mining uses NLP to structure the text.

5.10.1 What is meant by social analytics? Why is it an important business topic?

Social analytics involves mining the textual content created in social media and analyzing socially established networks. This is an important business topic because it helps companies gain insight into existing and potential customers' current and future behaviors, and into their likes and dislikes toward a firm's products and services.

4.7.3 What are the most common myths about data mining?

• Data mining provides instant, crystal-ball predictions.
• Data mining is not yet viable for business applications.
• Data mining requires a separate, dedicated database.
• Only those with advanced degrees can do data mining.
• Data mining is only for large firms that have lots of customer data.

5.5.1 What are the main steps in the text mining process?

• Establish the Corpus: Collect and organize unstructured data
• Create the Term-Document Matrix: Introduce structure to the corpus
• Extract Knowledge: Discover novel patterns from the T-D matrix
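To make the term-document matrix concrete, here is a small Python sketch of the three steps. The two documents, the one-word stop list, and the "distinctive terms" idea used for step 3 are assumptions for illustration; a real project would apply proper knowledge extraction methods (classification, clustering, association) to the matrix.

```python
from collections import Counter

# Step 1: establish the corpus (made-up documents)
corpus = {
    "doc1": "data mining finds patterns in data",
    "doc2": "text mining finds patterns in text",
}

# Step 2: create the term-document matrix (rows: documents, columns: terms)
STOP_WORDS = {"in"}  # assumed, minimal stop-word list
counts = {
    name: Counter(w for w in text.lower().split() if w not in STOP_WORDS)
    for name, text in corpus.items()
}
terms = sorted(set().union(*counts.values()))
matrix = {name: [counts[name][t] for t in terms] for name in corpus}

# Step 3: extract knowledge, here simply the terms that tell the documents apart
distinctive = [
    t for i, t in enumerate(terms) if matrix["doc1"][i] != matrix["doc2"][i]
]
```

Each cell of `matrix` holds how often a term occurs in a document, which is what turns free text into structured data that mining algorithms can work on.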

5.7.5 What is Web structure mining? How does it differ from Web content mining?

• Web structure mining examines how the pages link together
• It is more related to navigation, while Web content mining is related to what the pages say

5.10.4 What is social media analytics? What are the reasons behind its increasing popularity?

• It refers to the systematic and scientific ways to consume the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization's competitiveness
• The increasing popularity is due to the growth of social media outlets and the growth of text and Web analytics technologies

4.5.4 What are some of the criteria for comparing and selecting the best classification techniques?

• The amount and availability of data
• The types of data: categorical, interval, ratio, etc.
• What is being predicted: a nominal or an interval value
• The purpose

5.8.3 What is "search engine optimization?" Who benefits from it?

• The intentional activity of affecting the visibility of an e-commerce site or website in a search engine's natural search results
• Companies with e-commerce sites benefit when they are a top result

5.7.1 What are some of the main challenges the Web poses for knowledge discovery?

• Too big for effective data mining
• Too complex
• Too dynamic
• Not specific to a domain
• The Web has everything

5.7.3 What are the three main areas of Web mining?

• Web Content Mining
• Web Structure Mining
• Web Usage Mining

5.10.2 What is a social network? What is the need for SNA?

• Social network: a social structure composed of individuals linking to each other
• Social network analysis (SNA) is the systematic examination of social networks. It is needed to identify social groups and the people who lead them (those with many followers)

4.5.1 Identify at least three of the main data mining methods.

Classification, Association, Clustering

4.5.8 Give examples of situations in which cluster analysis would be an appropriate data mining technique.

Clustering algorithms are used when cases must be sorted into groups that are not defined in advance. Ex: establishing the score ranges into which to assign class grades for a college class.
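The grade example can be sketched with a simple one-dimensional k-means. K-means is only one possible clustering algorithm, chosen here as an assumption; the scores, the starting centers, and the function name `kmeans_1d` are made up for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data with given starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: put each value in the cluster of its nearest center
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return clusters, centers


# Made-up exam scores; three clusters stand in for three grade bands
scores = [52, 55, 58, 71, 74, 76, 90, 93, 95]
clusters, centers = kmeans_1d(scores, centers=[50, 75, 100])
```

The boundaries between the resulting clusters suggest where the grade cutoffs could fall, without anyone fixing the score ranges beforehand.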

