Structured vs Unstructured Data.


Describe the process of joining more than one relational database.

1. FOREIGN KEY - the primary key of one table is joined to a matching foreign key in the other.
2. JOIN QUANTITY - joins can be 1 to 1 (each record matches exactly one record in the other table) or 1 to many (one record matches several records in the other table).
3. BRIDGING TABLE - sometimes an intermediary bridging table is needed to make the join if the tables do not have a common value.
4. SQL SUMMARISING - SQL can be used to summarise counts, e.g. how many observations were made at a particular site.
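A minimal sketch of a key-based join and a summary count, using Python's built-in sqlite3 module. The tables (sites, observations), their columns and the sample rows are invented for illustration; a bridging table is not shown.

```python
import sqlite3

# Hypothetical tables: one site can have many observations (1 to many).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sites (
        site_id INTEGER PRIMARY KEY,                -- primary key
        name    TEXT
    );
    CREATE TABLE observations (
        obs_id  INTEGER PRIMARY KEY,
        site_id INTEGER REFERENCES sites(site_id),  -- foreign key
        species TEXT
    );
    INSERT INTO sites VALUES (1, 'Riverside'), (2, 'Hilltop');
    INSERT INTO observations VALUES
        (1, 1, 'heron'), (2, 1, 'kingfisher'), (3, 2, 'kestrel');
""")

# Join on the shared key, then summarise: how many observations per site?
counts = con.execute("""
    SELECT s.name, COUNT(o.obs_id)
    FROM sites AS s
    JOIN observations AS o ON o.site_id = s.site_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(counts)
```

Here Riverside matches two observation rows and Hilltop one, which is the 1 to many case.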

What are the issues with unstructured data?

1. RELIANCE ON TRUST - Goodchild (2013) proposed that it has no provenance (i.e. who made it), e.g. for a spot height on a USGS topographic map: how did they measure it? Did they adjust it in any way? Can we have confidence in generalisations drawn from data we may not be able to trust?
2. INEFFICIENT - a lot of it comes from qualitative data that has to be converted into quantitative data, e.g. satellite photos classified into land use types. Given the large velocity and volume of data, is this efficient? With advancements in machine learning, though, this process can become more automated, making it more efficient.
3. MANAGEMENT ISSUES - lack of organisation can cause management problems if the data is not easy to retrieve, e.g. in Ghana a lack of structured data meant that unauthorised building on public land occurred. Structured databases of admin boundaries and land use might have prevented this (Heywood et al., 2011).
4. BIAS IN COLLECTION - users who are more capable of volunteering are often highly educated with more disposable income, which means poorer areas may be less surveyed (Haklay, 2010).
5. INCONSISTENCY OF TAGS - e.g. a shop might be classified as "grocerystore" or "shop", making the data hard to synthesise and compare.
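One common way to reduce tag inconsistency is to map free-form values onto an agreed category before comparing them. The sketch below assumes a hand-written lookup table (SYNONYMS) and sample tags; both are hypothetical.

```python
# Hypothetical mapping from free-form tag values to one agreed category.
SYNONYMS = {"grocerystore": "shop", "grocery_store": "shop", "shop": "shop"}

def harmonise(value: str) -> str:
    """Return the agreed category, or the cleaned value if it is unknown."""
    key = value.strip().lower()
    return SYNONYMS.get(key, key)

raw_tags = ["grocerystore", "Shop", " shop ", "bakery"]
harmonised = [harmonise(t) for t in raw_tags]
print(harmonised)
```

Unknown values such as "bakery" pass through unchanged, so nothing is silently discarded.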

Describe how relational databases become normalised.

1. SINGULARITY - one field cannot contain more than one value.
2. PRIMARY KEY - values in each field must relate to the primary key, a unique ID number. If there is more than one observation for each record, the key has to combine two or more fields; this is called a composite key.
3. DATA REDUNDANCY - avoid storing the same data in more than one place.
4. SQL - Structured Query Language can be used to retrieve data (e.g. area >= 2.5).
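The rules above can be illustrated in plain Python, without a database. The records, field names and values below are invented; the point is only to show a multi-valued field being split and a repeated value being stored once.

```python
# Hypothetical unnormalised records: 'land_use' holds two values in one field.
raw = [
    {"parcel_id": 1, "land_use": "residential; retail", "area": 2.5},
    {"parcel_id": 2, "land_use": "forest", "area": 4.0},
]

# One value per field: split multi-valued entries into separate rows.
# (parcel_id, land_use) together act as a composite key.
parcel_land_use = [
    (row["parcel_id"], use.strip())
    for row in raw
    for use in row["land_use"].split(";")
]

# Store each parcel's area once, keyed by parcel_id, to avoid redundancy.
parcels = {row["parcel_id"]: row["area"] for row in raw}

# Equivalent of the SQL condition area >= 2.5.
large = [pid for pid, area in parcels.items() if area >= 2.5]
print(parcel_land_use)
print(large)
```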

What are the 4 types of data? Define each and provide an example.

1. STRUCTURED - data organised by a set of rules as a table, where each variable/characteristic has its own field. Examples include census data and meteorological data.
2. UNSTRUCTURED - often generated by members of the public or web-enabled devices. It does not follow the strict rules of structured data or undergo rigorous quality control, e.g. Twitter data.
3. SEMI-STRUCTURED - data that is not as organised as structured data but does follow some rules, e.g. OpenStreetMap. It can have more than one value in each field.
4. BIG DATA - the 3 Vs: Volume (10s of terabytes), Velocity (new data coming in constantly, e.g. people tweeting continuously) and Variety (various formats, e.g. text, audio and video).

Structured data can be queried with SQL - this is not possible in unstructured datasets. Explain how else information can be retrieved.

1. TEXT MINING - looking for key terms within the data, e.g. "amenity".
2. DIGITAL IMAGE PROCESSING - e.g. determining land use types from satellite imagery. This is becoming increasingly possible with machine learning (training a computer to recognise and classify particular features).
3. NoSQL DATABASES - can be developed to cope with formats such as JSON and XML.
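A very simple form of text mining is a keyword search over free text. The sketch below uses Python's standard json and re modules; the records are made up for illustration, and real text mining would usually go further (stemming, synonyms, classification).

```python
import json
import re

# Hypothetical records, as they might arrive in a JSON feed.
records = json.loads("""
[
  {"id": 1, "text": "New amenity: cafe opened near the station"},
  {"id": 2, "text": "Road closed after flooding"},
  {"id": 3, "text": "Amenity=pharmacy added on the high street"}
]
""")

# Case-insensitive search for the key term as a whole word.
pattern = re.compile(r"\bamenity\b", re.IGNORECASE)
matches = [r["id"] for r in records if pattern.search(r["text"])]
print(matches)
```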

What are the benefits of unstructured data?

1. TIME ACCURACY - as more people can edit it, it is updated more frequently, so the temporal resolution is greater. Structured data has a limited budget for collection, so data sets are only released periodically (e.g. a census every 10 years - meaning it is currently 8 years out of date).
2. EDITABILITY - if a user finds a problem they can fix it in a day, e.g. using things like "OpenStreetBugs", whereas with structured data they would have to go through the lengthy process of reporting errors and waiting for them to be updated and released. Linus's Law applies here: the larger the review group, the faster and more efficient the convergence (Raymond, 1999). However, this puts the responsibility on the user, who may lack the technical knowledge to refine data sets.
3. FLEXIBLE TAGGING - users make their own tags, which can be flexible to their individual needs and demands for research, e.g. Groundwater:monitoring.

What is a relational database? Give examples of formats and explain how they are collected.

A set of tables with formatted rules from which data can be retrieved without reorganising the data. Examples include Microsoft Access (.mdb), dBase (.dbf - used for shapefiles) and SQLite (used in QGIS). They are collected using a predefined schema (e.g. a questionnaire) by scientists/professionals in either the public or private sector. They have been used since the 1980s.

Explain how OSM is open source and how people edit it.

It is different to national mapping, which only professionals can add to: anyone can add to OSM using the Java OSM editor (JOSM). This means it lacks the rigorous quality control of structured data, although it does have moderators who filter it.

What are the 6 factors to consider when comparing data sets? Who proposed this framework? TLCLPAU

Kresse and Fadaie (2010)
1. TEMPORAL ACCURACY - rate of update, e.g. census = 10 years, whereas OSM is rapid.
2. LOGICAL CONSISTENCY - does this fit the geographic context? Can be evaluated using Tobler's (1970) law of geography and Thorne's (1997) law of hydrology.
3. COMPLETENESS - how many values are missing? e.g. Haklay (2010) found OSM roads had 69% completeness compared to Ordnance Survey (OS).
4. LINEAGE - how was it produced? How has it evolved?
5. POSITIONAL ACCURACY - how correct are the values? Does the tag represent what it really is? See Goodchild's (2013) idea of provenance and trust.
6. USAGE - how will the data be used? Is uncertainty indicated?

What is OpenStreetMap? Describe how it is semi-structured and how compatible it is with traditional GIS software, e.g. ArcMap.

OSM is the world's largest geospatial database. It is open source, meaning that anyone can edit it. It is structured in 2 ways:
- ATTRIBUTE DATA - tags are added to data. You can have multiple tags per field (which breaks relational database rules) and people can make their own tags (e.g. groundwater:monitoring).
- SPATIAL DATA - stored as nodes (points), ways (lines), closed ways (polygons) and relations.
As GIS developed based on relational databases, it copes well with the spatial data, as this is similar to traditional shapefiles, but the attribute data is less compatible.
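The split between spatial elements and free-form tags can be sketched with Python dataclasses. This is an illustrative model only, not OSM's actual storage format; the class names, ids, coordinates and tag values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A point with free-form key/value tags."""
    node_id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)

@dataclass
class Way:
    """An ordered list of node ids; closed if it ends where it starts."""
    way_id: int
    node_ids: list[int]
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]

# Hypothetical elements: any number of tags can hang off one element.
well = Node(1, 51.5, -0.1, {"groundwater:monitoring": "yes", "operator": "example"})
road = Way(10, [1, 2, 3], {"highway": "residential"})
pond = Way(11, [4, 5, 6, 4], {"natural": "water"})
print(road.is_closed(), pond.is_closed())
```

The geometry maps naturally onto points, lines and polygons, while the open-ended tags dictionary is the part that does not fit a fixed relational schema.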

