GIS Final
Root mean square error
Estimates the difference between the measured points and the transformed points for both x and y coordinates - try to minimize this - more complex transformations not always best fit - goal is to fit transformation with a RMSE of less than 5 meters
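A minimal sketch of the RMSE calculation, assuming the common form that takes the square root of the mean squared x and y residuals across all control points. The function name `rmse` and the control point coordinates are made up for illustration.

```python
import math

def rmse(measured, transformed):
    """Root mean square error between known and transformed control points."""
    squared = [
        (mx - tx) ** 2 + (my - ty) ** 2  # squared x residual + squared y residual
        for (mx, my), (tx, ty) in zip(measured, transformed)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical control points in meters (not from the course material).
measured = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
transformed = [(3.0, 4.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]

error = rmse(measured, transformed)
print(error)  # compare against the 5 meter target
```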
Vector data sources
Geocommunity, TIGER, Sensors, social media, GIS sites
Unordered attribute table
Records appear in the order they were entered - very inefficient to search
Visual query
Select features by pointing to a feature via the record in a table or a symbol in a map
Ordered attribute table
Table ordered by Last Name field - works great for searching by last name, however if I search by anything else it's still inefficient
An index can be created for any field and data types. a) True b) False
a
ObjectID in ArcGIS is a good choice for a primary key. a) True b) False
b
Line smoothing and thinning
uses a set of polynomial functions to fit a smoothed line to the points - uses sets of points to delete (thin) redundant points
Types of shapefiles
- .shp = main file, variable-record-length file: each record describes a shape with a list of its vertices. STORES GEOMETRIES - .dbf = dBASE table contains FEATURE ATTRIBUTES with one record per feature - .shx = index file CONNECTS GEOMETRIES W/ ATTRIBUTES, points to segments that have relationships to others. These three files (.shp, .dbf, .shx) are required; without them the shapefile is corrupted - .prj = projection/coordinate system info - .sbn and .sbx = spatial index - .xml = metadata
TIGER data enforcement
- 0-cells are either isolated points or adjacent to one or more 1-cells - all 1-cells end with exactly two 0-cells - each line segment b/n adjacent 0-cells is assigned to exactly one 1-cell - every place on the map that is b/n noodles is assigned to a single 2-cell
Surfaces
- 3D - length, width, height - adds volume to discrete representations
File formats
- DLG (Digital Line Graph) and TIGER have topology - Open geospatial data - ESRI Geodatabase, shapefiles, coverages
Mixed pixel problem
- Most important - define what is most important in your study - Winner takes all - if one cell has 49% water and 51% grass, then that cell becomes grass - Edges separate - edges become a third category
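A small sketch of two of these rules, assuming each cell is described by a dictionary of cover fractions. The function names and the priority list are hypothetical, chosen only to illustrate the idea.

```python
def winner_takes_all(fractions):
    """Assign the cell to the cover type with the largest share."""
    return max(fractions, key=fractions.get)

def most_important(fractions, priority):
    """Assign the cell to the first priority cover type present in it.

    Falls back to winner-takes-all when no priority type occurs.
    """
    for cover in priority:
        if fractions.get(cover, 0) > 0:
            return cover
    return winner_takes_all(fractions)

# The cell from the card: 49% water, 51% grass.
cell = {"water": 0.49, "grass": 0.51}

by_majority = winner_takes_all(cell)
by_priority = most_important(cell, priority=["water"])  # study cares about water
print(by_majority, by_priority)
```

The same cell gets a different value depending on the rule, which is why the rule should be chosen with the study's goal in mind.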
Primary key
- a column or combo of columns that has a unique value for each and every record in the table - a key is needed to join two tables together, something that both tables have in common - attributes that make acceptable keys have non-repeating values - ObjectID is NEVER a good primary key because it is assigned arbitrarily
Primary data
- collected or developed by the intended user - advantages: higher quality, more specific, access - disadvantages: more expensive, requires more time, money, and employees -ex: remote sensing, GPS, traditional ground surveys
Scanning Hardcopy Maps
- converts hardcopy analog media into digital images - places map on glass plate and passes light beam over it - measures reflected light intensity - features can drop out
Secondary data
- data collected for other purposes that can be converted for use in GIS - advantages: can be easier and/or cheaper to acquire - disadvantages: may not be correct resolution and/or format, metadata may not exist, data you want may not be available - ex: hardcopy aerial photos, USGS topographic maps, feature names from atlases, social media
Indexed attribute table
- index files can be created which order specific fields and streamline the search process - search will use appropriate index file to locate record
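A rough illustration of why an index helps, assuming a Python dictionary as a stand-in for the index file (real database indexes are usually tree structures stored on disk). The sample records and function names are invented for the example.

```python
# Hypothetical attribute table; records are in the order they were entered.
records = [
    {"object_id": 1, "last_name": "Nguyen", "city": "Ames"},
    {"object_id": 2, "last_name": "Alvarez", "city": "Iowa City"},
    {"object_id": 3, "last_name": "Smith", "city": "Ames"},
]

def linear_search(records, field, value):
    """Unindexed search: every record must be inspected."""
    return [r for r in records if r[field] == value]

def build_index(records, field):
    """Map each field value to the positions of the records that hold it."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[field], []).append(position)
    return index

def indexed_search(records, index, value):
    """Indexed search: jump straight to the matching positions."""
    return [records[p] for p in index.get(value, [])]

city_index = build_index(records, "city")
ames = indexed_search(records, city_index, "Ames")
print([r["object_id"] for r in ames])
```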
Common data types
- numeric: short integer, long integer, float, double - text - date - BLOB - binary large object, images and multimedia
Table relationships
- one to one relationship - one to many - many to one - many to many
Modes of digitizing
- point - stream - distance
Attribute table structure
- records/objects/features in rows - fields/attributes in columns
Raster data
- supports gridded data - fundamental unit is a cell (pixel) - all cells in a raster dataset will almost always have the same resolution - conceptually simple and computationally fast - poor at representing points, lines, and areas, but good for surfaces - suffers from mixed pixel problem - often include redundant or missing data
Errors in digitizing
- undershoots and overshoots - invalid polygons - sliver polygons
Rubber sheeting
-create control points that are identifiable in the data to be digitized AND geographic or projected coordinates can be located for those points - match created using a mathematical relationship
Raster data formats
.TIF, .JPG/JPG2000, .sid, .ECW, ESRI GRID, DEM, .BIL
Raster Data Compression
1. Full raster encoding 2. Run-length encoding 3. Value point encoding - matrix form is inefficient because of redundancy
What does a GIS dataset consist of?
1. spatial data - geometrical data capturing location and form of a geographical feature 2. attribute data - textual info describing key characteristics of associated geographical feature
Quadtrees
2D version of run length encoding - lossless compression - entire array defined, then recursively sub-divided into quadrants until cells have same values - root represents the entire raster - can describe a cell by its position in quadtree - read from top left, top right, bottom left, bottom right
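The recursive subdivision can be sketched as follows, assuming a square grid whose side is a power of two. The nested-list representation and the name `build_quadtree` are illustrative choices, not a standard format.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a single value for a uniform block, else a list of four subtrees.

    Subtrees are ordered top left, top right, bottom left, bottom right.
    Assumes a square grid whose side length is a power of two.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first  # leaf: every cell in this quadrant has the same value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # top left
        build_quadtree(grid, row, col + half, half),         # top right
        build_quadtree(grid, row + half, col, half),         # bottom left
        build_quadtree(grid, row + half, col + half, half),  # bottom right
    ]

# Hypothetical 4 x 4 raster.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]

tree = build_quadtree(grid)
print(tree)
```

Only the bottom left quadrant is mixed, so it is the only one that gets subdivided down to individual cells.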
Points
Discrete, zero dimensional, occupies no space (no width or length), focus is on location, density, distribution
Rasterization
How to assign values to cells - presence/absence: good for points and lines - cell centroid: good for polygons - dominant type: good for polygons - percent occurrence: each cell's layer coded separately
Join vs Relate
Join: appends fields from second table with data for each record where a key field match is found. For 1:1 or M:1 only. Relate: allows automatic access to a related table's records; keeps tables physically separate. For 1:M or M:M
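A sketch of a many-to-one join in plain Python, assuming the second table is keyed by its primary key. The parcel and zone tables, field names, and the `join` helper are all hypothetical.

```python
# Many parcels share one zone record (M:1), so a join is appropriate.
parcels = [
    {"parcel_id": 1, "zone_code": "R1"},
    {"parcel_id": 2, "zone_code": "C2"},
    {"parcel_id": 3, "zone_code": "R1"},
]

# Second table, keyed by its primary key (zone_code).
zones = {
    "R1": {"zone_name": "Residential"},
    "C2": {"zone_name": "Commercial"},
}

def join(left_rows, right_by_key, key_field):
    """Append the right table's fields to each left row with a matching key."""
    return [
        {**row, **right_by_key.get(row[key_field], {})}
        for row in left_rows
    ]

joined = join(parcels, zones, "zone_code")
for row in joined:
    print(row)
```

With a 1:M relationship the appended fields would have several candidate values per row, which is why a relate keeps the tables separate instead.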
Run length encoding
Store each run length (start at top, go from left to right) - individual numbers are listed as pairs ex: row 1 - 99666667 means (2,9) (5,6) (1,7) - lossless compression
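The row from the example can be encoded and decoded with a short sketch like this one, using (run length, value) pairs as on the card. The function names are illustrative.

```python
from itertools import groupby

def run_length_encode(row):
    """Collapse consecutive equal values into (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]

def run_length_decode(pairs):
    """Rebuild the original row from (run length, value) pairs."""
    return [value for count, value in pairs for _ in range(count)]

row = [9, 9, 6, 6, 6, 6, 6, 7]  # row 1 from the card: 99666667
pairs = run_length_encode(row)
restored = run_length_decode(pairs)
print(pairs)
```

Decoding returns the original row exactly, which is what makes the compression lossless.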
TIGER
Topologically Integrated Geographic Encoding and Referencing System - noodles = lines - cells -- 0-cell: a) wherever 2 noodles cross or b) noodle terminates (node) -- 1-cell: each length of noodle b/n 2 consecutive 0-cells (arc) -- 2-cell: each group of consecutive 1-cells forming an enclosed area that does not contain any 1-cells that are not part of the boundary (polygon)
TIN
Triangular Irregular Network - Raster DEM (digital elevation model) may not be best to represent surface - referred to as DEMs or DTMs (digital terrain models) - preserve topology
Attribute query
Use the aspatial characteristics of a feature as a criterion
Spatial query
Use the spatial relationship between features as a criterion
Both vector and raster data are eventually stored in machine readable format that consists of binary digits. a) True b) False
a
Changing vector to raster is straightforward whereas raster to vector conversion can introduce odd errors such as false or lost connections among spatial features. a) True b) False
a
In relational databases, normalization is needed to reduce the redundancies by splitting the relations into many tables. a) True b) False
a
Which of the following raster encoding produces a large file size? a) Full raster encoding b) Run-length encoding c) Quadtree d) Value point encoding
a
_______________ affects results when point-based measures of spatial phenomena (e.g., population density) are aggregated into districts. The resulting summary values (e.g., totals, rates, proportions) are influenced by the choice of district boundaries. a) The modifiable areal unit problem (MAUP) b) Reclassification c) Buffering d) Dissolve operation
a
_______________ reduces over and undershoots within a specified threshold/tolerance. a) Snapping b) Root Mean Square Error c) Stream mode digitization d) Line smoothing
a
Bit
a binary digit that can have two values: on (1) or off (0) - bits come in sets of eight: 8 bits = 1 byte
Foreign keys
a field in a table that has exactly the same value as the primary key column of a row in another table - a primary key-foreign key pair is needed to join two tables
Candidate key
a minimal super key: a subset of a super key's attributes that is still a super key, and from which no attribute can be removed without losing uniqueness
Which of the following are TRUE for georeferencing? a) Georeferencing involves capturing the map, and sometimes the attributes b) Georeferencing is the conversion of spatial information into digital form c) Georeferencing uses developable surfaces for map projections. d) Georeferencing leaves a "stamp" on the data. The method of geocoding can influence the structure and error associated with the spatial information that results e) Can involve address matching (=geocoding)
a, b, d, e
Select the conditions necessary for dissolving polygons. a) Polygons need to be adjacent. b) Polygons need to have a similar size. c) Polygons need to have the same value of an attribute. d) Polygons need to intersect with each other.
a, c
Select the true statements about SQL. a) A powerful language which can be used to define one or more criteria that can consist of attributes, operators, and calculations b) SQL refers to Spatial Query Language c) ArcGIS supports a subset of function of the standard SQL, it also supports GIS queries that are not covered by a standard SQL d) In ArcGIS, SQL is used to select features with the Select by Attributes dialog box
a, c, d
Which of the following are true for Root Mean Square Error (RMSE)? a) Affected by transformation errors of rotation, translation and scale change. b) The objective is to try to maximize RMSE c) More complex transformations do not always provide the best fit, even if they produce lower RMSE d) Estimates the difference (error) between the measured (known) points and the transformed (fit) points for both the x and y coordinates
a, c, d
Select the true statements. a) Raster data structure is simpler than vector. b) Analysis of continuous data is simpler with vector data model. c) Vector data is often easy to modify due to simple data structure. d) Raster data model is good for representing images and surfaces, but discrete features may show "stairstep" edges.
a, d
Relationships
associations b/n two or more objects in a geodatabase - can exist between spatial objects (features in feature classes), nonspatial objects (rows in tables), or spatial and nonspatial objects
Functional dependency
attributes are functionally dependent if at a given point in time each value of the dependent attribute is determined by a value of another attribute
Data collected for other purposes, which can be converted for use in a GIS, is an example of primary data collection. a) true b) false
b
If GPS picks satellites close together (rather than far apart) in the sky, range of uncertainty decreases, therefore, the error is minimized. a) True b) False
b
Raster data model is better suited to represent discrete features. a) True b) False
b
The query below will run without errors: SELECT FirstName, LastName FROM Student WHERE Instructor = Garrison a) True b) False
b
To join a table, the two fields (primary and foreign key) can be in different data types. a) True b) False
b
Which of the following sources of error can be removed completely? a) Atmosphere b) Selective availability c) Poor geometry d) Multipath
b
Select the queries that will return results without an error. a) SELECT FirstName, LastName FROM Student WHERE Instructor = Garrison b) SELECT FirstName, LastName FROM Student WHERE Instructor = 'Koylu' c) SELECT * FROM Student d) SELECT FirstName, LastName FROM Student WHERE Grade = 'B'
b, c, d
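The difference between the quoted and unquoted string can be checked with a quick sketch. This assumes SQLite through Python's `sqlite3` module rather than ArcGIS, and the table contents are invented; other database engines report the error with different wording.

```python
import sqlite3

# In-memory database with a hypothetical Student table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (FirstName TEXT, LastName TEXT, Instructor TEXT, Grade TEXT)"
)
conn.execute("INSERT INTO Student VALUES ('Ada', 'Lee', 'Koylu', 'B')")

# Quoted text value: runs and returns the matching row.
quoted = conn.execute(
    "SELECT FirstName, LastName FROM Student WHERE Instructor = 'Koylu'"
).fetchall()

# SELECT * with no WHERE clause: runs and returns every row.
all_rows = conn.execute("SELECT * FROM Student").fetchall()

# Unquoted Garrison is read as a column name, so the query fails.
try:
    conn.execute("SELECT FirstName, LastName FROM Student WHERE Instructor = Garrison")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(quoted, unquoted_error)
conn.close()
```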
Select the strategies that can be used in converting a vector feature set to a raster. a) Quadtree b) Presence/Absence c) Cell Centroid d) Dominant Type e) Percent Occurrence
b, c, d, e
Which of the following are examples of secondary data? a) GPS b) Feature names from atlases c) Remote sensing d) Traditional ground surveys e) Hardcopy aerial photos f) USGS topographic maps
b, e, f
What does the Pseudo Random Code (PRC) do in a GPS receiver? a) Reduce range uncertainty. b) Find the best coverage. c) Measure lag time within the signal and allow communication with a GPS satellite. d) Connect to a new satellite.
c
What is the corresponding decimal (base 10) value of the binary code 00011000 ? a) 11,000 b) 2 c) 24 d) 128
c
Which of the below is the term used to describe the below process? 1. Create control points that are identifiable in the data to be digitized AND geographic or projected coordinates can be located for those points 2. "Match" created using a mathematical relationship a) Scanning b) On-screen digitizing c) Rubber sheeting d) Smoothing
c
Which of the below represent the disadvantages of primary data? a) Metadata may/may not exist b) May not be correct resolution (spatial/temporal) and/or format c) May be more expensive (time, money, employees)
c
__________ is a classification method that divides polygon features into groups so that the total area of the polygons in each group is approximately the same. a) Equal interval b) Quantile c) Equal area d) Natural Break
c
Which of the below is a raster data source? a) TIGER b) Polygon shapefile of the States c) Satellite imagery d) SPOT Data e) Landsat Data
c, d, e
Georeferencing
capturing data from analog maps and text through scanning and/or digitizing - conversion of spatial info into digital form - involves capturing the map, and sometimes the attributes - can involve address matching (geocoding) - leaves a stamp on the data
Conversion b/n raster and vector
changing vector to raster is straightforward; raster to vector can introduce errors
Primary keys
chosen from the set of candidate keys
Select the methods that help reduce the slivers. a) Check your input layers and redefine boundaries with highest coordinate accuracy, replace/fix these before overlay b) Manually identify and remove them c) Use a snapping tolerance distance during overlay d) All of the above
d
Which of the following correctly identifies the Multipath problem? a) Interference of the signals by the atmosphere b) Orbit error c) Errors caused by geometric arrangement of satellites. d) Errors caused by the bounce and reflection over the Earth's surface.
d
Which of the following is NOT a mode of entering coordinate data for points? a) Distance mode b) Stream mode c) Point mode d) Topology mode
d
Which of the following is NOT true for raster data model? a) Conceptually simple and often computationally fast b) Suffer from the mixed pixel problem c) Correspond to natural data model for scanned or remotely sensed data d) Good for representing discrete features, but poor at representing surfaces
d
Areas
discrete, 2D (length and width), focus is on length, orientation, area, shape
Lines
discrete, one dimensional (length, no width), focus is on length and orientation
Select all the correct choices for the statement: SQL can be used to: a) Create - construct a new data table b) Select - query one or more rows from a table / multiple tables c) Insert, Delete, Update - edit the table or the values in the table d) Drop - discard a data table e) All of the above
e
Select the correct statements about run-length encoding a) Stores each run length starting at top and left to right. b) Run-length encoding is not useful for files that do not have many runs (values are often different in adjacent cells). c) Stores pairs instead of individual numbers. d) It is a lossless compression which allows the original data to be perfectly reconstructed from the compressed data. e) all of the above are correct
e
Which of the following are true for Database Management Systems? a) DBMS allows for centralized control and maintenance b) Must support a diverse user community. c) A DBMS requires conceptualization of the data in the form of a model d) DBMS is a software application designed to provide efficient/effective way to store and retrieve data e) all of the above
e
Spatial and Attribute Data
features and attributes are linked by a unique integer identifier: - shapefile: FID column - feature class: OBJECTID column
Data models
fields and objects = conceptualizations, ways in which we think about geographic phenomena - not designed to deal with limitations of computers
On screen digitizing
if data already exist in digital form AND possess spatial info, it is possible to digitize directly
Binary
machine code that represents decimal (normal) numbers
Topology
maintenance of spatial relationships between geographic features - spatial relationships between adjacent or neighboring features - features can share endpoints, boundaries, segments, or vertices, and can overlap
Data structures
methods of representing data model in digital form
Spaghetti Polygon Chains
no topology - features are spatially independent even if they share a border or vertex - dual line encoding: internal border can be stored twice and it is easy for each copy to be different
Topology enforcement
objects used to describe spatial variation must obey simple rules: 2 areas cannot overlap, every place must be w/n exactly one area, or on a boundary - can build objects out of digitized lines - linear features must begin at a node and end at a node (from and to node, respectively) - info about left and right polygon bounding are stored with lines - nodes occur at all intersections - polygons must close - planar enforcement is applied
Super Key
one or more attributes that may be used to uniquely identify every record (row) for a table
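A sketch of how key candidates could be screened against sample rows, assuming a table stored as a list of dictionaries. The rows and function names are hypothetical, and passing the check on a sample only suggests a key; it does not prove the values will stay unique as records are added.

```python
from itertools import combinations

def is_super_key(rows, fields):
    """True if the combination of fields is unique for every row."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, fields):
    """True if fields form a super key and no smaller subset of them does."""
    if not is_super_key(rows, fields):
        return False
    for size in range(1, len(fields)):
        for subset in combinations(fields, size):
            if is_super_key(rows, list(subset)):
                return False
    return True

# Hypothetical student table.
rows = [
    {"student_id": 1, "last_name": "Lee", "major": "GIS"},
    {"student_id": 2, "last_name": "Lee", "major": "Geology"},
    {"student_id": 3, "last_name": "Park", "major": "GIS"},
]

print(is_super_key(rows, ["student_id", "last_name"]))      # unique, but not minimal
print(is_candidate_key(rows, ["student_id", "last_name"]))  # student_id alone suffices
print(is_candidate_key(rows, ["student_id"]))
```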
Raster data perimeter calculation
perimeter = sum of(grid cells/edge * resolution) - perimeter = 8 grid cells/edge * 2km/side * 4 edges = 64 km perimeter
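The same 64 km result can be reproduced by counting exposed cell edges, which also works for regions that are not square. This edge-counting approach and the function name are assumptions made for illustration, not the method given on the card.

```python
def raster_perimeter(grid, resolution):
    """Perimeter of the region of truthy cells: exposed edges times cell size."""
    rows, cols = len(grid), len(grid[0])
    exposed = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            # Check the four neighbors that share an edge with this cell.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not grid[nr][nc]:
                    exposed += 1
    return exposed * resolution

# The card's example: a square region 8 cells on a side, 2 km cells.
square = [[1] * 8 for _ in range(8)]
perimeter_km = raster_perimeter(square, resolution=2)
print(perimeter_km)
```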
Snapping
reduces over and undershoots within a specified threshold/tolerance
Raster Data sources
satellite imagery, landsat data, SPOT data, Earth Explorer, DEMs, LiDAR, sensors
Tables
set of data elements (values) that is organized using a model of vertical columns and horizontal rows - all rows in a table have the same columns (fields) - each column has a data type, like integer, decimal, number
Byte
since 1 byte = 8 bits, has integer values of 0 to 255
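A short sketch of how an 8-bit pattern maps to a decimal value by summing powers of two. The helper name is illustrative; Python's built-in `int(bits, 2)` does the same conversion.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, counting positions from the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

value = binary_to_decimal("00011000")     # 16 + 8
smallest = binary_to_decimal("00000000")  # all bits off
largest = binary_to_decimal("11111111")   # all bits on
print(value, smallest, largest)
```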
Error propagation
small digitizing errors can scale up to large errors in GIS data layer
Database Management System (DBMS)
software package that enables us to organize data and retrieve info when needed - Microsoft Access, ArcGIS, NoSQL, Oracle
Vector representations
spatial objects: point, line, polygon - widely used
Shapefiles
store non-topological geometry and attribute info for the spatial features in a dataset - geometry for a feature is stored as a shape comprising a set of vector coordinates
Joins
tables can be joined or combined together using primary keys
Relational Database Management System (R-DBMS)
tables, databases, and DBMS - data are building blocks - data models are design plans - database is construction phase - advantages: columns, tables, indexes support DBMS; data independence; multiple user views; centralized control and maintenance - disadvantages: may require specialized training to design, use, and maintain; defining relationships can be complex