GEOG 560 Exam 2
You have a raster land cover map for your favorite city. The high-density urban class is represented by the numeric value 7. Assuming the raster map is named "mylandcovermap", write the equation you'd use in Arc's Raster Calculator to make a map where only the urban class shows up as the value 1, and everything else is 0.
"mylandcovermap" == 7 Double equal sign means to select where it is this value only, while single = means "set to this value"
List or sketch the steps that Google Maps likely performs when you type two street addresses and it computes the shortest distance route.
For each address: parse the address, find it in the geocoding reference database, interpolate along the matched street segment if necessary, and locate its position on the network database. Then use an algorithm like Dijkstra's to evaluate possible routes between the two locations, likely using real-time network impedance as the travel cost. Finally, identify the least-cost path between the two points.
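As a minimal, hypothetical sketch of the routing step only (the node names and weights below are invented; real routing engines work on full street networks with live impedance values), Dijkstra's algorithm can be run with a library such as networkx:

import networkx as nx

# Toy street network: nodes are intersections, edge weights are travel impedances
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 4), ("B", "C", 3), ("A", "D", 2),
    ("D", "C", 6), ("C", "E", 1),
])

# Dijkstra's algorithm returns the least-cost path between the two geocoded locations
route = nx.dijkstra_path(G, "A", "E", weight="weight")
cost = nx.dijkstra_path_length(G, "A", "E", weight="weight")
print(route, cost)   # ['A', 'B', 'C', 'E'] 8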
You've got a raster layer depicting ocean temperature in degrees C, called "oceantemp". You also have a raster layer depicting chlorophyll concentration (mg/m^3), called "oceanconcn". Both have raster cells that are the same size, and aligned with each other well. Show the raster algebra statements you could use to 1) create a raster mask of ocean temperature less than 16 and 2) a map of chlorophyll concentration within areas of ocean temperature less than 16.
"oceantemp" < 16 "oceanconcn" * ("oceantemp" < 16) Note that you can use parentheses to enclose an operation to do first. You could extend this to the example I showed of calculation illumination, where multiple layers are combined to create a single new layer.
You're checking your digital elevation model by visiting sites in the field with your GPS. You use the X, Y coordinates to locate which pixel of the DEM you're in, and then check the GPS Z-values (elevation) against the DEM elevation. What are at least two sources of error that might cause these numbers to rarely agree exactly?
- Error in the original DEM
- The size of the raster cells: the DEM reports a single elevation for the whole cell, while the GPS measures a point somewhere within it
- Error in the GPS X,Y position
- Error in the GPS Z-value (elevation)
Your data range from 550 to 1750 units. Assuming you use a linear display function to display the full data range on a standard greyscale monitor, what display value would be displayed for the data value of 1250?
- Use y = mx + b
- The data range supplies the X values (550 to 1750); the display range supplies the Y values (0 to 255)
- Slope (m) is rise over run: (255 - 0) / (1750 - 550) = 255 / 1200 = 0.2125
- So Y = 0.2125 * X + b. We know that Y is 255 when X is 1750: 255 = (0.2125 * 1750) + b
- Subtract (0.2125 * 1750) from both sides to get b alone: b = -116.875
- Plug 1250 in for X: Y = (0.2125 * 1250) - 116.875 = 148.75. Most computers will round up to 149.
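A small Python sketch of the same arithmetic (the function name is just for illustration):

def linear_stretch(x, data_min=550, data_max=1750, display_min=0, display_max=255):
    # Map a data value linearly onto the display range using y = mx + b
    m = (display_max - display_min) / (data_max - data_min)   # 255 / 1200 = 0.2125
    b = display_max - m * data_max                            # -116.875
    return m * x + b

print(linear_stretch(1250))   # 148.75, which displays as 149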
You are in charge of managing raster data for your group, including collections of many model output runs. A single model output run is 1000 by 1000 cells. Valid data values for each cell range from 0.0 to 9,499.9. Assuming the decimal portion of the number is important, how many bytes would a single output layer contain? If the decimals could be dropped, how much space would you save?
- With decimals, a float data type is required, which is 4 bytes per cell: (1,000 * 1,000) * 4 = 4 million bytes
- Without decimals, the values (0 to 9,499) fit in a 2-byte (16-bit) integer data type: (1,000 * 1,000) * 2 = 2 million bytes
- Dropping the decimals therefore saves 2 million bytes per layer, i.e. half the space
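The same arithmetic with numpy data types, as a quick check:

import numpy as np

cells = 1000 * 1000
float_bytes = cells * np.dtype(np.float32).itemsize    # 4,000,000 bytes
int_bytes = cells * np.dtype(np.uint16).itemsize       # 2,000,000 bytes
print(float_bytes, int_bytes, float_bytes - int_bytes)  # savings of 2,000,000 bytes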
What is the range of unique grayscale values that a computer monitor element can display?
256 unique values -- 0 to 255.
Your friend wants to locate a cafe. He has identified the addresses of three candidate properties, and he wants to find the one of those three that is furthest from the nearest competitor cafes. Assuming you can get a network map of streets and a database of existing cafes, which type of network analysis might you attempt to help him out? How?
A location-allocation approach would be helpful here. I would approach this by separately running a location-allocation analysis focused on market share for each of the three candidate locations, taking into account the competitors' locations. The candidate with the greatest market share would be the preferred choice.
Your friend has scanned an old air photo, and you are trying to find railroad lines in it. What kind of raster filter might you try to accentuate the edges of the railroad lines?
Assuming the railroad lines are linear features with some degree of contrast between them and the surrounding areas, any type of edge detection kernel filter would help. You would run that filter and examine the resultant output image -- edge pixels of all sorts would be accentuated. If the railroad lines appear to follow one of the cardinal directions, a directional edge filter might work even better, but it wouldn't find railroad lines that run perpendicular to that.
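One common edge-detection kernel is the Sobel filter; a minimal sketch with scipy (the random array is just a stand-in for the scanned photo):

import numpy as np
from scipy import ndimage

photo = np.random.rand(200, 200)   # stand-in for the scanned air photo

# Sobel kernels respond to brightness gradients in each direction
edges_x = ndimage.sobel(photo, axis=1)        # highlights vertical edges
edges_y = ndimage.sobel(photo, axis=0)        # highlights horizontal edges
edge_strength = np.hypot(edges_x, edges_y)    # combined edge magnitude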
A friend wants to use a 30m DEM to understand species occurrence on north slopes, but wants to relate aspect to climate data with pixels of 1km grain size. What two sequential operations should they conduct to convert the 30-m DEM to a 1km aspect raster?
First, use a raster aggregation to aggregate the 30m elevation values to 1km equivalents, and then calculate the aspect. Optionally, if they are truly only interested in "northness", they could take the resultant aspect value in degrees and convert to northness by taking the cosine of the aspect. The cosine of 0 degrees or 360 degrees is 1, while the cosine of 180 is -1. Thus, all aspects that point north would have values near 1. East and west aspects would both get the value of 0 (do the math to convince yourself!).
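A rough sketch of the two pieces in Python (the block-mean aggregation assumes the array dimensions divide evenly by the factor, and note that 1 km is roughly 33 cells of a 30-m DEM; the aspect itself would normally come from the GIS's aspect tool):

import numpy as np

def aggregate(dem, factor):
    # Block-average a fine-resolution DEM to a coarser cell size
    r, c = dem.shape
    dem = dem[:r - r % factor, :c - c % factor]
    return dem.reshape(dem.shape[0] // factor, factor,
                       dem.shape[1] // factor, factor).mean(axis=(1, 3))

def northness(aspect_deg):
    # cos(aspect): 1 for north-facing, -1 for south-facing, ~0 for east/west
    return np.cos(np.radians(aspect_deg))

print(northness(np.array([0, 90, 180, 270, 360])))   # approx [1, 0, -1, 0, 1]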
You use Yelp on your GPS-enabled phone to find restaurants nearest to you. Describe the steps and the types of dataset that are being used behind the scenes to perform this task.
First, your GPS finds your location and sends it to Yelp's geocoding reference database to place you on a network. Then the service searches the attributes of its network database for restaurants, using something like the "selection" queries we've done in class. Finally, it calculates the cost-distance between your location and each of the restaurants and identifies those with the shortest distance, using the routing method described in the prior question.
One type of kernel operation is called the "majority" filter. It identifies the single most common value in a 3 by 3 or 5 by 5 kernel. Why would this filter be more appropriate for filtering a land cover map than, for example, a low-pass filter?
For land cover maps, each cell has a discrete value code whose numeric value does not imply any quantity. It is simply a code, and it is not appropriate to apply mathematical operations that assume the relative values of the numbers have meaning relative to each other. Low-pass filters treat each cell as a value whose numeric value is meaningful, and thus a low-pass filter would essentially average the codes within the kernel window, which is not a meaningful operation. Majority filters, on the other hand, simply identify the most frequently occurring value -- here, a code number -- in the kernel, so they do not apply any mathematical operation on the numbers themselves.
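A minimal sketch of a majority filter using scipy's generic_filter (the tiny land cover array is invented for illustration):

import numpy as np
from scipy import ndimage

def majority(window):
    # Return the most frequently occurring code in the kernel window
    codes, counts = np.unique(window, return_counts=True)
    return codes[np.argmax(counts)]

landcover = np.array([[7, 7, 7, 4],
                      [7, 4, 7, 4],
                      [7, 7, 7, 4],
                      [4, 4, 4, 4]])

filtered = ndimage.generic_filter(landcover, majority, size=3)
print(filtered)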
Sally tracks an animal that makes burrows on hills but predates on fish in the stream. She has GPS'd the burrows and a vector layer of streams. How could she use a raster operation to help her figure out how far each burrow is from a stream?
From her vector layer, she could produce a distance raster either from the horizontal distance, or she could use the slope to calculate a slope-distance raster, where the distance surface reflects the greater actual distance covered on a steep slope for a given change in horizontal position. She could then extract the distance raster value for each of the points she GPS'd for the burrows.
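A sketch of the horizontal-distance version with scipy (the stream raster, cell size, and burrow location below are invented; in practice the streams vector would be rasterized first):

import numpy as np
from scipy import ndimage

# 1 marks stream cells (rasterized from the stream vector layer), 0 elsewhere
streams = np.zeros((5, 5), dtype=int)
streams[2, :] = 1

cellsize = 30.0   # assumed cell size in meters
# Distance from every cell to the nearest stream cell, converted to meters
dist = ndimage.distance_transform_edt(streams == 0) * cellsize

row, col = 0, 3   # the burrow's cell, found from its GPS X,Y
print(dist[row, col])   # 60.0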
A friend uses a point-based dataset with incomes of individual households in your city to make a raster map of income level using the spline technique. She is excited to identify hotspots of very high income that nobody knew existed before. Do you believe her discovery? Why or why not?
I do not believe her, at least not without more evidence. The spline technique can lead to over- or under-shoots in data, and thus create high and low values that are entirely an artifact of the method.
Your friend uses IDW to build a surface from a sample of points. He dutifully checks the error of prediction by comparing his predicted surface values to his source points, and is happy when he finds that he has almost no error! He then claims that his whole surface must have almost no error. Why is he wrong?
In most cases, IDW is an exact interpolator, meaning that the values at measured locations will, by definition, have zero error. Even if the implementation is not strictly exact, the estimate right near a point is strongly controlled by that point and will always be close to the actual value, even if the intermediate space between points is quite wrong. Thus, his assessment of error is not relevant to most of the map.
Fill in the empty cells in this raster calculation:
In the review questions, this is a cell-by-cell raster multiplication: the values in Raster 1 are multiplied by the corresponding values in Raster 2, and the resultant values form Raster 3.
Fill in the empty cells of the raster logical operation: == 4
In the review questions, this is a raster logical operation where we are only interested in finding the cells that have a value of 4, so the resultant raster has 1s where Raster 1 had 4s and 0s everywhere else.
If my spectral index is defined: Index = Band 1 / Band 2, fill the "index value matrix." Approximations to first decimal are okay.
In the review questions, this is just dividing the Band 1 values by the Band 2 values, cell by cell; the resultant quotients are the index values.
You open a file that should be a land cover map with interesting colors, but instead it shows up as a black and white image. Moreover, when you inspect the image, the names of the classes do not show up. What might have happened to the file?
It appears that the metadata that holds the landcover lookup names and colors has been lost or corrupted. If you have a version of the landcover whose colors and labels make sense, you can save the symbology information to a stand-alone file. See: https://pro.arcgis.com/en/pro-app/help/data/imagery/symbology-pane.htm
You type an address into a geocoding database and it places you on the correctly named street, but the other side of town. What part of the geocoding process is likely in error?
It's likely that the parsing of the address has somehow found a different descriptor -- say NW instead of SW -- of your street.
You open a raster layer in Arc and it is all black, but your cursor indicates that cells on the screen have actual data values. What might be happening and how might you fix it?
It's likely the stretch is wrong. It could be that there are outlier high values that are extending the upper end of the range, and making all of the good data values display as black. To evaluate, you could look at the histogram of the values in the symbology window, or simply look at the minimum and maximum values in that window. Your best bet would be to change the minimum and the maximum values in the symbology settings. You could do this manually, or you could use one of the stretch options that clips off the extreme values.
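As an illustration of the percent-clip idea outside of Arc, a small numpy sketch (the synthetic data and percentile cutoffs are just for demonstration):

import numpy as np

def percent_clip_stretch(arr, low_pct=2, high_pct=98):
    # Clip extreme values at the given percentiles, then scale linearly to 0-255
    lo, hi = np.percentile(arr, [low_pct, high_pct])
    clipped = np.clip(arr, lo, hi)
    return ((clipped - lo) / (hi - lo) * 255).astype(np.uint8)

# A layer with a few huge outliers that would otherwise push most values toward black
data = np.append(np.random.normal(100, 10, 10000), [1e6, 2e6])
print(percent_clip_stretch(data).min(), percent_clip_stretch(data).max())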
Describe how trend analysis and kriging differ in the assumptions of spatial structure of your data.
Kriging assumes that spatial interactions between data points exist, that these spatial relationships can be modeled with a semi-variogram, and further that the modeled relationships can be applied in the kriging process. A trend analysis makes no explicit claim about spatial relationships between points -- the entire set of points is treated as one entity and fit with a single global trend surface.
Your friend has a study site with coordinates in Oregon state plane coordinate system. He gets an NLCD map to describe the study site, but it's in Albers Conformal Conic. What type of resampling approach (bilinear, nearest neighbor, cubic convolution) do you recommend to maintain the integrity of the NLCD land cover codes? Why?
Nearest neighbor resampling is pretty much the only choice here. It retains the values of the land cover labels, and doesn't attempt any mathematical averaging, which would be inappropriate for class data.
You've interpolated some points and still don't think that the surface fits the points very well. Would changing the cell size of the output grid help? Why or why not?
No. The grid cells are simply a sampling of the underlying interpolation surface; changing the cell size would not change that surface. Note: to make a better surface, we'd need to do better interpolation, either through the use of a different interpolation approach, or by acquiring more points to build the interpolation surface.
If landcovertype needleleafforest is assigned a code value of 10, and landcovertype broadleafforest is assigned a code value of 20, is it right to assume that a landcovertype called "mixed needleleaf/broadleaf" would have a code value of 15? Why or why not?
No. The land cover codes are arbitrary numeric values. They don't correspond to a quantity, and thus averaging or other math on them does not make sense.
You have several field observations that are GPS'd well in your local datum, and you need to know exactly which land cover class they represent in the NLCD map, which is in an entirely different datum and projection. Would you have better results reprojecting the points to the NLCD projection, or the NLCD map to the projection of the points? Why?
Reproject the points into the NLCD projection. Point-data can be reprojected without needing to resample, which is required with raster reprojection. Moreover, because we are resampling a landcover map, we would need to use a nearest neighbor resampling, which can lead to duplicate or missing pixels if the projections are warped enough. Given that there is a datum conversion necessary as well, it's important to minimize any possible sources of error.
Your remote sensing geek friend talks about the "near infrared" band. Translate that for your non-remote-sensing-geek friends.
Satellites take pictures of the earth just like our digital cameras do: there's a lens that focuses light onto grids of photo-sensitive sensor elements. In our cameras, there is typically one grid of sensors to take a picture of red light, one of green light, and one of blue light. These are combined by the camera to reproduce the colors of the rainbow. Red, green and blue light are forms of electromagnetic energy. The sensors that take red pictures only see light with electromagnetic energies in a narrow range of values -- this range of values is named the "band" of energy that the sensor can measure. Satellite sensors can be built to have sensor grids that are sensitive not just to visible light, but also to electromagnetic energies that we can't see with our eyes. The "near infrared" band is a sensor tuned to a band of wavelengths in the near infrared part of the spectrum. The near infrared part of the spectrum has light with wavelengths of roughly 700 to 1000 nanometers.
Fill out the missing value in the raster grid (indicated by the ?). Assume you are in the middle of the raster layer, so edge calculations are not invoked. (Applying a 3 by 3 mean filter)
Take the neighborhood around the cell (treating the unknown cell as the center of the neighborhood) and calculate the mean of those values in the original raster to get the new raster cell value.
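A quick check of the 3 by 3 mean in Python (the toy values are invented; for the exam figure you'd use the nine original values around the ? cell):

import numpy as np
from scipy import ndimage

raster = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])

# 3x3 mean (low-pass) filter: each output cell is the average of its 3x3 neighborhood
smoothed = ndimage.uniform_filter(raster, size=3)
print(smoothed[1, 1])   # 5.0, the mean of all nine values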
You register an old image to your study site in Portland using road intersections as control points. Using only four control points and a quadratic resampling function, you find an excellent RMS error of around a half a pixel. But you're disappointed to see that many roads do not match up. What is happening? How would you fix it?
What's happening: with only four control points, the quadratic (second-order) transformation can bend itself to fit those few points almost exactly, so the RMS error at the control points is deceptively low while the rest of the image is warped. The best fix would be to gather many more GCPs -- four points is too few. If more points cannot be collected, then a first-order polynomial is the next best solution, because it would lead to less distortion.
A colleague has registered an airphoto at their study site using ground control points they measured only at their site. They used a third-order polynomial to reproject the photo, with an RMSE of 3.5 meters. Your study site is in the next valley over and does not overlap with their site at all, but it is also on the same photo. Should you use their reprojected photo for your work, or should you re-project the photo using your own GCPs? Why?
Unfortunately, you should endeavor to collect new GCPs. A third-order polynomial has the mathematical tendency to swing steeply outside the domain of the points constraining it. Since your valley contains no points from your colleague's work, it is likely highly distorted in their reprojection. Best to constrain your valley with points that are geographically within it.
You have field plots scattered randomly around your study area, which has steep slopes. You'd like to know which ones are easier to access from the road, factoring in both the distance from road and the steepness. How would you do this?
The general strategy would be to calculate a distance raster from a roads vector, but add in a cost surface that is related to the slope. That surface could be something as straightforward as calculating the actual distance walked up a hill for a given horizontal distance change, or further penalizing that as the slope got steeper. The actual distance walked is the hypotenuse of a triangle whose base leg is the distance in the horizontal dimension and whose other leg is the vertical distance climbed. The cosine of the slope in degrees is the adjacent over that hypotenuse, so the length calculation in the raster calculator would simply be the cell size (the distance in the horizontal dimension) divided by the cosine of the slope. You may want to adjust that penalty surface if you feel that steeper slopes are even more difficult to cover than the simple hypotenuse would suggest. One approach might be to use the same calculation as before, but square it, so larger slopes' penalties would go up by the square.
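A sketch of the per-cell cost calculation in numpy (cell size and slope values are invented; Arc's cost-distance tools would then accumulate these per-cell costs outward from the roads):

import numpy as np

cellsize = 30.0                                     # assumed horizontal cell size, meters
slope_deg = np.array([[0.0, 10.0], [25.0, 40.0]])   # toy slope raster in degrees

# Surface distance crossing one cell = horizontal distance / cos(slope)
surface_dist = cellsize / np.cos(np.radians(slope_deg))

# Optional heavier penalty for steep ground, e.g. square the per-cell cost
penalty = surface_dist ** 2
print(surface_dist)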
In a semi-variogram, what does the "range" value represent?
The lag distance (separation distance) beyond which paired points are no longer spatially autocorrelated. Note: this is relevant not only in kriging, but can be used to help guide field sampling to avoid spatial autocorrelation. For example, if you believe that an image serves as a good proxy for the variable you want to sample on the ground, you could build a semivariogram of your study area from the image, and calculate the range. If you make sure that none of your sample points are closer to each other than this distance, you have a reasonably defensible means of saying that you're avoiding pseudoreplication in your measurements.
You type a street address into a website that provides coordinates from several different Geocoding sources. Why might they disagree?
The reference databases used by the different geocoding sources may parse the address slightly differently, have different levels of detail in their networks, or have different strategies for interpolating along network segments.
What if the vertical units of your DEM are supposed to represent meters, but Arc assumes they are feet. Would the resultant slope values be steeper or gentler than what they should be? By what factor?
The resultant slope would be gentler than it should be. The factor in percent slope would be approximately 3.28, as this is the ratio between feet and meters. The factor in degrees slope would actually vary, since the degrees calculation goes through the atan function: the factor would be near 3.28 at very shallow slopes, but drops as slope increases. At 45 degrees of true slope, the ratio is approximately 2.65.
Why? Assume the horizontal units are in meters. As a test case, pick two cells that are 100 meters apart horizontally (change in x) and have elevations that differ by 100 m (change in z). The percent slope would simply be 100 * dz/dx, or 100%. The slope in degrees would be atan(1), or 45 degrees. [Note: if you're calculating this in a spreadsheet, trigonometric functions typically work in radians, not degrees, so you may need to convert your answer.]
If the GIS software thought the units of the Z-dimension were actually in feet, it would first convert the change in Z using the ratio 3.28 feet per meter. Thus, the change in z it would use in the slope calculation would be 100 / 3.28, or approximately 30.49 meters. The slope in percent would be 30.49%. The atan of 0.3049 is approximately 16.96 degrees, which differs from the actual 45 degrees by a factor of approximately 2.65.
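The same arithmetic in Python, as a check:

import math

dx = 100.0                 # horizontal separation, meters
dz_true = 100.0            # true elevation change, meters
dz_wrong = 100.0 / 3.28    # what the software uses if it treats the meter values as feet

true_deg = math.degrees(math.atan(dz_true / dx))    # 45.0
wrong_deg = math.degrees(math.atan(dz_wrong / dx))  # ~16.96
print(dz_true / dz_wrong, true_deg / wrong_deg)     # ~3.28 (percent factor), ~2.65 (degrees factor)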
Fill in the "class codes" matrix.
To do this, you use the appropriate band values as X and Y coordinates and assign each cell to the class "blob" that it falls into.
You calculate the slope image from your 10m horizontal-resolution DEM. From your metadata, you know the DEM elevations are in meters. Unfortunately, they forgot to write down whether you had Arc calculate percent slope or degrees slope. To figure this out, you look at your DEM original values, and match those to the slope image. If your DEM and slope images looked like the following, in which units would you say your slope is calculated: degrees or percent? Explain or show your logic.
To me, the simplest approach here is to simply calculate the slope and see which it matches. Recall the Bolstad formula from the powerpoint for slope in degrees: slope = atan( sqrt( (dZ/dx)^2 + (dZ/dy)^2 ) ). In our case dZ/dy is (1005 - 995) / 20 and dZ/dx is (1000 - 990) / 20, with all units in meters. Both are 0.5; squaring both, summing, and then taking the square root yields 0.707, and the atan of that is 0.6155 in radians, or 35.26 in degrees. So, it would appear the units are in degrees!
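The same calculation in Python, using the neighbor values and the 10-m cell size quoted above:

import math

cellsize = 10.0
dzdx = (1000 - 990) / (2 * cellsize)   # 0.5
dzdy = (1005 - 995) / (2 * cellsize)   # 0.5
rise = math.sqrt(dzdx**2 + dzdy**2)    # ~0.707
print(math.degrees(math.atan(rise)))   # ~35.26 -> matches a slope image in degrees
print(rise * 100)                      # ~70.7 would be the value in percent slope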
A colleague looks at a map of % impervious cover of your city, and compares it to a vector layer of the city. The vector layer delineates a big parking lot as a single entity, but % impervious layer shows much more variability within it. Why might that be?
While parking lots have a lot of pavement in them, many have vegetation features -- trees, shrubs, etc. -- scattered throughout them. Presuming the cells of the impervious cover map are smaller than the size of the parking lot, some cells might have some vegetation and thus their areal % impervious cover would be less than 100%. An adjacent cell could contain only impervious cover, resulting in some variability across cells within the parking lot. The vector layer simply defines the bounds of the parking lot as an abstraction: A parking lot is defined as much by its function as it is by the land cover components that make it up. The fact that a given parking lot has vegetation scattered throughout does not mean that it ceases to be a parking lot.
What are the fundamental differences in how you might choose to display a raster dataset with nominal data (e.g. land cover) vs. interval or ratio data (e.g. digital elevation model)
With interval or ratio data, the differences between the numeric values have real meaning -- large numbers indicate a quantity is greater than when numbers are small. For this reason, it is common to display them so the colors also imply some relationship with each other: colors and tones have a gradation that mirrors the change in the numeric value. For example, if using display colors of black and white, low values may be represented as black, with grayscale values increasing as the numeric values increase to some maximum at white. Similarly, colors might be chosen to indicate progression -- heatmaps may move from cool colors (blues and greens) to warm colors (yellow, orange, red).
With nominal data, the relationship between numbers has no meaning. Thus, there is no need to choose colors that follow some gradient. Rather, they can be chosen to be intuitive representations of the classes they represent. Thus, water might be blue, vegetation green, soil brown, etc.
You and your lab-mate are attempting to find transects to survey for your field study. Each transect should be 100m long, and you'll be surveying every 5 meters. You need to know the elevation of your survey points. Your lab-mate suggests grabbing a 30-m resolution digital elevation model (DEM) and overlaying the planned transects on the DEM to extract the elevation directly from the values in the DEM. Why might you caution against this?
Within each 30 by 30m cell of the DEM, the reported elevation is constant. If your points are five meters apart from each other, then many of them will take on the same elevation value because they fall within the same cell. This is likely not true of the actual landscape's elevation, and if you really do need to know differences at the scale of 5-meter separation, a 30-m DEM will not give you enough detail.