# «By James R. Keron Graduate Program in Anthropology Submitted in partial fulfillment of the requirements for the degree of Master of Arts Faculty of ...»

2. XYZ Input Files

In order to prepare the data required for the analysis, the chert types of the debitage were determined and a spreadsheet was constructed showing the artefact type, chert type, and location in Cartesian co-ordinates. Normal CSP practice involves recording transit readings (distance and direction from a known point) or compass readings (two directions from each of two known points). These readings must first be converted into Cartesian co-ordinates, since most mapping programs require this format to define position and MFWorks is no exception. The calculation is done simply using spreadsheets (Keron and Prowse 2001) available on the London Chapter, OAS web site. A sample set of spreadsheet data follows:

The data are then imported into the GIS as an "XYZ" file. The X and Y values are the two Cartesian co-ordinates from the above spreadsheet and the Z value is the actual count of artefacts at each spot. In the analysis conducted here, there is only one item in each row of the spreadsheet, so the Z value is set to "1".
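The conversion from transit readings to XYZ rows can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet of Keron and Prowse (2001). It assumes that directions are azimuths in degrees clockwise from grid north and that the known point is the origin. The function names, the sample readings, and the space-separated output layout are hypothetical.

```python
import math

def transit_to_xy(x0, y0, distance_m, azimuth_deg):
    """Convert one transit reading to Cartesian co-ordinates.

    Assumption: azimuth is measured in degrees clockwise from grid north,
    so X grows to the east and Y grows to the north.
    """
    theta = math.radians(azimuth_deg)
    return (x0 + distance_m * math.sin(theta),
            y0 + distance_m * math.cos(theta))

def xyz_rows(readings, x0=0.0, y0=0.0):
    """Build (X, Y, Z) rows; Z is the artefact count, 1 per catalogued item."""
    rows = []
    for distance_m, azimuth_deg in readings:
        x, y = transit_to_xy(x0, y0, distance_m, azimuth_deg)
        rows.append((round(x, 2), round(y, 2), 1))
    return rows

# Hypothetical readings: (distance in metres, azimuth in degrees)
sample = [(10.0, 90.0), (5.0, 0.0)]
rows = xyz_rows(sample)
for row in rows:
    print(*row)
```

Compass readings taken from two known points would need an intersection calculation instead, which this sketch does not cover.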

A second issue arising from CSP field methodology occurs when several items are catalogued at the same pick-up point. While this procedure greatly reduces the time required in the field to collect the original data, it complicates plotting, as several items map to the same physical location. The GIS script is capable of dealing with multiple finds at the same point, since the "Score" function can accept the totals at each find spot.

However, it is desirable to see the actual distributions visually. In order to break up multiple finds at the same location, a short GIS script was run after each XYZ file was imported into the GIS. This script takes the value of a specific point and creates the same number of individual points within a couple of metres of it (the GIS provides the scaling), thus creating a visual representation of the original density of recovered material. The script has one minor drawback: it creates a recurring pattern rather than a random pattern, which would be visually more appealing.

Maplayer1 is the default name of the imported XYZ file. It will contain the number of artefacts recorded at each find spot after the import of the XYZ file. The result of the filter operation will be one dot on the map for each artefact recorded at that spot.

These dots will be within two metres of the point recorded as the find spot. This scattering is accomplished with a set of calculated values in the filter map that lead to a series of numbers in the resulting map, varying above and below the value "1" at predefined points as defined in the filter. The script can accommodate up to twenty items at the same point, and the result will have exactly as many points equal to or greater than "1" as there are items at the particular find point. For example, if the value is "4", there will be four points equal to or greater than "1" and sixteen less than "1". All points less than "1" are dropped and all points greater than or equal to "1" are changed to "1" in the final "Recode" operation. Thus the value of "4" in the example becomes four individual points with the value "1".

The preceding discussion assumes a knowledge of MFWorks in general and of the "Filter" operation in particular. Without that knowledge, the preceding paragraph will not make sense. It has been included here as documentation of how the script works.
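For readers without MFWorks, the effect of the scattering step can be sketched in Python. This is not the "Filter" and "Recode" script itself but a different technique that gives the same kind of result: one point of value 1 per artefact, within two metres of the find spot, in a fixed (non-random) pattern. The spiral layout and the names used are assumptions made for illustration.

```python
import math

MAX_ITEMS = 20      # the text states that up to twenty items can be handled
MAX_RADIUS_M = 2.0  # dots fall within two metres of the find spot

def scatter(x, y, count):
    """Return `count` points of value 1 placed around the find spot (x, y).

    Offsets follow a fixed spiral (an assumed layout), so the same pattern
    recurs at every find spot, as with the original script.
    """
    if not 1 <= count <= MAX_ITEMS:
        raise ValueError("count must be between 1 and 20")
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(count):
        # Radius grows with i but always stays below MAX_RADIUS_M.
        r = MAX_RADIUS_M * math.sqrt((i + 0.5) / MAX_ITEMS)
        a = i * golden_angle
        points.append((x + r * math.cos(a), y + r * math.sin(a), 1))
    return points

# A find spot with the value 4 becomes four individual points of value 1.
dots = scatter(100.0, 200.0, 4)
```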

3. Spatial Analysis

With the internal space of the site assigned and the artefacts under analysis imported, the next step is to examine the artefact distribution over these areas, looking for patterns. This process simply involves counting the number of flakes of each type in each sub-area and then calculating the percentage of each source type by sub-area. As the flakes found within each area can be considered a sample, in the statistical sense, from that area, it is necessary to allow for sampling error to determine whether or not the differences are statistically significant. More complex statistics are not required for these calculations; simple confidence intervals can demonstrate non-random variation.

The use of confidence intervals brings some assumptions about the nature of the data, as the confidence interval is a parametric measure. The primary issue from the statistical perspective is the randomness of the sample. In the case of a CSP, it is reasonable to assume that the sample is representative of the entire site provided two conditions hold. First, the entire site must be clearly covered: no part of the site is inaccessible because of different crop cover or a bush lot, and no mix of methods is used, such as a CSP in a ploughed field combined with test pitting in a bush lot. Second, the entire CSP must have been executed at the same point in time. A CSP that satisfies these conditions should meet the requirements of the confidence interval statistic.

The data from the CSP, as described above, are plotted against the various spatial units defined in the map CulturalAreas by selecting each type and entering it into the GIS as an individual map (i.e. one map for each kind of chert). An analysis can then be run showing the total of each type and its percentage in each zone.
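The per-zone summary described above can be sketched in Python. This is only an illustration of the counting and percentage step, not the MFWorks analysis. The zone names and flake records are hypothetical.

```python
from collections import Counter, defaultdict

def summarise_by_zone(finds):
    """Summarise flakes by zone and chert type.

    finds: iterable of (zone, chert_type) pairs, one pair per flake.
    Returns {zone: {chert_type: (count, percent_of_zone_total)}}.
    """
    counts = defaultdict(Counter)
    for zone, chert in finds:
        counts[zone][chert] += 1
    summary = {}
    for zone, by_type in counts.items():
        total = sum(by_type.values())
        summary[zone] = {c: (n, 100.0 * n / total) for c, n in by_type.items()}
    return summary

# Hypothetical records: three Onondaga and one Kettle Point flake in one
# zone, two Kettle Point flakes in another.
finds = ([("Midden 1", "Onondaga")] * 3
         + [("Midden 1", "Kettle Point")]
         + [("Midden 2", "Kettle Point")] * 2)
summary = summarise_by_zone(finds)
```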

Once the total of each type has been calculated for each spatial unit, a confidence interval for the percentage of each category, such as Kettle Point chert, is calculated using the following formula (Wonnacott and Wonnacott 1990: 5):

The confidence interval means that the real value for the entire population being measured falls within the specified range 95% of the time. The calculated range is similar to the range established for a radio-carbon date, except that radio-carbon dates are expressed as one standard deviation and thus the real date lies within the range only about 68% of the time. The width of the confidence interval shrinks as the sample grows (it varies with the inverse square root of the sample size), so bigger samples result in a narrower range. Each spatial unit thus has a confidence interval assigned, and it is then necessary to compare the intervals against each other. In the simple case, if two middens have confidence intervals that do not overlap, then there is a statistically significant difference in the distributions. For example, if one area of a site has 60 flakes of Onondaga chert out of a total of 396, the confidence interval is 15 +/- 3 %. If another area has 15 flakes out of 617, the confidence interval is 2 +/- 1 %. The two intervals do not overlap and the difference between the middens is statistically significant.
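The worked example can be checked with a short Python sketch. It assumes the standard large-sample 95% interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n), where p is the observed proportion and n is the number of flakes in the area. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal deviate for a 95% confidence level

def proportion_ci(count, total, z=Z_95):
    """Return (p, half_width) for the large-sample interval of a proportion."""
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, half_width

def intervals_overlap(ci_a, ci_b):
    """True if two (p, half_width) intervals share any values."""
    (pa, ha), (pb, hb) = ci_a, ci_b
    return pa - ha <= pb + hb and pb - hb <= pa + ha

area_1 = proportion_ci(60, 396)   # Onondaga flakes in the first area
area_2 = proportion_ci(15, 617)   # Onondaga flakes in the second area
for p, hw in (area_1, area_2):
    print(f"{100 * p:.1f}% +/- {100 * hw:.1f}%")
print("overlap:", intervals_overlap(area_1, area_2))
```

Before rounding, the two areas come out at roughly 15.2 +/- 3.5 % and 2.4 +/- 1.2 %, close to the rounded figures quoted in the text, and the intervals do not overlap.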

This process is hypothesis testing, with H0, the null hypothesis, stating that the percentages in each spatial unit are similar to each other and that any observed differences are simply the result of sampling error. The hypothesis being tested, H1, is that there are significant internal differences in the distribution of material over the site.

Implementing the calculations in MFWorks requires the use of a number of mathematical functions of the GIS. The site data described above are entered into MFWorks using an "XYZ" file that allows a surface scatter to be plotted. These maps must be aligned properly with the CulturalAreas map that allocates the village space. The amount of each chert type is then counted by running a "Score" operation against each area, which totals the number of each type per sub-area. These numbers are then used to calculate the confidence interval values for each sub-area of the village. The final "Combine" is simply used to create a single legend with all of the pertinent data. The script to do these calculations follows:

As noted above, when the confidence intervals do not overlap the determination of statistical significance is easy and can be made directly from a review of the legend.

However, a problem arises when there is partial overlap. In this case, more statistical calculations are necessary to determine whether or not the differences are statistically significant. It was not possible to implement these calculations in the GIS, as they involve comparing each area of the site with all other areas of the site. To calculate whether the differences between areas were statistically significant, the data from the legend produced by the preceding script were entered into an Excel spreadsheet that performed the calculations using the following formula to compare each pair of areas. The formula is taken from Wonnacott and Wonnacott (1990):

Interpreting the results of this calculation is simply a matter of answering the question, "Is zero included within the resulting confidence interval?" If the answer is "yes", the differences are not statistically significant. If the answer is "no", the differences are statistically significant. The results of these calculations on the individual sites are included in Appendix C: Chapter Five Tables, and the differences that are significant are highlighted.
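The pairwise test can be sketched in Python. The sketch assumes the standard large-sample 95% interval for the difference between two proportions, (p1 − p2) ± 1.96·sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2). It is an illustration under that assumption, not a copy of the Excel spreadsheet, and the function names are hypothetical.

```python
import math

def difference_ci(count1, total1, count2, total2, z=1.96):
    """Return (low, high) for the interval of the difference p1 - p2."""
    p1, p2 = count1 / total1, count2 / total2
    se = math.sqrt(p1 * (1 - p1) / total1 + p2 * (1 - p2) / total2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def is_significant(count1, total1, count2, total2, z=1.96):
    """Significant when zero lies outside the interval of the difference."""
    low, high = difference_ci(count1, total1, count2, total2, z)
    return not (low <= 0.0 <= high)

# The two areas from the earlier example: 60 of 396 flakes and 15 of 617.
low, high = difference_ci(60, 396, 15, 617)
print(f"difference: {100 * low:.1f}% to {100 * high:.1f}%")
print("significant:", is_significant(60, 396, 15, 617))
```

To compare every pair of areas, the same function can be applied to each pair in turn, for example by iterating over `itertools.combinations` of the per-area counts.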

The result of this analysis is that the differences in percentage of various site areas can be quickly calculated and compared. Once the initial maps of artefact distributions are prepared it is relatively simple to run a number of iterations on the analysis simply by creating different maps defining the cultural areas.

Appendix C: Chapter 5 Tables

Abbreviations Used in Chapter 5 Tables

Appendix E: Chapter 7 Tables

This table maps the observations from chapters five and six against the original behavioural hypotheses from chapter three. The left-hand column contains the hypotheses, the centre column contains observations that support the hypotheses, and the right-hand column contains observations that invalidate the hypotheses as stated.

The observations are referenced through an abbreviation in the form "n-xxxx-m", where "n" is the chapter (5 or 6), "xxxx" is an abbreviation of the appropriate section, and "m" is the number of the observation within that section.

Section title abbreviations are as follows.

1981 The Archaeologist as Detective: A Lesson in History. Kewa 81-4:1-7.

1981 The Brian Site: A Late Prehistoric Neutral Village Middlesex County, Ontario. Kewa 81-6:2-10.

1983 The Harrietsville Site: The 1981 Excavations. Kewa 83-3.

1983 Archaeological Survey of the Townships of Westminster and North Dorchester: License Number 81-74. Report on file at the Ministry of Culture, Toronto.

1983 An Annotated Bibliography of Kewa Articles. ARCH NOTES 83-5.

1984 Archaeological Survey of the Townships of Westminster and North Dorchester - 1983. License Number 83-42. Report on file at the Ministry of Culture, Toronto.

1986 CSPMAP: A Surface Distribution Plot Program. Kewa 86-1:3-10.

1986 The Embro International Airport Survey: Archaeology in the Classroom. Kewa 86-5:4-20.

1986 The Iroquoian Occupation of Southeast Middlesex County, Ontario. Honours Essay on file at the University of Waterloo, Waterloo, Ontario.

**Conference Participation:**

1996 Summary Panel Discussant, IEEE Second International Workshop on Systems Management, Toronto, Ont.

1998 Moderator, User Experience Panel, IEEE Third International Workshop on Systems Management, Newport, Rhode Island.

1998 Summary Panel Discussant, IEEE Third International Workshop on Systems Management, Newport, Rhode Island.

1998 An Architecture for Mobile Computing, CASCON 98, Toronto, Ontario.