e-book Grammatical and Lexical Variance in English

Title, Grammatical and lexical variance in English. Author, Randolph Quirk. Edition, illustrated. Publisher, Longman, Original from, the University of.

The number of variants per alternation ranges from 2 to 7, most with 4 variants; the number of forms per variant ranges from 1 to 12, most with 2 forms. Some of these variants overlap, sharing one or more forms. This situation is problematic and points to a larger issue with polysemy and homophony in our feature set, which we return to later in this paper. Crucially, however, because the proportional use of each variant is calculated relative to the frequency of the other variants of that alternation, the maps for these overlapping variants are distinct.

After selecting these variants, we extracted the regional data for each from the BBC Voices dataset, which provides the percentage of informants in UK postal code areas who supplied each variant. Notably, these two extreme postal code areas have the fewest respondents, leading to generally less reliable measurements for these areas.

Most areas, however, are associated with far more informants and thus exhibit much more variability. There are also a very small number of missing data points in our BBC Voices dataset (48 out of 17,… values), which occur in cases where no informants in that postal code area provided a response to that question.

Because this is a negligible amount of missing data and because it is distributed across many variants, we simply assigned the mean value of that variant across all locations to the locations with missing values. In addition, because the BBC Voices dataset provides percentages calculated based on the complete set of variants, whereas we are looking at only the most common variants, we recalculated the percentage for each variant in each postal code area based only on the variants selected for analysis. For example, in the Birmingham area, the overall percentages for cack-handed … Finally, we mapped each of the variants in this dataset.
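The two preprocessing steps described above (filling missing values with the variant's mean across all areas, then rescaling each area's percentages over only the selected variants) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the nested-dictionary layout, the function names `impute_missing` and `renormalize`, and all numbers in the example are hypothetical.

```python
from statistics import mean


def impute_missing(pct_by_area):
    """Fill missing values with the mean of that variant across all areas.

    pct_by_area maps area -> {variant: percentage or None}; None marks an
    area where no informant answered the question (hypothetical layout).
    Assumes every variant is observed in at least one area.
    """
    variants = {v for row in pct_by_area.values() for v in row}
    means = {
        v: mean(row[v] for row in pct_by_area.values() if row.get(v) is not None)
        for v in variants
    }
    return {
        area: {v: row[v] if row.get(v) is not None else means[v] for v in variants}
        for area, row in pct_by_area.items()
    }


def renormalize(pct_by_area, selected):
    """Rescale the selected variants so their percentages sum to 100 per area."""
    out = {}
    for area, row in pct_by_area.items():
        total = sum(row[v] for v in selected)
        # If none of the selected variants was reported, leave all of them at 0.
        out[area] = {v: (100.0 * row[v] / total if total else 0.0) for v in selected}
    return out


# Hypothetical survey percentages: "other" stands for a rare variant that is
# dropped from the analysis, and None is a missing value.
survey = {
    "A": {"sofa": 30.0, "couch": 10.0, "settee": 40.0, "other": 20.0},
    "B": {"sofa": None, "couch": 20.0, "settee": 20.0, "other": 10.0},
    "C": {"sofa": 50.0, "couch": 30.0, "settee": 10.0, "other": 10.0},
}
filled = impute_missing(survey)
rescaled = renormalize(filled, ["sofa", "couch", "settee"])
print(rescaled["A"])  # {'sofa': 37.5, 'couch': 12.5, 'settee': 50.0}
```

Imputation is done before rescaling here, so an imputed value is rescaled together with the observed ones in its area.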

In this case, a clear regional pattern can be seen within and across variants, with sofa being relatively more common in the South, couch in Scotland, and settee in the Midlands and the North of England. The complete set of maps is presented in the Supplementary Materials. The regional dialect corpus used for this study consists of a large collection of geolocated Twitter data from the UK that we downloaded between … and … using the Twitter API.

This data was collected as part of a larger project that has explored lexical variation on Twitter (see also Huang et al., …). In total, this corpus contains 1.… The median number of Tweets per account is …. The corpus contains data for … days, with data for 5 days missing due to technical issues. To analyse regional variation in the corpus, we formed regional sub-corpora by grouping all individual Tweets by postal code regions based on the provided longitude and latitude.
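The text does not say how each Tweet's coordinates were matched to a postal code region. One common approach, assumed here purely for illustration, is a point-in-polygon test against region boundaries. The sketch below uses the even-odd ray-casting rule and treats longitude and latitude as planar coordinates, which is a reasonable approximation for small regions; the function names and the two square "regions" are hypothetical stand-ins for real postal code area boundaries.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; coordinates are treated as planar.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that cross the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # this crossing lies to the east of the point
                inside = not inside
    return inside


def group_by_region(tweets, regions):
    """Build regional sub-corpora: {region: [tweet text, ...]}.

    tweets: iterable of (lon, lat, text); regions: {name: polygon}.
    Tweets that fall outside every polygon are dropped.
    """
    groups = {name: [] for name in regions}
    for lon, lat, text in tweets:
        for name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                groups[name].append(text)
                break
    return groups


# Two hypothetical square "regions" standing in for postal code areas.
regions = {
    "West": [(-3.0, 50.0), (-2.0, 50.0), (-2.0, 51.0), (-3.0, 51.0)],
    "East": [(-2.0, 50.0), (-1.0, 50.0), (-1.0, 51.0), (-2.0, 51.0)],
}
tweets = [(-2.5, 50.5, "a"), (-1.5, 50.2, "b"), (0.5, 50.5, "c")]
groups = group_by_region(tweets, regions)
print(groups)  # {'West': ['a'], 'East': ['b']}
```

With real boundary files, a GIS library would normally handle projection, multi-part polygons and spatial indexing; this pure-Python version only shows the logic.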

Postal code regions were used to facilitate comparison with the BBC Voices data. Overall, the corpus contains … postal code regions, with on average 1.… Notably, we do not filter our corpus in any way, for example by excluding re-Tweets or spam or Tweets from prolific posters or bots.

Tweets from one user may also appear in different regional sub-corpora if the user was in different postal code regions when those posts were made. The Twitter corpus analyzed in this study is an unbiased sample of geolocated Tweets, similar to what a user would see if they browsed Tweets from a region at random. We believe that modifying the corpus to make it more likely to show regional patterns is a highly subjective process that necessarily results in a less representative corpus. By including all Tweets from a given region in our corpus, we have made a more conservative choice, allowing us to assess the base level of alignment between Twitter data and traditional dialect surveys.

Removing Tweets from the corpus may lead to the identification of stronger regional patterns or better alignment with dialect survey maps, but this can only be tested once a baseline is established. Next, we measured the frequency of each of the lexical variants in our BBC Voices dataset across our postal code area sub-corpora.

We then summed the counts for all forms associated with each variant in each postal code area and computed a percentage for each variant for each alternation in each postal code area by dividing the frequency of that variant by the frequency of all variants of that alternation in that postal code area. We also mapped the percentages of all variants across the postal code areas. Crucially, we counted all tokens of the variants in our corpus, making no attempt to disambiguate between word senses.
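The counting step just described (sum the token counts of every form of a variant, then divide by the total for all variants of the alternation in that area) can be sketched as below. This is a hypothetical illustration rather than the authors' pipeline: the tokenizer regex, the function name `variant_percentages`, and the example Tweets are assumptions, and only single-token forms are handled. As in the text, every token is counted, with no word sense disambiguation.

```python
import re
from collections import Counter


def variant_percentages(texts, alternation):
    """Percentage of each variant within one alternation for one postal code area.

    texts: the Tweets from that area.
    alternation: {variant: [form, ...]}, e.g. {"settee": ["settee", "settees"]}.
    Every token of every form is counted; no word senses are distinguished.
    """
    # Simple lowercase tokenizer (an assumption); keeps hyphens and apostrophes.
    tokens = Counter(tok for t in texts for tok in re.findall(r"[a-z'-]+", t.lower()))
    counts = {v: sum(tokens[f] for f in forms) for v, forms in alternation.items()}
    total = sum(counts.values())
    if total == 0:
        return {v: None for v in alternation}  # no tokens: percentage undefined
    return {v: 100.0 * c / total for v, c in counts.items()}


alternation = {
    "sofa": ["sofa", "sofas"],
    "couch": ["couch", "couches"],
    "settee": ["settee", "settees"],
}
texts = ["The settee is comfy", "Sofa or settee?", "new couches and a couch"]
print(variant_percentages(texts, alternation))
# {'sofa': 20.0, 'couch': 40.0, 'settee': 40.0}
```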


This is the simplest and most common approach in Twitter-based dialectology, although it is clearly problematic. Automatic word sense disambiguation systems are not commonly used in corpus-based dialectology because they are difficult to apply at scale and are fairly inaccurate, especially when working with uncommon dialect forms in highly informal data. We return to the issue of polysemy later in this paper, when we consider how variation in meaning affects the overall alignment between the two sets of maps and how much alignment can be improved through the application of techniques for word sense disambiguation.

Finally, it is important to acknowledge that Twitter corpora do not represent language in its entirety. Twitter corpora only represent Twitter, which is a very specific form of public, written, computer-mediated communication. The unique constellation of situational properties that defines Twitter affects its form and differentiates it from other varieties of language, as does the demographic background of Twitter users, who in the UK are more likely to be young, male, and well-educated compared to the general population (Longley et al., …).

These are the social and situational patterns that define Twitter, and they should be reflected in any corpus that attempts to represent this variety of language. The goal of this study is to evaluate the degree to which general patterns of regional variation persist in Twitter corpora despite their unique characteristics. To systematically assess the similarity of the Twitter maps and the survey maps, we measured the degree of alignment between each pair of maps. There is, however, no standard method for bivariate map comparison in dialectology.

Other than visually comparing dialect maps (e.g. …) … This was the approach taken in Grieve (…), for example, where Pearson correlation coefficients were calculated to compare a small number of maps representing general regional patterns of grammatical and phonetic variation.


This is also the general approach underlying many dialect studies that have used methods like factor analysis (e.g. …). Although correlating dialect maps generally appears to yield consistent and meaningful results, this process ignores the spatial distribution of the values of each variable.
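The claim that correlation ignores spatial distribution can be made concrete: Pearson's r treats the locations as an unordered set, so rearranging the locations at random (while keeping each location's pair of values together) leaves r unchanged, even though both maps would then look like noise. The sketch below uses hypothetical percentages for ten locations along a transect; the data and names are illustrative only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two maps given as value lists over the same locations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentages for one variant at ten locations along a transect:
# both maps rise smoothly from one end of the region to the other.
twitter = [5, 8, 12, 15, 21, 26, 30, 37, 41, 48]
survey = [10, 11, 15, 20, 22, 30, 33, 35, 44, 50]

# Rearrange the locations at random, keeping each location's pair of values together.
order = list(range(len(twitter)))
random.Random(42).shuffle(order)
twitter_shuffled = [twitter[i] for i in order]
survey_shuffled = [survey[i] for i in order]

r_original = pearson_r(twitter, survey)
r_shuffled = pearson_r(twitter_shuffled, survey_shuffled)
print(math.isclose(r_original, r_shuffled))  # True: r does not depend on where values are
```

Because the smooth regional gradient is destroyed by the shuffle but r is not, r alone cannot tell a coherent regional pattern from a spatially random one.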


Consequently, the similarity between two dialect maps can be estimated incorrectly, and significance testing is unreliable, as it is based on the assumption that the values of a variable are independent across locations (see Lee, …). Alternatively, methods in spatial analysis have been designed specifically for inferential bivariate map comparison (Wartenberg, …; Lee, …). Most notably, Lee (…) proposed a spatial correlation coefficient L that measures the association between two geographically referenced variables, taking into account their spatial distribution.

Lee's L is essentially a combination of Pearson's r, the standard bivariate measure of association, and Moran's I, the standard univariate measure of global spatial autocorrelation (see Grieve, …). On the one hand, Pearson's r correlates the values of two variables x and y by comparing the values of the variables at each pair of observations (x_i, y_i):

$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$

On the other hand, Moran's I compares the values of a single variable x across all pairs of locations, with the spatial distribution of the variable used to define a spatial weights matrix w, which specifies the weight w_ij assigned to the comparison of each pair of locations i, j.

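For example, a spatial weights matrix is often set at 1 for neighboring locations and 0 for all other pairs of locations. When row standardized (i.e. when each row of w is divided by its sum, so that the weights for each location sum to 1), Moran's I can be expressed as

$$I = \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i}(x_i-\bar{x})^2}$$

Combining these two measures, Lee defined his bivariate measure of spatial association L as

$$L = \frac{n}{\sum_{i}\left(\sum_{j} w_{ij}\right)^2}\cdot\frac{\sum_{i}\left[\sum_{j} w_{ij}\,(x_j-\bar{x})\right]\left[\sum_{j} w_{ij}\,(y_j-\bar{y})\right]}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$

where n is the number of locations; for a row-standardized matrix, the leading factor equals 1. Lee's L is independent of scale, which is important as our maps can differ in terms of scale. In addition, pseudo-significance testing can be conducted for Lee's L through a randomization procedure, in much the same way as for Moran's I.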
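As a sketch of the two statistics, the functions below implement the standard formulas for Moran's I with a row-standardized weights matrix and for Lee's L with an arbitrary non-negative weights matrix (including its n / Σ_i(Σ_j w_ij)² factor). This is an illustrative pure-Python version written for clarity, not the implementation used in the study; the function names and the four-location example are hypothetical, and every location is assumed to have at least one neighbor.

```python
import math


def row_standardize(w):
    """Divide each row of a weights matrix by its sum (each row must have a non-zero sum)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([wij / total for wij in row])
    return out


def morans_i(x, w):
    """Moran's I for a row-standardized weights matrix w (list of lists)."""
    n = len(x)
    mx = sum(x) / n
    z = [xi - mx for xi in x]
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return num / sum(zi * zi for zi in z)


def lees_l(x, y, w):
    """Lee's L for two maps x, y over the same locations and any non-negative weights w."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    zx = [xi - mx for xi in x]
    zy = [yi - my for yi in y]
    # Spatially weighted (lagged) deviations at each location.
    lag_x = [sum(w[i][j] * zx[j] for j in range(n)) for i in range(n)]
    lag_y = [sum(w[i][j] * zy[j] for j in range(n)) for i in range(n)]
    scale = n / sum(sum(row) ** 2 for row in w)  # equals 1 if w is row-standardized
    num = sum(a * b for a, b in zip(lag_x, lag_y))
    den = math.sqrt(sum(a * a for a in zx)) * math.sqrt(sum(b * b for b in zy))
    return scale * num / den


# Four hypothetical locations on a line; neighbors are the adjacent locations.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w = row_standardize(adjacency)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(round(morans_i(x, w), 6))   # 0.4
print(round(lees_l(x, y, w), 6))  # 0.2
```

Multiplying one map by a constant, or adding one, leaves L unchanged, which is the scale independence noted above.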


In this procedure, Lee's L is recalculated for a large number of random rearrangements of the locations over which the variables were measured. The set of values that results from this process represents the null distribution of Lee's L. The observed value of Lee's L is then compared to this null distribution to generate a pseudo p-value. Finally, to calculate Lee's L, a spatial weights matrix must be defined. For this study, we used a nearest neighbor spatial weights matrix, where every location is compared to its k nearest neighbors, including itself, with each of these k neighbors assigned a weight of 1 and all other locations assigned a weight of 0.
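The weights matrix and the randomization test described above can be sketched together. In this hypothetical illustration, each location is given a weight of 1 for itself and its nearest neighbors (k locations in total, the number the text calls its nearest neighbors) and 0 otherwise, and the pseudo p-value is computed by moving each location's pair of values (x_i, y_i) to random locations while the weights stay fixed. Several details are assumptions rather than choices confirmed by the text: using planar point coordinates (e.g. area centroids), breaking distance ties by index, and reporting a one-sided p-value that tests for positive association.

```python
import math
import random


def knn_weights(coords, k):
    """Binary k-nearest-neighbor weights; each location counts as its own nearest neighbor."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort by distance; among ties, put location i itself first, then by index.
        nearest = sorted(
            range(n),
            key=lambda j: (math.hypot(coords[j][0] - xi, coords[j][1] - yi), j != i, j),
        )[:k]
        for j in nearest:
            w[i][j] = 1.0
    return w


def lees_l(x, y, w):
    """Lee's L for two maps x, y over the same locations and any non-negative weights w."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    zx = [xi - mx for xi in x]
    zy = [yi - my for yi in y]
    lag_x = [sum(w[i][j] * zx[j] for j in range(n)) for i in range(n)]
    lag_y = [sum(w[i][j] * zy[j] for j in range(n)) for i in range(n)]
    scale = n / sum(sum(row) ** 2 for row in w)
    num = sum(a * b for a, b in zip(lag_x, lag_y))
    den = math.sqrt(sum(a * a for a in zx)) * math.sqrt(sum(b * b for b in zy))
    return scale * num / den


def lees_l_pseudo_p(x, y, w, permutations=999, seed=0):
    """Observed Lee's L and a one-sided pseudo p-value from random relocations."""
    rng = random.Random(seed)
    observed = lees_l(x, y, w)
    order = list(range(len(x)))
    as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Move each (x_i, y_i) pair to a random location; the weights stay fixed.
        if lees_l([x[i] for i in order], [y[i] for i in order], w) >= observed:
            as_large += 1
    return observed, (as_large + 1) / (permutations + 1)


# Hypothetical 6 x 6 grid of locations with two maps that both increase west to east.
coords = [(col, row) for row in range(6) for col in range(6)]
x = [col for col, row in coords]
y = [col + 0.1 * row for col, row in coords]
w = knn_weights(coords, k=5)
L, p = lees_l_pseudo_p(x, y, w, permutations=199)
print(round(L, 3), p)
```

For the analysis described in the text, k would be 10; the smaller k and permutation count here only keep the toy example quick.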

We correlated all pairs of Twitter and BBC Voices dialect maps using Lee's L , based on a 10 nearest neighbor spatial weights matrix. These results demonstrate that the regional patterns in the BBC Voices survey data and our Twitter corpus are broadly comparable.

It is unclear, however, just how similar these maps really are. Significant alignment, at any level, is not a guarantee of meaningful alignment.


Furthermore, given standard rules of thumb for Pearson's r, a median Lee's L of 0.… We do not know, however, exactly how to interpret Lee's L within the context of this study. Ultimately, the question we are interested in answering is whether the two sets of maps under comparison tend to align in a way that is meaningful for dialectologists. It is therefore crucial that we compare the two sets of maps visually to assess the degree of alignment, especially those map pairs that show seemingly low-to-middling correlations.

In other words, we believe it is important to calibrate our interpretation of Lee's L for dialectological inquiry, rather than simply noting that a certain percentage of map pairs show a significant or substantial spatial correlation. For example, we believe it is clear that the maps for sofa, couch and settee presented in Figure 1 broadly align.

Crucially, the result for settee suggests that what appear to be low-to-middling values for Lee's L might represent very meaningful alignments in the context of dialectology. To investigate this issue further, we examined how the visual similarity between the pairs of maps degrades as Lee's L falls. In Figure 2, we present 8 pairs of maps with L values ranging from 0.… We can clearly see that the alignment between the two sets of maps falls with Lee's L, as expected. In Figure 3, we present 8 pairs of maps with L values around 0.…

Once again, we see broad alignment across the maps, although there is considerably more local variation than in most of the pairs of maps presented in Figure 2. Finally, in Figure 4, we present 8 pairs of maps with p values around 0.…



Overall, we therefore find considerable alignment between the BBC Voices and the Twitter lexical dialect maps. The matches are far from perfect, but in our opinion a clear majority of the map pairs analyzed in this study show real correspondence, with the nations of the UK and the major regions of England being generally classified similarly in both sets of maps. The maps do not appear to be suitable in most cases for more fine-grained interpretations, except at higher levels of correlation, but given that these maps are defined at the level of postal code areas, which in most cases are fairly large regions, this seems like a reasonable degree of alignment, suggesting that these two approaches to data collection in dialectology allow for similar broad underlying patterns of regional lexical variation to be identified in British English.