Dialects Are Widely Used on Twitter
Source: unknown | Author: enread | Published: 2011-01-10 02:57

Microbloggers may think they're interacting in one big Twitterverse, but researchers at Carnegie Mellon University's School of Computer Science find that regional slang and dialects are as evident in tweets as they are in everyday conversations. Postings on Twitter reflect some well-known regionalisms, such as Southerners' "y'all," and Pittsburghers' "yinz," and the usual regional divides in references to soda, pop and Coke. But Jacob Eisenstein, a post-doctoral fellow in CMU's Machine Learning Department, said the automated method he and his colleagues have developed for analyzing Twitter word use shows that regional dialects appear to be evolving within social media.

In northern California, something that's cool is "koo" in tweets, while in southern California, it's "coo." In many cities, something is "sumthin," but tweets in New York City favor "suttin." While many of us might complain in tweets of being "very" tired, people in northern California tend to be "hella" tired, New Yorkers "deadass" tired and Angelenos are simply tired "af."

The "af" is an acronym that, like many others on Twitter, stands for a vulgarity. LOL is a commonly used acronym for "laughing out loud," but Twitterers in Washington, D.C., seem to have an affinity for the cruder LLS.

Eisenstein said some of this usage clearly is shaped by the 140-character limit of Twitter messages, but geography's influence also is apparent. The statistical model the CMU team used to recognize regional variation in word use and topics could predict the location of a microblogger in the continental United States with a median error of about 300 miles.
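To make the "median error" figure concrete, here is a minimal Python sketch of how such a number could be computed from predicted and true coordinates. The article does not say how the team measured distance, so the great-circle (haversine) formula, the Earth-radius constant, and the function names `haversine_miles` and `median_error_miles` are all assumptions made for illustration, not the researchers' code.

```python
import math
from statistics import median

# Assumed mean Earth radius in miles; not a value taken from the study.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def median_error_miles(predicted, actual):
    """Median distance between paired predicted and true (lat, lon) locations."""
    errors = [haversine_miles(p[0], p[1], a[0], a[1])
              for p, a in zip(predicted, actual)]
    return median(errors)
```

A median is less sensitive than a mean to a few wildly wrong predictions, which may be why it is the figure quoted here.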

Eisenstein will present the study on Jan. 8 at the Linguistic Society of America annual meeting in Pittsburgh. The paper is available online at http://people.csail.mit.edu/jacobe/papers/emnlp2010.pdf.

Studies of regional dialects traditionally have been based primarily on oral interviews, Eisenstein said, noting that written communication often is less reflective of regional influences because writing, even in blogs, tends to be formal and thus homogenized. But Twitter offers a new way of studying regional lexicon, he explained, because tweets are informal and conversational. Furthermore, people who tweet using mobile phones have the option of geotagging their messages with GPS coordinates.

For this study, Eisenstein and his co-authors (Eric P. Xing, associate professor of machine learning; Noah A. Smith, assistant professor in the Language Technologies Institute (LTI); and Brendan O'Connor, machine learning graduate student) collected a week's worth of Twitter messages in March 2010, and selected geotagged messages from Twitter users who wrote at least 20 messages. That yielded a database of 9,500 users and 380,000 messages.
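The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message layout (dictionaries with `user`, `lat`, `lon` keys) and the function name `select_active_geotagged` are hypothetical, and the sketch counts geotagged messages toward the 20-message threshold, which the article does not specify.

```python
from collections import defaultdict


def select_active_geotagged(messages, min_messages=20):
    """Group geotagged messages by user and keep users who meet the threshold.

    Assumption: the threshold applies to geotagged messages only.
    """
    by_user = defaultdict(list)
    for msg in messages:
        # Skip messages that carry no GPS coordinates.
        if msg.get("lat") is None or msg.get("lon") is None:
            continue
        by_user[msg["user"]].append(msg)
    # Drop users with too few geotagged messages.
    return {user: msgs for user, msgs in by_user.items()
            if len(msgs) >= min_messages}
```

The result maps each retained user to that user's geotagged messages, so the user and message totals can be read off directly.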

Though the researchers could pinpoint the users' locations using the geotags, they could only guess at their profiles. Eisenstein said it's reasonable to assume that people sending lots of tweets from mobile phones are younger than the average Twitter user, and the topics discussed by these users seem to reflect that.

Automated analysis of Twitter message streams offers linguists an opportunity to watch regional dialects evolve in real time. "It will be interesting to see what happens. Will 'suttin' remain a word we see primarily in New York City, or will it spread?" Eisenstein asked.

It might be a mistake to assume that the greater interconnectivity afforded by computer networks and sites such as Twitter will necessarily result in more homogeneity in language. The social circles maintained by social networks such as Twitter often are geographically focused, he noted. Also, many people use the Internet to seek out like-minded people with similar interests, rather than expose themselves to a broader range of ideas and experiences.


