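MIT Reveals the Inside Story of Post Deletion on Sina Weibo

MIT Technology Review, the magazine of the Massachusetts Institute of Technology, has published a report on a study (see the English original and link below) that exposes how posts are deleted on Sina Weibo. According to the report, 30 per cent of deletions happen within 5 to 30 minutes of posting, so deletion takes place in almost "real time".

The study was carried out by Dan Wallach, a computer science professor at Rice University in Houston, and his colleagues. They collected posts from a set of users and then tracked them to see which ones were deleted.

Weibo has more than 300 million users in China, so tracking every one of them is not feasible. Wallach and his colleagues instead focused on more than 3,500 users who had had posts deleted before, and tracked them for 15 days last year. During that period, about 4,500 of these users' posts were deleted each day, 12 per cent of their total posts.

Not every deletion is the result of "censorship" by the authorities, of course; some posts are deleted by the users themselves. The two cases can be told apart: a post deleted by an administrator displays "this post is not suitable for public viewing", while a post deleted by its author displays "this post does not exist". Wallach's team focused on the former.

The results show that 5 per cent of deletions happen within 8 minutes of posting, 30 per cent within 30 minutes, and 90 per cent within 24 hours. The volume of deletions peaks 5 to 10 minutes after posting, which means that the administrators' censorship is carried out "in near real time".

Weibo's many users produce an enormous volume of posts, about 70,000 per minute on average. The researchers say that if one censor can scan an average of 50 posts a minute and works an 8-hour day, roughly 4,200 censors are needed to handle the flood of posts sent each day.

Even then, technological support is needed. Wallach says one technique is to flag keywords, but users find all sorts of ways around the keywords, so this technique has its own difficulties.

The authorities also pay special attention to users with a history of deleted posts. Wallach found that these users' posts are indeed deleted faster.

Wallach also looked at the pattern of deletions over the course of a day and found that activity quietens after 7pm and picks up again the next morning or around midday.

Original MIT Technology Review article: http://www.technologyreview.com/view/512231/computer-scientists-measure-the-speed-of-censorship-on-chinas-twitter/

Computer Scientists Measure the Speed of Censorship On China’s Twitter

Censorship on Weibo, China’s version of Twitter, is near real-time and relies on a workforce of over 4,000 censors who stop work during the evening news, according to the first detailed analysis of censorship patterns.

The Chinese version of Twitter is a microblogging service called Weibo which launched in 2010. This allows users to post 140 character messages with @usernames and #hashtags, just like Twitter, although 140 characters in Chinese contain significantly more information content than in English.

In just three years, Weibo has picked up some 300 million users who between them send 100 million messages each day at the rate of 70,000 per minute. That makes the inevitable process of censorship a tricky task for the Chinese authorities. So an interesting question is how they do it.

Today, Dan Wallach at Rice University in Houston, Texas, and a few pals reveal the results of a detailed study of censorship on Weibo.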
Their method has allowed them to reconstruct the censorship techniques used by the government, to calculate the number of workers who must be involved and even to discover their daily work schedules.

The work is possible because at least some of the content on Weibo is not censored prior to publication, only afterwards. Their approach was to collect posts from a set of users once every minute. They then tracked these posts to see which ones later became unavailable.

Of course, it’s not feasible to track everyone on Weibo, so Wallach and co spent some time looking for users who seemed to have posts deleted more often than others, assuming that these users would be more likely to be censored in the future. Using this manual technique, they ended up observing some 3500 users over a period of 15 days last year who between them experienced around 4500 deletions per day, or about 12 per cent of the total.

Not all deletions are the result of censorship, however, since a user can delete his or her own posts. Wallach and co say that through their own trial and error they observed two types of deletion which return different messages. When users delete their own messages, a query for the post returns a “post does not exist” error message. However, when a post is deleted by the censors, Weibo returns a different message saying: “permission denied”. It is this second type of deletion that Wallach and co concentrated on.

The results of their study are fascinating. They say that in their data set about 5 per cent of the deletions occur within 8 minutes of posting and around 30 per cent within 30 minutes. In total, 90 per cent of deletions occur within a day, although at times deletions can occur several days later.

Those are impressive numbers given the popularity of the microblogging service. How does Weibo manage this task? Wallach and co say their data point to a number of hypotheses about what’s going on.
Since the highest volume of deletions occurs within 5-10 minutes of posting, Weibo must be censoring them in near real time. If an average censor can scan around 50 posts a minute, that would require some 1400 censors at any instant to handle the 70,000 posts pouring in each minute. And if they work 8 hour shifts, that’s a total of 4200 censors on the payroll each day.

Even then, this work force must have some technological help. Wallach and co say the data suggests Weibo has a number of techniques in operation. The first is keyword alerting. When a keyword appears, the post is immediately flagged for censors. However, this is no mean feat since the Chinese language is notoriously hard to filter in this way because of the complexity of its alphabet and because of the neologisms and shortened language that is used on Weibo.

Wallach and co say that the authorities also target users who have a history of deletions, presumably assuming that they are more likely to post forbidden content in future (just as Wallach and co did). It turns out that these users tend to be censored more quickly than others on the network. “Users with larger deletion frequencies tend to observe faster censorship of their work,” they say.

Wallach and co have also examined the rate of deletions throughout the 24-hour cycle, finding that the censors are less active at night, when presumably fewer are working. They also face a backlog each morning. “They catch up by late morning or early afternoon,” conclude Wallach and co. There is even a slight dip in the censorship rate at 7pm when the national evening news is on television.

Wallach and co are also interested in the type of posts that are censored and have examined the content of these for clues. They say that topics commonly deleted include phrases such as “support Syrian rebels”, “Lying of gov. (Jixiang)”, “One-Child policy abuse” and “group sex”.
The topics that trigger mass removal the fastest are those that combine events that are hot topics in Weibo as a whole, such as “sex scandal”, with themes common to sensitive posts, such as government or policeman.

That’s a fascinating study that provides a rare but illuminating insight into the nature of Chinese censorship.

One question that this study does not address is why the authorities allow uncensored Weibo posts to appear in public at all. Given the formidable censorship machine in operation, why not block publication of all posts for 30 minutes or so, until the censorship is largely complete?

Wallach and co seem to suggest that this is possible. They say that on 1 August 2012, they tried to post a message including the phrase “Secretary of the Political and Legislative Committee.” “When we submit a post with this character string in it, a warning message says ‘Sorry, since this content violates “Sina Weibo regulation rules” or a related regulation or policy, this operation cannot be processed. If you need help, please contact customer service.’” So clearly some posts are blocked before they even reach public view.

Whatever the reason, clearly more work is needed. Wallach and co say they have several goals for the future, such as attempting to find out more about the way Weibo prioritises content for deletion.

All that will depend on the team’s access to data and on the assumption that the authorities won’t be able to track down and stop the team’s accounts and the Tor network links they use to send the data out of the country. Brave work!

Ref: arxiv.org/abs/1303.0597: The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
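
The staffing estimate quoted in the article can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (100 million posts a day, roughly 70,000 posts a minute, 50 posts scanned per censor per minute, 8 hour shifts). The function name `censors_needed` and its parameters are illustrative and do not come from the paper, and the calculation assumes every post is read by exactly one censor with no automated filtering.

```python
import math

def censors_needed(posts_per_minute, posts_per_censor_per_minute, shift_hours):
    """Return (censors on duty at any instant, censors needed per day).

    Assumes each post is read once by one human censor and that
    shifts cover the full 24 hours back to back.
    """
    on_duty = math.ceil(posts_per_minute / posts_per_censor_per_minute)
    shifts_per_day = math.ceil(24 / shift_hours)
    return on_duty, on_duty * shifts_per_day

# Consistency check: 100 million posts a day is close to 70,000 a minute.
posts_per_minute_from_daily = round(100_000_000 / (24 * 60))
print(posts_per_minute_from_daily)  # 69444

on_duty, total = censors_needed(70_000, 50, 8)
print(on_duty, total)  # 1400 4200
```

Changing the scan rate or shift length shows how sensitive the 4200 figure is to these assumptions: halving the scan rate to 25 posts a minute doubles the estimate.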