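| | MIT Reveals the Inside Story of “Post Deletion” on Sina Weibo

MIT Technology Review, the magazine of the Massachusetts Institute of Technology, has published a report on a study (see the attached English original and link) that exposes the inner workings of post deletion on Sina Weibo. According to the report, 30% of deletions are completed within 5 to 30 minutes of posting, so deletion happens almost in “real time.”

The study was carried out by Dan Wallach, a computer science professor at Rice University in Houston, and his colleagues. They collected posts from a set of users and then tracked them to see which were deleted.

China has more than 300 million microblog users, so tracking every one of them is impossible. Wallach and his colleagues therefore focused on more than 3500 users who had previously had posts deleted, and tracked them for 15 days last year.

During the observation period, about 4500 of these users' posts were deleted each day, or 12% of everything they posted.

Not every deleted post is the result of “censorship” by the authorities, of course; some are deleted by the users themselves. The two cases can still be told apart: a post removed by an administrator displays “This weibo is not suitable for public viewing,” while a post deleted by its author displays “This post does not exist.” Wallach's team focused on the former.

The results show that 5% of deletions are completed within 8 minutes of posting, 30% within 30 minutes, and 90% within 24 hours.

The data show that deletions peak 5 to 10 minutes after posting, which means that review by Weibo administrators is carried out “almost in real time.”

China's microblog users are numerous and post in huge volumes, averaging 70,000 posts per minute. The researchers say that if one censor reads an average of 50 posts per minute and works an 8-hour day, about 4200 censors are needed to handle the flood of posts sent each day.

Even so, technical support is needed. Wallach says one technique is to target “keywords,” but netizens find every possible way around them, which creates difficulties for this technique too.

In addition, the authorities pay special attention to users with a history of deleted posts. Wallach found that these users' posts are indeed deleted faster.

Wallach also looked at deletion patterns over the course of a day and found that activity tends to quiet down after 7 pm and that deletion intensifies the following morning or around midday.

MIT Technology Review original: http://www.technologyreview.com/view/512231/computer-scientists-measure-the-speed-of-censorship-on-chinas-twitter/

Computer Scientists Measure the Speed of Censorship On China’s Twitter

Censorship on Weibo, China’s version of Twitter, is near real-time and relies on a workforce of over 4,000 censors who stop work during the evening news, according to the first detailed analysis of censorship patterns.

The Chinese version of Twitter is a microblogging service called Weibo, which launched in 2010. It allows users to post 140-character messages with @usernames and #hashtags, just like Twitter, although 140 characters in Chinese carry significantly more information than 140 characters in English.

In just three years, Weibo has picked up some 300 million users who between them send 100 million messages each day, at a rate of 70,000 per minute.

That makes the inevitable process of censorship a tricky task for the Chinese authorities. So an interesting question is how they do it.

Today, Dan Wallach at Rice University in Houston, Texas, and a few pals reveal the results of a detailed study of censorship on Weibo.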
Their method has allowed them to reconstruct the censorship techniques used by the government, to calculate the number of workers who must be involved and even to discover their daily work schedules.

The work is possible because at least some of the content on Weibo is not censored prior to publication, only afterwards. Their approach was to collect posts from a set of users once every minute. They then tracked these posts to see which ones later became unavailable.

Of course, it’s not feasible to track everyone on Weibo, so Wallach and co spent some time looking for users who seemed to have posts deleted more often than others, assuming that these users would be more likely to be censored in the future. Using this manual technique, they ended up observing some 3500 users over a period of 15 days last year who between them experienced around 4500 deletions per day, or about 12 per cent of all their posts.

Not all deletions are the result of censorship, however, since a user can delete his or her own posts. Wallach and co say that through their own trial and error they observed two types of deletion, which return different messages. When users delete their own messages, a query for the post returns a “post does not exist” error message. However, when a post is deleted by the censors, Weibo returns a different message saying “permission denied”. It is this second type of deletion that Wallach and co concentrated on.

The results of their study are fascinating. They say that in their data set about 5 per cent of the deletions occur within 8 minutes of posting and around 30 per cent within 30 minutes. In total, 90 per cent of deletions occur within a day, although at times deletions can occur several days later.

Those are impressive numbers given the popularity of the microblogging service. How does Weibo manage this task? Wallach and co say their data point to a number of hypotheses about what’s going on.
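The bookkeeping behind those percentages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the function names `classify_deletion` and `fraction_deleted_within` are hypothetical, the error strings are simply the two messages quoted in the article, and the sample latencies are made up.

```python
from datetime import timedelta

# Hypothetical error strings, modelled on the two messages quoted in the article.
CENSORED_MSG = "permission denied"
SELF_DELETED_MSG = "post does not exist"


def classify_deletion(error_message: str) -> str:
    """Map the error returned for a missing post to a deletion type."""
    text = error_message.strip().lower()
    if CENSORED_MSG in text:
        return "censored"
    if SELF_DELETED_MSG in text:
        return "self-deleted"
    return "unknown"


def fraction_deleted_within(latencies, limit):
    """Share of deletions whose latency (post time to deletion) is at most `limit`."""
    if not latencies:
        return 0.0
    return sum(1 for lat in latencies if lat <= limit) / len(latencies)


# Made-up latencies in minutes, for illustration only (not data from the study).
sample = [timedelta(minutes=m) for m in (4, 7, 12, 25, 45, 90, 300, 2000)]
share_30min = fraction_deleted_within(sample, timedelta(minutes=30))
share_1day = fraction_deleted_within(sample, timedelta(days=1))
```

Because the team polled once a minute, a latency measured this way would be accurate to roughly a minute at best; the sketch ignores that sampling error.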
Since the highest volume of deletions occurs within 5-10 minutes of posting, Weibo must be censoring them in near real time. If an average censor can scan around 50 posts a minute, that would require some 1400 censors at any instant to handle the 70,000 posts pouring in each minute. And if they work 8-hour shifts, three shifts are needed to cover the day, for a total of 4200 censors on the payroll each day.

Even then, this work force must have some technological help. Wallach and co say the data suggest Weibo has a number of techniques in operation. The first is keyword alerting. When a keyword appears, the post is immediately flagged for censors. However, this is no mean feat, since the Chinese language is notoriously hard to filter in this way because of the complexity of its writing system and because of the neologisms and abbreviations that are used on Weibo.

Wallach and co say that the authorities also target users who have a history of deletions, presumably assuming that they are more likely to post forbidden content in future (just as Wallach and co did). It turns out that these users tend to be censored more quickly than others on the network. “Users with larger deletion frequencies tend to observe faster censorship of their work,” they say.

Wallach and co have also examined the rate of deletions throughout the 24-hour cycle, finding that the censors are less active at night, when presumably fewer are working. They also face a backlog each morning. “They catch up by late morning or early afternoon,” conclude Wallach and co. There is even a slight dip in the censorship rate at 7pm, when the national evening news is on television.

Wallach and co are also interested in the type of posts that are censored and have examined the content of these for clues. They say that topics commonly deleted include phrases such as “support Syrian rebels”, “Lying of gov. (Jixiang)”, “One-Child policy abuse” and “group sex”.
The topics that trigger mass removal the fastest are those that combine events that are hot topics on Weibo as a whole, such as “sex scandal”, with themes common to sensitive posts, such as government or policeman.

That’s a fascinating study that provides a rare but illuminating insight into the nature of Chinese censorship.

One question that this study does not address is why the authorities allow uncensored Weibo posts to appear in public at all. Given the formidable censorship machine in operation, why not block publication of all posts for 30 minutes or so, until the censorship is largely complete?

Wallach and co seem to suggest that this is possible. They say that on 1 August 2012, they tried to post a message including the phrase “Secretary of the Political and Legislative Committee.” “When we submit a post with this character string in it, a warning message says ‘Sorry, since this content violates “Sina Weibo regulation rules” or a related regulation or policy, this operation cannot be processed. If you need help, please contact customer service.’” So clearly some posts are blocked before they even reach public view.

Whatever the reason, clearly more work is needed. Wallach and co say they have several goals for the future, such as attempting to find out more about the way Weibo prioritises content for deletion.

All that will depend on the team’s access to data and on the assumption that the authorities won’t be able to track down and stop the team’s accounts and the Tor network links they use to send the data out of the country. Brave work!

Ref: arxiv.org/abs/1303.0597: The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions |
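The staffing estimate quoted in the article (1400 censors on duty, 4200 per day) can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes, as the article implicitly does, that every post is read once by a human, that posts arrive at a constant rate, and that equal shifts tile the 24-hour day; the function name `censors_needed` is hypothetical.

```python
import math


def censors_needed(posts_per_minute, posts_per_censor_minute, shift_hours):
    """Return (censors on duty at any instant, total censors employed per day)."""
    # Enough readers to keep pace with the incoming stream.
    on_duty = math.ceil(posts_per_minute / posts_per_censor_minute)
    # Number of equal shifts required to cover a full 24-hour day.
    shifts_per_day = math.ceil(24 / shift_hours)
    return on_duty, on_duty * shifts_per_day


on_duty, per_day = censors_needed(70_000, 50, 8)
print(on_duty, per_day)  # 1400 4200
```

Changing any one input shows how sensitive the figure is: at 25 posts per censor per minute, for instance, the same calculation gives 2800 on duty and 8400 per day.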