Learnings from the Meedan Project
OVERVIEW of LESSONS and QUESTIONS DISCUSSED:
- MT to create (informal) dialogue is not viable, at least not for a while: shift to social translation
- Challenges of the languages – different, Arabic has many dialects, styles of writing, full stops
- MT as scan, first pass -> Jiamin, rating MT: human recommendations got more hits than MT on a general feed of headlines
- why is content outside my language community important content for me to access? It is a view into a discussion you couldn’t otherwise get
- helping users know that they have the opportunity to translate, and what happens to their translation
- Demand is very challenging – Ivan – social context of interest is different. One community translating something the other is interested in. TRUST
- Outreach – Arabic volunteer culture a challenge, income levels, Jiamin - students
Rezwanul – Bangla translation
Solana – managing editor GV
Ivan – GV
Rahzeb - Translation information
Ed, Anas, George
Jiamin – Chinese
Ed – Synopsis of aims and some of the mistaken assumptions
-- HOW MEEDAN STARTED
Started with a tech relationship with IBM – a big company in translation technology
Global challenges across linguistic boundaries
Hybrid distributed approach to translation, to build a data set for creating a high-quality translation engine; interest in scaling dialogue
Scraped comment lines off blogs and set out to create a 1-million-word corpus from blog comments; the translators' work was going to form an engine that would be specific to dialects
Solana – QUESTION: why blog comments? Ed – informal, not news wire
Ed – LESSON: we learned this was not a good approach: text on the web was sometimes Romanized, sometimes dialectal – no tidy boundaries
From the project start we put a lot of hope in the idea of improving MT to the point of being good enough to enable large-scale communication
LESSON 1 – MT is so difficult -> we’ve made a change of emphasis
MT only for first pass and indexing – we don’t think it’s viable and won’t be for a few years. Focus on social translation.
New focus – how do we create social translation?
Solana – QUESTION: where does content come from?
Ed – editors and journalists, grouping stories according to events
Present translated view point from Arabic press, mainstream media and blogs
Translate both language communities’ versions and offer them up
Ivan – QUESTION: can you search, or how do you choose what to feature?
Ed – code written, not exposed
Anas – LESSON LEARNED 2
We want to get content to people who would read it, and want to be able to cluster information
We are working on swift – crowdsourcing of selection from pre-sorted stories or feeds
Asomudin – QUESTION: what were main problems with translation?
Ed – informal domain – language is so variable, so difficult. Training on news wires is easier; when you are tackling the informal domain it is far more difficult
Both Eng -> Arabic and Arabic -> Eng
Anas LESSON LEARNED 3
– the languages are so different, hard to train MT on such different languages
Jiamin – machine aided translation is more useful. We have opportunities for users to help improve the translation.
Typical types of mistakes
Comprehensive error rate measures the translation quality – humans rate the errors
Anas – a human reads the text, and rates all the errors, this is ok – this is not. You generate a quality score which you put beside the article so users know how good it is.
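The rating step Anas describes could be sketched roughly as below. This is only an illustration, not Meedan's actual scoring: it assumes each segment gets a binary ok / not ok judgement and that the score shown beside the article is simply the share of segments rated ok. The names SegmentRating and quality_score are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentRating:
    """One human judgement on one translated segment (hypothetical structure)."""
    segment: str
    ok: bool  # True = "this is ok", False = "this is not"


def quality_score(ratings):
    """Return the share of segments rated ok, as a whole percentage.

    Assumption: a plain ok/not-ok ratio; the notes do not define the real formula.
    """
    if not ratings:
        return None  # nothing rated yet, so there is no score to display
    ok_count = sum(1 for r in ratings if r.ok)
    return round(100 * ok_count / len(ratings))


# Made-up ratings for a four-segment article.
ratings = [
    SegmentRating("headline", ok=True),
    SegmentRating("first paragraph", ok=True),
    SegmentRating("second paragraph", ok=False),
    SegmentRating("closing quote", ok=True),
]
print(quality_score(ratings))  # 3 of 4 segments ok -> 75
```

Returning None for an unrated article keeps "not yet rated" distinct from "rated and scored 0", which matters if the score is shown to readers.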
- Did you check the state of translation between English and another language in its family group?
Ed – most translation comes from the US defence department, which is why Eng-Arabic is equal to the much longer-studied Romance languages
George – LESSON 4 – how to show your users that content is translatable and what happens to their translation – this is a major challenge
Anas – web 3.0 – cross-language web
- Need participatory nature of web 2.0 with cross-lingual browsing
- You tend to be part of networks that speak the language you speak.
- Social networks are reinforcing language based networks
George – growth of English is a challenge – users are asking why would I want to translate when I want to improve my English
Anas – big communities that don’t have English (65 percent in Saudi)
Ed – LESSON 5– why is content outside my language community important content for me to access?
We now have an opportunity to get content from people close to something that’s going on – being able to filter on location and time, get content first passed through MT, community of concerned users can come behind it
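The workflow Ed outlines (filter on location and time, run MT as a first pass, let concerned users follow up) might look something like this in outline. It is a sketch under assumptions: the Post structure, first_pass, and the machine_translate stub are all hypothetical, and the stub stands in for whatever MT service would really be called.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """A piece of source content with where and when it was posted (hypothetical)."""
    text: str
    location: str
    posted_at: datetime
    machine_translation: str = ""  # filled by the first MT pass
    human_translation: str = ""    # filled later by community translators


def machine_translate(text):
    """Stand-in for a real MT call; it only tags the text so the flow is visible."""
    return "[MT] " + text


def first_pass(posts, location, since):
    """Keep posts from one place and time window, then attach a rough MT draft."""
    selected = [p for p in posts if p.location == location and p.posted_at >= since]
    for p in selected:
        p.machine_translation = machine_translate(p.text)
    return selected


# Made-up posts: only the first one matches both the place and the time window.
posts = [
    Post("post A", "City X", datetime(2009, 6, 15, 14, 0)),
    Post("post B", "City Y", datetime(2009, 6, 15, 14, 5)),
    Post("post C", "City X", datetime(2009, 6, 1, 9, 0)),
]
queue = first_pass(posts, location="City X", since=datetime(2009, 6, 14))
for p in queue:
    print(p.posted_at.isoformat(), p.machine_translation)
```

The human_translation field is left empty on purpose: in this reading, MT only produces the draft that community translators later correct.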
We’ve had users translating Farsi tweets
George – LESSON 6: MT might still have a role, but as one part of the puzzle – perhaps MT will be used for scanning in the new real-time, fast-literacy web
Jiamin – we used Google translate to help translate headlines, but people didn’t pay attention to those recommendations, they preferred human recommendations to searching through MT
People chose / paid attention to human recommended articles
Solana: QUESTION: How many translators do you have?
George – 50 core who come back, wider community beyond that – we need to get the word out, this is a big challenge
Solana – QUESTION: how far do translators guide the technology?
Anas – we try to spend lots of time developing the tools for translators on a technology side
Ed – as we get more volunteer translators
Ivan – QUESTION: we've talked about supply side, what about demand? How responsive are people? Are there any lessons there?
Ed – Google analytics
George – key moments, Obama speech, Iran, Al Zaidi shoe
Ivan – what are your assumptions, are you working for readers or translators?
Anas – I think it’s both, we have a core plus a periphery who comment briefly and read, then passive receivers
Ivan – do you have any assumptions that you are putting out there?
George – comments, social translation
Anas – challenge of volunteering culture in the Middle East, income levels lower – affects input
You have the usual suspects; what are people’s understandings of their societies when they spend so much time on the web? They skew our vision of the blogosphere in that part of the world
Ivan – QUESTION: you create a platform, but that doesn’t mean people will be able to use it. A story is the rhetorical tool for getting from your space to something outside of you. People don’t want to share the minutiae of everything. What attracts me to something out of the Arabic-language blogosphere is not just someone who reacts to an event, but someone who has a way of explaining something both within and outside their community – there is a rhetorical element there.
Ed – there is an enormous quality to reading something that you know wasn’t written for you, by an Arabic blogger for an Arabic audience – it’s something you weren’t supposed to know. A quality of looking into a place you didn’t know about.
Anas – bloggers tend to go further in their discussions when they are speaking in their own community.
George – the context is changing -> real time web, more Arabic on the web
Asomudin – what is of more interest, Arabic to English or English to Arabic?
Anas – huge mistrust of Western media in the Arab world, people tend to distance themselves; Al Hurra has zero viewership; they don’t trust their own media either (Ivan)
Anas – Al Arabiya v Al Jazeera
Anas – more interest from the English
George – musalsal thread on Bab al Harra got huge attention
Ed – LESSON learned – what it means for bloggers to write in English. This is a device – bloggers write things in English that they wouldn’t write in Arabic.
George – people ask, why do you need Meedan with so many cross-language papers? There are instances of mistranslation or of editing out material in Arabic
Do we have the right to do that? Are we endangering them? Could this create unrest?
Ivan – you can’t control what you put out there, it will be translated
Anas – you need to challenge all assumptions
There is still need for Romanization, standardized Romanization
Fuzzy logic goes into language detection
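One small piece of what this might mean in practice: a graded score for how much of a text is in Arabic script, rather than a hard yes/no. The sketch below is an assumption for illustration, not Meedan's detector. It only looks at script (the basic Arabic Unicode block, U+0600 to U+06FF), so it cannot tell Romanized Arabic from English; the function names and thresholds are hypothetical.

```python
def arabic_share(text):
    """Return the fraction of letters that fall in the basic Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def script_label(text, high=0.7, low=0.3):
    """Turn the graded share into a coarse label; the thresholds are arbitrary."""
    share = arabic_share(text)
    if share >= high:
        return "arabic-script"
    if share <= low:
        return "latin-script"  # could be English or Romanized Arabic
    return "mixed"


print(script_label("مرحبا بالعالم"))     # all letters are Arabic script
print(script_label("marhaba ya 3alam"))  # Romanized text has no Arabic-script letters
print(script_label("hello مرحبا"))       # half and half
```

A text labelled latin-script or mixed would still need a further step (a word list or a trained model, for instance) to separate Romanized Arabic from English, which is where a standardized Romanization would help.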
Jiamin – Community translations: students from 22 – 28 are a big constituency. They are young, they have passion, they have spare time, they have no need for money, and they are interested in exploring – job opportunities.
Community is 5000 translators – readers don’t need to register. We have 5 million pageviews per month.
Solana – what are the most important stories?
Jiamin – technology, social stories
Solana – where do they get the tech stories?
Jiamin – most translators are volunteers, we have some income from specific projects
Solana – some editors of most 7/8 Lingua sites are paid a small fee to keep working with the volunteers. You can scale in some languages more than others – each language editor feels they have a representative community, but you get a different dynamic in each language. Great success in French and Spanish; Arabic is growing; Chinese is good – Taiwanese are very eager