Learnings from the Meedan Project

From Ott09 Wiki

NOTES:

OVERVIEW of LESSONS and QUESTIONS DISCUSSED:

- MT for creating (informal) dialogue is not viable, at least not for a while: shift to social translation

- Challenges of the languages – they are very different; Arabic has many dialects, styles of writing, full stops

- MT as a scan or first pass -> Jiamin: rating MT; human recommendations got more hits than MT on a general feed of headlines

- why is content outside my language community important content for me to access? It is a view into a discussion you couldn’t otherwise get

- helping users know that they have the opportunity to translate, and what happens to their translation

- Demand is very challenging – Ivan – social context of interest is different. One community translating something the other is interested in. TRUST

- Outreach – Arabic volunteer culture a challenge, income levels, Jiamin - students

PEOPLE

Rezwanul – Bangla translation

Asomudin

Solana – managing editor GV

Ivan – GV

Rahzeb - Translation information

Ed, Anas, George

Jiamin – Chinese

Kamol

CONVERSATION

Ed – Synopsis of aims and some of the mistaken assumptions

-- HOW MEEDAN STARTED

Started with tech relationship with IBM – big company in translation technology

Global challenges across linguistic boundaries

Hybrid distributed approach to translation to build a data set to create high quality translation engine, interest in scaling dialogue

Scraped comment lines off blogs, set out to create a 1-million-word corpus from blog comments; translators were going to help form an engine that would be specific to dialects

--

Solana – QUESTION: why blog comments? Ed – informal, not news wire

---

Ed – LESSON: we learned this was not a good approach: text on the web was sometimes Romanized, sometimes dialectal – no tidy boundaries

From the project's start we put a lot of hope in the idea of improving MT to the point where it was good enough to enable large-scale communication

LESSON 1 – MT is so difficult -> we’ve made a change of emphasis

MT only for first pass and indexing – we don’t think it’s viable for dialogue, and it won’t be for a few years. Focus on social translation.

New focus – how do we create social translation? --

Solana – QUESTION: where does content come from? --

Ed – editors and journalists, grouping stories according to events

Present translated viewpoints from the Arabic press, mainstream media and blogs

Translate both language communities’ versions and offer them up

--

Ivan – QUESTION: can you search, or how do you choose what to feature?

Ed – code written, not exposed

Anas – LESSON LEARNED 2

We want to get content to people who would read it, and want to be able to cluster information

We are working on swift – crowdsourcing of selection from pre-sorted stories or feeds

--

Asomudin – QUESTION: what were main problems with translation?

Ed – informal domain – language is so variable, so difficult. Training on news wires is easier; when you are tackling the informal domain it is way more difficult

Both Eng -> Arabic and Arabic -> Eng

Anas – LESSON LEARNED 3

– the languages are so different, hard to train MT on such different languages

--

Jiamin – machine aided translation is more useful. We have opportunities for users to help improve the translation.

Typical types of mistakes

Comprehensive error rate measures the translation quality – humans rate the errors

Anas – a human reads the text and rates all the errors: this is OK, this is not. You generate a quality score which you put beside the article so users know how good it is.
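
A rough sketch of how such a score might be computed, for illustration only. The notes do not say how Meedan weighted errors or scaled the score, so the severity labels, the weights, the per-100-words scaling and the names `RatedError` and `quality_score` are all assumptions, not the project's actual method.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity weights; the notes do not record how errors were weighted.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 3.0, "critical": 5.0}


@dataclass
class RatedError:
    """One error flagged by a human rater in a machine-translated text."""
    severity: str  # must be a key of SEVERITY_WEIGHTS
    note: str = ""


def quality_score(word_count: int, errors: List[RatedError]) -> float:
    """Return a 0-100 score: 100 minus the weighted errors per 100 words, floored at 0.

    This scaling is an assumption chosen to make the score easy to show
    beside an article; it is not taken from the session notes.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    per_100_words = penalty * 100.0 / word_count
    return max(0.0, 100.0 - per_100_words)


if __name__ == "__main__":
    rated = [RatedError("minor", "word order"), RatedError("major", "wrong verb")]
    print(quality_score(200, rated))
```

A score like this is only as consistent as the raters, so in practice one would probably average over several raters before displaying it.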

--

Asomudin QUESTION:

- Did you check what is the state of translation between English and another language in its family group?

Ed – most translation comes from the US defence department, which is why Eng-Arabic is equal to the much longer-studied Romance languages

--

George – LESSON 4 – how to show your users that content is translatable, and what happens to their translation – this is a major challenge

Anas – web 3.0 – cross-language web

- Need participatory nature of web 2.0 with cross-lingual browsing

- You tend to be part of networks that speak the language you speak.

- Social networks are reinforcing language based networks

George – growth of English is a challenge – users are asking why would I want to translate when I want to improve my English

Anas – big communities that don’t have English (65 percent in Saudi)

Ed – LESSON 5– why is content outside my language community important content for me to access?

We now have an opportunity to get content from people close to something that’s going on – being able to filter on location and time; content gets a first pass through MT, and a community of concerned users can come in behind it

We’ve had users translating Farsi tweets

---

George – LESSON 6: MT might still have a role, but only as part of the puzzle – perhaps MT will be used for scanning in the new real-time, fast-literacy web

Jiamin – we used Google Translate to help translate headlines, but people didn’t pay attention to those recommendations; they preferred human recommendations to searching through MT

People chose / paid attention to human recommended articles

--

Solana: QUESTION: How many translators do you have?

George – 50 core who come back, wider community beyond that – we need to get the word out, this is a big challenge

Solana – QUESTION: how far do translators guide the technology?

Anas – we try to spend lots of time developing the tools for translators on a technology side

Ed – as we get more volunteer translators

Ivan – QUESTION: we've talked about supply side, what about demand? How responsive are people? Are there any lessons there?

Ed – Google Analytics

George – key moments, Obama speech, Iran, Al Zaidi shoe

Ivan – what are your assumptions, are you working for readers or translators?

Anas – I think it’s both, we have a core plus a periphery who comment briefly and read, then passive receivers

Ivan – do you have any assumptions that you are putting out there?

George – comments, social translation

Anas – challenge of volunteering culture in the Middle East, income levels lower – affects input

You have the usual suspects: what are people’s understandings of their societies when they spend so much time on the web? They skew our vision of the blogosphere in that part of the world


Ivan – QUESTION: you create a platform, but that doesn’t mean people will be able to use it. A story is the rhetorical tool for getting from your space to something outside of you. People don’t want to share the minutiae of everything. What attracts me to something out of the Arabic language blogosphere is not just someone who reacts to an event, but someone who has a way of explaining something both within and outside their community – a rhetorical element there.

Ed – there is an enormous quality to reading something that you know wasn’t written for you, by an Arabic blogger for an Arabic audience – it’s something you weren’t supposed to know. A quality of looking into a place you didn’t know about.

Anas – bloggers tend to go further in their discussions when they are speaking in their own community.

--

George – the context is changing -> real time web, more Arabic on the web

--

Asomudin – what is of more interest, Arabic to English or English to Arabic?

Anas – huge mistrust of Western media in the Arab world, people tend to distance themselves; Al Hurra has zero viewership. They don’t trust their own media either (Ivan)

Anas – Al Arabiya v Al Jazeera

Anas – more interest from the English

George – musalsal thread on Bab al Harra got huge attention

Ed – LESSON learned – what it means for bloggers to write in English. This is a device – bloggers write things in English that they wouldn’t write in Arabic.

George – people ask, why do you need Meedan with so many cross-language papers? There are instances of mistranslation or of editing out stuff in Arabic

Do we have the right to do that? Are we endangering them? This could create unrest??

Ivan – you can’t control what you put out there, it will be translated

Anas – you need to challenge all assumptions

There is still need for Romanization, standardized Romanization

Fuzzy logic goes into language detection

--

Jiamin – Community translations: students aged 22 – 28 are a big constituency. They are young, they have passion, they have spare time, they have no need for money, they are interested in exploring – job opportunities.

Community is 5000 translators – readers don’t need to register. We have 5 million pageviews per month.

Solana – what are the most important stories?

Jiamin – technology, social stories

Solana – where do they get the tech stories?

Jiamin – most translators are volunteer, we have some income from specific projects

Solana – some editors of most of the 7/8 Lingua sites are paid a small fee to keep working with the volunteers. You can scale in some languages more than others – each language editor feels they have a representative community, but you get a different dynamic in each language. Great success in French and Spanish, Arabic is growing, Chinese is good – Taiwanese are very eager