The Narro project: optimization


Wednesday, October 5, 2011

Where does performance come from

First of all, the machine has to be good. You may be dealing with millions of rows in a table with indexes. Don't install it on your laptop.

Here's a comparison between two machines running the same database (over 4 million rows in the table):

A virtual machine: https://l10n.mozilla.org/narro/translate.php?l=vi&p=&f=&t=1&s=&o=&h=1&m=10&i=0#i

First page load: 9 seconds
Average load when scrolling down: 10 seconds

Cheap shared hosting: http://narro.alexxed.com/lmo2/translate.php?l=vi&p=28&f=&t=1&s=&o=&h=1&m=10&i=0

First page load: 3 seconds
Average load when scrolling down: 2 seconds

If you're thinking of having multiple active languages and huge translation projects, it might be best to have separate installations per language.

Here is what I can still try, and will try, to make it better:

- have a table for each language rather than all languages in one table
- play with indexes and see what works best
- try PostgreSQL

Query run for the tests above:

SELECT
    *
FROM `narro_context_info` AS `t0`
LEFT JOIN `narro_context` AS `t1` ON `t0`.`context_id` = `t1`.`context_id`
LEFT JOIN `narro_file` AS `t2` ON `t1`.`file_id` = `t2`.`file_id`
LEFT JOIN `narro_text` AS `t3` ON `t1`.`text_id` = `t3`.`text_id`
LEFT JOIN `narro_project` AS `t4` ON `t1`.`project_id` = `t4`.`project_id`
LEFT JOIN `narro_suggestion` AS `t5` ON `t0`.`valid_suggestion_id` = `t5`.`suggestion_id`
WHERE (
    (
        `t0`.`language_id` = 60 AND
        `t1`.`active` != 0 AND
        `t2`.`active` != 0
    ) AND
    `t1`.`project_id` = 28 AND
    `t0`.`has_suggestions` = 0
)
LIMIT 20
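
One way to "play with indexes" on a query shaped like the one above is to compare the query plan before and after adding an index. The sketch below is only an illustration: it uses SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL, a cut-down schema with just the columns the WHERE clause touches, and a hypothetical index name idx_info_lang_sugg. None of that comes from the real Narro schema, and MySQL's planner may well choose differently, so the same experiment would have to be repeated there with EXPLAIN.

```python
import sqlite3

# Stand-in schema (assumption): only the columns the WHERE clause touches.
# SQLite is used so the sketch runs anywhere; the real instance uses MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE narro_context (
    context_id INTEGER PRIMARY KEY,
    project_id INTEGER,
    active INTEGER
);
CREATE TABLE narro_context_info (
    context_info_id INTEGER PRIMARY KEY,
    context_id INTEGER,
    language_id INTEGER,
    has_suggestions INTEGER
);
""")

# A reduced version of the query from the post (two of the six tables).
QUERY = """
SELECT *
FROM narro_context_info AS t0
LEFT JOIN narro_context AS t1 ON t0.context_id = t1.context_id
WHERE t0.language_id = 60
  AND t1.active != 0
  AND t1.project_id = 28
  AND t0.has_suggestions = 0
LIMIT 20
"""


def plan(connection, sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the description


before = plan(conn, QUERY)

# Hypothetical composite index covering the two equality filters on t0.
conn.execute(
    "CREATE INDEX idx_info_lang_sugg "
    "ON narro_context_info (language_id, has_suggestions)"
)
after = plan(conn, QUERY)

print("before:", before)
print("after: ", after)
```

Comparing the two printed plans shows whether the planner actually picks up the new index. On the real database the timings from the page loads above would be the final judge, not the plan alone.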

Been there, done that? Your comments would help.

Friday, July 29, 2011

A preview of the next version

There's been some heavy rewriting going on.

After you log in, you'll have all the permissions, so feel free to experiment, but be careful and expect failures.

You can pick your favourite Firefox extension, add a project from the project list, and use the XPI as a source for both the texts and the translations (web link or upload, whatever you prefer).

The speed is much improved and the translation process is much easier now. You just open the translation page and start translating; each translation is saved automatically as soon as you move to the next text. You can use the Tab key to move forward.

Use scroll to load more texts. Or press the more button.

http://narro.alexxed.com/narro/translate.php?l=ro&p=10&f=&t=1&s=&o=&h=1&m=10&i=0#i

Wednesday, June 29, 2011

It's too complicated

I'm currently experimenting with important UI changes meant to simplify things. The ideas so far are:

- reduce the number of pages to the bare minimum; only one page for translating
- drop the pagination and load new content when scrolling down; it feels more natural to the user
- show only the original text and a text input for translating it; everything else only on demand

Any ideas or complaints are warmly welcomed!

Monday, May 9, 2011

Why do you have to go and make things so complicated?

Remember, localizers are usually volunteers. When I started Narro, my idea was to make their life easier by allowing them to receive translations from anyone and to pull/push translations without knowing SVN, Mercurial, or any other technical stuff.

The ideal setup would be:

The localizer gets the translation files through Narro import
Translation begins (offline or online)
The localizer pushes the translation files through Narro export

And no, it's not that I'd like to introduce a new tool instead of running svn commit; it's just that I'd like to spare people the nightmares of merging and maintaining branches when all they want to do is translate.

The current Mozilla setup is:

Pull the entire Mozilla repository (> 500MB)
Scan its directory for localization files and build through symlinks a folder structure that resembles the one that localizers need to push in their repository
Run an import from that folder
Translation begins
Translation is exported in XPI format for testing purposes
There's a script made by Axel Hecht, compare-locales, that does some validity checks, because translations can really break the product. To help, I run it after every export and post a link to a diff file that contains links to the texts that need fixing
Translation is exported in the structure that should be committed to Mercurial
The localizer needs to have Mercurial through SSH with key authentication set up
The localizer commits the structure exported from Narro (hg pull .../xx-XX.zip && unzip xx-XX.zip && hg commit && hg push)
The localizer checks the tinderbox for any build failures
The localizer checks the dashboard for any problems detected by compare-locales
The localizer does a signoff (pushes a button for the revision he wants to release) in the dashboard

I may have forgotten some steps, but obviously this is not the perfect setup. I'm keen to help make things easier, as I'm a localizer myself, but there are some blockers here:

I can commit via Narro, but that causes problems because several localizers commit with the same account. I can't let localizers commit with their own credentials because that would mean uploading their private SSH key, which defeats the purpose of an SSH key. In my experience, not needing a local Mercurial setup is a real advantage: you often want to fix only a few strings while you're on the road and don't feel like running hg pull, hg commit, hg push, or don't have Mercurial installed or your SSH key set up.
I can't access just the localization files. I have to pull the entire repository.
To build a language pack I need the whole repository.

Oh, there are the web pages as well. Fortunately I'm close to handling those in Narro as well.

Now don't think that Mozilla is the only software with such a complicated setup. But a localization tool should be able to cope even with this situation, because it's not a perfect world we live in. It's a challenge.

Thursday, September 23, 2010

Scale up

Narro was designed with one pair of languages in mind. And at that time, running an import on a big project on a modern computer for half an hour was acceptable. The only performance goal was to have a maximum page loading time of 1 second. There was no goal set for the import process.

Back then: 1 target language * 1 big project * 1/2 hour

Times are changing and the instance hosted on Mozilla grew to 61 languages. For each language I run an import on 4 big projects.

Right now: 61 target languages * 4 big projects * 1/2 hour
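
To put a number on the two products above, here is a minimal sketch of the arithmetic. It assumes the imports run one after another and that each takes the full half hour; the function name is made up for the illustration.

```python
# Figure taken from the post: about half an hour per import of one big project.
HOURS_PER_IMPORT = 0.5


def total_import_hours(languages, projects, hours_per_import=HOURS_PER_IMPORT):
    """Total time if every language/project pair is imported once, sequentially."""
    return languages * projects * hours_per_import


back_then = total_import_hours(languages=1, projects=1)
right_now = total_import_hours(languages=61, projects=4)

print(f"back then: {back_then} hours")
print(f"right now: {right_now} hours (about {right_now / 24:.1f} days)")
```

Under those assumptions a full round of imports goes from half an hour to 122 hours, roughly five days of continuous processing.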

As you can see, running the import process under these conditions is almost impossible, so the much-postponed step of optimizing the code is now absolutely necessary.

Note that the import process is running in background and eating up all the CPU power available on the server.

Fortunately, the code is ready for such optimizations, which are:

cache database results whenever possible
do stuff only when necessary
use regular expressions only when really needed

So that's what I'll be working on before doing any release.