Archive for tag “lang:en”
The Iron Man movie is all about programmers: being challenged and having to invent something that spares you manual work, the adventure of a project, molding, testing and debugging, and the joy of having a prototype inspire the next perfect design.
Just don’t miss it. The metal man movie has a lot of Heavy Metal music and Robert Downey Jr., and both fit perfectly.
Oh, and stay until after the final credits to see what the plans are for the next movie.
Isn’t this web app very neat? You must interact to feel it.
A post by Michael DeHaan and others inspired me to say a few words about the importance of ISO standards for developers:
Many developers are claiming that good specifications are more important than standards, especially now that the OOXML process has exposed to the public the dungeons of ISO processes and how the organization produces standards. This is a clear example of how ISO’s reputation is compromised.
But unfortunately “ISO standards” are what governments tend to use. These are the words they put in their Request For Proposals when they are going to buy things. These “ISO standards” are the words they use to claim how they’ll interoperate and trade across borders. And also how public institutions will interoperate with private institutions. Over time, it defines also how private institutions use technology.
So yes, ISO standards for Information Technology do matter from a developer standpoint because they are crucial in shaping the flow of information in the society in general.
Let’s put it this way: a technical specification stands for the research and development team, while being an ISO standard is equivalent to having a strong sales force. An IT product needs both to succeed.
So you, developer, can use whatever you want or like. But if you want to interoperate with non-developer folks, you (the smartest guy in this context) will have to use what non-developers use, and they use whatever carries the strongest marketing force, such as “this is an ISO standard”. This is why ISO standards (good or bad) should be in the focus of our attention.
Since the world already has an International Standards Organization, and since we learned it must be patched, it is our responsibility (we, the developers, geeks, sysadmins etc.) to be aware of and get involved with what ISO is standardizing right now, and to help the technical process of standardization happen solely with technical arguments and not political interests.
Otherwise the future world will still be divided in two groups of developers: those that develop with/for good stuff and those that develop with/for stuff used by non-developers just because a powerful company had the strength to standardize it whatever it takes.

There was an ISO/IEC/JTC1 meeting in Oslo, Norway on Wednesday, and the local community protested against OOXML and demanded more integrity in ISO’s processes. I couldn’t agree more.
This is my first post in the new WordPress 2.5.
Upgrade was as easy as:
$ cd avi.alkalay.net
$ svn sw http://svn.automattic.com/wordpress/tags/2.5/
Everything worked without any changes, including my experimental content-optimized Plasma theme.
The new WordPress has lots of improvements, especially in the administration part, and I recommend it.
I am learning a lot from all this standardization process.
I was a member of the Brazilian committee and I also analyzed the specification. My country disapproved OOXML and I think this was a decision based on logic, led by the process.
The JTC1 rules page 49 item 9.8 says:
NBs may reply in one of the following ways:
- Approval of the technical content of the DIS as presented (editorial or other comments may be appended);
- Disapproval of the DIS (or DAM) for technical reasons to be stated, with proposals for changes that would make the document acceptable (acceptance of these proposals shall be referred to the NB concerned for confirmation that the vote can be changed to approval);
- Abstention (see 9.1.2).
[Note: Conditional approval should be submitted as a disapproval vote.]
In other words, from my understanding, if there is one or more technical problems, the NB must disapprove the DIS. Many countries found many technical problems in OOXML that are still unresolved even after the BRM.
I also understand that such an important matter as an International Standard for Office Documents can’t be defined by 10 or 30 opinions collected as votes in a committee. That’s why the JTC1 process above talks about technical content, not opinion or vote. What I learned from studying the OOXML specification is that it is not ready for acceptance, since many countries have found, and reached consensus, that the spec has problems, even after the BRM. If the NB-leveraged technical team (formed by people who would vote YES and NO) has produced a list of submitted problems in the spec, this list is by itself the consensus that the spec is still problematic.
I would like to understand why an NB that has produced technical comments voted YES or ABSTAINED. I thought abstention is a position for countries that were not able to create a committee to technically discuss the specification for reasons such as logistics or lack of quorum.
I am learning about all this and I’d like to have more solid arguments to build an opinion about these NB’s maturity to run a well documented process.
Check it out:
From: “***FONDONORMA***” <in*&%@fondonorma.org.ve>
To: <ki*%$#@itscj.ipsj.or.jp>, <br*%$#%@iso.org>, <tak*%$#%@iso.org>, <ga*%$#%@iso.org>, “Maria Teresa Saccucci” <mar*%$#%@fondonorma.org.ve>, “Norma Arias” <norm*%$#%@fondonorma.org.ve>, “***FONDONORMA***” <in*%$#%@fondonorma.org.ve>
Date: Fri, 28 Mar 2008 14:04:26 -0430
Subject: Modification to the vote on DIS 29500 - Venezuela (FONDONORMA)
Toshiko Kimura
Secretary ISO/IEC JTC 1/SC 34
Att.: Mr. Keith Brannon, Mr. Maho Takahashi, Ms. Martine Gaillen
Dear Mr. Kimura,
Attached please find a letter from Mrs. María Teresa Saccucci, Standardization Manager of FONDONORMA (Venezuela) through which Venezuela wishes to modify its position on DIS 29500, Information technology - Office Open XML file formats from “Approval with comments” to “Disapproval with comments”.
Best regards,
Leonardo Di Bartolo
Coordinator of International Relations
FONDONORMA
Venezuela
This NO was the result of a very difficult meeting. Consensus was not reached and there was a lot of confusion, exactly as in the first Brazilian NO.
The NB had to decide the final vote based on the technical issues still open in OOXML. The problem was the method of deciding, and the fact that those technical points were not presented. So people only had in their heads the Microsoft arguments, which I already knew were part half-truths and part complete lies.
I participated in the OOXML process in 3 countries and was able to build an opinion on how standardization happens in the world today. I’ll share some ideas in a future post.
By the way, this is what Caracas looks like: a city surrounded by huge green mountains with the Caribbean Sea behind them.

Finally Jomar, one of the Brazilian delegates who went to OOXML’s BRM in Geneva, has started to tell all the dirty little details of what really happened in that meeting and the surreal modus operandi of how 120 people can discuss 1027 issues in 5 days. Have fun in English and Portuguese.
Oh, and talking about dirty playing, check the domain www.DocumentFreedomDay.com but remember that the original one is www.DocumentFreedom.org. The first one really deserves a DDoS attack.
The Open Malaysia blog has a user-friendly post with a map of the voting. See the tables and understand the voting criteria.
It is now official. Brazilian vote was decided by consensus of the entire technical team, including Microsoft crew’s: OOXML does not deserve to be an international ISO standard.
Our first vote, in August, was also NO, for the same reasons: OOXML is an awful specification.
That outcome was expected because we simply followed the process: technically analyze the OOXML specification, make comments, wait for responses, analyze them and see if all problems were fixed. Is there any single remaining unresolved problem? Vote NO. And in fact there were many many unresolved problems.
If every country followed this simple process, OOXML would receive a NO from 100% of them.
But how does the process work in some countries? Invite a few companies and simply count their votes. The problem here: 10, 20 or 80 votes can never represent what is best for that country. Only, maybe, if you collect one vote from each citizen.
What I am trying to say is that in this case a decision must be reached by technical consensus, not vote. It is not a matter of will, but a technical issue that can only be reached by rational analysis and deliberation.
In Chile, for example, 21 voting companies will define the ISO vote of a country of 15 million people. How easy is it to lobby these 21 companies with PowerPoint presentations telling complete plain lies?
Technically speaking, if your country’s vote was YES or ABSTENTION, one of these possibilities happened:
- Nobody had time to analyze the OOXML specification and the ABSTENTION was the right choice.
- Nobody had time to analyze the OOXML specification and a few people decided for you to vote YES, based on ideology or a result of lobby, not technology benefits.
- Even having time to analyze the OOXML specification, a few people decided for you to vote YES, based on ideology or result of lobby, not technology benefits.
Seems stupid, but these are exactly the 2 possibilities of OOXML getting YES votes in ISO. It is still an awful specification.
By the way, Brazil would vote NO again and again and again even if all OOXML’s technical issues could be resolved. OOXML would still have legal issues and also serious overlap problems with the OpenDocument Format ISO standard.
I was not present in today’s meeting in ABNT because I already knew what would be the result, since the process of analysis and deliberation in Brazil was very strictly followed. Hopefully Jomar will write about it and you can check more details.
Can Open Source Software be more ubiquitous than this ?
A few days ago I was playing extensively with Apple’s iPhone, investigating each sub-menu and little details. There is a section listing legal stuff and software being used with each license. GPL, LGPL, BSD and other Open Source licensed software rule the iPhone.
Some I have noted:
- BSD kernel
- PPPD
- poll emulation by Brian Clapper
- stack_protector by Hiroaki Etoh
- FreeBSD software
- libgcc
- libstdc++
- libm by Sun Microsystems
- libiconv
- ncurses
- FreeType
- zlib
- SHA2 by Aaron Gifford
- AES and SHA2 by Brian Gladman
- SQLite
- JPEG lib by Thomas Lane
- TIFF lib by SGI
- Kerberos, WebDAV, install-sh by MIT
- Spidermonkey by Netscape
- OpenSSH
- OpenSSL
- OSF’s DCE
- libpng
- Eric Raymond’s giflib
- bzip2
- libuuid by Theodore Ts’o
- Perl Compatible Regular Expressions
- libxml2
- tidylib
- WebKit
- ipsec-tools and racoon

This is real Interoperability
- Technical specifications for global interoperability and freedom
- Interoperate through standards
- Software product is less important
- Switch products and keep access
Microsoft started their so called “interoperability” initiative. With a deeper look we find that the main objective of the released specifications is to let developers interact with their products only, something I like to call Intra-operability.

This is Intraoperability
- Technical specifications for interaction with and favor the product
- Interoperate through the product
- Software product is the main player
- Switch product and lose access
See the difference?
Microsoft technical specifications have serious technical and legal issues, and are being released not to increase interoperability across people, organizations and applications, but to leverage the use of their products. For the standard document format scenario: OOXML is about allowing other people to interact with files that are primarily generated and manipulated with Microsoft Office. It is not about full interoperability, which would enable competition with Microsoft Office, for obvious reasons.
For each purpose or scope, better open standards exist and should be adopted and used instead: Java versus .NET, XHTML versus IE-compatible-only DHTML, ODF versus OOXML, etc.
My friend Cezar Taurion has also written some words about Intra-operability (in Portuguese). Bob Sutor also put together some words about it.
These graphics are available in a friendly Creative Commons license, in this animated ODF file, in case you want to integrate them in your presentations. PDF export also available.
One of the most critical and discussed points of the whole OOXML subject is how the specification lets you include binary proprietary information.
Let me show you how it happens with a piece of an OOXML document, the red-marked text is the problematic part (see for yourself, §6.2.2.14, paper page 4,813, lines 7–13):
<v:shape>
<o:ink i="AMgFHQSWC+YFASAAaAwAAAAAAMA…" annotation="t" contentType="application/x-ms-ink"/>
</v:shape>
So you, as a programmer, please tell me what to do if you are developing an application that must read and generate this kind of document. How can I find documentation about this encoded binary stream? Is this a good practice in XML? Anyway, this kind of (bad) design appears in many parts of the OOXML spec. Want more examples of bad design? Have fun.
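To make the programmer's problem concrete, here is a minimal Python sketch of what a reader of such a document can actually do. Everything in it is an assumption for illustration: the namespace URIs are added only so the fragment parses on its own, the payload is a made-up placeholder (not the truncated value quoted above), and treating the attribute as Base64 text is a guess based on how it looks.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed namespace URIs, declared only so the fragment is well-formed here.
VML_NS = "urn:schemas-microsoft-com:vml"
OFFICE_NS = "urn:schemas-microsoft-com:office:office"

# Made-up placeholder payload; NOT the real value from the specification.
PLACEHOLDER = base64.b64encode(bytes([1, 2, 3, 4, 5, 6])).decode("ascii")

FRAGMENT = (
    f'<v:shape xmlns:v="{VML_NS}" xmlns:o="{OFFICE_NS}">'
    f'<o:ink i="{PLACEHOLDER}" annotation="t" '
    f'contentType="application/x-ms-ink"/>'
    f"</v:shape>"
)


def extract_ink_payload(xml_text: str) -> bytes:
    """Return the raw bytes hidden in the ink element's 'i' attribute."""
    root = ET.fromstring(xml_text)
    ink = root.find(f"{{{OFFICE_NS}}}ink")
    # This is where XML stops helping: the attribute is just a string.
    return base64.b64decode(ink.get("i"))


if __name__ == "__main__":
    payload = extract_ink_payload(FRAGMENT)
    # All a generic XML tool can report is the length and the raw bytes.
    print(len(payload), "opaque bytes:", payload.hex())
```

The XML parser does its job perfectly and still hands back nothing but a blob. Interpreting those bytes needs documentation that the XML itself does not carry.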
Suppose I really want to develop this kind of support in my application: I am a master programmer and I reverse-engineered a few examples generated with a copy of MS Office 2007 that I had to buy. Or maybe I just found the specification of this proprietary application/x-ms-ink type and went on to develop a library to handle it.
Inevitably, my library will reimplement aspects of some Microsoft library with the same functionality, and according to a Software Freedom Law Center report, the Microsoft Open Specification Promise (OSP, basically the not-open-enough license Microsoft lawyers wrote for the usage and implementation of OOXML) covers the specification only, not code.
I will be sued for patent infringement. On software rewrite, not on specification usage.
Then the rape begins. Microsoft is claiming that this was solved in the ISO’s BRM meeting. This is the solution, from page 7 of the ridiculous 12-page resolutions document:
Resolution 25: The BRM decides to accept the editing instructions contained in http://www.itscj.ipsj.or.jp/sc34/def/BRM/Response_0135_bitfields.doc in place of R 135, replacing “deprecated” by “transitional”, and with the following addition: The Editor shall ensure that all existing attributes defining the bitfields described above shall be “transitional”—so resolved.
Anyone who reads this resolution from a high level may think that all binary fields will be removed from the specification. But please, you, as a programmer, tell me how the so-called Editor will find each and every part of a 6000+ page specification containing bitfields. How will he expand all those bitfields into an XML subspecification? Will he invent some? Is he the right person (or team) to do that?
When done, if done, the specification will be something completely new, full of new parts. It will jump from 6063 pages to maybe 7500. Oh, and did I mention it will be something that even MS Office 2007 or 2008 doesn’t support today? Supported or not, implemented or not, this new nonexistent specification is the “thing” that countries’ National Bodies are voting on right now, without even seeing it, without checking whether it was corrected, and without having time to do so because they didn’t receive it for review.
They won’t have time to review 6000+ pages because the deadlines defined by the ISO’s Fast Track process are over.
So the question is how ISO/IEC and JTC1 let such a big and problematic specification enter the light Fast Track process. My answer: ISO was raped.
I’ve been talking to several people who will define their country’s vote and their mindset is “are you really putting ISO in such a bad position?” Well, yes. You know, in the end ISO is not god. They are a bunch of people who, like you and me, have religions, aspirations, problems, family, go to the bathroom etc. Like you and me, they may also be naive in regard to some subjects, particularly document formats. Then come Microsoft’s excellent speakers showing PowerPoint charts that are plain lies (e.g. “OpenOffice.org supports OOXML”) and people believe them.
People have two choices: to question ISO’s reputation in the OOXML case, or to question the reputation of IBM, Sun, Oracle, Red Hat, the Software Freedom Law Center, the ODF Alliance and many other institutions when they massively scream in chorus that OOXML has serious technical, legal and standardization process issues.
This is a world where organizations like ECMA have completely lost the respect of the technical community. But this is not a big problem, because we have other similar, still reliable bodies such as OASIS, W3C, OSF, etc.
This is a world where we will have to work hard to make ISO regain its (currently damaged) reliable status. This is a big problem, because we only have one (and only need one) International Standards Organization.
This is your responsibility. To start this work get involved with your country’s National Body for standardization and promote the creation of a formal letter to ISO about the OOXML process, its problems and how ISO let that happen. INCITS in USA, ABNT in Brazil, INN in Chile etc.
The title was the phrase written on a CD left in a Crown Plaza Hotel room I stayed in these days.
Its excellent content is a relaxation program with very soft music to listen to and follow after you go to bed and before you fall asleep, with the aim of making you sleep better.
I found it so effective that I read more through the CD cover. It is produced by a company called SoundSleep with the help of a certain Dr. Michael Breus.
The cover also contained some other “holistic approach to sleep principles” which I also find useful and would like to share:
- Relax before bedtime. Stress can make you miserable and restless. Take some time for a pre-sleep ritual to break the connection between stress and bedtime. Try listening to the enclosed Sleep CD, reading, meditating, light stretching, lavender aromatherapy or a hot shower.
- Watch the caffeine. Coffee and many teas and sodas contain caffeine and may keep you up. If you’ve already had too much, consider eating some carbohydrates like bread or crackers to help reduce the effects.
- Watch the alcohol. Alcohol may initially help you fall asleep, but as your body clears it from your system, it can also cause nightmares, sweats and headache. Drink one glass of water for every alcoholic beverage consumed to try to reduce these symptoms.
- Exercise at the right time. Regular exercise relieves stress and encourages good sleep. However, if a little exercise really gets your blood pumping, you’d be wise to not work out in the evening or just before bedtime.
- Cut down on noise, light and extreme temperatures. Try earplugs, a night light, an eye mask or drape clip. The best temperature for sleep is 20 to 22°C (68 to 72°F).
- Eat right, and sleep tight. Avoid eating a large meal just before bedtime or going to bed hungry. It’s about balance. Some foods that promote sleep include: milk, pumpkin, artichokes, avocados, almonds, eggs, peaches, walnuts, apricots, oats, asparagus, potatoes and bananas.
- Understand jet lag. Before you cross time zones, try waking up later or earlier to help the body adjust to time difference. And remember, it takes a few days for the body to catch up.
- Remember the purpose of the bed. Avoid TV, eating, and emotional discussions in bed. The mind and body associate bedtime activities with being in bed. So don’t let a bad habit keep you awake.
- No drinks after 8 p.m. It’s a fact; most of us cannot simultaneously go to the bathroom and sleep. So shut down your fluid intake before 8 p.m. so you can get your rest.
- Nap smart. A little 20-minute power nap during the early part of the day can really be refreshing. But sleep too much, and you may spend the night staring at the ceiling.

This is an example of an old Excel bug that was transported to OOXML for “compatibility” reasons.
In the new OOXML, there are several non-standard ways to represent dates. The main one requires 1900 to be a leap year, contradicting the Gregorian calendar, which has been in use for centuries.
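The contradiction is easy to check. The Gregorian rule in this minimal Python sketch is the standard one; `legacy_is_leap` is a hypothetical helper name that merely mimics the behaviour described above (counting 1900 as a leap year), and is not code taken from any specification.

```python
import calendar


def gregorian_is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def legacy_is_leap(year: int) -> bool:
    """Hypothetical helper: same rule, but 1900 is forced to be a leap year."""
    return year == 1900 or gregorian_is_leap(year)


def days_in_february(year: int, is_leap) -> int:
    """Length of February under whichever leap-year rule is passed in."""
    return 29 if is_leap(year) else 28


if __name__ == "__main__":
    for rule in (gregorian_is_leap, legacy_is_leap):
        print(rule.__name__, "February 1900 has",
              days_in_february(1900, rule), "days")
    # Python's standard library agrees with the Gregorian calendar.
    print("calendar.isleap(1900):", calendar.isleap(1900))
```

Under the legacy rule a February 29, 1900 exists, so every day count that crosses that date disagrees by one with software that follows the real calendar.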
Around early 1995 I was thinking about giving up meat. I thought that would do good for my body and my mind.
Then, while waiting in a room, I randomly picked up a book and randomly opened it at the part that finally settled my decision. Later I saw many people becoming vegetarian because of the same part of the same book, so I found it important to share. The author was Sri Yukteswar and the book is The Holy Science, from 1894.
This is the part of the book that influenced my decision at that time (copied and pasted from Sain Louis’ Guide to Vegetarian Life):
What is natural food for man?
First, to select our natural food, our observation should be directed to the formation of the organs that aid in digestion and nutrition, the teeth and digestive canal; to the natural tendency of the organs of sense which guide animals to their food; and to the nourishment of the young.
Observation of teeth
By observation of the teeth we find that in carnivorous animals the incisors are little developed, but the canines are of striking length, smooth and pointed, to seize the prey. The molars also are pointed; these points, however, do not meet, but fit closely side by side to separate the muscular fibers.
In the herbivorous animals the incisors are strikingly developed, the canines are stunted (though occasionally developed into weapons, as in elephants), the molars are broad-topped and furnished with enamel on the sides only.
In the frugivorous all the teeth are of nearly the same height; the canines are little projected, conical, and blunt (obviously not intended for seizing prey but for exertion of strength). The molars are broad-topped and furnished at the top with enamel folds to prevent waste caused by their side motion, but not pointed for chewing flesh.
In omnivorous animals such as bears, on the other hand, the incisors resemble those of the herbivorous, the canines are like those of the carnivorous, and the molars are both pointed and broad-topped to serve a twofold purpose.
Now if we observe the formation of the teeth in man we find that they do not resemble those of the carnivores, neither do they resemble the teeth of the herbivorous or the omnivorous. They do resemble, exactly, those of the frugivorous animals. The reasonable inference, therefore, is that man is a frugivorous or fruit-eating animal.
Observation of the digestive canal
By observation of the digestive canal we find that the bowels of carnivorous animals are 3 to 5 times the length of their body, measuring from the mouth to the anus; and their stomach is almost spherical. The bowels of the herbivores are 20 to 28 times the length of their body and their stomach is more extended and of compound build. But the bowels of the frugivorous animals are 10 to 12 times the length of their body; their stomach is somewhat broader than that of the carnivorous and has a continuation in the duodenum serving the purpose of a second stomach.
This is exactly the formation we find in human beings, though anatomy says that the human bowels are 3 to 5 times the length of man’s body - making a mistake by measuring the body from the crown to the soles, instead of from mouth to anus. Thus we can again draw the inference that man is, in all probability, a frugivorous animal.
Observation of organs of sense
By observation of the natural tendency of the organs of sense - the guideposts for determining what is nutritious - by which all animals are directed to their food, we find that when the carnivorous animal finds prey, he becomes so much delighted that his eyes begin to sparkle; he boldly seizes the prey and greedily laps the jetting blood. On the contrary, the herbivorous animal refuses even his natural food, leaving it untouched, if it is sprinkled with a little blood. His senses of smell and sight lead him to select grasses and other herbs for his food, which he tastes with delight. Similarly with the frugivorous animals, we find that their senses always direct them to fruits of the trees and field.
In men of all races we find that their senses of smell, sound, and sight never lead them to slaughter animals; on the contrary they cannot bear even the sight of such killings. Slaughterhouses are always recommended to be removed far from the towns; men often pass strict ordinances forbidding the uncovered transportation of flesh meats. Can flesh then be considered the natural food of man, when both his eyes and his nose are so much against it, unless deceived by flavors of spices, salt, and sugar? On the other hand, how delightful do we find the fragrance of fruits, the very sight of which often makes the mouth water! It may also be noticed that various grains and roots possess an agreeable odor and taste, though faint, even when unprepared. Thus again, we are led to infer from these observations that man was intended to be a frugivorous animal.
Observation of the nourishment of the young
By observation of the nourishment of the young we find that milk is undoubtedly the food of the newborn babe. Abundant milk is not supplied in the breasts of the mother if she does not take fruits, grains, and vegetables as her natural food.
Cause of disease. Hence from these observations the only conclusion that can reasonably be drawn is that various grains, fruits, roots, and - for beverage - milk, and pure water openly exposed to air and sun are decidedly the best natural food for man. These, being congenial to the system when taken according to the power of the digestive organs, well chewed and mixed with saliva, are always easily assimilated.
Other foods are unnatural to man and being uncongenial to the system are necessarily foreign to it; when they enter the stomach, they are not properly assimilated. Mixed with the blood, they accumulate in the excretory and other organs not properly adapted to them. When they cannot find their way out, they subside in tissue crevices by the law of gravitation; and being fermented, produce diseases, mental and physical, and ultimately lead to premature death.
By the way, I am not 100% vegetarian today. I still have seafood very rarely, when it is very difficult to find veggie options in some fancy restaurants.
This is a numerical snapshot of the Microsoft Office Open XML wannabe specification:
- 6063 pages written in 12 months
- Only 6 months of analysis pointed out 3500 issues, ranging from intellectual property, to standard date & time representation, to malformed XML definitions, to binary proprietary bits
- 3500 issues were summarized into “only” 1027
- ECMA proposed 2293 pages of mutually conflicting changes to address those 3500 comments
- Only 30 days to analyze those 6063+2293 pages against those 3500 comments
- Only 5 days of BRM to discuss 6063+2293 pages
- Only 18% of these 1027 problems were discussed in the BRM
- 82% of problems were not even discussed
- BRM produced a short 12 pages document containing alterations to those 6063 pages long spec based on 2293+thousands of pages of comments from all countries+19 months of technical writing and global analysis work. By the way, those short 12 pages propose such nebulous alterations — as Resolution 25 on page 7 — that it is difficult to foretell how they will technically fit in the new specification. If applied, those alterations could turn the original spec upside down, so deeply that it could become a completely new specification, maybe even worse than the first one.
- Now countries must vote if they approve or not this still unfinished new OOXML that will be divided in 4 independent parts for strict conformance class+4 parts for transitional (deprecated) conformance class. These final multipart wannabe standards don’t exist yet and will be released about the same day countries will have to vote for it.
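The workload implied by these figures can be worked out in a few lines of Python. The inputs are the numbers from the list above; the variable names and the derived per-day rates are only my own back-of-the-envelope arithmetic.

```python
# Figures taken from the list above.
SPEC_PAGES = 6063        # original specification
CHANGE_PAGES = 2293      # proposed changes
REVIEW_DAYS = 30         # time given to analyze the changes
BRM_DAYS = 5             # length of the BRM
ISSUES = 1027            # summarized issues
DISCUSSED_SHARE = 0.18   # share of issues discussed in the BRM

total_pages = SPEC_PAGES + CHANGE_PAGES
pages_per_review_day = total_pages / REVIEW_DAYS
issues_per_brm_day = ISSUES / BRM_DAYS
issues_discussed = round(ISSUES * DISCUSSED_SHARE)
issues_not_discussed = ISSUES - issues_discussed

if __name__ == "__main__":
    print(f"Pages to read: {total_pages}")
    print(f"Pages per review day: {pages_per_review_day:.1f}")
    print(f"Issues per BRM day to cover them all: {issues_per_brm_day:.1f}")
    print(f"Issues discussed: about {issues_discussed}")
    print(f"Issues never discussed: about {issues_not_discussed}")
```

That is roughly 279 pages to read per day for a month, and more than 200 issues per day during the BRM to cover everything, which matches the observation that most issues were never discussed.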
Or graphically:

And you still think OOXML is ready to become an International Standard Format for storing YOUR documents ?
Come on, give me a break !
Some friends asked so the following is how I encode (rip) DVDs.
Choosing the file format: .AVI, .OGG, .MP4 or .MKV ?
The ripped video file format is a decision you must make. Currently my format of choice is .MKV or Matroska. I’ll explain why.
It is quite idiotic to say that an .MP4 movie has better quality than an .AVI or vice versa (or any other combination of comparisons). OGG, MP4 (MPEG-4 Part 14), MKV (Matroska), AVI, WMV (or ASF) are just containers, envelopes. Video quality depends on what goes inside them.
“Multimedia” has this name because you have multiple types of media: video in multiple angles, multiple audio options including different languages and channels (stereo, 5.1, 6 channels etc), subtitles in several languages, chapter information, menu etc. Think about a DVD. So this is a graphical view of how things are organized inside a 900MB movie file in a modern format such as MKV or MP4:
| Position in the file | Content |
|---|---|
| Start of the file, up to byte 100K | Header with tags, track names, chapters info, seek positions |
| Byte 100K to byte 895M, all tracks running in parallel (markers at bytes 100M, 200M, 310M, 420M, 530M, 650M and 780M) | Main Video track (MPEG-4 AVC/H.264) |
| | Video segment showing another angle (MPEG-4 ASP/Xvid/DivX) |
| | Audio track: English Dolby Surround 5.1 (AC3) |
| | Audio track: Director’s comments stereo (MP3) |
| | Audio track: Portuguese Dolby Surround 5.1 (DTS) |
| | Subtitle track: Portuguese (Unicode text) |
| | Subtitle track: Chinese (Unicode text) |
| | Subtitle track: English (VobSub) |
| Byte 895M to byte 900M | Attachments as JPG images, documents, scripts or text files |
A digital multimedia file format must be able to contain all these different media and multiplex them in parallel, so that you won’t have the video in the first 500MB of the file and the audio in the following 500MB (this can’t work for streaming). And this is exactly what modern file formats such as MP4 and MKV do: they carry all your movie-related data together.
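Multiplexing can be sketched in a few lines of Python. This is a toy model, not how any real MKV or MP4 muxer is written: the `Packet` type, the track names and the packet durations are all made up for illustration. The point is only that packets from every track are interleaved by timestamp, so a player reading the file from the start always has matching video and audio at hand.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Packet(NamedTuple):
    timestamp_ms: int  # when this packet must be presented
    track: str         # hypothetical track name, e.g. "video"
    payload: bytes     # encoded data (empty in this toy example)


def make_track(name: str, step_ms: int, count: int) -> List[Packet]:
    """Build a fake track with one packet every step_ms milliseconds."""
    return [Packet(i * step_ms, name, b"") for i in range(count)]


def multiplex(*tracks: Iterable[Packet]) -> Iterator[Packet]:
    """Interleave time-ordered tracks into a single time-ordered stream."""
    return heapq.merge(*tracks, key=lambda packet: packet.timestamp_ms)


if __name__ == "__main__":
    video = make_track("video", 40, 3)     # 25 frames per second
    audio = make_track("audio-en", 30, 3)  # made-up audio packet duration
    for packet in multiplex(video, audio):
        print(packet.timestamp_ms, packet.track)
```

Writing the tracks one after the other instead would give exactly the "500MB of video followed by 500MB of audio" layout that cannot be streamed.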
This is a comparison of all these file formats based on my personal experience with them (a more formal comparison can be found in Wikipedia):
| | .MKV | .MP4 | .AVI |
|---|---|---|---|
| Industry support | Almost none | Good and increasing, especially on Apple platforms, the mobile scene and Nero Digital ecosystem | Treated as legacy popular format |
| Usage on the web | Very popular on HD or high quality DVD rips | Very popular on HD or high quality DVD rips, supported by Flash Player, YouTube, Google Video | Popular amongst low-quality DVD rips |
| Support for advanced video formats and multiple video angles | Yes. MPEG-4 ASP (DivX, Xvid), MPEG-4 AVC (a.k.a. H.264) etc | Yes. Only MPEG-4 systems and a few others | Problematic and No |
| Support for multiple audio tracks (channels, formats, languages and “director’s comments”) | Yes | Yes. Formats are only MP3, AAC and a few others not very popular | Yes |
| Support for tags (artist, title, composer, etc as MP3’s ID3) | Yes | Can be supported by MP4 extensibility but this is not standardized across authoring tools (iTunes, GPAC etc) and players (Amarok, Media Player Classic, iPod, Windows Media Player etc) | No |
| Support for attachments with mime-types (used to attach movie posters images or other files) | Yes | No | |
| Support for chapter marks | Yes | No | |
| Support for multiple language embedded soft-subtitles | Yes. VobSub (as extracted from DVDs), plain timed UTF-8 text (SRT, SUB) etc | No | |
| Support for naming tracks with human-readable names such as “Director’s comments” or “Portuguese subtitles” etc | Yes | No | No |
| Support for menus (as in DVDs) and interaction | Yes through an XML idiom, but unsupported by most players | Yes through SVG, but unsupported by most players | No |
| The container overhead in bytes in the final file | Very small | Very small | Very big |
| Supported by free and Open Source multiplatform authoring tools | Perfect on Linux, Unix, Windows and Mac | Yes, with some intellectual property issues and tools need to mature | Yes |
Personally I believe MP4 is the multimedia file format of the future: as it gets more popular, all these unstandardized features will stabilize. MP4 is an ISO standard, and the increasing industry support can be felt on iPods and portable devices, and most notably on home DVD players capable of playing a 700MB MP4 video file burned on a CD.
By the way, remember this:
- MP4 is not an evolution of MP3. AAC (MPEG-4 Part 3) is.
- MP5 and MP6 (used to classify portable media players) are things that simply don’t exist in the multimedia scene.
- .M4A, .M4V, .MOV and .3GP files can safely be renamed to .MP4. MP4 is the generic standard name.
Meanwhile, MKV wins in every category except Industry Support. But this doesn’t really matter, and I’ll explain why. Since MKV is just a container, the large video, audio and other streams can be extracted and repackaged into MP4, and vice versa, in seconds. No transcoding (decoding followed by encoding into another format) is needed.
So today I store my videos in the most feature-rich format that is also well supported by players: MKV.
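The repackaging described above can be sketched as follows. This is only an illustration under my own assumptions: the choice of `mkvmerge` for writing Matroska and `ffmpeg` for everything else is mine, the file names are hypothetical, and the helper `remux_command` only prints the command instead of running it. Some stream types (certain subtitle formats, for instance) may not be accepted by the target container, in which case the copy fails and those streams have to be left out.

```shell
#!/bin/sh
# Print a command that repackages IN into OUT by copying the streams as they
# are (no decoding or re-encoding). The tool is picked from OUT's extension.
# remux_command IN OUT
remux_command() {
    in=$1
    out=$2
    case "$out" in
        *.mkv|*.MKV)
            # mkvmerge reads MP4 (and other containers) and writes Matroska.
            printf 'mkvmerge -o "%s" "%s"\n' "$out" "$in" ;;
        *)
            # ffmpeg: -map 0 selects every stream, -c copy avoids transcoding.
            printf 'ffmpeg -i "%s" -map 0 -c copy "%s"\n' "$in" "$out" ;;
    esac
}

remux_command movie.mp4 movie.mkv
remux_command movie.mkv movie.mp4
```

Because nothing is re-encoded, the work is essentially a file copy, which is why it takes seconds rather than hours.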
OGG or OGM (the container file format) is practically dead in my opinion. It was created as part of the Xiph initiative for a complete open source, patent-free multimedia framework, but it seems nobody uses it for video anymore. From the same family, Vorbis (the audio codec comparable to MP3, a.k.a. .OGG) is very good but not very popular. Theora (the video codec) is frequently comparable to the old MPEG-1 in terms of quality and compression ratio, so currently, if you want quality and are not concerned about patents, MPEG-4 AVC is the best choice. FLAC, Xiph’s lossless audio codec, is the winner of the family: very popular, massively used, and recommended.
Encoding the DVD
I use HandBrake, the most practical Open Source (and overall) movie encoder. It runs on Linux, Mac and Windows and uses the same Open Source libraries as ffmpeg, mplayer/mencoder, xine, etc. While these programs are general-purpose video handlers (with thousands of confusing configuration parameters to sustain that generality), HandBrake is optimized only for ripping, so it is very easy to use, yet extremely powerful.
#!/bin/bash
##
## This is the script I use to make hifi DVD rips including chapter markers and
## subtitles. It uses Handbrake.
## Contains what I found to be the best quality ripping parameters and
## also let me set simple parameters I need.
##
## Avi Alkalay <avi at unix dot sh>
## http://avi.alkalay.net/2008/03/mpeg4-dvd-rip.html
##
## $Id$
##
#set -vx
HANDBRAKE=${HANDBRAKE:=~/bin/HandBrakeCLI}
#HANDBRAKE=${HANDBRAKE:="/cygdrive/c/Program Files/Handbrake/HandBrakeCLI.exe"}
## Where the HandBrake encoder executable is.
## HandBrake is the most practical free, OSS, DVD ripper available.
## Download HandBrake for Linux, Mac or Windows at http://HandBrake.fr
INPUT=${INPUT:=/dev/dvd}
## What to process. Can also be a mounted DVD image or simply '/dev/dvd'
TITLE=${TITLE:=L}
## The title number to rip, or empty or "L" to get the longest title
#CHAPTERS=${CHAPTERS:=7}
## Example: 0 or undefined (all chapters), 7 (only chapter 7), 3-6 (chapters 3 to 6)
#VERBOSE=${VERBOSE:="yes"}
## Whether to be verbose while processing.
SIZE=${SIZE:=1200}
## Target file size in MB. The bigger the file, the better the quality.
## I usually use 1000MB to 1400MB for astonishingly high quality H.264 rips.
OUTPUT=${OUTPUT:="/tmp/output.mkv"}
## Output file. This will also define the file format.
## MKV (Matroska) is currently the best but MP4 is also good.
AUDIO=${AUDIO:="-E ac3 -6 dpl2 -D 1"} # For AC3 passthru (copy).
#AUDIO=${AUDIO:="-E lame -B 160"} # For MP3 reencoding. Good when input is DTS.
## Audio parameters. If input is AC3, use it without transcoding.
## If is DTS, reencode to MP3.
MATRIX=${MATRIX:=$(dirname "$0")/eqm_avc_hr.cfg}
## x264 matrix to use. The matrix file may increase encoding speed and quality.
## This one is Sharktooth's as found
## at http://forum.doom9.org/showthread.php?t=96298
######### Do not change anything below this line ##############
## Make some calculations regarding title and chapters based on parameters.
SEGMENT=""
if [[ "$TITLE" == "L" || -z "$TITLE" ]]; then
SEGMENT="-L"
else
SEGMENT="-t $TITLE"
fi
[[ -n "$CHAPTERS" && "$CHAPTERS" != "0" ]] && SEGMENT+=" -c $CHAPTERS"
[[ "$VERBOSE" != "no" ]] && VERB="-v"
# Define args for the x264 encoder. These are some values I found on the net
# which give excellent results.
X264ARGS="ref=3:mixed-refs:bframes=6:b-pyramid=1:bime=1:b-rdo=1:weightb=1"
X264ARGS+=":analyse=all:8x8dct=1:subme=6:me=umh:merange=24:filter=-2,-2"
X264ARGS+=":ref=6:mixed-refs=1:trellis=1:no-fast-pskip=1"
X264ARGS+=":no-dct-decimate=1:direct=auto"
[[ -n "$MATRIX" ]] && X264ARGS+=":cqm=$MATRIX"
# Encode…
"$HANDBRAKE" $VERB -i "$INPUT" -o "$OUTPUT" \
-S "$SIZE" \
-m $SEGMENT \
$AUDIO \
-e x264 -2 -T -p \
-x "$X264ARGS"
# Repackage to optimize file size, to include seek and to include
# this script as a way to document the rip…
# Only repackage Matroska output, and only if mkvmerge is installed.
if [[ "$OUTPUT" == *.[mM][kK][vV] && -f "$OUTPUT" ]] && command -v mkvmerge >/dev/null; then
    mv "$OUTPUT" "$OUTPUT.mkv"
    mkvmerge -o "$OUTPUT" "$OUTPUT.mkv" \
        --attachment-name "The ripping script" \
        --attachment-description "How this movie was created from original DVD" \
        --attachment-mime-type application/x-sh \
        --attach-file "$0"
    [[ -f "$OUTPUT" ]] && rm "$OUTPUT.mkv"
fi
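Since the script fixes the target file size, the average bitrate follows from the length of the movie. The arithmetic below is a rough sketch under stated assumptions: 1 MB is taken as 1,000,000 bytes, 1 kbit as 1000 bits, container overhead is ignored, and the 2-hour duration is a made-up example. HandBrake's own accounting may differ slightly, and the helper names are mine.

```shell
#!/bin/sh
# Rough bitrate budget for a target file size.
# Assumes 1 MB = 1,000,000 bytes and 1 kbit = 1000 bits, no container overhead.

# total_kbps SIZE_MB DURATION_SECONDS -> average bitrate of the whole file
total_kbps() {
    echo $(( $1 * 8000 / $2 ))
}

# video_kbps SIZE_MB DURATION_SECONDS AUDIO_KBPS -> what is left for video
video_kbps() {
    echo $(( $(total_kbps "$1" "$2") - $3 ))
}

# A hypothetical 2-hour movie (7200 s) at the script's default SIZE=1200,
# with the 160 kbit/s MP3 audio option:
total_kbps 1200 7200        # prints 1333
video_kbps 1200 7200 160    # prints 1173
```

The same size spread over a shorter movie leaves more bits per second for the video, which is why a fixed SIZE gives different quality on different titles.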
The script seems long because it is fully documented but it actually only collects some parameters and simply runs the HandBrake encoder like this (passed parameters are in red):
~/bin/HandBrakeCLI -v -i /dev/dvd -o /tmp/output.mkv \
-S 1200 \
-m -L \
-E lame -B 160 \
-e x264 -2 -T -p \
-x ref=3:mixed-refs:bframes=6:b-pyramid=1:bime=1:b-rdo=1:weightb=1:analyse=all:8x8dct=1:subme=6:me=umh:merange=24:filter=-2,-2:ref=6:mixed-refs=1:trellis=1:no-fast-pskip=1:no-dct-decimate=1:direct=auto:cqm=~/src/randomscripts/videotools/eqm_avc_hr.cfg
All the rest is what I found to be the best encoding parameters.
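The way the script collects its parameters relies on the shell's `${VAR:=default}` expansion, which assigns the default only when the variable is unset or empty. The sketch below isolates that mechanism using three of the script's variables; the `settings` function is a hypothetical helper that exists only for this demonstration.

```shell
#!/bin/sh
# ${VAR:=default} assigns the default only when VAR is unset or empty,
# so a value provided by the caller wins over the one in the script.

settings() {
    # The ":" no-op just triggers the expansions (and thus the assignments).
    : "${TITLE:=L}" "${SIZE:=1200}" "${OUTPUT:=/tmp/output.mkv}"
    echo "title=$TITLE size=$SIZE output=$OUTPUT"
}

# Run in subshells so each call starts from its own set of variables.
( unset TITLE SIZE OUTPUT; settings )              # all defaults
( unset TITLE OUTPUT; SIZE=1400; settings )        # caller overrides SIZE
```

Assuming the ripping script is saved under a hypothetical name such as `dvdrip.sh`, the same mechanism lets you override any parameter from the command line, for example `TITLE=2 SIZE=1400 OUTPUT=/tmp/movie.mkv ./dvdrip.sh`.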
The resulting video file (/tmp/output






