Good luck to Kyndryl

To all friends that I’ve worked with at IBM and that are now moving to Kyndryl, I wish you success and good luck. The Cloud and IT services opportunity will continue to be huge forever. The countdown you have promoted here was warm and vibrant.

For the still-on-IBM friends, please keep on doing such a great company that always was and continues to be a brilliant reference to the world, not just IT. IBM is an unforgettable school for me and for anybody else that has spent even just a minute working there.

Business worldwide, as we know it, is shaped by companies such as IBM, even if you’ve never heard about it (well, that’s quite impossible).

Also on LinkedIn.

How programmers should record time

We the data people immediately identify a poorly designed system when we see it handling date and time as plain local time, instead of the number of seconds since January 1st 1970 of time zone 0.

  • This post was published on 1,626,425,523 (UTC, always UTC).
  • Jesus was born -62,399,513,432.
  • Man visited the moon between -14,552,880 and 93,172,200.
  • And so on…

Just your daily dose of nerdy facts…

Also on my LinkedIn

What means to be Driven By Data

I’ve seen companies saying they have Big Data because they implemented Hadoop or a data lake and maybe Spark.

That’s just wrong.

Big Data, or more precisely, to be Data Driven, is a state where the data a company produces can be reused, as soon as possible, to optimize itself. And there are many ways to reuse data: all meetings and decisions happen with abundance of data, or recently generated data instantly feeds machine learning algorithms to optimize transactions, just to name a few situations.

To be Driven by Data is part culture and part infrastructure. On the infrastructure side, IT teams still struggle with limited visions about how data should flow pervasively and how access should be granted. They fear about security and performance while they should fear of missing out the data opportunity.

Data Streaming is a breakthrough recent technology that is here to help with more fluent data access. For an agile and effective data architecture, Data Streaming is much more strategic and important than just a bigger data warehouse because it is the component that can unleash your data and finally make it useful.

On my LinkedIn

What is Apache Spark

Apache Spark is like Python’s Pandas and is like SQL databases. It can manipulate datasets, filter, integrate, transform.

But Spark was designed from scratch with horizontal scalability and parallelism in mind, which makes it capable of handling datasets with billions or even unknown number of rows — even if a bit less flexible than Pandas.

This is not new in the industry. Enterprise editions of commercial SQL databases are parallel and scalable since a very long time, being also very expensive in all levels of the stack: service/support, software and hardware.

But Spark is free software. And can use Hadoop — also a free software — as scalable and highly available storage, on cheap commodity hardware. In addition, it has a vibrant community and a democratic ecosystem of services and support.

As with all Open Source, Apache Spark changes the economic landscape of massive data processing systems market, taking money out of a few proprietary HW and SW vendors and pulverizing it locally on people and support.

From my LinkedIn

List of Hard Skills for Data Professionals

2020 list of desired hard skills for data professionals. From the most essential to the more difficult ones.

  1. The English language
  2. SQL
  3. Spreadsheets
  4. Descriptive Statistics (median, variance, correlation etc)
  5. Notions of Data visualization
  6. Notions of Time Series
  7. Handling computer files and folders (this one entered the list because we observed many people simply don’t have it)
  8. Notions of digital information storage (numbers and their limits, time, time zones, text, Unicode, compression)
  9. Probability
  10. Probability Distributions
  11. Linear and Logistic Regressions
  12. Python libraries ecosystem, pip, PyPi
  13. Python’s Pandas, DataFrame and Series wrangling
  14. Linux and the computer command line
  15. NoSQL, JSON, YAML, XML, SVG, APIs, HTTP, protocols and data representation
  16. Cloud and infrastructure as code
  17. Notions of symmetric and asymmetric cryptography, digital signatures and applications
  18. “Big data” systems (Hadoop, Spark)
  19. Software Engineering (classes, modularisation, versioning, containerisation, packaging, DevOps)
  20. Inferential Statistics (confidence intervals, hypothesis testing)
  21. Machine Learning algorithms for regression and classification
  22. Calculus and Numerical Calculus (integrals, derivaties)
  23. Natural Language Processing
  24. Computer vision
  25. Neural Networks

Please remember this list has only hard skills. Ethics, domain and industry knowledge, communication are very important soft skills that won’t fit in this list.

Generally speaking, beginning of the list is where Data Analysts are (up to ≈11). Data Engineers get up to the middle of list (up to ≈18). And Scientists get all the list.

There is also the following graph that I’ve produced:

data professions competencies

Jupyter and Data Science on a Mac (without Anaconda)

macOS Catalina doesn’t ship with Python 3, only 2. But you can still get 3 from Apple, updated regularly through system’s official update methods. You don’t need to get the awful Anaconda on you Mac to play with Python.

Python 3 is shipped by Xcode Command Line Tools. To get it installed (without the heavy Xcode GUI), type this in your terminal:

xcode-select --install

This way, every time Apple releases an update, you’ll get it.

Settings window will pop so wait 5 minutes for the installation to finish.

If you already have complete Xcode installed, this step was unnecessary (you already had Python 3 installed) and you can continue to the next section of the tutorial.

Clean Old Python Modules

In case you already have Python installed under your user and modules downloaded with pip, remove it:

rm -rf ${HOME}/Caches/com.apple.python/${HOME}/Library/Python \
${HOME}/Library/Python/ \
${HOME}/Library/Caches/pip

Install Python Modules

Now that you get a useful Python 3 installation, use pip3 to install Python modules that you’ll need. Don’t forget to use –user to get things installed on your home folder so you won’t pollute your overall system. For my personal use, I need the complete machine learning, data wrangling and Jupyter suite:

pip3 install --user sqlalchemy
pip3 install --user matplotlib
pip3 install --user pandas
pip3 install --user jupyterlab
pip3 install --user PyMySQL
pip3 install --user configobj
pip3 install --user requests
pip3 install --user seaborn
pip3 install --user bs4
pip3 install --user xgboost
pip3 install --user scikit_learn

But you might need other things as Django or other sqlalchemy drivers. Set yourself at home and install them with pip3.

For modules that require compilation and special library, say crypto, do it like this:

CFLAGS="-I/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/include" \
LDFLAGS="-L/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib" \
pip3 install --user pycrypto

Use Correct Python 3 Binary

For some reason, Apple installs many different Python 3 binaries in different places of the system. The one that gets installed on /usr/bin/python3 has problems loading some libraries and instrumentation with install_name_tool would be required. So lets just use the binary that works better:

export PATH=/Library/Developer/CommandLineTools/usr/bin:$PATH

Run Jupyter Lab on your Mac

Commands installed by pip3 will be available in the ~/Library/Python/3.7/bin/ folder, so just add it to your PATH:

export PATH=$PATH:~/Library/Python/3.7/bin/

Now I can simply type jupyter-lab anywhere in the terminal or command line to make it fire my browser and get a Jupyter environment.

More about Xcode Command Line Tools

Xcode Command Line Tools will get you a full hand of other useful developer tools, such as git, subversion, GCC and LLVM compilers and linkers, make, m4 and a complete Python 3 distribution. You can see most of its installation on /Library/Developer/CommandLineTools folder.

For production and high end processing I’ll still use Python on Linux with my preferred distribution’s default packages (no Anaconda). But this method of getting Python on macOS is fastest and cleanest to get you going on your own data scientist laptop without a VM nor a container.

Jupyter Notebook on Fedora with official packages and SSL

Jupyter Notebooks are the elegant way that Data Scientists work and all software needed to run them are already pre-packaged on Fedora (and any other Linux distribution). It is encouraged to use your distribution’s packaging infrastructure to install Python packages. Avoid at any cost installing Python packages with pip, conda, anaconda and from source code. The reasons for this good practice are security, ease of use, to keep the system clean and to make installation procedures easily reproducible in DevOps scenarios.

Jupyter Notebook on Fedora with MathJax and Python
Jupyter Notebook on Fedora with MathJax and Python

Read More

The difference between Open Source and Open Governance

When Sun and then Oracle bought MySQL AB, the company behind the original development, MySQL open source database development governance gradually closed. Now, only Oracle writes updates. Updates from other sources — individuals or other companies — are ignored. MySQL is still open source, but it has a closed governance.

MySQL is one of the most popular databases in the world. Every WordPress and Drupal website runs on top of MySQL, as well as the majority of generic Ruby, Django, Flask and PHP apps which have MySQL as their database of choice.

When an open source project becomes this popular and essential, we say it is gaining momentum. MySQL is so popular that it is bigger than its creators. In practical terms, that means its creators can disappear and the community will take over the project and continue its evolution. It also means the software is solid, support is abundant and local, sometimes a commodity or even free.

In the case of MySQL, the source code was forked by the community, and the MariaDB project started from there. Nowadays, when somebody says he is “using MySQL”, he is in fact probably using MariaDB, which has evolved from where MySQL stopped in time.

Open source vs. open governance

Open source software’s momentum serves as a powerful insurance policy for the investment of time and resources an individual or enterprise user will put into it. This is the true benefit behind Linux as an operating system, Samba as a file server, Apache HTTPD as a web server, Hadoop, Docker, MongoDB, PHP, Python, JQuery, Bootstrap and other hyper-essential open source projects, each on its own level of the stack. Open source momentum is the safe antidote to technology lock-in. Having learned that lesson over the last decade, enterprises are now looking for the new functionalities that are gaining momentum: cloud management software, big data, analytics, integration middleware and application frameworks.

On the open domain, the only two non-functional things that matter in the long term are whether it is open source and if it has attained momentum in the community and industry. None of this is related to how the software is being written, but this is exactly what open governance is concerned with: the how.

Open source governance is the policy that promotes a democratic approach to participating in the development and strategic direction of a specific open source project. It is an effective strategy to attract developers and IT industry players to a single open source project with the objective of attaining momentum faster. It looks to avoid community fragmentation and ensure the commitment of IT industry players.

The value of momentum

Open governance alone does not guarantee that the software will be good, popular or useful (though formal open governance only happens on projects that have already captured some attention of IT industry leaders). A few examples of open source projects that have formal open governance are CloudFoundry, OpenStack, JQuery and all the projects under the Apache Software Foundation umbrella.

For users, the indirect benefit of open governance is only related to the speed the open source project reaches momentum and high popularity.

Open governance is important only for the people looking to govern or contribute. If you just want to use, open source momentum is far more important.

Also published on Thoughts on Cloud.

Also published on LinkedIn.

Advanced Multimedia on the Linux Command Line

There was a time that Apple macOS was the best platform to handle multimedia (audio, image, video). This might be still true in the GUI space. But Linux presents a much wider range of possibilities when you go to the command line, specially if you want to:

  • Process hundreds or thousands of files at once
  • Same as above, organized in many folders while keeping the folder structure
  • Same as above but with much fine grained options, including lossless processing, pixel perfectness that most GUI tools won’t give you

The Open Source community has produced state of the art command line tools as ffmpeg, exiftool and others, which I use every day to do non-trivial things, along with Shell advanced scripting. Sure, you can get these tools installed on Mac or Windows, and you can even use almost all these recipes on these platforms, but Linux is the native platform for these tools, and easier to get the environment ready.

These are my personal notes and I encourage you to understand each step of the recipes and adapt to your workflows. It is organized in Audio, Video and Image+Photo sections.

I use Fedora Linux and I mention Fedora package names to be installed. You can easily find same packages on your Ubuntu, Debian, Gentoo etc, and use these same recipes.

Audio

Show information (tags, bitrate etc) about a multimedia file

ffprobe file.mp3
ffprobe file.m4v
ffprobe file.mkv

Lossless conversion of all FLAC files into more compatible, but still Open Source, ALAC

ls *flac | while read f; do
	ffmpeg -i "$f" -acodec alac -vn "${f[@]/%flac/m4a}" < /dev/null;
done

Convert all FLAC files into 192kbps MP3

ls *flac | while read f; do
   ffmpeg -i "$f" -qscale:a 2 -vn "${f[@]/%flac/mp3}" < /dev/null;
done

Convert all FLAC files into ~256kbps VBR AAC with Fraunhofer AAC encoder

First, make sure you have Negativo17 build of FFMPEG, so run this as root:

dnf config-manager --add-repo=http://negativo17.org/repos/fedora-multimedia.repo
dnf update ffmpeg

Now encode:

ls *flac | while read f; do
   ffmpeg -i "$f" -vn -c:a libfdk_aac -vbr 5 -movflags +faststart "${f[@]/%flac/m4a}" < /dev/null;
done

Has been said the Fraunhofer AAC library can’t be legally linked to ffmpeg due to license terms violation. In addition, ffmpeg’s default AAC encoder has been improved and is almost as good as Fraunhofer’s, specially for constant bit rate compression. In this case, this is the command:

ls *flac | while read f; do
   ffmpeg -i "$f" -vn -c:a aac -b:a 256k -movflags +faststart "${f[@]/%flac/m4a}" < /dev/null;
done

Same as above but under a complex directory structure

This is one of my favorites, extremely powerful. Very useful when you get a Hi-Fi, complete but useless WMA-Lossless collection and need to convert it losslesslly to something more portable, ALAC in this case. Change the FMT=flac to FMT=wav or FMT=wma (only when it is WMA-Lossless) to match your source files. Don’t forget to tag the generated files.

FMT=flac
# Create identical directory structure under new "alac" folder
find . -type d | while read d; do
   mkdir -p "alac/$d"
done

find . -name "*$FMT" | sort | while read f; do
   ffmpeg -i "$f" -acodec alac -vn "alac/${f[@]/%$FMT/m4a}" < /dev/null;
   mp4tags -E "Deezer lossless files (https://github.com/Ghostfly/deezDL) + 'ffmpeg -acodec alac'" "alac/${f[@]/%$FMT/m4a}";
done

Embed lyrics into M4A files

iPhone and iPod music player can display the file’s embedded lyrics and this is a cool feature. There are several ways to get lyrics into your music files. If you download music from Deezer using SMLoadr, you’ll get files with embedded lyrics. Then, the FLAC to ALAC process above will correctly transport the lyrics to the M4A container. Another method is to use beets music tagger and one of its plugins, though it is very slow for beets to fetch lyrics of entire albums from the Internet.

The third method is manual. Let lyrics.txt be a text file with your lyrics. To tag it into your music.m4a, just do this:

mp4tags -L "$(cat lyrics.txt)" music.m4a

And then check to see the embedded lyrics:

ffprobe music.m4a 2>&1 | less

Convert APE+CUE, FLAC+CUE, WAV+CUE album-on-a-file into a one file per track ALAC or MP3

If some of your friends has the horrible tendency to commit this crime and rip CDs as 1 file for entire CD, there is an automation to fix it. APE is the most difficult and this is what I’ll show. FLAC and WAV are shortcuts of this method.

  1. Make a lossless conversion of the APE file into something more manageable, as WAV:
    ffmpeg -i audio-cd.ape audio-cd.wav
  2. Now the magic: use the metadata on the CUE file to split the single file into separate tracks, renaming them accordingly. You’ll need the shnplit command, available in the shntool package on Fedora (to install: yum install shntool). Additionally, CUE files usually use ISO-8859-1 (Latin1) charset and a conversion to Unicode (UTF-8) is required:
    iconv -f Latin1 -t UTF-8 audio-cd.cue | shnsplit -t "%n · %p ♫ %t" audio-cd.wav
  3. Now you have a series of nicely named WAV files, one per CD track. Lets convert them into lossless ALAC using one of the above recipes:
    ls *wav | while read f; do
       ffmpeg -i "$f" -acodec alac -vn "${f[@]/%wav/m4a}" < /dev/null;
    done

    This will get you lossless ALAC files converted from the intermediary WAV files. You can also convert them into FLAC or MP3 using variations of the above recipes.

Now the files are ready for your tagger.

Video

Add chapters and soft subtitles from SRT file to M4V/MP4 movie

This is a lossless and fast process, chapters and subtitles are added as tags and streams to the file; audio and video streams are not reencoded.

  1. Make sure your SRT file is UTF-8 encoded:
    bash$ file subtitles_file.srt
    subtitles_file.srt: ISO-8859 text, with CRLF line terminators
    

    It is not UTF-8 encoded, it is some ISO-8859 variant, which I need to know to correctly convert it. My example uses a Brazilian Portuguese subtitle file, which I know is ISO-8859-15 (latin1) encoded because most latin scripts use this encoding.

  2. Lets convert it to UTF-8:
    bash$ iconv -f latin1 -t utf8 subtitles_file.srt > subtitles_file_utf8.srt
    bash$ file subtitles_file_utf8.srt
    subtitles_file_utf8.srt: UTF-8 Unicode text, with CRLF line terminators
    
  3. Check chapters file:
    bash$ cat chapters.txt
    CHAPTER01=00:00:00.000
    CHAPTER01NAME=Chapter 1
    CHAPTER02=00:04:31.605
    CHAPTER02NAME=Chapter 2
    CHAPTER03=00:12:52.063
    CHAPTER03NAME=Chapter 3
    …
    
  4. Now we are ready to add them all to the movie along with setting the movie name and embedding a cover image to ensure the movie looks nice on your media player list of content. Note that this process will write the movie file in place, will not create another file, so make a backup of your movie while you are learning:
    MP4Box -ipod \
           -itags 'track=The Movie Name:cover=cover.jpg' \
           -add 'subtitles_file_utf8.srt:lang=por' \
           -chap 'chapters.txt:lang=eng' \
           movie.mp4
    

The MP4Box command is part of GPac.

OpenSubtitles.org has a large collection of subtitles in many languages and you can search its database with the IMDB ID of the movie. And ChapterDB has the same for chapters files.

Add cover image and other metadata to a movie file

Since iTunes can tag and beautify your movie files in Windows and Mac, libmp4v2 can do the same on Linux. Here we’ll use it to add the movie cover image we downloaded from IMDB along with some movie metadata for Woody Allen’s 2011 movie Midnight in Paris:

mp4tags -H 1 -i movie -y 2011 -a "Woody Allen" -s "Midnight in Paris" -m "While on a trip to Paris with his..." "Midnight in Paris.m4v"
mp4art -k -z --add cover.jpg "Midnight in Paris.m4v"

This way the movie file will look good and in the correct place when transferred to your iPod/iPad/iPhone.

Of course, make sure the right package is installed first:

dnf install libmp4v2

File extensions MOV, MP4, M4V, M4A are the same format from the ISO MPEG-4 standard. They have different names just to give a hint to the user about what they carry.

Decrypt and rip a DVD the loss less way

  1. Make sure you have the RPMFusion and the Negativo17 repos configured
  2. Install libdvdcss and vobcopy
    dnf -y install libdvdcss vobcopy
  3. Mount the DVD and rip it, has to be done as root
    mount /dev/sr0 /mnt/dvd;
    cd /target/folder;
    vobcopy -m /mnt/dvd .

You’ll get a directory tree with decrypted VOB and BUP files. You can generate an ISO file from them or, much more practical, use HandBrake to convert the DVD titles into MP4/M4V (more compatible with wide range of devices) or MKV/WEBM files.

Convert 240fps video into 30fps slow motion, the loss-less way

Modern iPhones can record videos at 240 or 120fps so when you’ll watch them at 30fps they’ll look slow-motion. But regular players will play them at 240 or 120fps, hiding the slo-mo effect.

We’ll need to handle audio and video in different ways. The video FPS fix from 240 to 30 is loss less, the audio stretching is lossy.

# make sure you have the right packages installed
dnf install mkvtoolnix sox gpac faac
#!/bin/bash

# Script by Avi Alkalay
# Freely distributable

f="$1"
ofps=30
noext=${f%.*}
ext=${f##*.}

# Get original video frame rate
ifps=`ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 "$f" < /dev/null  | sed -e 's|/1||'`
echo

# exit if not high frame rate
[[ "$ifps" -ne 120 ]] && [[ "$ifps" -ne 240 ]] && exit

fpsRate=$((ifps/ofps))
fpsRateInv=`awk "BEGIN {print $ofps/$ifps}"`

# loss less video conversion into 30fps through repackaging into MKV
mkvmerge -d 0 -A -S -T \
	--default-duration 0:${ofps}fps \
	"$f" -o "v$noext.mkv"

# loss less repack from MKV to MP4
ffmpeg -loglevel quiet -i "v$noext.mkv" -vcodec copy "v$noext.mp4"
echo

# extract subtitles, if original movie has it
ffmpeg -loglevel quiet -i "$f" "s$noext.srt"
echo

# resync subtitles using similar method with mkvmerge
mkvmerge --sync "0:0,${fpsRate}" "s$noext.srt" -o "s$noext.mkv"

# get simple synced SRT file
rm "s$noext.srt"
ffmpeg -i "s$noext.mkv" "s$noext.srt"

# remove undesired formating from subtitles
sed -i -e 's|<font size="8"><font face="Helvetica">\(.*\)</font></font>|\1|' "s$noext.srt"

# extract audio to WAV format
ffmpeg -loglevel quiet -i "$f" "$noext.wav"

# make audio longer based on ratio of input and output framerates
sox "$noext.wav" "a$noext.wav" speed $fpsRateInv

# lossy stretched audio conversion back into AAC (M4A) 64kbps (because we know the original audio was mono 64kbps)
faac -q 200 -w -s --artist a "a$noext.wav"

# repack stretched audio and video into original file while removing the original audio and video tracks
cp "$f" "${noext}-slow.${ext}"
MP4Box -ipod -rem 1 -rem 2 -rem 3 -add "v$noext.mp4" -add "a$noext.m4a" -add "s$noext.srt" "${noext}-slow.${ext}"

# remove temporary files 
rm -f "$noext.wav" "a$noext.wav" "v$noext.mkv" "v$noext.mp4" "a$noext.m4a" "s$noext.srt" "s$noext.mkv"

1 Photo + 1 Song = 1 Movie

If the audio is already AAC-encoded (may also be ALAC-encoded), create an MP4/M4V file:

ffmpeg -loop 1 -framerate 0.2 -i photo.jpg -i song.m4a -shortest -c:v libx264 -tune stillimage -vf scale=960:-1 -c:a copy movie.m4v

The above method will create a very efficient 0.2 frames per second (-framerate 0.2) H.264 video from the photo while simply adding the audio losslessly. Such very-low-frames-per-second video may present sync problems with subtitles on some players. In this case simply remove the -framerate 0.2 parameter to get a regular 25fps video with the cost of a bigger file size.

The -vf scale=960:-1 parameter tells FFMPEG to resize the image to 960px width and calculate the proportional height. Remove it in case you want a video with the same resolution of the photo. A 12 megapixels photo file (around 4032×3024) will get you a near 4K video.

If the audio is MP3, create an MKV file:

ffmpeg -loop 1 -framerate 0.2 -i photo.jpg -i song.mp3 -shortest -c:v libx264 -tune stillimage -vf scale=960:-1 -c:a copy movie.mkv

If audio is not AAC/M4A but you still want an M4V file, convert audio to AAC 192kbps:

ffmpeg -loop 1 -framerate 0.2 -i photo.jpg -i song.mp3 -shortest -c:v libx264 -tune stillimage -vf scale=960:-1 -c:a aac -strict experimental -b:a 192k movie.m4v

See more about FFMPEG photo resizing.

There is also a more efficient and completely lossless way to turn a photo into a video with audio, using extended podcast techniques. But thats much more complicated and requires advanced use of GPAC’s MP4Box and NHML. In case you are curious, see the Podcast::chapterize() and Podcast::imagify() methods in my  music-podcaster script. The trick is to create an NHML (XML) file referencing the image(s) and add it as a track to the M4A audio file.

Image and Photo

Move images with no EXIF header to another folder

mkdir noexif;
exiftool -filename -T -if '(not $datetimeoriginal or ($datetimeoriginal eq "0000:00:00 00:00:00"))' *HEIC *JPG *jpg | while read f; do mv "$f" noexif/; done

Set EXIF photo create time based on file create time

Warning: use this only if image files have correct creation time on filesystem and if they don’t have an EXIF header.

exiftool -overwrite_original '-DateTimeOriginal< ${FileModifyDate}' *CR2 *JPG *jpg

Rotate photos based on EXIF’s Orientation flag, plus make them progressive. Lossless

jhead -autorot -cmd "jpegtran -progressive '&i' > '&o'" -ft *jpg

Rename photos to a more meaningful filename

This process will rename silly, sequential, confusing and meaningless photo file names as they come from your camera into a readable, sorteable and useful format. Example:

IMG_1234.JPG2015.07.24-17.21.33 • Max playing with water【iPhone 6s✚】.jpg

Note that new file name has the date and time it was taken, whats in the photo and the camera model that was used.

  1. First keep the original filename, as it came from the camera, in the OriginalFileName tag:
    exiftool -overwrite_original '-OriginalFileName<${filename}' *CR2 *JPG *jpg
  2. Now rename:
    exiftool '-filename<${DateTimeOriginal} 【${Model}】%.c.%e' -d %Y.%m.%d-%H.%M.%S *CR2 *HEIC *JPG *jpg
  3. Remove the ‘0’ index if not necessary:
    \ls *HEIC *JPG *jpg *heic | while read f; do
        nf=`echo "$f" | sed -e 's/0.JPG/.jpg/i; s/0.HEIC/.heic/i'`;
        t=`echo "$f" | sed -e 's/0.JPG/1.jpg/i; s/0.HEIC/1.heic/i'`;
        [[ ! -f "$t" ]] && mv "$f" "$nf";
    done

    Alternative for macOS without SED:

    \ls *HEIC *JPG *jpg *heic | perl -e '
    	while (<>) {
    		chop; $nf=$_; $t=$_;
    		$nf=~s/0.JPG/.jpg/i; $nf=~s/0.HEIC/.heic/i;
    		$t=~s/0.JPG/1.jpg/i; $t=~s/0.HEIC/1.heic/i;
    		rename($_,$nf) if (! -e $t);
    	}'
  4. Optional: make lower case extensions:
    \ls *HEIC *JPG | while read f; do
        nf=`echo "$f" | sed -e 's/JPG/jpg/; s/HEIC/heic/'`;
        mv "$f" "$nf";
    done
  5. Optional: simplify camera name, for example turn “Canon PowerShot G1 X” into “Canon G1X” and make lower case extension at the same time:
    \ls *HEIC *JPG *jpg *heic | while read f; do
        nf=`echo "$f" | sed -e 's/Canon PowerShot G1 X/Canon G1X/;
          s/iPhone 6s Plus/iPhone 6s✚/;
          s/iPhone 7 Plus/iPhone 7✚/;
          s/Canon PowerShot SD990 IS/Canon SD990 IS/;
          s/HEIC/heic/;
          s/JPG/jpg/;'`;
        mv "$f" "$nf";
    done

You’ll get file names as 2015.07.24-17.21.33 【Canon 5D Mark II】.jpg. If you took more then 1 photo in the same second, exiftool will automatically add an index before the extension.

Even more semantic photo file names based on Subject tag

\ls *【*】* | while read f; do
	s=`exiftool -T -Subject "$f"`;
	if [[ " $s" != " -" ]]; then         
		nf=`echo "$f" | sed -e "s/ 【/ • $s 【/; s/\:/∶/g;"`;
		mv "$f" "$nf";
	fi;
done

Copy Subject tag to Title-related tags

exiftool -overwrite_original '-ImageDescription< ${Subject}' '-XPTitle<${Subject}' '-Title<${Subject}' '-Description<${Subject}' '-Caption-Abstract<${Subject}' *jpg

Full rename: a consolidation of some of the previous commands

exiftool '-filename<${DateTimeOriginal} • ${Subject} 【${Model}】%.c.%e' -d %Y.%m.%d-%H.%M.%S *CR2 *JPG *HEIC *jpg *heic

Set photo “Creator” tag based on camera model

  1. First list all cameras that contributed photos to current directory:
    exiftool -T -Model *jpg | sort -u

    Output is the list of camera models on this photos:

    Canon EOS REBEL T5i
    DSC-H100
    iPhone 4
    iPhone 4S
    iPhone 5
    iPhone 6
    iPhone 6s Plus
  2. Now set creator on photo files based on what you know about camera owners:
    CRE="John Doe";    exiftool -overwrite_original -creator="$CRE" -by-line="$CRE" -Artist="$CRE" -if '$Model=~/DSC-H100/'            *.jpg
    CRE="Jane Black";  exiftool -overwrite_original -creator="$CRE" -by-line="$CRE" -Artist="$CRE" -if '$Model=~/Canon EOS REBEL T5i/' *.jpg
    CRE="Mary Doe";    exiftool -overwrite_original -creator="$CRE" -by-line="$CRE" -Artist="$CRE" -if '$Model=~/iPhone 5/'            *.jpg
    CRE="Peter Black"; exiftool -overwrite_original -creator="$CRE" -by-line="$CRE" -Artist="$CRE" -if '$Model=~/iPhone 4S/'           *.jpg
    CRE="Avi Alkalay"; exiftool -overwrite_original -creator="$CRE" -by-line="$CRE" -Artist="$CRE" -if '$Model=~/iPhone 6s Plus/'      *.jpg

Recursively search people in photos

If you geometrically mark people faces and their names in your photos using tools as Picasa, you can easily search for the photos which contain “Suzan” or “Marcelo” this way:

exiftool -fast -r -T -Directory -FileName -RegionName -if '$RegionName=~/Suzan|Marcelo/' .

-Directory, -FileName and -RegionName specify the things you want to see in the output. You can remove -RegionName for a cleaner output.
The -r is to search recursively. This is pretty powerful.

Make photos timezone-aware

Your camera will tag your photos only with local time on CreateDate or DateTimeOriginal tags. There is another set of tags called GPSDateStamp and GPSTimeStamp that must contain the UTC time the photos were taken, but your camera won’t help you here. Hopefully you can derive these values if you know the timezone the photos were taken. Here are two examples, one for photos taken in timezone -02:00 (Brazil daylight savings time) and on timezone +09:00 (Japan):

exiftool -overwrite_original '-gpsdatestamp<${CreateDate}-02:00' '-gpstimestamp<${CreateDate}-02:00' '-TimeZone<-02:00' '-TimeZoneCity<São Paulo' *.jpg
exiftool -overwrite_original '-gpsdatestamp<${CreateDate}+09:00' '-gpstimestamp<${CreateDate}+09:00' '-TimeZone<+09:00' '-TimeZoneCity<Tokio' Japan_Photos_folder

Use exiftool to check results on a modified photo:

exiftool -s -G -time:all -gps:all 2013.10.12-23.45.36-139.jpg
[EXIF]          CreateDate                      : 2013:10:12 23:45:36
[Composite]     GPSDateTime                     : 2013:10:13 01:45:36Z
[EXIF]          GPSDateStamp                    : 2013:10:13
[EXIF]          GPSTimeStamp                    : 01:45:36

This shows that the local time when the photo was taken was 2013:10:12 23:45:36. To use exiftool to set timezone to -02:00 actually means to find the correct UTC time, which can be seen on GPSDateTime as 2013:10:13 01:45:36Z. The difference between these two tags gives us the timezone. So we can read photo time as 2013:10:12 23:45:36-02:00.

Geotag photos based on time and Moves mobile app records

Moves is an amazing app for your smartphone that simply records for yourself (not social and not shared) everywhere you go and all places visited, 24h a day.

  1. Make sure all photos’ CreateDate or DateTimeOriginal tags are correct and precise, achieve this simply by setting correctly the camera clock before taking the pictures.
  2. Login and export your Moves history.
  3. Geotag the photos informing ExifTool the timezone they were taken, -08:00 (Las Vegas) in this example:
    exiftool -overwrite_original -api GeoMaxExtSecs=86400 -geotag ../moves_export/gpx/yearly/storyline/storyline_2015.gpx '-geotime<${CreateDate}-08:00' Folder_with_photos_from_trip_to_Las_Vegas

Some important notes:

  • It is important to put the entire ‘-geotime’ parameter inside simple apostrophe or simple quotation mark (), as I did in the example.
  • The ‘-geotime’ parameter is needed even if image files are timezone-aware (as per previous tutorial).
  • The ‘-api GeoMaxExtSecs=86400’ parameter should not be used unless the photo was taken more than 90 minutes of any detected movement by the GPS.

Concatenate all images together in one big image

  • In 1 column and 8 lines:
    montage -mode concatenate -tile 1x8 *jpg COMPOSED.JPG
  • In 8 columns and 1 line:
    montage -mode concatenate -tile 8x1 *jpg COMPOSED.JPG
  • In a 4×2 matrix:
    montage -mode concatenate -tile 4x2 *jpg COMPOSED.JPG

The montage command is part of the ImageMagick package.

Docker on Bluemix with services

Docker on Bluemix with automated full-stack deploys and delivery pipelines

Introduction

This document explains working examples on how to use Bluemix platform advanced features such as:

  • Docker on Bluemix, integrated with Bluemix APIs and middleware
  • Full stack automated and unattended deployments with DevOps Services Pipeline, including Docker
  • Full stack automated and unattended deployments with cf command line interface, including Docker

For this, I’ll use the following source code structure:

github.com/avibrazil/bluemix-docker-kickstart

The source code currently brings to life (as an example), integrated with some Bluemix services and Docker infrastructure, a PHP application (the WordPress popular blogging platform), but it could be any Python, Java, Ruby etc app.

This is how full stack app deployments should be

Before we start: understand Bluemix 3 pillars

I feel it is important to position what Bluemix really is and which of its parts we are going to use. Bluemix is composed of 3 different things:

  1. Bluemix is a hosting environment to run any type of web app or web service. This is the only function provided by the CloudFoundry Open Source project, which is an advanced PaaS that lets you provision and de-provision runtimes (Java, Python, Node etc), libraries and services to be used by your app. These operations can be triggered through the Bluemix.net portal or by the cf command from your laptop. IBM has extended this part of Bluemix with functions not currently available on CloudFoundry, notably the capability of executing regular VMs and Docker containers.
  2. Bluemix provides pre-installed libraries, APIs and middleware. IBM is constantly adding functions to the Bluemix marketplace, such as cognitive computing APIs in the Watson family, data processing middleware such as Spark and dashDB, or even IoT and Blockchain-related tools. These are high value components that can add a bit of magic to your app. Many of those are Open Source.
  3. DevOps Services. Accessible from hub.jazz.net, it provides:
    • Public and private collaborative Git repositories.
    • UI to build, manage and execute the app delivery pipeline, which does everything needed to transform your pure source code into a final running application.
    • The Track & Plan module, based on Rational Team Concert, to let your team mates and clients exchange activities and control project execution.

This tutorial will dive into #1 and some parts of #3, while using some services from #2.

The architecture of our app

Docker on Bluemix with services

When fully provisioned, the entire architecture will look like this. Several Bluemix services (MySQL, Object store) packaged into a CloudFoundry App (bridge app) that serves some Docker containers that in turns do the real work. Credentials to access those services will be automatically provided to the containers as environment variables (VCAP_SERVICES).

Structure of Source Code

The example source code repo contains boilerplate code that is intentionally generic and clean so you can easily fork, add and modify it to fit your needs. Here is what it contains:

bridge-app folder and manifest.yml file
The CloudFoundry manifest.yml that defines app name, dependencies and other characteristics to deploy the app contents under bridge-app.
containers
Each directory contains a Dockerfile and other files to create Docker containers. In this tutorial we’ll use only the phpinfo and wordpress directories, but there are some other useful examples you can use.
.bluemix folder
When this code repository is imported into Bluemix via the “Deploy to Bluemix” button, metadata in here will be used to set up your development environment under DevOps Services.
admin folder
Random shell scripts, specially used for deployments.

Watch the deployment

The easiest way to deploy the app is through DevOps Services:

  1. Click to deploy

    Deploy to Bluemix

  2. Provide a unique name to your copy of the app, also select the target Bluemix space
    Deploy to Bluemix screen
  3. Go to DevOps Services ➡ find your project clone ➡ select Build & Deploy tab and watch
    Full Delivery Pipeline on Bluemix

Under the hood: understand the app deployment in 2 strategies

Conceptually, these are the things you need to do to fully deploy an app with Docker on Bluemix:

  1. Instantiate external services needed by your app, such as databases, APIs etc.
  2. Create a CloudFoundry app to bind those services so you can handle them all as one block.
  3. Create the Docker images your app needs and register them on your Bluemix private Docker Registry (equivalent to the public Docker Hub).
  4. Instantiate your images in executable Docker containers, connecting them to your backend services through the CloudFoundry app.

The idea is to encapsulate all these steps in code so deployments can be done entirely unattended. Its what I call brainless 1-click deployment. There are 2 ways to do that:

  • A regular shell script that extensively uses the cf command. This is the admin/deploy script in our code.
  • An in-code delivery pipeline that can be executed by Bluemix DevOps Services. This is the .bluemix/pipeline.yml file.

From here, we will detail each of these steps both as commands (on the script) and as stages of the pipeline.

  1. Instantiation of external services needed by the app…

    I used the cf marketplace command to find the service names and plans available. ClearDB provides MySQL as a service. And just as an example, I’ll provision an additional Object Storage service. Note the similarities between both methods.

    Deployment Script
    cf create-service \
      cleardb \
      spark \
      bridge-app-database;
    
    cf create-service \
      Object-Storage \
      Free \
      bridge-app-object-store;
    Delivery Pipeline

    When you deploy your app to Bluemix, DevOps Services will read your manifest.yml and automatically provision whatever is under the declared-services block. In our case:

    declared-services:
      bridge-app-database:
        label: cleardb
        plan: spark
      bridge-app-object-store:
        label: Object-Storage
        plan: Free
    
  2. Creation of an empty CloudFoundry app to hold together these services

    The manifest.yml file has all the details about our CF app. Name, size, CF build pack to use, dependencies (as the ones instantiated in previous stage). So a plain cf push will use it and do the job. Since this app is just a bridge between our containers and the services, we’ll use minimum resources and the minimum noop-buildpack. After this stage you’ll be able to see the app running on your Bluemix console.

    Deployment Script
    Delivery Pipeline
    Stage named “➊ Deploy CF bridge app” simply calls cf push;
  3. Creation of Docker images

    The heavy lifting here is done by the Dockerfiles. We’ll use base CentOS images with official packages only in an attempt to use best practices. See phpinfo and wordpress Dockerfiles to understand how I improved a basic OS to become what I need.

    The cf ic command is basically a clone of the well known docker command, but pre-configured to use Bluemix Docker infrastructure. There is simple documentation to install the IBM Containers plugin to cf.

    Deployment Script
    cf ic build \
       -t phpinfo_image \
       containers/phpinfo/;
    
    cf ic build \
       -t wordpress_image \
       containers/wordpress/;
    
    
    Delivery Pipeline

    Stages handling this are “➋ Build phpinfo Container” and “➍ Build wordpress Container”.

    Open these stages and note how image names are set.

    After this stage, you can query your Bluemix private Docker Registry and see the images there. Like this:

    $ cf ic images
    REPOSITORY                                          TAG     IMAGE ID      CREATED     SIZE
    registry.ng.bluemix.net/avibrazil/phpinfo_image     latest  69d78b3ce0df  3 days ago  104.2 MB
    registry.ng.bluemix.net/avibrazil/wordpress_image   latest  a801735fae08  3 days ago  117.2 MB
    

    A Docker image is not yet a container. A Docker container is an image that is being executed.

  4. Run containers integrated with previously created bridge app

    To make our tutorial richer, we’ll run 2 sets of containers:

    1. The phpinfo one, just to see how Bluemix gives us an integrated environment
      Deployment Script
      cf ic run \
         -P \
         --env 'CCS_BIND_APP=bridge-app-name' \
         --name phpinfo_instance \
         registry.ng.bluemix.net/avibrazil/phpinfo_image;
      
      
      IP=`cf ic ip request | 
          grep "IP address" | 
          sed -e "s/.* \"\(.*\)\" .*/\1/"`;
      
      
      cf ic ip bind $IP phpinfo_instance;
      Delivery Pipeline

      Equivalent stage is “➌ Deploy phpinfo Container”.

      Open this stage and note how some environment variables are defined, specially the BIND_TO.

      Bluemix DevOps Services default scripts use these environment variables to correctly deploy the containers.

      The CCS_BIND_APP on the script and BIND_TO on the pipeline are key here. Their mission is to make the bridge-app’s VCAP_SERVICES available to this container as environment variables.

      In CloudFoundry, VCAP_SERVICES is an environment variable containing a JSON document with all credentials needed to actually access the app’s provisioned APIs, middleware and services, such as host names, users and passwords. See an example below.

    2. A container group with 2 highly available, monitored and balanced identical wordpress containers
      Deployment Script
      cf ic group create \
         -P \
         --env 'CCS_BIND_APP=bridge-app-name' \
         --auto \
         --desired 2 \
         --name wordpress_group_instance \
         registry.ng.bluemix.net/avibrazil/wordpress_image
      
      
      cf ic route map \
         --hostname some-name-wordpress \
         --domain $DOMAIN \
         wordpress_group_instance

      The cf ic group create creates a container group and runs them at once.

      The cf ic route map command configures Bluemix load balancer to capture traffic to http://some-name-wordpress.mybluemix.net and route it to the wordpress_group_instance container group.

      Delivery Pipeline

      Equivalent stage is “➎ Deploy wordpress Container Group”.

      Look in this stage’s Environment Properties how I’m configuring container group.

      I had to manually modify the standard deployment script, disabling deploycontainer and enabling deploygroup.

See the results

At this point, WordPress (the app that we deployed) is up and running inside a Docker container, and already using the ClearDB MySQL database provided by Bluemix. Access the URL of your wordpress container group and you will see this:

WordPress on Docker with Bluemix

Bluemix dashboard also shows the components running:

Bluemix dashboard with apps and containers

But the most interesting evidence you can see accessing the phpinfo container URL or IP. Scroll to the environment variables section to see all services credentials available as environment variables from VCAP_SERVICES:

Bluemix VCAP_SERVICES as seen by a Docker container

I use these credentials to configure WordPress while building the Dockerfile, so it can find its database when executing:

.
.
.
RUN yum -y install epel-release;\
	yum -y install wordpress patch;\
	yum clean all;\
	sed -i '\
		         s/.localhost./getenv("VCAP_SERVICES_CLEARDB_0_CREDENTIALS_HOSTNAME")/ ; \
		s/.database_name_here./getenv("VCAP_SERVICES_CLEARDB_0_CREDENTIALS_NAME")/     ; \
		     s/.username_here./getenv("VCAP_SERVICES_CLEARDB_0_CREDENTIALS_USERNAME")/ ; \
		     s/.password_here./getenv("VCAP_SERVICES_CLEARDB_0_CREDENTIALS_PASSWORD")/ ; \
	' /etc/wordpress/wp-config.php;\
	cd /etc/httpd/conf.d; patch < /tmp/wordpress.conf.patch;\
	rm /tmp/wordpress.conf.patch
.
.
.

So I’m using sed, the text-editor-as-a-command, to edit WordPress configuration file (/etc/wordpress/wp-config.php) and change some patterns there into appropriate getenv() calls to grab credentials provided by VCAP_SERVICES.

Dockerfile best practices

The containers folder in the source code presents one folder per image, each is an example of different Dockerfiles. We use only the wordpress and phpinfo ones here. But I’d like to highlight some best practices.

A Dockerfile is a script that defines how a container image should be built. A container image is very similar to a VM image, the difference is more related to the file formats that they are stored. VMs uses QCOW, VMDK etc while Docker uses layered filesystem images. From the application installation perspective, all the rest is almost the same. But only only Docker and its Dockerfile provides a super easy way to describe how to prepare an image focusing mostly only on your application. The only way to automate this process on the old Virtual Machine universe is through techniques such as Red Hat’s kickstart. This automated OS installation aspect of Dockerfiles might seem obscure or unimportant but is actually the core of what makes viable a modern DevOps culture.

  1. Being a build script, it starts from a base parent image, defined by the FROM command. We used a plain official CentOS image as a starting point. You must select very carefully your parent images, in the same way you select the Linux distribution for your company. You should consider who maintains the base image, it should be well maintained.
  2. Avoid creating images manually, as running a base container, issuing commands manually and then committing it. All logic to prepare the image should be scripted in your Dockerfile.
  3. In case complex file editing is required, capture edits in patches and use the patch command in your Dockerfile, as I did on wordpress Dockerfile.
    To create a patch:

    diff -Naur configfile.txt.org configfile.txt > configfile.patch

    Then see the wordpress Dockerfile to understand how to apply it.

  4. Always that possible, use official distribution packages instead of downloading libraries (.zip or .tar.gz) from the Internet. In the wordpress Dockerfile I enabled the official EPEL repository so I can install WordPress with YUM. Same happens on the Django and NGINX Dockerfiles. Also note how I don’t have to worry about installing PHP and MySQL client libraries – they get installed automatically when YUM installs wordpress package, because PHP and MySQL are dependencies.

When Docker on Bluemix is useful

CloudFoundry (the execution environment behind Bluemix) has its own Open Source container technology called Warden. And CloudFoundry’s Dockerfile-equivalent is called Buildpack. Just to illustrate, here is a WordPress buildpack for CloudFoundry and Bluemix.

To chose to go with Docker in some parts of your application means to give up some native integrations and facilities naturally and automatically provided by Bluemix. With Docker you’ll have to control and manage some more things for yourself. So go with Docker, instead of a buildpack, if:

  • If you need portability, you need to move your runtimes in and out Bluemix/CloudFoundry.
  • If a buildpack you need is less well maintained then the equivalent Linux distribution package. Or you need a reliable and supported source of pre-packaged software in a way just a major Linux distribution can provide.
  • If you are not ready to learn how to use and configure a complex buildpack, like the Python one, when you are already proficient on your favorite distribution’s Python packaging.
  • If you need Apache HTTPD advanced features as mod_rewrite, mod_autoindex or mod_dav.
  • If you simply need more control over your runtimes.

The best balance is to use Bluemix services/APIs/middleware and native buildpacks/runtimes whenever possible, and go with Docker on specific situations. Leveraging the integration that Docker on Bluemix provides.

US bank account for non-US citizens

I’ve searched for a long time and finally found a US regular bank that will let me open a free checking account. It is BBVA Compass bank.

All these services are free: ATM withdraw and deposit (BBVA’s and AllPoint ATMs), full featured Internet banking, full featured mobile banking, Visa debit card, Apple Pay and more. The non-free services are listed here and exact rates depend on the US state where the account was opened.

bbva-logo

To open a checking account, you must personally visit a physical branch in US and spend 40 minutes on an interview. You will leave the branch with an open account and routing numbers containing a $26 balance plus valid user and password that can be used on BBVA’s app and Internet banking. Free Visa debit card will arrive to some US address in a week or two, so no ATM until then.

They have 2 free checking account types. You should chose the one that includes free or charge AllPoint ATM usage which are very popular throughout US, and can be found in almost every 7 Eleven store. Use the AllPoint app to find one near you. Read More

WordPress on Fedora with RPM, DNF/YUM

WordPress is packaged for Fedora and can be installed as a regular RPM (with DNF/YUM). The benefits of this method are that you don’t need to mess around with configuration files, filesystem permissions and since everything is pre-packaged to work together, additional configurations are minimal. At the end of this 3 minutes tutorial, you’ll get a running WordPress under an SSL-enabled Apache using MariaDB as its backend.

All commands need to be executed as root. Read More

WordPress Community is in Pain

I don’t know about you senior bloggers but I’m starting to hate the way the WordPress community has evolved and what it became.

From a warm and advanced blogging software and ecosystem it is now an aberration for poor site makers. Themes are now mostly commercial, focused on institutional/marketing sites and not blogs anymore. WordPress is simply a very poor tool for this purpose. You can see this when several themes are getting much more complex than WordPress per se. Read More

OpenShift for Platform as a Service Clouds

OpenShift-LogoAt the Fedora 20 release party another guy stepped up and presented+demonstrated OpenShift, which was the most interesting new feature from Red Hat for me. First of all I had to switch my mindset about cloud from IaaS (infrastructure as a service, where the granularity are virtual machines) to PaaS. I heard the PaaS buzzword before but never took the time to understand what it really means and its implications. Well, I had to do that at that meeting so I can follow the presentation, of course hammering the presenter with questions all the time.
Read More

GMail as mail relay for your Linux home server

Since my Fedora Post-installation Configurations article, some things have changed in Fedora 20. For example, for security and economy reasons, Sendmail does not get installed anymore by default. Here are the steps to make your Linux home computer be able to send system e-mails as alerts or from things that run on cron. All commands should be run as user root. This is certified to work on Fedora 21.

Read More

Install OS X on a Mac computer from an ISO file

For some reason nobody published a simple guide like this. Maybe nobody tryied this way. I just tryied and it works with OS X Mountain Lion on a Mid 2012 MacBook Air.

If you have a Mac computer or laptop and want to install OS X, and all that you have is the operating system installation ISO image, you just need an external USB storage (disk or pen drive) of 5GB minimum size. Those regular 120GB or 1TB external disks will work too.

Just remember that all data on this external storage will be erased, even if the Mac OS X installation ISO is just 4.7GB. So make a backup of your files and after installtion you can re-format the external disk and recover the files on it.

To make the OS X installation ISO image file usable and bootable from the external storage, use the Mac OS terminal app or, on Linux, use the command line. This is the magic command:

dd if="OS X Install DVD.iso" of=/dev/disk1 bs=10m

You might want to change the red part of this command to the disk name that you get when inserted the external storage. Remember to not use things like disk1s1 or, on Linux, sdc1. The highlighted blue part on these examples are the partition name, and you don’t want that. You want to use the whole storage, otherwise it will not boot the computer.

After the command finishes execution, boot the Mac computer with the alt/option key pressed. Several devices will appear on screen for you to choose wich one to boot. Select the one with the USB logo and called “EFI Boot“.

Mac OS X installation app will boot and you can start the process. Remember that the default behavior here is to upgrade the installed system. If you want a clean install, select the Disk Utility app on the menu and make sure you erase and create a new partition on the Mac internal storage.

As a side technical note, this is all possible because ISO images — primarily designed for optical disks — can also be written to regular other storages as pen drives. And Apple has also put the right bits on these ISO images to allow it to boot from non-optical disks too.

iPhone Call History Database

Either if you are doing forensics or just want better reports about your call patterns, the iPhone Call History database can be very handfull.

If you have a jailbroken iPhone, you can access the database file directly. If you are not, you can still access it offline simply copying the file from an unencrypted iTunes backup to some other folder on you computer to manipulate it. Here are the real files path inside the iPhone and their counterparts on an iTunes backup folder:
Read More

Drupal is Gonna Change Your World

Forget expensive and proprietary MS Access. Forget about applications built on top of complex muiltitab spreadsheets. Drupal with Content Construction Kit, Views and Faceted Search are the right and way better solution for you.

Forget about building Flash-only web sites. Drupal and its modules is a better and semantically correct way for your Web 2.0 site.

Forget about PHP, ASP, JSP development from scratch. Drupal and its modules will put your site running faster with near zero programming.

This is a just a note for people building websites and general applications.

Organize fast and precisely your MP3 files with ID3v2 tags

This is a set of personal notes and a tutorial for everyone about how to correctly organize and tag MP3 files using the id3 command line tool.

General way to tag MP3 files:

id3 -M -2 [-v] [-t title] [-a artist] [-l album] [-n tracknr] [-y year] [-g genre] [-c comment] file.mp3

Recursively tag with ID3v2 a tree with many directories containing MP3 files, setting artist and genre:

id3 -v -2 -R -a "João Gilberto" -g "Bossa Nova" *mp3

Rewrite the Title tag of each file capitalizing the first letter of each word:

id3 -v -2 -t %+t *mp3

Rename files based on track number and song name (as “02 – Song Name.mp3”) padding a zero to track numbers smaller than 10:

id3 -v -2 -f "%#n - %t.mp3" *mp3

Add a suffix to the current Author tag:

id3 -2 -a "%a e Spokfrevo Orquestra" *mp3

Copy current Author tag to the Composer tag:

id3 -v -2 -wTCOM %a *mp3

Use the “Artist” (TPE1) and “Album Artist” (TPE2) tags in a different way to correctly group songs by album on your MP3 player:

id3 -2 -wTPE2 "Various Artists"  Café_Del_Mar_*/*mp3

or, alternatively with the id3v2 program:

id3v2 --TPE2 "Various Artists" Café_Del_Mar_*/*mp3

Scan track number (%n) and song name (%t) from each file name and set them as ID3 respectivelly along with additional artist name and album name:

id3 -2 -a "The Artist Name" -l "The Album Name" -g "The Genre Name" -m "%n - %t.mp3"

The id3 program is available for multiplatforms, including Linux and Windows. You can find RPM packages for Fedora Linux on my site.

A Media Center at Home

Since we got a 52″ Samsung LCD TV almost a year ago as a gift from relatives, I knew it was time to attach to it a dedicated computer and have a full digital media experience in the living room. I’ll tell you here my experiences building and running this thing that makes all my guests very impressed and desiring one.

Things you can do with a Media Center

  1. Play all your digital music (MP3, M4A, FLAC etc) as albums, custom play lists or randomly.
  2. Browse all your digital music semantically, by Genre or Artist or Song Name or Album. This is very practical and much faster than searching for a CD on your shelf.
  3. Tune hundreds of Internet radios that play all kinds of specific music as New Age, 80’s, 70’s, Classical, Flamenco, etc.
  4. Watch movies downloaded from the Internet in Full HD quality (1080p) or almost (720p) with or without subtitles. Who needs Blu-ray ?
  5. Play last trip photos as a slideshow in a 52″ TV. Who needs to develop photos in paper anymore? You can also play in the background music from your MP3 collection while watching the slideshow.
  6. Browse photos by trip, year and people that appear on them (if you tag them).
  7. Watch in a 52″ TV the clips from your last trip.
  8. Download a collection of 80’s music clips, invite your friends and make a very funny multimedia 80’s party.
  9. Watch YouTube videos in a 52″ TV.
  10. Browse Google Maps in 52″ TV.
  11. Control all the above using a nice handy $20 remote control.
  12. Let your iPhone/iPod browse, access and play all your music as it is loaded on your iPhone, through UPnP and PlugPlayer.

How to build a Media Center

Its easy and cheap to build a Media Center. In fact, the most expensive component is the TV, not the computer. You can do it with whatever operating system you like: Linux, Windows Vista or Mac. I wanted to do it with Linux because I am more fluent with this platform, but I had to use Vista because Linux audio drivers for my computer were not ready at that time. I’ll put bellow all the conceptual components in an modular way so you can understand what is important on each. But usually you will find them together in a single board very well integrated. In fact, unless you know what you are doing, I recommend using integrated components as motherboards that have a good GPU plus audio integrated in a single HDMI output connector.

The physical ingredients to build a Media Center are:

  1. An LCD TV. Looks like Plasma is an obsolete technology but I’m not the right person to ask about that. An LCD or Plasma TV is a plain big computer monitor, there is no big differences when compared to the computer monitor you are using right now to read this. Make sure the TV you buy has HDMI input connector, is Full HD (that is, its physical resolution goes up 1920×1080 (a.k.a. 1080p) or more) or at least is Full HD Ready (its maximum physical resolutions is less than 1920×1080 but can handle 1920×1080 signals with distortion), has a VGA input connector and a stereo audio input connector.
  2. A regular dedicated computer with at least a dual core CPU and 2GB RAM. This will be connected to the TV and forget about using it as a regular desktop. Intel or AMD will do here. If you will play only those low-quality, old, 700MB DivX/Xvid files, a generation before dual core (as AMD Turion) will do, but if you are going to enter the HD world with H.264 (a.k.a x264), MP4, MKV, you’ll need at least 2 cores. About the 2GB RAM, this is a guess and you may play well with a bit less too, but never tested. My system is a Quadcore AMD Phenom, 4GB RAM (because I use it for other purposes too) into a XFX 8200 HDMI-enabled motherborad (this board has unsolved issues with audio over HDMI and high power CPUs, thus I would recommend you look for another brand or model).
  3. A video card/chip that can go up to 1920×1080 resolution with DVI or HDMI output connector. People keep saying that you need NVidia and this is a lie, let me explain. NVidia or ATI GPUs (graphical processing units) have capabilities and hardware accelerators used by advanced 3D games, not by video players. So unless you are going to use this PC also as an advanced playing station, any GPU (a.k.a. graphic card/chip) will do the job, including those very popular Intel GPUs found on board in laptops. Just make sure to configure your BIOS and set video RAM to the maximum, otherwise you will have video delay problems playing Full HD (1080p) videos. If the video card only has VGA output, thats fine too but be aware that you’ll need extra cables for audio. Read next item to understand.
  4. An audio card that outputs 7, 8 or 13 channels of sound. Stereo (2 channels) is old school. Today’s any regular DVD has 5.1 (6 channels) surround audio (2 front, 2 rear, 1 center and 1 sub-woofer) and you want to take advantage of that. This is today very common and easy to find in stores, just make sure this component is integrated with the video component above and both use one single HDMI output connector.
  5. Remote Control. Your folks will call you a complete geek if they’ll see you browsing photos and music with a keyboard and mouse. Out of fashion. I bought a simple but effective infrared remote control that has a receiver that plugs into the USB for about $20. It has specific buttons for Pictures, Video, Music and works well with Vista Media Center.
  6. Lots of storage. If you are going to collect HD movies, rip DVDs, store photos and rip all your CDs, start with at least 1TB hard drive. Also make sure you have internal space in your computer to receive additional hard drives because you will run out of space sooner or latter. Another option is to have a motherboard with external SATA connectors (similar to USB connectors) and connect external SATA hard drives for increased speed and flexibility. An example of such an external SATA storage is Sagate’s FreeAgent XTreame.
  7. A silent power supply. Nobody thinks about that but I believe this is very important. Since this PC will stay in your living room or some place for multimedia contemplation, you don’t want to be disturbed by the computer’s fan noise while listening to your collection of zen Ambient music. Spend a few dollars more and make sure your power supply is quiet. I am a happy and zen user of a 450W Huntkey power supply.
  8. HDMI cable. This is the single cable you should use to connect the Media Center PC to your TV. This single cable should carry Full HD video and 13 channels audio, it should costs $20 and is a clean and modern solution.


Good network layout for a home Media Center

These are the aproximate brazilian prices I pay for the hardware parts

Description Part Number Price US$
Motherboard XFX 8200 GeForce MI-A78S-8209 $172.22
AMD Phenom Quadcore 9750 HD9750WCGHBOXSN $338.89
Seagate Barracuda 750GB 9BX156-303 $205.56
4GB RAM $133.33
HUNTKEY Power supply 14CM EPS12V LW-6450SG 450W $94.44
HDMI cable $16.67
Nice PC case $138.89
Gotec Remote Control 3801 for Media Center $26.61
Total $1,126.61

Home Networking

You may want to have Media Center(s) in several spots of your home playing media from a central network file server located somewhere else.

You should pay attention to not overload your home wireless network. I had bad experiences streaming HD media from one computer to another over WiFi. A single wall in between can dramatically decrease the kilobits per second the wireless signal can carry, to a level that is lower than your movie’s kilobits per second. The result are unwatchable movies while streaming. Big photos will also take longer to load to a point that will affect negatively your ambient slideshow.

To avoid that:

  1. Have your files physically connected to your Media Center. This can be a plain internal disk (this is my choice) or an external SATA or FireWire or USB attached disk. Remember that USB is much slower (even than FireWire) and file transfers (as copying lots of movies to/from a frined) will take longer time.
  2. Have a separate file server but connect it to your Media Center over a wired network.

Bad network layout for a home Media Center

Software Requirements

Your Media Center will have several simultaneous purposes. The most visible one is to feed your TV with content, but I also use it as a host to run several virtual machines, a web server, file server and to download things. I use mine 40% as a visible Media Center, 30% as a Media Server (to serve media to other computers) and 30% as a host for other purposes.

Forget about using your Media Center as a regular PC with keyboard and mouse. It is simply not practical and will prevent your wife and kids to use it because you are locking its TV. You can connect to and work with it remotely though, with SSH, VNC, Desktop Sharing, Remote Desktop or whatever technology your platform supports. And this can happen while your folks are watching a movie. I found this way of managing my Media Center very practical and productive.

  • Linux-based Media Center

    Linux would be my preferred platform for running a Media Center. It is highly configurable and gives its owner a lot of power. To feed your TV, use MythTV or XBMC. Just make sure that devices as remote control, audio and HDMI interface have drivers and will work on Linux. I had problems with that.

  • Mac OS-based Media Center

    If you are an Apple person, a Mac mini will do the job. It is compact, silent, has a strong enough processor and comes with a nice remote control. If Mac OS is your platform of choice, use FrontRow or XBMC. You will also need a codecs to play all types of media, so download the free Perian codec pack. I don’t know much people that use Mac OS as a Media Center, let me know if you do. You can also use an Apple machine to run Windows.

  • Windows Vista-based Media Center

    Windows Vista has a lot of improvements for managing media when compared to Windows XP. The native File Explorer support for MP3 and photo tagging is excelent, uses open standards as ID3v2 (MP3) and EXIF and IPTC (JPEG photo) and Vista Media Center has partial support for browsing you media collection through these tags (album, artist, genre, date picture was taken, IPTC tags etc). Strangelly, Vista Media Center does not support browsing by multiple genres and multiple artists so an album simultaneously tagged with genres “Samba” and “MPB” will appear only when you list by “Samba”, not by “MPB”.

    Microsoft locks their desktop operating systems in a way that multiple users can’t use it simultaneously, even if there are multiple users created on the OS. This can be fixed installing a small terminal services-related patch. There is also a post-SP1 version of the hack.

    So the modus operandi is to create one user called Media that will automatically login and run the Media Center program at boot, and another one for me to login remotely with Remote Desktop and run stuff simultaneously. The Media user has to be administrator and codec packs and plugin must be installed by him.

    To play advanced and HD audio and video, H.264, MKV, MP4, DivX/Xvid, FLAC etc, you will also need a codec pack for Windows. I recommend the K-Lite Codec Pack and I use its Mega edition. Having that, Vista Media Center will play any type of media.

    I must tell that Windows alone can’t satisfy all my media management needs. Thats why I run a Linux as a virtual machine on the Media Center to make massive manipulations of MP3, photos, video compression, etc.

Still on Vista Media Center, I use several useful plugins:

  • Media Control. Improves usability of the remote control and lets you set subtitle and audio languages, enables fast forwarding etc while playing video.
  • Google Maps for Windows Media Center. Turns my 52″ TV into an interactive map that I can control with my remote control. I don’t know how life was before this.
  • Yougle. Lets you access Internet media from Vista Media Center. In other words, lets you browse and watch YouTube videos, Flickr photos, Internet radios etc.

Happy entertainment !