Handling Inbox spam

Handling Inbox spam

I’m a serial unsubscriber — absolutely ruthless when it comes to keeping my inbox in order. If I get a new ad or newsletter on my inbox I immediately scroll to the end of it to click on the tiny “unsubscribe” link. I admit I have great pleasure doing this without even seeing the ad.

Read More
My time on the IBM Linux Impact Team, and legacy

My time on the IBM Linux Impact Team, and legacy

In this extensive article, Jon “MadDog” delves into the behind-the-scenes narrative of how Linux and Open Source gained acceptance within the corporate sphere, eventually establishing itself as the dominant platform in today’s enterprise information technology. It has become the operating system powering contemporary cloud infrastructure and, most notably, has transformed into the primary methodology for driving software innovation.

Read More

Story of the first digital computers

The story of the first digital computers, since a phenomenon observed in Edison’s light bulb, through 2 bit logic operations, through triodes and vacuum tubes, up to the ENIAC, capable of doing astonishing 500 math operations per second, and running without failure for a maximum of 116 hours.

As a matter of comparison, your smartphone can do almost one trillion math operations per second with just a tiny fraction of the required energy. Your smartphone is 2 billion times faster than the first, commercial large and expensive digital computers of the 1940’s.

Veritasium nailed it again.

Also on my LinkedIn.

LinkedIn Inferences About You

Export all your LinkedIn data (on computer, select Me ➔ Settings & Privacy ➔ Data Privacy ➔ Get a copy of your data ➔ Larger data archive) and then check the Inferences_about_you.csv file.

As the file name says, it is how LinkedIn AI models see you. Do you have career stability? Are you in the early stages of your career? Are you a people or senior leader? Business owner?

These classifications are certainly used by recruiters to search for people. And you should use it to check if there are things you must change in your profile.

UPDATE: LinkedIn apparently isn’t providing this information anymore. It was being provided until a few days before my post.

Also on my LinkedIn.

Data Scientists should develop their software engineering skills

Yes, Data Scientists should develop their software engineering skills. Let me react to a LinkedIn post by Neil Leiser.

But Data Scientists can’t do it alone, or by themselves. Read on.

I see that software engineering, IT architecture is a touchy subject amongst even the best data scientists, usually because they came from other knowledge domains as economy, statistics, pure math, physics, biology etc. This is a normal evolution. Data Science demands a wide broad skill set, sometimes too wide and too broad. Data Scientists need to handle Docker and HTTP APIs along with outliers, RMSE, ROC curves and Gaussian distributions. Go figure…

ML engineers — usually folks that have more software engineering background — should help here.

But the most important thing ➔ it is the mission of the CDO, tech lead or CTO with strategic vision to clearly detect these gaps and design a roadmap to handle them, not just with conventional training but also encouraging mixed squads whose members will exchange skills and knowledge, leveraging multi-disciplinar environments where everybody grows together.

Related posts:

Also on my LinkedIn.

GPT me

This is what GPT “knows” about me. More precisely, this is the sequence of words GPT generates when asked with that specific prompt.

First paragraph is 100% correct.

Second is kind of 50% (in)correct and outdated. I do Fedora, not Debian nor Ubuntu, I’ve contributed to several FOSS projects, but never to Apache HTTPD, and I did work for IBM, but never to Red Hat.

Third paragraph he completely confused me with one of my relatives that have same last name but different first name.

Also, I think GPT would have a different perspective about me if blog posts in social media, such as Facebook, would be part of its training dataset. But it can’t because Meta won’t allow open access to their platform even if I post openly there.

Also on my LinkedIn.

Clouds are super expensive

While clouds are the natural go-to choice for an early-stage startup, staying 100% in clouds with substantial infrastructure may sink a company as it and its infrastructure grow.

This study shows that the monthly infrastructure cost of clouds would be more than 10 times higher than a collocation with self-designed infrastructure. Not to mention the taylor-made possibilities.

Your CTOs and tech leaders must provide clever ways to use public clouds, avoiding their typical lock-ins, so you can leave [and reduce vast amounts of infrastructure costs] whenever you may need.

Benefits of public clouds are flexibility and agility, not costs.

Also in my LinkedIn.

7 Habits of Highly Effective People by Stephen Covey, summary by getAbstract

I read the summary of this book in getAbstract. There is also an audio version of the summary on their page. Here is a my personal copy.

Cover of book 7 Habits of Highly Effective People by Stephen Covey


In this updated edition of the late Stephen R. Covey’s bestseller, Sean Covey draws on ancient wisdom, modern psychology and 20th century science and wraps the mix in a distinctively American can-do program of easy-looking steps calling mostly for self-discipline. This classic – now in a new 30th anniversary edition with a foreword by Jim Collins – is a popular, trusted manual for self-improvement, although you still may find some prescriptions easier to agree with than to act upon.

Read More

iPadOS external display support

With the release of iPadOS 16.2 last December, M1-powered devices can now be used as more beefed up terminals, complete with external physical keyboard, mouse/trackpad and extended screen that can display content and apps different from the main iPad screen (as shows the photo).

iPadOS 16.2 external display, keyboard and mouse

Minimum device that supports this is the iPad Air 5th generation (2022) which already features an USB-C port instead of lightning. Then, on this port, you can plug a dongle with HDMI output, power source and more USB ports to connect your human interaction devices. Or connect them through Bluetooth.

This opens the possibility for road warriors to have an even lighter and inexpensive terminal with the iPad, instead of a regular (and problematic) laptop. Then, when at home or office, they can dock it to KVM (keyboard, video, mouse) to experience a more productive workstation.

And yes, I know Android phones can do similar things since long ago. But it doesn’t get widespread or even real until this feature lands on the popular iPad.

Also on my LinkedIn

Command Line in Windows

Command Line in Windows

Command line on Windows (10+) nowadays doesn’t have to be only PuTTY to a remote Linux machine. In fact many Linux concepts were incorporated on Windows.

Windows Subsystem for Linux

First, activate WSL. Since I enjoy using Fedora, and not Ubuntu, this guide by Jonathan Bowman has helped me to set WSL exactly as I like.

Windows native SSH clients

Yes, it has tools from OpenSSH, such as the plain ssh client, ssh-agent and others. No need for PutTTY.

This guide by Chris Hastie explains how to activate SSH Agent with your private key. I’m not sure it is fairly complete, since I didn’t test yet if it adds your key in session startup for a complete password-less experience. I’m still trying.

Windows Terminal

The old command prompt is very limited, as we know, and obsolete. Luckily, Microsoft has released a new, much improved, Terminal application that can be installed from the Store.

Command Line in Windows

It allows defining sessions with custom commands as ‘wsl‘ (to get into the Fedora WSL container installed above), ‘cmd‘, ‘ssh‘. I use tmux in all Linux computers that I connect, so my default access command is:

ssh -l USERNAME -A -t HOSTNAME "tmux new-session -s default -n default -P -A -D"

Windows Terminal app is highly customizable, with colors and icons. And this repo by Mark Badolato contains a great number of terminal color schemes. Select a few from the windowsterminal folder and paste their JSON snippet into the file %HOME%\AppData\Local\Packages\Microsoft.WindowsTerminal_8wekyb3d8bbwe\LocalState\settings.json.

Data Scientist × Data Analyst

Analysts inform, explain and visualize DATA THAT EXISTS in order to help business executives make strategic decisions. Thus, data analysts live in business meetings, talk to a lot of people and create data visualizations to help others understand what is going on. Tools: SQL, BI, spreadsheets, PowerPoint.

Scientists infer and calculate INFORMATION THAT STILL DOESN’T EXIST, such as the future, usually in order to optimize each and every business transaction. Example: if you like this one product, you might also like that other product. Example: according to data from surroundings, this house price should be around $X. Example: I learned how cars look like, so there is 98% chance there is a car in this photo. Thus, they create or improve digital products using machine learning and applied statistics. To create such improved user experiences, first data scientists use advanced exploratory data analysis techniques, create data visualization only for themselves, only for their better comprehension of what is going on. Tools: SQL, Pandas, math and statistics, git, programing, containers, Linux.

Data analysts tend to have a more glamorous job, while data scientists job is more hard skills oriented. Both need to work with large amounts of information, such as tables with millions or billions of data points.

There is also the Data Engineer role, which is as important as these other data professions, and focused on data availability, consistency and performance.

Inspired by Gerson Lerner’s post, I thought I should give my take on the subject too.


5G Download Speed

5G download speed at home in São Paulo today. 420 megabits per second (mbps), equivalent to 52 megabytes per second.

It means that it takes about 10 seconds to download 1 hour of hi-fi music without any compression. But since compression is everywhere, just 2 seconds will be enough.

Upload speed gives me 10 mbps. Pretty good, though we know this is probably not for long.

What 4G, 5G speeds do you get and where?

Also in my LinkedIn

State of the Windows Laptop Market

The Windows-based laptop market is a bad joke of confusing, overlapping offerings. It operates almost like a scam to underskilled consumers because manufacturers try hard to increase their profit around a purely commodity product. The results are “creative” but quite useless features as detachable keyboards, pens and tablet PCs. If you have one of those, think about the rare situations you actually used them in a comfortable way.

For a general use laptop, a $1000 MacBook Air has all the features you need, in order of importance: great high density screen (a.k.a. Retina display, most important feature, always), light and small and elegant, fast internal storage, outstanding global customer service, enough RAM (8GB minimum, 16GB recommended), modern connectivity with USB-C. Oh, and a good CPU too.

Don’t go for less than that and be aware that a similar feature set in the Windows universe will have same price, if not more. But it will be hidden under a pile of confusing, overlapping and oversized configurations.

This post was written for your private life laptop consumer self, to help you buy your next good laptop. Not for your corporate self.



Passwordless Sign-in

Get ready to say goodbye to password managers or even all your passwords. Thanks to FIDO, the industry is shifting to open standards password-less authentication everywhere.

Who’s been using macOS, iOS credential management, integration and synchronization already have an idea about how it works across devices, apps and websites. But now the experience will be improved, extended and made even easier.


Also in my LinkedIn

Power solution to rule them all

The one single power and connectivity kit needed in your laptop backpack.

① One +65W USB-C power charger
② One USB-C 2m/6ft cable with Power Delivery
③ One USB-C kit of adapters to old USB and Micro USB
④ One USB-C adapter to Apple Lightning

This kit: Powers your modern laptop through USB-C. Charges your phone through Lightning or USB-C. Charges eventual other devices on their old USB ports. Connects all devices to one another.

Portable batteries are obsolete. Instead, use your large and powerful laptop battery to charge your phone on the road.

From my LinkedIn

Caution with Streamlit

Streamlit (streamlit.io) is a lovely Python module that helps data scientists build interactive dataviz apps.

Use it when a BI is overkill — as this Streamlit dashboard that I wrote to manage my personal investments —, or where there is no BI, such as very small companies. Or where there is no interactive app developers to create a native app.

Personal finance app written with Streamlit

Streamlit proliferation in mid to large size companies might however be a bad sign of several things:

1️⃣ Application and/or integration developer’s job wrongly assigned to Data Scientists
2️⃣ Lack of a solid BI platform and practice
3️⃣ Siloed data that isn’t flowing due to lack of data streaming or API architecture
4️⃣ All the above.

Use Streamlit with caution; we don’t want it to become the new, data science-era spreadsheet for corporate reporting, with all the burden that spreadsheet proliferation have caused.

Best Data Scientist’s time is spent getting insights from Exploratory Data Analysis, and then using it to model outstanding estimators and predictors. Definitively not writing nice looking apps.

Also in my LinkedIn

Good luck to Kyndryl

To all friends that I’ve worked with at IBM and that are now moving to Kyndryl, I wish you success and good luck. The Cloud and IT services opportunity will continue to be huge forever. The countdown you have promoted here was warm and vibrant.

For the still-on-IBM friends, please keep on doing such a great company that always was and continues to be a brilliant reference to the world, not just IT. IBM is an unforgettable school for me and for anybody else that has spent even just a minute working there.

Business worldwide, as we know it, is shaped by companies such as IBM, even if you’ve never heard about it (well, that’s quite impossible).

Also on LinkedIn.

How programmers should record time

We the data people immediately identify a poorly designed system when we see it handling date and time as plain local time, instead of the number of seconds since January 1st 1970 of time zone 0.

  • This post was published on 1,626,425,523 (UTC, always UTC).
  • Jesus was born -62,399,513,432.
  • Man visited the moon between -14,552,880 and 93,172,200.
  • And so on…

Just your daily dose of nerdy facts…

Also on my LinkedIn

What means to be Driven By Data

I’ve seen companies saying they have Big Data because they implemented Hadoop or a data lake and maybe Spark.

That’s just wrong.

Big Data, or more precisely, to be Data Driven, is a state where the data a company produces can be reused, as soon as possible, to optimize itself. And there are many ways to reuse data: all meetings and decisions happen with abundance of data, or recently generated data instantly feeds machine learning algorithms to optimize transactions, just to name a few situations.

To be Driven by Data is part culture and part infrastructure. On the infrastructure side, IT teams still struggle with limited visions about how data should flow pervasively and how access should be granted. They fear about security and performance while they should fear of missing out the data opportunity.

Data Streaming is a breakthrough recent technology that is here to help with more fluent data access. For an agile and effective data architecture, Data Streaming is much more strategic and important than just a bigger data warehouse because it is the component that can unleash your data and finally make it useful.

On my LinkedIn

What is Apache Spark

Apache Spark is like Python’s Pandas and is like SQL databases. It can manipulate datasets, filter, integrate, transform.

But Spark was designed from scratch with horizontal scalability and parallelism in mind, which makes it capable of handling datasets with billions or even unknown number of rows — even if a bit less flexible than Pandas.

This is not new in the industry. Enterprise editions of commercial SQL databases are parallel and scalable since a very long time, being also very expensive in all levels of the stack: service/support, software and hardware.

But Spark is free software. And can use Hadoop — also a free software — as scalable and highly available storage, on cheap commodity hardware. In addition, it has a vibrant community and a democratic ecosystem of services and support.

As with all Open Source, Apache Spark changes the economic landscape of massive data processing systems market, taking money out of a few proprietary HW and SW vendors and pulverizing it locally on people and support.

From my LinkedIn

List of Hard Skills for Data Professionals

2020 list of desired hard skills for data professionals. From the most essential to the more difficult ones.

  1. The English language
  2. SQL
  3. Spreadsheets
  4. Descriptive Statistics (median, variance, correlation etc)
  5. Notions of Data visualization
  6. Notions of Time Series
  7. Handling computer files and folders (this one entered the list because we observed many people simply don’t have it)
  8. Notions of digital information storage (numbers and their limits, time, time zones, text, Unicode, compression)
  9. Probability
  10. Probability Distributions
  11. Linear and Logistic Regressions
  12. Python libraries ecosystem, pip, PyPi
  13. Python’s Pandas, DataFrame and Series wrangling
  14. Linux and the computer command line
  15. NoSQL, JSON, YAML, XML, SVG, APIs, HTTP, protocols and data representation
  16. Cloud and infrastructure as code
  17. Notions of symmetric and asymmetric cryptography, digital signatures and applications
  18. “Big data” systems (Hadoop, Spark)
  19. Software Engineering (classes, modularisation, versioning, containerisation, packaging, DevOps)
  20. Inferential Statistics (confidence intervals, hypothesis testing)
  21. Machine Learning algorithms for regression and classification
  22. Calculus and Numerical Calculus (integrals, derivaties)
  23. Natural Language Processing
  24. Computer vision
  25. Neural Networks

Please remember this list has only hard skills. Ethics, domain and industry knowledge, communication are very important soft skills that won’t fit in this list.

Generally speaking, beginning of the list is where Data Analysts are (up to ≈11). Data Engineers get up to the middle of list (up to ≈18). And Scientists get all the list.

There is also the following graph that I’ve produced:

data professions competencies

Jupyter and Data Science on a Mac (without Anaconda)

macOS Catalina doesn’t ship with Python 3, only 2. But you can still get 3 from Apple, updated regularly through system’s official update methods. You don’t need to get the awful Anaconda on you Mac to play with Python.

Python 3 is shipped by Xcode Command Line Tools. To get it installed (without the heavy Xcode GUI), type this in your terminal:

xcode-select --install

This way, every time Apple releases an update, you’ll get it.

Settings window will pop so wait 5 minutes for the installation to finish.

If you already have complete Xcode installed, this step was unnecessary (you already had Python 3 installed) and you can continue to the next section of the tutorial.

Clean Old Python Modules

In case you already have Python installed under your user and modules downloaded with pip, remove it:

rm -rf ${HOME}/Caches/com.apple.python/${HOME}/Library/Python \
${HOME}/Library/Python/ \

Install Python Modules

Now that you get a useful Python 3 installation, use pip3 to install Python modules that you’ll need. Don’t forget to use –user to get things installed on your home folder so you won’t pollute your overall system. For my personal use, I need the complete machine learning, data wrangling and Jupyter suite:

pip3 install --user sqlalchemy
pip3 install --user matplotlib
pip3 install --user pandas
pip3 install --user jupyterlab
pip3 install --user PyMySQL
pip3 install --user configobj
pip3 install --user requests
pip3 install --user seaborn
pip3 install --user bs4
pip3 install --user xgboost
pip3 install --user scikit_learn

But you might need other things as Django or other sqlalchemy drivers. Set yourself at home and install them with pip3.

For modules that require compilation and special library, say crypto, do it like this:

CFLAGS="-I/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/include" \
LDFLAGS="-L/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib" \
pip3 install --user pycrypto

Use Correct Python 3 Binary

For some reason, Apple installs many different Python 3 binaries in different places of the system. The one that gets installed on /usr/bin/python3 has problems loading some libraries and instrumentation with install_name_tool would be required. So lets just use the binary that works better:

export PATH=/Library/Developer/CommandLineTools/usr/bin:$PATH

Run Jupyter Lab on your Mac

Commands installed by pip3 will be available in the ~/Library/Python/3.7/bin/ folder, so just add it to your PATH:

export PATH=$PATH:~/Library/Python/3.7/bin/

Now I can simply type jupyter-lab anywhere in the terminal or command line to make it fire my browser and get a Jupyter environment.

More about Xcode Command Line Tools

Xcode Command Line Tools will get you a full hand of other useful developer tools, such as git, subversion, GCC and LLVM compilers and linkers, make, m4 and a complete Python 3 distribution. You can see most of its installation on /Library/Developer/CommandLineTools folder.

For production and high end processing I’ll still use Python on Linux with my preferred distribution’s default packages (no Anaconda). But this method of getting Python on macOS is fastest and cleanest to get you going on your own data scientist laptop without a VM nor a container.