LinkedIn Inferences About You

Export all your LinkedIn data (on a computer, select Me ➔ Settings & Privacy ➔ Data Privacy ➔ Get a copy of your data ➔ Larger data archive) and then check the Inferences_about_you.csv file.

As the file name suggests, it shows how LinkedIn’s AI models see you. Do you have career stability? Are you in the early stages of your career? Are you a people leader or a senior leader? A business owner?

These classifications are certainly used by recruiters to search for people, and you should use them to check whether there is anything you must change in your profile.
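If the file is in your export, a few lines of Python are enough to list what LinkedIn inferred. This is a minimal sketch using only the standard csv module; it takes the column names from the file’s own header, so it assumes nothing about LinkedIn’s schema, and the function name read_inferences is hypothetical.

```python
import csv


def read_inferences(path):
    """Return every row of the CSV file at `path` as a dict.

    Keys come from the file's own header row, so no column names are assumed.
    The "utf-8-sig" encoding also strips a leading byte-order mark, if present.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

From the unzipped export folder, loop over read_inferences("Inferences_about_you.csv") and print each row.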

UPDATE: LinkedIn apparently isn’t providing this information anymore. It was still available until a few days before I published this post.

Also on my LinkedIn.

Data Scientists should develop their software engineering skills

Yes, Data Scientists should develop their software engineering skills. Let me react to a LinkedIn post by Neil Leiser.

But Data Scientists can’t do it alone. Read on.

I see that software engineering and IT architecture are touchy subjects even among the best data scientists, usually because they come from other knowledge domains such as economics, statistics, pure math, physics, biology etc. This is a normal evolution. Data Science demands a broad skill set, sometimes too broad. Data Scientists need to handle Docker and HTTP APIs along with outliers, RMSE, ROC curves and Gaussian distributions. Go figure…

ML engineers (usually folks with a stronger software engineering background) should help here.

But the most important thing ➔ it is the mission of the CDO, tech lead or CTO with strategic vision to clearly detect these gaps and design a roadmap to handle them, not just with conventional training but also by encouraging mixed squads whose members exchange skills and knowledge, leveraging multidisciplinary environments where everybody grows together.

Also on my LinkedIn.

Improvements for BaCen’s Pix

Banco Central do Brasil hit the bullseye with Pix, a banking innovation worth copying by any central bank in the world. But I still find Pix quite bureaucratic to use. I see it as a success because it was something intensely desired, not because it has good usability or promotes good practices. My application-designer instincts can’t help suggesting a few improvements that could be made in a future revision, especially regarding usability.

GPT me

This is what GPT “knows” about me. More precisely, this is the sequence of words GPT generates when given that specific prompt.

The first paragraph is 100% correct.

The second is kind of 50% (in)correct and outdated. I use Fedora, not Debian or Ubuntu; I’ve contributed to several FOSS projects, but never to Apache HTTPD; and I did work for IBM, but never for Red Hat.

In the third paragraph, it completely confused me with one of my relatives who has the same last name but a different first name.

Also, I think GPT would have a different perspective about me if posts on social media, such as Facebook, were part of its training dataset. But they can’t be, because Meta doesn’t allow open access to its platform, even when I post publicly there.

Also on my LinkedIn.

Clouds are super expensive

While the cloud is the natural go-to choice for an early-stage startup, staying 100% in the cloud with substantial infrastructure may sink a company as it and its infrastructure grow.

This study shows that the monthly infrastructure cost of the cloud would be more than 10 times higher than colocation with self-designed infrastructure. Not to mention the tailor-made possibilities.

Your CTOs and tech leaders must provide clever ways to use public clouds while avoiding their typical lock-ins, so you can leave (and cut vast amounts of infrastructure costs) whenever you need to.

Benefits of public clouds are flexibility and agility, not costs.

Also on my LinkedIn.

iPadOS external display support

With the release of iPadOS 16.2 last December, M1-powered devices can now be used as beefed-up terminals, complete with an external physical keyboard, a mouse/trackpad and an extended screen that can display content and apps different from the main iPad screen (as the photo shows).

iPadOS 16.2 external display, keyboard and mouse

The minimum device that supports this is the iPad Air 5th generation (2022), which already features a USB-C port instead of Lightning. On this port, you can plug a dongle with HDMI output, a power source and more USB ports to connect your human interface devices. Or connect them through Bluetooth.

This opens the possibility for road warriors to have an even lighter and less expensive terminal with the iPad, instead of a regular (and problematic) laptop. Then, at home or at the office, they can dock it to a KVM (keyboard, video, mouse) setup to experience a more productive workstation.

And yes, I know Android phones have been able to do similar things for a long time. But it doesn’t become widespread, or even real, until this feature lands on the popular iPad.

Also on my LinkedIn

Command Line in Windows

Command line on Windows (10+) nowadays doesn’t have to be only PuTTY to a remote Linux machine. In fact, many Linux concepts have been incorporated into Windows.

Windows Subsystem for Linux

First, activate WSL. Since I enjoy using Fedora, and not Ubuntu, this guide by Jonathan Bowman has helped me set up WSL exactly as I like. The guide points to some old Fedora images, so follow its links carefully to get a newer one. The guide also explains how to initialize the Fedora image, set it as the default, configure your user etc.

Windows native SSH clients

Yes, Windows ships tools from OpenSSH, such as the plain ssh client, ssh-agent and others. No need for PuTTY.

This guide by Chris Hastie explains how to activate the SSH agent with your private key. I’m not sure it is complete, since I haven’t yet tested whether it adds your key at session startup for a fully password-less experience. I’m still trying.

Basically, you need to enable the ssh-agent Windows service and have your private key in $HOME\.ssh\id_rsa, exactly as under Linux.

Windows Terminal

The old command prompt is very limited, as we know, and obsolete. Luckily, Microsoft has released a new, much improved, Terminal application that can be installed from the Store. On Windows 11, the Terminal app is already there for you.

It allows defining profiles with custom commands such as wsl (to get into the Fedora WSL distribution installed above), cmd or ssh. I use tmux on all Linux computers I connect to, so my default access command is:

ssh -l USERNAME -A -t HOSTNAME "tmux new-session -s default -n default -P -A -D"

The Windows Terminal app is highly customizable, with colors and icons. And this repo by Mark Badolato contains a great number of terminal color schemes. Select a few from its windowsterminal folder and paste their JSON snippets into the "schemes" list of the file %USERPROFILE%\AppData\Local\Packages\Microsoft.WindowsTerminal_8wekyb3d8bbwe\LocalState\settings.json.
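If you prefer not to hand-edit the file, here is a minimal Python sketch that adds a color scheme and an ssh profile to settings.json. It assumes the file is plain JSON (if your copy contains // comments, json.loads will fail and hand-editing is the way), with the usual top-level "schemes" list and profiles under "profiles" → "list". The function name add_scheme_and_profile and the example scheme and profile values are hypothetical.

```python
import json
from pathlib import Path


def add_scheme_and_profile(settings_path, scheme, profile):
    """Add a color scheme and a profile to a Windows Terminal settings.json.

    Assumes plain JSON (no // comments), with a top-level "schemes" list and
    profiles either in "profiles" -> "list" or directly in a "profiles" list.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8"))

    # Replace a scheme with the same name instead of adding a duplicate
    schemes = [s for s in settings.get("schemes", []) if s.get("name") != scheme["name"]]
    schemes.append(scheme)
    settings["schemes"] = schemes

    # Locate the profile list in either of the two layouts
    profiles = settings.setdefault("profiles", {})
    profile_list = profiles if isinstance(profiles, list) else profiles.setdefault("list", [])
    # Only add the profile if no profile with that name exists yet
    if not any(p.get("name") == profile["name"] for p in profile_list):
        profile_list.append(profile)

    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings


# Hypothetical example values; a real scheme from the repo carries the full color palette.
EXAMPLE_SCHEME = {"name": "MyScheme", "background": "#1E1E1E", "foreground": "#D4D4D4"}
EXAMPLE_PROFILE = {
    "name": "HOSTNAME (tmux)",
    "commandline": 'ssh -l USERNAME -A -t HOSTNAME "tmux new-session -s default -n default -P -A -D"',
    "colorScheme": "MyScheme",
}
```

Point settings_path at the settings.json mentioned above, and keep a backup copy before running it.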

Data Scientist × Data Analyst

Analysts inform, explain and visualize DATA THAT EXISTS in order to help business executives make strategic decisions. Thus, data analysts live in business meetings, talk to a lot of people and create data visualizations to help others understand what is going on. Tools: SQL, BI, spreadsheets, PowerPoint.

Scientists infer and calculate INFORMATION THAT STILL DOESN’T EXIST, such as the future, usually in order to optimize each and every business transaction. Example: if you like this one product, you might also like that other product. Example: according to data from the surroundings, this house’s price should be around $X. Example: I learned what cars look like, so there is a 98% chance there is a car in this photo. Thus, they create or improve digital products using machine learning and applied statistics. To create such improved user experiences, data scientists first apply advanced exploratory data analysis techniques and create data visualizations only for themselves, to better comprehend what is going on. Tools: SQL, Pandas, math and statistics, git, programming, containers, Linux.

Data analysts tend to have a more glamorous job, while data scientists’ jobs are more oriented toward hard skills. Both need to work with large amounts of information, such as tables with millions or billions of data points.

There is also the Data Engineer role, which is as important as these other data professions, and focused on data availability, consistency and performance.

Inspired by Gerson Lerner’s post, I thought I should give my take on the subject too.

https://www.linkedin.com/posts/avibrazil_datascientist-dataanalyst-data-activity-7006727421861224448-9bt0

Ad Guard ad blocker

In all my browsers, on my phone and laptop, I use Ad Guard to get rid of unwanted ads and trackers. It also removes popups and paywalls from several sites. I’ve used many other ad blockers and found this one the best. The internet becomes much lighter with it. Nobody paid me to say any of this; I recommend it because I find it very efficient and essential.

https://m.facebook.com/story.php?story_fbid=pfbid0ELc1aK9XytmvL4xpU8KkhFnNYoTyK5RCNnDKvJBrxCDfGeYsSmyWyjgdKvR8142Tl&id=543888243

Filter out Uber’s ads

In the Uber app, go to 👤Account ➔ ⚙️Settings ➔ 🟡Privacy ➔ 🔴Notifications and leave only the last option turned on.

This way you stop receiving their annoying ads and only get notifications about your rides.

Notice that the option that matters most to users is the last one on the list, while the first 3 are just their dispensable nonsense. And then they talk about “user first”. Sure.

Facial biometrics at airports

Why did Serpro and Brazil’s airports adopt facial biometrics instead of fingerprint readers?

Because fingerprint readers, as we know, are unhygienic and spread diseases and unwanted fluids. And because your face is the most public information you have. It is as if someone with an infinite memory for faces were recognizing people in line just by looking at them.

Furthermore:

[…] travelers will be able to choose between the [biometric] system and the traditional check-in and boarding procedures, which remain available.

I don’t know about you, but I never register or use my fingers at commercial building turnstiles. From an information security perspective, that is a sham, besides being quite disgusting. The only exception I make is for government and identity documents, since that is a non-trivial situation done only once.

https://www1.folha.uol.com.br/mercado/2022/08/como-funciona-a-ponte-aerea-com-embarque-biometrico-entre-rio-e-sao-paulo.shtml

Also published on Facebook and LinkedIn.

5G Download Speed

5G download speed at home in São Paulo today: 420 megabits per second (Mbps), equivalent to about 52.5 megabytes per second (MB/s).

It means that it takes about 10 seconds to download 1 hour of hi-fi music without any compression. But since compression is everywhere, just 2 seconds will be enough.
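The arithmetic can be checked with a few lines of Python. The audio bitrates are my assumptions, not part of the original measurement: uncompressed CD quality for “hi-fi” and 256 kbps for a compressed stream. With those, the uncompressed hour lands at about 12 seconds (in the same ballpark as the rough 10 seconds above) and the compressed one at about 2 seconds. Network speeds use decimal units, so 1 Mbps is 1,000,000 bits per second.

```python
def transfer_seconds(size_bits, link_mbps):
    """Seconds needed to move size_bits over a link of link_mbps (1 Mbps = 10**6 bit/s)."""
    return size_bits / (link_mbps * 1_000_000)


LINK_MBPS = 420
megabytes_per_second = LINK_MBPS / 8  # 8 bits per byte -> 52.5 MB/s

# Assumption: "hi-fi" = uncompressed CD audio, 44,100 samples/s x 16 bits x 2 channels
cd_bits_per_hour = 44_100 * 16 * 2 * 3600
# Assumption: a typical compressed stream at 256 kbps
compressed_bits_per_hour = 256_000 * 3600

print(round(transfer_seconds(cd_bits_per_hour, LINK_MBPS), 1))          # 12.1
print(round(transfer_seconds(compressed_bits_per_hour, LINK_MBPS), 1))  # 2.2
```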

Upload speed gives me 10 Mbps. Pretty good, though we know this probably won’t last long.

What 4G, 5G speeds do you get and where?

Also on my LinkedIn

State of the Windows Laptop Market

The Windows-based laptop market is a bad joke of confusing, overlapping offerings. It operates almost like a scam on less tech-savvy consumers, because manufacturers try hard to increase their profit on what is purely a commodity product. The results are “creative” but quite useless features such as detachable keyboards, pens and tablet PCs. If you have one of those, think about the rare situations in which you actually used them comfortably.

For a general-use laptop, a $1000 MacBook Air has all the features you need, in order of importance: a great high-density screen (a.k.a. Retina display, the most important feature, always), a light, small and elegant body, fast internal storage, outstanding global customer service, enough RAM (8GB minimum, 16GB recommended) and modern connectivity with USB-C. Oh, and a good CPU too.

Don’t go for less than that, and be aware that a similar feature set in the Windows universe will cost the same, if not more. But it will be hidden under a pile of confusing, overlapping and oversized configurations.

This post was written for your private life laptop consumer self, to help you buy your next good laptop. Not for your corporate self.

https://www.macrumors.com/2022/07/05/windows-laptop-makers-worried-by-macbook-air/

https://www.linkedin.com/posts/avibrazil_windows-laptop-makers-worried-about-new-activity-6950380428016988160-6KdS?utm_source=share&utm_medium=member_ios

Passwordless Sign-in

Get ready to say goodbye to password managers, or even to all your passwords. Thanks to FIDO, the industry is shifting to open-standard passwordless authentication everywhere.

Those who have been using macOS and iOS credential management, integration and synchronization already have an idea of how it works across devices, apps and websites. But now the experience will be improved, extended and made even easier.

https://www.iclarified.com/85854/apple-microsoft-google-announce-plans-to-expand-support-for-passwordless-signin

Also on my LinkedIn

Power solution to rule them all

The one single power and connectivity kit needed in your laptop backpack.

① One USB-C power charger of 65W or more
② One USB-C 2m/6ft cable with Power Delivery
③ One USB-C kit of adapters to old USB and Micro USB
④ One USB-C adapter to Apple Lightning

This kit: Powers your modern laptop through USB-C. Charges your phone through Lightning or USB-C. Charges any other devices through their old USB ports. Connects all devices to one another.

Portable batteries are obsolete. Instead, use your large and powerful laptop battery to charge your phone on the road.

From my LinkedIn

LinkedIn’s boastful tedium

LinkedIn is tedious because people mostly use it to announce great achievements and other self-centered things, as if their companies were perfect, when we know they operate only to make as much money as they can, at any cost.

It seems to me that people are drawn to the network by envy: “let me see what my colleague did that I didn’t”, “let me show off my even bigger achievements too”.

The network would be much more useful and interesting if people used it to share Science (in the broad sense of the word) and Knowledge.

All of this, of course, is just my opinion.

Good luck to Kyndryl

To all friends that I’ve worked with at IBM and that are now moving to Kyndryl, I wish you success and good luck. The Cloud and IT services opportunity will continue to be huge forever. The countdown you have promoted here was warm and vibrant.

For friends still at IBM, please keep it the great company it always was and continues to be: a brilliant reference to the world, not just to IT. IBM is an unforgettable school for me and for anybody else who has spent even just a minute working there.

Business worldwide, as we know it, is shaped by companies such as IBM, even if you’ve never heard about it (well, that’s quite impossible).

Also on LinkedIn.

My Data Students

What a satisfaction to see my Digital House students joining new companies in data roles.

As I keep telling them, Data is a continental opportunity, equivalent to the discovery of the New World in 1492. And it is also like teenage sex: everybody says they are doing it, but in fact nobody, practically nobody at all, is doing it right.

And it is this generation of data professionals that will make it happen.

Let’s get to work!

Also on LinkedIn.

How to Choose and Buy a Laptop

A functional, powerful and elegant laptop for most people costs around $1000. An Apple MacBook in this price range should serve 95% of people well: browsing the Internet, editing documents, simple photo/video/multimedia editing, watching movies/videos, casual games and long battery life. But if you want to go with Windows, get ready to swim across an ocean of confusing offers that manufacturers dump on the market in their effort to differentiate themselves and win customers, with features the consumer could do without.

How programmers should record time

We data people immediately spot a poorly designed system when we see it handling date and time as plain local time, instead of as the number of seconds elapsed since January 1st, 1970, 00:00 in time zone 0 (UTC), a.k.a. Unix time.

  • This post was published on 1,626,425,523 (UTC, always UTC).
  • Jesus was born -62,399,513,432.
  • Man visited the moon between -14,552,880 and 93,172,200.
  • And so on…

Just your daily dose of nerdy facts…
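A minimal Python sketch of the practice, using only the standard library: keep instants as Unix seconds, and convert to a local time zone only for display. Adding a timedelta to the epoch is used instead of datetime.fromtimestamp because the latter may reject negative (pre-1970) values on some platforms. The fixed UTC-3 offset for São Paulo is an illustrative assumption that avoids depending on the system’s time zone database.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix(seconds):
    """Unix seconds -> timezone-aware UTC datetime (also works for pre-1970 values)."""
    return EPOCH + timedelta(seconds=seconds)


def to_unix(moment):
    """Timezone-aware datetime -> whole Unix seconds; naive datetimes are rejected."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: its time zone is unknown")
    return (moment - EPOCH) // timedelta(seconds=1)


# Illustrative assumption: São Paulo as a fixed UTC-3 offset, used only for display
SAO_PAULO = timezone(timedelta(hours=-3))

published = from_unix(1_626_425_523)
print(published)                        # 2021-07-16 08:52:03+00:00
print(published.astimezone(SAO_PAULO))  # 2021-07-16 05:52:03-03:00
print(from_unix(-14_552_880))           # 1969-07-16 13:32:00+00:00
```

Python’s datetime only covers years 1 to 9999, so the -62,399,513,432 example above falls outside its range.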

Also on my LinkedIn

What it means to be Driven by Data

I’ve seen companies saying they have Big Data because they implemented Hadoop or a data lake and maybe Spark.

That’s just wrong.

Big Data, or more precisely, being Data Driven, is a state where the data a company produces can be reused, as soon as possible, to optimize the company itself. And there are many ways to reuse data: all meetings and decisions happen with an abundance of data, or recently generated data instantly feeds machine learning algorithms to optimize transactions, just to name a few.

Being Driven by Data is part culture and part infrastructure. On the infrastructure side, IT teams still struggle with limited visions of how data should flow pervasively and how access should be granted. They worry about security and performance when they should fear missing out on the data opportunity.

Data Streaming is a recent breakthrough technology that is here to help with more fluent data access. For an agile and effective data architecture, Data Streaming is much more strategic and important than just a bigger data warehouse, because it is the component that can unleash your data and finally make it useful.
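As a toy illustration of the streaming idea, in-process Python generators can stand in for a real streaming platform such as Apache Kafka (no broker API is shown or implied here): the consumer updates its result as each event arrives, instead of waiting for a batch load into a warehouse. The order events and the running-average metric are hypothetical.

```python
from typing import Iterable, Iterator


def event_stream(amounts: Iterable[float]) -> Iterator[dict]:
    """Stand-in for a real stream: emits one order event at a time."""
    for order_id, amount in enumerate(amounts):
        yield {"order_id": order_id, "amount": amount}


def running_average(events: Iterable[dict]) -> Iterator[float]:
    """Consumer that updates its metric as each event arrives, with no batch reload."""
    count, total = 0, 0.0
    for event in events:
        count += 1
        total += event["amount"]
        yield total / count


print(list(running_average(event_stream([10, 20, 30]))))  # [10.0, 15.0, 20.0]
```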

On my LinkedIn

What is Apache Spark

Apache Spark is like Python’s Pandas and like SQL databases: it can manipulate, filter, integrate and transform datasets.

But Spark was designed from scratch with horizontal scalability and parallelism in mind, which makes it capable of handling datasets with billions of rows, or even an unknown number of rows, even if it is a bit less flexible than Pandas.

This is not new in the industry. Enterprise editions of commercial SQL databases have been parallel and scalable for a very long time, but they are also very expensive at all levels of the stack: service/support, software and hardware.

But Spark is free software, and it can use Hadoop (also free software) as scalable and highly available storage on cheap commodity hardware. In addition, it has a vibrant community and a democratic ecosystem of services and support.

As with all Open Source, Apache Spark changes the economics of the massive data processing market, taking money away from a few proprietary HW and SW vendors and spreading it locally across people and support.
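To make the “horizontal scalability and parallelism” idea concrete, here is a pure-Python sketch of the pattern Spark is built around: split the data into partitions, aggregate each partition independently, then merge the small partial results. This is not Spark’s API; the function names and the city rows are hypothetical, and threads only stand in for cluster executors, so the sketch shows the structure rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(rows, n_partitions):
    """Round-robin split of the dataset into n chunks, loosely like Spark partitions."""
    return [rows[i::n_partitions] for i in range(n_partitions)]


def count_partition(chunk, key):
    """'Map' step: each worker aggregates only the rows in its own partition."""
    return Counter(row[key] for row in chunk)


def parallel_count_by_key(rows, key, n_partitions=4):
    # Threads stand in for cluster executors: this shows the structure, not a real speedup
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(lambda chunk: count_partition(chunk, key),
                                 partition(rows, n_partitions)))
    # 'Reduce' step: merge the small partial results into the final answer
    return reduce(lambda acc, part: acc + part, partials, Counter())


rows = [{"city": "São Paulo"}, {"city": "Rio"}, {"city": "São Paulo"}]
print(parallel_count_by_key(rows, "city"))  # Counter({'São Paulo': 2, 'Rio': 1})
```

In PySpark, the same aggregation on a DataFrame would be written as df.groupBy("city").count(), and Spark itself takes care of partitioning, scheduling and merging across machines.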

From my LinkedIn

DecisionDesk projects that Biden won the 2020 US presidential election

Decision Desk is a company specialized in election projections based on statistics and data. They used trends from past elections to build mathematical models that were only waiting for an influx of votes from Pennsylvania to reach an acceptable confidence level. That influx arrived on the morning of Friday the 6th and confirmed that Biden will only widen his lead in that state from now on. They also projected, at 8:50 yesterday morning, that Biden will win the national count with 273 electoral votes.

This kind of use of data and algorithms is the closest science has come to “predicting the future”. We call it predictive analytics. Still, it is a very fragile and quite specific technique, since it depends on the most recent data possible. Because really predicting the future, nobody can.

But it’s not over yet. Trump is expected to take the whole thing to court because he is simply a lunatic bad loser. A process that may cost him a lot ($$$$), since he will have to do it in multiple states.

Also on my Facebook.

List of Hard Skills for Data Professionals

2020 list of desired hard skills for data professionals. From the most essential to the more difficult ones.

  1. The English language
  2. SQL
  3. Spreadsheets
  4. Descriptive Statistics (median, variance, correlation etc)
  5. Notions of Data visualization
  6. Notions of Time Series
  7. Handling computer files and folders (this one entered the list because we observed many people simply don’t have it)
  8. Notions of digital information storage (numbers and their limits, time, time zones, text, Unicode, compression)
  9. Probability
  10. Probability Distributions
  11. Linear and Logistic Regressions
  12. Python libraries ecosystem, pip, PyPi
  13. Python’s Pandas, DataFrame and Series wrangling
  14. Linux and the computer command line
  15. NoSQL, JSON, YAML, XML, SVG, APIs, HTTP, protocols and data representation
  16. Cloud and infrastructure as code
  17. Notions of symmetric and asymmetric cryptography, digital signatures and applications
  18. “Big data” systems (Hadoop, Spark)
  19. Software Engineering (classes, modularisation, versioning, containerisation, packaging, DevOps)
  20. Inferential Statistics (confidence intervals, hypothesis testing)
  21. Machine Learning algorithms for regression and classification
  22. Calculus and Numerical Calculus (integrals, derivatives)
  23. Natural Language Processing
  24. Computer vision
  25. Neural Networks
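Item 8 is the one people most often skip. A few standard-library Python lines show what those “notions” look like in practice: binary floating point, fixed-width integer limits, Unicode characters versus UTF-8 bytes, and compression. The sample strings and the fits_int32 helper are my own illustrations.

```python
import math
import struct
import zlib

# Numbers and their limits: binary floating point cannot store 0.1 exactly
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True


def fits_int32(value):
    """True if value fits a signed 32-bit integer field (max 2**31 - 1)."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


print(fits_int32(2**31 - 1), fits_int32(2**31))  # True False

# Text and Unicode: characters are not bytes ("ã" takes 2 bytes in UTF-8)
city = "São Paulo"
print(len(city), len(city.encode("utf-8")))  # 9 10

# Compression: repetitive data shrinks dramatically
raw = ("2021-07-16,São Paulo,420\n" * 1000).encode("utf-8")
packed = zlib.compress(raw)
print(len(raw) > 10 * len(packed))  # True
```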

Please remember this list has only hard skills. Ethics, domain and industry knowledge, communication are very important soft skills that won’t fit in this list.

Generally speaking, Data Analysts sit at the beginning of the list (up to ≈11). Data Engineers get up to the middle of the list (up to ≈18). And Data Scientists cover the whole list.

There is also the following graph that I’ve produced:

data professions competencies

No to fingerprint readers

We can thank the coronavirus for helping eradicate fingerprint readers from trivial uses such as gym turnstiles and the lobbies of commercial and residential buildings. I never touched these readers with my fingers, and you should do the same, because from an Information Security perspective they are a sham. And from a Hygiene perspective, they are disgusting.

Fingerprint readers are a “wonder” for storing and spreading germs, as well as the bodily fluids we all expel (sweat and all kinds of gunk). If you are involved in projects that deploy this kind of technology, prefer other, superior biometrics that are already everywhere, such as multi-camera facial recognition.

This post has the explicit intention of making you disgusted by fingerprint readers.

Also published on Facebook.

Jupyter and Data Science on a Mac (without Anaconda)

macOS Catalina doesn’t ship with Python 3, only 2. But you can still get 3 from Apple, updated regularly through the system’s official update methods. You don’t need to get the awful Anaconda on your Mac to play with Python.

Python 3 is shipped by Xcode Command Line Tools. To get it installed (without the heavy Xcode GUI), type this in your terminal:

xcode-select --install

An installation window will pop up; wait about 5 minutes for it to finish.

This way, every time Apple releases an update, you’ll get it.

If you already have complete Xcode installed, this step was unnecessary (you already had Python 3 installed) and you can continue to the next section of the tutorial.

Clean Old Python Modules

In case you already have Python installed under your user and modules installed with pip, remove them:

rm -rf ${HOME}/Library/Caches/com.apple.python \
${HOME}/Library/Python/ \
${HOME}/Library/Caches/pip

Install Python Modules

Now that you have a working Python 3 installation, use pip3 to install the Python modules you’ll need. Don’t forget to use --user to get things installed in your home folder, so you won’t pollute the overall system. For my personal use, I need the complete machine learning, data wrangling and Jupyter suite:

pip3 install --user sqlalchemy
pip3 install --user matplotlib
pip3 install --user pandas
pip3 install --user jupyterlab
pip3 install --user PyMySQL
pip3 install --user configobj
pip3 install --user requests
pip3 install --user seaborn
pip3 install --user bs4
pip3 install --user xgboost
pip3 install --user scikit_learn

But you might need other things, such as Django or other sqlalchemy drivers. Make yourself at home and install them with pip3.

For modules that require compilation and special libraries, say crypto, do it like this:

CFLAGS="-I/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/include" \
LDFLAGS="-L/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib" \
pip3 install --user pycrypto

Use Correct Python 3 Binary

For some reason, Apple installs many different Python 3 binaries in different places of the system. The one installed as /usr/bin/python3 has problems loading some libraries, and fixing it would require instrumentation with install_name_tool. So let’s just use the binary that works better:

export PATH=/Library/Developer/CommandLineTools/usr/bin:$PATH

Run Jupyter Lab on your Mac

Commands installed by pip3 will be available in the ~/Library/Python/3.7/bin/ folder, so just add it to your PATH:

export PATH=$PATH:~/Library/Python/3.7/bin/

Now I can simply type jupyter-lab anywhere in the terminal to fire up my browser with a Jupyter environment.
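To confirm that the PATH changes above took effect, a small script like this one (a sketch; save it under any name and run it with python3) prints which interpreter is running, which should live somewhere under /Library/Developer/CommandLineTools, and whether the modules installed earlier can be found. It only uses the standard library; note that some import names differ from the pip package names.

```python
import importlib.util
import sys

# Import names differ from pip package names in some cases
# (scikit_learn -> sklearn, PyMySQL -> pymysql, bs4 -> bs4)
MODULES = ["sqlalchemy", "matplotlib", "pandas", "jupyterlab", "pymysql",
           "configobj", "requests", "seaborn", "bs4", "xgboost", "sklearn"]


def check_environment(modules):
    """Return the running interpreter's path and whether each module can be found."""
    found = {name: importlib.util.find_spec(name) is not None for name in modules}
    return sys.executable, found


if __name__ == "__main__":
    executable, found = check_environment(MODULES)
    print("Interpreter:", executable)
    for name, ok in found.items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```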

More about Xcode Command Line Tools

Xcode Command Line Tools will get you a whole set of other useful developer tools, such as git, Subversion, the GCC and LLVM compilers and linkers, make, m4 and a complete Python 3 distribution. You can see most of its installation in the /Library/Developer/CommandLineTools folder.

For production and high-end processing I’ll still use Python on Linux with my preferred distribution’s default packages (no Anaconda). But this method of getting Python on macOS is the fastest and cleanest way to get going on your own data science laptop, without a VM or a container.