How to become a data scientist?

To introduce myself: I graduated from the Master of Business Analytics (MBAn) program at the Massachusetts Institute of Technology (MIT). Although the program is named Business Analytics, it is actually a data science program. After graduation, I joined a FLAG company as a Machine Learning Scientist. I would love to share my views on becoming a data scientist with you.

First of all, the definition of a data scientist's job is very vague.

  • Person A graduated with a Ph.D. from a prestigious HYPSM university. He works at a hedge fund in New York on the core trading strategy, with a salary of $200,000 and a year-end bonus of $250,000.

  • Person B works on the core team of a large Internet company, processing terabytes of data every day, building deep learning models, and reproducing results from the latest papers.

  • Person C works in a top consulting company. He meets customers every day and explains models to them. He has great presentation skills and business acumen.

  • Person D works in a bank. He writes SQL every day and uses Tableau for dashboarding. He earns an annual salary of $100,000 and a bonus of $30,000.

Guess what? All of them have the title Data Scientist.

So the concept of a data scientist is pretty vague. In fact, some data scientists are really quantitative researchers, some are actually data analysts, some lean toward consultants, and some are effectively software development engineers (SDEs). All of these positions overlap with the data scientist role.

Today I will not talk about how to land a position titled “Data Scientist”. Instead, taking students majoring in Data Science and Business Analytics (DS/BA) as an example, I will talk about data science-related skills and answer three questions:

What positions can you pursue?

What skills do these positions require, and how are they paid?

How can you prepare for and improve these skills?


Let’s get started.

1. What positions can I pursue with a DS/BA degree?

DS/BA students have a wide range of career options, which can be both an advantage and a disadvantage.

The advantage is that if all of your skills are strong, you can receive offers in multiple industries and choose among several career paths. The disadvantage is that if your background is not strong enough and you have not prepared well for any particular industry, you might end up with zero offers.

The main industries are:

(1) Consulting companies

Consulting companies are actually the best fit for most DS/BA graduates.

A DS/BA degree is enough to become a data scientist or a technology consultant at a consulting company. BA students also tend to have strong communication and presentation skills, which makes it easier for them to get into consulting.

Taking our program as an example, out of 44 students, about 10 received offers from BCG GAMMA and 5 received offers from McKinsey. For a BA graduate, consulting is undoubtedly the best industry.

My friend @Iker Wang will soon become a data scientist at BCG, and @Pace Han has joined McKinsey.

(2) Internet and technology companies

Internet and tech companies offer many positions to choose from.

If your programming skills are as strong as mine, you can apply directly for software engineer positions. Last year, two seniors in our program joined Google as software engineers.

If you are a little weaker in programming and would rather focus on machine learning and modeling, a data scientist role suits you well. Note, however, that most companies prefer to hire Ph.D. graduates as data scientists, so master’s students may find it hard to get an interview. At Amazon, this position is called applied scientist.

If you want more exposure to the business side, you can choose a data analyst or product analyst role, which Amazon calls business intelligence engineer. These positions analyze products and markets; the work is less technically hardcore and relies heavily on SQL.

Of course, you can also choose to become a product manager.

(3) Financial companies

For financial companies, you can choose to be a quantitative researcher, or a data scientist, or a quantitative trader.

(4) Other traditional enterprises

In traditional industries, there are also market analyst or data analyst positions.

2. What skills do you need?

To land one of the jobs above, you need the following main skills:

(1) Data structures and algorithms (practiced on sites such as LeetCode and TopCoder)

(2) Data science programming languages: SQL > Python > R (in order of importance in interviews)

(3) Mathematics, statistics, and modeling: statistics, operations research, machine learning, etc.

(4) Applying data knowledge in practical fields such as digital marketing, healthcare analytics, and machine learning in finance, commonly known as domain knowledge

(5) Pure IQ: strong brain-teaser ability, solving math problems, and quick on-the-spot responses

(6) The ability to express yourself, communicate, and present. For Chinese students, English proficiency is also essential.

Each position requires a different skill set and pays differently. See the following section for details.

3. Salary and skill requirements for each BA career path

Note that the salary levels here (base salary + stock + bonus) refer to top companies in each industry: Citadel and Two Sigma for quant researcher positions, Facebook and Google for SDE positions, and McKinsey and BCG for consulting positions. Other companies may pay less, but the relative gaps between industries and positions are generally similar (for example, a data analyst earns somewhat less than a data scientist).

All salaries are in US dollars and are based on my own experience. If any figure is inaccurate, please point it out.

Now let’s move on to the most important part.

4. How to improve the skill sets mentioned above? (My advice)

Above, we covered the key skills BA students need to land jobs in North America: (1) data structures and algorithms (e.g., LeetCode, TopCoder); (2) data-related programming languages: SQL > Python > R (in order of importance in interviews); (3) mathematics, statistics, and modeling: statistics, operations research, machine learning, etc.; (4) domain knowledge: applying data skills in practical fields such as digital marketing, healthcare analytics, and machine learning in finance; (5) pure IQ: brain teasers, solving math problems, and quick on-the-spot reactions; and (6) the ability to express yourself, communicate, and present.

Let’s go through how to improve each of them.

(1) Data Structures and Algorithms (such as LeetCode, TopCoder)

Data structures and algorithms are the core of software engineer recruiting, because they test the heart of computer programming: the ability to understand and apply algorithms and data structures. This area is a nightmare for many BA students coming from business or liberal arts backgrounds. But the skill is valuable: once you master data structures and algorithms, you can raise your salary ceiling considerably. My own job search benefited a lot from fluent coding. The simplest way to improve is to work through LeetCode; search for “leetcode” on Google and you will find hundreds of guides to get you started. Since I competed in informatics olympiads, I could already solve all the LeetCode Hard problems in 9th grade, so I hardly prepared. For most students, if you want to apply for positions close to SDE, I recommend finishing at least 150 LeetCode questions.

In short, this is a well-defined skill that you can steadily improve.

Once you can easily write heavy-light decomposition (“tree chain split”), getting an SDE offer should be quite easy.
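“Tree chain split” is a literal rendering of the Chinese name for heavy-light decomposition (HLD). As a rough illustration of that level, here is a minimal Python sketch, assuming a tree given as an adjacency list over nodes 0..n-1; the class and method names are my own. It builds the decomposition and answers lowest-common-ancestor queries by jumping from chain to chain. Contest solutions usually pair the `pos` array with a segment tree to answer path queries; that part is omitted here.

```python
from typing import List


class HeavyLightDecomposition:
    """Heavy-light decomposition ("tree chain split") of a rooted tree."""

    def __init__(self, adj: List[List[int]], root: int = 0) -> None:
        n = len(adj)
        self.parent = [-1] * n
        self.depth = [0] * n
        self.heavy = [-1] * n  # child with the largest subtree (-1 for leaves)
        self.head = [0] * n    # top node of the heavy chain containing each node
        self.pos = [0] * n     # index in an array where each chain is contiguous

        # Pass 1: iterative DFS for parents, depths, and a preorder
        # (iterative to avoid Python's recursion limit on deep trees).
        order, stack = [], [root]
        seen = [False] * n
        seen[root] = True
        while stack:
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    self.parent[v] = u
                    self.depth[v] = self.depth[u] + 1
                    stack.append(v)

        # Pass 2: in reverse preorder every child comes before its parent, so
        # subtree sizes are final when read; record each node's heavy child.
        size = [1] * n
        for u in reversed(order):
            p = self.parent[u]
            if p != -1:
                size[p] += size[u]
                if self.heavy[p] == -1 or size[u] > size[self.heavy[p]]:
                    self.heavy[p] = u

        # Pass 3: walk each heavy chain from its head, giving it consecutive
        # positions; every light child starts a new chain.
        cur, stack = 0, [root]
        while stack:
            h = stack.pop()
            u = h
            while u != -1:
                self.head[u] = h
                self.pos[u] = cur
                cur += 1
                for v in adj[u]:
                    if v != self.parent[u] and v != self.heavy[u]:
                        stack.append(v)
                u = self.heavy[u]

    def lca(self, a: int, b: int) -> int:
        """Lowest common ancestor using O(log n) chain jumps."""
        while self.head[a] != self.head[b]:
            # Lift the node whose chain head is deeper to just above that chain.
            if self.depth[self.head[a]] < self.depth[self.head[b]]:
                a, b = b, a
            a = self.parent[self.head[a]]
        return a if self.depth[a] < self.depth[b] else b


# Tree with edges 0-1, 0-2, 1-3, 1-4, 4-5, 2-6, rooted at 0.
adj = [[1, 2], [0, 3, 4], [0, 6], [1], [1, 5], [4], [2]]
hld = HeavyLightDecomposition(adj, root=0)
print(hld.lca(5, 3), hld.lca(5, 6), hld.lca(4, 5))  # 1 0 4
```

The three passes are iterative rather than recursive only because deep trees would otherwise hit Python's recursion limit; in C++ contest code the same logic is usually written as two recursive DFS functions.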



(2) Data science programming languages

Apart from the LeetCode-style questions above, there are two ways to test programming skills. The first is writing SQL, which is what Facebook’s Data Scientist position tests. The second is using Python or R to analyze a dataset and present the results, which involves a lot of data manipulation, so pandas (for Python) and dplyr (for R) are very important. The best way to improve is to practice: if you write a lot of SQL, you will naturally become fluent in it. pandas skills are likewise built up layer by layer and cannot be improved overnight. To become a data analyst or data engineer, SQL will always be one of your most important skills. But do not limit yourself to SQL, otherwise you will become someone else’s SQL monkey and your career development will be limited.
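To make the two formats concrete, here is a sketch of the same question (users and revenue per country) answered once in SQL and once in pandas. The `events` table, its columns, and the numbers are made up for illustration, and SQLite through Python's built-in `sqlite3` stands in for whatever database an interview would actually use.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table of user events (made up for illustration).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "country": ["US", "US", "CA", "US", "US", "US"],
    "revenue": [10.0, 5.0, 8.0, 2.0, 3.0, 4.0],
})

# 1) The SQL way: load the table into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
events.to_sql("events", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT country,
           COUNT(DISTINCT user_id) AS users,
           SUM(revenue)            AS revenue
    FROM events
    GROUP BY country
    ORDER BY country
    """,
    conn,
)

# 2) The pandas way: the same aggregation with groupby + named aggregation.
pandas_result = (
    events.groupby("country", as_index=False)
    .agg(users=("user_id", "nunique"), revenue=("revenue", "sum"))
    .sort_values("country", ignore_index=True)
)

print(sql_result)
print(pandas_result)
```

Both produce one row per country (CA: 1 user, 8.0 revenue; US: 2 users, 24.0 revenue). Being able to translate fluently between the two styles is exactly the kind of practice the paragraph above recommends.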


(3) Mathematics, statistics and modeling applications: statistics, operations research, machine learning, etc.

This type of knowledge will be tested in two ways:

The first is to directly ask conceptual machine learning questions. For example:

a. How is GBDT (gradient-boosted decision trees) different from a random forest?

b. What is the difference between the L1 norm and the L2 norm?

c. What is Thompson sampling?

d. What is the objective function in the explore-exploit trade-off?

e. What is a dropout layer, and why is it important?
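For questions c and d, a small simulation can help. Below is a minimal sketch of Beta-Bernoulli Thompson sampling, assuming each arm pays 1 with an unknown probability and starting from a uniform Beta(1, 1) prior; the function name and the arm rates are hypothetical. The usual objective in the explore-exploit setting is to maximize cumulative reward over all rounds, which is equivalent to minimizing regret relative to always playing the best arm.

```python
import random
from typing import List


def thompson_sampling(true_rates: List[float], n_rounds: int = 10_000,
                      seed: int = 0) -> List[int]:
    """Beta-Bernoulli Thompson sampling; returns how often each arm was played."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw a plausible success rate for each arm from its Beta posterior
        # (uniform prior) and play the arm with the highest draw.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from that arm's hidden true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.20], n_rounds=10_000, seed=42)
print(pulls)  # most pulls should go to the last (best) arm
```

Exploration comes for free: arms with few observations have wide posteriors and still occasionally produce the highest draw, while arms that are clearly worse are sampled less and less over time.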

To answer these questions, I recommend learning machine learning systematically. You can read Li Hang’s “Statistical Learning Methods”, “The Elements of Statistical Learning”, and MLAPP (“Machine Learning: A Probabilistic Perspective”). I found these books very helpful. If you are just days away from an interview, I recommend “Heard in Data Science Interviews: Over 650 Most Commonly Asked Interview Questions & Answers”.

The second way is a take-home test (usually 3 to 7 days): you analyze a dataset, build models to solve a problem, and give a presentation. Take-home tests are challenging and time-consuming, and they have a low pass rate, so I will cover them in detail in the next article.



(4) Application of data knowledge in practical fields: digital marketing, healthcare analytics, machine learning in finance, etc. (commonly known as domain knowledge)

This ability is generally tested through business questions and case questions. You need to be very familiar with the business scenarios of the company and position, and be able to think from the perspective of a company employee (or consultant). I am not particularly good at these topics. If you are interviewing with a consulting company and don’t know what cases you will face, it is best to do a lot of mock interviews. If you are interviewing with a technology company such as Uber, it is best to talk to current employees to see how they view problems. For example, according to current employees I talked to, any business analysis at Uber must cover two sides: the driver side and the rider side. Similarly, when answering a business question at Facebook, analysis related to the “network effect” might cater to their taste (my assumption). I personally recommend watching talks by senior data scientists or marketing VPs on YouTube, which can quickly help you grasp how that company thinks about analytics.
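To illustrate the driver-side / rider-side point, here is a toy sketch. The two metrics (fulfillment rate for riders, utilization for drivers), the field names, and the numbers are my own hypothetical choices, not Uber’s actual definitions; the point is only that each city looks healthy from one side and unhealthy from the other.

```python
from dataclasses import dataclass


@dataclass
class CityHour:
    """Hypothetical hourly snapshot of a two-sided ride-hailing marketplace."""
    city: str
    ride_requests: int          # rider side: demand
    completed_trips: int
    online_driver_hours: float  # driver side: supply
    busy_driver_hours: float    # hours drivers actually spent on trips


def marketplace_health(s: CityHour) -> dict:
    """Summarize one snapshot from both the rider side and the driver side."""
    return {
        "city": s.city,
        # Rider side: what share of requests turned into trips?
        "fulfillment_rate": s.completed_trips / s.ride_requests,
        # Driver side: what share of online time was spent earning?
        "driver_utilization": s.busy_driver_hours / s.online_driver_hours,
    }


# Made-up numbers for two cities.
snapshots = [
    CityHour("A", ride_requests=1000, completed_trips=950,
             online_driver_hours=500, busy_driver_hours=200),
    CityHour("B", ride_requests=1000, completed_trips=600,
             online_driver_hours=200, busy_driver_hours=190),
]
for s in snapshots:
    print(marketplace_health(s))
```

City A has well-served riders but idle drivers (oversupply), while city B has busy drivers but many unserved riders (undersupply). Looking at only one side would miss half of each story.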

(5) Pure IQ, strong brain teaser ability, strong ability to solve math problems, and quick on-the-spot reaction.

In quant-related positions, you are often asked math problems that require IQ to solve. Some involve probability, some involve statistics, and some are pure logical inference. I interviewed with four hedge funds and encountered such problems at all of them. For quant researcher positions there will be fewer math questions, but for quant trader positions, I’m sorry: 80% of your interview questions will be like these. The ability to answer brain teasers and solve math problems on the spot can be improved through practice; the more you interview, the more familiar you become with the game. I also recommend the book “A Practical Guide to Quantitative Finance Interviews”. As for pure IQ... I don’t know how to improve it. I hope someone can tell me when they find the answer.
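As an example of this kind of question: what is the expected number of fair-coin flips until you see two heads in a row? Reaching n heads in a row means first reaching n-1, flipping once more, and starting over on tails, which gives E_n = E_{n-1} + 1 + (1 - p)E_n, i.e. E_n = (E_{n-1} + 1)/p with E_0 = 0. Unrolling gives E_n = (1/p) + (1/p)^2 + ... + (1/p)^n, which is 6 for two heads with a fair coin. The sketch below computes that closed form exactly and sanity-checks it with a quick Monte Carlo simulation; the function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_flips_exact(n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected flips until n heads in a row, for a coin with P(heads) = p.

    Closed form: (1/p) + (1/p)**2 + ... + (1/p)**n, i.e. 2**(n+1) - 2 for a fair coin.
    """
    return sum((1 / p) ** k for k in range(1, n + 1))


def expected_flips_simulated(n: int, p: float = 0.5, trials: int = 100_000,
                             seed: int = 0) -> float:
    """Monte Carlo estimate used to sanity-check the closed form."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        streak = flips = 0
        while streak < n:
            flips += 1
            # A head extends the streak; a tail resets it to zero.
            streak = streak + 1 if rng.random() < p else 0
        total += flips
    return total / trials


print(expected_flips_exact(2))      # 6
print(expected_flips_simulated(2))  # close to 6
```

In an interview you would only state the recursion and the answer; the simulation is a habit worth building while practicing, because it catches algebra mistakes quickly.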

(6) Be able to express, communicate, and present

Improve your English in all aspects! Do projects with classmates from other countries! Don’t spend all day only with other Chinese students! Speak English whenever you have the chance! Practice presentations and record yourself on video to keep improving! Do more mock interviews! Learn from what others say while helping them with their mock interviews!

In the next article, I will review the questions asked by various companies based on my job search experience, so that everyone gets a general feel for interviews at different companies. I will also share my experience on some specific topics: take-home tests, salary negotiation, job search mentality, etc.


Stay tuned.
