Data Scientist and Machine-Learning Data Science Jobs / What is the Data Life Cycle?
How do data scientists get useful insights from data?
It all starts with data exploration. Whenever data scientists come across a challenging question or a challenging situation, they become detectives: they investigate leads and try to understand the different patterns and characteristics of the data. They try to get all the information they can from the data, and then they use it for the betterment of the organization or business.
Who is a data scientist?
Data scientists have to be able to view data through a quantitative lens, so knowing math is one of their most important skills. Mathematics matters because, in order to find a solution, you're going to build a lot of predictive models, and these predictive models are based on hard math. You have to understand the underlying mechanics of these models, since most predictive models and most algorithms require mathematics.
Now, there's a major misconception that data science is all about statistics. I'm not saying that statistics isn't important. It is very important, but it's not the only type of math used in data science. Many machine learning algorithms are actually based on linear algebra. So overall you need a good understanding of math.
Apart from that, data scientists utilize technology, so they have to be really good with it. Their main work is to use technology to analyze enormous data sets and work with complex algorithms. All of this requires tools that are much more sophisticated than Excel, so data scientists need to be efficient with coding languages. A few of the core languages associated with data science are SQL, Python, R, and SAS.
It is also important for a data scientist to be a tactical business consultant. Business problems can be solved by data scientists because they work so closely with the data that they come to know the business inside out. If you have a business and you give your entire data set to a data scientist, they will get to know each and every aspect of your business. That's how data scientists work: they get the entire data set, they study it, they analyze it, and then they see where things are going wrong or what needs to be done more.
Having this business acumen is just as important as having skills in algorithms or being good with math and technology. Now that you know who a data scientist is, let's look at the skill set that a data scientist needs.
It always starts with statistics. Statistics gives you the numbers from the data, so a good understanding of statistics is very important for becoming a data scientist. You have to be familiar with statistical tests, distributions, maximum likelihood estimators, and so on. Apart from that, you should also have a good understanding of probability theory and descriptive statistics. These concepts will help you make better business decisions.
No matter what type of company or role you're interviewing for, you're going to be expected to know how to use the tools of the trade. This means you have to know a statistical programming language like R or Python, and you'll also need to know a database querying language like SQL. The main reason people prefer R and Python is the number of packages these languages have. These predefined packages contain most of the algorithms, so you don't have to sit down and code the algorithms yourself. Instead, you can just load one of these packages and run it. So programming is a must: at the minimum, you should know R or Python and a database query language.
Now let's move on to data extraction and processing. Suppose you have multiple data sources, like a MySQL database and a MongoDB database. You have to extract the data from such sources, and then, in order to analyze and query it, you have to store it in a proper format or structure.
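To make the extract-then-structure idea concrete, here is a minimal sketch in Python. It is only an illustration: it uses the standard library's `sqlite3` with an in-memory database as a stand-in for a real source such as MySQL or MongoDB (which need their own client libraries), and the `orders` table and its columns are made up for the example.

```python
import sqlite3

# Stand-in source: an in-memory SQLite database (an assumption for this
# sketch; MySQL or MongoDB would need their own client libraries).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 12.5)],
)

# Extract: pull the rows out of the source with a query.
rows = conn.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()
conn.close()

# Process: store the rows in one uniform structure (a list of dicts)
# so that later steps can analyze and query them easily.
records = [{"id": r[0], "customer": r[1], "amount": r[2]} for r in rows]

# Example query on the structured data: total amount per customer.
totals = {}
for rec in records:
    totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + rec["amount"]

print(totals)
```

Once every source has been loaded into the same record structure, the analysis code no longer needs to care where each row came from.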
This entire process is called extraction and processing. It is all about getting data from these different data sources and then putting it in a format that you can analyze.
Data Wrangling:
Next is data wrangling and exploration. Data wrangling is one of the most difficult tasks in data science, and the most time-consuming, because it is all about cleaning the data. There are a lot of instances where data sets have missing values, null values, inconsistent formats, or inconsistent values, and you need to understand what to do with such values. This is where data wrangling, or data cleaning, comes into the picture.
After you're done with that, you are going to analyze the data. Once data wrangling and cleaning are done, you start exploring, which is where you try to make sense of the data. You can do this by looking at the different patterns in the data: the trends, the outliers, and any unexpected results.
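As a rough sketch of what cleaning can look like, the Python snippet below takes a few made-up records with a null, a non-numeric value, and inconsistent formatting, and normalizes them. The records, the `to_int` helper, and the choice to fill missing ages with the median are all assumptions for illustration; the right policy depends on your data.

```python
import statistics

# Hypothetical raw records with nulls, bad values, and inconsistent formats.
raw = [
    {"name": " Alice ", "age": "34", "city": "new york"},
    {"name": "BOB", "age": None, "city": "New York"},
    {"name": "carol", "age": "n/a", "city": ""},
    {"name": "Dave", "age": "41", "city": "NEW YORK"},
]

def to_int(value):
    """Return value as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

# Normalize formats: trim whitespace, unify capitalization, parse numbers,
# and turn empty strings into explicit missing values (None).
cleaned = [
    {
        "name": row["name"].strip().title(),
        "age": to_int(row["age"]),
        "city": row["city"].strip().title() or None,
    }
    for row in raw
]

# One possible policy for missing ages: fill with the median of known ages.
known_ages = [r["age"] for r in cleaned if r["age"] is not None]
median_age = statistics.median(known_ages)
for r in cleaned:
    if r["age"] is None:
        r["age"] = median_age

for r in cleaned:
    print(r)
```

Filling with the median is only one option. Dropping the row or keeping the value as missing can be just as valid, which is why you need to understand what to do with such values before you start exploring.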
Machine Learning:
If you're at a large company with huge amounts of data, or at a company where the product itself is data-driven, like Netflix or Google Maps, then you have to be familiar with machine learning methods. You cannot process large amounts of data with traditional methods, which is why you need machine learning algorithms. There are a few algorithms you have to be aware of, such as K nearest neighbor, random forest, K-means, and support vector machines, and most of them can be implemented using R or Python libraries.
You need an understanding of machine learning if you have a large amount of data in front of you, which is going to be the case for most people right now, because data is being generated at an unstoppable pace. Earlier in the session we discussed how much data is generated. So knowing machine learning algorithms and concepts is a required skill if you want to become a data scientist. If you're sitting for an interview as a data scientist, you will be asked about machine learning.
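To give a feel for one of the algorithms named above, here is a minimal K nearest neighbor classifier in plain Python. In practice you would load it from an R or Python package rather than write it yourself, so treat this as a sketch; the function name `knn_predict` and the tiny two-class data set are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort the labeled points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (features, label) pairs forming two clusters.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (4.0, 4.0)))
```

The whole method is "look at the closest known examples and go with the majority", which is why it is often the first algorithm people learn.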
Big Data Processing:
We know that we've been generating a lot of data, and this data can be structured or unstructured. On such data you cannot use traditional data processing systems, which is why you need to know frameworks like Hadoop and Spark. These frameworks can be used to handle big data.
Data Visualization:
Lastly, we have data visualization. Data visualization is one of the most important parts of data analysis, because it is always important to present the data in an understandable and visually appealing format. So data visualization is one of the skills that data scientists have to master. If you want to communicate the data to end users in a better way, then data visualization is a must. There are a lot of tools that can be used for data visualization, and Tableau and Power BI are among the most popular.
With this, we sum up the entire skill set needed to become a data scientist. Apart from this, you should also have a data-driven problem-solving approach, and you should be very creative with data.
Now that we know the skills needed to become a data scientist, let's look at the different job roles. Since data science is a very vast field, there are many job roles in it, so let's take a look at each one.
Data Scientist:
Let's start with the data scientist. Data scientists have to understand the challenges of the business and offer the best solution using data analysis and data processing. For instance, if they are expected to perform predictive analysis, they should also be able to identify trends and patterns that can help companies make better decisions. To become a data scientist, you have to be an expert in R, MATLAB, SQL, Python, and other complementary technologies. It can also help if you have a higher degree in mathematics or computer engineering.
Data Analyst:
Next we have the data analyst. A data analyst is responsible for a variety of tasks, including visualization and the processing of massive amounts of data. They also have to perform queries on databases, so they should be aware of the different query languages. One of the most important skills of a data analyst is optimization, because they have to create and modify algorithms that can pull information from some of the biggest databases without corrupting the data. To become a data analyst, you must know technologies such as SQL, R, SAS, and Python, and a certification in any of these can boost your job application. You should also have good problem-solving skills.
Data Architect:
A data architect creates the blueprints for data management so that databases can be easily integrated, centralized, and protected with the best security measures. They also ensure that the data engineers have the best tools and systems to work with. To become a data architect, you have to have expertise in data warehousing, data modeling, and extraction, transformation, and loading (ETL). You should also be well versed in Hive, Pig, and Spark.
Data Engineer:
Apart from this, there are data engineers. The main responsibility of a data engineer is to build and test scalable big data ecosystems. They are also needed to update existing systems with newer or upgraded versions, and they are responsible for improving the efficiency of the databases. If you are interested in a career as a data engineer, the technologies that require hands-on experience include Hive, NoSQL, R, Ruby, Java, C++, and MATLAB. It would also help if you can work with popular data APIs and ETL tools.
Statistician:
As the name suggests, a statistician has to have a sound understanding of statistical theories and data organization. Not only do they extract and offer valuable insights, they also create new methodologies for engineers to apply. If you want to become a statistician, you have to have a passion for logic. Statisticians are also good with a variety of database systems such as SQL, with data mining, and with various machine learning technologies. By that I mean you should be good with math, and good knowledge of the various database systems such as SQL, and of the various machine learning concepts and algorithms, is a must. Next we have the database administrator.
Database Administrator:
The job profile of a database administrator is pretty much self-explanatory. They are responsible for the proper functioning of all the databases, and also for granting or revoking permission to use the database services for the employees of the company. They also have to take care of database backups and recoveries. Some of the skills needed to become a database administrator include database backup and recovery, data security, and data modeling and design.
Business Analyst:
Next we have the business analyst. The role of a business analyst is a little different from all of the other data science jobs. Don't get me wrong: they have a very good understanding of data-oriented technologies, and they know how to handle and process a lot of data, but they are also very focused on how this data can be linked to actionable business insights. To become a business analyst, you have to have an understanding of business finance and business intelligence, and also of IT technologies like data modeling, data visualization tools, and so on.
Data and Analytics Manager:
At last we have the data and analytics manager, who is responsible for overseeing the data science operations and for assigning duties to the team according to their skills and expertise. Their strengths should include technologies like SAS, R, and SQL, and of course they should have good management skills. Apart from that, they must have excellent social skills, leadership qualities, and an out-of-the-box thinking attitude. And like I said earlier, you need a good understanding of technologies like Python, SAS, R, Java, and so on.
These were the different job roles in data science. I hope you all found this informative.
What is the Data Life Cycle?
It starts with a business requirement. Next is data acquisition. After that you process the data, which is called data processing. Then there is data exploration, then modeling, and finally deployment.
Before you even start on a data science project, it is important that you understand the problem you're trying to solve. So in this stage, you're just going to focus on identifying the central objectives of the project, and you will do this by identifying the variables that need to be predicted.
Next up, we have data acquisition. Now that you have your objectives defined, it's time to start gathering the data. Data mining is the process of gathering your data from different sources. At this stage, some of the questions to ask are:
1. What data do I need for my project?
2. Where does it live?
3. How can I obtain it?
4. What is the most efficient way to store and access all of it?
Next up there is data processing. Usually, all the data that you collected is a huge mess. It's not formatted, it's not structured, and it's not cleaned. If you find a data set that is cleaned and packaged well for you, then you've actually won the lottery, because finding the right data takes a lot of time and effort. One of the major time-consuming tasks in the data science process is data cleaning. It requires a lot of time and effort because you have to go through the entire data set to find any missing values, inconsistent values, or corrupted data, and you also find the unnecessary data and remove it. So this was all about data processing.
Next we have data exploration. Now that you have a sparkling clean set of data, you are finally ready to get started with your analysis. The data exploration stage is basically the brainstorming of data analysis. In order to understand the patterns in your data, you can use a histogram: you can just pull up a random subset of the data and plot a histogram. You can even create interactive visualizations. This is the point where you dive deep into the data and try to explore the different models that can be applied to it.
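The "pull up a random subset and plot a histogram" idea can be sketched in a few lines of Python. This is only a text-based illustration using the standard library: the data set is randomly generated, and the `histogram` helper and the bin width of 10 are assumptions made for the example. A plotting library would normally draw the chart for you.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data set of 1,000 values; we explore a random subset of it.
data = [random.gauss(50, 10) for _ in range(1000)]
sample = random.sample(data, 200)

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

hist = histogram(sample, 10)

# Print one row per bin, with one '#' for every two values in that bin.
for start, count in hist.items():
    print(f"{start:>4} to {start + 10:<4} {'#' * (count // 2)}")
```

Even a crude chart like this shows where most values sit and whether anything looks like an outlier, which is the kind of pattern you are looking for at this stage.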
We have data modeling. After processing the data, you're going to carry out model training. Model training is basically about finding the model that answers the questions most accurately, and the process involves a lot of steps. First, you'll start by splitting the input data into a training data set and a testing data set: you take the entire data set and separate it into two parts, one for training and one for testing. After that, you'll build a model using the training data set, and once you're done with that, you'll evaluate it on the training and the testing data sets. For this evaluation you'll be using a series of machine learning algorithms, and after that you'll find out which model is the most suitable for your business requirement.
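The split, train, and evaluate steps described above can be sketched as follows. Everything here is an assumption made for illustration: the labeled data is made up, `train_test_split` is a hand-written helper (not a library function), and the "model" is a deliberately trivial threshold rule standing in for a real machine learning algorithm.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into a training set and a testing set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy model: use the midpoint between the two class means as a cutoff."""
    lows = [x for x, label in train if label == 0]
    highs = [x for x, label in train if label == 1]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x > threshold' matches the label."""
    correct = sum((x > threshold) == bool(label) for x, label in rows)
    return correct / len(rows)

# Hypothetical labeled data: small values are class 0, large values class 1.
data = [(x, 0) for x in range(0, 20)] + [(x, 1) for x in range(30, 50)]

train, test = train_test_split(data)   # 1. split the data
threshold = fit_threshold(train)       # 2. build the model on training data
print("train accuracy:", accuracy(threshold, train))  # 3. evaluate on both
print("test accuracy:", accuracy(threshold, test))
```

The point of holding back the testing set is that the model never sees it during training, so the test accuracy is a fairer estimate of how the model will behave on new data.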
So this was mainly data modeling. This is where you build a model out of your training data set and then evaluate this model using the testing data set.
Next, you have deployment. The goal of this stage is to deploy the model into production, or maybe a production-like environment. This is basically done for final user acceptance: the users have to validate the performance of the models, and if there are any issues with the model or with the algorithm, they have to be fixed in this stage. With this, we come to the end of the data life cycle. I hope this was clear.
Statistics and probability are essential because these disciplines form the basic foundation of all machine learning algorithms, deep learning, artificial intelligence, and data science. In fact, mathematics and probability are behind everything around us. From shapes, patterns, and colors to the count of petals in a flower, mathematics is embedded in each and every aspect of our lives. With this in mind, I welcome you all to today's session. I'm going to go ahead and discuss the agenda for today with you all. We're going to begin the session by understanding what data is.
We'll then move on and look at the different categories of data, like quantitative and qualitative data. Then we'll discuss what exactly statistics is, the basic terminologies in statistics, and a couple of sampling techniques. Once we're done with that, we'll discuss the different types of statistics, which are descriptive and inferential statistics.
In the next section we'll mainly be focusing on descriptive statistics. Here we'll understand the different measures of center, measures of spread, information gain, and entropy, and we'll understand all of these measures with the help of a use case. Finally, we'll discuss what exactly a confusion matrix is.
Once we've covered the entire descriptive statistics module, we'll discuss the probability module. Here we'll understand what exactly probability is and the different terminologies in probability, and we'll also study the different probability distributions. Then we'll discuss the types of probability, which include marginal, joint, and conditional probability. Then we'll move on to a use case with examples that show how the different types of probability work, and to better understand Bayes' theorem, we'll look at a small example.
Also, I forgot to mention that at the end of the descriptive statistics module we'll be running a small demo in the R language. For those of you who don't know much about R, I'll be explaining every line in depth, but if you want a more in-depth understanding of R, I'll leave a couple of blogs and a couple of videos in the description box. You can definitely check out that content.
After we've completed the probability module, we'll discuss the inferential statistics module. We'll start this module by understanding what point estimation is. We will discuss what a confidence interval is and how you can estimate it. We will also discuss the margin of error, and we'll understand all of these concepts by looking at a small use case. We'll finally end the inferential statistics module by looking at what hypothesis testing is. Hypothesis testing is a very important part of inferential statistics, so we'll end the session by looking at a use case that discusses how hypothesis testing works. To sum everything up, we'll look at a demo that explains how inferential statistics works. Alright, there's a lot to cover today, so let's move ahead and take a look at our first topic, which is: what is data?
Now, this is quite a simple question. If I ask any of you what data is, you'll say that it's a set of numbers or some sort of documents stored on your computer. But data is actually everything. Look around you: there is data everywhere. Each click on your phone generates more data than you know. This generated data provides insights for analysis and helps us make better business decisions, which is why data is so important. To give you a formal definition, data refers to facts and statistics collected together for reference or analysis.
4. Linear Regression.
5. Logistic Regression.
6. Decision Trees.
7. Random Forest.
8. K Nearest Neighbour.
9. Naive Bayes.
10. Support Vector Machine.
11. K Means Clustering.
12. Association Rule Mining.
13. Reinforcement Learning.
14. Deep Learning.
15. Interview Questions.