Showing posts with label hot_skills. Show all posts

Sunday, November 29, 2015

Sitting for a Data Scientist or a Data Analyst interview? You can't afford to miss this post!

Productiviti
So you have decided to save the world by finding insights in data that are otherwise invisible to the naive world out there. If you are not sure what data scientists do and how it is impacting the world, we advise you to follow our Facebook page by clicking here.

I hope that you have already liked our page and clicked on 'Following' by now. Now it's time to get your hands on some interview questions that may be helpful when sitting for any interview.

Our friend who works for an analytics firm has been generous enough to provide these questions and answers to us. If you find them of any use, we would appreciate it if you could leave a thank-you note in the comment box below.

Question 1. Can you outline the various steps in an analytics project?
Broadly speaking these are the steps. Of course these may vary slightly depending on the type of problem, data, tools available etc.
1. Problem definition – The first step is, of course, to understand the business problem. What is the problem you are trying to solve, and what is the business context? Very often, however, your client may just give you a whole lot of data and ask you to do something with it. In such a case you would need to take a more exploratory look at the data. Nevertheless, if the client has a specific problem that needs to be tackled, then the first step is to clearly define and understand the problem. You will then need to convert the business problem into an analytics problem. In other words, you need to understand exactly what you are going to predict with the model you build. There is no point in building a fabulous model, only to realise later that what it is predicting is not exactly what the business needs.
2. Data Exploration – Once you have the problem defined, the next step is to explore the data and become more familiar with it. This is especially important when dealing with a completely new data set.
3. Data Preparation – Now that you have a good understanding of the data, you will need to prepare it for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required and so on. This stage is heavily influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a whole different kind of prep compared to other techniques.
4. Modelling – Once the data is prepared, you can begin modelling. This is usually an iterative process where you run a model, evaluate the results, tweak your approach, run another model, evaluate the results, re-tweak and so on. You go on doing this until you come up with a model you are satisfied with, or one you feel is the best possible result with the given data.
5. Validation – The final model (or maybe the best 2-3 models) should then be put through the validation process. In this process, you test the model using a completely new data set, i.e. data that was not used to build the model. This process ensures that your model is a good model in general and not just a very good model for the specific data used earlier. (Technically, this is called avoiding overfitting.)
6. Implementation and tracking – The final model is chosen after the validation. Then you start implementing the model and tracking the results. You need to track results to see the performance of the model over time. In general, the accuracy of a model goes down over time. How quickly really depends on the variables (how dynamic or static they are) and on the general environment (how static or dynamic that is).

Question 2. What do you do in data exploration?
Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step –
a. What is there in the data – look at the list of all the variables in the data set. Understand the meaning of each variable using the data dictionary. Go back to the business for more information in case of any confusion.
b. How much data is there – look at the volume of the data (how many records) and at the time frame of the data (last 3 months, last 6 months etc.).
c. Quality of the data – how much information is missing, and what is the quality of the data in each variable? Are all fields usable? If a field has data for only 10% of the observations, then maybe that field is not usable.
d. You will also identify some important variables and may investigate these more deeply, for example by looking at averages, min and max values, and maybe the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data prep stage.
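
To make points (c) and (d) concrete, here is a minimal sketch of profiling a single numeric field using only Python's standard library. The function name `summarize`, the sample `income` values and the use of `None` to mark a missing value are all assumptions made for illustration; in practice you would more likely use the profiling features of whatever tool you work in.

```python
import statistics

def summarize(values):
    """Profile one numeric field; None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    # quantiles(n=10) returns 9 cut points:
    # index 0 is the 10th percentile, index 8 is the 90th percentile.
    deciles = statistics.quantiles(present, n=10, method="inclusive")
    return {
        "records": len(values),
        "missing_share": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
        "p10": deciles[0],
        "p90": deciles[8],
    }

# Hypothetical income field with two missing observations
income = [32, 45, None, 51, 38, 120, None, 47, 40, 36]
profile = summarize(income)
for name, value in profile.items():
    print(name, value)
```

A large gap between the 90th percentile and the max, as in this made-up field, is the kind of thing that would send you back for a closer look at that variable.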

Question 3: What do you do in data preparation?
In data preparation, you will prepare the data for the next stage i.e. the modelling stage. What you do here is influenced by the choice of technique you use in the next stage.
But some things are done in most cases, for example identifying missing values and treating them, identifying outlier values (unusual values) and treating them, transforming variables, creating binary variables if required etc.
This is the stage where you will partition the data as well, i.e. create training data (to build the model) and validation data (to validate it).
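
The partitioning step can be sketched in a few lines. This is only an illustration: the function name `partition`, the 30% validation share and the fixed seed are assumptions, not something prescribed by any particular tool.

```python
import random

def partition(records, validation_share=0.3, seed=42):
    """Randomly split records into (training, validation) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_share)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical data set of 100 record ids
training, validation = partition(range(100))
print(len(training), len(validation))
```

The important property is that no record lands in both sets, so the validation data really is unseen by the model.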

Question 4: How will you treat missing values?
The first step is to identify variables with missing values. Assess the extent of missing values. Is there a pattern in missing values? If yes, try and identify the pattern. It may lead to interesting insights.
If there is no pattern, then we can either ignore the missing values (most SAS modelling procedures simply leave out any observation that has missing data in a model variable) or impute the missing values.
Simple imputation – substitute with mean or median values
OR
Case-wise imputation – for example, if we have missing values in the income field.
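
As a rough illustration of simple imputation only, the sketch below fills missing values with the median or the mean of the observed values. The function name `impute`, the sample `income` values and the use of `None` for a missing value are assumptions made for the example.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None with the median (default) or mean of the observed values."""
    present = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(present)
    else:
        fill = statistics.mean(present)
    return [fill if v is None else v for v in values]

# Hypothetical income field with two missing observations
income = [32, 45, None, 51, 38, 120, None]
by_median = impute(income)
by_mean = impute(income, strategy="mean")
print(by_median)
print(by_mean)
```

Note how the single large value pulls the mean well above the median; this is one reason the median is often the safer substitute for skewed fields such as income.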

Question 5: How will you treat outlier values?
You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to ignore records with outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
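
The percentile substitution described above (often called capping or winsorizing) could look roughly like this. The function name `cap_outliers` and the sample data are made up for illustration, and the "inclusive" percentile method is just one of several reasonable definitions.

```python
import statistics

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Pull values below the lower or above the upper percentile back to that percentile."""
    # quantiles(n=100) returns 99 cut points; cut point k-1 is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical field: 1..100 plus one wildly large value
values = list(range(1, 101)) + [10_000]
capped = cap_outliers(values)
print(max(values), max(capped))
```

Capping keeps every record in the data set, which matters when you do not have enough data to simply drop the records with outliers.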

Question 6: How do you assess the results of a logistic regression analysis?
You can use different methods to assess how good a logistic model is.
a. Concordance – This tells you about the ability of the model to discriminate between the event happening and not happening.
b. Lift – It helps you assess how much better the model is compared to random selection.
c. Classification matrix – shows the true and false positives and the true and false negatives, so you can see where the model misclassifies.
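
The three measures above can be computed by hand on a small example. This is a sketch under stated assumptions: the function names, the 0.5 cutoff, the top-20% lift and the sample `actual` and `scores` lists are all hypothetical, and statistical packages may define these measures slightly differently (for instance in how ties are handled).

```python
def concordance(actual, scores):
    """Share of (event, non-event) pairs where the event has the higher score.
    Tied pairs count as half."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

def lift_top(actual, scores, share=0.2):
    """Event rate among the top-scored share of records, relative to the overall rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]
    top = ranked[: max(1, round(len(ranked) * share))]
    return (sum(top) / len(top)) / (sum(actual) / len(actual))

def classification_matrix(actual, scores, cutoff=0.5):
    """Count true/false positives and negatives at a given probability cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, s in zip(actual, scores):
        predicted = 1 if s >= cutoff else 0
        key = ("T" if predicted == a else "F") + ("P" if predicted else "N")
        counts[key] += 1
    return counts

# Hypothetical outcomes (1 = event) and model scores for ten records
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.8, 0.6, 0.1, 0.7, 0.3, 0.4, 0.2, 0.35]

print(concordance(actual, scores))
print(lift_top(actual, scores))
print(classification_matrix(actual, scores))
```

A lift above 1 in the top group means the model finds events faster than random selection would.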
Some other general questions you will most likely be asked:
  • What have you done to improve your data analytics knowledge in the past year?
  • What are your career goals?
  • Why do you want a career in data analytics?
If you are following our page on Facebook, we can assure you that by now you will know the answer to the last question at least.

All you need to know about Six Sigma

Productiviti
Most companies nowadays ask for Six Sigma certified people, or at least demand that candidates have a preliminary knowledge of Six Sigma. At Expertmasterji we have put together the gist of what Six Sigma is and how it is used in the industry. Hope it helps you in your interviews. For more material on interviews please click here.

Question: - What is Six Sigma?
Answer: - Six Sigma (Six Standard Deviations from mean) is a methodology to enhance the efficiency and minimize the errors in any given process through some special tools & techniques.

It was first applied in the manufacturing division of Motorola, where it was, and in fact still is, rigorously used to minimize errors in processes that produce several million different parts. Under Jack Welch, however, Six Sigma was adopted by GE and its group companies, and it was used so rigorously there that Six Sigma became more closely associated with GE than with Motorola. Although Six Sigma was primarily designed to reduce errors in manufacturing, its application widened over time to include non-manufacturing processes too. Today its use can be seen in fields as different as call centers, medical processes and insurance processes.

The emphasis Six Sigma places on quality can be gauged by the fact that a Six Sigma process allows only 3.4 defects per million transactions (opportunities). It uses statistical techniques to reduce errors and to measure the quality of the final process or product.
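
To see where the 3.4 figure comes from, here is a small sketch that converts a defect count into defects per million opportunities (DPMO) and then into a sigma level. The function names and the sample numbers are made up for illustration. The 1.5 sigma shift is not mentioned in the text above; it is the usual Six Sigma convention for allowing a process to drift over the long term, and it is treated here as an assumption.

```python
from statistics import NormalDist

def dpmo(defects, units, opportunities_per_unit=1):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value, shift=1.5):
    """Sigma level for a given DPMO, assuming the conventional 1.5 sigma shift."""
    yield_share = 1 - dpmo_value / 1_000_000
    # inv_cdf gives the z-score that leaves the defect share in the upper tail
    return NormalDist().inv_cdf(yield_share) + shift

# Hypothetical process: 17 defects in 5 million transactions
process_dpmo = dpmo(17, 5_000_000)
print(process_dpmo, sigma_level(process_dpmo))
```

Without the shift, 3.4 defects per million corresponds to roughly 4.5 standard deviations; adding the 1.5 shift is what brings it to six.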

Question: - What is the methodology used by Six Sigma?
Answer: - Six Sigma improves a process through constant revisions and alterations. Business process improvement is done in Six Sigma using the methodology called DMAIC (Define opportunities, Measure performance, Analyze opportunity, Improve performance, Control performance).
Besides process improvement, Six Sigma can also be used for new product development (NPD); for NPD, however, it uses the process known as DFSS (Design for Six Sigma).
Some of the important elements of process improvement under Six Sigma are: continuous improvement, metrics & measures, customer requirements, employee involvement and design quality.

Question: - What is the meaning of Green Belt and Black belt in six sigma?
Answer: - 

Question: - What are the 3 core elements of Six Sigma?
Answer: - The 3 core elements of Six Sigma are:
i) Customer satisfaction: Six Sigma is very strict about customer satisfaction. Under the Six Sigma methodology, customer satisfaction is never to be compromised.
ii) Defining processes, metrics and measures: This element of Six Sigma establishes how data and systems are used and understood, and sets the goals for improvement.
iii) Involvement of employees and team building: Under Six Sigma, it is imperative for the company to involve all employees and to provide them with proper incentives and opportunities. This motivates employees to do their work more sincerely. Six Sigma also ensures that each employee has a defined role in the company, leaving no scope for ambiguity.