Hot on the heels of delving into the world of R frequency table tools, it’s now time to expand the scope and think about data summary functions in general. One of the first steps analysts should perform when working with a new dataset is to review its contents and shape. How many records are there? What fields exist? Of which type? Is there missing data? Is the data in a reasonable range? What sort of distribution does it have? Whilst I am a huge fan of data exploration via visualisation, running a summary statistical function over the whole dataset is a great first step to understanding what you have, and whether it’s valid and/or useful.

So, in the usual format, what would I like my data summarisation tool to do in an ideal world? You may note some copy and paste from my previous post.

- Provide a count of how many observations (records) there are.
- Show the number, names and types of the fields.
- Be able to provide info on as many types of fields as possible (numeric, categorical, character, etc.).
- Produce appropriate summary stats depending on the data type. For example, if you have a continuous numeric field, you might want to know the mean. But a “mean” of an unordered categorical field makes no sense.
- It is often important to know how many of your observations are missing. Other times, you might only care about the statistics derived from those which are not missing.
- For numeric data, produce at least these types of summary stats, without producing too many more esoteric ones that clutter up the screen. Of course, what I regard as esoteric may be very different to what you would.
  - Some measure of variability, probably standard deviation.
  - Also optionally, some measures of skew, kurtosis, etc.
- For categorical data, produce at least these types of summary stats:
  - A list of the categories, perhaps restricted to the most popular if there are a high number.
  - Some indication as to the distribution, e.g. does the most popular category contain 10% or 99% of the data?
- Be able to summarise a single field or all the fields in a particular dataframe at once, depending on user preference.
- Ideally, optionally be able to summarise by group, where group is typically some categorical variable.
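To make the wishlist above a bit more concrete, here is a rough base R sketch of what such a tool might do. It is not any existing package's API: the function names (`summarise_numeric`, `summarise_categorical`, `summarise_field`, `summarise_df`) and the `top_n` and `by` arguments are hypothetical, made up for this illustration. The skew and kurtosis figures use simple moment-based definitions, which is only one of several conventions in use.

```r
# A minimal base-R sketch of a data summarisation tool.
# All function and argument names here are hypothetical, not from any package.

# Numeric field: count, missing, centre, spread, range and optional shape stats.
summarise_numeric <- function(x) {
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]                 # stats use the non-missing values only
  z <- x - mean(x)
  m2 <- mean(z^2)
  list(
    n        = length(x),
    missing  = n_missing,
    mean     = mean(x),
    median   = median(x),
    sd       = sd(x),
    min      = min(x),
    max      = max(x),
    # Moment-based skewness and excess kurtosis (one of several conventions)
    skew     = mean(z^3) / m2^1.5,
    kurtosis = mean(z^4) / m2^2 - 3
  )
}

# Categorical field: missing count, number of categories, and the share held
# by the most popular ones (is it 10% or 99% of the data?).
summarise_categorical <- function(x, top_n = 5) {
  n_missing <- sum(is.na(x))
  counts <- sort(table(x), decreasing = TRUE)   # table() excludes NA by default
  props <- counts / sum(counts)
  list(
    n            = sum(counts),
    missing      = n_missing,
    n_categories = length(counts),
    top          = head(round(props, 3), top_n),  # restrict long category lists
    top_share    = if (length(props) > 0) as.numeric(props[1]) else NA_real_
  )
}

# Pick an appropriate summary for the field's type.
summarise_field <- function(x, ...) {
  if (is.numeric(x)) {
    summarise_numeric(x)
  } else if (is.factor(x) || is.character(x) || is.logical(x)) {
    summarise_categorical(x, ...)
  } else {
    list(type = class(x)[1], note = "no summary implemented for this type")
  }
}

# Whole data frame: record count, field names and types, per-field stats,
# optionally split by a grouping column first.
summarise_df <- function(df, by = NULL, ...) {
  if (!is.null(by)) {
    groups <- split(df[setdiff(names(df), by)], df[[by]])
    return(lapply(groups, summarise_df, ...))
  }
  list(
    records = nrow(df),
    fields  = data.frame(
      name = names(df),
      type = vapply(df, function(col) class(col)[1], character(1)),
      row.names = NULL
    ),
    stats = lapply(df, summarise_field, ...)
  )
}

# Usage with the built-in iris dataset
res <- summarise_df(iris)
res$records                       # number of records
res$fields                        # names and types of the fields
summarise_field(iris$Species)     # a single field
by_species <- summarise_df(iris, by = "Species", top_n = 3)
```

Returning nested lists keeps the sketch short; a tool you would actually want to use every day would need a much tidier printed layout, so the output doesn’t end up cluttering the screen.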