generally, Exploratory Data Analysis (EDA) is done to answer a question. as i never practised statistics, my take on it is not that “refined” but, hopefully, has the benefit of applied understanding from an “outsider”. it took me a while to prepare this for a broader audience. that said, i’m always open to making the content more understandable to the “layperson”.

here’s the link to my updated GitHub repository:

https://github.com/LinsAbadia/Python/blob/master/Analyses/EDA.ipynb

initially, i just planned on posting a “short” “blurb” on my blog and GitHub Python page as there seemed to me to be a “virtual triangle” among machine learning, statistics, and data visualisation. i’m still likely to make a brief GitHub “file” but upon serious reflection this post may not be a “cursory” post.

it took me a while to come up with this post because i was partly busy with an online machine learning course and, frankly, didn’t know what to write; i’m still finding it difficult to figure out how to do it. it didn’t help that there was a “time-consuming” upgrade of the Jupyter notebook environment that i use to store my ipynb files online.

my last experience started me thinking about how i learn; i still need to reflect more on it. i’ve done “ok” academically but i’ve discovered i can understand better if “alternatives” are provided for me to choose from. programming is, essentially, divergent: that is, there is often more than a single way to arrive at an “acceptable solution”. why can’t “formal” education be that way? i know that the human brain can be easily overloaded by many things but perhaps offering a few choices might result in more students understanding. i’m realistic and pragmatic enough to understand that most teachers are overworked (at least those that care about the development of others) and that maybe there needs to be a more “active” open-source community, like the one in coding: sharing can make lighter work.

here is my initial attempt at my updated repository:

https://github.com/LinsAbadia/Python/blob/master/Machine%20Learning/Learning.ipynb

i’m not a statistician so kindly bear with my “crudeness”. initially, i just planned on discussing the “3ms”: Mean, Median & Mode. however, aside from these appearing “too short”, and after seeing what the describe method returns, it seemed more sensible to cover all the outputs.
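to make the “3ms” concrete, here is a minimal sketch using only Python’s standard library. it assumes the describe method mentioned above is the pandas one, which for a numeric column reports count, mean, std, min, the 25%/50%/75% quartiles and max. the function name `describe_values` and the sample `scores` are hypothetical, made up for illustration; this is an approximation of those outputs, not the code in the notebook.

```python
import statistics


def describe_values(values):
    """Rough stand-in (an assumption, not pandas itself) for the figures
    that pandas' describe reports for one numeric column."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points,
    # which is the same convention pandas uses by default for quantiles.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,  # the 50% quartile is the median
        "75%": q3,
        "max": data[-1],
    }


# Hypothetical sample data, chosen only so the numbers are easy to check by hand.
scores = [2, 4, 4, 5, 7, 9, 11]

three_ms = {
    "mean": statistics.mean(scores),      # sum divided by count
    "median": statistics.median(scores),  # middle value once sorted
    "mode": statistics.mode(scores),      # most frequent value
}
summary = describe_values(scores)

print(three_ms)
print(summary)
```

note that the median shows up in the summary as the “50%” row, while the mode is not among the numeric outputs at all, which is one reason covering every output felt more complete than the “3ms” alone.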

as a former educator, i’m open to content being improved: “iteration” is often necessary in endeavouring to present something more simply, so if you have an idea on how to do this “better”, kindly let me know.

here’s the updated GitHub repository:

https://github.com/LinsAbadia/Python/blob/master/Statistics/Descriptive.ipynb

i’m not a botanist so this is unfamiliar to me – so naturally i googled it.

“serendipitously”, i ran across this “foreign” term in one of my data science courses while learning about a computer language. sepals are usually green and are the leaf-like vestiges that “protect” the petals of a flower in bud form and act as supports in the blooming process.

i’m currently taking a visualisation course in Python and it has reminded me of red and green colour blindness: both hues appear similar to people who have it.

they are still granted driver’s licenses because a “strong” convention for traffic lights exists: the position, and not just the colour, conveys information.

this made me think of truly inclusive designs: where a “best effort” is made so that a design is accessible by default (or a “reasonable” alternative or accommodation is provided). this is “good” to know since coming up with a “universal” design can be “problematic” (as more effort can be required), but in media without guidelines it can be invaluable.
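the traffic-light idea (position as well as colour) carries over to charts: each series can get a colour plus a second, non-colour cue. below is a rough sketch of that, not a definitive recipe. the helper names `redundant_styles` and `plot_demo`, the sample data and the output file name are all hypothetical; the hex codes are from the Okabe-Ito palette, which is commonly suggested as colour-blind friendly, and the plotting part assumes matplotlib is installed.

```python
from itertools import cycle

# A few Okabe-Ito colours (commonly suggested as colour-blind friendly).
OKABE_ITO = ["#0072B2", "#E69F00", "#009E73", "#D55E00"]
MARKERS = ["o", "s", "^", "D"]
LINESTYLES = ["-", "--", ":", "-."]


def redundant_styles(labels):
    """Give every label a colour AND a marker and line style, so the
    series stay distinguishable even if the colours look alike."""
    colours, markers, lines = cycle(OKABE_ITO), cycle(MARKERS), cycle(LINESTYLES)
    styles = {}
    for label in labels:
        styles[label] = {
            "color": next(colours),
            "marker": next(markers),
            "linestyle": next(lines),
        }
    return styles


def plot_demo(path="inclusive_lines.png"):
    """Draw three made-up series using the styles above (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    xs = [1, 2, 3, 4]
    series = {  # hypothetical data, for illustration only
        "first": [1, 2, 3, 4],
        "second": [2, 2, 3, 5],
        "third": [4, 3, 2, 1],
    }
    fig, ax = plt.subplots()
    for label, style in redundant_styles(series).items():
        ax.plot(xs, series[label], label=label, **style)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

calling `plot_demo()` writes the chart to a file; the design choice is simply that no piece of information depends on colour alone.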

i was so hung up on words that i “overlooked” that visualisations can deceive audiences. i’ve been recently exposed to the works of Edward Tufte and Alberto Cairo on Information Graphics (commonly known by the portmanteau, Infographics). aside from the important role they can play in emphasising statistics, they also have the power to mislead “consumers” of the information (whether intentionally or not). the main point is that they need to be designed carefully and not simply thrown in to break the “monotony” of words or “pretty” things up: they should only be included to serve a particular purpose.

here are a few guidelines to help make the figures you generate “better”:

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833

i’m currently taking Applied Plotting, Charting & Data Representation in Python, and have been introduced to a “relevant” model.

as validated by my years of professional experience in ICT, communication is a major part of the work. as technologists, we almost always focus only on the processing and analysis of information. i’m glad that Data Science “explicitly emphasises” the importance of also communicating results. most people just refer to the field as IT (but that, IMHO, is an “antiquated” form of thinking). i prefer ICT not just because it was “recently” rebranded as such by some governments and agencies, but because it highlights the other part of the equation and is a much more holistic approach to technology.

for your reference, here’s the Visualization Wheel by Alberto Cairo:

i also added it to my GitHub repository:

https://github.com/LinsAbadia/Python/blob/master/Visualisation/VisualizationWheelAlbertoCairo.jpeg