to “complete” “slicing” DataFrames, i discuss loc and iloc. i think this is enough to cover the “basics” of Python. as you know, i will start delving into statistics to a.) further my skills, and b.) see if i can be “useful” to my wife.

i was always planning to tackle “advanced” topics - it was just “accelerated”, sooner rather than later.

here’s something i “shared” so i can “move on” to statistics:

That said, i can consider revisiting “past” topics based on feedback.

i did a lot of coding in my time and was introduced to neural networks at school, so it wasn’t really a stretch learning Python. i had a lot of exposure to programming and a little background in artificial intelligence, but i only knew aspects of statistics, so it became obvious to me that it was something i had to strengthen to upgrade my data science skills. let me preface this by saying it’s been a while since i’ve “actively” done both, and technology has advanced. that said, i’ve been developing a GitHub repository because i believe the expression that says you teach best what you need to learn.

to brush up on the basics and truly understand Descriptive Statistics, i’m perusing version 2 of the ebook Think Stats: Exploratory Data Analysis by Allen B. Downey. it’s supposedly framed for programmers and better suited to them for learning statistics.

aside from personal growth, my wife (although she’s well versed in machine learning and teaching programming) and her work team are looking at doing some research that may require this. so there’s a greater incentive to study this.

“Slicing” DataFrames (that is, creating subsets using indices) can be quite useful in partitioning datasets. for those familiar with SQL, it kind of reminds me of the SELECT command, which is sometimes paired with an optional WHERE clause.
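as a rough sketch of the idea (not the code from the repository), the snippet below assumes the pandas library and uses a small made-up table. the column names and values are hypothetical. it shows label-based slicing with loc, position-based slicing with iloc, and a boolean mask that plays a role similar to a WHERE clause.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration.
df = pd.DataFrame(
    {
        "player": ["Ana", "Ben", "Cara", "Dev"],
        "points": [12, 7, 21, 15],
        "team": ["red", "blue", "red", "blue"],
    },
    index=["a", "b", "c", "d"],
)

# loc slices by label; the end label is included.
by_label = df.loc["a":"c", ["player", "points"]]

# iloc slices by integer position; the end position is excluded.
by_position = df.iloc[0:2, 0:2]

# A boolean mask is roughly:
#   SELECT player, points FROM df WHERE points > 10
high_scorers = df.loc[df["points"] > 10, ["player", "points"]]

print(by_label)
print(by_position)
print(high_scorers)
```

the main thing to watch for is the end of the range: loc includes the end label, while iloc stops just before the end position, like a regular Python slice.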

i know this is a very “basic” treatment but i used to play a lot of basketball and i believe in the importance of fundamentals. i use a lot of this in my own code, and from what i’ve seen on the internet it is very common in shared snippets, so IMHO it’s important to grasp the “basics” of this – in other words, it’s important to understand this when trying to make sense of sample code (comments are another thing but don’t get me started on that “bugbear”…).

here is my updated GitHub repository:

since i mainly use a Jupyter notebook for Python coding, i use the print() function a lot to help with “debugging”. Error “detection” leaves a lot to be desired (that’s one of my few complaints; i lean towards it being used to introduce programming).
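a minimal sketch of print()-style debugging, assuming Python 3.8 or later for the “=” specifier in f-strings. the average() function is a hypothetical example, not something from the repository.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(values)
    count = len(values)
    # The "=" specifier prints both the expression and its value,
    # which saves typing the variable name twice.
    print(f"{total=} {count=}")
    return total / count


result = average([3, 4, 8])
print(f"{result=}")
```

printing intermediate values like this shows where a calculation goes wrong without leaving the notebook cell.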

here are a “few debugging tips” that would have been handy to know when learning how to code in Python:

in “major” databases there is sometimes an ETL (Extract, Transform, Load) tool. as DataFrames are the “commonly” used data structure in Python for similar operations (and analysis), you can perform all three functions with them. That said, i prefer to only do the ‘E’ and ‘L’ as they are “simply” accomplished by built-in functions. The ‘T’ requires me to use a for loop and read each row using a file handler, so it’s more “convenient” for me to manipulate the data once it’s imported.
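one way this could look, as a sketch only: it assumes pandas, with read_csv for the ‘E’ and to_csv for the ‘L’. the CSV text and column names are made up, and in-memory buffers stand in for real files so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
raw_csv = io.StringIO("name,height_cm\nAna,170\nBen,182\n")

# Extract: read_csv accepts a file path or a file-like object.
df = pd.read_csv(raw_csv)

# Transform: done on the DataFrame after import, one column at a time,
# instead of looping over rows while reading the file.
df["height_m"] = df["height_cm"] / 100

# Load: write the result out (here to a buffer rather than a file).
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

with a real dataset, the two buffers would be replaced by file paths.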

it’s important to note that determining which dataset to use can involve unconscious/implicit bias. therefore in analysis (and offering insights), you need to consider the source: no matter the prevailing “wisdom”, one needs to distinguish between fact and opinion.

here is the updated GitHub repository:

there are many ways to instantiate a DataFrame but here’s a primer on typical ways to create one.
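as an illustration (assuming pandas, and not necessarily the same ways covered in the repository), here are three common constructions that produce the same two-column table. the column names are hypothetical.

```python
import pandas as pd

# From a dictionary of lists: each key becomes a column name.
from_dict = pd.DataFrame({"letter": ["A", "B"], "code": ["Alfa", "Bravo"]})

# From a list of dictionaries: each dictionary becomes one row.
from_records = pd.DataFrame(
    [
        {"letter": "A", "code": "Alfa"},
        {"letter": "B", "code": "Bravo"},
    ]
)

# From a list of rows, with the column names supplied separately.
from_rows = pd.DataFrame(
    [["A", "Alfa"], ["B", "Bravo"]],
    columns=["letter", "code"],
)

print(from_dict)
print(from_records)
print(from_rows)
```

which form is most convenient tends to depend on how the data arrives: column by column, or row by row.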

the DataFrame (from the pandas library) is the primary data structure in Python for data science. it acts like a spreadsheet or database table – it kind of reminds me of the DataWindow object in PowerBuilder (it was very convenient for me). And unlike most high-level programming languages, it didn’t need a “connector” (or driver) like ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) to interact with external databases – you were lucky if there was a “native” one, because it performed quicker as there was no need to “translate” stuff.

here is my updated GitHub repository:

when i started preparing stuff for DataFrames, it seemed sensible to introduce the Python Dictionary.

in it, i use the NATO alphabet (NATO, as we all know, is an acronym). Another form of abbreviation is an initialism. Both utilise the first letters of words to form a “new” word, but the former is pronounced as a word, while the latter is voiced letter by letter (like AI for Artificial Intelligence).
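a small sketch of a dictionary keyed by letter, using only the first few entries of the NATO alphabet. the variable names are hypothetical and this is not the code from the repository.

```python
# A few entries of the NATO phonetic alphabet; the full table has 26.
nato = {"A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta"}

# Look up a value by its key.
print(nato["C"])

# get() returns a default instead of raising KeyError for a missing key.
print(nato.get("Z", "unknown"))

# Assigning to a new key adds an entry.
nato["E"] = "Echo"

# Spell out a word letter by letter, leaving unknown characters as they are.
spelled = [nato.get(ch, ch) for ch in "BAD"]
print(spelled)
```

the same key-to-value mapping is what makes a dictionary a natural starting point for building a DataFrame.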

as part of forming the subheadings, i looked into whether i should use duplicate or replicate (the two are often used interchangeably).

here’s the updated repository: