Mongo DB for Administrators


in terms of how to do it, right?
But anyway, the web console,
if you're gonna be using the enterprise version of MongoDB,
you get the ops manager,
but there are also open source ones
that you can use.
Key features of it, real time monitoring,
you can view your metrics, your operations,
your memory usage, your network activity.
You can set up alerts, you can perform analysis,
you can analyze query performance,
identify slow queries,
and then you can do backup and restores
from the web interface, right?
That's just the advantage of the web console.
Now let's talk about indexing.
Indexing is where you structure your data
in such a way that there is improved speed
of data retrieval.
So in essence, you're taking a bit of data
and moving it closer for easier accessibility, right?
You use that to be able to quickly locate documents
without scanning the entire collection.
So in essence, instead of you having to go through
2,000 students, right?
If you create an index, let's say on name, right?
Then it won't need to go through
the whole collection of students.
It will just go look for the specifics in the name.
That's how indexes work, right?
And they use what you call B-trees,
a type of balanced tree, to store indexes, right?
So there is the collection and then there are the indexes
alongside it, and each index entry contains a value
from the indexed field, right?
Together with a pointer to the corresponding document.
So for example, if you index on first name,
that means for every document,
the first name is what goes into the index.
So when you want to try and search for something
using the first name, it's gonna be quick to retrieve, right?
And then when a query is executed,
MongoDB uses that index to quickly locate
the relevant documents.
So in essence, instead of going through
the whole collection, it goes and looks for the first name
Kumbulani and then brings back whatever it needs
to bring back for that Kumbulani, right?
Instead of it having to go through the whole collection
of documents looking for a Kumbulani,
it goes: first name, first name is indexed.
All I need to look for is Kumbulani, Kumbulani is here,
that type of setup, right?
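The difference between scanning the whole collection and following an index can be sketched in a few lines of Python. This is only a toy model: a sorted list searched with `bisect` stands in for the B-tree, the list position stands in for the document pointer, and the `students` data and function names are made up for illustration. It is not how MongoDB is implemented internally.

```python
import bisect

# Toy collection: the position in the list plays the role of the document pointer.
students = [
    {"first_name": "Winnie", "age": 22},
    {"first_name": "Kumbulani", "age": 30},
    {"first_name": "Thabo", "age": 25},
    {"first_name": "Kumbulani", "age": 41},
]

def collection_scan(docs, first_name):
    """Check every document, which is what happens without an index."""
    return [d for d in docs if d["first_name"] == first_name]

def build_index(docs, field):
    """Sorted (field value, document position) entries, standing in for a B-tree."""
    return sorted((d[field], pos) for pos, d in enumerate(docs))

def index_lookup(docs, index, value):
    """Binary-search the sorted entries, then follow the pointers to the documents."""
    start = bisect.bisect_left(index, (value, -1))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break  # entries are sorted, so no later entry can match
        matches.append(docs[pos])
    return matches

first_name_index = build_index(students, "first_name")
by_scan = collection_scan(students, "Kumbulani")
by_index = index_lookup(students, first_name_index, "Kumbulani")
print(by_index == by_scan)
```

Both approaches return the same documents; the index version only touches the entries for the value it is looking for instead of every document.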
There are different types of indexes.
The first one being a single field index, right?
On one field in a collection.
So for example, you create name as an index, right?
That's a single index, single field index, right?
And then you've got a compound where you can have
multiple fields as indexes.
So for example, in db.students,
you've got name and then age.
So in essence, if you want to retrieve something quickly,
if you have name and age on it, it will be very, very quick.
I'm sure in banks, they use the ID number, right?
To retrieve information about you at any point.
So they would index the ID number
because that's something that is so different
that if you index that,
then it will quickly retrieve anything
that's specific to that ID.
You can also have multikey indexes, right?
On arrays or nested documents.
Let's say for example, subjects, right?
Where you have a nested document or an array, right?
Because subjects has got many other subjects under it, right?
Then you can actually index that array
or that nested document, right?
And then you can also do a text index,
which supports text search queries
on string content, right?
So in essence, you can then index name as text, right?
So it becomes easy to search
first name using text search
on that text type of data, right?
And then you've got your geospatial,
which is obviously the one used by your Ubers
and your Bolts, right?
They index locations, right?
So location is the main thing that's used by Uber and Bolt.
So they index that.
And then you've got a hashed index,
where you index the hash of a field, right?
Probably the email or something, you know?
It's also good for sharding.
And then you've got TTL index
where you automatically remove documents
after a specified time.
So in essence, after a month, right?
You can have those documents removed
if you want to, right?
So you can index them that way, right?
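As a rough summary of the index types just listed, the key specification for each one can be written down as plain data. The sketch below uses Python dictionaries shaped like the key and options documents passed to `createIndex`. The field names (`name`, `age`, `subjects`, `location`, `email`, `createdAt`) are assumptions for illustration, and nothing here talks to a database.

```python
# Key specifications shaped like the documents passed to createIndex.
# All field names are illustrative; adjust them to the real collection.
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60

index_specs = {
    "single_field": ({"name": 1}, {}),
    "compound": ({"name": 1, "age": 1}, {}),
    # An index on a field that holds an array becomes a multikey index.
    "multikey": ({"subjects": 1}, {}),
    "text": ({"name": "text"}, {}),
    "geospatial": ({"location": "2dsphere"}, {}),
    "hashed": ({"email": "hashed"}, {}),
    # A TTL index is a single field index on a date field plus an expiry option.
    "ttl": ({"createdAt": 1}, {"expireAfterSeconds": ONE_MONTH_SECONDS}),
}

for kind, (keys, options) in index_specs.items():
    print(f"{kind:13} keys={keys} options={options}")
```

The `1` means ascending order, and the string values (`"text"`, `"2dsphere"`, `"hashed"`) select the special index types.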
And then managing indexes,
what do you do when managing indexes?
You can create indexes, right?
Which could be a single field index.
You use createIndex.
You can list indexes, right?
To see, within your collection, what is indexed.
And then you can drop indexes.
If you no longer want that index, then you can drop it,
or you can rebuild indexes,
where you do db.students.reIndex().
So in essence, it will rebuild all the indexes
in the collection, right?
So some talk about the indexing internals, right?
Your B-tree structure.
So as I said, it uses a B-tree structure
to store your indexes.
In essence, it's got something like a tree.
And when you're speaking about Linux, right?
The tree is where you've got the mother
and then the children as they follow below, right?
And usually it's balanced trees
that allow efficient insertion, deletion,
and search operations.
So your database, or your collection, will be the highest,
and then your documents will be under there, right?
And each node in the B-tree contains multiple keys
and pointers to child nodes, right?
And then index storage: indexes are stored
in separate data files within the dbPath.
So indexes create smaller files,
where each index entry will contain the indexed field value
and then a pointer to the corresponding document, right?
Say you've created an index on ID number, right?
It will store that field, right?
The ID number, and a pointer to the correct document,
right?
And then you've got index selectivity, right?
It refers to how unique the values
in an indexed field are.
So you've got what you call high selectivity, right?
Where there's better performance,
and then there's low selectivity,
which may reduce the effectiveness
of the index,
because you'll have many duplicate values, right?
Where high selectivity is involved, the values are unique.
So they're never the same, which makes life easier.
And then you've got cardinality,
which refers to the number of unique values
in an indexed field, right?
Where you've got a high cardinality field, right?
E.g., email, right?
There are so many unique values in that field.
It's very, very good.
But where you've got low cardinality, like gender,
gender is either male or female, right?
And that's it.
You probably won't get that much benefit out of indexing
because it's either a male or a female.
And if you have to pull anything that's male related,
you can get half the database or 80% of the database,
you know?
If it's email and it's looking for kumbulani at gmail.com,
that's the only one it is going to look for, right?
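The email versus gender comparison can be made concrete with a small calculation. In this sketch, cardinality is the count of distinct values and selectivity is taken as distinct values divided by total documents. That ratio is one common way to put a number on "how unique", not a MongoDB-defined metric, and the sample documents are invented.

```python
students = [
    {"email": "a@example.com", "gender": "female"},
    {"email": "b@example.com", "gender": "male"},
    {"email": "c@example.com", "gender": "male"},
    {"email": "d@example.com", "gender": "female"},
]

def cardinality(docs, field):
    """Number of distinct values stored in the field."""
    return len({d[field] for d in docs})

def selectivity(docs, field):
    """Distinct values divided by total documents (1.0 means every value is unique)."""
    return cardinality(docs, field) / len(docs)

def docs_matched(docs, field, value):
    """How many documents an equality match on this field would return."""
    return sum(1 for d in docs if d[field] == value)

email_selectivity = selectivity(students, "email")
gender_selectivity = selectivity(students, "gender")
print(email_selectivity, gender_selectivity)
```

An equality match on `email` narrows the result to a single document, while a match on `gender` still returns half of this sample, which is why an index on a low cardinality field tends to help less.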
And then also indexes consume additional storage space
because remember it creates files, right?
Number one, what does the size of the index depend on?
The size of the indexed field or fields, right?
If it's a field holding a lot of information,
then the index will be big.
And then the number of documents in the collection.
So the more documents that are linked
to a specific indexed field,
the bigger the index file obviously becomes, right?
And then some best practices for that.
Index only frequently queried fields, right?
As an example, as I said, in a bank,
the very first thing that they ask you is your ID number
so that they can pull your profile.
You can index that, you know?
If they are going to use maybe card number, for example,
you can index that, then it pulls much, much quicker,
you know?
Because then anybody that comes in needs to query
using probably an ID number or a cell phone number
or a card number, you know?
And use compound indexes wisely.
Don't just put them, right?
Make sure you use them in a very clever way
where you can be able to filter for multiple fields
but use them very, very wisely.
You don't want to have a situation
where it ends up confusing you.
And then avoid over-indexing,
because it can consume storage
and it can slow down your write operations, right?
And then monitor your index usage.
Use the $indexStats aggregation stage
to monitor index usage.
It's very, very important.
Make sure that you monitor your index usage
and then use covered queries.
A query is covered if it can be satisfied
entirely using the index.
So if you can use the index,
then make sure it just uses the index
and doesn't really go all over the place, you know?
If part of it uses an index
and part of it has to go and search in the documents
and whatnot, it wouldn't make sense.
So make sure that if you're gonna use indexes,
the query is answered from the index entirely, right?
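A simple way to reason about whether a query is covered is to check that every field it filters on and every field it returns lives in the same index. The helper below is a hypothetical illustration of that rule, not a MongoDB API. Remember that `_id` is returned by default, so it has to be excluded from the projection unless it is part of the index.

```python
def is_covered(index_fields, filter_fields, returned_fields):
    """True when the filter and the returned fields all come from one index."""
    needed = set(filter_fields) | set(returned_fields)
    return needed <= set(index_fields)

name_age_index = ["name", "age"]

# Filter on name, return only name and age: answered from the index alone.
covered = is_covered(name_age_index, ["name"], ["name", "age"])

# Returning email forces a fetch of the full document, so not covered.
needs_document = is_covered(name_age_index, ["name"], ["name", "email"])

# Leaving _id in the projection also breaks coverage for this index.
with_default_id = is_covered(name_age_index, ["name"], ["name", "_id"])

print(covered, needs_document, with_default_id)
```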
Then the single field index, where you've got
an index, right?
On a single field in a collection, right?
That one is very easy to use.
It speeds up queries that do things like filtering,
sorting, or aggregating based on that field, right?
You can aggregate based on that field,
your totals, your sum and all that stuff.
Obviously each entry in the index contains a value
of the index field and then a pointer
to the corresponding document.
For example, there when you have this part, right?
Where you've got, where's my pen now?
This part, so you're creating it on name,
that's your index, you know?
And when it's one, it's ascending order,
when it's minus one, it's descending order.
Use cases of it, filtering or sorting by a single field
or when you want to find all students with a specific name
or sorting students by age,
that's another way you can do that.
Then there's the compound one where you've got
a combination of multiple fields in a collection
that you want to index, right?
Multiple fields and then it helps,
especially when you do things like filter,
your sort, your aggregation, right?
Same thing, it creates a B-tree structure
for the combination of fields,
and the order of the fields in the index
matters, right?
Queries can use the index
if they include a prefix of the index fields, right?
An example is if you index on name and age, right?
Then you can use the index for queries on name,
or on name and age, right?
That's how you can utilize them.
So you can use name, or you can use name and age,
but you can't use just age.
So there is that precedence in the order
of how you're using it, name and age:
name, or name and age, but not age, right?
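The prefix rule can be sketched as a small function that walks the index fields in order and stops at the first one the query does not use. This is a simplified model with a made-up function name. The real query planner also weighs sort order, equality versus range conditions, and other candidate indexes.

```python
def usable_prefix(index_fields, query_fields):
    """Leading index fields the query can use, stopping at the first gap."""
    used = []
    for field in index_fields:
        if field not in query_fields:
            break
        used.append(field)
    return used

name_age_index = ["name", "age"]

print(usable_prefix(name_age_index, {"name"}))         # name only
print(usable_prefix(name_age_index, {"name", "age"}))  # name and age
print(usable_prefix(name_age_index, {"age"}))          # nothing usable
```

An empty result for a query on `age` alone is the "but not age" case: the index is ordered by name first, so there is no leading field to start from.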
Compound index, when you're creating it,
it's just as simple as adding the two, right?
And then use cases when you want to filter
or sort by multiple fields.
And then for example, if you want to find all students
with a specific name and age range,
then you can be able to use that, right?
Geospatial, as I said,
is where you're using coordinates,
your geospatial data: 2d for flat 2D coordinates,
your 2dsphere, and I think there's 3D now.
And how does it work?
It uses a specialized data structure,
which is geohashing, right?
To index your geospatial data.
Example, Uber and all that stuff.
And then for an example, you can create
an index on location, right?
Which is 2dsphere, and then,
how can you use it?
An example is finding all places
within a certain distance from a point.
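What "all places within a certain distance from a point" means on a sphere can be sketched with the haversine formula. The code below checks every place one by one, which is exactly the work a geospatial index lets the database avoid. The place names, the approximate coordinates, and the 100 km radius are assumptions for illustration. Coordinates are in longitude, latitude order, as in GeoJSON.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def places_within(places, centre, max_km):
    """Names of places no further than max_km from centre (brute force, no index)."""
    lon, lat = centre
    return [p["name"] for p in places
            if haversine_km(lon, lat, *p["location"]) <= max_km]

# Approximate [longitude, latitude] pairs, for illustration only.
places = [
    {"name": "Johannesburg", "location": [28.0473, -26.2041]},
    {"name": "Pretoria", "location": [28.1881, -25.7479]},
    {"name": "Cape Town", "location": [18.4241, -33.9249]},
]

nearby = places_within(places, (28.0473, -26.2041), 100)
print(nearby)
```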
Best practice when it comes to a single field index
is to use it for filtering or sorting by a single field, right?
Say you want to index email for user authentication,
where you use your email, for example, for authentication.
That's a single field.
Then compound is where you want to filter
or sort by multiple fields, right?
And then ensure the order of the field
matches the query patterns.
That's very important.
Geospatial is what you want to use
for location-based queries, you know,
use 2dsphere for spherical geometry,
for example, the Earth's surface.
And then avoid over-indexing.
It's very, very important.
Or else it will slow down operations.
It will use up your storage and then monitor,
always monitor.
Monitoring is gonna be a word
that we will hear all the time.
Now, any questions on indexing?
That's all good on my side.
Everyone else?
All good.
Now, we can now go and do number five
and then exercise day one.
So if we go and look at number five, it says indexing.
So we can look at the indexing there.
It still uses the same database, which is university,
to do some indexing.
You're going to create some indexes.
You're going to create compound indexes.
Geospatial, but I don't think
it might really be worth it, but you can try it.
I don't think it will work though.
But then you also look at query optimization,
where you find a student by email
and you've optimized by a single field.
Or you then look at querying using a compound index,
and geospatial, but we're not really using any coordinates
or anything, so it might be really tricky.
You can try it, but I don't think it will work.
It might give you an error.
And then there's something called,
what do you call this?
The query profiler, that you can use.
So also have a look at that.
It also helps in terms of query optimization
and all that stuff.
And then, yeah, after that, there's exercise day one.
Exercise day one is more or less
what we've spoken about the whole time.
There's going to be some insert.
There's going to be some deletion, updating,
and then some operations that you need to do.
For example, checking stats, creating an index,
a bit of advanced work where you probably need to change
one or two configurations,
and then be able to see how it works.
But then you will find that most of the stuff
that we spoke about or that we tried to do
might actually be there.
For example, yes.
Are you sharing screen or we must just follow on the site?
Oh, I wasn't sharing my screen,
but okay, let me share my screen quickly.
Where is my machine?
This one, okay.
I think I've got it now.
Window, there we go.
Can you see my screen now?
Oh, yes.
Okay, cool.
So I was saying that,
so indexing is more or less you're going to create an index
using the existing database that is there,
which is the university one,
create your complex query, your compound indexing,
and then you do some query optimization,
use a query profiler and all that stuff, right?
And after that, there's exercise day one, right?
There are some things that we probably have done already,
like you've created the university database already,
and then here you need to insert some documents.
Be mindful of the names.
They might be names that already exist.
If they exist, change them or use upsert, it's up to you.
And then querying to find some courses,
some CRUD operations where you need to delete something,
some intermediate operations, some collecting of stats,
and then some indexing also,
and then a bit of advanced stuff, for example,
where you're going to do security, storage path,
system logs.
You might not really need to worry about this one
because it's already connecting on localhost,
so there's no need to worry.
But in terms of the rest of the stuff,
start Mongo with the configuration file.
If you want to create a separate configuration file
and put this, then you can be able to start it that way.
And yeah, most of the stuff is just what we spoke about.
Hardware and file system,
you might not be able to do that
because obviously you can't add any SSD,
but you can be able to add journaling, right?
And then see what could be happening within the journaling.
And then there's some security aspect of it
that we did already, some security deployment recommendations.
These are just the recommendations, right?
And then you do some monitoring
and then you'll be done for the day.
So let's do number five
and then after that do exercise number one.
Exercise day one, sorry.
Clean?
It's super clean.
Cool.
So we're doing five and exercise.
Exercise day one.
Okay.
Okay, cool.
So I have lost you there when you said,
so you said two, is there something over there?
When you mentioned geospatial.
Yeah, geospatial.
Because we don't have any data
that relates to geospatial, right?
Like coordinates at any point, right?
Within that, you probably might not see
any effect, if that makes sense.
Because we don't have that data.
We don't have anything that needs that data.
Like location, for example.
We don't have a situation like it's an Uber application
or an Uber data.
We don't have that within our database.
I can always create a quick script
that can create that data if you want.
But you might not really see the effect
besides just showing you the information in itself.
The effect of it speeding up works
when there's an application
that needs to access the location, right?
But we don't have that application.
So it really would be too much admin
to have that in and just run the commands for nothing.
Okay, you see this thing obvious
and convenience test, that's what errors.
What is that with errors?
Unexpected token limit, 5.1 already.
Number? 5.1?
5.1.1.
5.1.1.
What error are you getting?
Who is this now?
It's unexpected token limit.
I'm also getting the same error.
Yeah, I can see the issue before I even tell you.
You're not logged into your database, are you?
Yes, sir.
Log into your database, mongosh first.
mongosh.
Yes, and then you can then run those.
You can now copy and paste.
Anything that has a use or a db-dot-something,
you should know that you need to run it within the database.
You need to authenticate.
You need to authenticate.
So exit and then authenticate.
And just go up arrow until you get to the part
where it's supposed to, yeah, that one.
Yeah.
So that went on to.
Yeah.
And then now you can try and run those commands
using VST and yeah.
Did you see the, are you sorted Winnie?
No, I'm not sorted.
Okay, I'll come to your screen just now.
Enter.
This is going to be interesting to figure out.
You're not authorized to do what you did,
which means the user that you're using
is not authorized to do that.                

on 2025-01-27
