Avsnitt
-
This week, Lois Houston and Nikita Abraham continue their exploration of Oracle AI Vector Search with a deep dive into vector indexes and memory considerations. Senior Principal APEX and Apps Dev Instructor Brent Dayley breaks down what vector indexes are, how they enhance the efficiency of search queries, and the different types supported by Oracle AI Vector Search. Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services at Oracle University, and with me is Lois Houston, Director of Innovation Programs.Lois: Hi everyone! Last week was Part 1 of our discussion on Oracle AI Vector Search. We talked about what it is, its benefits, the new vector data type, vector embedding models, and the overall workflow. In Part 2, we’re going to focus on vector indices and memory.
00:56
Nikita: And to help us break it all down, we’ve got Brent Dayley back with us. Brent is a Senior Principal APEX and Apps Dev Instructor with Oracle University. Hi Brent! Thanks for being with us today. So, let’s jump right in! What are vector indexes and how are they useful?
Brent: Now, vector indexes are specialized indexing data structures that can make your queries more efficient against your vectors. They use techniques such as clustering, and partitioning, and neighbor graphs. Now, they greatly reduce the search space, which means that your queries happen quicker. They're also extremely efficient. They do require that you enable the vector pool in the SGA.
01:42
Lois: Brent, walk us through the different types of vector indices that are supported by Oracle AI Vector Search. How do they integrate into the overall process?
Brent: So Oracle AI Vector Search supports two types of indexes, in-memory neighbor graph vector index. HNSW is the only type of in-memory neighbor graph vector index that is supported. These are very efficient indexes for vector approximate similarity search. HNSW graphs are structured using principles from small world networks along with layered hierarchical organization.
And neighbor partition vector index, inverted file flat index, is the only type of neighbor partition index supported. It is a partition-based index which balances high search quality with reasonable speed.
02:35
Nikita: Brent, you mentioned that enabling the vector pool in the SGA is a requirement when working with vector indexes. Can you explain that process for us?
Brent: In order for you to be able to use vector indexes, you do need to enable the vector pool area. And in order to do that, what you need to do is set the vector memory size parameter.
You can set it at the container database level. And the PDB inherits it from the CDB. Now bear in mind that the database does have to be balanced when you set the vector pool.
03:12
Lois: Ok. Are there any other considerations to keep in mind when using vector indices?
Brent: Vector indexes are stored in this pool, and vector metadata is also stored here. And you do need to restart the database. So large vector indexes do need lots of RAM, and RAM constrains the vector index size. You should use IVF indexes when there is not enough RAM. IVF indexes use both the buffer cache as well as disk.
03:42
Nikita: And what about memory considerations?
Brent: So to remind you, a vector is a numerical representation of text, images, audio, or video that encodes the features or semantic meaning of the data, instead of the actual contents, such as the words or pixels of an image. So the vector is a list of numerical values known as dimensions with a specified format.
Now, Oracle does support the int8 format, the float32 format, and the float64 format. Depending on the format depends on the number of bytes. For instance, int8 is one byte, float32 is four bytes. Now, Oracle AI Vector Search supports vectors with up to 65,535 dimensions.
04:34
Lois: What should we know about creating a table with a vector column?
Brent: Now, Oracle Database 23ai does have a new vector data type. The new data type was created in order to support vector search.
The definition can include the number of dimensions and can include the format. Bear in mind that either one of those are optional when you define your column. The possible dimension formats are int, float 32, and float 64. Float 32 and float 64 are IEEE standards, and Oracle Database will automatically cast the value if needed.
05:18
Nikita: Can you give us a few declaration examples?
Brent: Now, if we just do a vector type, then the vectors can have any arbitrary number of dimensions and formats. If we describe the vector type as vector * , *, then that means that vectors can have an arbitrary number of dimensions and formats. Vector and vector * , * are equivalent. Vector with the number of dimensions specified, followed by a comma, and then an asterisk, is equivalent to vector number of dimensions.
Vectors must all have the specified number of dimensions, or an error will be thrown. Every vector will have its dimension stored without format modification. And if we do vector asterisk common dimension element format, what that means is that vectors can have an arbitrary number of dimensions, but their format will be up-converted or down-converted to the specified dimension element format, either INT8, float 32, or float 64.
06:25
Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you’re certified, you’ll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started.
06:52
Nikita: Welcome back! Brent, what is the vector constructor and why is it useful?
Brent: Now, the vector constructor is a function that allows us to create vectors without having to store those in a column in a table. These are useful for learning purposes. You use these usually with a smaller number of dimensions. Bear in mind that most embedding models can contain thousands of different dimensions. You get to specify the vector values, and they usually represent two-dimensional like xy coordinates. The dimensions are optional, and the format is optional as well.
07:29
Lois: Right. Before we wrap up, can you tell us how to calculate vector distances?
Brent: Now, vector distance uses the function VECTOR_DISTANCE as the main function. This allows you to calculate distances between two vectors and, therefore, takes two vectors as parameters. Optionally, you can specify a metric. If you do not specify a metric, then the default metric, COSINE, would be used. You can optionally use other shorthand functions, too. These include L1 distance, L2 distance, cosine distance, and inner product. All of these functions also take two vectors as input and return the distance between them. Now the VECTOR_DISTANCE function can be used to perform a similarity search. If a similarity search query does not specify a distance metric, then the default cosine metric will be used for both exact and approximate searches.
If a similarity search does specify a distance metric in the VECTOR_DISTANCE function, then an exact search with that distance metric is used if it conflicts with the distance metric specified in a vector index. If the two distance metrics are the same, then this will be used for both exact as well as approximate searches.
08:58
Nikita: I was wondering Brent, what vector distance metrics do we have access to?
Brent: We have Euclidean and Euclidean squared distances. We have cosine similarity, dot product similarity, Manhattan distance, and Hamming similarity. Let's take a closer look at the first of these metrics, Euclidean and Euclidean squared distances. This gives us the straight-line distance between two vectors. It does use the Pythagorean theorem. It is sensitive to both the vector size as well as the direction.
With Euclidean distances, comparing squared distances is equivalent to comparing distances. So when ordering is more important than the distance values themselves, the squared Euclidean distance is very useful as it is faster to calculate than the Euclidean distance, which avoids the square root calculation.
09:58
Lois: And the cosine similarity metrics?
Brent: It is one of the most widely used similarity metrics, especially in natural language processing. The smaller the angle means they are more similar. While cosine distance measures how different two vectors are, cosine similarity measures how similar two vectors are.
Dot product similarity allows us to multiply the size of each vector by the cosine of their angle. The corresponding geometrical interpretation of this definition is equivalent to multiplying the size of one of the vectors by the size of the projection of the second vector onto the first one or vice versa. Larger means that they are more similar. Smaller means that they are less similar.
Manhattan distance is useful for describing uniform grids. You can imagine yourself walking from point A to point B in a city such as Manhattan. Now, since there are buildings in the way, maybe we need to walk down one street and then turn and walk down the next street in order to get to our result. As you can imagine, this metric is most useful for vectors describing objects on a uniform grid such as city blocks, power grids, or perhaps a chessboard.
11:27
Nikita: And finally, we have Hamming similarity, right?
Brent: This describes where vector dimensions differ. They are binary vectors, and it tells us the number of bits that require change to match. It compares the position of each bit in the sequence. Now, these are usually used in order to detect network errors.
11:53 Nikita: Brent, thanks for joining us these last two weeks and explaining what Oracle AI Vector Search is. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course.Lois: This concludes our season on Oracle Database 23ai New Features for administrators. In our next episode, we’re going to talk about database backup and recovery, but more on that later! Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
12:29
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, Senior Principal APEX and Apps Dev Instructor Brent Dayley joins hosts Lois Houston and Nikita Abraham to discuss Oracle AI Vector Search. Brent provides an in-depth overview, shedding light on the brand-new vector data type, vector embeddings, and the vector workflow. Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/ Oracle Database 23ai: SQL Workshop: https://mylearn.oracle.com/ou/course/oracle-database-23ai-sql-workshop/137830/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs here at Oracle University. Joining me as always is our Team Lead of our Editorial Services, Nikita Abraham.
Nikita: Hi everyone! Thanks for tuning in over the last few months as we’ve been discussing all the Oracle Database 23ai new features. We’re coming to the end of the season, and to close things off, in this episode and the next one, we’re going to be talking about the fundamentals of Oracle AI Vector Search. In today’s episode, we’ll try to get an overview of what vector search is, why Oracle Vector Search stands out, and dive into the new vector data type. We’ll also get insights into vector embedding models and the vector workflow.
01:11
Lois: To take us through all of this, we’re joined by Brent Dayley, who is a Senior Principal APEX and Apps Development Instructor with Oracle University. Hi Brent! Thanks for joining us today. Can you tell us about the new vector data type?
Brent: So this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data. Now, the vector data type allows a foundation to store vector embeddings.
01:42
Lois: And what are vector embeddings, Brent?
Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on meaning and context of your unstructured data. You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs. Embeddings can be used to represent almost any type of data, including text, audio, or visual, such as pictures. And they are used in proximity searches.
02:28
Nikita: Hmmm, proximity search. And similarity search, right? Can you break down what similarity search is and how it functions?
Brent: So vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space. What this means is that basically you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector. The second row in the list would be the second closest vector to the query vector, and so on, depending on your data set. What we need to do is to find the relative order of distances. And that's really what matters rather than the actual distance.
Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters.
03:44
Lois: Ok. I want to move on to vector embedding models. What are they and why are they valuable?
Brent: Vector embedding models allow you to assign meaning to what a word, or a sentence, or the pixels in an image, or perhaps audio. It allows you to quantify features or dimensions. Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings.
04:33
Nikita: And what about visual data?
Brent: For visual data, you can use residual network also known as ResNet to generate vector embeddings. You can also use visual spectrogram representation for audio data. And that allows us to use the audio data to fall back into the visual data case. Now, these can also be based on your own data set. Each model also determines the number of dimensions for your vectors.
05:02
Lois: Can you give us some examples of this, Brent?
Brent: Cohere's embedding model, embed English version 3.0, has 1,024 dimensions. Open AI's embedding model, text-embedding-3-large, has 3,072 dimensions.
05:24
Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in Challenges. And stay up-to-date with upcoming certification opportunities. Visit mylearn.oracle.com to get started.
05:50
Nikita: Welcome back! Let’s now get into the practical side of things. Brent, how do you import embedding models?
Brent: Although you can generate vector embeddings outside the Oracle Database using pre-trained open source embeddings or your own embedding models, you also have the option of doing those within the Oracle Database. In order to use those within the Oracle Database, you need to use models that are compatible with the Open Neural Network Exchange Standard, or ONNX, also known as Onyx.
Oracle Database implements an Onyx runtime directly within the database, and this is going to allow you to generate vector embeddings directly inside the Oracle Database using SQL.
06:35
Lois: Brent, why should people choose to use Oracle AI Vector Search?
Brent: Now one of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data, all in one single system. This is very powerful, and also a lot more effective because you don't need to add a specialized vector database. And this eliminates the pain of data fragmentation between multiple systems.
It also supports Retrieval Augmented Generation, also known as RAG. Now this is a breakthrough generative AI technique that combines large language models and private business data. And this allows you to deliver responses to natural language questions. RAG provides higher accuracy and avoids having to expose private data by including it in the large language model training data.
07:43
Nikita: In the last part of our conversation today, I want to ask you about the Oracle AI Vector Search workflow, starting with generating vector embeddings.
Brent: Generate vector embeddings from your data, either outside the database or within the database. Now, embeddings are a mathematical representation of what your data meaning is. So what does this long sentence mean, for instance? What are the main keywords out of it?
You can also generate embeddings not only on your typical string type of data, but you can also generate embeddings on other types of data, such as pictures or perhaps maybe audio wavelengths.
08:28
Lois: Could you give us some examples?
Brent: Maybe we want to convert text strings to embeddings or convert files into text. And then from text, maybe we can chunk that up into smaller chunks and then generate embeddings on those chunks. Maybe we want to convert files to embeddings, or maybe we want to use embeddings for end-to-end search.
Now you have to generate vector embeddings from your unstructured data, either outside or within the Oracle Database. You can either use the ONNX embedding machine learning models or you can access third-party REST APIs.
You can import pre-trained models in ONNX format for vector generation within the database. You can download pre-trained embedding machine learning models, convert them into the ONNX format if they are not already in that format. Then you can import those models into the Oracle Database and generate vector embeddings from your data within the database.
Oracle also allows you to convert pre-trained models to the ONNX format using Oracle machine learning for Python. This enables the use of text transformers from different companies.
09:51
Nikita: Ok, so that was about generating vector embeddings. What about the next step in the workflow—storing vector embeddings?
Brent: So you can create one or more columns of the vector data type in your standard relational data tables. You can also store those in secondary tables that are related to the primary tables using primary key foreign key relationships.
You can store vector embeddings on structured data and relational business data in the Oracle Database. You do store the resulting vector embeddings and associated unstructured data with your relational business data inside the Oracle Database.
10:30
Nikita: And the third step is creating vector indexes?
Brent: Now you may want to create vector indexes in the event that you have huge vector spaces. This is an optional step, but this is beneficial for running similarity searches over those huge vector spaces.
So once you have generated the vector embeddings and stored those vector embeddings and possibly created the vector indexes, you can then query your data with similarity searches. This allows for Native SQL operations and allows you to combine similarity searches with relational searches in order to retrieve relevant data.
11:15
Lois: Ok. I think I’ve got it. So, Step 1, generate the vector embeddings from your unstructured data. Step 2, store the vector embeddings. Step 3, create vector indices. And Step 4, combine similarity and keyword search.
Brent: Now there is another optional step. You could generate a prompt and send it to a large language model for a full RAG inference. You can use the similarity search results to generate a prompt and send it to your generative large language model in order to complete your RAG pipeline.
11:59
Lois: Thank you for sharing such valuable insights about Oracle AI Vector Search, Brent. We can’t wait to have you back next week to talk about vector indices and memory.
Nikita: And if you want to know more about Oracle AI Vector Search, visit mylearn.oracle.com and check out the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course.
Lois: Yes, and if you're serious about advancing in your development journey, we recommend taking the Oracle Database 23ai SQL workshop. It’s designed for those who might be familiar with SQL from other database platforms or even those completely new to SQL.
Nikita: Yeah, we’ll add the link to the workshop in the show notes so you can find it easily. Until next week, this is Nikita Abraham…
Lois: And Lois Houston signing off!
12:45
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Saknas det avsnitt?
-
In this episode, Lois Houston and Nikita Abraham explore the Automatic Transaction Quarantine feature with Senior Principal Database & MySQL Instructor, Bill Millar. Bill explains that this feature isolates transactions that could potentially cause system crashes, preventing them from impacting the entire container database. They also discuss the key advantages of automatic transaction quarantine in maintaining database stability and availability. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! In our last episode, we looked at an Oracle Database 23ai new feature called Automatic Transaction Rollback, and we spoke about why it is such an important feature for database administrators.
00:51
Nikita: Today, we’re going to talk about another new feature called Automatic Transaction Quarantine. We’ll discuss what it is, go through the steps to monitor and identify quarantine transactions, explore how an issue is resolved once a quarantined transaction has been identified, and end by looking at quarantined transaction escalation, and how it helps to protect not only your PDB, but also your container database.
Lois: Back with us is Bill Millar, our Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! What is automatic transaction quarantine and why do we need it?
01:27
Bill: The good news is that starting in 23c with the database quarantines, it's going to isolate a transaction or transactions that could possibly cause a system crash, so you can avoid crashes. It's going to isolate those transactions that potentially could cause a problem. However, those transactions must be manually resolved by the DBA so that the row locks are released from those bad transactions.
A transaction recovery basically is going to isolate failure and also identify what is the cause of that corruption. So when a system restarts, transaction can fail to recover while the other transactions can be recovered. So with the transaction recovery, basically, we know when the system recovers, the SMON is going to use the redo and the undo.
02:27
Nikita: Can you explain that in a little more detail? How does transaction recovery work and why is it so critical for database stability?
Bill: It does the redo to roll forward the database. However, at that point, it'll go ahead and open the database, allow it to start being used while it is applying the undo. And when it cannot apply that undo, that's when the system is going to mark that transaction as bad for that.
That is what is transaction recovery. Whereas instance recovery is basically the same thing, except now you're in a RAC environment. And it's unable to be recovered on one of the instances within your RAC environment.
Because it can be, it'll have those rows locked, and it can affect the other instances. So SMON might be unable to perform that recovery, so it could cause that PDB or the CDB to crash. OK, now, nobody can access any information.
So once if that entire container crashes, recovery is going to stop. If it has a bad transaction, recovery stops. So it might be because of physical data, might be because of the index is corrupt, might be logical corruption.
So it stops that interactive transaction recovery process. So not only does it stop the recovery of the transaction that is trying to be recovered by SMON, it's going to stop the rest of the inactive transactions. Those row locks are held.
And it can impact critical operations. Yeah, if my system can't do anything, yes, it's going to have an impact. The DBAs must resolve what is that bad transaction, how to get rid of it, how we're going to get around it?
04:12
Lois: Bill, what’s the workflow a DBA would follow when a transaction is quarantined?
Bill: So in the system, when that transaction recovery failure is, OK, I've found this dead transaction.
I'm going to quarantine. I'm going to say, hey, you have something you need to take care of for that. So it's not recovered by the SMON. So what's going to happen?
So there is also is going to be a limit. So if it does reach that limit and the limit is three, then you're going to have to step in and try to take care of that very quickly.
The shut down abort will be performed on the PDB. So the good news there is that it's going to keep it from impacting the entire container. If the limit isn't reached, well, then, OK, hey, we have this bad transaction that's going to quarantine, is going to populate.
There's a couple of views that you can go out and look at. There's a CDB quarantine transactions or a DBA quarantine transactions. Those views you can look at. And then once we determine that, what are we going to do to try to recover it?
If we're going to try to recover it, then we can go ahead and drop that bad transaction. It'll help free up the rows. That way, everything can start working again. That PDB can be opened.
05:30
Nikita: What can you tell us about monitoring quarantined transactions? What specific views or logs should DBAs monitor?
Bill: So you can view.
You'll see these quarantine transactions in several different places. One is the alert queue. It's going to be sent to the alert queue. That is what is going to notify Enterprise Manager Cloud Control, also populates it within the AWR.
Back in 21c, we added the attention log. It shows critical events. Hey, you need to take a look at this. It also can populate it. It will populate it to the alert log.
So remember you have the V$DIAG_ALERT that you can look at. Or, if you're familiar with or you use the ADRCI, automatic diagnostic repair recovery advisor, so you can also look at the alert log there. So there are two new views, the CDB_QUARANTINED_TRANSACTION, the DBA_QUARANTINE_TRANSACTIONS working with multi-tenant. The CDB, I can see all the quarantine transactions from the root container, the DBA_QUARANTINE_TRANSACTIONS what I see if I'm in a specific PDB. But it's going to give me the information.
06:52
Lois: What about resolving quarantined transactions?
Bill: Monitoring is a must to be able to identify, hey, we have bad transactions that we need to-- quarantine transactions we need to take care of. You can apply the appropriate MOS note if you're not sure what to do. Like anything else, if something happens-- and hopefully, you're not getting quarantined transactions daily or anything like that. But once we start doing a few things, we remember how to do them.
07:21
Lois: And, how do we take care of this?
Bill: Well, you always have the ability to go to My Oracle Support. There is a view called-- that CDB quarantine transaction that we talked about that we can look at, hey, here's the reason. And we might use that to go out there and search My Oracle Support and/or contact Oracle Support.
07:49
Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator. Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to get started.
08:09
Nikita: Welcome back! Bill, what are some of the common causes of quarantined transactions? Could you share some examples with us? And how do you resolve them?
Bill: One could be physical corruptions. It could either be logical or physical. So maybe because media failed. Hardware bits get flipped. So that might be able to be easily fixed by using the RMAN Block Media Recovery. And that's the lowest level of recovery that we can apply.
And then there's logical corruptions. This is the recommended order when trying to resolve logical corruptions. First level is the Block Media Recovery. And then, after that, if the Block Media Recovery fails, then possibly, how about re-creating that data segment? So either truncate or drop it, and then recover it from another source. So once you drop the segment, the transaction then is going to skip trying to recover it. It's no longer there. So it's, OK, hey, I'm successful now.
And then, the last resort type method is to drop that undo segment. There's an offline rollback segment that you can use. But it's recommended-- it's best to avoid that-- again, kind of a last-ditch effort to try to fix something. There are other options that you might try. However, these options do end up being a loss of data. Why? Because we're going to do a point-in-time recovery.
So we can go back to a table point-in-time recovery. So we start with the Block Media Recovery. OK, we can't. OK, so how about if we go back before that transaction and try to recover the table at that time? So it will be a loss of data.
Then, the next level is, we can't do the table. Can we do the entire tablespace? That might be an option. Might flashback the database if we are using-- if we have Flashback Database on. Again, that's just another method of point-in-time recovery. And then also do a database point-in-time recovery.
If we can do the database point-in-time recovery flashback at the PDB level, so that way it's not impacting the entire container, hopefully, we don't have to try to do a point-in-time recovery at the database level. So we wouldn't want to do that. That would something really drastic would have to happen to force us to do the entire container. But we want to do that at the PDB level.
10:54
Lois: Ok. So the issue is resolved. What happens next?
Bill: So once we have the issue resolved that caused that, SMON is still going to try to do transaction recovery because why? That quarantined transaction says, hey, I've still got this bad transaction there. So once that transaction has been fixed, we need to drop that quarantined transaction. So that way, SMON says, hey, I have this transaction. I need to recover. SMON will keep from trying to do that.
So there is a DDL command to drop that quarantined transaction. So remember, from the views, the quarantined transaction views, that's where we saw the undo segment. We saw the slot number. We saw the quarantined transaction slot number. So that way, we can drop that transaction by using that.
11:51
Nikita: How does the escalation process work for quarantined transactions? And why is it important to protect the PDB and the container database?
Bill: So quarantined transaction escalation-- we might have multiple transactions fail, depending on the corruption level. It might have multiple blocks for that that have failed. So just to quarantine a bad transaction may not help whatsoever. It depends on what the root cause is for the failures and how many are happening at that time. So the database with these bad transactions will continuously run in an inconsistent state. So it could be dangerous if we have multiples of the same issue and that.
So with that system running in an inconsistent state, things will continue to spread. Things will continue to get worse. That's why, once that level of 3 is reached, we go ahead, and we do a shut down abort on that PDB. Because if a transaction can't be recovered, there's no need in trying to do any other type of shutdown.
So with this escalation process, it does benefit us because, again, SMON is going to continuously try to recover that bad transaction for that. OK, SMON's going to keep trying. It's not going to work. And at some point, it might cause it to crash. So by stopping it before it continues getting worse, damaging more, we're going to go ahead and say we're escalating this issue to where we're shutting down the PDB.
Fault tolerance, so meaning that we have higher availability of the rest of the container. So it's not going to crash the entire container. So the PDB can continue to operate when we are trying to resolve transactions except in the case where it exceeds the amount, and it does a shutdown abort on that PDB.
So with that escalation, we reach that limit of 3 for that. We do a Shutdown Abort on that PDB. That transaction recovery is disabled. OK. Don't try to recover any transactions. Why? Because we know we have a few of them. So it's shut down, so we're going to go out and look at our quarantine transactions views, what's the reason for that, how many do we have?
And then, once we resolve the issue, we are going to enable recovery again because it turns off the recovery option before it allows us to open that PDB. It's not going to be in a consistent state, though. So now we can go ahead and alter the system and, OK, go ahead and allow recovery of transactions again.
14:42
Lois: Thank you, Bill, for walking us through the details of automatic transaction quarantine and telling us how to manage and resolve these complex scenarios.
Nikita: Yeah, thanks Bill! To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham…
Lois: And Lois Houston signing off!
15:13
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Join Lois Houston and Nikita Abraham as they discuss the Automatic Transaction Rollback feature with Senior Principal Database & MySQL Instructor, Bill Millar. Bill explains that in the 23ai release, transactions blocking other transactions can now be automatically rolled back, depending on certain parameters. Bill highlights the advantages of using automatic transaction rollback, which eliminates the time-consuming process of manually terminating blocking transactions. They also cover the workload reduction benefits for database administrators. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. ------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26Lois: Hello and welcome back to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead of Editorial Services.
Nikita: Hi everyone! Last week, we looked at two Oracle Database 23ai new features related to Data Manipulation Language, or DML. One was Unrestricted Parallel DMLs and the other was Unrestricted Direct Loads. Do check out that episode if you missed it.
00:56
Lois: Today, we have Senior Principal Database & MySQL Instructor, Bill Millar, with us. He’s been on several times this season taking us through all the different 23ai new features. In this episode, we’re going to ask him about the Automatic Transaction Rollback feature. Hi Bill! What is automatic transaction rollback and why is it an important feature for database administrators?
01:22
Bill: We can now have transactions that are blocking other transactions, depending on some settings, to automatically roll back. It does require some parameters to be set. Rows basically get locked in a single row. Each row is locked based off of what type of activity is being performed on that row, such as inserts, updates, deletes, merge, select for updates.
01:52
Nikita: And how were things before this feature?
Bill: Traditionally, the database administrator had to research and manually terminate blocking transactions, or there are some things that resource manager might have been able to do.
02:05
Lois: This seems like such a game-changer for DBAs, Bill. So, how does it work?
Bill: So there are some parameters that control the automatic rollback. One is the transaction priority. We're going to set that priority for a transaction either to medium, high, or low. We have the high priority wait target and a medium priority wait target that we can set.
The high wait target will terminate if a medium transaction is blocking that high target based off of the values that we set, the medium transaction can be terminated. A medium transaction will terminate a low priority. So if a transaction designated as low exceeds the blocking time that we set for the medium priority wait time, then it'll be terminated. Whereas, the high priority will terminate both medium and low transactions.
We have the rollback mode. We're either going to roll back or we're going to track, depending on what we're trying to do.
03:10
Nikita: So, if I decide that I want to use automatic transaction rollback… if I decide to implement it…I’ll need to set those parameters, right?
Bill: So we can set those at a session level. We also have some system level wait targets. What are the wait times for the medium, high transactions? How long they are going to wait for those lower transactions?
And then we also have the rollback mode. Are we actually going to roll back or are we just going to track for right now? We have to determine what is going to be the wait times for those transactions that we want to wait before those lower transactions, priority transactions are rolled back?
At that session level, we're going to set the session. High is the default. So if we want transactions to run at a lower, we have to set those. So we can set the medium or low because that's going to determine how they're rolled back.
So, what is that rollback order? Again the low, we'll roll back any low that's blocking mediums. High, we'll roll back any mediums or lows that are blocking. So you do need to have the understanding of that application, and how critical are the different transactions, because if you start rolling back transactions, what? It does-- If you roll back the transactions, it does generate a little research, a little bit more work on why did that happen.
04:38
Lois: Yeah… you don't want to set it without really understanding what you’re doing. Ok, so, what else do I need to know?
Bill: So we do have the system level wait targets again. How long is the high priority transaction going to wait for a lower transaction before it rolls it back? How long that medium priority is going to wait?
We use the ALTER SYSTEM SET command. It does have a range of values from one second to 2,147,483,647 seconds. That's like 68 years. Might not want to wait 68 years for a transaction to be rolled back.
We can set it at the PDB level. Each pluggable can have a different value. And it can have a different value in the different RAC instances. We have those system level wait targets that we want to set. Automatic rollback. In order for it to function, all the parameters have to be set properly.
What is that transaction priority? We saw the medium, high, low. What is the wait target? How long is the medium is going to wait? How long is the low is going to wait? We set that in seconds. The order of those transactions determine how they are terminated.
05:53
Lois: Earlier on, you mentioned rollback mode. Can you tell us a little more about it?
Bill: So with that automatic rollback mode, there's only two valid values. It is considered advanced parameter. We can either set it in rollback, which is the default, or we can put it in track mode. Track mode gives us the ability to try it out. I guess you can say.
It will say, hey, if I would have been running, if I would have been used, I would have terminated this transaction. It'll show me the number of times it would have happened for high priority, the number of times it would happen for a medium priority. It is modifiable in the PDB, but however, the track mode must be the same in each instance.
So that rollback mode, again, that is the default value for that. So statistics are going to be available. So how many high priority rollbacks occurred? How many medium rollbacks occurred?
In that track mode, I have to set that value. I do have to have the time set for how long is it going to wait for those, so the high and medium. And those priorities has to be set in the session.
So statistics are available for the high and the medium in the track mode. Not only when we're actually rolling back, but also tracking. Again, this gives us the ability, by having it in the track mode, gives us the ability to do a little testing with it first.
07:27
The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.
07:55
Nikita: Welcome back! Bill, when it comes to monitoring, how do you keep track of these rollbacks?
Bill: For monitoring our rollback transactions, the data dictionary information is available to assist with monitoring our transaction priority. So from the V$TRANSACTIONS, there are columns available allowing us to do that.
Based off that transaction priority shows what is the wait target for that. And then also each of the priority of those transactions. We can view this information, it will be populated to the alert log. So we can see that session ID, what was session ID of that? What was the transaction ID? What was the priority? What was the system identifier for that?
It tells you-- even tells you the parameter and tells you what that wait time was set at. If it was a medium transaction that was terminated, it shows, OK, it was a medium. So we can view the alert log. And we can look for these terminations. Gives an idea of what's being done.
09:01
Nikita: And finally, what are the key advantages of using automatic transaction rollback?
Bill: It eliminates a very manual process. It can be very time-consuming for the DBA to go out there and try to find what's the blocking session.
Yep, I'll go ahead and do an ALTER SYSTEM. I'll kill that session trying to track it down, finding the views to look at it to say, OK, Yeah, this is the blocking one. I want to go ahead and take care of it. Resource manager doesn't really fully address blocking transactions.
Some things that can do for that. We have the maximum estimate execution time. So that's the number in CPU seconds allowed for that call. It's terminated. It doesn't matter whether it's blocking another session or not in that case or even another transaction. It just says, OK, you exceeded this time. I'm going to terminate you.
Then we also have the max idle time again. That's maximum session idle time. All right. You haven't been doing anything to session, we're going to terminate you. And then we have the MAX_IDLE_BLOCKER. That's the time duration of an idle session can block another session. Again, it's going to check OK, is the session actually idle? But these don't really address the issue of, hey, I have a higher priority transaction waiting for a lower transaction that's blocking it.
10:27
Lois: Thank you, Bill, for that breakdown. This feature is such a time saver.
Nikita: Yeah, and such good way to reduce the manual workload for DBAs. Thanks Bill!
Lois: To learn more about what we discussed today and view some of the demonstrations of this feature, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
11:02
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, hosts Lois Houston and Nikita Abraham discuss new features in Oracle Database 23ai related to Data Manipulation Language (DML). They are joined by Senior Principal Database & MySQL Instructor, Bill Millar, who explains the concept of unrestricted parallel DMLs and their importance in speeding up large operations and maintaining summary tables. The discussion then turns to unrestricted direct loads, examining the evolution of direct loads with 23ai and the broader impact of these changes. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! In our last episode, we discussed a ground-breaking caching solution in Oracle Database 23ai, known as True Cache. We spoke about its configuration and deployment, and explored how to apply True Cache to our applications.
Nikita: Today, we’re going to talk about two Oracle Database 23ai new features related to Data Manipulation Language, or DML. The first is Unrestricted Parallel DMLs and then we’ll move on to Unrestricted Direct Loads. We’ll talk about the situation prior to 23ai, identify the improvements that have been made, and look at their benefits.
01:15
Lois: And returning for another episode is Bill Millar, our Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! So, to start, can you explain what unrestricted parallel DMLs are and why they are important, especially in the context of Oracle Database?
Bill: The Oracle Database allows DML statements such as inserts, updates, deletes, merge to be executed in parallel by breaking those statements into smaller task. These transactions can contain multiple DML statements. And they can modify multiple different tables.
So transactions with the parallel DML is going to use the execution method by breaking up those large operations to execute the transaction in parallel. It helps speed up the large operations. And it's advantageous to data warehouse environments where we're maintaining like summary tables, historical tables. And even in OLTP systems, it can be beneficial for long-running batch jobs.
The scale up. Well, it's basically dividing the executing SQL against those large tables and indexes into those smaller units of work.
02:36
Nikita: So, what were the limitations prior to 23ai?
Bill: So once that object was modified by APLL statement, the object cannot be read or modified later in the same transaction. After that parallel DML modifies a table, there is no follow-on DML or query on the same table within that same transaction. If any attempt to access a table modified by that parallel statement, the transaction would be rejected.
You're only allowed to query on those tables prior to that DML on that object itself.
03:16
Lois: Ok… So with these new improvements, I’m guessing some of these restrictions have been removed?
Bill: In this case, in the same session, you can query the table multiple times. You can perform conventional DML on the same table within the same session. And you can also have multiple direct loads in the same session without having to do that commit.
But there are still some restrictions with it. Heap tables only. You can't do it with any clustered tables or IOT, Index Organized Tables. Non-ASSM, the Automatic Segment Space Management tables. The temp table is not under ASSM. Why? Because it has to have uniform extents or any other tablespaces that you created with the uniform extents. So those restrictions still apply.
So some of the improvements are some of the restrictions can help reduce the overhead. We can enable Parallel DML within that session. Allows the multiple operations on the same object. And it doesn't require that commit for each separate operation.
Makes it a little bit easier to use by removing some of these limitations. Now users can run parallel DMLs and any combination of statements within that same transaction. And it can help simplify and speed up data loading analytic processes by making the database, the parallel execution and parallel queries, at the same time within that same session, again, eliminating having to do commits.
04:58
Nikita: Thanks for that summary of all the improvements, Bill. Now, how do you enable this? Is it enabled by default?
Bill: To enable the Parallel DML mode, it is required for a session. It is disabled by default. That's because the Parallel DML and Serial DML, they have different locking, different ways to handle the transactions, different disk space requirements.
When Parallel DML is enabled in a session, all DML statements are considered for parallel execution. Only a statement is considered for parallel execution when the Enable Parallel DML hint is used if I don't set it for a session. The sessions DML mode does not influence any parallelism of DDL statements. When the Parallel DML is disabled, no DML is executed in parallel, even if the hint is used.
05:59
Lois: Bill, I would like to dig a little deeper into the benefits. How do these lifted restrictions improve the overall performance and reduce overhead?
Bill: There's no longer that requirement to commit everything separately. So that's going to reduce the overhead, not having to do the commit all the time.
The scalability of accessing those large objects, executing parallel makes the decision support systems, those data warehouses and batch OLTP jobs or any other larger DML operation execute faster.
By removing that one touch limitation, it allows the parallel DML statements to be read or modified by later statements of the same transaction in the same session. It's very similar to the non-parallel statements. And even OLTP systems can also benefit, for example, maintaining a larger operation, such as the creation of indexes, refreshing tables, or even creating summary tables.
07:14
Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.
07:42
Nikita: Welcome back! Let's move on to the next new feature, which is unrestricted direct loads. Bill, what was the situation with direct loads like, prior to 23ai?
Bill: After a direct load and prior-- it was always prior to a commit, queries in additional DMLs were not allowed on that same table. You might encounter the ORA error, the 12838, saying, hey, you can't read or modify this in parallel. That's because the DML on that direct load had access to that and that session for that. So you might have received that error.
The enq contention, the wait event for the direct load issue in a different session from the other sessions during the direct load is having to wait, because of that queuing that-- because a transaction gets that table, locks that information to keep that table from being modified until that direct load has actually committed.
Within the same transaction, within the same session, trying to do multiple DMLs with the-- while it is being modified with the direct loads itself. Unlike conventional loads, the direct loads, as the new blocks and extents are added to the segment, the high water mark does not actually get moved until the actual commit itself. So that's why there is restrictions in the same session or even in other sessions to be able to do anything. So to prevent the errors, the applications had to do a commit immediately after that direct load to prevent those errors from happening.
Well now, there are restrictions when that direct load was done prior to that commit for that. The same table in the same session, couldn't query, couldn't do any additional DMLs, couldn't do any additional parallel DMLs. And even in other sessions, queries were not allowed on the same tables that was in use by the other session. So no additional conventional DMLs, no additional parallel DMLs were allowed.
10:09
Lois: Ok.. it was restrictive in what could be done. So, how have direct loads evolved with the 23ai release?
Bill: Some of those previous restrictions have been lifted in that same session with that same table. So now you can immediately-- and notice that we're talking here, same session, same table. All right. So you can query multiple times within that same session. You can perform additional DML and you can also do multiple direct loads in the same session without having to do that commit.
However, there still are restrictions. It has to be a heap table. It does not work with index organized tables or clustered tables. And the tablespace, if it's not using the automatic segment space management, it cannot-- it does not apply to those either, or if tables with a uniform extents-- tablespace with uniform extents. That's why anything in the temporary table is also included. Why? Because the temporary tablespace has to be uniform extents.
11:17
Nikita: So, what are the restrictions lifted for different sessions on the same table?
Bill: Sessions can query that table, can perform conventional DML on that, able to also concurrently perform a direct load, and I can roll back to a save point. So you can see those added features can be very beneficial.
But there's still restrictions that apply. It still applies to heap tables only, and it still applies to only tablespaces that are using the automatic segment space management for that. Of course, that includes the temporary tablespace and it doesn't work with tablespaces that have uniform extents.
Your application DML might need to query the data after that direct load without committing, applications that might need to modify data within that same transaction as that direct load. You can enable multiple append hint. So you can specify the hint in addition to pending hint to disable. You can specify the no multi-append hint to disable it.
12:27
Lois: Bill, what’s the broader impact of these changes. How do these improvements make things more development-friendly?
Bill: So changes to the direct load make things a little bit more development friendly by removing those directions after that direct load itself. So previous restrictions when loading-- querying the data kept us from doing multiple things at the same time. So now I can query on that table direct load from the same session, from a different session. I can do conventional DMLs on the table within the same session. It allows me to do a rollback on it.
I can do direct loads on the same table within the same session. Again, I can also allow rollback to a save point. As long as my compatibility is set to 21.0.0.0, I will be able to go ahead and benefit from this feature. And there is no increase with it as far as the space usage or causing any fragmentation to the table. So that will not be an issue.
13:35
Nikita: Well, that’s the end of our time together, but I want to thank you, Bill, for sharing your expertise with us.
Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
14:03
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Hosts Lois Houston and Nikita Abraham are joined by Senior Principal Database & MySQL Instructor Bill Millar who explains Oracle's newest caching solution called True Cache. Available in Oracle Database 23ai, True Cache is an automatically managed, in-memory, read-only cache that improves application performance dramatically. Bill provides an overview of its features and highlights the benefits of using True Cache. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! Last week, we had quite a power-packed episode. We discussed the 23ai new feature for Automatic SQL Plan Management. We also looked at the 23ai automatic feature that enhances SecureFiles LOB Write Performance as well as the update to Wide Columns.
00:59
Lois: Yeah, and in today’s episode, we will look at True Cache, another 23ai new feature. To tell us all about it, we have Bill Millar back with us. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. We'll ask Bill to give us an overview of True Cache, talk about its configuration and deployment, and discuss how to apply True Cache to our applications.
Nikita: To kick things off, Bill, can you give us a high-level overview of what True Cache is? How does it differ from other caching solutions like Redis or Memcached?
01:35
Bill: True Cache is an in-memory cache. It is read-only. True Cache is deployed in front of a primary database, and it is automatically managed. It keeps the most frequently accessed data in the cache, and it keeps the cache consistent with the primary database. They call it diskless, but it's not. It does require some space for SP file, redo logs, control files, and such. But it's very similar to Active Data Guard.
The queries can be offloaded to the True Cache for faster query response. And the data in the query cache is consistent. Unlike other mid-tier caches like Redis or Memcached, a query to the True Cache returns only committed data, and the data is always consistent. It's secure. Why? Because we implement our Oracle database security policies and you can control access to the cache.
02:33
Lois: So, why should we use True Cache?
Bill: Improve application performance without having to rewrite any applications. That can save considerable amount of time, effort, and expense. Reduces the application response time. So the closer the True Cache is to the application, the faster the response. Now, you do need a large amount of memory. We're talking memory here. It's an in-memory storage area, and depending on how you configure it, you can have it shared, you can have it divided. We mentioned it's automatically maintained. So there's no application changes required, and it is transparent to the application. Again, simplifies that development and maintenance.
03:15
Nikita: How does it impact application performance, and what kind of scenarios would benefit the most from implementing True Cache?
Bill: So at a high-level view, True Cache or primary database, the application configuration serves as other things that are going to decide where is it going to query the data from, from the True Cache or from the primary database.
The True Cache satisfies that query. And that's where the data will be fetched from. If not, then from the primary database. On start up, True Cache is empty. So it starts reading large chunks of data to populate the True Cache. So after a block is cached, then again, it can be automatically updated, apply the redo to it-- very similar to the Oracle Active Data Guard. In the data returned, it is always going to be consistent.
04:04
Lois: Is it going to be current data?
Bill: Maybe, maybe not. If it's been updated in the primary, if they redo apply hasn't occurred yet, then it's not the most consistent. But as far as the query cache is concerned, it is the most current because we only display consistent. You can have multiple True Caches. You can save the same database application service to the True Cache as you can partition it.
04:28
Nikita: I'm curious about the memory requirements, Bill. How crucial is memory for True Cache's performance?
Bill: You need to have significant amount of memory. Memory, memory, memory. So True Cache is completely memory, memory. So I want to have all my data possible in there. The more memory you have, the less likely something is going to age out. And of course, just like with the standard caching, you can also pin objects to stay in the True Cache.
Yeah, like I said, there are some requirements for storage, even though it's called diskless because of, again, redo log files, configuration files like the control files, SP file. And again it is read only.
05:11
Lois: Can you explain the differences between using physical and logical connections with True Cache? How does this impact the way applications interact with the database?
Bill: So with using the True Cache, we have two physical connections, and we can have one to the primary database and one to the True Cache. Each connection has a database application service associated with it, and it's going to choose which connection to use based whether it's going to go to the True Cache or to the primary database.
The second way is the application maintains one logical connection that uses the application service for the primary database. It's the JDBC thin driver, starting with Oracle Database 23 is available. It's going to maintain the physical connections to the primary database and the True Cache itself. Now, the logical connection, the one logical and one physical, is for Java applications only.
Applications that work with JSON, we extend the HTTP entity tag support for that. So a database GET request to the True Cache is going to compute the ETag, insert it into the return document.
06:27
Nikita: But what happens if there’s a mismatch when the modified document is put back into the primary database?
Bill: Well, then the database is going to verify. OK, what happens with that?
It's going to verify the document row still matches that ETag for that. If with that put command, let's say, I have new data here, the row is going to match that ETag that was automatically updated. If there's no match, another user has changed the data and the PUT request is rejected. So the PUT request can be retired using the new data.
07:05
Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back! Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to mylearn.oracle.com and create an account to jump-start your journey towards certification today!
07:48
Nikita: Welcome back! Now, how do you configure True Cache, Bill?
Bill: You can configure True Cache one of two ways. You can either use the Database Configuration Assistant, which actually makes it a little simpler to configure it, and you can also manually create it.
You have some environment options. One is a uniform configuration where you can deploy identical True Cache that use the same database application service. Another way is partition configuration. The data is going to be divided across multiple True Caches, which, each cache is a different subset of the data. You can also deploy True Cache in a RAC environment. As one might expect, there are some additional configuration steps for a RAC environment.
You want to make sure you verify your configuration, that the database application services are working as expected after you configure it. And then, optionally, you can enable DML redirection. What that will do, it writes data to the primary database, and that data is automatically updated in the cache. It's very similar how to the Oracle Active Data Guard works. Because the DML redirection uses more resources, it's not recommended for update-intensive applications.
There is a parameter, a ADG_REDIRECT_DML initialization parameter, that you will set to True in order to do that.
09:18
Lois: Bill, what are the specific challenges or considerations that administrators should be aware of during the configuration process?
Bill: You do need to make sure your network is configured for True Cache in the primary database. So optionally, you can create a remote listener for high availability. But you create your True Cache. You go ahead, and make sure that you have your primary database. You want the network configuration for both of those. And then you create the True Cache. Once the True Cache is created, you're going to create the application services associated with the database. And then, you're going to start the database application services on the True Cache.
When it comes to naming the application service names, each primary database application is going to be associated with a corresponding True Cache application service. To help simplify things a little bit, in the naming convention, you'll notice in our examples-- for example, if we have SALES as the primary database service, then we have the True Cache, we have SALES_TC, standing for True Cache, so it's easily identified.
You don't have to do that, but it's kind of recommended to do that, some way that you're going to identify it. So we're going to start our True Cache services. And you only start the True Cache services on the True Cache instances. Because it's the database services on the database that you need to make sure are started. And they are read-only.10:46
Lois: Are there some best practices for maximum availability architecture?
Bill: Uniform configuration seems to be a popular one. Why? Because I am going to have the both True Caches can be shared. That way, hopefully, I'm getting full usage out of both. And maybe if I have one service going to one, it might be minimally used. Whereas, the other one might be over. Hey, I could use more memory over here.
We'll also recommend use the JDBC 23ai UCP, Universal Connection Pool, for the application. So that can lessen the impact. If one True Cache becomes unavailable, as far as, OK, I need to reroute over here-- benefit of uniform configuration also. Prepopulate the cache. You want to go ahead and run the critical workload for that. If you have a planned outage, and you need to shut down the True Cache, you want to make sure you stop the database application service on that True Cache.
And then, how are you going to design your True Cache? Are you going to partition it? Are you going to have uniform? Which partition option are you going to use? So you can try to design that to help minimize the number of fetches it has to do from the primary database. And the more you can keep in the True Cache, the better the performance is going to be.
12:09
Nikita: What do I need to keep in mind when it comes to managing True Cache?
Bill: One thing you might need to do for managing the True Cache is to monitor the True Cache. There's a couple different ways that we can do it. One, you can use the V$ view, the V$TRUE_CACHE view. And, of course, you can always use the Automatic Workload Repository.
12:30
Lois: Bill, we already spoke about this a bit, but can you tell us more about using True Cache in an application?
Bill: There's two ways of using True Cache, as we've seen, physical and logical. Physical, it's going to maintain two connections, front one to the primary database and one to the True Cache. The application can decide which connection to use, based off of what it is trying to do.
If it's just reading, long as it's for a service that's configured with True Cache, it can read the True Cache. If it's going to write something, it's going to update, insert, whatever the case might be, it's for the primary database. And you can use any existing client driver as long as you're using the physical connection method. Any programming language will also work.
With the one logical connection method, it uses the application service for the primary database. You're going to use the JDBC Thin driver, starting with 23ai. You can use it and it maintains the connection to the primary database and True Cache. This model only works with Java applications, though. It maintains the physical connections.
We're going to enable the driver connection. And then, we're going to set the read only. We're going to set it to read only, true. Read only, false, whatever the case might be. And the read only mode is false for a connection by default. False is the default. Java applications only.
14:14
Nikita: What are some best practices for load balancing in a uniform configuration?
Bill: You have multiple--multiple True Caches. They're going to service the same database application. They're going to cache the same data. It's the listener that's going to distribute the load balances. So the listener will automatically distribute and load each session to each cache. It will do it randomly and it will do it based off a load. Where can it configure? Where can it send for the best performance.
To route the request to the best performing True Cache, you want to make sure that you are using the same listener. So that remote listener parameter should point to the same listener, which is also the primary database listener. Single instance primary database local listener or scan listener, whichever one you're using, points to the primary. For the application for the JDBC URL, should point to the primary database.
You'll remember that Thin driver is going to create that logical connection, and it's going to create the physical connection to the primary database into each True Cache. To simplify things and possibly avoid connection issues, you might consider using the LISTENER_NETWORK, so the initialization parameter instead of specifying the remote and local listener separately. Because with the local--with the listener networks, all listeners within the same network name will cross register.
15:44
Lois: Before we wrap up, are there any complementary features that you would recommend using alongside True Cache to further enhance performance or simplify management?
Bill: There are features that can complement True Cache-- the server-side result set cache.
So you can create--you can go ahead and create the result set that's part of the library cache set aside, a portion of that. You're going to go in, you're going to configure what objects will use that. You can still use that even with True Cache.
There's also the KEEP Buffer Pool that can be used. It's a separate pool that you set aside as part of the buffer cache. And you want to make sure you size it so the object that you want to keep in memory in the buffer cache that you size it appropriately. But again, some configuration, you configure the key pool, plus also you go in and alter the objects to use it.
And then lastly, there's the database smart flash cache. So again, if your data doesn't fit into memory, you can expand the capacity of by adding flash devices. When you configure the flash cache, if you are using transparent data encryption data, the local flash devices is not supported. So if it's TD encrypted on the primary database, it's going to stay in the buffer cache of the primary database.17:11
Nikita: Ok! I think we can close the episode with that. Thank you, once again, for joining us, Bill.
Lois: Yes thanks! We’re learning so much from you. To learn more about what we discussed today, including the various configuration options that are available, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
17:46
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Join Lois Houston and Nikita Abraham, along with Senior Principal Database & Security Instructor Ron Soltani, as they discuss how the new Automatic SQL Plan Management feature in Oracle Database 23ai improves performance consistency and simplifies management. Then, Senior Principal Database & MySQL Instructor Bill Millar shares insights into two new features: one that enhances SecureFiles LOB Write Performance, improving read and write speeds, and another that increases the column limit in a table to 4,096, making it easier to handle complex data. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! Last week, we looked at the Oracle Database 23ai enhancements that have been made to Hybrid Columnar Compression and Fast Ingest. In today’s episode, we’ll talk about the 23ai new feature for Automatic SQL Plan Management with Ron Soltani, a Senior Principal Database & Security Instructor with Oracle University.
01:01
Nikita: And later on, we’ll be joined by Bill Millar, another Senior Principal Database & MySQL Instructor, who will tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance. We’ll also get him to talk about the Wide Columns update. So, let’s get started. Hi Ron! What have been the common challenges with SQL plans and database performance?
Ron: One of the problems that we have always had, if you remember, was when data changes, database setting configuration, parameter changes, SQL that were operating very well could now behave badly using the SQL plan that were associated to them. And remember, the same SQL plan generally Oracle likes to continuously reuse.
So the SQL plans were put in the baseline in the past, and we could have those SQL plan baseline, which are a set of approved plans to be used for a SQL from the SQL history stored in AWR, then could be used for the optimizer to choose from. However, which plan to choose and which one would be the best one to use, this is what the problem has been in managing the SQL plan baselines, and a lot of the operation would have been done manually.
02:22
Lois: And what have we done to overcome this?
Ron: So now this new system will going to perform all of those operations automatically for us. Now it can search the Automatic Workload Repository. It can find SQL plans for a particular SQL statement, then look for any alternative plans that may available in alternate sources like SQL tuning sets. And then validate those plans and see if those plans are going to be good and to be used as SQL plan baseline for executing SQL statement by the optimizer.
03:00
Nikita: So we now have the Automatic SQL Plan Management Evolve Advisor to help manage operations automatically, right? Can you tell us a little more about it? How does it ensure optimal performance?
Ron: This is an automatic advisor that is created that can go look for different plans and validate the plans by examining them, making sure that they are not causing any regression compared to the previous operation, and then evolve that plan into a good baseline.
This simplifies management of the baseline repository for a SQL statement. So as data changes, as parameters changes, optimizer could come up with different type of plans that are set within this baseline that has been validated to be good baseline for each situational operation. So this way you reduce a lot of hard parsing operations.
04:00
Lois: And how does the SQL Evolve Advisor work, Ron?
Ron: First, it will check the AWR to find what are the top SQLs that has been found. Then it will look to see if these top SQLs who did not perform well with the plan that they have, that's why they're top SQL, have other alternative plans that are stored in the SQL plan history, in AWR, or available in any other sources.
Then if it finds any additional plans, it will go ahead and add all of those plans into the plan history. So in the plan history, now you have accumulation of all the plans available in AWR and anything that has been brought from other sources. Then it will test every one of those plans and validate that by use of the plan, the SQL statement will not deprivate and get slower. The performance is either similar or actually better. So normally, there is a percentage that the SQL should improve. So we will then validate these baselines.
And finally, once the baselines or those plans have been validated, they will be accepted, and then they will be added as SQL plan baselines. They will remain in the statement history, in the AWR, and will be available for optimizer for the future use.
05:28
Nikita: What are the benefits of this?
Ron: Number one is Autonomous Database. As you know, they want to automate all management, including management of the SQL execution due to changes that are happening for the application, for the data, or the database and its environment.
It totally eliminates any manual intervention for management of the statement, and it can transparently repair any statement that had been affected by a major change.06:00
Lois: What sort of problems does this feature solve for us?
Ron: Of course, this is a performance consistency. We want to make sure that every statement performed to its best performance and any specific changes that may impact those SQL statements would be taken into an account, and a better plan, if available, would then be available for use.
It also improves the application performance level, therefore database service level will get much improvement. And the SQL execution plans will be automatically managed behind the scene by expanding these baselines, by managing all of these baseline history and all of that that is managed by this automatic SQL plan management environment automatically.
06:50
Nikita: And when do we use this?
Ron: If there is a change in a database environment, like you add SGA, the change into the shared pool, change in the size of the buffer cache or any type of storage effects. So all of those can actually affect the SQL execution.
Now all of those changes, including data changes, can cause a SQL plan to not behave very well or behave as well as it was doing before. Therefore, if particular plans do not perform as well as they did before, that affects the performance of the application. This also affects the performance of the database and the instance.
07:35
Lois: So, how do we use this environment?
Ron: Well, best news that I have for you in that is that there is nothing manual needs to be done. All we need to do is, number one, make sure that we enable foreground automatic SQL plan management that we done through the package for the DBMS SPM for SQL plan management.
You will use the package with the configure option, and you enable the auto SPM evolve task, and you set it to auto. Once this is done, now the SQL evolve plan management and advisor are enabled, and they will then monitor your statements, review all of the top SQLs as they are found with all of the ADDM operation, and then do their work in looking for better plans and being able to maintain the SQL plan baselines we talked about.
Now for you to be able to view, monitor, and see how these operations are going, if it is enabled, you can take a look at the DBA SQL plan baseline's view. There are many, many columns in that particular baseline, and there are also columns that has been added that tell you where is the plan generated from, if a plan is approved, and any other user interaction with the plan or settings can then be verified using that DBA SQL plan baseline view.09:13
Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available for General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started.
09:54
Nikita: Welcome back! Let’s bring Bill into the conversation. Hi
Bill! Can you tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance?
Bill: The key here is that it is automatic and transparent. There's no parameters set. Nothing to configure in table, no hints, and nothing that you have to do with these improvements. It is tightly integrated with SecureFiles LOB infrastructure.
So now, multiple LOBs can be handled in a single transaction and can be buffered simultaneously. This will help with mixed workloads, switching between the LOBs that are writing in a single transaction. The PGA will adaptively resize based off the size for these large writes for the LOBs if you're using the No Cache option. Remember, no cache is going to bypass the buffer cache and does direct reads and writes from the PGA.
JSON type will be transformed into the OSON Oracle data type. It is an optimized native binary storage format for JSON data.11:15
Lois: Ok. So, going forward, there will be better read and write performance for LOBs.
Bill: Multiple LOBs in a single transaction can be buffered simultaneously, improving mixed workloads. We just talked about the PGA. Automatically, the buffer is automatically resized.
And the improved JSON support. The reason it will recognize, hey, this is a JSON data type. But traditionally, JSON data types were small.
So they were small to medium size. So the range from 32k to 32 meg was considered small to medium whereas LOBs were designed for data types larger than 100 meg. So by recognizing this a JSON data type, it can take advantage of the LOB architecture.
Other enhancements will also include the acceleration of compressed LOBs, the pen and compression caching, and improves the poor performance of your reads and writes to compressed LOBs. It's faster than previously.
12:24
Nikita: Bill, what do you think about the recent increase in the column limit? Previously, the limit was 1000 columns per table, which sometimes posed issues when migrating from other systems that allowed more than 1,000 columns, right?
Bill: Maybe because of workload requirements, the whole machine learning, the internet of things workloads, IOTs can have hundreds of thousands of attributes, dimensional attribute columns for that. And even our very own blockchain tables reserves up to 40 hidden virtual columns, so that takes away from the total amount.
Virtual columns count towards the column limits and some applications as they drop columns, what it does, it just converts them to unused, and it still applies towards the limit the number of columns that you can have to that limit. There were workarounds. However, they were most likely not the best way to do it, like column switching, table splitting for that. But big data really use cases, really saw where files have or required more than 1,000 columns.
13:42
Lois: So, now that we can have 4,096 columns in a table, I’m sure it’s made handling complex data a lot easier.
Bill: So by increasing this, since other systems do support higher column limits, it can-- the increase can make migration from other systems easier and possibly even a little bit more attractive while it can make applications a little bit simpler because the 1,000 column limit was not always optimal for analytics. Where 1,000 might have been plenty for OLTP type environments, but not for the analytics, especially when it comes to machine learning and those internet of things that we talked about, where the previous workarounds, like splitting the tables, really caused more performance issue than anything else.
So we want to avoid those suboptimal workarounds. And the nice thing is there's no change to the SQL. So once you have that-- well, if we were doing SQL, if we had tables that were split and we're trying to do things that is actually going to help improve that SQL, now, we don't have multiple objects that we're dealing with.
14:57
Nikita: How do we actually go about increasing the column limit to 4,096?
Bill: You do have to have the compatibility set to 23c. Why? Because it's a new feature. There is a new initialization parameter called Max columns, and you do set that. There's two different ways, two different values. We can set it to standard or we can set it to extended.
It is dynamic. When it's set to standard, it's only 1,000. When we set it to extended, it's going to allow the 4,096. It is modifiable at the PDB level. However, it will inherit what's at the root level, if it's not explicitly set at a PDB. It can't alter it in a session for that. And multiple instances of the RAC environment must use the same value.
Now one thing, notice that it cannot be set to standard if I created a table that had more than 1,000 columns. One thing that might get you, when you drop a table that has more 1,000 columns and you try to set it back to standard, it might tell you, hey, you have tables that have more than 1,000 columns. Don't forget your recycle bin unless you did a drop table purge.
16:09
Lois: Are there any performance considerations to keep in mind, Bill?
Bill: There's really no DML or query performance degradation for the tables. However, it might require, as you would expect, the increase in memory when we have the new column limits. It might require additional shared pool, additional SGA with the additional columns, more buffer cache as we're bringing blocks in.
So that's shared pool along with the PGA. And also we can add in buffer cache in there, because that increased column count is going to be increase in the total PGA memory usage. And those are kind of expected for that. But the big advantage is it gives us the ability to eliminate some of these suboptimal workarounds that we had in the past.
17:02
Nikita: Ok! We covered a lot today so thank you Bill and Ron.
Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
17:27
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, hosts Lois Houston and Nikita Abraham speak with Senior Principal Database & MySQL Instructor Bill Millar about the enhanced performance of Hybrid Columnar Compression, the different compression levels, and how to achieve the best compression for your tables. Then, they delve into Fast Ingest, what’s new in Oracle Database 23ai, and the benefits of these improvements. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! In our last episode, we spoke about the 23ai improvements in time and data handling and data storage with Senior Principal Instructor Serge Moiseev. Today, we’re going to discuss the enhancements that have been made to the performance of Hybrid Columnar Compression. We'll look at how Hybrid Columnar Compression was prior to 23ai, learn about the changes that have been made, talk about how to use this compression in 23ai, and look at some performance factors. After that, we’ll move on to Fast Ingest, the improvements in 23ai, and how it is managed.
01:15
Lois: Yeah, this is a packed episode and to take us through all this, we have Bill Millar back on the podcast. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! Thanks for joining us. So, let’s start with how Hybrid Columnar Compression was prior to 23ai. What can you tell us about it?Bill: We support all kinds of platforms from the Database Enterprise Edition on up to the high engineered systems for that and even the Exadata Cloud at the Customer. We have four different levels of compression. One is considered the warehouse compression where we do a COLUMN STORE COMPRESS FOR QUERY LOW and COLUMN STORE COMPRESS FOR QUERY HIGH. The COLUMN STORE COMPRESS FOR QUERY HIGH is the default, unless another compression level is specified. With the archive compression, we have the COLUMN STORE COMPRESSED FOR ARCHIVE LOW and also COLUMN STORE COMPRESS FOR ARCHIVE HIGH.
With the Hybrid Columnar Compression warehouse and archive, the array inserts are compressed immediately. But, however, some conditions have to be met. It has to be a locally-- to use these, it has to be a locally managed tablespace, the automatic segment space management. And compatibility level, at least 12 too or higher when these values have been introduced. There are different compressors that are used for the compression hidden from the customer. It just depends on what is selected as to what is going to be the compression that's going to be used for-- notice that with the COLUMN STORE FOR QUERY HIGH and for ARCHIVE LOW, the zlib compression method is used, whereas if you select the ARCHIVE HIGH, the Bzip2. And in 19C, we added the Zstandard. And it's available for the MEMORY COMPRESS FOR CAPACITY HIGH.
03:30
Nikita: So, what’s happened in 23ai?Bill: When in 23c, to take advantage of the changes in compression, the compatibility level has to be set at least to 23.0.0 or higher.
When a table is created or altered with the hybrid column compression, the Zstandard will automatically be selected. So it doesn't matter which one of the four you select, that will be the one that is selected. It is internally set transparent to the user. There is no new SQL format that has to be used in order for the Zstandard compression to be applied.
And the Database Compatibility Mode has to be at least at 23.0.0 or higher. Only then can the format of the Hybrid Column Compression storage use that Zstandard compression. If we already have compressed data blocks in existing tables, they're going to remain in their original format.
04:31
Lois: And are the objects regenerated?Bill: If the objects are-- they might be regenerated if they were deleted in another operation. If you want to completely take advantage of the new compression, all you have to do is alter table move. And that's going to go ahead and trigger the recompression of that, whereas any newly created tables that are created will use the Zstandard by default.
05:00
Nikita: What are the performance factors we need to think about, Bill?Bill: There are some performance factors that we do need to consider, the ratio, the amount of space reduction in storage that we're going to achieve, the time spent compressing the data, the CPU cost to compress that data, and also, is there any decompression rate, time spent decompressing the data when we're doing queries on it?
05:24
Lois: And not all tables are equal, are they?Bill: Not all tables are equal. Some might get better performance by different compression level than others for that. So how we can basically have to test our results, there is a compression advisor that's available, that you can use to give you a recommendation on what compression to use. But only through testing can we really see the availability, the benefits of using that compression for an application.
So best compression, just as in previous versions, the higher the compression levels, the more CPU it's going to use. The higher the compression level, the more space savings that we're going to achieve for that as we are doing those direct path inserts. So there's always that cost.
06:20
Did you know that the Oracle University Learning Community regularly holds live events hosted by Oracle expert instructors. Find out how to prepare for your certification exams. Learn about the latest technology advances and features. Ask questions in real time and learn from an Oracle subject matter expert. From Ask Me Anything about certification to Ask the Instructor coaching sessions, you’ll be able to achieve your learning goals for 2024 in no time. Join a live event today and witness firsthand the transformative power of the Oracle University Learning Community. Visit mylearn.oracle.com to get started.07:01
Nikita: Welcome back! Let’s now move on to the enhancements that have been made to fast ingest. We’ll begin with an overview of fast ingest, how to use it, and the improvements and benefits. And then we’ll look at some features for managing fast ingest. Bill, why don’t you start by defining fast ingest for us?Bill: Traditionally the fast ingest, also referred to as deferred inserts, is faster than processing a single row at a time. It can support high-volume transactions like from the Internet of Things applications, where you have hundreds of thousands of items coming in trying to write to the database.
They are faster, because the inserts don't use the traditional buffer cache. They use a pool that will size out of the large pool. And then they're later written to disk using the SMCO, the space management coordinator. Instead of using the buffer cache, they're going to write into an area of the large pool.
The space management coordinator, it has these helper threads, however many-- that's just a number for that-- that will buffer. And as buffer is filled based off size of that algorithm, it will then write those deferred inserts into the database itself.
08:24
Lois: So, do deferred inserts support constraints?Bill: Deferred writes do support constraints in index just as for regular inserts. However, performance benchmarks that have been done recommend that you disable constraints, if you're going to use the fast ingest.
08:41
Lois: Can you tell us a bit about the streaming and ingest mechanism?Bill: We declare a table with the memoptimize for write. We can do that in the create table statement, or we can alter the table for that. The data is written to the large pool, unlike traditionally writing items to the buffer cache. It's going to write to the ingest buffer, the large pool. And it's going to be drained. It's going to be written from that area by using those background processes to write to the actual database itself.
So the very high throughput, since drainers issues deferred writes in large batches. So we're not having to wait especially for the buffer cache. OK, I need space. OK, I need to write. I need to free up blocks. Very ideal for these streaming inserts, sensor readings, alarms, door locks. Those type of things.
09:33
Nikita: How does performance improve with this?Bill: With the benchmarks we have done, we have found that the performance can be up to 75% faster by going ahead and doing the fast ingest versus traditional inserts. The 23 million inserts per second on a single X6-2 server with the benchmarks that we have.
09:58
Nikita: Are there any considerations to keep in mind?Bill: With the fast ingest, some things to consider for that. The written data, you might need to validate to make sure it's there. So you might have input files that are writing to that that are loading it. You might want to hang on to those, before that data destroyed. Have some kind of mechanism to validate, yes, it was written.
There is a possible loss of data. Why? Because unlike the buffer cache that has the recovery mechanism with the redo and the undo, there is none with that large pool. So that's why if the system crashes, and the buffers haven't been flushed yet, then it's possible loss of data.
There's no queries from the large pool meaning that if I want to query the information that the fast ingest is loading into the table, it doesn't go and see what's sitting in the buffer in the large pool like it does with the buffer cache.
Index and constraints are checked but only at flush time. And the memoptimize pool size is a fixed amount of space that we're going to allocate-- of memory that we're going to allocate to use for the memoptimize write.
We can enable a table for the fast ingest, enable with the memoptimize for write. We can create a table and do it. We can also alter a table. We already have a table existing. All we have to do is alter it. And we want to use that, the fast ingest, for these tables.
11:21
Lois: Do we have options for the writing operation, Bill?Bill: You do have options for the writing operation. We have the parameters, the memoptimize write where we can turn that on. We can also use it in a hint. It is set at the root level, it. Is not modifiable at the PDB level. It's set at the root level, It is a static parameter. We can also do things in our session. We want to verify, OK, is the memoptimize write on? We can verify a table is enabled.
So with the fast ingest, the data inserts, you can also use a hint. You can also set this at a session level.
If you decide there's something that you don't want to use the memoptimize write for, then you can disable it for a table.
12:11
Nikita: Bill, what are some of the benefits of the enhancements made in 23ai?Bill: With some of the enhancements-- so now, some table attributes are now supported-- we can now have common default values for a column. We can use transparent data encryption. We can also use the fast inserts, any inline LOBs, along with virtual columns. We've also added partitioning support. We can do subpartitioning and we can also do interval partitioning, along with auto list. So we've added some items that previously prevented us from doing the fast inserts.
It does provide additional flexibility, especially with the enhancements and the restrictions that we have removed. It allows to use that fast insert, especially in a data warehouse-type environment. It can also use-- in the Cloud, it can use encrypted tablespaces, because remember, in the Cloud, we always encrypt, by default, users' data. So now, it also gives us the ability to use it in that Cloud environment because of that change.We have faster background flushing for the loads.
13:36
Lois: And how is it faster now?Bill: Because we bypassed the traditional buffer cache. Faster ingest for those direct ingest. So again, bypassing the traditional inserts and using the buffer cache gives the ability to bulk load into large pool, then flush to the database so that way, we have access to that data for possible faster analytics of those internet of things, especially when it comes to the temperature of the temperature sensors. We need to know when a temperature of something is out of bounds very quickly. Or maybe it's sensors for security. We need to know when there's a problem with the security.
14:20
Nikita: How difficult is it to manage this?Bill: Management is fairly simple. We have the MEMOPTIMIZE_WRITE_AREA_SIZE parameter that we're going to say-- it is dynamic. It does not require a restart. However, all instances in a RAC environment must have the same value. So we have the write area. What are we going to set? And then the MEMOPTIMIZE_WRITE, by default, it uses a hint. Or we can go ahead and we can just set that to all.
It is allocated from the large pool. You manually set it. And we can see how much is actually being allocated to the pool. We can go out and look at our alert log for that information.
There's also a view. The MEMOPTIMIZE_WRITE_AREA has some columns. What is the total memory allocated for the large pool? How much is currently used by the fast ingest? How much free space? As you're using it, you might want to go out and do a little checking, or do you have enough space? Are you not allocating enough space? Or have you allocated too much?
It'll also show the total number of writes, and also, the number-- the writers is currently the users that are using it.And the container ID, what is the container within that container database? What's the pluggable or pluggables that's using the fast ingest?
There is a subprogram, the DBMS_MEMOPTIMIZE that we have access to that we possibly can use. So there are some procedures. Here, we can return the rows of the low and high water mark of the sequence numbers. And the key here is across all the sessions. We can see the high water mark, sequence number of the rows written to the large pool for the current session. And we can also flush all the ingest data from the large pool to disk for the current session.
16:26
Lois: What if I want to flush them all for all sessions?Bill: Well, that's where we have the WRITE_FLUSH procedure. So it's going to flush the fast ingest data of the Memoptimize Rowstore from the large pool for all the sessions. As a DBA, that's one that you most likely will want to be using, especially if it's going to be before I do a shutdown or something along that line.
16:49
Nikita: Ok! On that note, I think we can end this episode. Thank you so much for taking us through all that, Bill.Lois: Yes, thanks Bill. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for Oracle Database 23ai New Features for Administrators. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!17:21
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast. -
In this episode, hosts Lois Houston and Nikita Abraham discuss improvements in time and data handling and data storage in Oracle Database 23ai. They are joined by Senior Principal Instructor Serge Moiseev, who explains the benefit of allowing databases to have their own time zones, separate from the host operating system. Serge also highlights two data storage improvements: Automatic SecureFiles Shrink, which optimizes disk space usage, and Automatic Storage Compression, which enhances database performance and efficiency. These features aim to reduce the reliance on DBAs and improve overall database management. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! Over the past two weeks, we've delved into database sharding, exploring what it is, Oracle Database Sharding, its benefits, and architecture. We’ve also examined each new feature in Oracle Database 23ai related to sharding. If that sounds intriguing to you, make sure to check out those episodes. And just to remind you, even though most of you already know, 23ai was previously known as 23c.
01:04
Nikita: That’s right, Lois. In today’s episode, we’re going to talk about the 23ai improvements in time and data handling and data storage with one of our Senior Principal Instructors at Oracle University, Serge Moiseev. Hi Serge! Thanks for joining us today. Let’s start with time and data handling. I know there are two new changes here in 23ai: the enhanced time zone data upgrade and the improved system data and system timestamp data handling. What are some challenges associated with time zone data in databases?
01:37
Serge: Time zone definitions change from time to time due to legislative reasons. There are certain considerations. Changes include daylight savings time when we switch, include the activity that affects the Oracle Database time zone files.
Time zone files are modified and used by the administrators. Customers select the time zone file to use whenever it's appropriate. And customers can manage the upgrade whenever it happens.The upgrades affect columns of type TIMESTAMP with TIME ZONE. Now, the upgrades can be online or offline.
02:24
Lois: And how have we optimized this process now?
Serge: Oracle Database 23c improves the upgrade by reducing the resources used, by selectively using the updates and minimizing the application impact. And only the data that has dependencies on the time zone would be impacted by the upgrade.
The optimization of the time zone file upgrade does not really change the upgrade process, so upgrade can be done offline. Database would be unavailable for a prolonged period of time, which is not optimal for today's database availability requirements.
Online upgrade, in this case, we want to minimize the application impact while the data is being upgraded. With the 23c database enhancement for time zone file change handling, the modified data is minimized, which means that the database updates only impacted rows. And it reduces the impact to the applications and other database operations.
03:40
Nikita: Serge, how does updating only the impacted rows improve the efficiency of the upgrade process?
Serge: The benefits of enhanced timezone update include customers who manage large fleet of databases. They will benefit tremendously with a lower downtime. The DBAs will benefit due to the faster updates and less resource consumption needed to apply those updates. And that improves the efficiency of the update process.
Tables with no affected data are simply skipped and not touched. All results in the significant resource savings on the upgrade of the time zone files. It applies to all customers that utilize timestamp with time zone columns for their data storage.
04:32
Lois: Excellent! Now, what can you tell us about the improved system data and system timestamp data handling?
Serge: Date and time in Oracle databases depends on the system time as well as the database settings. System time now can be set as the local time zone for an individual database.
04:53
Nikita: How was it before this update?
Serge: Before 23c, the time has always matched the time zone of the database host operating system. Now, imagine that we use either multitenant environments or cloud-based environments when the host OS system time zone is not really the same as the application that runs in a different geographic locality or affects data from other locations.
And system time obviously applies not only to the data stored and updated in the database rows but also to the scheduler, the flashback, to a place to materialized view refresh, Recovery Manager, and other time-sensitive features in the database itself.
Now, with the database time versus operating system time, there is a need to be more selective. It is desired that the applications use the same database time in the same time zone as the applications are actually being used in.
And multitenant and cloud databases will certainly experience a mismatch between the host operating system time zone, which is not local for the applications that run in some other geographical locations or not recognizing some, for example, daylight savings time.
So migration challenge is obviously present. If you want to migrate from a specific on-premises database to either multitenant or cloud, you would experience the host operating system time zone by default.
06:38
Lois: And that’s obviously not convenient for the applications, right?
Serge: Well, the database-specific time in Oracle Database 23c, any cloud database can set local time zone to whatever the customer's requirements are explicitly. And any pluggable database can also set its own local time zone to customer's requirements, not inheriting the time zone from the container database it is currently running in.
This simplifies migration to multitenant or cloud for applications that are time-sensitive. And it offers more intuitive, easier database monitoring, and development.
07:23
Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you’re certified, you’ll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started.
07:51
Nikita: Welcome back! Let’s move on to the data storage improvements. We have two updates here as well, automatic secure file shrink and automatic storage compression. Let’s start with the first one. But before we get into it, Serge, can you explain what SecureFiles are?
Serge: SecureFiles are the default storage mechanism for large objects in Oracle Database. They are strongly recommended by Oracle to store and manage large object data.
The LOBs are stored in segments. Those segments may incur large amounts of free space over time. Because of the updates to the LOB data, the fragmentation of the space used is growing depending, of course, on the frequency and the scope of the updates.
The storage efficiency could be improved by shrinking segments with the free space removed. And manual secure files shrinking has become available since Oracle Database 21c, requiring administrators to perform these tasks manually.
Traditional SecureFiles required the time-consuming DBA activities. DBAs would need to manually identify eligible LOB segments either using Segment Advisor or PL/SQL or built-in database views.Once identified, the administrators would manually execute shrink operations on very large LOBs which takes too much time and may result in excessive disk space consumption. For example, code to operate this shrinking would look like ALTER TABLE some table SHRINK SPACE CASCADE.
That would shrink all LOB segments in a particular table. If you want to scope the shrinking to a single column, the code would be required to ALTER TABLE some table MODIFY LOB, followed by the column name SHRINK SPACE.
This affects only a single column in a table with LOBs.
10:01
Lois: So, how has automatic secure shrinking made things better?
Serge: Automatic SecureFile shrink removes the emphasis from the DBAs to manually perform these tasks. And it results in the more optimal use of space over time.
It is integrated into the automated database maintenance tasks. The automation once enabled runs every 30 minutes, collects eligible LOB segments, and shrinks them offline. The execution time and freed space would vary depending on the fragmentation and the size of the LOBs. Each shrink execution may reclaim up to 5 gigabytes of unused disk space from each LOB segment that is idle.
On the high level, automatic SecureFile shrink improves the Oracle Database 23c storage usage efficiency. It is part of the ongoing Oracle Database improvement effort and transparently reclaims the free space with negligible to no impact on performance of the database operations.
Again, this is done in the background without affecting the running processes. It makes Oracle database 23c less dependent on the DBA activities while reducing the disk space required to store SecureFiles, reducing the usage of LOB segments.
Automatic securefile shrink runs incrementally in small steps over time. Some of the features are tunable. And it is supported for all types of large objects, storage, compressed, encrypted, and duplicated the object segments.
11:50
Nikita: Right, and note that this feature is turned on out-of-the-box in the Autonomous Database 23ai in Oracle Cloud. Now, let’s talk about Automatic Storage Compression, Serge.
Serge: With Automatic Storage Compression and Automatic Clustering, the storage compression gives you the background compression functionality. Directly loaded data is first uncompressed to speed up the actual load process. Rows are then moved into hybrid columnar compression format in the background asynchronously.The automatic clustering applies advanced heuristic algorithms to cluster the stored data depending on the workload and data access patterns and the data access is optimized to more efficiently make use of database table indices, zone maps, and join zone maps.
Automatic Storage Compression advantages include the improvements to Oracle Database 23c storage efficiency as well. It is part of the continuous improvement, part of the ongoing Oracle Database improvement effort. And it brings performance gains, speeds up uncompressed data loads while compressing in the background.
The latencies to load and compress data are because of that also reduced. With the hybrid columnar compression in particular, this works in combination.
And it results in less DBA activities, makes the Database Management less dependent on the DBA time and availability and effort.
Automatic Storage Compression performs operations asynchronously on the data that has already been loaded. To control Automatic Storage Compression on-premises, it must be enabled explicitly. And you have to have heatmap enabled on your Oracle Database objects.
Table must use hybrid columnar compression and be placed on the tablespace with the SEGMENT SPACE MANAGEMENT AUTO and allowing autoallocation. And this feature, again, is transparent for the Autonomous Database 23c in the Oracle Cloud.
14:21
Lois: Thanks for that quick rundown of the new features, Serge. We really appreciate you for taking us through them. To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
14:50
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast. -
Join hosts Lois Houston and Nikita Abraham in Part 2 of the discussion on database sharding with Ron Soltani, a Senior Principal Database & Security Instructor. They talk about sharding native replication, directory-based sharding, and coordinated backup and restore for sharded databases, explaining how these features work and their benefits. Additionally, they explore the automatic bulk data move on sharding keys and the ability to split and move partition sets, highlighting the flexibility and efficiency they bring to data management. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! In our last episode, we dove into database sharding and Oracle Database Sharding in particular. If you haven’t listened to it yet, I’d suggest you go back and do so before you listen to this episode because it will give you a lot of context.
00:53
Lois: Right, Niki. Today, we will discuss all the 23ai new features related to database sharding. We will cover sharding native replication, directory-based sharding, coordinated backup and restore for sharded databases, and a few more.
Nikita: And we’re so happy to have Ron Soltani back on the podcast. If you don’t already know him, Ron is a Senior Principal Database & Security Instructor with Oracle University. Hi Ron! Let’s talk about sharding native replication, which is RAFT-based, meaning that it is reliable and fault tolerant-based, usually providing subzero or subsecond zero data loss replication support. Tell us more about it, please.
01:33
Ron: This is completely transparent replication built in within Oracle sharding that duplicates data across the different shards. So data are generally put into chunks. And then the chunks are replicated either between three or five different shards, depending on how much of the fault tolerance is required.
This is completely provided by the Oracle sharding database, and does not require use of any other component like GoldenGate and Data Guard. So if you remember when we talked about the architecture, we said that each shard, each database can have a Data Guard component, whether through GoldenGate or whether through Data Guard to have a standby.
And that way support high availability with the sharding native replication, you don't rely on the secondary database. You actually-- the shards will back each other up by holding replicas and being able to globally manage the replica, make sure everything is preserved, and manage all of the fault operations.
Now this is a logical replication, generally consensus-based, kind of like different components all aware of each other. They know which component is good, depending on the load, depending on the failure. The sharded databases behind the scene decide who is actually serving the data to the client. That can provide subsecond failovers with zero data loss.03:15
Lois: And what are the benefits of this?
Ron: Major benefits for having sharding native replication is that it is completely transparent to the application or any of the structures. You just identify that you want to go ahead and use this replication and identify the replication factor. The rest is managed by the Oracle sharded database behind the scene.
It supports fast failover with zero data loss, usually subsecond failovers. And depending on the number of replicas, it can even tolerate multiple failures like two server failures.And when the loads are submitted, the loads are also load-balanced across all of these shards based on where the data is located, based on the replicas. So this way, it can also provide you with a little bit of a better utilization of the hardware and load administration.
So generally, it's designed to help you keep your regular SQL-based databases without having to resolve to FauxSQL or NoSQL environment getting into other databases.
04:33
Nikita: So next is directory-based sharding. Can you tell us what directory-based sharding is, Ron?
Ron: Directory-based sharding basically allows the user to define the values that are used and combined for different partition, so better control, location of the data, in what partition, what shard. So this allows you to set up a good configuration.
Now, many times we may have a key that may not be large enough for hash partitioning to distribute the data enough. Sometimes we may not even know what keys are going to come in the future. And these need to be built in the future. So having to build these, you really don't want to have to go reorganize the whole data based on new hash functions, and so when data cannot be managed and distributed using hash partitioning or when we need full control over combination of where data exists.
05:36
Lois: Can you give us a practical example of how this works?
Ron: So let's say our company is very small in three different countries. So I can combine those three countries into one single shard. And then have three other big countries, each one sitting in their own individual shards. So all of this done through this directory-based sharding. However, what is good about this is the directory is created, which is a table, created behind the scene, stored in the catalog, available to the client that is cached with them, used for connection mapping, used for data access. So it can give you a lot of very high-level benefits.
06:24
Nikita: Speaking of benefits, what are the key advantages of using directory-based sharding?
Ron: First benefit allow you to group the data together based on the whatever values you want, depending on what location you want to put them as far as across the shards are concerned. So all of that is much better and easier controlled by us or by the designers. Now, this is when there is not enough values available. So when you're going to use hash-based partition, that would result into an uneven distribution of the data.
Therefore, we may be able to use this directory for better distribution of the data since we understand the data structure better than just the hash function. And having a specification where you can go ahead and create future component, future partitions, depending on how large they're going to be. Maybe you're creating them with an existing shard, later put them in another shard. So capability of having all of those controls become essential for management of this specific type of data.
If a shard value, the key value is required, for example, as we said, client getting too big or can use the key value, split it or get multiple key value. Combine them. Move data from one location to another. So all of these components maintain automatically behind the scene by us providing the changes. And then the directory sharding and then the sharded database manages all of the data structure, movement, everything behind the scene using some of the future functionalities.And finally, large chunk of data, all of that can then be moved from one location to another. This is part of the automatic chunk data move and whatnot, but utilized within the directory-based sharding to allow us the control of this data and how we're going to move and manage the data based on the load as the load or the size of the data changes.
08:50
Lois: Ron, what is the purpose of the coordinated backup and restore system in Oracle Database Sharding?
Ron: So, basically when we talk about a coordinated backup and restore, remember in a sharded database, I have different databases. Each database is a shard. When you take a backup, each database creates its own backup.
So to have consistent data across all of the shards for the whole schema, it is extremely important for these databases to be coordinated when the backup is taken, when the restore is being done. So you have consistency of the data maintained across all of the shards.
09:28
Nikita: So, how does this coordination actually happen?
Ron: You don't submit this through our main. You submit this through the Global Management tool that is used for the sharded database. And it's the Global Management tool that is actually submit your request to each database, but maintains the consistency of when the actual backup is taken, what SCN.
So that SCN coordination across all of the shards is then maintained for the backup so you can create a consistent backup or restore to a consistent point in time across the sharded database. So now this system was enhanced in 23C to support multiple destinations.So you can now send your backup to an object store. You can send it to ZDLRA. You can send it to Amazon S3. So multiple locations can now be defined where you can send these backups to. You can also use multiple recovery catalogs.
So let's say I have data that is located on different countries and we have requirement that data for each country must stay in that country. So I need to also use a separate catalog to maintain that partition.So now I can use multiple catalog and define which catalog is maintaining which partition to satisfy those type of requirements or any data administration requirement when it comes to backup recovery. In addition, you can also now specify different type of encryption to be used, whether you want to have different type of encryption algorithm for each of the databases that you're backing up that is maintained. It can be identified, and then set up for each one of those components.
So these advancements now allow you to manage this coordinated backup and restore with all of the various specific configuration that may be required based on the data organization. So the encryption, now can also be done across that, as I mentioned, for different algorithms. And you can define different components.
Finally, there is much better error handling and response available through this global system. Since things have been synchronized, you get much better information into diagnosing any issues.
12:15
Want to get the inside scoop on Oracle University? Head over to the Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get first-hand access to new products. Read the OU Learning Blog. Participate in challenges. And stay up-to-date with upcoming certification opportunities. Visit www.mylearn.oracle.com to get started.
12:41
Nikita: Welcome back! Continuing with the updates… next up is the automatic bulk data move on sharding keys. Ron, can you explain how this works and why it's significant?
Ron: And by the way, this doesn't have to be a bulk data. This could be just an individual row or it could be bulk data, a huge piece of data that is going to be moved.
Now, in the past, when the shard key of an existing record was going to be updated, we basically had to remove that row from the table, so moving it to a temporary table or moving it to another location. Basically, you're deleting the row, and then change the value and reinsert the row so the row would then be inserted into the proper location.
That causes a lot of work and requires specific code-writing and whatnot to manage those specific type of situations. And of course, if there is a lot of data, now, you're moving those bulk data in twice.
13:45
Lois: Yeah… you’re moving it to one location and then moving it back in. That’s a lot of double work, not to mention that it all needs to be managed manually, right? So, how has this process been improved?
Ron: So now, basically, you can just go ahead and update the value of the partition key, and then data will then automatically move to the new location. So this gives you complete flexibility of the shard key values.
This is also completely transparent, and again, completely managed behind the scenes. All you do is identify what is going to be changed. Then the database will maintain the actual data location and movement behind the scenes.
14:31
Lois: And what are some of the specific benefits of this feature?
Ron: Basically, it allows you to now be flexible, be able to update the shard key without having to worry about, oh, which location does this value have to exist? Do I have to delete it, reinsert it? And all of those different operations.
And this is done automatically by Oracle database, but it does require for you to enable row movement at the table level. So for tables that are expected to have partition key updates kind of without knowing when that happens, can happen, any time it happens by the clients directly or something, then we may need to enable row movement at the table level and leave it enabled. It does have tiny bit of overhead of maintaining these row locations behind the scenes when enabled, as it maintains some metadata behind the scenes.
But for cases that, let's say I know when the shard key is going to be changed, and we can use, let's say, a written procedure or something for that when the particular shard key is going to be changed. Then when the shard key is updated, the data will then automatically move to the new location based on that shard key operation. So we don't need to move the data manually in and out or to different locations.
16:03
Nikita: In our final segment, I want to bring up the update on splitting and moving a partition set, or basically subpartitioning tables and then being able to move all of the data associated with that in a bulk data move to a new location. Ron, can you explain how this process works?
Ron: This gives us a lot of flexibility for data management based on future requirements, size of the data, key changes, or key management requirements.
So generally when we use a composite sharding, remember, this is a combination of user-defined partitioning plus the system partitioning put together. That kind of defines a little bit more control over how the shards are, where the data is distributed evenly across the shards.
So sometimes based on this type of configuration, we may actually need to split partition and that can cause the shard key values to be now assigned to a new shard space based on the partitioning reconfiguration. So data, this needs to be automatically managed. So when you go ahead and split partition or partitionsets, then the data based on your configuration, based on your identification can automatically move to the new location automatically between those shard spaces.
17:32
Lois: What are some of the key advantages of this for clients?
Ron: This provides a huge benefit to clients because it allows them flexibility of better managing their configuration, expanding both configuration servers, the structures for better management of the data and the load. Data is completely online during all of this data move. Since this is being done behind the scenes by the database, it does not impact the availability of the data for anyone who is actually using the data.
And then, data is generally moved using transportable tablespaces in big bulk and big chunks. So it's almost like copying portions of the files. If you remember in Oracle database, we could take a backup of big files as image copy in pieces. This is kind of similar where chunks of data can then be moved and then transported if possible depending on the organization of the data itself for those particular partitions.
18:48
Lois: So, what does it look like in practice?
Ron: Well, clients now can go ahead and rearrange their data structure based on the adjustments of the partitioning that already exists within the sharded database. The bulk data move then automatically triggers once the customer execute the statement to go ahead and restructure the partitioning. And then all of the client, they're still accessing data. All of the data operation are completely maintained behind the scene.
19:28
Nikita: Thank you for joining us today, Ron. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham…
Lois: And Lois Houston signing off!
19:51
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast. -
In this two-part episode, hosts Lois Houston and Nikita Abraham are joined by Ron Soltani, a Senior Principal Database & Security Instructor, to discuss the ins and outs of database sharding. In Part 1, they delve into the fundamentals of database sharding, including what it is and how it works, specifically looking at Oracle Database Sharding and its benefits. They also explore the architecture of a sharded database, examining components such as shards, shard catalogs, and shard directors. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! The last two weeks of the podcast have been dedicated to all things database security. We discussed why it’s so important and looked at all the new features related to database security that have been released in Oracle Database 23ai, previously known as 23c.
00:55
Nikita: Today’s episode is also going to be the first of two parts, and we’re going to explore database sharding with Ron Soltani. Ron is a Senior Principal Database & Security Instructor with Oracle University. We’ll ask Ron about what database sharding is and then talk specifically about Oracle Database Sharding. We’ll look at the benefits of it and also discuss the architecture.
Lois: All this will help us to prepare for next week’s episode when we dive into each 23ai new feature related to Oracle Database Sharding. So, let’s get to it. Hi Ron! What’s database sharding?01:32
Ron: This is basically an architecture to allow you to divide data for better computing and scaling across multiple environments instead of having a single system performing the work. So this allows you to do hyperscale computing and other different technologies that are included that will allow you to distribute your queries and all other requests across these multiple components to be able to get a very fast response.
Now many times with this distributed segment across each kind of database that is called a shard allow you to have some geographical location component while you are not really sharing any of the servers or the components. So it allows you separation and data management for each of the shards separately. However, when it comes to the application, the sharded database is totally invisible. So as far as the application is concerned, they connect to a global service, submit their statements. Everything else is managed then by the sharded database underneath.With sharded tables, basically it gets distributed across each shard. Normally, this is done through horizontal partitioning. And then the data depending on the partitioning scheme will be distributed across like server A, server B, server C, which are independent servers that are running independent databases.
03:18
Nikita: And what about Oracle Database Sharding specifically?
Ron: The Oracle Database Sharding allows you to automate how the data is distributed, replicated, and maintain the kind of a directory that defines the complete sharding scheme, while everything is distributed across many servers with no sharing whether the hardware or software. It allows you to have a very good scaling to be able to scale based on this partitioning across all of these independent servers.
And based on the subset and the discrete data configuration, you can go ahead and distribute this data across these components where each shard is an independent data location or data component, a subset of data that can be used, whether individually on its own or globally across all of the shards together. And as we said to the application, the Oracle Database Sharding also looks as a single component.
04:35
Lois: Ron, what are some of the benefits of Oracle Database Sharding?
Ron: With Oracle Database, you basically have linear scaling capability across as many shards as you like. And all of the different database configurations are supported with this. So you can have rack databases across the shards, Oracle Data Guard, GoldenGate. So all of the different components are still used to give you all of the high availability and every other kind of functionality that we generally used to having a single database with.
It provides you with fault toleration. So each component could be down. It could have its own replicated data. It doesn't affect other location and availability of the data in those other locations.
And finally, depending on data sovereignty and configuration, you could actually distribute data geographically across the different locations based on requirements and also data access to provide a higher speed for local data management.
05:46
Lois: I’d like to understand more about the architecture of Oracle Database Sharding. Ron, can you first give us a broad overview of how Oracle Database Sharding is structured?
Ron: When it comes to dealing with Oracle Database architecture, the components include, first, your shards. The shards-- each one is an independent Oracle Database depending on the partitioning you decide on a partition key and then how the actual data is divided across those shards.
06:18
Nikita: So, these shards are like separate pieces of the database puzzle…Ok. What’s next in the architecture?
Ron: Then you have shard catalog. Shard catalog is a catalog of your sharding configuration, is aware of all of the components in the shard, and any kind of replicated object that master object exists in the shard catalog to be maintained from there.
And it also manages the global queries acting as a proxy. So queries can be distributed across multiple shards. The data from the shards returned back to the catalog to group together and then sent back to the client.Now, this shard catalog is basically another version of an Oracle Database that is created independently of the shards that include the actual data, and its job is to maintain this catalog functionality.
07:19
Nikita: Got it. And what about the shard director?
Ron: The shard director is like another form of a global service manager.So it understands the sharding by being able to access the catalog, knows where everything exists. The client connection pool will hit the shard director. In general, communication and then whether it's being distributed to the shard catalog to be able to proxy it, or, if the key is available, then the director can send the query directly to the shard based on the key where the data exists. So the shard can then respond to the client directly. So all of the connection pool and the components for global administration, generally managed by the shard director.
08:11
Nikita: Can we dive into each of these components in a little more detail? Let’s go backwards and start with the shard director.
Ron: The shard director, as we said, this is like a global service manager. It acts as a regional listener where all of the connection requests will be coming to the shard director and then distributed from that depending on the type of connection that is being used.
Now the director understands the topology--maintains the complete understanding of the mapping of the data against the shards. And based on the shard key, if the request are specified on the specific key, it can then route the connection request directly to the shard that is appropriate where the data resides for the direct response.
09:03
Lois: And what can you tell us about the shard catalog?
Ron: The shard catalog, this is another Oracle Database that is created for special purpose of holding the topology of the sharded database. And have all of the centralized information metadata about your sharded database. It also act as a proxy.
So, if a client request comes in without providing a shard key, then the request would go to the catalog. It can be distributed to all of the shards. So the shards that you actually have the data can respond, but the data can then be combined and sent back to the client. So, it also creates the master copy of all the duplicate tables that are created in the shard database.09:56
Lois: Ok. I’ve got it. Now, let’s talk more about the shards themselves.
Ron: Each shard is basically a database. And data is horizontally partitioned to be placed on each of these shards. So, this physical database is called the shard. And depending on the topology of your sharding, there could be user sharding, for example, where multiple keys are in a single shard or could be a system sharding that based on the hash value data is distributed whether singly or multiple data components across each shard.
Now, this is completely transparent to the application. So, as far as application is concerned, this is a single database and the response everything that they do is generally just operating as a single database interaction.However, when it comes to the administrators, each shard is a separate database. Each shard can be managed independently and can have its own standby and other components that is then set up for high availability and management of the data operations.
11:21
Do you have an idea for a new course or learning opportunity? We’d love to hear it! Visit the Oracle University Learning Community and share your thoughts with us on the Idea Incubator.
Your suggestion could find a place in future development projects! Visit mylearn.oracle.com to
get started.11:41
Nikita: Welcome back! Let’s move on to global services and the various sharding methods. Ron, can you explain what global services are and how they function in a sharded database?
Ron: Global services is generally the service that is used for the application to be able to connect to the sharded database. This is provided and supported through the shard director. So clients are routed using this global service.12:11
Lois: What are the different sharding methods that are available?
Ron: When it comes to sharding methods that were available, originally we started with the system sharding, which is a hash partition, basically data is distributed evenly across the shards. Then we needed to allow for the user-defined sharding because sometimes it's not about just distributing the data evenly, it's also about controlling where the data goes to be able to control individual query execution based on the keys. And even for data sovereignty and position of the data itself.
And then a composite sharding, which provides you kind of a combination of the user-defined sharding and the system hash sharding that gives you a little bit of a combination of the two to better distribute your data across the shard.
And finally, sub-partitioning all types of sub-partitionings are supported to provide a better structure of the data depending on the application schema design.
13:16
Nikita: Ron, how do clients typically connect to a sharded database?
Ron: When it comes to the client connections, all the client connections are generally routed to the director and then managed from there. So there are multiple ways that clients can connect. One could be a direct connect. With a direct connect, they're providing the shard key in the request. Therefore, the director knowing the topology can route the client directly to the shard that has the data.
The proxy routing is done by the catalog. This is when generally a shard key is not provided or data is requested from many shards. So data will then request is then sent to the catalog. The catalog database will then distribute the query to the shards, collects the results, and then combine sending it back to the client acting as a proxy sitting in the middle.
And the middle tier routing, this is when you can expose the middle tier to the structure of your sharding. So when the middle tier send the request, the request identifies which shard the data is going to. So take advantage of that from the middle tier. So the data is then routed properly. But that requires exposing the structures and everything in the middle tier.
14:40
Lois: Let’s dive a bit deeper into direct routing. What are the advantages of using this method?
Ron: With the client request routing, as we talked about the direct routing, this allows the applications to get very quick data access when they know the key that is used for the distribution of the data. And that is used to access the data from the shard.
This provides you a direct connection to a shard from the shard director. And once the connection is established, then the queries can get data directly across the shard with the key that is supplied. So the RAC respond for that particular subset of data with the data request.Now with the direct routing again, you get some advantages. The advantage is you have much better performance for capturing subset of the data because you don't have to wait for every shard to respond for a particular query.
If you want to distribute data geographically or based on the specific key, of course, all of that is perfectly supported. And kind of allows you to now distribute your query to actually the location where the data exists.
So for example, data that is in Canada can then be locally accessed in Canada through this direct access. And of course, when it comes to management of your client connection, load balancing of those connections. And of course, supporting all types of queries and application requests.
16:18
Nikita: And what about routing by proxy?
Ron: The proxy routing is when queries do not supply the actual sharding key, where identifies which shard the data reside. Or the actual routing cannot be properly identified. Then the shard director will send the request to the catalog performing the work as the proxy.
So proxy will then send a request to all of the shards. If any shards can be eliminated, would be. But generally all of the shards that could have any portion of the data will then get the request. The requests are then sent back to the proxy. And then the proxy will then coordinate the data going back and forth between the client.
And the shard catalog basically hands this type of data access to the catalog to act as the proxy. And then the catalog is-- the shard director is no longer part of the connection management since everything is then handled by the shard catalog itself.
17:37
Lois: Can you explain middle tier routing, Ron?
Ron: This generally allows you to use the middle tier to define which shard your data is being routed into. This is a type of routing that can be used where the data geographically have some sovereignty or the application is aware of the structure.
So the middle tier is exposed to the sharded database topology. So understand exactly what these components are based on the specific request on the shard key, then the middle tier can then route the application to the appropriate location for the connection.And then the middle tier, and then the either one shard or the subset of shard will maintain those connections for the data access going back and forth since the topology is now being managed by the middle tier. Of course, all of the work that is done here still is known in the catalog, will be registered in the catalog. So catalog is fully aware of any operations that are going on, whether connection is done through middle tier or through direct routing.
18:54
Nikita: Ron, can you tell us how query execution and DDL operations work in a sharded database?
Ron: When it comes to the query execution of the application, there are no changes, no requirement for identifying specifically how the data is distributed. All of that is maintained behind the scene based on your sharding topology.
For the DDL, most of your tables, most of the structures work exactly the same way as it did before. There are some general structures that are associated to the sharded database that we will originally create and set up with mapping. Once the mappings are configured, then the rest of the components are created just like a regular database.
19:43
Lois: Ok. What about the deployment process? Is it complicated to set up a sharded database?
Ron: The deployment for the sharded database is fully automated using Terraform, Kubernetes, and scripts that are put together. Basically what you do is you provide some of your configuration information, structure of your topology through an input file, like a parameter file type of a thing.
And then you execute the scripts and then it will build everything else based on the structure that you have provided.
20:19
Nikita: What if someone wants to migrate from a non-sharded database to a sharded database? Is there support for that?
Ron: If you are going to migrate from a regular database to a sharded database, there are two components that are fully shard aware.
First, you have the Shard Advisor. This can look at your current structure, the schema, how the data is distributed. And the workload and how the data is used to give you recommendation in what type of sharding would work best based on the workload.
And then Data Pump is fully aware of the sharding component. Normally, we use Data Pump and load into each of the databases individually on its own. So instead of one job having to read all the data and move data across many shards, data can be loaded individually across each shard using Data Pump for much faster operations.
21:18
Lois: Ron, thank you for joining us today. Now that we’ve had a good understanding of Oracle Database Sharding, we’ll talk about the new 23ai features related to this topic next week.
Nikita: And if you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Until next week, this is Nikita Abraham…
Lois: And Lois Houston signing off!
21:45
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, hosts Lois Houston and Nikita Abraham continue their exploration of Oracle Database 23ai's database security capabilities. They are joined once again by Ron Soltani, a Senior Principal Database & Security Instructor, who delves into the intricacies of the new hybrid read-only mode for pluggable databases, the flexibility of read-only users and sessions, and the newly introduced developer role. They also discuss simplified schema-level privileges and the integration of Azure Active Directory with Oracle Database. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! In our last episode, we discussed database security, why it is so important, and all its different components. Today, we’re going to be continuing that conversation by looking at all the new features related to database security that have been released in Oracle Database 23ai, previously known as 23c.
00:59
Lois: And we’re so happy to have Ron Soltani back as our guide. Ron is a Senior Principal Database & Security Instructor with Oracle University. Hi Ron! Thanks for joining us again! We have a list of the new features related to database security and we’d like to ask you about them one by one, starting with the new mode for pluggable databases. What’s that about?
01:21
Ron: With the hybrid read-only mode for pluggable database, the database could be in the read/write mode or read-only mode, depending on the user that is actually connected. So one of the things we have to realize is the regular read-only mode has one major issue. The major issue is everything, including data dictionary, including SysAux and all of the other elements are also locked up read-only.
So we cannot do any database maintenance. We cannot collect statistics to monitor anything. So you pretty much have to hard tune everything for the load you want and maintain everything. And this happens in many warehouse environments, in environments where the data itself is generally loaded. And then just heavily read. So it requires to be in a read-only mode to protect it.
So with a hybrid read-only mode, if you are a local user in the PDB, even a PDB administrator-- so I can create a local user in the PDB as a PDB administrator. And grant that PDB administrator even sysdba privilege. But once the PDB is open hybrid read-only mode, even for that user, the PDB is read-only. However, if a common user connect, who is, as you know, is a CDB user. Generally, CDB-level privileges granted and considered CDB administrators.
If they connect to the PDB, then the PDB is actually in read/write mode. So now, they can take snapshots. They can use all of the database tools to monitor how things are going. They can perform maintenance. So this allows us to be able to perform patching, maintenance, and other database-related operation.
03:17
Nikita: So you don’t have to flip back and forth between read-only, read/write, read-only, read/write…
Ron: Because you know if we have database read/write to go to read-only, generally, we would have to shut down the database, then go to read-only. Then from read-only, we can go to read/write. But then going back to read-only, we have to shut down again.
Lois: Which was the issue with the normal read-only on the pluggable database, right? I’m glad that’s been made easier. Ok… Moving on to the next new feature, which is read-only users and sessions. What can you tell us about this one, Ron?
03:51
Ron: As we previously discussed, you can put the PDB in the hybrid read-only mode. But then now the PDB is read-only for all local application users. However, let's say we have an environment where you have multiple application users.
One needs to be able to perform maintenance and perform updates where other sessions who are just reading the data to protect against all security element, and then better performance and operation management. We are going to set up read-only.So setting up read-only at the pluggable database, that can be very high level depending on the application need. So with the read-only users and session, this will give you capability of setting read-only either for a particular user.
So when the user connects, all the user can do is read-only process. We do a lot of testing, for example. And we have users that may have read/write privilege in the test environment, then we want to go ahead and perform other operation.
So we would have to take privileges away, set the read-only, then go back and change again to read/write. So performing all of those different type of tests and even with the development has always been an issue.
So having granular capability of managing at a user or a session level can give us a major benefit of better granularly managing all application needs without sacrificing either security or having extra components that would have to be done by administrators.
05:33
Nikita: Yeah, this gives you a lot of flexibility and you don’t have to keep temporarily changing privileges or configuring specific types of sessions. It’s also an easy way to control user behavior, right?
Ron: An application, as we said, have the schema owner that today we want to have a schema-only user for the schema owner. That is usually nobody connect us.
But then we have multiple schema users that one may be used for performing updates, one is used for administration, and one can be used for read-only. So this can give me a mechanism to manage that, or if a particular operation needs to run and for security purposes, that particular session needs to be set to read-only. So that gives us major control over it.
And in the cloud environment, this can be a very, very good component for better managing all of the security levels, where you can enable very fine-grained control while supporting all functionality of the application.
06:39
Lois: Ok. So, can you tell us about this new developer role in the database?
Ron: If we think about application administration, usually we create a schema owner. And we start by giving that the schema owner privileges-- grant them a resource role. By having resource role, they can create simple objects. But when you design an application, you need to implement it, test that, and then deploy it.
Today, there are many, many complex objects that can be used at the application level to manage the application. So today, we grant the resource role to the schema owner. Then we wait until they complain. They don't have privilege for certain object they want to create. Then we're going to have to grant them privileges as needed, and that used to be the way the security had worked.
But today since we have a schema only account where we can only enable the account when we want to do any type of schema work, and then it's locked up so the schema is protected, giving the schema owner the application role, the DB application role, now that has all the privileges in it, should not cause any security issue when managed properly, and will provide them with all of the privileges that they need to perform their work, including there are many complex schema structure like analytical views, hierarchies, dimensions, data-specific types that you can create.
And many of these type of privileges are not just assigned through a regular privilege assignment. Some of them are assigned through procedures.
08:21
Lois: And could you give us some examples of how this feature could be used?
Ron: So there are many different ways of granting all of these granule privileges. So at the time that we go ahead and perform development of the schema and all of that depending on what's available, we don't know really what privileges do we need.
And as we said, there are many packages that we may be able to use to create complex objects that then gradually have to go ahead and get privileges on executing those packages and to be able to use them. And as we said at the time we actually performed the application, many of these objects, we may not even know we're going to use them until later on becomes evident or it may be a better structure to represent what we want.So having to add and continuously deal with these type of changes can become extremely kind of cumbersome and tedious. It also delays all of the operations, especially now that the application schema owner can be secured. So we can grant this developer role to the schema owner, give the schema owner all privileges that is needed very quickly that they can now manage their schemas and manage all complex objects for that schema operation.
So the role is called db developer role. And just like any other role, you would connect as an administrator, grant db developer role to the schema owner. Now, we don't need to grant the resource role and all other things, because everything here is included in the db developer role.
10:01
The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your
successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit www.mylearn.oracle.com to get started.10:28
Nikita: Welcome back! Ron, how have schema-level privileges been simplified in 23ai?
Ron: To be able to understand this, first we can review the privilege assignment in Oracle Database. First, you can be granted a privilege at an object level, so you can perform certain work on a particular object.
However, let's say I have a user account that I'm going to use an app user who's going to have to read from multiple objects within a particular schema. Now this granting at the object level is too low because I have to go at each object and assign the privileges needed on that particular object to the user.
Or we had our system privilege, for example, grant create any table to a user. The problem with that is now you can create any table within the schema that I want you to work with. But that privilege goes across all the schemas in the database, of course, not the database schemas itself-- those are protected, but across all user schemas.
11:34
Lois: Right. So, you're getting that privilege on other schemas that you may not really need that privilege for...
Ron: So now the gap is kind of met with creating a schema-level privilege that allows you to grant the same any privilege but on all objects of a particular schema and not granted across all the schemas.
So this now allows us to much better be able to manage schemas, have schema user accounts with different level privileges on all the objects that they need to perform the type of work that they need to, without having to granularly assign each one of those privileges as we used to create many different roles with different privileges needed, then try to control the users by granting them those roles. Here, these are much better simplified by going through the schema-level privilege.
12:34
Nikita: Ron, I want to ask you about the new feature on creating audit policies at the column level.
Ron: So if you remember, in the past, we talked about we can create audit policies with the old system where you would identify what to audit. But then you had to manage a whole bunch of parameters and security. And protecting audit even from the administrator were major issues.
In 12, Oracle identified or added the unified audit, which gives you protection on the audit schema. Even administrators cannot access it. You manage it through privileges that are assigned specifically to users who are going to manage the audit.
And it also allow you to audit Oracle operations, tools like Data Pump, like RMAN. So you can create a really secure audit environment monitoring everything in the database using unified audit and then maintain and manage those audits.One of the important aspect of auditing is generating the minimal amount of audits. So this way, audits can be reviewed because if you generate too much audit, it is very hard to automate either using an automated system to review the audits or having users to review those audits.
Furthermore, if we wanted to then audit specific columns and different operation like SELECT, DML, we would have had to use the row-level security and build additional policies to be able to then individually monitor those columns, which not very simple to use and manage. And then the audits are put in different tables. Having to maintain all of those, relate them has always caused major issue overall.
So the benefit of having now this column-level audit added to the normal unified audit policies is that you can go ahead and build now your audits instead of at the table level, only for a particular column. This is going to reduce the false positive results that are generated because if I'm going to put update on a table, not updating any column can generate an audit.
But if I put update on the column salary, then only if the salary is updated, the audit is generated. So that can give me just the audits that are needed without the additional false positive audits that are generally generated.
15:08
Lois: Ron, can you talk to us about the management of authorization for Unified Audit administration, especially when using Database Vault?
Ron: So first as we know for the Unified Audit, you have audit admin privilege and audit viewer privilege.
If you want to be able to create and administer and manage all of the audit information, including the audit purging and time periods and all of that, you have to have audit admin privilege. If you want to be able to read and generate the reports or things like that from the audits that have been created, you have to have audit viewer privilege.
Now we also have Oracle Database Vault. Database Vault kind of uses a row level security, but not on the end user data. It applies this row level security and administration on Oracle data dictionary. And allows you to control when particular object can be used, at what level can they be used?
And give you complete control over how the actual database and the objects are used and become available to other users in the database, including other administrators, even schema owners. So when the Database Vault is then applied and enabled, in the past, we could have managed the Unified Audit, which was kind of very funky to put one of the major security functions outside the main security Administration utility of the database.
So now, the Unified Audit has been incorporated into the Database Vault. So you can now use Database Vault to go ahead and set up the privileges and configuration for the authorizations required for managing Unified Audit. This also controls all the high-level users, including SYS, SYSTEM, and anyone who may have DBA roles or other high-level privileges.
So this allows us to now enable the Database Vault, and then manage the authorizations for the Unified Audit through Database Vault. Therefore, all authorization administration is unified under the same security tool, which is Database Vault.17:28
Nikita: The final new feature to discuss is the integration of Microsoft Azure Active Directory with the Oracle database environment. What can you tell us about it, Ron?
Ron: This has been requested by many of the clients who use other platforms and active directories and then need to access either the Oracle OCI, Oracle Cloud where the databases are running or having Oracle databases even in a local environment.
So wanted to be able to now allow this to happen. So if you remember, originally we had capability of mapping users from the database into Oracle Active Directory. So this way the user's role privileges can be centrally managed and the user does not inherit any privileges in the database. So if the user directly connect to database, has no privileges. Connect properly through Active Directory, everything enabled.Then in Database 18, they created the commonly managed users, the CMUs. Where we could now map a third party Active Directory and then be able to use that into connecting to Oracle database for authentication and user administration. However, many of our clients use Microsoft Azure Active Directory. And they wanted to be able to integrate that particular Active Directory into Oracle environment, especially in the Oracle OCI Database as a service environment.
So to be able to do that, Oracle has multiple components that they have built to allow this to be able to now be configured and used. So the client can use these Active Directory for their user administration centrally.
19:20
Lois: With that, I think we’ve covered all the new features related to database security in 23ai. Thanks so much for taking us through all of them and giving us some context.
Nikita: Yeah, it’s really been so helpful. To learn more about these new features and watch some demonstrations on them, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham…
Lois: And Lois Houston signing off!
19:54
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Join hosts Lois Houston and Nikita Abraham, along with Senior Principal Database & Security Instructor Ron Soltani, as they dive into the critical topic of database security. In the first of a two-part series on database security in Oracle Database 23ai, they discuss the importance of protecting data against external and internal threats, common security risks like phishing and SQL injection, and the principle of least privilege. Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! In case you missed last week’s episode, we’ve begun a new season of the podcast, talking about all the new features in Oracle Database 23ai. We covered blockchain tables and new features, and today’s episode is going to be one of two that will be dedicated to database security.
Nikita: Right, Lois. So, in Part 1, we want to set the scene, so to speak, by looking at an overview of database security so that when we discuss some of the new features, we’ll know exactly where they actually fit into the process. Joining us for these two episodes is Ron Soltani. Ron is a Senior Principal Database & Security Instructor with Oracle University.
01:16
Lois: Hi Ron! Thanks for being with us today. To start off, let's discuss the importance of database security. Why is database security so critical today?
Ron: Security requirements, describes the need for keeping things private and make sure that we protect against threat, against data destruction. We also have, today, data that is global. Therefore, there is consolidation of the data. There is globalization.
There is data sourcing, locational, where the data is actually located, rules opposed by different governments, and guidelines that enforce a certain type of security administration on the data. And finally, there are many different companies or organizations that actually come up with either guidelines or rules that must be followed for security aspect that we must set up and build compliance.
02:24
Nikita: Ron, what are some of the common security risks that databases face?
Ron: Security risk can include external threats that could be unauthorized users trying to use phishing, get privileged user information, and get in as a privileged user to do whatever damage they want. Denial-of-service attack, one of the most common attacks out there where the attackers just create or attack the components, like a listener, for example, in a database, and cause a situation where the listener can no longer establish connection to the database. So now no client can connect to the database to get data, which is that denial-of-service attacks.
Having unauthorized access to the data-- so again, this is generally done through phishing or sometimes even SQL injection. SQL injection also allows you to insert SQL statement in the application where it's not expected, where it can then convert into an executable in the database and then have unwanted data returned for the user.
03:42
Nikita: Sorry, can you explain that?
Ron: For example, when you go to Google, you want to run a search. They expect you to say, meaning of a particular word.
Now, what if I knew the structure of the data organization in Google? And instead of just putting in meaning of whatever word, I actually plug in a SQL statement that then passed along to the Google system to be executed. And then that SQL, if the components and everything exist and within the privileges of what is being executed, could expose some information to me. So that's the idea with being able to perform that type of operation.04:24
Lois: Ok. So, those are external threats. But, could you also have internal threats?
Ron: Internal threat could be abused by someone who is privileged, could be sabotage of the system and the data. It could be data complexity that creates an environment where data is not properly being secured and even accidental damage. It's a security issue.
And then finally, if there is a damage, we do need to be able to perform recovery. So we create backups and data access in those. Therefore, those recovery information must be properly secured. And finally, the omission, being able to block access or cause issues with the data. Then having external threats coming in through the internal abuse, so internal abuse could actually open door to allow external threats to get in.
Now, the final type of security risk could be coming in from partners who have privilege to be able to load or access and get data. For example, I may sell a particular product. But the product description is actually coming from the product distributor.
05:47
Nikita: Yeah, so they have access to push that product information into your system. So, what are the typical points of attack for a database? I’m familiar with phishing.
Ron: People send you emails or do something to be able to get information from the pieces and things that come back. For example, this is one of the reason for many operation. We would return false error messages. Like in Oracle database, if you don't have privilege on the table and you try to select it, we tell you a table or a view does not exist. So this way, you don't know if it's a table, you don't know if it's a view. And as far as you know, it doesn't exist. So the name you have does not correspond to any particular data.
06:32
Nikita: That’s clever!
Ron: If we would tell you don't have privilege, now you know the name of this table exists. So now I just got to find a way of hacking the table. So this is basically phishing means, extracting different pieces of information through different channels, being able to put them together.
Then in database, we have some privilege known accounts that if not protected can be a vulnerable access. The back doors into the database. For example, somebody being able to get to the operating system DBA group, and then connect to the database without user ID and password.
That's why we have to protect every layer. Any debug codes that may be available that could reference how the operation of the system is actually going. Creating cross-scripting between the different data and then operations that goes on. And as we talked about, SQL injection.
07:28
Lois: Can you dive a little deeper into SQL injection, Ron?
Ron: With SQL injection, you kind of have to understand that, in general, SQL injection means somebody, like we said, knows the structure of something, knows the structure of the way the application is operating, and then be able to inject a SQL statement where they would generally put a condition or pass some parameters or some information to the application.
So, then that SQL statement becomes part of that statement and submitted to the database. Now, we need to understand SQL injection is not about the person, is not about generally your overall configuration of the database.The most important aspect of SQL injection is about the session that is actually doing the work. For example, if I am a DBA and I am going to collect statistics for a table. If I connect a SYS DBA to collect that statistics and somebody hacks into my session and inject a SQL drop database. Database gone, because the session has SYS DBA privilege.
But if I have a user that only has create session privilege and execute a script. And in this script, I write the statement to collect statistics and I give that script only the privilege to collect stats. So now I can connect as that user with that minimal privilege, just execute the script.
So now anyone inject any SQL into the session, that will never be executed because the session has no privilege. So this is the important of SQL injection for us to understand that the importance is what happens at the session level.
And many of the security element we will see, like read-only session, hybrid read-only PDB and things like that are related into this type of SQL injection or abuse.
09:30
Lois: Yeah, we are looking forward to talking through those new features in the next episode.
Ron: So the common vulnerabilities can be exploited, and also any of the users that are part of the operations that can be set up into the string and supplied into middle of statements and things like that.
09:56
Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from cloud computing, database, and security, artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit www.mylearn.oracle.com to get started.
10:25
Nikita: Welcome back! One of the concepts I wanted to ask you about, Ron, is the principle of least privilege. Can you explain what it really means?
Ron: Principles of least privilege, again, means the work that needs to be done has to have minimal privilege.
10:40
Nikita: But, we’ve always thought about that, right? Giving a user minimal privileges…
Ron: Well, back in the old days, we used to execute everything as a schema owner.
Therefore, we had privilege on all the data. Then we said, OK, let's create schema users and only give them like a read privilege or this privilege. So they can only do the type of work they need to do, which is fine.
But at the same time, that can be very complex. Now I need a lot of different users and whatnot. So when it comes to principles of least privilege, this is generally about only installing whatever software that is required. Only enable or turn on whatever machines and segments that is going to be used.
Have proper operating system level users and privileges configured for all of the software that is installed at the operating system level. Have proper administrator account that are properly maintained. Set up privilege user account for each operation.
So when we do maintenance and database administration, we are not creating very high-level privileged session. That's why some of the differences privileges was created in database 12 and up, like Sys backup, Sys DG, Sys RAC. So you don't inherit the privilege as Sys DBA to actually do the work. You only have privileges for what you need.
And, of course, limit the user's access to particular object and things that they need to do. However, as I mentioned, this is not just about the user level. This is also about the session level. If I'm going to do maintenance and I'm connecting as a schema owner, somebody inject a SQL drop table, table gone. So that's why it is very important for us to be able to have control over how sessions can also operate within the database.
12:34
Lois: Right, so, what about the strategy of defense in depth?
Ron: Defense in depth. That means we have to strengthen and apply security at every level, whether it being at the securities applied at the operating system in the database, in the application, in the network.
So we have to have policies defining all the different security levels. Most important, train users. So no mistakable damages. Harden every component, including the operating system. Set up proper firewalls.
Set up proper network security, like use of the Oracle firewall that protects against unwanted SQL statements. We can compare SQL statement to a whitelist of acceptable statements. And then other database security features like VPD, the auditing as we will talk about, and other components to give you an overall very secure environment.
13:35
Nikita: Ron, what are the fundamental aspects of managing security only within the database… not including the operating system or the application?
Ron: So first, we have to have confidentiality. Confidentiality means that we need to make sure that all of the data is properly secured at a data level, whether it be both at the storage level, in the database for data usage, and we have many different ways of doing confidentiality management.
Number one, properly creating users, maintaining users with the proper password through proper authentication. And then setting up authorization that privileges may not be enough because if I give you select privilege, you can see every column, every row.
So I may need row level security, data redaction, data masking for duplication, and other mechanism to help us manage even subset of data for that particular security.
14:35
Lois: Ok… so that’s confidentiality. What’s next?
Ron: Data integrity means that we need to make sure that data is not destroyed, whether it being addressed in the database, in memory, in data file, in backup, in exports, or during transmission in the network.
So we usually apply encryption and check-summing not only to protect the data, but also to validate, make sure it's not corrupted.
Next data availability, which means today, especially, we are 24/7 operation. And remember we talked about denial of service attack on a database. That usually attacks on the listener, because if the listener is crashed, nobody can connect.
We have to then utilize available tools and components like RAC to have multiple instances in case a particular host crashes. And I lose a particular instance. Data Guard in case my storage and a whole database crashes. The PDB and real-time PDB management with duplication, having a PDB standbys that are maintained and managed behind the scene. Using PDB snapshots, which are point in time. Preserve data that I can use it for restoring data at those particular point in time.
Backup recovery through RMAN or other backup recovery processes. So in case data is damaged, I can restore it and recover it. And finally, auditing. Auditing historically was always known as after effect.16:09
Nikita: That’s what I was wondering… You only see what’s going on after something happens, right?
Ron: It also can be a deterrent when people know they are being audited, they're more careful, don't make mistakes. Try not to, of course, do anything you get caught. And today, this auditing can also be set up in a way that it cannot only catch what is going on. It can actually help us better secure data and have much better responses.
Now the problem with auditing has always been the overhead. That's why the unified audits that provides us with much less overhead for management can give us an extreme detailed audits. And then the new features allows us to even more reduce the amount of audits that are generated by only auditing at the column level and better protection for those audits.
By the way, in the older days, most auditing was done at the app, because we never knew who the end user the app is. But today with being able to have Active Directory mapped into the database and information passed between the two, all audits can actually come back centrally to the database.17:21
Lois: So, to wrap up today’s conversation, Ron, can you just summarize database security for us? All the things we need to think about.
Ron: So database security starts with making sure, number one, our network is secure and we are accessing the data through a very secure connection coming in from the user.
If required to be, could have a three-tier environment where the clients go through a first external firewall to get to the middle tier. Then from the middle tier go through internal firewall to get to the database, or if this is like a direct access and things like that, setting up secure network coming in through that like for administration, remote administrations, and operations.
Then, setting up proper authentication and authentication management, configuring detail, access control and setting up multiple level of security for data accesses, not just at the table level, even at the row and column level.
And building a complete data confidentiality by not only adding in storage encryption and all of the management of the data, even on components sitting outside the database, of course, we have Oracle components that can manage some of those for you, like sample backups, RMAN, and things like that.
And to get this complete data confidentiality, you also add in, as we said, an efficient auditing that can then describe any issues, tell us where the problem is, how it happened. And if we set up an audit system that is very focused, then we can even tie it up to triggers, to notifications.
So they're very quickly responded to, because the problem with audits has always been there is just way too much of them, therefore nobody ever reviews them to see exactly what has happened. So many vulnerabilities may go on detected until a major damage happens.
And that's how you can know that this is common out there in a lot of businesses when you hear in the news. And so and so company was broken in and so much data was stolen. Well, if proper security were set up and the network is being hacked in and proper alert system and automated system were configured to be able to catch these in a proper auditing real-time, then maybe corrective action could have stopped a lot of those damages.
20:04
Nikita: Thanks for that wonderful overview, Ron. In our next episode, we’re going to go through each of the new security features and try to understand how Oracle is tightening the screws around security.
Lois: And if you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Until next week, this is Lois Houston…
Nikita: And Nikita Abraham signing off!
20:31
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode of the Oracle University Podcast, hosts Lois Houston and Nikita Abraham kick off a new season with a deep dive into the latest features of Oracle Database 23ai. Joined by Bill Millar, a Senior Principal Database & MySQL Instructor, they explore the new enhancements to blockchain tables, such as row versions, user chains, delegate signer, and countersignature. So, if you're curious about harnessing the power of blockchain tables for your database needs, this is the perfect episode for you! Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! Thank you for joining us as we begin a new season of the podcast. For the next few weeks, we’re going to explore all the new features in Oracle Database 23ai, previously known as 23c. These episodes will be great for you if you’re a database administrator, a developer, or even a database architect.
Lois: Right Niki, and while anyone can listen to the podcast, you’re probably going to get the most out of this season if you have prior knowledge or experience with the previous versions of Oracle Database and have used SQL to manage Oracle Databases. Throughout this season, we’ll discuss new features in database availability, architecture, manageability, performance, and security.
01:21
Nikita: Exactly. Today, we're diving into the world of blockchain tables and the new features introduced. First, we'll try to get an overview of blockchain tables that were introduced in 21c. Then, we'll discuss the new features in 23ai, including row versions, user chains, delegate signer, and countersignature.
Lois: So, let’s get started. To take us through all this, we are joined today by Bill Millar. Bill is a Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! Thanks for joining us. To begin, what is a blockchain table?
01:59
Bill: Well, a blockchain table provides the means for recording transactions where only insert operations are allowed. And rows are protected or restricted based on time as defined when the table is created. This makes the rows tamper-resistant with their chaining algorithms.
02:16
Nikita: Bill, take us through some common attributes of a blockchain table.
Bill: They are append only, protects the current data in the table. Made tamper-resistant with their hashing algorithm. And optionally, they can be digitally signed. However, they are mandatory in blockchain platform transactions. Transaction logs, audit trails, compliance information, they can most benefit from using blockchain tables.
02:44
Lois: Bill, let’s talk for a minute about the blockchain tables being tamper-resistant. What makes a blockchain table tamper-proof?
Bill: Well, with the insert only tables, each row is going to be chained to the previous row, except the first row. There's nothing to change it to. So once a row is added, it changes it to the previous row, to the previous row. Rows are linked when the transaction commits. We don't link them beforehand because you might roll back.
03:13
Nikita: Do we have some considerations or guidelines for managing blockchain tables?
Bill: One, they may be partitioned. You can specify retention at a table level, the blockchain table itself. You can use the no drop clause. And you can also define it blockchain tables at the row level when you create that blockchain table. Defining a retention period for the table itself or a retention period for the rows.
03:41
Nikita: And are there any restrictions when using blockchain tables?
Bill: There are several restrictions for the blockchain table. Some of them are… There are some data types that are not supported. The row ID, long, timestamp with time zone, and so forth. And there are other operations not allowed. A few of them are updating rows, merging rows, truncating, dropping them partitions. Converting a regular table to a blockchain table or vice versa.
So you do want to make sure that you understand the restrictions if you decide that you're going to use a blockchain table. There are some things you can alter in a blockchain table. One is you can modify a retention period.
It cannot be reduced. However, you can make it longer.
04:30
Lois: Ok, I think I’ve got it. So, coming to the 23ai features, what’s new with blockchain tables? Could you give us a brief overview of them before we dive into each one?
Bill: So we have the user chain, just a chain of rows based off to three user-defined columns. Previously, the system defined the chain.
The row versions…it allows me to have multiple historical views of a row that's going to be-- that is maintained with the blockchain table. We have the log history. The flashback data archive history tables are now blockchain tables.
And there's also a countersignature. So you can request the time of signing a row that it has a signature for that. That signature metadata is going to be stored within the row, within some hidden columns. And then you can also have a delegate signer. It's an alternate to the user who is allowed to sign rows inserted by that primary user.
05:31
Nikita: What are some advantages of using blockchain tables?
Bill: There are benefits of using the blockchain tables in transparent from fraud protection and users don't know as they're inserting the data. You can detect it by verifying the rows in the blockchain table. They are not part of the database itself. It can be more secure when you're validating them. And it is easier than distributed blockchains where multiple blockchains with identical data is being maintained across multiple different platforms.
06:03
Lois: And what about benefits specifically from the 23ai new features?
Bill: We have allowed increased flexibility. Just the user-defined itself, instead of having it just rely on the system-defined. It can guarantee row versioning. The blockchain log history to record and protect the changes. The counter signature, along with the digital signature, can help protect it even more.
So you must specify a version. There is no default version, so you must specify whether either it's going to be version 1 or version 2 and create the table. Version 1 is the version from 21c. You have to specify version 2 if you're going to take advantage of some of the new features in 23c.
And with these two different versions, it does reduce the number of columns that you are going to have accessible. Version 1 uses 20 additional columns to maintain that blockchain information, whereas a version 2 blockchain table is going to use 40 additional columns. So that reduces the number of columns that you can use by 40.
Even though version 2 does use more columns for the hidden information, it does have its benefit. It does allow you to add, drop columns. You can drop partitions with version 2. You have distributed transactions. And you can also use with replication, such as Oracle Golden Gate and Active Data Guard.
07:32
Nikita: Are there restrictions when it comes to using blockchain tables?
Bill: Again, make sure that you understand the requirements of your tables when determining if blockchain table is going to be appropriate for your application or not.
XMLTypes are not supported. Can't truncate. Doesn't work with sharded tables. Can't work with different policies such as the automatic data optimization, virtual private database, label security. Cannot use the DBMS_REDEFINITION package on a blockchain table.08:10
Are you planning to become an Oracle Certified Professional this year? Whether you're a seasoned IT pro or just starting your career, getting certified can give you a significant boost. And don't worry, we've got your back! Join us at one of our cert prep live events in the Oracle University Learning Community. You'll get insider tips from seasoned experts and learn from other professionals' experiences. Plus, once you've earned your certification, you'll become part of our exclusive forum for Oracle-certified users. So, what are you waiting for? Head over to www.mylearn.oracle.com and create an account to jump-start your journey towards certification today!
08:53
Nikita: Welcome back! Let’s get into each of those 23ai new features, Bill. What can you tell us about the row versions feature?
Bill: With the row version option, it allows you to have multiple historic views of a row corresponding to a set of user-defined columns. Previously, only the system would define the columns.
When you create these, it automatically creates a view to allow you to view information about that blockchain table with the row version. The system is going to create the view with the same columns. However, the name of that view, it's going to take whatever that table name that you create and it's going to append the _Las$ onto it for that.And it has not only the same columns of your table, but it also has additional columns in there. One of them be that last row version. This is going to allow you to see, what is the latest version of that row? In order to use the row versions, you must specify with the row version clause when you create the table.
It is also supported with or without primary key. The primary key column must not be identical to the set of the row version column. There are some restrictions, though. So you must specify-- you must specify a row version name with it. And remember, three columns is the maximum. You don't have to have three. You can have one, two, or three. And then the fields that are restricted to the types-- number, char, varchar, and raw.And it cannot be used with version 1 blockchain tables, meaning blockchain tables came out in 21C. So if you have 21C, you cannot create it. It's a 23C feature. That's why that is like that. So you're going to specify with the row version. And then you're going to give it that row version name because that is required. And then up to three different columns that you want to use.
10:58
Lois: What about user chains? How do they enhance blockchain tables?
Bill: So with the user chains, previously again, only the system chains were available. It randomly selected how to change the tables, what columns to chain it with.
Well now a user chain can be defined by the end user. And set up one, two, or three. Well, how many rows do you want to chain? Have that chain apply to. Again, the column types that we just talked about that are only supported. The number of the char, varchar, and raw. But with the user chains and you being able to identify the columns, it adds that additional flexibility to allow you to have this tamper-resistant table to be used by your applications.
So to create that blockchain table, user chain is defined when you create the table. So you're going to define when you create the table what is going to be that chain for that. When you do create that, any rows that have the same change values will be grouped together.
For example, let's say a banking application. I have an account. I make deposits. I make withdrawals. I do balance inquiries because that's all based off of that same field, that account, it'll group those together within the chain.
It does apply the hashing value to the columns that are stored within that chain.
12:27
Lois: Bill, can you explain the blockchain table delegate signer feature?
Bill: What it is, optionally, a signature that can be applied to provide additional security against tampering for that. However, if you do use it, it does require a digital certificate when adding a signature to a row.
Signatures are validated using that digital certificate and any signature algorithm for that. The delegate is an alternate. And it can be used instead of addition to just a user signature. So when I am the user, I create a row, it adds my signature, I can add my certificate to it or now I can have a delegate to do that for me.
So it can be digitally signed by the delegate. It can be signed by the delegate instead of the user itself. So that way, it's verified. Yes, that is good. Well, maybe users are not able to sign the rows they created, but they trust the delegate.
13:32
Nikita: And the last new feature to discuss is a blockchain table countersignature.
Bill: A countersignature is going to provide additional guarantees that, hey, this data has been securely stored within our table itself. You can request a countersignature. It is requested at the time of signing a row. So what it's going to do is it's going to record that signature metadata in that row and the counting signature in the signed bytes that can be returned to the caller to verify, yes, that I might want to retrieve that information to use in another source for that.
So we can use that. As we said here, that candidate signature and the sign bytes, we might put it in another data store, might put it in our Oracle blockchain platform. For this non-repudiation purposes, basically what that means is that, hey, it's proof of the origin, the authenticity of it, the integrity of that data. Well, I want to pass that information to something else, another application or source or whatever. So yes, this is trusted information for that. So it gives that additional security. So it assures that the sender that their message was delivered plus gives proof of that sender's identity.
Countersignatures are saved in the blockchain table, that happens to be a blockchain table itself. The countersignature is computed using the bytes, using that hashing algorithm. It's going to include that end user signature, the delegate, or both. Remember, the end user can sign, a delicate can, or it can use both of that information for that. Even though we do save that information in the blockchain table, we recommend if you're going to use this, you might want to store that information outside of the database for those non-repudiation purposes.
15:37
Lois: Thank you so much, Bill, for taking us though all these updates. We look forward to having you back soon to talk us through some more of these new features.
Nikita: To learn more about blockchain tables, visit mylearn.oracle.com and search for the Oracle Database 23ai: New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Nikita Abraham…
Lois: And Lois Houston signing off!
16:06
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Join hosts Lois Houston and Nikita Abraham, along with Hope Fisher, Oracle’s Product Manager for Database Technologies, as they break down the basics of databases, explore different database management systems, and delve into database development. Whether you're a newcomer or just need a refresher, this quick, informative episode is sure to offer you some valuable insights. Oracle MyLearn: https://mylearn.oracle.com/ou/course/database-essentials/133032/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!00:26
Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.Lois: Hi there! For the last seven weeks, we’ve been exploring the world of OCI Container Engine for Kubernetes with our senior instructor Mahendra Mehra. We covered key aspects of OKE to help you create, manage, and optimize Kubernetes clusters in Oracle Cloud Infrastructure. So, be sure you check out those episodes if you’re interested in Kubernetes.
01:00
Nikita: Today, we’re doing something a little different. We’ve had a lot of episodes on different aspects of Oracle Database, but what if you’re just getting started in this world? We wanted you to have something that you could listen to as well. And so we have Hope Fisher with us today. Hope is a Product Manager for Database Technologies at Oracle, and we’re going to ask her to take us through the basics of database, the different database management systems, and database development.
Lois: Hi Hope! Thanks for joining us for this episode. Before we dive straight into terminologies and concepts, I want to take a step back and really get down to the basics. We sometimes use the terms data and information interchangeably, but they’re not the same, right?
01:43
Hope: Data is raw material or a set of facts and observations. Information is the meaning derived from the facts. The difference between data and information can be explained by using an example, such as test scores. In one class, if every student receives a numbered score and the scores can be calculated to determine a class average, the class average can be calculated to determine the school average. So in this scenario, each student's test score is one piece of data. And information is the class’s average score or the school's average score. There is no value in data until you actually do something with it.
02:24
Nikita: Right, so then how do we make all this data useful? Do we create a database system?
Hope: A database system provides a simple function—treat data as a collection of information, organize it, and make the data usable by providing easy access to it and giving you a place where that data can be stored. Every organization needs to collect and maintain data to meet its requirements. Most organizations today use a database to automate their information systems. An information system can be defined as a formal system for storing and processing data.
A database is an organized collection of data put together as a unit. The rationale of a database is to collect, store, and retrieve related data for use by database applications. A database application is a software program that interacts with the database to access and manipulate data. A database is usually managed by a Database Administrator, also known as a DBA.03:25
Nikita: Hope, give us some examples of database systems.
Hope: Popular examples of database systems include Oracle Database, MySQL, which is also owned by Oracle, Microsoft SQL server, Postgres, and others. There are relational database management systems. The acronym is DBMS. Some of the strengths of a DBMS include flexibility and scalability. Given the huge amounts of information that modern businesses need to handle, these are important factors to consider when surveying different types of databases.
03:59
Lois: This may seem a little bit silly, but why not just use spreadsheets, Hope? Why use databases?
Hope: The easy answer is that spreadsheets are designed for specific problems, relatively small amounts of data and individual users. Databases are designed for lots of data, shared information use, and complex data analysis. Spreadsheets are typically used for specific problems or small amounts of data. Individual users generally use spreadsheets. In a database, cells contain records that come from external tables. Databases are designed for lots of data. They are intended to be shared and used for more complex data analysis. They need to be scalable, secure, and available to many users. This differentiation means that spreadsheets are static documents, while databases can be relational.
04:51
Nikita: Hope, what are some common database applications?
Hope: Database applications are used in far and wide use cases that most commonly can be grouped into three areas.
Applications that run companies called enterprise applications. Enterprise applications are designed to integrate computer systems that run all phases of an enterprise's operations to facilitate cooperation and coordination of work across the enterprise. The intent is to integrate core business processes, like sales, accounting, finance, human resources, inventory, and manufacturing.Applications that do something very specific, like healthcare applications-- specialized software is software that's written for a specific task rather than for a broad application area.
And then there are also applications that are used to examine data and turn it into information, like a data warehouse, analytics, and data lake.05:54
Lois: We’ve spoken about data lakes before. But since this is an episode about the basics of database, can you briefly tell us what a data lake is?
Hope: A data lake is a place to store your structured and unstructured data as well as a method for organizing large volumes of highly diverse data from diverse sources. Data lakes are becoming increasingly important as people, especially in businesses and technology, want to perform broad data exploration and discovery. Bringing data together into a single place or most of it into a single place makes that simpler.
06:29
Nikita: Thanks for that, Hope. So, what kind of organizations use databases? And, who within these organizations uses databases the most?
Hope: Almost every enterprise uses databases. Enterprises use databases for a variety of reasons and in a variety of ways. Data and databases are part of almost any process of the enterprise. Data is being collected to help solve business needs and drive value.
Many people in an organization work with databases. These include the application developers who create applications that support and drive the business. The database administrator or DBA maintains and updates the database. And the end user uses the data as needed.
07:19
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free. So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
07:57
Nikita: Welcome back. Now that we’ve discussed foundational database concepts, I want to move on to database management systems. Take us through what a database management system is, Hope.
Hope: A Database Management System, DBMS, has the following elements. The kernel code manages memory and storage for the DBMS. The repository of metadata is called a data dictionary. The query language enables applications to access the data.
Oracle database functions include data definitions, storage, structure, and security. Additional functionality also provides for user access control, backup and recovery, integrity, and communications. There are many different database types and management systems. The most common is the relational database management system.
08:51
Nikita: And how do relational databases store data?
Hope: Essentially and very simplistically, there are key elements of the relational database. Database table containing rows and columns; the data in the table, which is stored a row at a time; and the columns which contain attributes or related information. And then the different tables in a database relate to one another and share a column.
09:17
Lois: Customers usually have a mix of applications and data structures, and ideally, they should be able to implement a data management strategy that effectively uses all of their data in applications, right? How does Oracle approach this?
Hope: Oracle's approach to this enterprise data management strategy and architecture is converged database to all different data types and workloads.
The converged database is a database that has native support for all modern data types and, of course, traditional relational data.
By providing support for all of these data types, a converged database can run all sorts of workloads, from transaction processing to analytics and machine learning to blockchain to support the applications and systems.
Oracle provides a single database engine that supports all data models, process types, and development environments. It also addresses many kinds of workloads against the same data sets. And there's no need to use dozens of specialized databases. Deploying several single-purpose databases would increase costs, complexity, and risk.
10:25
Nikita: In the final part of our conversation today, I want to bring up database development. Hope, how are databases developed?
Hope: Data modeling is the first part of the database development process. Conceptual data modeling is the examination of a business and business data to determine the structure of business information and the rules that govern it. This structure forms the basis for database design. A conceptual model is relatively stable over long periods of time.
Physical data modeling, or database building, is concerned with implementation in each technical software and hardware environment. The physical implementation is highly dependent on the current state of technology and is subject to change as available technologies rapidly change.Conceptual model captures the functional and informational needs of a business and is used to identify important entities and their relationships.
A logical model includes the entities and relationships. This is also called an entity relationship model and provides the details of the relationships.
11:34
Lois: I think that’s a good place to wrap up our episode. To know more about the Oracle Database architecture, offerings, and so on, visit mylearn.oracle.com. Thanks for joining us today, Hope.
Nikita: Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham…
Lois: And Lois Houston, signing off!
11:55
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In the season's final episode, hosts Lois Houston and Nikita Abraham interview senior OCI instructor Mahendra Mehra about the security practices that are vital for OKE clusters on OCI. Mahendra shares his expert insights on the importance of Kubernetes security, especially in today's digital landscape where the integrity of data and applications is paramount. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! In our last episode, we spoke about self-managed nodes and how you can manage Kubernetes deployments.
Nikita: Today is the final episode of this series on OCI Container Engine for Kubernetes. We’re going to look at the security side of things and discuss how you can implement vital security practices for your OKE clusters on OCI, and safeguard your infrastructure and data.
00:59
Lois: That’s right, Niki! We can’t overstate the importance of Kubernetes security, especially in today's digital landscape, where the integrity of your data and applications is paramount. With us today is senior OCI instructor, Mahendra Mehra, who will take us through Kubernetes security and compliance practices. Hi Mahendra! It’s great to have you here. I want to jump right in and ask you, how can users add a service account authentication token to a kubeconfig file?
Mahendra: When you set up the kubeconfig file for a cluster, by default, it contains an Oracle Cloud Infrastructure CLI command to generate a short-lived, cluster-scoped, user-specific authentication token.
The authentication token generated by the CLI command is appropriate to authenticate individual users accessing the cluster using kubectl and the Kubernetes Dashboard. However, the generated authentication token is not appropriate to authenticate processes and tools accessing the cluster, such as continuous integration and continuous delivery tools.
To ensure access to the cluster, such tools require long-lived non-user-specific authentication tokens. One solution is to use a Kubernetes service account. Having created a service account, you bind it to a cluster role binding that has cluster administration permissions.You can create an authentication token for this service account, which is stored as a Kubernetes secret. You can then add the service account as a user definition in the kubeconfig file itself. Other tools can then use this service account authentication token when accessing the cluster.
02:47
Nikita: So, as I understand it, adding a service account authentication token to a kubeconfig file enhances security and enables automated tools to interact seamlessly with your Kubernetes cluster. So, let’s talk about the permissions users need to access clusters they have created using Container Engine for Kubernetes.
Mahendra: For most operations on Container Engine for Kubernetes clusters, IAM leverages the concept of groups. A user's permissions are determined by the IAM groups they belong to, including dynamic groups. The access rights for these groups are defined by policies.
IAM provides granular control over various cluster operations, such as the ability to create or delete clusters, add, remove, or modify node pool, and dictate the Kubernetes object create, delete, view operations a user can perform. All these controls are specified at the group and policy levels.
In addition to IAM, the Kubernetes role-based access control authorizer can enforce additional fine-grained access control for users on specific clusters via Kubernetes RBAC roles and ClusterRoles.
04:03
Nikita: What are Kubernetes RBAC roles and ClusterRoles, Mahendra?
Mahendra: Roles here defines permissions for resources within a specific namespace and ClusterRole is a global object that will provide access to global objects as well as non-resource URLs, such as API version and health endpoints on the API server.
Kubernetes RBAC also includes RoleBindings and ClusterRoleBindings. RoleBinding grants permission to subjects, which can be a user, service, or group interacting with the Kubernetes API. It specified an allowed operation for a given subject in the cluster.RoleBinding is always created in a specific namespace. When associated with a role, it provides users permission specified within that role related to the objects within that namespace. When associated with a ClusterRole, it provides access to namespaced objects only defined within that cluster rule and related to the roles namespace.
ClusterRoleBinding, on the other hand, is a global object. It associates cluster roles with users, groups, and service accounts. But it cannot be associated with a namespaced role. ClusterRoleBinding is used to provide access to global objects, non-namespaced objects, or to namespaced objects in all namespaces.
05:36
Lois: Mahendra, what’s IAM’s role in this? How do IAM and Kubernetes RBAC work together?
Mahendra: IAM provides broader permissions, while Kubernetes RBAC offers fine-grained control. Users authorized either by IAM or Kubernetes RBAC can perform Kubernetes operations.
When a user attempts to perform any operation on a cluster, except for create role and create cluster role operations, IAM first determines whether a group or dynamic group to which the user belongs has the appropriate and sufficient permissions. If so, the operation succeeds. If the attempted operation also requires additional permissions granted via a Kubernetes RBAC role or cluster role, the Kubernetes RBAC authorizer then determines whether the user or group has been granted the appropriate Kubernetes role or Kubernetes ClusterRoles.06:41
Lois: OK. What kind of permissions do users need to define custom Kubernetes RBAC rules and ClusterRoles?
Mahendra: It's common to define custom Kubernetes RBAC rules and ClusterRoles for precise control. To create these, a user must have existing roles or ClusterRoles with equal or higher privileges. By default, users don't have any RBAC roles assigned. But there are default roles like cluster admin or super user privileges.
07:12
Nikita: I want to ask you about securing and handling sensitive information within Kubernetes clusters, and ensuring a robust security posture. What can you tell us about this?
Mahendra: When creating Kubernetes clusters using OCI Container Engine for Kubernetes, there are two fundamental approaches to store application secrets. We can opt for storing and managing secrets in an external secrets store accessed seamlessly through the Kubernetes Secrets Store CSI driver. Alternatively, we have the option of storing Kubernetes secret objects directly in etcd.
07:53
Lois: OK, let’s tackle them one by one. What can you tell us about the first method, storing secrets in an external secret store?
Mahendra: This integration allows Kubernetes clusters to mount multiple secrets, keys, and certificates into pods as volumes. The Kubernetes Secrets Store CSI driver facilitates seamless integration between our Kubernetes clusters and external secret stores.
With the Secrets Store CSI driver, our Kubernetes clusters can mount and manage multiple secrets, keys, and certificates from external sources. These are accessible as volumes, making it easy to incorporate them into our application containers.
OCI Vault is a notable external secrets store. And Oracle provides the Oracle Secrets Store CSI driver provider to enable Kubernetes clusters to seamlessly access secrets stored in Vault.08:54
Nikita: And what about the second method? How can we store secrets as Kubernetes secret objects in etcd?
Mahendra: In this approach, we store and manage our application secrets using Kubernetes secret objects. These objects are directly managed within etcd, the distributed key value store used for Kubernetes cluster coordination and state management.
In OKE, etcd reads and writes data to and from block storage volumes in OCI block volume service. By default, OCI ensures security of our secrets and etcd data by encrypting it at rest. Oracle handles this encryption automatically, providing a secure environment for our secrets.
Oracle takes responsibility for managing the master encryption key for data at rest, including etcd and Kubernetes secrets. This ensures the integrity and security of our stored secrets. If needed, there are options for users to manage the master encryption key themselves.
10:06
Lois: OK. We understand that managing secrets is a critical aspect of maintaining a secure Kubernetes environment, and one that users should not take lightly. Can we talk about OKE Container Image Security? What essential characteristics should container images possess to fortify the security posture of a user’s applications?
Mahendra: In the dynamic landscape of containerized applications, ensuring the security of containerized images is paramount.
It is not uncommon for the operating system packages included in images to have vulnerabilities. Managing these vulnerabilities enables you to strengthen the security posture of your system and respond quickly when new vulnerabilities are discovered.
You can set up Oracle Cloud Infrastructure Registry, also known as Container Registry, to scan images in a repository for security vulnerabilities published in the publicly available Common Vulnerabilities and Exposures Database.11:10
Lois: And how is this done? Is it automatic?
Mahendra: To perform image scanning, Container Registry makes use of the Oracle Cloud Infrastructure Vulnerability Scanning Service and Vulnerability Scanning REST API. When new vulnerabilities are added to the CVE database, the container registry initiates automatic rescanning of images in repositories that have scanning enabled.
11:41
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
12:20
Nikita: Welcome back! Mahendra, what are the benefits of image scanning?
Mahendra: You can gain valuable insights into each image scan conducted over the past 13 months. This includes an overview of the number of vulnerabilities detected and an overall risk assessment for each scan. Additionally, you can delve into comprehensive details of each scan featuring descriptions of individual vulnerabilities, their associated risk levels, and direct links to the CVE database for more comprehensive information. This historical and detailed data empowers you to monitor, compare, and enhance image security over time. You can also disable image scanning on a particular repository by removing the image scanner.
13:11
Nikita: Another characteristic that container images should have is unaltered integrity, right?
Mahendra: For compliance and security reasons, system administrators often want to deploy software into a production system. Only when they are satisfied that the software has not been modified since it was published compromising its integrity. Ensuring the unaltered integrity of software is paramount for compliance and security in production environment.
13:41
Lois: Mahendra, what are the mechanisms that guarantee this integrity within the context of Oracle Cloud Infrastructure?
Mahendra: Image signatures play a pivotal role in not only verifying the source of an image but also ensuring its integrity. Oracle's Container Registry facilitates this process by allowing users or systems to push images and sign them using a master encryption key sourced from the OCI Vault.
It's worth noting that an image can have multiple signatures, each associated with a distinct master encryption key. These signatures are uniquely tied to an image OCID, providing granularity to the verification process. Furthermore, the process of image signing mandates the use of an RSA asymmetric key from the OCI Vault, ensuring a robust and secure validation of the image's unaltered integrity.
14:45
Nikita: In the context of container images, how can users ensure the use of trusted sources within OCI?
Mahendra: System administrators need the assurance that the software being deployed in a production system originates from a source they trust.
Signed images play a pivotal role, providing a means to verify both the source and the integrity of the image. To further strengthen this, administrators can create image verification policies for clusters, specifying which master encryption keys must have been used to sign images. This enhances security by configuring container engine for Kubernetes clusters to allow the deployment of images signed with specific encryption keys from Oracle Cloud Infrastructure Registry. Users or systems retrieving signed images from OCIR can trust the source and be confident in the image's integrity.
15:46
Lois: Why is it imperative for users to use signed images from Oracle Cloud Infrastructure Registry when deploying applications to a Container Engine for Kubernetes cluster?
Mahendra: This practice is crucial for ensuring the integrity and authenticity of the deployed images.To achieve this enforcement. It's important to note that an image in OCIR can have multiple signatures, each linked to a different master encryption key. This multikey association adds layers of security to the verification process.
A cluster's image verification policy comes into play, allowing administrators to specify up to five master encryption keys. This policy serves as a guideline for the cluster, dictating which keys are deemed valid for image signatures.
If a cluster's image verification policy doesn't explicitly specify encryption keys, any signed image can be pulled regardless of the key used. Any unsigned image can also be pulled potentially compromising the security measures.
16:56
Lois: Mahendra, can you break down the essential permissions required to bolster security measures within a user’s OKE clusters?
Mahendra: To enable clusters to include master encryption key in image verification policies, you must give clusters permission to use keys from OCI Vault. For example, to grant this permission to a particular cluster in the tenancy, we must use the policy—allow any user to use keys in tenancy where request.user.id is set to the cluster's OCID.
Additionally, for clusters to seamlessly pull signed images from Oracle Cloud Infrastructure Registry, it's vital to provide permissions for accessing repositories in OCIR.
17:43
Lois: I know this may sound like a lot, but OKE container image security is vital for safeguarding your containerized applications. Thank you so much, Mahendra, for being with us through the season and taking us through all of these important concepts.
Nikita: To learn more about the topics covered today, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham…
Lois Houston: And Lois Houston, signing off!
18:16
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, hosts Lois Houston and Nikita Abraham speak with senior OCI instructor Mahendra Mehra about the capabilities of self-managed nodes in Kubernetes, including how they offer complete control over worker nodes in your OCI Container Engine for Kubernetes environment. They also explore the various options that are available to effectively manage your Kubernetes deployments. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Nikita: Hello and welcome to the Oracle University Podcast! I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi everyone! Last week, we discussed how OKE virtual nodes can offer you a complete serverless Kubernetes experience.
Nikita: Yeah, and in today’s episode, we’ll focus on self-managed nodes, where you get complete control over the worker nodes within your OKE environment. We’ll also talk about how you can manage your Kubernetes deployments.
00:57
Lois: To tell us more about this, we have Mahendra Mehra, a senior OCI instructor with Oracle University. Hi Mahendra! Welcome back! Let’s get started with self-managed nodes. Can you tell us what they are?
Mahendra: In Container Engine for Kubernetes, a self-managed node is essentially a worker node that you personally create and host on a compute instance or instance pool within the compute service.
Unlike managed nodes or virtual nodes, self-managed nodes are not grouped into node pools by default. They are often referred to as Bring Your Own Nodes, also abbreviated as BYON. If you wish to streamline administration and manage multiple self-managed nodes collectively, you can utilize the compute service to create a compute instance pool for hosting these nodes. This allows for greater flexibility and customization in your Kubernetes environment.
01:58
Nikita: Mahendra, what are some practical usage scenarios for OKE self-managed nodes?
Mahendra: These nodes offer a range of advantages for specific use cases. Firstly, for specialized workloads, leveraging the compute service allows you to configure compute instances with shapes and image combination that may not be available for managed nodes or virtual nodes.
This includes options like GPU shapes for hardware accelerated workloads or high frequency processor cores for demanding high-performance computing tasks. Secondly, if you require complete control over your compute instance configuration, self-managed nodes are the ideal choice. This gives you the flexibility to tailor each node to your specific requirements.
Additionally, self-managed nodes are particularly well suited for Oracle Cloud Infrastructure cluster networks. These nodes provide high bandwidth, low latency RDMA connectivity, making them a preferred option for certain networking setups.
Lastly, the use of compute instance pools with self-managed nodes enables the creation of infrastructure for handling complex distributed computing tasks. This can greatly enhance the efficiency of your Kubernetes environment. Consider these points carefully to determine the optimal use of OKE self-managed nodes in your deployments.
03:30
Lois: What do we need to consider before creating a self-managed node and integrating it into a cluster?
Mahendra: There are two crucial aspects to address. Firstly, you need to confirm that the cluster to which you plan to add a self-managed node is configured appropriately.
Secondly, it's essential to choose the right image for the compute instance hosting the self-managed node.
03:53
Nikita: Can you dive a little deeper into these prerequisites?
Mahendra: To successfully integrate a self-managed node into your cluster, you must ensure that the cluster is an enhanced cluster. This is a crucial prerequisite for the addition of self-managed nodes.
The flannel CNI plugin for pod networking should be utilized, not the VCN-native pod networking CNI plugin. This ensures optimal pod networking for your self-managed nodes. The control plane nodes of the cluster must be running Kubernetes version 1.25 or later. This is essential for compatibility and optimal performance.
Lastly, maintain compatibility between the Kubernetes version on control plane nodes and worker nodes with a maximum allowable difference of two minor versions. This ensures a smooth and stable operation of your Kubernetes environment. Keep these cluster requirements in mind as you prepare to add self-managed nodes to your OKE cluster.
04:55
Lois: What about the image requirements when creating self-managed nodes?
Mahendra: Choose either Oracle Linux 7 or Oracle Linux 8 image, for your self-managed nodes. Ensure that the selected image has a release date of March 28, 2023 or later.
Obtain the image OCID, also known as Oracle Cloud Identifier, from the respective sources. When specifying an image, be mindful of the Kubernetes version it contains.
It's your responsibility to select an image with a Kubernetes version that aligns with the Kubernetes version skew support policy. Keep in mind that the Container Engine for Kubernetes does not automatically check the compatibility. So it's up to you to ensure harmony between the Kubernetes version on the self-managed node and the cluster's control plane nodes. These considerations will help you make informed choices when configuring images for your self-managed nodes.
05:57
Nikita: I really like the flexibility and customization OKE self-managed nodes offer. Now I want to switch gears a little and ask you about OCI Service Operator for Kubernetes. Can you tell us a bit about it?
Mahendra: OCI Service Operator for Kubernetes is an open-source Kubernetes add-on that transforms the way we manage and connect OCI resources within our Kubernetes clusters. This powerful operator enables you to effortlessly create, configure, and interact with OCI resources directly from your Kubernetes environment, eliminating the need for constant navigation between the Oracle Cloud Infrastructure Console, CLI, or other tools. With the OCI Service Operator, you can seamlessly leverage kubectl to call the operator framework APIs, providing a streamlined and efficient workflow.
06:53
Lois: On what framework is the OCI Service Operator built?
Mahendra: OCI Service Operator for Kubernetes is built using the open-source Operator Framework toolkit. The Operator Framework manages Kubernetes-native applications called operators in an effective, automated, and scalable way. The Operator Framework comprises essential components like Operator SDK. This leverages the Kubernetes controller-runtime library, providing high-level APIs and abstractions for writing operational logic. Additionally, it offers tools for scaffolding and code generation.
07:35
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
08:14
Nikita: Welcome back! Mahendra, are there any other components within OCI Service Operator to manage Kubernetes deployments?
Mahendra: The other essential component is Operator Lifecycle Manager, also abbreviated as OLM. OLM extends Kubernetes by introducing a declarative approach to install, manage, and upgrade operators within a cluster. The OCI Service Operator for Kubernetes is intelligently packaged as an Operator Lifecycle Manager bundle, simplifying the installation process on Kubernetes clusters. This comprehensive bundle encapsulates all necessary objects and definitions, including CRDs, RBACs, ConfigMaps, and deployments, making it effortlessly deployable on a cluster.
09:02
Lois: So much that users can take advantage of! What about OCI Service Operator’s integration with other OCI services?
Mahendra: One of its standout features is its seamless integration with a range of OCI services. The first one is Autonomous Database, specifically tailored for transaction processing, mixed workloads, analytics, and data warehousing. Enjoy automated patching, upgrades, and tuning, allowing routine maintenance tasks to be performed without human intervention.
The next on the list is MySQL HeatWave, a fully-managed Database Service designed for developing and deploying secure cloud-native applications using widely adopted MySQL open-source database. Third on the list is OCI Streaming service. Experience a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real time.
Next is Service Mesh. This service offers a set of capabilities to facilitate communication among microservices within a cloud-native application. The communication is centrally managed and secured, ensuring a smooth and secure interaction. The OCI Service Operator for Kubernetes serves as a versatile bridge, seamlessly connecting your Kubernetes clusters with these powerful Oracle Cloud Infrastructure services.
10:31
Nikita: That’s awesome! I’ve also heard about Ingress Controllers. Can you tell us what they are?
Mahendra: A Kubernetes Ingress Controller serves as the enforcer of rules defined in a Kubernetes Ingress. Its primary role is to manage, load balance, and route incoming traffic to specific service pods residing on worker nodes within the cluster. At the heart of this process is the Kubernetes Ingress Resource. Think of it as a blueprint, a rich configuration holding routing rules and options, specifically crafted for handling HTTP and HTTPS traffic. It serves as a powerful orchestrator for managing external communication with services inside the cluster.
11:15
Lois: Mahendra, how do Ingress Controllers bring about efficiency?
Mahendra: Efficiency comes with consolidation. With a single ingress resource, you can neatly gather routing rules for multiple services. This eliminates the need to create a Kubernetes service of type LoadBalancer for each service seeking external or private network traffic.
The OCI native ingress controller is a powerhouse. It crafts an OCI Flexible Load Balancer, your gateway to efficient request handling. The OCI native ingress controller seamlessly adapts to changes in routing rules with real-time updates.
11:53
Nikita: And what about integration with an OKE cluster?
Mahendra: Absolutely. It harmonizes with the cluster for streamlined traffic management. Operating as a single pod on a randomly selected worker node, it ensures a balanced workload distribution.
12:08
Lois: Moving on, let’s talk about running applications on ARM-based nodes and GPU nodes. We’ll start with ARM-based nodes.
Mahendra: Typically, developers use ARM-based worker nodes in Kubernetes cluster to develop and test applications. Selecting the right infrastructure is crucial for optimal performance.
12:28
Nikita: What kind of options do developers have when running applications on ARM-based nodes?
Mahendra: When it comes to running applications on ARM-based nodes, you have a range of options at your fingertips. First up, consider the choice between ARM-based bare metal shapes and flexible VM shapes. Each comes with its own unique advantages.
Now, let's talk about the heart of it all, the Ampere A1 Compute instances. These instances are driven by the cutting edge Ampere Altra processor, ensuring high performance and efficiency for your workloads. You must specify the ARM-based node pool shapes during cluster or node pool creation, whether you choose to navigate through the user-friendly console, leverage the flexibility of the API, or command with precision through the CLI, the process remains seamless.
13:23
Lois: Can you define pods to run exclusively on ARM-based nodes within a heterogeneous cluster setup?
Mahendra: In scenarios where a cluster comprises node pools with ARM-based shapes alongside other shapes, such as AMD64, you can employ a powerful tool called node selector in the pod specification. This allows you to precisely dictate that an application should exclusively run on ARM-based worker nodes, ensuring your workloads aligns with the desired architecture.
13:55
Nikita: And before we end this episode, can you explain why developers must run applications on GPU nodes?
Mahendra: Originally designed for graphics manipulations, GPUs prove highly efficient in parallel data processing. This makes them a top choice for deploying data-intensive applications. Our GPU nodes utilize cutting edge NVIDIA graphics cards ensuring efficient and powerful data processing. Seamless access to this computing prowess is made possible through CUDA libraries.
To ensure smooth integration, be sure to select a GPU shape and opt for an Oracle Linux GPU image preloaded with the essential CUDA libraries. CUDA here is Compute Unified Device Architecture, which is a parallel computing platform and application-programming interface model created by NVIDIA. It allows developers to use NVIDIA graphics-processing units for general-purpose processing, rather than just rendering graphics.
14:57
Nikita: Thank you, Mahendra, for another insightful session. We appreciate you joining us today.
Lois: For more information on everything we discussed, go to mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. You’ll find plenty of demos and skill checks to supplement your learning. Join us next week when we’ll discuss vital security practices for your OKE clusters on OCI. Until next time, this is Lois Houston…
Nikita: And Nikita Abraham, signing off!
15:28
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Want to gain insights into how virtual nodes provide a serverless Kubernetes experience? Join hosts Lois Houston and Nikita Abraham, along with senior OCI instructor Mahendra Mehra, as they compare managed nodes and virtual nodes. Continuing from the previous episode, they explore how virtual nodes enhance Kubernetes deployments in Oracle Cloud Infrastructure. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:25
Lois: Welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hey everyone! In our last episode, we examined OCI Container Engine for Kubernetes, including its key features and benefits.
Lois: Yeah, that was an interesting one. Today, we’re going to discuss virtual nodes and their role in enhancing Kubernetes deployments in Oracle Cloud Infrastructure.
Nikita: We’re going to compare virtual nodes and managed nodes, and look at their differences and advantages. To take us through all this, we have Mahendra Mehra with us. Mahendra is a senior OCI instructor with Oracle University.
01:09
Lois: Hi Mahendra! From our discussion last week, we know that when creating a node pool with Container Engine for Kubernetes, we have the option of specifying the type of Oracle nodes as either managed nodes or virtual nodes. But I’m sure there are some key differences in the features supported by each type, right?
Mahendra: The primary point of differentiation between virtual nodes and managed nodes is in their management approach. When it comes to managed nodes, users are responsible for managing the nodes. They have the flexibility to configure them to meet the specific requirements.
Users are also responsible for upgrading Kubernetes on managed nodes and for managing cluster capacity. You can create managed nodes and node pools in both basic clusters and enhanced clusters, whereas in virtual nodes, virtual nodes provide a serverless Kubernetes, experience, enabling users to run containerized applications at scale. The Kubernetes software is upgraded and security patches are applied while respecting application's availability requirements.
You can only create virtual nodes and virtual node pools in enhanced clusters.02:17
Nikita: What about differences in terms of resource allocation? Are there any differences we should be aware of?
Mahendra: When it comes to managed nodes, the resource allocation is at the node pool level and the users specify CPU and memory resource requirements for a given node pool. In the virtual nodes, the resource allocation is done at the pod level, where you can specify the CPU and memory resource requirements, but this time, as requests and limits in the pod specification.
02:45
Lois: What about differences in the approach to load balancing?
Mahendra: When it comes to managed nodes, load balancing is between the worker nodes, whereas in virtual nodes, load balancing is between pods.
Also, load balancer security list management is never enabled, and you always must manually configure security rules. When using virtual nodes, load balances distribute traffic among pods' IP addresses and then assign node port.
03:12
Lois: And when it comes to pod networking?
Mahendra: Under managed nodes, both the VCN-Native Pod Networking CNI plugin and the flannel CNI plugin are supported. When it comes to virtual nodes, only VCN-Native Pod Networking is supported.
Also, only one VNIC is attached to each virtual node. Remember, IP addresses are not pre-allocated before pods are created. And the VCN-Native Pod Networking CNI plugin is not shown as running in the kube-system namespace. Pod subnet route tables must have route rules defined for a NAT gateway and a service gateway.
03:48
Nikita: OK… I have a question, Mahendra. When it comes to scaling Kubernetes clusters and node pools, can users adjust the cluster capacity in response to their changing requirements?
Mahendra: When it comes to managed nodes, customers can scale the cluster and node pool up and down by changing the number of managed node pools and nodes respectively. They also have an option to enable autoscaling to automatically scale managed node pools and pods.
When it comes to virtual nodes, operational overhead of cluster capacity management is handled for you by OCI. A virtual node pool scales automatically and can support up to 1000 pods per virtual node. Users also have an option to increase the number of virtual node pools or virtual nodes to scale up the cluster or node pool respectively.
04:37
Lois: And what about the pricing for each?
Mahendra: Under managed nodes, you pay for the compute instances that execute applications, whereas under virtual nodes, you pay for the exact compute resources consumed by each Kubernetes pod.
04:55
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
05:34
Nikita: Welcome back! We were just discussing how when you have to choose between virtual nodes and managed nodes for your Kubernetes cluster, you need to consider several key points of differentiation, like the management approach, resource allocation, load balancing, pod networking, scaling, and pricing.
Lois: Yeah, it’s important to understand the benefits and drawbacks of each approach to make informed decisions. Mahendra, now let’s talk about the prerequisites to configure clusters with virtual nodes and the IAM policies that are required to use virtual nodes.
Mahendra: Before you can use virtual nodes, you always have to set up at least one IAM policy, which is required in all circumstances by both tenancy administrators and non-administrator users.
This basically means, to create and use clusters with virtual nodes and virtual node pools, you must endorse Container Engine for Kubernetes service to allow virtual nodes to create container instances in the Container Engine for Kubernetes service tenancy with a VNIC connected to a subnet of a VCN in your tenancy.
All you need to do is create a policy in the root compartment with policy statements from the official documentation page. You will find them under the Working with Virtual Nodes section within the Container Engine topic.
06:55
Lois: Mahendra, how do you create and configure virtual nodes and virtual node pools?
Mahendra: Creating virtual nodes is a pivotal step and it involves setting up a virtual node pool in a new cluster. This is exclusively applicable to enhanced clusters. You can initiate this process using the console, the CLI, or the API.
Configuring your virtual node pools involves defining critical parameters. Firstly, we have the node count. This represents the number of virtual nodes you wish to create within your virtual node pool. These nodes will be strategically placed in the availability domains that you specify.
Now, it's important to carefully consider the placement of these nodes. You can distribute them across different availability domains, ensuring high availability for your applications. Additionally, you have the option to place these nodes in a regional subnet, which is the recommended approach for optimal performance.
07:53
Nikita: Isn’t the pod shape another important parameter? Can you tell us a bit about it?
Mahendra: Pod shape refers to the type of shape you want for pods running on your virtual nodes within the virtual node pool. The pod shape is crucial as it determines the processor type on which you want your pods to run. It is important to note that only shapes available in your tenancy and supported by Container Engine for Kubernetes will be shown. So choose a shape that aligns with the requirements of your applications and services.
A noteworthy point is that you explicitly specify the CPU and memory resource requirements for virtual nodes in the pod specification file. This ensures that your virtual nodes have the necessary resources to handle the workloads of your applications. Precision in specifying these requirements is key to achieving optimal performance.08:49
Lois: What is the network setup for virtual nodes?
Mahendra: The pod running on virtual nodes utilize VCN-native pod networking, and it's crucial to specify how these pods in the node pool communicate with each other. This involves setting up a pod subnet, which is a regional subnet configured specially to host pods.
The pod subnet you specify for virtual nodes must be private. Oracle recommends that the pod subnet and the virtual node subnets are the same. In addition to subnet configurations, you have the option to use security rules in network security group to control access to the pod subnet. This involves defining security rules within one or more NSGs that you specify with a maximum limit of five network security groups.
Also, it is worth noting that using network security group is recommended over using security list. Now, let's shift our focus to virtual node communication. For this, you will configure a virtual node subnet. This subnet can be either a regional subnet, which is recommended, or an availability domain-specific subnet. And it's designed to host your virtual nodes.
10:02
Nikita: What are some key considerations for virtual node subnets?
Mahendra: If you've specified load balancer subnets, ensure that the virtual node subnets are different. As with pod communication, Oracle recommends that the pod subnet and the virtual node subnet are the same, with the added condition that the virtual node subnet must be private.
10:23
Lois: Mahendra, can you take us through the fundamental tasks involved in managing virtual nodes and virtual node pools?
Mahendra: Whether you're creating a new enhanced cluster using the Console, or looking to scale up an existing one, the creation process is versatile.
Creating virtual nodes involves establishing a virtual node pool. Virtual nodes can only be created within enhanced clusters. Listing virtual nodes task offers visibility into virtual nodes within a virtual node pool. Whether you prefer Console, CLI, or the API, you have the flexibility to choose the method that suits your workflow best.
For a comprehensive understanding of your virtual node pools, navigate to the Cluster List page, and click on the name of the cluster. This will unveil the specifics of the virtual node pool you are interested in.
Now let's talk about updating virtual node pools. Whether your initiating a new enhanced cluster, or expanding an existing one, the update process ensures your cluster aligns with your evolving requirements. You can easily update the virtual node pool’s name for clarity. You can also dynamically change the number of virtual nodes to meet the workload demands, and you can fine tune the Node Placement using options like Availability Domain and Fault Domain settings.
Moving on to an essential aspect of node pool management, that is deletion. It's crucial to understand that deleting a node pool is a permanent action. Once deleted, the node pool cannot be recovered.
12:04
Lois: Before we wrap up, Mahendra, can you talk about the critical factors when allocating CPU, memory, and storage resources to pods provisioned by virtual nodes within your OKE cluster?
Mahendra: To ensure optimal performance, OKE calculates CPU and memory allocations at the pod level, a distinctive feature when using virtual nodes. This approach stands in contrast to the traditional worker node-level allocation.
The allocation process takes into account several factors. First one is the CPU and memory requests and limits. These are specified for each container in the pod spec file, if present. Secondly, number of containers in the pod. The total number of containers impacts the overall resource requirements. And kube-proxy and container runtime requirements. A small but essential consideration taking up 0.25 GB of memory and negligible CPU. Pod CPU and memory requests must meet a minimum of 0.125 OCPUs and 0.5 GB of memory.
13:12
Nikita: Thank you, Mahendra, for this really insightful session. If you’re interested in learning more about the topics we discussed today, head over to mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course.
Lois: You’ll find demos that you watch as well as skill checks that you can attempt to better your understanding. In our next episode, we’ll journey into the world of self-managed nodes and discuss how to manage Kubernetes deployments. Until then, this is Lois Houston…
Nikita: And Nikita Abraham, signing off!
13:45
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
Curious about how OCI Container Engine for Kubernetes (OKE) can transform the way your development team builds, deploys, and manages cloud-native applications? Listen to hosts Lois Houston and Nikita Abraham explore OKE's key features and benefits with senior OCI instructor Mahendra Mehra. Mahendra breaks down complex concepts into digestible bits, making it easy for you to understand the magic behind OKE. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:25
Nikita: Hello and welcome to the Oracle University Podcast. I’m Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs.
Lois: Hi there! If you’ve been listening to us these last few weeks, you’ll know we’ve been discussing containerization, the Oracle Cloud Infrastructure Registry, and the basics of Kubernetes. Today, we’ll dive into the world of OCI Container Engine for Kubernetes, also referred to as OKE.
Nikita: We’re joined by Mahendra Mehra, a senior OCI instructor with Oracle University, who will take us through the key features and benefits of OKE and also talk about working with managed nodes. Hi Mahendra! Thanks for joining us today.
01:09
Lois: So, Mahendra, what is OKE exactly?
Mahendra: Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that empowers you to effortlessly deploy your containerized applications to the cloud.
But that's just the beginning. OKE can transform the way you and your development team build, deploy, and manage cloud native applications.
01:36
Nikita: What would you say are some of its most defining features?
Mahendra: One of the defining features of OKE is the flexibility it offers. You can specify whether you want to run your applications on virtual nodes or opt for managed nodes.Regardless of your choice, Container Engine for Kubernetes will efficiently provision them within your existing OCI tenancy on Oracle Cloud Infrastructure.
Creating OKE cluster is a breeze, and you have a couple of fantastic tools at your disposal-- the console and the rest API. These make it super easy to get started.
OKE relies on Kubernetes, which is an open-source system that simplifies the deployment, scaling, and management of containerized applications across clusters of hosts.
Kubernetes is an incredible system that groups containers into logical units known as pods. And these pods make managing and discovering your applications very simple.
Not to mention, Container Engine for Kubernetes uses Kubernetes versions that are certified as conformant by the Cloud Native Computing Foundation, also abbreviated as CNCF. And here's the icing on the cake. Container Engine for Kubernetes is ISO-compliant. The other two ISO-IEC standards—27001, 27017, and 27018. That's your guarantee of a secure and reliable platform.
03:08Lois: That’s great. But how do you access all this power?
Mahendra: You can define and create your Kubernetes cluster using the intuitive console and the robust rest API. Once your clusters are up and running, you can manage them using the Kubernetes command line, also known as kubectl, the user-friendly Kubernetes dashboard, and the powerful Kubernetes API.
03:32
Nikita: I love the idea of an intuitive console and being able to manage everything from a centralized place.
Lois: Yeah, that’s fantastic! Mahendra, can you talk us through the magic that happens behind the scenes? What’s Oracle’s role in all this?
Mahendra: All the master nodes or control plane nodes are managed by Oracle. This includes components like etcd, the API server, and the controller manager among others. To ensure reliability, we make sure multiple copies of these master components are distributed across different availability domains.
And we don't stop there. We also manage the Kubernetes dashboard and even handle the self-healing mechanism of both the cluster and the worker nodes. All of these are meticulously created and managed within your Oracle tenancy.
04:19
Lois: And what happens at the user’s end? What is their responsibility?
Mahendra: At your end, you have the power to manage your worker nodes. Using different compute shapes, you can create and control them in your own user tenancy. So, as you can see, it's a perfect blend of Oracle's expertise and your control.
04:38
Nikita: So, in your opinion, why should users consider OKE their go-to solution for all things Kubernetes?
Mahendra: Imagine a world where building and maintaining Kubernetes environments, be it master nodes or worker nodes, is no longer complex, costly, or even time-consuming.
OKE is here to make your life easier by seamlessly integrating Kubernetes with various container life cycle management products, which includes container registries, CI/CD frameworks, networking solutions, storage options, and top-notch security features.And speaking of security, OKE gives you the tools you need to manage and control team access to production clusters, ensuring granular access to Kubernetes cluster in a straightforward process.
It empowers developers to deploy containers quickly, provides devops teams with visibility and control for seamless Kubernetes management, and brings together Kubernetes container orchestration with Oracle's advanced cloud infrastructure. This results in robust control, top tier security, IAM, and consistent performance.
05:50
Nikita: OK…a lot of benefits! Mahendra, I know there have been ongoing enhancements to the OKE service. So, when creating a new cluster with Container Engine for Kubernetes, what is the cluster type we should specify?
Mahendra: The first type is the basic clusters. Basic clusters support all the core functionality provided by Kubernetes and Container Engine for Kubernetes. Basic clusters come with a service-level objective, but not a financially backed service level agreement. This means that Oracle guarantees a certain level of availability for the basic cluster, but there is no monetary compensation if that level is not met.
On the other hand, we have the enhanced clusters. Enhanced clusters support all available features, including features not supported by basic clusters.
06:38
Lois: OK. So, can you tell us more about the features supported by enhanced clusters?
Mahendra: As we move towards a more digitized world, the demand for infrastructure continues to rise. However, with virtual nodes, managing the infrastructure of your cluster becomes much simpler. The burden of manually scaling, upgrading, or troubleshooting worker nodes is removed, giving you more time to focus on your applications rather than the underlying infrastructure. Virtual nodes provide a great solution for managing large clusters with a high number of nodes that require frequent updates or scaling.
With this feature, you can easily simplify the management of your cluster and focus on what really matters, that is your applications. Managing cluster add-ons can be a daunting task. But with enhanced clusters, you can now deploy and configure them in a more granular way. This means that you can manage both essential add-ons like CoreDNS and kube-proxy as well as a growing portfolio of optional add-ons like the Kubernetes Dashboard.
With enhanced clusters, you have complete control over the add-ons you install or disable, the ability to select specific add-on versions, and the option to opt-in or opt-out of automatic updates by Oracle. You can also manage add-on specific customizations to tailor your cluster to meet the needs of your application.
08:05
Lois: Do users need to worry about deploying add-ons themselves?
Mahendra: Oracle manages the lifecycle of add-ons so that you don't have to worry about deploying them yourself. This level of control over add-ons gives you the flexibility to customize your cluster to meet the unique needs of your applications, making managing your cluster a breeze.
08:25
Lois: What about scaling?
Mahendra: Scaling your clusters to meet the demands of your workload can be a challenging task. However, with enhanced clusters, you can now provision more worker nodes in a single cluster, allowing you to deploy larger workloads on the same cluster which can lead to better resource utilization and lower operational overhead. Having fewer larger environments to secure, monitor, upgrade, and manage is generally more efficient and can help you save on cost.
Remember, there are limits to the number of worker nodes supported on an enhanced cluster, so you should review the Container Engine for Kubernetes limits documentation and consider the additional considerations when defining enhanced clusters with large number of managed nodes.
09:09
Nikita: Ensuring the security of my cluster would be of utmost importance to me, right? How would I do that with enhanced clusters?
Mahendra: With enhanced clusters, you can now strengthen cluster security through the use of workload identity. Workload identity enables you to define OCI IAM policies that authorize specific pods to make OCI API calls and access OCI resources. By scoping the policies to Kubernetes service account associated with application pods, you can now allow the applications running inside those pods to directly access the API based on the permissions provided by the policies.
09:48
Nikita: Mahendra, what type of uptime and server availability benefits do enhanced clusters provide?
Mahendra: You can now rely on a financially backed service level agreement tied to Kubernetes API server uptime and availability. This means that you can expect a certain level of uptime and availability for your Kubernetes API server, and if it degrades below the stated SLA, you'll receive compensation. This provides an extra level of assurance and helps ensure that your cluster is highly available and performant.
10:20
Lois: Mahendra, do you have any tips for us to remember when creating basic and enhanced clusters?
Mahendra: When using the console to create a cluster, a new cluster is created as an enhanced cluster by default unless you explicitly choose to create a basic cluster. If you don't select any enhanced features during cluster creation, you have the option to create the new cluster as a basic cluster. When using CLI or API to create a cluster, you can specify whether to create a basic cluster or an enhanced cluster. If you don't explicitly specify the type of cluster to create, a new cluster is created as a basic cluster by default.
Creating a new cluster as an enhanced cluster enables you to easily add enhanced features later even if you didn't select any enhanced features initially. If you do choose to create a new cluster as a basic cluster, you can still choose to upgrade the basic cluster to an enhanced cluster later on. However, you cannot downgrade an enhanced cluster to a basic cluster. These points are really important while you consider selection of a basic cluster or an enhanced cluster for your usage.
11:34
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
12:13
Nikita: Welcome back! I want to move on to serverless Kubernetes with virtual nodes. But I think before we do that, we first need to have a basic understanding of what managed nodes are.
Mahendra: Managed nodes run on compute instances within your tenancy, and are at least partly managed by you. In the context of Kubernetes, a node is a compute host that can be either a virtual machine or a bare metal host.
As you are responsible for managing managed nodes, you have the flexibility to configure them to meet your specific requirements. You are responsible for upgrading Kubernetes on managed nodes and for managing cluster capacity. Nodes are responsible for running a collection of pods or containers, and they are comprised of two system components: the kubelet, which is the host brain, and the container runtime such as CRI-O, or containerd.13:07
Nikita: Ok… so what are virtual nodes, then?
Mahendra: Virtual nodes are fully managed and highly available nodes that look and act like real nodes to Kubernetes. They are built using the open source CNCF Virtual Kubelet Project, which provides the translation layer between OCI and Kubernetes.
13:25
Lois: So, what makes Oracle’s managed virtual Kubernetes product different?
Mahendra: OCI is the first major cloud provider to offer a fully managed virtual Kubelet product that provides a serverless Kubernetes experience through virtual nodes. Virtual nodes are configured by customers and are located within a single availability and fault domain within OCI.
Virtual nodes have two main components: port management and container instance management. Virtual nodes delegates all the responsibility of managing the lifecycle of pods to virtual Kubernetes while on a managed node, the kubelet is responsible for managing all the lifecycle state.
The key distinction of virtual nodes is that they support up to a 1,000 pods per virtual node with the expectation of supporting more in the future.
14:15
Nikita: What are the other benefits of virtual nodes?
Mahendra: Virtual nodes offer a fully managed experience where customers don't have to worry about managing the underlying infrastructure of their containerized applications. Virtual nodes simplifies scaling patterns for customers. Customers can scale their containerized application up or down quickly without worrying about the underlying infrastructure, and they can focus solely on their applications.
With virtual nodes, customers only pay for the resources that their containerized application use. This allows customers to optimize their costs and ensures that they are not paying for any unused resources. Virtual nodes can support over 10 times the number of pods that a normal node can. This means that customer can run more containerized applications on virtual nodes, which reduces operational burden and makes it easier to scale applications.
Customers can leverage container instances in serverless offering from OCI to take advantage of many OCI functionalities natively. These functionalities include strong isolation and ultimate elasticity with respect to compute capacity.
15:26
Lois: When creating a cluster using Container Engine for Kubernetes, we have the flexibility to customize the worker nodes within the cluster, right? Could you tell us more about this customization?
Mahendra: This customization includes specifying two key elements. Firstly, you can select the operating system image to be used for worker nodes. This image serves as a template for the worker node's virtual hard drive, and determines the operating system and other software installed.
Secondly, you can choose the shape for your worker nodes. The shape defines the number of CPUs and the amount of memory allocated to each instance, ensuring it meets your specific requirements. This customization empowers you to tailor your OKE cluster to your exact needs.
It is important to note that you can define and create OKE clusters using both the console and the REST API. This level of control is specially valuable for your development team when building, deploying, and managing cloud native applications.
You have the option to specify whether applications should run on virtual nodes or managed nodes. And Container Engine for Kubernetes efficiently provisions them on Oracle Cloud Infrastructure within your existing OCI tenancy. This flexibility ensures that you can adapt your OKE cluster to suit the specific requirements of your projects and workloads.16:56
Lois: Thank you so much, Mahendra, for giving us your time today. For more on the topics we discussed, visit mylearn.oracle.com and look for the OCI Container Engine for Kubernetes Specialist course. Join us next week as we dive deeper into working with OKE virtual nodes. Until then, this is Lois Houston…
Nikita: And Nikita Abraham, signing off!
17:18
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
-
In this episode, Lois Houston and Nikita Abraham, along with senior OCI instructor Mahendra Mehra, dive into the fundamentals of Kubernetes. They talk about how Kubernetes tackles challenges in deploying and managing microservices, and enhances software performance, flexibility, and availability. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript:
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26
Lois: Hello and welcome to another episode of the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.
Nikita: Hi everyone! We’ve spent the last two episodes getting familiar with containerization and the Oracle Cloud Infrastructure Registry. Today, it’s going to be all about Kubernetes. So if you've heard of Kubernetes but you don't know what it is, or you've been playing with Docker and containers and want to know how to take it to the next level, you’ll want to stay with us.
Lois: That’s right, Niki. We’ll be chatting with Mahendra Mehra, a senior OCI instructor with Oracle University, about the challenges in containerized applications within a complex business setup and how Kubernetes facilitates container orchestration and improves its effectiveness, resulting in better software performance, flexibility, and availability.
01:20
Nikita: Hi Mahendra. To start, can you tell us when you would use Kubernetes?
Mahendra: While deploying and managing microservices in a distributed environment, you may run into issues such as failures or container crashes. Issues such as scheduling containers to specific machines depending upon the configuration. You also might face issues while upgrading or rolling back the applications which you have containerized. Scaling up or scaling down containers across a set of machines can be troublesome.
01:50
Lois: And this is where Kubernetes helps automate the entire process?
Mahendra: Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.
You can think of a Kubernetes as you would a conductor for an orchestra. Similar to how a conductor would say how many violins are needed, which one play first, and how loud they should play, Kubernetes would say, how many webserver front-end containers or back-end database containers are needed, what they serve, and how many resources are to be dedicated to each one.02:27
Nikita: That’s so cool! So, how does Kubernetes work?
Mahendra: In Kubernetes, there is a master node, and there are multiple worker nodes. Each worker node can handle multiple pods. Pods are just a bunch of containers clustered together as a working unit. If a worker node goes down, Kubernetes starts new pods on the functioning worker node.
02:47
Lois: So, the benefits of Kubernetes are…
Mahendra: Kubernetes can containerize applications of any scale without any downtime. Kubernetes can self-heal containerized applications, making them resilient to unexpected failures.
Kubernetes can autoscale containerized applications as for the workload and ensure optimal utilization of cloud resources. Kubernetes also greatly simplifies the process of deployment operations. With Kubernetes, however complex an operation is, it could be performed reliably by executing a couple of commands at the most.
03:19
Nikita: That’s great. Mahendra, can you tell us a bit about the architecture and main components of Kubernetes?
Mahendra: The Kubernetes cluster has two main components. One is the control plane, and one is the data plane. The control plane hosts the components used to manage the Kubernetes cluster. And the data plane basically hosts all the worker nodes that can be virtual machines or physical machines.
These worker nodes basically host pods which run one or more containers. The containers running within these pods are making use of Docker images, which are managed within the image registry. In case of OCI, it is the container registry.
03:54
Lois: Mahendra, you mentioned nodes and pods. What are nodes?
Mahendra: It is the smallest unit of computing hardware within the Kubernetes. Its work is to encapsulate one or more applications as containers. A node is a worker machine that has a container runtime environment within it.
04:10
Lois: And pods?
Mahendra: A pod is a basic object of Kubernetes, and it is in charge of encapsulating containers, storage resources, and network IPs. One pod represents one instance of an application within Kubernetes. And these pods are launched in a Kubernetes cluster, which is composed of nodes. This means that a pod runs on a node but can easily be instantiated on another node.
04:32
Nikita: Can you run multiple containers within a pod?
Mahendra: A pod can even contain more than one container if these containers are relatively tightly coupled. Pod is usually meant to run one application container inside of it, but you can run multiple containers inside one pod. Usually, it is only the case if you have one main application container and a helper container or some sidecar containers that has to run inside of that pod.
Every pod is assigned a unique private IP address, using which the pods can communicate with one another. Pods are meant to be ephemeral, which means they die easily. And if they do, upon re-creation, they are assigned a new private IP address. In fact, Kubernetes can scale a number of these pods to adapt for the incoming traffic, consequently creating or deleting pods on demand.
Kubernetes guarantees the availability of pods and replicas specified, but not the liveliness of each individual pod. This means that other pods that need to communicate with this application or component cannot rely on the underlying individual pod's IP address.
05:35
Lois: So, how does Kubernetes manage traffic to this indecisive number of pods with changing IP addresses?
Mahendra: This is where another component of Kubernetes called services comes in as a solution. A service gets allocated a virtual IP address and lives until explicitly destroyed. Requests to the services get redirected to the appropriate pods, thus the services of a stable endpoint used for inter-component or application communication.
And the best part here is that the lifecycle of service and the pods are not connected. So even if the pod dies, the service and the IP address will stay, so you don't have to change their endpoints anymore.
06:13
Nikita: What types of services can you create with Kubernetes?
Mahendra: There are two types of services that you can create. The external service is created to allow external users to connect the containerized applications within the pod. Internal services can also be created that restrict the communication within the cluster. Services can be exposed in different ways by specifying a particular type.
06:33
Nikita: And how do you define these services?
Mahendra: There are three types in which you can define services. The first one is the ClusterIP, which is the default service type that exposes services on an internal IP within the cluster. This type makes the service only reachable from within the cluster.
You can specify the type of service as NodePort. NodePort basically exposes the service on the same port of each selected node in the cluster using a network address translation and makes the service accessible from the outside of the cluster using the node IP and the NodePort combination. This is basically a superset of ClusterIP.
You can also go for a LoadBalancer type, which basically creates an external load balancer in the current cloud. OCI supports LoadBalancer types. It also assigns a fixed external IP to the service. And the LoadBalancer type is a superset of NodePort.
07:25
Lois: There’s another component called ingress, right? When do you used that?
Mahendra: An ingress is used when we have multiple services on our cluster, and we want the user requests routed to the services based on their pod, and also, if you want to talk to your application with a secure protocol and a domain name.
Unlike NodePort or LoadBalancer, ingress is not actually a type of service. Instead, it is an entry point that sits in front of the multiple services within the cluster. It can be defined as a collection of routing rules that govern how external users access services running inside a Kubernetes cluster. Ingress is most useful if you want to expose multiple services under the same IP address, and these services all use the same Layer 7 protocol, typically HTTP.08:10
Lois: Mahendra, what about deployments in Kubernetes?
Mahendra: A deployment is an object in Kubernetes that lets you manage a set of identical pods. Without a deployment, you will need to create, update, and delete a bunch of pods manually. With the deployment, you declare a single object in a YAML file, and the object is responsible for creating the pods, making sure they stay up-to-date and ensuring there are enough of them running.
You can also easily autoscale your applications using a Kubernetes deployment. In a nutshell, the Kubernetes deployment object lets you deploy a replica set of your pods, update the pods and the replica sets. It also allows you to roll back to your previous deployment versions. It helps you scale a deployment. It also lets you pause or continue a deployment.
08:59
Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we’re offering both the course and certification for free! So, don’t miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That’s https://education.oracle.com/genai.
09:37
Nikita: Welcome back! We were talking about how useful a Kubernetes deployment is in scaling operations. Mahendra, how do pods communicate with each other?
Mahendra: Pods communicate with each other using a service. For example, my application has a database endpoint. Let's say it's a MySQL service that it uses to communicate with the database. But where do you configure this database URL or endpoints? Usually, you would do it in the application properties file or as some kind of an external environment variable. But usually, it's inside the build image of the application.
So for example, if the endpoint of the service or the service name, in this case, changes to something else, you would have to adjust the URL in the application. And this will cause you to rebuild the entire application with a new version, and you will have to push it to the repository. You'll then have to pull that new image into your pod and restart the whole thing. For a small change like database URL, this is a bit tedious.
So for that purpose, Kubernetes has a component called ConfigMap. ConfigMap is a Kubernetes object that maintains a key value store that can easily be used by other Kubernetes objects, such as pods, deployments, and services. Thus, you can define a ConfigMap composed of all the specific variables for your environment.
In Kubernetes, now you just need to connect your pod to the ConfigMap, and the pod will read all the new changes that you have specified within the ConfigMap, which means you don't have to go on to build a new image every time a configuration changes.
11:07
Lois: So then, I’m just wondering, if we have a ConfigMap to manage all the environment variables and URLs, should we be passing our username and password in the same file?
Mahendra: The answer is no. Password or other credentials within a ConfigMap in a plain text format would be insecure, even though it's an external configuration. So for this purpose, Kubernetes has another component called secret.
Kubernetes secrets are secure objects which store sensitive data, such as passwords, OAuth tokens, and SSH keys with the encryption within your cluster. Using secrets gives you more flexibility in a pod lifecycle definition and control over how sensitive data is used. It reduces the risk of exposing the data to unauthorized users.11:50
Nikita: So, you’re saying that the secret is just like ConfigMap or is there a difference?
Mahendra: Secret is just like ConfigMap, but the difference is that it is used to store secret data credentials, for example, database username and passwords, and it's stored in the base64 encoded format. The kubelet service stores this secret into a temporary file system.
12:11
Lois: Mahendra, how does data storage work within Kubernetes?
Mahendra: So let's say we have this database pod that our application uses, and it has some data or generates some data. What happens when the database container or the pod gets restarted? Ideally, the data would be gone, and that's problematic and inconvenient, obviously, because you want your database data or log data to be persisted reliably for long term.
To achieve this, Kubernetes has a solution called volumes. A Kubernetes volume basically is a directory that contains data accessible to containers in a given pod within the Kubernetes platform. Volumes provide a plug-in mechanism to connect ephemeral containers with persistent data stores elsewhere.
The data within a volume will outlast the containers running within the pod. Containers can shut down and restart because they are ephemeral units. Data remains saved in the volume even if a container crashes because a container crash is not enough to cut off a pod from a node.13:10
Nikita: Another main component of Kubernetes is a StatefulSet, right? What can you tell us about it?
Mahendra: Stateful applications are applications that store data and keep tracking it. All databases such as MySQL, Oracle, and PostgreSQL are examples of Stateful applications. In a modern web application, we see stateless applications connecting with Stateful application to serve the user request.
For example, a Node.js application is a stateless application that receives new data on each request from the user. This application is then connected with a Stateful application, such as MySQL database, to process the data. MySQL stores the data and keeps updating the database on the user's request.
Now, assume you deployed a MySQL database in the Kubernetes cluster and scaled this to another replica, and a frontend application wants to access the MySQL cluster to read and write data. The read request will be forwarded to both these pods. However, the write request will only be forwarded to the first primary pod. And the data will be synchronized with other pods. You can achieve this by using the StatefulSets.
Deleting or scaling down a StatefulSet will not delete the volumes associated with the Stateful applications. This gives you your data safety. If you delete the MySQL pod or if the MySQL pod restarts, you can have access to the data in the same volume.So overall, a StatefulSet is a good fit for those applications that require unique network identifiers; stable persistent storage; ordered, graceful deployment and scaling; as well as ordered, automatic rolling updates.
14:43
Lois: Before we wrap up, I want to ask you about the features of Kubernetes. I’m sure there are countless, but can you tell us the most important ones?
Mahendra: Health checks are used to check the container's readiness and liveness status.
Readiness probes are intended to let Kubernetes know if the app is ready to serve the traffic.
Networking plays a significant role in container orchestration to isolate independent containers, connect coupled containers, and provide access to containers from the external clients. Service discovery allows containers to discover other containers and establish connections to them.
Load balancing is a dedicated service that knows which replicas are running and provides an endpoint that is exposed to the clients. Logging allows us to oversee the application behavior.
The rolling update allows you to update a deployed containerized application with minimal downtime using different update scenarios. The typical way to update such an application is to provide new images for its containers. Containers, in a production environment, can grow from few to many in no time. Kubernetes makes managing multiple containers an easy task.
And lastly, resource usage monitoring-- resources such as CPU and RAM must be monitored within the Kubernetes environment. Kubernetes resource usage looks at the amount of resources that are utilized by a container or port within the Kubernetes environment. It is very important to keep an eye on the resource usage of the pods and containers as more usage translates to more cost.16:18
Nikita: I think we can wind up our episode with that. Thank you, Mahendra, for joining us today. Kubernetes sure can be challenging to work with, but we covered a lot of ground in this episode.
Lois: That’s right, Niki! If you want to learn more about the rich features Kubernetes offers, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Remember, all the training is free, so you can dive right in! Join us next week when we'll take a look at the fundamentals of Oracle Cloud Infrastructure Container Engine for Kubernetes. Until then, Lois Houston…
Nikita: And Nikita Abraham, signing off!
16:57
That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
- Visa fler