
Tiny 3D Engine on the ATmega328 (Arduino UNO)

May 26 2022

This project was more of a proof of concept than a fully functional 3D engine on the ATmega328: I was curious to see whether the AVR could handle something like this and display low-poly 3D models.

After meeting and talking with Xark (https://github.com/XarkLabs) on IRC, I discovered his fork of the Adafruit GFX library and was surprised by how well it performed, which was another good reason to start this project.

From the very beginning, Xark told me that the bottleneck of this project would be the SPI communication. Even with his fork performing between 3.5x and 12x faster than the Adafruit libraries, it would still not be fast enough to get a smooth colored mesh without flickering. He was absolutely right, but I have to admit I had a lot of fun with this project, and wireframe models still look nice.

The cube

I started with a really basic 3D rotating cube, using floats and the cos/sin math functions. Clearing the screen completely was causing flicker, so I decided to make another array to store a copy of the projected nodes (vertices). Before drawing a new frame with the updated cube transformation, I would draw the old node positions in the background color, giving the illusion that I was completely clearing the screen. The flickering was completely gone. Another tweak for better performance was to do a memory comparison (memcmp) of the old and new node positions to decide whether I had to update the frame or not.

ATmega328 Tiny 3D Engine

The main loop at this stage:

    // ----------------------------------------------
    // main loop
    // ----------------------------------------------
    void loop() {
      loops = 0;
      while (millis() > next_tick && loops < MAX_FRAMESKIP) {
        rotate_z(0);
        rotate_y(4);
        rotate_x(1);
        // ...
        next_tick += SKIP_TICKS;
        loops++;
      }

      // ===============
      // draw
      // ===============
      // only redraw if node positions have changed (less redraw - less flicker)
      if (memcmp(old_nodes, nodes, sizeof(nodes))) {
        // erase old projected nodes
        draw_wireframe(proj_nodes, ST7735_WHITE);
        // make new projection with new nodes position
        projection(nodes);
        // draw new projection
        draw_wireframe(proj_nodes, ST7735_BLACK);
        // copy new nodes location to erase cube
        memcpy(old_nodes, nodes, sizeof(nodes));
      }
    }

Although it was not yet apparent because of the small number of nodes, using floats and sin/cos for the rotation was a problem. It would cause issues later with complex meshes containing more vertices. Storing a couple of precomputed sin and cos values in an array was a good start to avoid calling these functions:

    static const float sin_lut[LUTSIZE] = {
      // to avoid calling sin() multiple times, store a couple of values in a lut
      0.010000, 0.019999, 0.029995, 0.039989, 0.049979,
    };

    static const float cos_lut[LUTSIZE] = {
      // to avoid calling cos() multiple times, store a couple of values in a lut
      0.999950, 0.999800, 0.999550, 0.999200, 0.998750,
    };

    // ----------------------------------------------
    // rotation on axis functions (0 - 4)
    // ----------------------------------------------
    void rotate_z(char theta) {
      i = NODECOUNT-1;
      do {
        tmp_axis = nodes[i][0];
        nodes[i][0] = (tmp_axis    * cos_lut[theta]) - (nodes[i][1] * sin_lut[theta]);
        nodes[i][1] = (nodes[i][1] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
      } while(i--);
    }

    void rotate_y(char theta) {
      i = NODECOUNT-1;
      do {
        tmp_axis = nodes[i][0];
        nodes[i][0] = (tmp_axis    * cos_lut[theta]) - (nodes[i][2] * sin_lut[theta]);
        nodes[i][2] = (nodes[i][2] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
      } while(i--);
    }

    void rotate_x(char theta) {
      i = NODECOUNT-1;
      do {
        tmp_axis = nodes[i][1];
        nodes[i][1] = (tmp_axis    * cos_lut[theta]) - (nodes[i][2] * sin_lut[theta]);
        nodes[i][2] = (nodes[i][2] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
      } while(i--);
    }

This was of course temporary. Each rotation was applied on top of the previous transformation of the mesh and wouldn't cover 360 degrees. But it was the start of the 'optimization' process of the project.

Can we go faster?

Another attempt at making things go faster was to store the screen in a buffer. The ATmega328 does not have enough memory to store the full screen resolution (128x160), but storing it at 1/4 of the resolution on each axis was possible. The idea was to store a 32x40 array of char, where each entry would specify which color index the 'render' function should draw, and each stored pixel would be drawn as a 4x4 tile. This array by itself already took 1280 bytes, about 62.5% of the 2KB of SRAM available on the ATmega328. The compiler started to warn me that I might run into stability problems because of the limited amount of memory left.
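Rasterizing the wireframe edges into such a buffer only needs integer math. Here is a minimal sketch, assuming a `buff[BUFFH][BUFFW]` array of color indices as described above. The names `buff_plot()`, `buff_line()` and `buff_clear()` are hypothetical and not the engine's actual `draw_wireframe()` code. It uses the classic Bresenham line algorithm in tile coordinates:

```c
#include <stdlib.h>
#include <string.h>

#define BUFFW 32  /* buffer width in tiles (assumed, matches the 32x40 layout) */
#define BUFFH 40  /* buffer height in tiles */

static unsigned char buff[BUFFH][BUFFW];  /* one color index per tile: 1280 bytes */

/* Clear every tile to color index 0, like memset(buff, 0, sizeof(buff)). */
void buff_clear(void) { memset(buff, 0, sizeof(buff)); }

/* Plot one tile, silently ignoring coordinates outside the buffer. */
static void buff_plot(int x, int y, unsigned char color) {
  if (x >= 0 && x < BUFFW && y >= 0 && y < BUFFH) buff[y][x] = color;
}

/* Integer-only Bresenham line between two tile coordinates (all octants). */
void buff_line(int x0, int y0, int x1, int y1, unsigned char color) {
  int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
  int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
  int err = dx + dy;  /* combined error term for both axes */
  for (;;) {
    buff_plot(x0, y0, color);
    if (x0 == x1 && y0 == y1) break;
    int e2 = 2 * err;
    if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
    if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
  }
}
```

Each edge of a projected triangle would be passed to `buff_line()` after its coordinates are scaled down to the buffer resolution.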

Besides the memory limitation, I was also losing a lot of detail by drawing at this new resolution. But now I was able to completely clear the screen without flickering:

Cube using a 32x40 buffer and drawing 4x4 tiles for each 'pixel'

Switching to 3x3 tiles made the 'rendering' much faster, but the resolution was pretty rudimentary and would be an issue for complex meshes later. I was also limited to 256 colors because of the indexing in the buffer array, since I was using chars to limit memory usage. The render function would also turn into a huge switch/case to return the appropriate color for each index:

    // ----------------------------------------------
    // render buffer to screen
    // ----------------------------------------------
    void render() {
      uint16_t col;
      TFT.spi_begin();
      for (char y = 0; y < BUFFH; y++) {
        for (char x = 0; x < BUFFW; x++) {
          switch(buff[y][x]) {
            case 0: col = COLOR0; break;
            case 1: col = COLOR1; break;
            case 2: col = COLOR2; break;
            case 3: col = COLOR3; break;
            case 4: col = COLOR4; break;
            case 5: col = COLOR5; break;
            default: col = COLOR0; break;  // unknown index: fall back to background
          }
          //TFT.drawPixel(x*4, y*4, col);
          // 4x to fill up screen since buffer is 1/4 of resolution
          TFT.setAddrWindow_(x<<2, y<<2, (x<<2)+PIXELOFFSET-1, (y<<2)+PIXELOFFSET-1);
          i = PIXELOFFSET-1;
          do {
            TFT.spiWrite16(col, PIXELOFFSET);
          } while(i--);
        }
      }
      TFT.spi_end();
    }

    ...

    // ----------------------------------------------
    // main loop
    // ----------------------------------------------
    void loop() {
      loops = 0;
      while (millis() > next_tick && loops < MAX_FRAMESKIP) {
        rotate_z(0);
        rotate_y(4);
        rotate_x(1);
        // ...
        next_tick += SKIP_TICKS;
        loops++;
      }

      // ===============
      // draw
      // ===============
      // only redraw if node positions have changed (less redraw - less flicker)
      if (memcmp(old_nodes, nodes, sizeof(nodes))) {
        // make new projection with new nodes position
        projection();
        // clear buffer
        memset(buff, 0, sizeof(buff));
        // draw new projection with COLOR1
        draw_wireframe(1);
        render();
        // keep a copy of the nodes to detect changes in the next frame
        memcpy(old_nodes, nodes, sizeof(nodes));
      }
    }
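Since the buffer already stores a color index, the switch/case could in principle be replaced by a plain array lookup. This sketch assumes 16-bit RGB565 colors as used by ST7735 displays. The palette contents, `RGB565()` and `palette_color()` are hypothetical:

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into a 16-bit RGB565 value (constant expression). */
#define RGB565(r, g, b) \
  ((uint16_t)((((r) & 0xF8) << 8) | (((g) & 0xFC) << 3) | ((b) >> 3)))

/* Hypothetical palette indexed by the values stored in buff[][].
   On the AVR it could live in PROGMEM and be read with pgm_read_word(). */
#define PALETTE_SIZE 6
static const uint16_t palette[PALETTE_SIZE] = {
  RGB565(255, 255, 255), /* 0: background (white) */
  RGB565(0, 0, 0),       /* 1: black              */
  RGB565(255, 0, 0),     /* 2: red                */
  RGB565(0, 255, 0),     /* 3: green              */
  RGB565(0, 0, 255),     /* 4: blue               */
  RGB565(255, 255, 0),   /* 5: yellow             */
};

/* One bounds check and one table read instead of a growing switch/case. */
uint16_t palette_color(unsigned char index) {
  return index < PALETTE_SIZE ? palette[index] : palette[0];
}
```

Inside `render()` the switch would then shrink to `col = palette_color(buff[y][x]);`. A full 256-entry palette would cost 512 bytes, so it would belong in flash rather than SRAM.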

At this stage the 'engine' was still using floats, which are extremely slow on the AVR because the ATmega328 has no hardware floating-point unit. After implementing transformation matrices (ref: codinglabs.net), it was time to switch to fixed point. Depending on the project you are working on, this task can be really frustrating: you have to make sure you stay within valid ranges to avoid overflow and end up with values much bigger than what your variable type can handle.

There is a nice thread on TIGForums by J-Snake that explains the process: Fixed Point Arithmetic - A Comprehensive Introduction
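Here is a minimal Q14 sketch of what the switch looks like in practice. The names `fix14`, `fx_mul()`, `vec2_fx` and `rotate_z_fx()` are hypothetical and not the engine's code. Values are integers scaled by 2^14 (the same scale as the lookup table further down), and products are widened before shifting back down. The point is also rotated from its rest position by an absolute angle, which avoids accumulating error through repeated small rotations:

```c
#include <stdint.h>

/* Q14 fixed point: 1.0 == 16384. */
#define FP_SHIFT 14
#define FP_ONE   (1L << FP_SHIFT)

typedef int32_t fix14;

/* Multiply two Q14 values. The 64-bit intermediate prevents overflow; on the
   AVR a 32-bit intermediate is enough if |a * b| stays below 2^31.
   Right-shifting a negative value is implementation-defined in C
   (arithmetic shift on GCC/avr-gcc). */
static fix14 fx_mul(fix14 a, fix14 b) {
  return (fix14)(((int64_t)a * b) >> FP_SHIFT);
}

typedef struct { fix14 x, y; } vec2_fx;

/* Rotate a rest-pose point around Z, given Q14 cos/sin of the absolute
   angle (for example read from a lookup table). */
vec2_fx rotate_z_fx(vec2_fx p, fix14 c, fix14 s) {
  vec2_fx r;
  r.x = fx_mul(p.x, c) - fx_mul(p.y, s);
  r.y = fx_mul(p.x, s) + fx_mul(p.y, c);
  return r;
}
```

For example, rotating (1.0, 0) by 60 degrees with the table values cos 60 = 8191 and sin 60 = 14188 gives (8191, 14188), or roughly (0.5, 0.866).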

32x40 buffer, 3x3 tiles and transformation matrix

Because of the limited amount of memory, you really need to store as much as possible in PROGMEM, as long as you do not plan to change these variables at runtime. But 32KB of flash is still fairly limited if you are not careful. Limiting and optimizing memory and storage usage is a good idea if you know your project might end up needing much more than it currently does.

I had enough storage memory for a full lookup table for my sin/cos calls. But if you look at a graph of sine and cosine, you will notice that you really don't need 360 degrees for each of them. (You can also check values here: mathwarehouse.com)

You can actually use just 90 degrees of values of either sine or cosine and 'mirror' the returned value depending on the requested angle. Here is a snippet of the fixed point lookup table I ended up making:

    #define LUT(a) (long)(pgm_read_word(&lut[a]))

    ...

    const unsigned int lut[] PROGMEM = {         // 0 to 90 degrees fixed point COSINE look up table
      16384, 16381, 16374, 16361, 16344, 16321, 16294, 16261, 16224, 16182, 16135, 16082, 16025, 15964,
      15897, 15825, 15749, 15668, 15582, 15491, 15395, 15295, 15190, 15081, 14967, 14848, 14725, 14598,
      14466, 14329, 14188, 14043, 13894, 13740, 13582, 13420, 13254, 13084, 12910, 12732, 12550, 12365,
      12175, 11982, 11785, 11585, 11381, 11173, 10963, 10748, 10531, 10310, 10086, 9860, 9630, 9397, 9161,
      8923, 8682, 8438, 8191, 7943, 7691, 7438, 7182, 6924, 6663, 6401, 6137, 5871, 5603, 5334, 5062,
      4790, 4516, 4240, 3963, 3685, 3406, 3126, 2845, 2563, 2280, 1996, 1712, 1427, 1142, 857, 571, 285, 0
    };

    ...

    // ----------------------------------------------
    // SIN/COS from 90 degrees LUT
    // ----------------------------------------------
    long SIN(unsigned int angle) {
      angle += 90;
      if (angle > 450) return LUT(0);
      if (angle > 360 && angle < 451) return -LUT(angle-360);
      if (angle > 270 && angle < 361) return -LUT(360-angle);
      if (angle > 180 && angle < 271) return  LUT(angle-180);
      return LUT(180-angle);
    }

    long COS(unsigned int angle) {
      if (angle > 360) return LUT(0);
      if (angle > 270 && angle < 361) return  LUT(360-angle);
      if (angle > 180 && angle < 271) return -LUT(angle-180);
      if (angle > 90  && angle < 181) return -LUT(180-angle);
      return LUT(angle);
    }
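To check the mirroring off-target, here is a portable sketch of the same quarter-wave idea. The names `cos_q14()`, `sin_q14()` and `wrap360()` are hypothetical. Unlike `SIN()`/`COS()` above, it also wraps angles outside 0..360 instead of returning `LUT(0)`. On the AVR it reads the table from PROGMEM through avr-libc's `pgm_read_word()`; elsewhere, a small fallback macro (an assumption for desktop testing) reads it from RAM:

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Desktop fallback so the table and folding logic can be tested off-target. */
#define PROGMEM
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

/* 0..90 degree cosine table in Q14 (16384 == 1.0), values copied from above. */
static const uint16_t cos_lut[91] PROGMEM = {
  16384, 16381, 16374, 16361, 16344, 16321, 16294, 16261, 16224, 16182, 16135, 16082, 16025, 15964,
  15897, 15825, 15749, 15668, 15582, 15491, 15395, 15295, 15190, 15081, 14967, 14848, 14725, 14598,
  14466, 14329, 14188, 14043, 13894, 13740, 13582, 13420, 13254, 13084, 12910, 12732, 12550, 12365,
  12175, 11982, 11785, 11585, 11381, 11173, 10963, 10748, 10531, 10310, 10086, 9860, 9630, 9397, 9161,
  8923, 8682, 8438, 8191, 7943, 7691, 7438, 7182, 6924, 6663, 6401, 6137, 5871, 5603, 5334, 5062,
  4790, 4516, 4240, 3963, 3685, 3406, 3126, 2845, 2563, 2280, 1996, 1712, 1427, 1142, 857, 571, 285, 0
};

#define LUT(a) ((int32_t)pgm_read_word(&cos_lut[a]))

/* Bring any integer angle in degrees (negative or > 360) into 0..359. */
static int wrap360(int angle) {
  angle %= 360;
  return angle < 0 ? angle + 360 : angle;
}

/* Cosine in Q14, mirroring the first quadrant into the other three. */
int32_t cos_q14(int angle) {
  int a = wrap360(angle);
  if (a <= 90)  return  LUT(a);        /* quadrant I:    cos(a)        */
  if (a <= 180) return -LUT(180 - a);  /* quadrant II:  -cos(180 - a)  */
  if (a <= 270) return -LUT(a - 180);  /* quadrant III: -cos(a - 180)  */
  return LUT(360 - a);                 /* quadrant IV:   cos(360 - a)  */
}

/* sin(a) == cos(a - 90), so the same folding serves both. */
int32_t sin_q14(int angle) {
  return cos_q14(angle - 90);
}
```

Deriving sine from the cosine folding keeps a single set of quadrant rules to get right, at the cost of one extra subtraction per call.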

At this point I was completely avoiding floats and the cos/sin functions, and I was able to clear the screen without any apparent flickering:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut

It was time to implement basic backface culling, which saves us from computing and drawing edges that shouldn't be visible. Using the shoelace algorithm, you can compute the signed area of a triangle in your mesh. If the value is negative, the triangle is facing away from us (given the winding order of the mesh's triangles). (ref: mathopenref.com)

Along with the node (vertex) positions of the cube, I also had a multidimensional array of triangles specifying which nodes make up each one of them. To check whether the current triangle was hidden, I would pass its index and use the return value to decide whether it needed to be drawn on screen. The following snippet might be a bit confusing without the full source. EDGE is a macro that returns a node from a triangle stored in the triangles array mentioned earlier:

    // ----------------------------------------------
    // Shoelace algorithm to get the surface
    // ----------------------------------------------
    int shoelace(unsigned char (*n)[2], unsigned char index) {
      unsigned char t = 0;
      int surface = 0;
      for (; t<3; t++) {
        // (x1y2 - y1x2) + (x2y3 - y2x3) ...
        surface += (n[EDGE(index,t)][0]           * n[EDGE(index,(t<2?t+1:0))][1]) -
                   (n[EDGE(index,(t<2?t+1:0))][0] * n[EDGE(index,t)][1]);
      }
      return surface / 2;  // integer halving, avoids a float multiply
    }

    // ----------------------------------------------
    // Shoelace algorithm for triangle visibility
    // ----------------------------------------------
    bool is_hidden(unsigned char index) {
      // (x1y2 - y1x2) + (x2y3 - y2x3) ...
      return ( ( (proj_nodes[EDGE(index,0)][0] * proj_nodes[EDGE(index,1)][1]) -
                 (proj_nodes[EDGE(index,1)][0] * proj_nodes[EDGE(index,0)][1]) ) +
               ( (proj_nodes[EDGE(index,1)][0] * proj_nodes[EDGE(index,2)][1]) -
                 (proj_nodes[EDGE(index,2)][0] * proj_nodes[EDGE(index,1)][1]) ) +
               ( (proj_nodes[EDGE(index,2)][0] * proj_nodes[EDGE(index,0)][1]) -
                 (proj_nodes[EDGE(index,0)][0] * proj_nodes[EDGE(index,2)][1]) ) ) < 0 ? true : false;
    }
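Here is a portable version of the same test, kept entirely in integers. The names `pt2`, `tri_area2()` and `is_back_facing()` are hypothetical. An `int` on the AVR is only 16 bits, so each product is widened to 32 bits before summing as a precaution. The doubled area is returned without halving because only its sign matters for culling:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t x, y; } pt2;  /* projected screen point */

/* Twice the signed area of triangle (a, b, c), shoelace formula:
   (x1*y2 - x2*y1) + (x2*y3 - x3*y2) + (x3*y1 - x1*y3). */
int32_t tri_area2(pt2 a, pt2 b, pt2 c) {
  return ((int32_t)a.x * b.y - (int32_t)b.x * a.y)
       + ((int32_t)b.x * c.y - (int32_t)c.x * b.y)
       + ((int32_t)c.x * a.y - (int32_t)a.x * c.y);
}

/* Same convention as is_hidden(): a negative area means back-facing.
   Which winding counts as "front" depends on how the mesh was exported
   and on the screen's y axis pointing down. */
bool is_back_facing(pt2 a, pt2 b, pt2 c) {
  return tri_area2(a, b, c) < 0;
}
```

Swapping any two vertices flips the sign, which is why a consistent winding order across the whole mesh is required.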

Triangles facing away from us were successfully hidden now:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling

At this point I knew I could color the cube faces. My initial plan (which I eventually abandoned) was to render a flat-shaded version of a rotating mesh. I ran various tests with triangle filling, and it seemed possible until I realized something:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling, flat colors

It's just an ugly cube...

Now let's be honest: it works fine. I can add colors, rotate the mesh, and there is no flickering going on. But wow, it is ugly...

It was now certain that if I had a complex mesh to render, I would be watching a bunch of pixels battling it out in the center of the screen. I had to weigh the pros and cons of the current state, decide how to proceed, and choose which goal (flat shading) I would need to give up.

Pros of the buffered version:

fast
clear screen without flickering
flat shading is possible (initial goal)

Cons:

uses a lot of memory
aesthetically unpleasant (it's amazingly ugly!)
resolution too small to display complex meshes
256 colors maximum
huge switch/case as more colors are defined

In short, I was stuck with an ugly cube. I thought it would be a shame to stay in this state and start working on flat shading.

Back to full res

Switching back to full resolution wasn't a problem. I simply had to remove the buffer eating most of the AVR memory and use the XarkLabs and Adafruit drawing functions to draw directly on screen. It was also really nice to see how fast the wireframe version of the cube rendered, and I could compare my first experiments (before fixed point and the LUT) with the current stage of the project.

Out of curiosity I tried a colored version of the cube. However, clearing the screen, or even just the area covered by a bounding-box dirty mask, caused a lot of flicker:
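For reference, the dirty-mask idea boils down to tracking the screen area the mesh covered in the previous frame and in the current one, then clearing only their union. This sketch uses hypothetical names (`rect`, `bbox_of()`, `rect_union()`) and assumes the projected nodes are stored as (x, y) pairs with at least one node:

```c
#include <stdint.h>

typedef struct { int16_t x0, y0, x1, y1; } rect;  /* inclusive bounds */

/* Bounding box of n >= 1 projected points stored as (x, y) pairs. */
rect bbox_of(const int16_t (*pts)[2], int n) {
  rect r = { pts[0][0], pts[0][1], pts[0][0], pts[0][1] };
  for (int i = 1; i < n; i++) {
    if (pts[i][0] < r.x0) r.x0 = pts[i][0];
    if (pts[i][0] > r.x1) r.x1 = pts[i][0];
    if (pts[i][1] < r.y0) r.y0 = pts[i][1];
    if (pts[i][1] > r.y1) r.y1 = pts[i][1];
  }
  return r;
}

/* Area to clear before redrawing: union of last frame's and this frame's box. */
rect rect_union(rect a, rect b) {
  rect r;
  r.x0 = a.x0 < b.x0 ? a.x0 : b.x0;
  r.y0 = a.y0 < b.y0 ? a.y0 : b.y0;
  r.x1 = a.x1 > b.x1 ? a.x1 : b.x1;
  r.y1 = a.y1 > b.y1 ? a.y1 : b.y1;
  return r;
}
```

The resulting box could be cleared with Adafruit GFX's `fillRect(r.x0, r.y0, r.x1 - r.x0 + 1, r.y1 - r.y0 + 1, color)`. This shrinks the cleared area, but it did not remove the flicker with filled faces.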

Full resolution (128x160), matrices, fixed point, lut, backface culling, render types

This is the moment I decided to forget about flat shading (and colors) and focus on wireframe rendering. The way the engine loaded meshes let me try something other than a cube for a change, so it was time to figure out how to add models in the format I was using.

More 3D models!

At first I was planning to write a script to convert OBJ files. After taking a closer look at Blender's export formats, however, I noticed that STL (ASCII) files are much simpler to use. They already use a set of vertices to declare each triangle. The STL (ASCII) file for the cube mesh is the following:

    solid Exported from Blender-2.74 (sub 0)
    facet normal -0.000000 0.000000 -1.000000
    outer loop
    vertex 1.000000 1.000000 -1.000000
    vertex 1.000000 -1.000000 -1.000000
    vertex -1.000000 -1.000000 -1.000000
    endloop
    endfacet
    ... more facets ...
    facet normal 0.000000 1.000000 0.000000
    outer loop
    vertex -1.000000 1.000000 -1.000000
    vertex -1.000000 1.000000 1.000000
    vertex 1.000000 0.999999 1.000000
    endloop
    endfacet
    endsolid Exported from Blender-2.74 (sub 0)

Each facet (triangle) is first defined by its normal. After that, the three nodes (vertices) of the facet are declared between outer loop and endloop. You just need to gather all vertices and keep only the vertices with unique positions. Then build an array of facets that point to these vertices by looking up each value.
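The same steps are sketched here in C rather than Python, to keep the examples in one language. `vec3`, `index_mesh()` and the 1e-4 position tolerance are assumptions, not stl2h's exact logic. A small tolerance lets exported values such as 0.999999 and 1.000000 (both visible in the cube file above) merge into one node:

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Tolerance for "same position"; 1e-4 is an arbitrary choice. */
#define POS_EPS 1e-4f

static int near_eq(float a, float b) {
  float d = a - b;
  return d < POS_EPS && d > -POS_EPS;
}

/* Turn a flat "triangle soup" (3 vertices per facet, in file order) into
   unique nodes plus faces that index them. `nodes` must hold up to
   3 * tri_count entries; returns the number of unique nodes found. */
size_t index_mesh(const vec3 *soup, size_t tri_count,
                  vec3 *nodes, unsigned char (*faces)[3]) {
  size_t node_count = 0;
  for (size_t t = 0; t < tri_count; t++) {
    for (int k = 0; k < 3; k++) {
      vec3 v = soup[t * 3 + k];
      size_t i = 0;
      /* Linear search is fine for meshes small enough for an ATmega328. */
      while (i < node_count && !(near_eq(nodes[i].x, v.x) &&
                                 near_eq(nodes[i].y, v.y) &&
                                 near_eq(nodes[i].z, v.z)))
        i++;
      if (i == node_count) nodes[node_count++] = v;  /* first time seen */
      faces[t][k] = (unsigned char)i;  /* 8-bit index, like faces[] */
    }
  }
  return node_count;
}
```

Because the face indices are 8-bit, like the `faces` array in the generated header, a mesh is limited to 256 unique nodes.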

The following Python script does exactly this and outputs a header file with the proper data format:

    #!/usr/bin/env python
    # -----------------------------------------------------------------------------
    # STL2H by Themistokle "mrt-prodz" Benetatos
    # -------------------------------------------
    # Convert STL 3D models (ASCII) to header for Tiny 3D Engine
    #
    # ------------------------
    # http://www.mrt-prodz.com
    # http://github.com/mrt-prodz/ATmega328-Tiny-3D-Engine
    # -----------------------------------------------------------------------------
    import sys, getopt, os

    # global parameters
    param_verbose = False
    param_normals = False
    param_yes     = False
    param_scale   = 1.0

    def checkFile(outfile):
        # keep asking the user until overwrite is chosen or a non-existing file name is entered
        while (os.path.isfile(outfile) is True):
            overwrite = raw_input('[!] Output data file "%s" already exists, overwrite? [y/n] ' % outfile)
            if overwrite in ('y', 'Y'):
                return outfile
            elif overwrite in ('n', 'N'):
                outfile = raw_input('[?] Enter new output data file name: ')
                if (outfile == ''):
                    outfile = 'temp.h'
        return outfile

    def printVerbose(text):
        global param_verbose
        if param_verbose is True:
            print text,

    def saveDAT(nodes, triangles, outfile, normals = None):
        print '[+] Saving output file:', outfile
        data  = '// exported with stl2h\n'
        data += '// '
        data += ' '.join(sys.argv[:]) + '\n'
        data += '#ifndef MESH_H\n'
        data += '#define MESH_H\n'
        data += '\n'
        data += '#define NODECOUNT ' + str(len(nodes)) + '\n'
        data += '#define TRICOUNT ' + str(len(triangles)) + '\n'
        data += '\n'
        data += '#define NODE(a, b) (long)(pgm_read_dword(&nodes[a][b]))\n'
        data += '#define EDGE(a, b) pgm_read_byte(&faces[a][b])\n'
        data += '#define NORMAL(a, b) (long)(pgm_read_dword(&normals[a][b]))\n'
        data += '\n'
        data += 'const long nodes[NODECOUNT][3] PROGMEM = {\n'
        for index, node in enumerate(nodes):
            data += '  {(long)(' + str(round(float(node[0]), 5)*param_scale) + '*PRES), '\
                 + '(long)(' + str(round(float(node[1]), 5)*param_scale) + '*PRES), '\
                 + '(long)(' + str(round(float(node[2]), 5)*param_scale) + '*PRES)},\n'
        data += '};\n\n'
        data += 'const unsigned char faces[TRICOUNT][3] PROGMEM = {\n'
        for index, face in enumerate(triangles):
            data += '  {' + str(face[0]) + ', ' + str(face[1]) + ', ' + str(face[2]) + '},\n'
        data += '};\n\n'
        data += 'const long normals[TRICOUNT][3] PROGMEM = {\n'
        for index, normal in enumerate(normals):
            data += '  {(long)(' + str(round(float(normal[0]), 5)) + '*PRES), '\
                 + '(long)(' + str(round(float(normal[1]), 5)) + '*PRES), '\
                 + '(long)(' + str(round(float(normal[2]), 5)) + '*PRES)},\n'
        data += '};\n\n'
        data += '