Old 12th September 2002, 05:52   #1
cfp
Member
 
Join Date: Dec 2001
Location: oxford, uk
Posts: 60
graphics filter ape if anyone's interested...

if you're fluent with any graphics program you'll probably have played around with its "custom filter"/"user defined effect" feature.

this ape implements one of them. this means it can do anything from blur to sharpen to water ripples to cellular automata like stuff

it is not fast by any means though so don't get your hopes up too high, but it's certainly worth playing round with and i'd love to see what you guys think of it. i think it's the kind of thing which should be added to the next avs.

hope you enjoy it

tom

p.s. a brief description of how it works follows:

you have to enter a 5*5 matrix of integers, a "bias" and a "scaling factor" (also integers)

for each pixel the plugin takes the weighted (by the matrix) sum of the 25 pixels in the 5*5 block centred on it, adds the bias then divides by the scaling factor. for example something like:

0 0 1 0 0
0 1 2 1 0
1 2 3 2 1
0 1 2 1 0
0 0 1 0 0

with bias 0 and scaling factor 19 would function as a blur.
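
For reference, a minimal C sketch of the filter as described above, for a single 8-bit channel. The function name, the clamping at the image edges and the clamping of the result to 0..255 are assumptions; the post does not say how the plugin handles either.

```c
#include <assert.h>
#include <stdint.h>

/* The blur matrix from the post; its entries sum to 19. */
static const int BLUR_KERNEL[5][5] = {
    {0, 0, 1, 0, 0},
    {0, 1, 2, 1, 0},
    {1, 2, 3, 2, 1},
    {0, 1, 2, 1, 0},
    {0, 0, 1, 0, 0},
};

/* Clamp a coordinate into [0, limit - 1] (assumed edge handling). */
static int clamp_coord(int v, int limit) {
    if (v < 0) return 0;
    if (v >= limit) return limit - 1;
    return v;
}

/* Filter one pixel of a w*h single-channel image:
   weighted sum of the 5x5 block, plus bias, divided by scale. */
uint8_t filter_pixel(const uint8_t *img, int w, int h, int x, int y,
                     const int kernel[5][5], int bias, int scale) {
    assert(scale != 0); /* dividing by zero is never valid */
    int sum = 0;
    for (int ky = 0; ky < 5; ky++) {
        for (int kx = 0; kx < 5; kx++) {
            int sx = clamp_coord(x + kx - 2, w);
            int sy = clamp_coord(y + ky - 2, h);
            sum += kernel[ky][kx] * img[sy * w + sx];
        }
    }
    int v = (sum + bias) / scale;
    if (v < 0) v = 0;     /* assumed: clamp to the 8-bit range */
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

With the blur matrix and scaling factor 19, a flat area keeps its value, and a single bright pixel is spread over its neighbours in proportion to the weights.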
Attached Files
File Type: zip custom filter.zip (18.1 KB, 391 views)

Old 12th September 2002, 06:43   #2
dirkdeftly
Forum King
Join Date: Jun 2001
Location: Cydonia, Mars
Posts: 2,651
Sweet...but somehow it seems like it could be a lot faster. I wouldn't think something that simple would be that slow...but maybe it's just me.

"guilt is the cause of more disauders
than history's most obscene marorders" --E. E. Cummings

Old 12th September 2002, 11:12   #3
geozop
Member
 
Join Date: Jul 2001
Location: Livermore, CA
Posts: 63
Perhaps

Is the trans re-doing the math for each pixel on every frame? Even when the variables have not been changed? I'm not sure how an ape is programmed, but is there a way to keep it from recalculating everything every frame?
Like the Movement trans... it re-evaluates the function only about a second after the equation has been changed, rather than every frame... Perhaps a button to "tell" your ape to use updated numbers?

This is all assuming that my theory is correct, anyway.

I really like all the different kinds of blurs you can create with this, though. Hopefully you'll find something to work it better...

[edit]: Did you know you can put negative numbers in the fields? spiffy.

[edit2]: when the scale variable is negative, it does not remember the number when selecting the effect (in the sidebar), after selecting another trans or render

Last edited by geozop; 12th September 2002 at 11:34.

Old 12th September 2002, 13:16   #4
cfp
it doesn't re-get all the data from the boxes every frame, it only "gets" that data when you edit a box.

i'm going to produce a single channel one which should be much much faster i reckon, i'm currently working on it...

i'll post both that and an updated version of the original without the negative scale bug later on today.

thanks for the bug spotting and comments

tom

Old 12th September 2002, 14:02   #5
cfp
here's the custom filter with the negative scaling bug removed.

no speed ups as yet.
Attached Files
File Type: zip custom filter.zip (18.1 KB, 268 views)

Old 12th September 2002, 15:44   #6
UnConeD
Whacked Moderator
Join Date: Jun 2001
Posts: 2,104
Does it use MMX? I wrote a new APE yesterday using an optimized MMX routine and the speedup is incredible..

Old 12th September 2002, 20:07   #7
cfp
much improved version

no it doesn't use mmx. i'm not that experienced a programmer tbh, i don't think i know how to access mmx routines...

anyway i have now produced a version which gives about 50% better frame rates by only working on a single channel. it's attached to this post, along with some examples of its use and a very slightly improved version of the multichannel original.

tom

p.s. any speed up advice would be appreciated, if you want to mail me my address is: cfp@DELETEALLTHESECAPSmyrealbox.com
Attached Files
File Type: zip custom filters.zip (45.8 KB, 309 views)

Old 12th September 2002, 21:49   #8
UnConeD
MMX is a godsend for 32-bit RGBA operations. For example, in order to add 2 pixels together, 2 pixels at a time (2x2 adds), you simply do:

movq mm0, source;
movq mm1, destination;
paddusb mm0, mm1;
movq destination, mm0;

This will do (each byte is added separately, saturating at 255):

R1 G1 B1 00   R2 G2 B2 00
+
R3 G3 B3 00   R4 G4 B4 00

It requires you to be fluent in regular x86 assembly already though.
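
For anyone without assembly experience, here is a portable C sketch of what paddusb computes on a 64-bit value (two 32-bit pixels). The function name is made up for illustration, and the loop is of course far slower than the single MMX instruction; it only shows the per-byte saturating behaviour.

```c
#include <stdint.h>

/* Add two sets of 8 packed bytes with unsigned saturation,
   emulating what paddusb does for each of its 8 byte lanes. */
uint64_t paddusb_emulated(uint64_t a, uint64_t b) {
    uint64_t out = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (lane * 8)) & 0xFF);
        unsigned y = (unsigned)((b >> (lane * 8)) & 0xFF);
        unsigned s = x + y;
        if (s > 255) s = 255; /* saturate instead of wrapping around */
        out |= (uint64_t)s << (lane * 8);
    }
    return out;
}
```

Because every lane saturates on its own, a bright channel never overflows into its neighbour, which is why no extra clipping code is needed around the add.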

Old 13th September 2002, 00:06   #9
cfp
that mmx code sounds interesting but i'm not sure how useful it would be for this because you are scaling the values of pixels way out of unsigned char range and adding lots of so scaled values together.

still i'd be interested to find out more, are there any web guides to mmx programming you could point me to?? admittedly i've not programmed in asm since my amstrad cpc 464 back in the day but i'm sure i could pick it up again.

tom

Old 13th September 2002, 01:27   #10
UnConeD
There are a few resources, but I don't have any urls. You can locate a lot of interesting things through Google though, that's what I did.

Actually what you're describing is no problem at all for MMX. There's no byte multiply, but you can unpack 4 unsigned chars to 16-bit words, multiply those, and pack the results back to bytes with saturation... so no clipping or maximum checks are needed. I'm sure you'll find alpha-blending code immediately, which can be adapted for your purpose.

The only thing I see that can slow your APE down is the fact that you usually don't use all 25 pixels. The best solution would be one that compiles a dynamic blending routine at run-time with optimized memory access and no jumps. But that's a lot more complicated of course.
Or you could make special predefined 'shapes' of the matrix... 5x5, 3x3, 1x3, 3x1, 1x5, etc. which each have an optimized loop. That way you don't have unnecessary jumps and still give the user the flexibility needed for most effects.
In any case, I made an APE like yours a while ago (the official name is a Convolution Matrix if I'm not mistaken): I spent quite a while optimizing a routine for a 3x3 matrix in regular x86 assembly (no mmx) and it was still quite slow.

If you need help with MMX just ask, but I'm not a pro either, I just learnt it yesterday. I've had quite some regular x86 assembly experience though, and MMX is really just a multi-byte/word version of most basic arithmetic such as adding and shifting (which makes it perfect for RGBA 32-bit pixels).
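
A much simpler relative of the "don't touch unused matrix entries" idea can be sketched in plain C: collect the nonzero entries once, when the matrix is edited, and loop over only those per pixel. This is not the run-time code generation described above, and the Tap type and function names are hypothetical.

```c
#include <stddef.h>

/* One nonzero matrix entry: offset from the centre pixel plus its weight. */
typedef struct {
    int dx, dy, weight;
} Tap;

/* Collect the nonzero entries of a 5x5 matrix into taps[] (room for 25).
   Returns how many were found. */
size_t build_taps(const int kernel[5][5], Tap taps[25]) {
    size_t n = 0;
    for (int ky = 0; ky < 5; ky++) {
        for (int kx = 0; kx < 5; kx++) {
            if (kernel[ky][kx] != 0) {
                taps[n].dx = kx - 2;
                taps[n].dy = ky - 2;
                taps[n].weight = kernel[ky][kx];
                n++;
            }
        }
    }
    return n;
}

/* Weighted sum for one pixel of a single-channel image of width w.
   The caller must keep (x, y) at least 2 pixels away from every edge. */
int tap_sum(const unsigned char *img, int w, int x, int y,
            const Tap *taps, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += taps[i].weight * img[(y + taps[i].dy) * w + (x + taps[i].dx)];
    return sum;
}
```

For the blur matrix from the first post this visits 13 pixels instead of 25, and a 3x1 matrix would visit only 3.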

Old 13th September 2002, 17:10   #11
jheriko
Forum King
Join Date: Aug 2002
Location: a twist in the fabric of space
Posts: 2,150
Damn, reading about all of your ape adventures really irritates me, because I lost MSVS.net the other day during a reformat, and I had only just downloaded the SDK as well...

I'm gonna have to find a freeware C compiler and get in on the fun...

EDIT: quick question:

you don't need to make the whole ape in asm do you?

-- Jheriko

'Everything around us can be represented and understood through numbers'

Old 13th September 2002, 19:30   #12
UnConeD
Nah, you just include the assembly in __asm { } blocks. Like, if color is a BGR0 windows COLORREF, and you want to convert it into an RGB0 AVS color, you could do:
code:
color = ((color & 0xFF) << 16) | (color & 0xFF00) | ((color & 0xFF0000) >> 16);

But that's all quite complicated. Using assembly, we turn it into:
code:
mov eax, color
bswap eax
shr eax, 8
mov color, eax


That's only 2 instructions doing the actual work (plus a load and a store)! The C code had 3 ANDs, 2 bitshifts and 2 ORs.
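
To check that the two versions really agree, here is a portable C sketch of both. bswap32 is a hand-written stand-in for the x86 bswap instruction, and all three function names are made up for illustration.

```c
#include <stdint.h>

/* Mask-and-shift version, as in the C one-liner above. */
uint32_t swap_rb_masks(uint32_t c) {
    return ((c & 0xFF) << 16) | (c & 0xFF00) | ((c & 0xFF0000) >> 16);
}

/* Portable stand-in for bswap: reverse the order of the four bytes. */
uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00) | ((v << 8) & 0xFF0000) | (v << 24);
}

/* bswap turns 0x00BBGGRR into 0xRRGGBB00; shifting right by 8
   then gives 0x00RRGGBB and drops the old top byte. */
uint32_t swap_rb_bswap(uint32_t c) {
    return bswap32(c) >> 8;
}
```

Both versions discard whatever was in the top byte, so they give the same result for any 32-bit input.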

Old 14th September 2002, 03:39   #13
jheriko
cool. I need to look into that then, I'm fluent in C but my asm is a bit lacking, I had a few ideas for things that I'd like to implement so I'm gonna have to get a compiler again, then maybe I'll make something.

Old 14th September 2002, 19:49   #14
dirkdeftly
I really want to learn APE programming...but NONE of the SDKs I've found work. Not that I can't figure them out, they simply do not WORK. It's kind of annoying, especially since I can't learn to do things like, say, graphics...If anyone's willing to teach me (that has some idea what they're doing), I'd really appreciate it.

Old 18th September 2002, 22:40   #15
cfp
a fast version of these filters is on its way

just posting to let you know that i've not abandoned this project.

i've spent the last couple of days frantically coding in every spare hour and i now have a fast, working (multi-channel) version of my filter.

at the moment it makes a few assumptions about the input conditions which i'm going to remove over the next few days, but basically, expect the final release soon.

i've not done detailed speed tests yet but its speed does seem at least comparable if not better than the default blur for small matrices.

many thanks to unconed for his advice on using mmx and dynamically generating the render function (i.e. it just doesn't exist until runtime). programming in asm opcodes is perhaps the most tedious programming i've ever done, but i'm pretty darn chuffed with the results.

tom

Old 19th September 2002, 00:21   #16
UnConeD
Sounds awesome. Are you doing the opcodes by hand? If so, do you have a nice reference of all the opcodes in binary form?

Another idea is to use the disassembler in MSVC... just compile, set a breakpoint in front of the code and you'll be able to copy/paste the opcodes from the disassembler.

Old 19th September 2002, 04:09   #17
cfp
i've been doing a bit of both. i've got my pdf copy of the "IA-32 Intel® Architecture Software Developer’s Manual Volume 2: Instruction Set Reference" (from the intel website) always open + a second vc project where i can easily insert code to get the opcodes by using the generate asm with opcodes and source compile option.

if both me and the rest of the world had pentium 4's then my job would be 100 times easier cos sse2 fixes the botch job that is mmx. eg. mmx has no division, mmx can only have 4 words making up a register instead of 4 dwords in sse2... which means less code for me to write...

ahh if only, i can dream (^_^)

tom

Old 19th September 2002, 04:20   #18
UnConeD
Hmm no division is a pain, but couldn't you use a fixed point hack instead?

Suppose you need to divide by 6. That's the same as multiplying by 0.166... As an approximation, you can multiply by 43 (0.166... * 256 = 42.67, rounded) and then divide by 256 (shift right by 8 bits).

I'm not sure if the loss of accuracy will matter a lot, but you can try. Fixed point of course means that you lose the dynamic range on the scalar values...
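
A small C sketch of that fixed-point trick, to make the accuracy loss concrete. The function names are made up, and the 8 fractional bits are simply the ones used in the example above.

```c
#include <stdint.h>

/* 8.8 fixed-point reciprocal of d, rounded to nearest: about 256 / d.
   For d = 6 this gives 43, the constant from the example above. */
uint32_t recip_q8(uint32_t d) {
    return (256u + d / 2u) / d;
}

/* Approximate x / 6 as (x * 43) >> 8, i.e. multiply by 43/256. */
uint32_t div6_fixed(uint32_t x) {
    return (x * 43u) >> 8;
}
```

Since 43/256 is slightly more than 1/6, large sums come out a little high: 1530 / 6 is exactly 255, but div6_fixed(1530) gives 256, so the result would still need clamping (or saturation) before being stored in a byte.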

I know AVS's built-in components take short-cuts too: the blur causes banding artifacts in areas with little color difference, which I can only attribute to rounding errors.
And the APE SDK comes with a 50/50 blend function that discards the lower bits: ((a >> 1) & 0x7F7F7F) + ((b >> 1) & 0x7F7F7F). This causes white + white to blend to a tone darker.
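
Here is that blend as a C function, next to a well-known bit trick that averages each channel without throwing the low bits away first. The second function is not from the SDK; it is only shown for comparison, and both names are made up.

```c
#include <stdint.h>

/* The 50/50 blend quoted above: halve each input first, then add.
   The low bit of every channel is lost before the addition. */
uint32_t blend_half_truncating(uint32_t a, uint32_t b) {
    return ((a >> 1) & 0x7F7F7F) + ((b >> 1) & 0x7F7F7F);
}

/* Per-channel floor((a + b) / 2) for 0x00RRGGBB values:
   the shared bits plus half of the differing bits. */
uint32_t blend_half_exact(uint32_t a, uint32_t b) {
    return (a & b) + (((a ^ b) >> 1) & 0x7F7F7F);
}
```

White blended with white gives 0xFEFEFE with the first version and stays 0xFFFFFF with the second.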

Old 19th September 2002, 04:46   #19
cfp
how did you handle people without mmx in your apes, unconed??
i'm tempted just to leave it doing nothing or just doing the very first, very slow algorithm i had, for people without mmx, because it seems like most people who are able to run winamp 3 would at least have a pentium 2 or a k6-2. do you think this is reasonable?? it would mean seriously more work for me if it wasn't...

also do you think that it is safe to put a max of 256 on the sums of the positive elements in the matrix and a min of -256 on the sums of the negative elements? again this would save me a heck of lot of work... (you should probably bear in mind that the matrix is now 7x7)
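
A sketch of how such a limit could be checked when the user edits the 7x7 matrix. The function name is hypothetical. The post does not say why 256 was chosen; one plausible reading (an assumption) is that 255 * 256 = 65280 still fits in an unsigned 16-bit word.

```c
#include <stdbool.h>

/* Returns true if the positive entries sum to at most limit and the
   negative entries sum to at least -limit. */
bool sums_within_limit(const int kernel[7][7], int limit) {
    int pos = 0, neg = 0;
    for (int y = 0; y < 7; y++) {
        for (int x = 0; x < 7; x++) {
            int v = kernel[y][x];
            if (v > 0) pos += v;
            else neg += v;
        }
    }
    return pos <= limit && neg >= -limit;
}
```

A matrix that fails the check could be rejected in the config dialog, or scaled down together with the scaling factor.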

thanks for any advice

tom

Old 19th September 2002, 09:33   #20
Yathosho
Forum King
Join Date: Jan 2002
Location: AT-DE
Posts: 3,353
how about posting the source-code instead of discussing endlessly? if you really want to contribute, i think it's the best. it could still be you coordinating about what's in and what's not?

..or are you expecting the big bucks here

Old 19th September 2002, 09:53   #21
jheriko
ugh... asm code, there's bound to be hundreds of lines.

Old 19th September 2002, 14:13   #22
UnConeD
Jheriko: it depends on the application. And considering we're using inline asm inside C++, only the tough spots are hand-optimized. I've only got about 30 lines of asm code in my latest (complicated) APE.

As far as people without MMX, I ignored them. We do occasionally get requests in here from people who want the non-MMX version of AVS, but I believe it is an older version, so most presets made today wouldn't work on it either.

And considering the fastest processor without MMX was around 200MHz, I doubt these people are getting an enjoyable experience anyhow.

Old 19th September 2002, 14:41   #23
cfp
i'm not sure if the source would be that useful cos it's now nearly 1000 lines at least half of which is just lines like:

((LPBYTE)draw)[codelength] = 0x0F; // movq mm0, mm7
codelength++;
((LPBYTE)draw)[codelength] = 0x7F;
codelength++;
((LPBYTE)draw)[codelength] = 0xF8;
codelength++;
((LPBYTE)draw)[codelength] = 0x0F; // movq mm1, mm7
codelength++;
((LPBYTE)draw)[codelength] = 0x7F;
codelength++;
((LPBYTE)draw)[codelength] = 0xF9;
codelength++;

which does not make fun reading...

tom

Old 19th September 2002, 15:18   #24
UnConeD
Couldn't you at least use a macro?

Like:


#define emitcode1(a) do { ((LPBYTE)draw)[codelength++] = (a); } while (0)
#define emitcode2(a,b) do { emitcode1(a); emitcode1(b); } while (0)
#define emitcode3(a,b,c) do { emitcode2(a,b); emitcode1(c); } while (0)


etc.

Macros can't be overloaded on argument count though, so you'd have to give them distinct names: emitcode1, emitcode2, ...
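
Another option is a small helper function instead of macros, which also makes it easy to check that the buffer has not overflowed. This is only a sketch: CodeBuf, emit_bytes and emit_movq_reg are made-up names, not part of the APE SDK. The 0F 7F encoding and the F8 / F9 register bytes are the ones from the listing quoted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* A growing run of machine-code bytes inside a fixed-size buffer. */
typedef struct {
    uint8_t *buf;
    size_t len; /* bytes written so far */
    size_t cap; /* total room in buf */
} CodeBuf;

/* Append n bytes; returns 0 on success, -1 if they would not fit. */
int emit_bytes(CodeBuf *cb, const uint8_t *bytes, size_t n) {
    if (n > cb->cap - cb->len) return -1;
    for (size_t i = 0; i < n; i++) cb->buf[cb->len++] = bytes[i];
    return 0;
}

/* Emit "movq mm<dst>, mm<src>" (register to register) as 0F 7F /r.
   The last byte is 11 sss ddd in binary: src in bits 3-5, dst in bits 0-2. */
int emit_movq_reg(CodeBuf *cb, unsigned dst, unsigned src) {
    const uint8_t op[3] = {
        0x0F, 0x7F, (uint8_t)(0xC0 | ((src & 7u) << 3) | (dst & 7u))
    };
    return emit_bytes(cb, op, sizeof op);
}
```

With this, the twelve lines quoted earlier shrink to emit_movq_reg(&cb, 0, 7); emit_movq_reg(&cb, 1, 7); and the mnemonic no longer has to live in a comment.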

Old 19th September 2002, 16:32   #25
cfp
you know i never thought of that... neat programming never was my strong point

(^_^)

Old 20th September 2002, 04:54   #26
cfp
i'm getting close to finishing now. attached is a version which is complete apart from load/save routines and one remaining bug: you can't have more than one copy of it running at the same time.

avs does start a new instance of the class doesn't it unconed?
if not where should you store variables which persist between render frames?

any help would be appreciated

tom
Attached Files
File Type: zip convolution.zip (20.6 KB, 283 views)

Old 20th September 2002, 05:26   #27
jheriko
YES!! a new toy... (i'd been playing with the first one)

Old 20th September 2002, 14:31   #28
UnConeD
Normally you shouldn't have any problems if you store everything in the class. Make sure you're not using global or static variables.
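
The same advice in plain C terms: keep every setting in a per-instance struct and pass a pointer to it around, so that two copies of the effect never share state. This is a generic sketch with made-up names, not the actual APE class layout.

```c
#include <string.h>

/* Everything one copy of the filter needs lives in its own struct;
   nothing is global or static. */
typedef struct {
    int matrix[5][5];
    int bias;
    int scale;
} FilterInstance;

/* Start as an identity filter: only the centre weight is set. */
void filter_init(FilterInstance *f) {
    memset(f->matrix, 0, sizeof f->matrix);
    f->matrix[2][2] = 1;
    f->bias = 0;
    f->scale = 1;
}

/* Change one weight of this instance only; out-of-range indices are ignored. */
void filter_set_weight(FilterInstance *f, int row, int col, int weight) {
    if (row >= 0 && row < 5 && col >= 0 && col < 5)
        f->matrix[row][col] = weight;
}
```

Editing one instance leaves every other instance untouched, which is the behaviour you want when the same effect appears twice in a preset.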

Where and how does it crash? Using a debugger you should be able to easily locate that...

Old 20th September 2002, 16:56   #29
cfp
the error is in the call to VirtualProtect used to give the generated function execute rights.

there are no static or global variables apart from the pointer to the main class which was set as static in the tutorial so i guess should be.

tis strange...

Old 20th September 2002, 17:11   #30
UnConeD
I tried making an AVS compiler a while ago, but I didn't get far due to lack of time. The part that did work, worked fine without VirtualProtect... I must say I don't know much about protecting memory and such. I used something like:

typedef void CompiledCode(void);

// nop (x6) (to make sure the disassembler recognizes the code)
// mov ebx, 0xC0DEBABE
// int 3 (breakpoint)
// ret

unsigned char function[] = { 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0xBB, 0xBE, 0xBA, 0xDE, 0xC0, 0xCC, 0xC3 };
CompiledCode *code = (CompiledCode *)(void *)function;
code();


This works fine for me in an APE (you can test it easily because of the break point).
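
A note for anyone trying this on a newer system: calling into an ordinary array works on x86 setups without no-execute protection, where any readable page can be executed. With DEP / NX enabled the same call faults, and the code has to live in memory that was explicitly made executable. Below is a hedged sketch of that approach: write the bytes into read/write memory, then switch it to read/execute before calling. run_generated_code is a made-up name, and the function returns -1 wherever it cannot run (non-x86 targets or failed allocation).

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict -std modes */
#include <stdint.h>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#define X86_TARGET 1
#else
#define X86_TARGET 0
#endif

typedef int (*GeneratedFn)(void);

/* Copies "mov eax, 42; ret" into fresh memory, makes it executable and
   calls it. Returns 42 on success, -1 if unsupported or on failure. */
int run_generated_code(void) {
    /* B8 imm32 = mov eax, imm32; C3 = ret (same in 32- and 64-bit mode) */
    static const uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    int result = -1;
    (void)code; /* unused on targets where neither branch below is built */
#if X86_TARGET && defined(_WIN32)
    void *mem = VirtualAlloc(NULL, sizeof code, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (mem == NULL) return -1;
    memcpy(mem, code, sizeof code);
    DWORD old_protect;
    if (VirtualProtect(mem, sizeof code, PAGE_EXECUTE_READ, &old_protect))
        result = ((GeneratedFn)(uintptr_t)mem)();
    VirtualFree(mem, 0, MEM_RELEASE);
#elif X86_TARGET && defined(MAP_ANONYMOUS)
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    memcpy(mem, code, sizeof code);
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) == 0)
        result = ((GeneratedFn)(uintptr_t)mem)();
    munmap(mem, sizeof code);
#endif
    return result;
}
```

Allocating whole pages for the generated code also avoids changing the protection of a page that happens to hold other data.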

Old 20th September 2002, 17:49   #31
cfp
god you're right you know... the amount of hassle VirtualProtect has given me and i didn't even need it... (^_^)

thanks loads.

i'm not even going to try and understand why i don't need VirtualProtect... i guess it's just possible that the whole of the data segment for avs has execute rights cos i wouldn't be surprised if the superscope/movements were dynamically generated.

expect a final release with save and load functionality, examples and documentation by the end of tonight.

i'll put it in a new thread for ease of future reference.

tom